Abstract
Neuropsychiatric disorders often involve dysregulation of serotonergic and dopaminergic pathways. This study applied machine learning (ML) with quantitative structure-activity relationship (QSAR) methods to predict the bioactivity (negative logarithm of the half-maximal inhibitory concentration, pIC50) of compounds targeting both receptor families, aiming to identify multitarget inhibitors among US Food and Drug Administration (FDA)-approved drugs. A dataset of 5,628 compounds with experimental IC50 values was obtained from ChEMBL and encoded with PubChem fingerprints. Random Forest and Extreme Gradient Boosting models were trained, optimized, and evaluated with 5-fold cross-validation, and Shapley Additive Explanations (SHAP) values were used for interpretation. After outlier removal and descriptor selection, the models achieved a test-set coefficient of determination (R2) of ca. 0.69 and were used to screen over 1,500 approved drugs. A total of 162 were predicted to have dual bioactivity (pIC50 > 6, coefficient of variation < 1%), including antipsychotics, adrenergic agonists (e.g., epinephrine), dopamine agonists (e.g., levodopa), antihistamines (e.g., cyproheptadine), antiemetics (e.g., droperidol), ergot alkaloids (e.g., ergotamine), antibiotics (e.g., penicillin G), and lipid-lowering agents (e.g., pravastatin). Key molecular descriptors indicated the relevance of nitrogen-containing fragments and conjugated aromatic substructures for dual receptor binding. These results provide a computational framework for repurposing drugs and guiding experimental validation in neuropsychiatric research.
Keywords:
machine learning; multitarget; neuropsychiatric disorders; polypharmacology
Introduction
Neuropsychiatric disorders, including schizophrenia, bipolar disorder, and autism spectrum conditions, involve multifactorial pathophysiological mechanisms characterized by dysregulation of central neurotransmitter systems. According to the World Health Organization (WHO), in 2022, nearly 1 billion people worldwide were living with a mental disorder, yet access to adequate treatment remains critically low. The report highlighted that only a small proportion of individuals receive the necessary care, with a particularly alarming treatment gap for severe conditions such as psychosis, for which most patients globally still do not receive the treatment they require.1,2
Despite advancements in neuropharmacology, the precise modulation of mood, cognition, and behavior continues to present significant challenges, with the dopaminergic and serotonergic systems playing fundamental roles. Current pharmacological interventions frequently act through modulation of dopamine (D2-like) and serotonin (5-HT2A) receptors; however, their clinical efficacy is often limited by incomplete target engagement and off-target effects resulting from non-selective receptor binding profiles.3-6
From a medicinal chemistry perspective, the design of ligands capable of simultaneously modulating both receptor classes (known as multitarget or dual-acting compounds) has emerged as a rational strategy to enhance therapeutic outcomes and reduce adverse effects in complex psychiatric conditions. Nevertheless, identifying such compounds remains a significant challenge due to the need to balance multiple physicochemical, structural, and pharmacophoric requirements within a single molecular framework.7
Computational approaches such as quantitative structure-activity relationship (QSAR) modelling offer a powerful means to correlate molecular descriptors with biological activity, enabling the prioritization of compounds with favorable profiles. When integrated with machine learning (ML) algorithms, QSAR models can enhance predictive accuracy and identify nonlinear relationships across complex chemical spaces. This is particularly relevant in polypharmacology, where small structural changes can differentially affect activity across multiple targets.8-10 Despite advances in ligand-based modelling for mono-target prediction, the application of QSAR and ML to systematically identify multitarget ligands acting on dopaminergic and serotonergic receptors remains limited. Moreover, few studies11-15 have explored interpretable ML algorithms capable of elucidating the structural features most associated with multitarget activity, a crucial step toward rational compound design.
In this context, the present study aimed to develop and interpret QSAR-based machine learning models to predict the dual bioactivity (-log half-maximal inhibitory concentration (pIC50)) of compounds targeting dopamine and serotonin receptors. Using a curated dataset from the ChEMBL database, the models were trained and applied to screen US Food and Drug Administration (FDA)-approved drugs, aiming to identify key chemical features and propose candidate compounds for further experimental validation in neuropsychiatric drug discovery.
Methodology
The study focused on developing ML models, built with QSAR methods and interpreted with Shapley Additive Explanations (SHAP) values, to predict and analyze bioactive compounds targeting serotonin and dopamine receptors in humans, including FDA-approved drugs and those in clinical trials, for the treatment of neuropsychiatric disorders. The research followed Organisation for Economic Co-operation and Development (OECD) guidelines16 and comprised the following steps: data selection, exploratory data analysis, application of different ML algorithms, evaluation of model performance through various metrics, and detailed interpretation of the most relevant model features. The algorithms were developed in Python by Cobre et al.,17 adapted,18 and implemented through Jupyter Notebook and Google Colaboratory.19 Google Colaboratory (Google Colab) is a free cloud-based platform, built on the Jupyter Notebook environment, that allows Python code to be executed without local configuration. This approach provides access to computational resources, including central processing units (CPUs) and graphics processing units (GPUs), eliminates the need to install complex software, and enables experiments to be reproduced across different devices with only internet access and a web browser. The workflow is illustrated in Figure 1.
Description of the databases used
To develop QSAR-based machine learning models, data were collected from the ChEMBL database. ChEMBL is a publicly available European database containing bioactivity data for over 2.4 million compounds with drug-like properties, distributed across 29,311 deposited datasets. The database provides extensive chemical information, bioactivity data, and genomic resources, facilitating the translation of genomic discoveries into actionable drug candidates.20
After validating the models using compound data from ChEMBL, the open-access ZINC-22, a curated database of commercially available compounds, was employed as the external dataset for screening potential drug candidates.21
Datasets used for study development
For the development of the study, compounds reported as inhibitors of dopamine and serotonin receptors were extracted from the ChEMBL database. This resulted in 7 datasets: CHEMBL217, CHEMBL234, CHEMBL238, CHEMBL224, CHEMBL3155, CHEMBL225, and CHEMBL214 (Table S1, Supplementary Information section). From these datasets, an initial dataframe was generated, including 119,361 bioactivity data points corresponding to 71,537 compounds in total.
The initial dataset contained various bioactivity parameters, including half-maximal effective concentration (EC50), minimum inhibitory concentration (MIC), activity percentage, inhibition constant (Ki), inhibition percentage, and half-maximal inhibitory concentration (IC50). ChEMBL does not always specify inhibition mechanisms, so our dataset may include both orthosteric and allosteric inhibitors. IC50 was chosen as the reference endpoint for QSAR modeling because it is the most abundant and comparable measure for the majority of the compounds (ca. 6,200 entries). Although IC50 values can vary across protocols, this endpoint is widely used due to its wider availability.22
To reduce heterogeneity, we standardized units, converted IC50 to pIC50, curated structures, and cleaned the data. However, as values originate from heterogeneous assays in ChEMBL, some residual variability remains a limitation. After these steps, the final dataset used for the next study phase, involving data preprocessing, consisted of 5,628 compounds with IC50-based bioactivity data.
Preprocessing and exploratory analysis of data
The preprocessing and modeling workflow of all 71,537 compounds for neuropsychiatric disorders was carried out in Python, integrating a set of open-source scientific libraries. Data handling and tabular organization were performed with Pandas23 and NumPy,24 while Seaborn25 and SciPy26 were employed for exploratory data analysis. Data preprocessing steps, including handling of missing values, normalization, and label encoding, were implemented with Scikit-learn.27 This process involved removing salts, standardizing tautomers, converting units to the molar scale, and converting IC50 to pIC50 (logarithmic scale), before combining the compound fingerprint descriptors (generated with the RDKit and PaDELPy libraries) with the biological activity data (pIC50).28,29
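As an illustration of the unit standardization and pIC50 conversion described above, a minimal sketch is given below. It assumes IC50 values arrive with one of a few concentration unit labels; the unit table, the function name `ic50_to_pic50`, and the example records are hypothetical and are not taken from the study's code.

```python
import math

# Conversion factors to mol/L for the unit labels assumed in this sketch.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def ic50_to_pic50(value: float, unit: str = "nM") -> float:
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if value <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


# Hypothetical records: (compound label, IC50 value, unit).
records = [("cpd_A", 1000.0, "nM"), ("cpd_B", 1.0, "uM"), ("cpd_C", 10.0, "nM")]
pic50 = {name: ic50_to_pic50(v, u) for name, v, u in records}
```

Under this definition, an IC50 of 1 µM corresponds to the pIC50 = 6 threshold used throughout the study, so higher pIC50 values indicate more potent compounds.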
In the context of this study on bioactive compounds against serotonergic and dopaminergic receptors, we conducted a detailed analysis of the molecular characteristics and bioactivity of these compounds. Initially, in a univariate analysis, we compared two categories of bioactivity, defined by a threshold at pIC50 ≥ 6. For simplicity, these categories were termed ‘active’ (pIC50 ≥ 6) and ‘inactive’ (pIC50 < 6). In addition, Lipinski’s Rule of Five parameters were included in the univariate analysis to evaluate the drug-likeness of the compounds, since these molecular descriptors (molecular weight, lipophilicity, hydrogen bond donors and acceptors) are classical predictors of oral bioavailability and overall pharmacokinetic suitability.30 Next, the Mann-Whitney test was used to identify significant differences between these classes (p < 0.05), seeking insights into the chemical properties that may influence biological activity.
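The labelling and drug-likeness checks described above can be sketched as follows. The pIC50 ≥ 6 threshold comes from the text; the Rule of Five cut-offs (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen bond donors and 10 acceptors) are the classical values and are assumed here rather than quoted from the study. The `Compound` class and its descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    pic50: float
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def activity_class(pic50: float, threshold: float = 6.0) -> str:
    """Label a compound 'active' when pIC50 >= threshold, else 'inactive'."""
    return "active" if pic50 >= threshold else "inactive"


def lipinski_violations(c: Compound) -> int:
    """Count violations of the classical Rule of Five cut-offs."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


# Hypothetical compounds with made-up descriptor values.
compounds = [
    Compound("cpd_A", 7.2, 375.9, 4.3, 1, 2),
    Compound("cpd_B", 5.1, 612.4, 5.8, 2, 9),
]
summary = [(c.name, activity_class(c.pic50), lipinski_violations(c)) for c in compounds]
```

In practice the descriptor values would be computed from the structures (for example with RDKit), and the two classes would then be compared descriptor by descriptor with the Mann-Whitney test.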
A multivariate analysis was performed using the Principal Component Analysis (PCA) model, considering the 881 PubChem molecular descriptors. Through PCA, we aimed to classify and understand patterns in active and inactive compounds, identifying clusters or characteristics that might be associated with the desired biological activity. This analysis provided a more detailed view of the chemical space of the compounds studied and generated valuable insights that guided subsequent investigations.
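A minimal sketch of how compounds can be projected onto the first two principal components is shown below. It uses a plain NumPy singular value decomposition on random placeholder bits, not the study's fingerprints, and is not necessarily the PCA implementation the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data: 200 compounds x 881 binary fingerprint bits (random, not real).
X = rng.integers(0, 2, size=(200, 881)).astype(float)

# Centre each descriptor, then obtain principal axes from the SVD.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance explained by each component, as a fraction of the total.
explained_ratio = S**2 / np.sum(S**2)

# Scores on the first two components (PC1, PC2), one row per compound.
scores = X_centered @ Vt[:2].T
```

The `scores` array is what would be plotted and coloured by activity class, while `explained_ratio[:2]` gives the share of variance carried by PC1 and PC2.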
The assessment of differences between classes was conducted based on the first two principal components (PC1 and PC2). Initially, the assumptions of normality (Shapiro-Wilk) and homogeneity of variances (Levene) were tested. Given the violation of the normality assumption in all groups, the nonparametric Kruskal-Wallis test was applied, followed by Dunn’s post-hoc analysis for multiple comparisons. This procedure aimed to verify whether the classes differed significantly and to identify the components that contributed most to the separation between them.
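The sequence of tests described above can be sketched with SciPy as follows. The three groups are synthetic PC1 scores generated for illustration only. Dunn's post-hoc step is omitted because it is not part of SciPy and would require an additional package.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Placeholder PC1 scores for three classes (synthetic, deliberately shifted).
groups = {
    "active": rng.normal(0.0, 1.0, 60),
    "intermediate": rng.normal(2.0, 1.0, 60),
    "inactive": rng.normal(4.0, 1.0, 60),
}

# Normality within each class (Shapiro-Wilk); index 1 of the result is the p-value.
shapiro_p = {name: stats.shapiro(x)[1] for name, x in groups.items()}

# Homogeneity of variances across classes (Levene).
levene_stat, levene_p = stats.levene(*groups.values())

# Nonparametric comparison of the three classes (Kruskal-Wallis).
kw_stat, kw_p = stats.kruskal(*groups.values())
```

The same calls would be repeated for PC2, and a significant Kruskal-Wallis result would then motivate the pairwise post-hoc comparisons.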
Feature training of Random Forest and XGBoost models, and residual analysis
In this study, we selected two ML models, Random Forest and XGBoost, and implemented them in Python (Google Colab environment) using the Scikit-learn25 and XGBoost31 libraries, respectively. Preprocessing and descriptor generation were performed with the PaDELPy library.28 Random Forest, based on an ensemble of decision trees, is particularly effective in reducing overfitting and capturing non-linear relationships. In contrast, XGBoost, a gradient boosting algorithm, excels in optimizing performance through iterative learning and handling high-dimensional data.32-34 In this analysis, the response variable was pIC50, while the predictor variables were PubChem fingerprint descriptors. Descriptors were selected with a variance threshold: those with variance below 0.1 were considered uninformative and excluded.
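The variance-based descriptor selection can be illustrated with a small NumPy sketch. The 0.1 cut-off comes from the text; the toy fingerprint matrix is invented, and the use of population variance is an assumption. For a binary bit present in a fraction p of compounds, the variance is p(1 - p), so a 0.1 cut-off removes bits set in fewer than roughly 11% or more than roughly 89% of compounds.

```python
import numpy as np

n = 10
# Toy fingerprint matrix: 10 compounds x 4 bits (values are illustrative).
X = np.column_stack([
    np.ones(n),                          # constant bit: variance 0
    np.arange(n) % 2,                    # set in half of the compounds
    (np.arange(n) < 3).astype(float),    # set in 3 of 10 compounds
    (np.arange(n) == 0).astype(float),   # set in 1 of 10 compounds
])

variances = X.var(axis=0)     # population variance of each bit column
keep = variances >= 0.1       # exclude descriptors with variance below 0.1
X_filtered = X[:, keep]
```

Here the constant bit and the bit set in a single compound are discarded, leaving two informative columns.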
Initially, the complete dataset was split (Sklearn library) into training (70%) and test (30%) sets, and the models were fitted to identify potential outliers, defined as compounds whose residuals exceeded ±2 standard deviations (SD) from the regression line. After removing these outliers (residual analysis with the NumPy library), we retrained and retested the models on the adjusted dataset.17 The removal was carried out to prevent extreme values from compromising model learning, resulting in unstable performance or unrealistic predictions. This approach was applied with caution and documented to minimize the exclusion of potentially relevant data. The models before and after outlier removal were compared using the coefficient of determination (R2), mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE). Model performance improved after outlier removal, with higher R2 values and lower MSE, MAE, and RMSE values, consistent with these compounds being outliers. This process was intended to make the models yield more reliable predictions of bioactivity at dopamine and serotonin receptors for new compounds.
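The residual-based outlier rule and the four metrics can be sketched as below. The sketch assumes that "±2 SD from the regression line" means residuals lying more than two standard deviations from the mean residual; the observed and predicted values are synthetic. For brevity it recomputes the metrics on the retained points, whereas the study retrained the models after removal.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, MAE and RMSE for paired observed/predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = y_true - y_pred
    mse = float(np.mean(resid**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "r2": 1.0 - float(np.sum(resid**2)) / ss_tot,
        "mse": mse,
        "mae": float(np.mean(np.abs(resid))),
        "rmse": mse**0.5,
    }


def residual_outlier_mask(y_true, y_pred, n_sd=2.0):
    """Flag samples whose residual lies more than n_sd SDs from the mean residual."""
    resid = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return np.abs(resid - resid.mean()) > n_sd * resid.std()


# Synthetic observed/predicted pIC50 values with one gross misprediction.
y_obs = np.array([5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 4.0])
y_hat = np.array([5.1, 5.4, 6.1, 6.4, 7.1, 7.4, 8.1, 8.4, 9.1, 7.5])

mask = residual_outlier_mask(y_obs, y_hat)
before = regression_metrics(y_obs, y_hat)
after = regression_metrics(y_obs[~mask], y_hat[~mask])
```

Because flagged compounds are by construction the worst-predicted ones, some improvement in these metrics is expected after removal, which is why the text stresses applying the step with caution.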
Hyperparameter tuning and cross-validation
The hyperparameter optimization for the ML models (Random Forest and XGBoost) was conducted to improve predictive power by fine-tuning the parameters that define the structure of the model and learning processes. Three optimization techniques were investigated: Random Search, Bayesian Optimization, and Grid Search (NumPy and SciPy libraries in Python). These methods were evaluated for their ability to identify the optimal configurations that enhance model generalization. Random Search involves randomly sampling hyperparameter values, making it an efficient approach for high-dimensional spaces. In contrast, Grid Search exhaustively evaluates all possible parameter combinations, ensuring a thorough exploration of the search space. Bayesian Optimization adopts a more strategic approach, using a probabilistic model of the objective function to focus on the most promising regions of the hyperparameter space, striking a balance between exploration and exploitation, and often achieving optimal results with fewer iterations.35,36
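The difference between Grid Search and Random Search can be sketched with the standard library alone. The search space, the parameter names, and `toy_score` (a stand-in for a cross-validated score) are all hypothetical. Bayesian Optimization is not shown because it requires a surrogate model from an additional package.

```python
import itertools
import random

# Hypothetical search space; names mirror common tree-ensemble settings.
space = {
    "n_estimators": [100, 300, 500],
    "max_depth": [4, 8, 12],
    "min_samples_leaf": [1, 2, 4],
}


def toy_score(params):
    """Stand-in for a cross-validated score (higher is better); not a real model."""
    return -(
        abs(params["n_estimators"] - 300) / 100
        + abs(params["max_depth"] - 8) / 4
        + abs(params["min_samples_leaf"] - 2)
    )


def grid_search(space, score):
    """Evaluate every combination in the space and return the best one."""
    keys = list(space)
    candidates = [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]
    return max(candidates, key=score), len(candidates)


def random_search(space, score, n_iter, seed=0):
    """Evaluate n_iter randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_iter)]
    return max(candidates, key=score), len(candidates)


best_grid, n_grid = grid_search(space, toy_score)
best_random, n_random = random_search(space, toy_score, n_iter=10)
```

Grid Search evaluates all 27 combinations here, whereas Random Search evaluates only the 10 sampled ones and may or may not reach the same optimum.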
For model evaluation, k-fold cross-validation was applied, where the dataset is split into equally sized folds. In each iteration, one fold is used for validation while the remaining folds are used for training. This process continues until every fold has been used as the validation set, providing a comprehensive evaluation of model performance and reducing the risk of overfitting. In this study, 5-fold cross-validation was chosen, dividing the dataset into five subsets. This approach strikes an effective balance between computational efficiency and a rigorous evaluation, making it well-suited for moderate-sized datasets while ensuring more reliable performance estimates.37,38
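The fold rotation described above can be sketched as a small generator. The function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions made for illustration; library implementations such as the one in Scikit-learn follow the same principle.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=23, k=5))
```

Each sample appears in exactly one validation fold, so averaging the metric over the five folds uses every compound once for validation.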
Interpretation of machine learning models via SHAP values
The interpretation of the important features of the ML models was performed using the SHAP values method (SHAP library).39,40 To apply SHAP values, the model was first created to assist in analyzing an isolated compound or a set of bioactive compounds. Next, the average values of each descriptor were calculated using the SoftMax function. The global effect of the fingerprint descriptors (features) was analyzed through SHAP summary and violin plots. Moreover, to investigate which parts of a specific molecule (features) are important (or not) for biological activity (pIC50), bar plots and waterfall plots were constructed. Prior to applying SHAP, a variance filter was applied to select only descriptors with significant variance, reducing the number of features from 881 to 207, which allowed for a more focused and interpretable analysis of the models.
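To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley attributions by brute force for a toy predictor over three fingerprint bits. It is not the SHAP library and not the study's models: `toy_model`, its coefficients, and the all-zero baseline are hypothetical, and the enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions of predict(x) relative to predict(baseline).

    Features absent from a coalition take their baseline value. The cost
    grows as 2**n_features, so this is only feasible for a handful of bits.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(bits):
    """Hypothetical pIC50 predictor over three fingerprint bits."""
    amine, aromatic, other = bits
    # The amine bit helps more when an aromatic bit is also present.
    return 5.0 + 0.8 * amine + 0.5 * aromatic + 0.6 * amine * aromatic - 0.3 * other


x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for the compound and the baseline prediction, and the interaction term is shared equally between the two bits involved. This additivity is what allows bar and waterfall plots to decompose a single predicted pIC50 into per-descriptor contributions.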
Results
Exploratory analysis of the multivariate chemical space
Principal Component Analysis (PCA) was employed to explore the chemical space defined by the 881 PubChem fingerprint descriptors and to assess the distribution of compounds based on their bioactivity profiles. The first two principal components accounted for 90% of the total variance, enabling a clear separation between active (pIC50 ≥ 6), inactive, and intermediate compounds with respect to dopaminergic and serotonergic receptor inhibition. Although we used the labels ‘active’, ‘intermediate’ and ‘inactive’ in the PCA, this classification was based on the threshold pIC50 ≥ 6, and thus more appropriately represents ‘more active’ versus ‘less active’ compounds.
Normality tests (Shapiro-Wilk) indicated significant deviations in all groups for PC1 and PC2 (p < 0.001). Regarding homoscedasticity, Levene’s test revealed no significant difference for PC1 (p = 0.78), while heteroscedasticity was found for PC2 (p < 0.001). Given the non-normality of the data, the Kruskal-Wallis test was applied, which revealed significant differences between the three classes, both for PC1 (p < 0.001) and PC2 (p = 0.001). Dunn’s post-hoc analysis confirmed these differences, showing that the classes are statistically distinct when projected onto the first two principal components, which account for 60.5 and 39.5% of the explained variance, respectively (Tables 1 and S2, Supplementary Information section).
Principal component analysis (PCA) with Kruskal-Wallis and Dunn post-hoc test between classes, active, intermediate and inactive
Model performance and residual analysis
The predictive performance of the Random Forest and XGBoost models was evaluated before and after the exclusion of statistical outliers. Residual analysis of the initial models identified compounds with residuals exceeding ±2 standard deviations from the regression line, which were removed to mitigate potential distortion in model training.
After outlier removal, both models demonstrated improved predictive accuracy, particularly on the test set. The coefficient of determination (R2 test) increased from 0.553 to 0.691 for Random Forest and from 0.538 to 0.687 for XGBoost. Corresponding reductions were observed in RMSE and MAE values (Table 2). These results indicate that model refinement enhanced the consistency of bioactivity predictions across structurally diverse compounds, although the overall predictive power remains moderate. Considering that the dataset encompasses multiple targets related to two different neurotransmitter systems, which introduces additional variability, these models still represent valuable tools for initial screening of candidate compounds. Figure 2 presents the graphical analysis of the Random Forest and XGBoost models before (Figure 2a) and after (Figure 2b) residual analysis and outlier removal.
Residual profile of each model (test sets) before outlier removal. (a) Random Forest and (b) XGBoost.
Hyperparameter optimization and cross-validation
To enhance predictive accuracy and reduce overfitting, model hyperparameters were tuned using three optimization strategies: Random Search, Grid Search, and Bayesian Optimization. Bayesian Optimization, which utilizes a probabilistic surrogate model to guide exploration of the hyperparameter space, yielded the best overall performance and the best trade-off between performance and computational cost for the Random Forest (Table 3) and XGBoost (Table 4) models, and was therefore selected for model tuning in the following analyses.
A 5-fold cross-validation protocol was applied within the training set to estimate generalization error. In each fold, one subset served as the validation set while the remaining subsets were used for training. This approach provided a more robust performance estimate and reduced bias due to data partitioning.
Machine learning-based identification of multitarget inhibitors for neuropsychiatric disorders
After training and validating the Random Forest and XGBoost models, we applied these models to screen a database of FDA-approved drugs (ZINC-22) to identify compounds that act as simultaneous inhibitors of serotonergic and dopaminergic receptors, targeting the treatment of neuropsychiatric disorders. As a result, 162 compounds from various pharmaceutical classes with a mean pIC50 above the threshold of 6 were predicted by both models.
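The consensus criterion used for screening (mean predicted pIC50 > 6 with a coefficient of variation below 1% between models, as stated in the Abstract) can be sketched as follows. The drug labels and predicted values are hypothetical, and the use of the population standard deviation over the two model predictions is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical predictions: drug -> (Random Forest pIC50, XGBoost pIC50).
predictions = {
    "drug_A": (8.40, 8.36),
    "drug_B": (6.90, 5.40),
    "drug_C": (5.20, 5.25),
}


def consensus_hits(preds, min_pic50=6.0, max_cv_percent=1.0):
    """Keep drugs whose mean prediction exceeds min_pic50 with CV below max_cv_percent."""
    hits = {}
    for name, values in preds.items():
        avg = mean(values)
        cv = 100.0 * pstdev(values) / avg   # coefficient of variation, in percent
        if avg > min_pic50 and cv < max_cv_percent:
            hits[name] = (round(avg, 3), round(cv, 3))
    return hits


hits = consensus_hits(predictions)
```

In this toy example only `drug_A` is retained: `drug_B` has a mean above 6 but the two models disagree strongly, and `drug_C` is predicted consistently but below the activity threshold.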
The top compounds with multitarget inhibitory bioactivity against dopaminergic and serotonergic receptors were identified (Table 5), including epinephrine (pIC50 = 8.723), phenylephrine (pIC50 = 8.362), fluphenazine (pIC50 = 8.366), ziprasidone (pIC50 = 8.399), chlorpromazine (pIC50 = 8.278), and haloperidol (pIC50 = 7.940). As expected, several antipsychotics and drugs acting on the central nervous system emerged among the top-ranked candidates. Interestingly, the models also retrieved molecules from unrelated therapeutic classes, such as penicillin G (mean pIC50 = 7.517), ampicillin (mean pIC50 = 7.477), and pravastatin (mean pIC50 = 7.350). While these are unlikely to act as dual inhibitors in vivo due to pharmacokinetic constraints (e.g., poor blood-brain barrier penetration), their identification highlights potential structural motifs that could inspire the design of novel compounds with improved neuropharmacological profiles. In this sense, although they may not represent direct repositioning opportunities, such scaffolds could contribute to future drug design efforts targeting both dopaminergic and serotonergic pathways, particularly in the context of neuroinflammation.
Interpretation of SHAP values and feature importance
In a regression model where the dependent variable is pIC50, SHAP value plots illustrate the contribution of individual molecular descriptors to the predicted pIC50 of each compound. SHAP values reveal the directional impact of each descriptor, indicating whether its presence increases or decreases the predicted bioactivity. The SHAP plots in Figures 3 and S1 (Supplementary Information section) highlight the main molecular descriptors and substructures associated with the simultaneous inhibition of serotonergic and dopaminergic receptors.
Violin plots of SHAP values for each model. (a) Random Forest; (b) Extreme Gradient Boosting.
The interpretability analysis using SHAP values allowed for the identification of molecular descriptors with the greatest influence on the predicted pIC50 values for dopaminergic and serotonergic receptor inhibition. Among the descriptors with the highest SHAP impact, PubChemFP484 (N-C:N:C), PubChemFP643 ([#1]-C-C-N-[#1]), and PubChemFP340 (C(~C)(~C)(~N)) were particularly prominent.
The descriptor PubChemFP484 represents nitrogen-carbon double bond fragments commonly found in conjugated heterocycles or aromatic imines. These substructures are frequently associated with π-π interactions and hydrogen bonding within the orthosteric binding sites of G protein-coupled receptors (GPCRs), particularly dopamine D2 and serotonin 5-HT2A receptors. These interactions are essential for stabilizing ligand-receptor complexes and are characteristic of pharmacologically active scaffolds in several antipsychotic drugs.
PubChemFP643 denotes aliphatic chains containing nitrogen atoms (e.g., primary or secondary alkylamines), which are well-established features in compounds with affinity for monoaminergic receptors. Basic amine groups are known to form strong ionic interactions with conserved acidic residues in GPCRs, such as Asp3.32 in D2R. The SHAP values associated with this descriptor exhibited both positive and negative contributions, suggesting that the surrounding molecular context (e.g., distance to aromatic rings, rigidity) modulates its effect.
PubChemFP340 captures motifs with a central carbon attached to two alkyl groups and one nitrogen atom, a common pattern in substituted piperidines, morpholines, and other tertiary amine-containing rings. These fragments contribute to conformational stability and are often involved in fitting into hydrophobic subpockets while engaging in polar interactions via the nitrogen functionality.
Overall, the SHAP analysis reveals that nitrogen-containing functionalities and conjugated aromatic systems are the most relevant features driving multitarget bioactivity, consistent with known pharmacophore models for both receptor families.
The structural insights derived from SHAP values offer practical guidance for the rational optimization of multitarget ligands. The consistent relevance of basic nitrogen-containing groups supports the maintenance or introduction of tertiary amines or heterocyclic scaffolds in lead compounds to enhance electrostatic interactions with GPCR binding sites. Incorporating conjugated aromatic systems, such as indoles, pyridines, or triazoles, may further improve binding affinity through π-π stacking and hydrogen bonding with aromatic or polar residues in the receptor.
Moreover, the presence of flexible aliphatic linkers adjacent to polar groups, as indicated by descriptors like PubChemFP643, suggests that fine-tuning chain length and branching can help adjust the spatial arrangement of pharmacophores, maximizing simultaneous engagement with both dopaminergic and serotonergic targets. Scaffold modifications based on high-impact descriptors may also support fragment growing or hopping strategies to explore new chemotypes.
Discussion
This study demonstrates the feasibility of integrating QSAR and ML models to predict dual inhibitory activity (pIC50 values) against dopaminergic and serotonergic receptors, two central therapeutic targets in neuropsychiatric pharmacology. By leveraging binary molecular fingerprints and ensemble-based models (Random Forest and XGBoost), the predictive framework yielded consistent results, particularly after the exclusion of statistical outliers, with R2 values reaching approximately 0.69 and reduced RMSE and MAE across test sets. These results support the utility of 2D representations as efficient initial proxies for identifying multitarget molecular patterns in large-scale chemical datasets.17,41
SHAP analysis elucidated the most important chemical features influencing predicted bioactivity. Descriptors such as PubChemFP484 (N=C structures), PubChemFP643 (aliphatic amines), and PubChemFP340 (substituted tertiary amines) had dominant predictive influence across models. These features are pharmacologically consistent with known interaction modes of G protein-coupled receptor (GPCR) ligands, supporting ionic and π-π interactions at orthosteric binding sites.42,43 Notably, PubChemFP643 shows context-dependent contributions (positive and negative), indicating that not just its presence but also its molecular environment (e.g., spatial arrangement, ring proximity) modulates bioactivity.
It is important to highlight that our SHAP-based analysis indicates molecular context can be a crucial factor: some descriptors (e.g., PubChemFP643) exhibited dual behavior depending on adjacent functionalities or molecular topology. This suggests that not merely the presence but also the spatial organization of pharmacophores modulates bioactivity, a nuance often overlooked in traditional QSAR models. This insight aligns with recent advances in dynamic ligand binding theories, where induced fit and conformational entropy are recognized as critical modulators of receptor engagement.
The virtual screening of over 1,500 FDA-approved drugs yielded 162 candidate compounds with predicted dual-target activity. Among these candidates, several established antipsychotics, such as risperidone, ziprasidone, fluphenazine, and haloperidol, ranked highest, which internally validates the alignment of the model with known multitarget agents. Furthermore, our models identified non-CNS (central nervous system) drugs (e.g., clevidipine, droperidol, cyproheptadine) with plausible off-target binding potential, raising hypotheses for future repurposing. While some compounds like risperidone are already well-documented dual modulators of D2 and 5-HT2A receptors, others require caution due to distinct clinical indications and ADME (absorption, distribution, metabolism and excretion) profiles, necessitating confirmatory experimental assays before considering therapeutic repositioning.44-46
Despite these findings being consistent with established pharmacology, the virtual screening also retrieved drugs with limited clinical plausibility, such as penicillin G, pravastatin, and aspirin. These compounds present relevant pharmacokinetic constraints, including poor blood-brain barrier penetration, in addition to possessing chemical scaffolds that are largely unrelated to typical ligands of dopaminergic and serotonergic receptors. Thus, their identification most likely reflects experimental noise inherent to bioactivity databases and limitations of the molecular representation employed (2D binary descriptors). Although they do not constitute viable candidates for direct repositioning, these molecules may serve as starting points for the design of new derivatives with properties more compatible with the desired neuropharmacological profile. This consideration simultaneously reinforces the value of the model as a hypothesis-generating tool and the importance of a critical interpretation of its results.
From a methodological standpoint, this study presents some limitations that should be acknowledged. All IC50 values (> 6,000) were retrieved from ChEMBL and were not harmonized across assay formats, so they are unlikely to have been obtained under identical experimental protocols. Despite preprocessing steps to reduce heterogeneity, differences in assay design, readouts, and experimental conditions may still introduce variability. Reported discrepancies in IC50 values across methodologies can reach 2-3 fold, and in some cases up to 10-fold; thus, although IC50 is a common and practical unified endpoint in QSAR analyses, methodological variability likely contributes to residual noise.47-50 Furthermore, molecular descriptors were restricted to 2D fingerprints, which do not capture conformational flexibility or electrostatic surface features relevant to ligand-receptor recognition, and the mechanism of inhibition (orthosteric vs. allosteric) was not systematically distinguished, potentially integrating compounds acting through different inhibitory modes into the same models. Finally, no external biological validation was performed: while known inhibitors were successfully recovered, experimental screening in vitro and in vivo will be essential to confirm novel predictions.
From a pharmacological standpoint, the identification of molecular fragments associated with dual activity reinforces established neurochemical principles. The prevalence of nitrogen-containing moieties (e.g., alkylamines, imines) and aromatic systems in active compounds reflects their crucial role in forming electrostatic interactions (e.g., salt bridges with conserved residues such as Asp3.32 in dopamine D2 receptors) within the orthosteric pockets of G protein-coupled receptors (GPCRs). This is consistent with known pharmacophores of atypical antipsychotics and serotonin-dopamine stabilizers.42,51
Beyond the identification of already known multitarget drugs, such as risperidone, ziprasidone, and haloperidol, our model flagged compounds outside the classical CNS domain, such as penicillin G and pravastatin. While these results warrant caution due to likely assay artifacts or off-target binding unrelated to clinical relevance, they also echo the broader phenomenon of “promiscuous binding” seen in drug repurposing literature.52,53 Notably, the identification of adrenergic agonists such as epinephrine with high predicted affinity supports emerging discussions about the intersection between adrenergic modulation and neuropsychiatric symptoms, including arousal regulation, stress response, and cognitive flexibility. This opens provocative hypotheses about the contribution of peripheral neurotransmitter systems to central neuropsychiatric phenotypes, a topic increasingly explored in neuroimmunology and psychoneuroendocrinology.
In summary, the findings corroborate the growing consensus that multitarget-directed ligands represent a rational strategy for addressing the multifactorial etiology of neuropsychiatric disorders. The computational framework presented herein can help guide experimental design and accelerate hypothesis-driven discovery of neuroactive agents.
Conclusions
This study presents a QSAR-based machine learning approach for the identification of compounds with predicted dual inhibitory activity against dopaminergic and serotonergic receptors. Through systematic preprocessing, model optimization and validation, and SHAP-based interpretability, the Random Forest and XGBoost algorithms demonstrated consistent performance in predicting bioactivity across a curated dataset. The subsequent virtual screening of FDA-approved drugs led to the identification of 162 candidate molecules with potential multitarget profiles.
Although these findings provide a relevant in silico contribution, they require experimental validation to confirm their translational potential. Overall, this integrative strategy reinforces the utility of cheminformatics tools in the early-stage screening of multitarget drug candidates for complex neuropsychiatric conditions.
Supplementary Information
Acknowledgments
The authors would like to thank the Coordination for the Improvement of Higher Education Personnel - Brazil (CAPES) - Finance Code 001, and the National Council for Scientific and Technological Development - Brazil (CNPq) for their support in carrying out this work. The authors gratefully acknowledge Moisés Maia Neto for assistance with the statistical analysis of this study.
Data Availability Statement
The code used in this study is available in the GitHub repositories of Caroline Mensor Folchini [Link].18
References
1 World Health Organization (WHO); World Mental Health Report: Transforming Mental Health for All; WHO: Geneva, 2022. [Link] accessed in September 2025
2 Bishop, J. R.; Pavuluri, M. N.; Neuropsychiatr. Dis. Treat. 2008, 4, 55. [Crossref]
3 Sampogna, G.; Di Vincenzo, M.; Giuliani, L.; Menculini, G.; Mancuso, E.; Arsenio, E.; Cipolla, S.; Della Rocca, B.; Martiadis, V.; Signorelli, M. S.; Fiorillo, A.; Brain Sci. 2023, 13, 1577. [Crossref]
4 McCracken, J. T.; McGough, J.; Shah, B.; Cronin, P.; Hong, D.; Aman, M. G.; Arnold, L. E.; Lindsay, R.; Nash, P.; Hollway, J.; McDougle, C. J.; Posey, D.; Swiezy, N.; Kohn, A.; Scahill, L.; Martin, A.; Koenig, K.; Volkmar, F.; Carroll, D.; Lancor, A.; Tierney, E.; Ghuman, J.; Gonzalez, N. M.; Grados, M.; Vitiello, B.; Ritz, L.; Davies, M.; Robinson, J.; McMahon, D.; N. Engl. J. Med. 2002, 347, 314. [Crossref]
5 Siafis, S.; Çıray, O.; Wu, H.; Schneider-Thoma, J.; Bighelli, I.; Krause, M.; Rodolico, A.; Ceraso, A.; Deste, G.; Huhn, M.; Fraguas, D.; San José Cáceres, A.; Mavridis, D.; Charman, T.; Murphy, D. G.; Parellada, M.; Arango, C.; Leucht, S.; Mol. Autism 2022, 13, 10. [Crossref]
6 Nestsiarovich, A.; Gaudiot, C. E. S.; Baldessarini, R. J.; Vieta, E.; Zhu, Y.; Tohen, M.; Eur. Neuropsychopharmacol. 2022, 54, 75. [Crossref]
7 Ramsay, R. R.; Popovic-Nikolic, M. R.; Nikolic, K.; Uliassi, E.; Bolognesi, M. L.; Clin. Transl. Med. 2018, 7, e3. [Crossref]
8 Ching, T.; Himmelstein, D. S.; Beaulieu-Jones, B. K.; Kalinin, A. A.; Do, B. T.; Way, G. P.; Ferrero, E.; Agapow, P. M.; Zietz, M.; Hoffman, M. M.; Xie, W.; Rosen, G. L.; Lengerich, B. J.; Israeli, J.; Lanchantin, J.; Woloszynek, S.; Carpenter, A. E.; Shrikumar, A.; Xu, J.; Cofer, E. M.; Lavender, C. A.; Turaga, S. C.; Alexandari, A. M.; Lu, Z.; Harris, D. J.; Decaprio, D.; Qi, Y.; Kundaje, A.; Peng, Y.; Wiley, L. K.; Segler, M. H. S.; Boca, S. M.; Swamidass, S. J.; Huang, A.; Gitter, A.; Greene, C. S.; J. R. Soc. Interface 2018, 15, 20170387. [Crossref]
9 Goh, G. B.; Hodas, N. O.; Vishnu, A.; J. Comput. Chem. 2017, 38, 1291. [Crossref]
10 Parvatikar, P. P.; Patil, S.; Khaparkhuntikar, K.; Patil, S.; Singh, P. K.; Sahana, R.; Kulkarni, R. V.; Raghu, A. V.; Antiviral Res. 2023, 220, 105740. [Crossref]
11 Yang, F.; Zhang, Q.; Ji, X.; Zhang, Y.; Li, W.; Peng, S.; Xue, F.; Interdiscip. Sci.: Comput. Life Sci. 2022, 14, 15. [Crossref]
12 Abdolmaleki, A.; Ghasemi, J.; Ghasemi, F.; Curr. Drug Targets 2017, 18, 556. [Crossref]
13 Napolitano, F.; Zhao, Y.; Moreira, V. M.; Tagliaferri, R.; Kere, J.; D’Amato, M.; Greco, D.; J. Cheminform. 2013, 5, 30. [Crossref]
14 Bhargava, K.; Nath, R.; Seth, P. K.; Pant, K. K.; Dixit, R. K.; Bioinformation 2014, 10, 8. [Crossref]
15 Batool, M.; Ahmad, B.; Choi, S.; Int. J. Mol. Sci. 2019, 20, 2783. [Crossref]
16 OECD; (Q)SAR Assessment Framework: Guidance for the regulatory assessment of (Quantitative) Structure Activity Relationship models and predictions, Second Edition; 2024. [Link] accessed in October 2025
17 Cobre, A. F.; Ara, A.; Alves, A. C.; Maia Neto, M.; Fachi, M. M.; Beca, L. S. A. B.; Tonin, F. S.; Pontarolo, R.; Chemom. Intell. Lab. Syst. 2024, 250, 105145. [Crossref]
18 GitHub. [Link] accessed in September 2025
19 Google Colaboratory. [Link] accessed in January 2025
20 Zdrazil, B.; Felix, E.; Hunter, F.; Manners, E. J.; Blackshaw, J.; Corbett, S.; de Veij, M.; Ioannidis, H.; Lopez, D. M.; Mosquera, J. F.; Magarinos, M. P.; Bosc, N.; Arcila, R.; Kizilören, T.; Gaulton, A.; Bento, A. P.; Adasme, M. F.; Monecke, P.; Landrum, G. A.; Leach, A. R.; Nucleic Acids Res. 2023, 52, 1180. [Crossref]
21 Tingle, B. I.; Tang, K. G.; Castanon, M.; Gutierrez, J. J.; Khurelbaatar, M.; Dandarchuluun, C.; Moroz, Y. S.; Irwin, J. J.; J. Chem. Inf. Model. 2023, 63, 1166. [Crossref]
22 Kumar, A.; Loharch, S.; Kumar, S.; Ringe, R. P.; Parkesh, R.; Comput. Struct. Biotechnol. J. 2021, 19, 424. [Crossref]
23 McKinney, W.; Proc. 9th Python Sci. Conf. 2010, 1, 56. [Crossref]
24 Harris, C. R.; Millman, K. J.; van der Walt, S. J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N. J.; Kern, R.; Picus, M.; Hoyer, S.; van Kerkwijk, M. H.; Nature 2020, 585, 357. [Crossref]
25 Waskom, M. L.; J. Open Source Softw. 2021, 6, 1. [Crossref]
26 Virtanen, P.; Gommers, R.; Oliphant, T. E.; Haberland, M.; Reddy, T.; Walt, S. J. Van Der; Brett, M.; Wilson, J.; Millman, K. J.; Mayorov, N.; Nelson, A. R. J.; Jones, E.; Kern, R.; Larson, E.; Carey, C. J.; Polat, I.; Feng, Y.; Moore, E. W.; VanderPlas, J.; Laxalde, D.; Perktold, J.; Cimrman, R.; Henriksen, I.; Quintero, E. A.; Harris, C. R.; Archibald, A. M.; Ribeiro, A. H.; Pedregosa, F.; van Mulbregt, P.; Nat. Methods 2020, 17, 261. [Crossref]
27 Pedregosa, F.; Weiss, R.; Brucher, M.; J. Mach. Learn. Res. 2011, 12, 2825. [Crossref]
28 Scikit-learn. [Link] accessed in January 2025
29 Yap, C. W.; J. Comput. Chem. 2011, 32, 1466. [Crossref]
30 Lipinski, C. A.; Adv. Drug Delivery Rev. 2016, 101, 34. [Crossref]
31 Chen, T.; Guestrin, C.; Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: San Francisco, California, USA, 2016, 8, 785. [Crossref]
32 Domingues, K. Z. A.; Cobre, A. F.; Fachi, M. M.; Lazo, R. E. L.; Ferreira, L. M.; Pontarolo, R.; J. Braz. Chem. Soc. 2025, 36, e-20250028. [Crossref]
33 Singh, S.; Kumar, R.; Payra, S.; Singh, S. K.; Cureus 2023, 15, e44359. [Crossref]
34 Boldini, D.; Grisoni, F.; Kuhn, D.; Friedrich, L.; Sieber, S. A.; J. Cheminform. 2023, 15, 73. [Crossref]
35 Meaney, C.; Wang, X.; Guan, J.; Stukel, T. A.; BMC Med. Res. Methodol. 2025, 25, 134. [Crossref]
36 Kim, S.-H.; Geem, Z. W.; Han, G.-T.; Sensors 2020, 20, 3697. [Crossref]
37 Azar, A. S.; Samimi, T.; Tavassoli, G.; Naemi, A.; Rahimi, B.; Hadianfard, Z.; Wiil, U. K.; Nazarbaghi, S.; Bagherzadeh Mohasefi, J.; Lotfnezhad Afshar, H.; Eur. J. Med. Res. 2024, 29, 547. [Crossref]
38 Jung, Y.; Hu, J.; J. Nonparametr. Stat. 2015, 27, 167. [Crossref]
39 Lundberg, S. M.; Lee, S. I.; Proceedings of the 31st International Conference on Neural Information Processing Systems; Long Beach, California, USA, 2017, 4768. [Crossref]
40 Tang, T.; Song, D.; Chen, J.; Chen, Z.; Du, Y.; Dang, Z.; Lu, G.; Processes 2024, 12, 384. [Crossref]
41 Chtita, S.; Ghamali, M.; Ousaa, A.; Aouidate, A.; Belhassan, A.; Taourati, A. I.; Masand, V. H.; Bouachrine, M.; Lakhlifi, T.; Heliyon 2019, 5, 3. [Crossref]
42 Yang, D.; Zhou, Q.; Labroska, V.; Qin, S.; Darbalaei, S.; Wu, Y.; Yuliantie, E.; Xie, L.; Tao, H.; Cheng, J.; Liu, Q.; Zhao, S.; Shui, W.; Jiang, Y.; Wang, M. W.; Signal Transduction Targeted Ther. 2021, 6, 7. [Crossref]
43 Velloso, J. P. L.; Ascher, D. B.; Pires, D. E. V.; Bioinforma. Adv. 2021, 1, 1. [Crossref]
44 Lüscher Dias, T.; Schuch, V.; Beltrão-Braga, P. C. B.; Martins-de-Souza, D.; Brentani, H. P.; Franco, G. R.; Nakaya, H. I.; Transl. Psychiatry 2020, 10, 141. [Crossref]
45 Jourdan, J. P.; Bureau, R.; Rochais, C.; Dallemagne, P.; J. Pharm. Pharmacol. 2020, 72, 1145. [Crossref]
46 Pushpakom, S.; Iorio, F.; Eyers, P. A.; Escott, K. J.; Hopper, S.; Wells, A.; Doig, A.; Guilliams, T.; Latimer, J.; McNamee, C.; Norris, A.; Sanseau, P.; Cavalla, D.; Pirmohamed, M.; Nat. Rev. Drug Discovery 2018, 18, 41. [Crossref]
47 Matsumoto, K.; Miyao, T.; Funatsu, K.; ACS Omega 2021, 6, 11964. [Crossref]
48 Beheshti, A.; Pourbasheer, E.; Nekoei, M.; Vahdani, S.; J. Saudi Chem. Soc. 2016, 20, 282. [Crossref]
49 Rifaioglu, A. S.; Atas, H.; Martin, M. J.; Cetin-Atalay, R.; Atalay, V.; Doǧan, T.; Brief. Bioinform. 2019, 20, 1878. [Crossref]
50 Kalliokoski, T.; Kramer, C.; Vulpetti, A.; Gedeck, P.; PLoS One 2013, 8, e61007. [Crossref]
51 Zhang, M.; Chen, T.; Lu, X.; Lan, X.; Chen, Z.; Lu, S.; Signal Transduction Targeted Ther. 2024, 9, 88. [Crossref]
52 Lounkine, E.; Keiser, M. J.; Whitebread, S.; Mikhailov, D.; Hamon, J.; Jenkins, J. L.; Lavan, P.; Weber, E.; Doak, A. K.; Côté, S.; Shoichet, B. K.; Urban, L.; Nature 2012, 486, 361. [Crossref]
53 Zell, L.; Bretl, A.; Temml, V.; Schuster, D.; Biomedicines 2023, 11, 1468. [Crossref]
Edited by
Editor handled this article: Paulo Augusto Netz (Associate)
Publication Dates
Publication in this collection: 01 Dec 2025
Date of issue: 2025
History
Received: 08 Aug 2025
Accepted: 30 Oct 2025






