Open-access Drug Repurposing for Trypanosomiasis: Using Machine Learning Models and Polypharmacology to Identify Multitarget Candidates

Abstract

Chagas disease and African sleeping sickness are neglected tropical diseases (NTDs) caused by Trypanosoma parasites, and current treatments are limited by toxicity and resistance. This study integrates machine learning and Quantitative Structure-Activity Relationship (QSAR) models to repurpose Food and Drug Administration (FDA)-approved drugs as potential treatments for these diseases. A dataset of 21,608 compounds with inhibitory activity against Trypanosoma cruzi and Trypanosoma brucei was analyzed using PubChem fingerprints. Random Forest and Extreme Gradient Boosting models were trained and applied to screen the ZINC-22 database for new therapeutic options. Posaconazole was predicted as the top candidate for multitarget activity against both Trypanosoma species, followed by pentamidine, a drug already approved for sleeping sickness. Additionally, 40 other drug candidates were identified by the models (pIC50 > 6 and coefficient of variation < 0.05), mainly antineoplastics (32%) and antifungals (19%). This approach demonstrates the potential of computational techniques in accelerating the discovery of drug candidates for neglected infectious diseases.

Keywords:
QSAR; machine learning; repurposing; multitarget; Trypanosoma


Introduction

American and African trypanosomiases, caused by protozoa of the Trypanosoma genus, are characterized by the World Health Organization (WHO) as neglected tropical diseases (NTDs), representing serious public health challenges. American trypanosomiasis, or Chagas disease, caused by Trypanosoma cruzi, is endemic in Latin America but has also affected other regions due to international migration and climate change.1,2,3 In this disease, T. cruzi, transmitted by triatomine insects, is capable of invading muscle and nerve cells, leading to severe chronic cardiac and digestive complications if not adequately treated during the acute phase.4,5 Sleeping sickness, confined to sub-Saharan Africa, is transmitted by the tsetse fly and caused by Trypanosoma brucei. The parasite affects the central nervous system, causing severe neurological disorders that are fatal if left untreated.6,7

Human African trypanosomiasis has recently been controlled through the joint efforts of stakeholders and the National Sleeping Sickness Control Programs (NSSCPs) of affected countries. These efforts include the distribution of fexinidazole to the population infected with the T. b. gambiense subspecies, the most common form of the disease, in its early stages, as well as monitoring and vector control activities.8,9,10 In contrast, an estimated seven million individuals worldwide are infected with T. cruzi, and 75 million people are at risk of infection. Annually, Chagas disease kills more than 6,000 people globally due to its clinical complications in the chronic phase.11,12,13

The approved medications for both Chagas disease and sleeping sickness are limited and often associated with severe side effects, leading to low patient adherence and compromising treatment efficacy. Additionally, the pharmaceutical industry faces significant challenges in developing new drugs, as this process requires substantial time and effort. These challenges are further compounded by diagnostic complications and the difficulties in monitoring for possible parasitological cure during the chronic phase of these diseases.11,12,14,15 The WHO recognizes the severity of these diseases and the neglect in research and investment. As a result, a ten-year plan (2021-2030) has been established, aiming at the eradication and control of both NTDs.15 The urgency to seek new therapies and more effective solutions also stems from the inherent problems of available treatments, which include toxicity, emerging resistance, and limited efficacy in advanced stages of the diseases.3,12,15,16,17

The application of machine learning (ML) algorithms and Big Data to the Quantitative Structure-Activity Relationship (QSAR) technique, with a focus on polypharmacology, can be a valuable tool for selecting new drug candidates. These approaches enable a comprehensive analysis of interactions between chemical compounds and their biological targets, accelerating the screening process of molecules and identifying promising candidates more efficiently than traditional methods.18,19,20,21 Through these technologies, it is possible to extract insights from large datasets of genomics, proteomics, and chemistry, identifying patterns and correlations that can guide the discovery of new therapies with a higher likelihood of success.22,23,24,25 Additionally, polypharmacology, which considers multiple biological targets within organisms, enables the identification of compounds that are more effective, safer, have a broader spectrum of action, and exhibit a lower rate of treatment resistance.20,26,27

Among the ligand-based computational techniques for drug discovery, QSAR stands out for quantitatively correlating the physicochemical structure of compounds with their biological activity.20,28 In this context, drug repositioning for other conditions proves to be a promising strategy, especially for neglected tropical diseases, as it leverages the safety and efficacy already established for drugs used in clinical settings, thereby accelerating the availability of effective therapies.15,17,29,30 Accordingly, the present study aimed to identify approved compounds as potential repurposing candidates through virtual screening using QSAR-based ML methods, targeting multitarget activity against Trypanosoma brucei and Trypanosoma cruzi.

Experimental

As illustrated in Figure 1, this study utilized machine learning models (scikit-learn packages) in conjunction with QSAR analyses, incorporating Principal Component Analysis (PCA) in multivariate exploratory analysis, and SHAP (SHapley Additive exPlanations) values to assess the most important features of the models. The algorithms, developed in Python by Cobre et al.,18 were adapted and implemented through Google Colab and Jupyter Notebook.31

Figure 1
Flowchart of the steps in the present study.

Description of database and datasets used

For applying ML models to predict the bioactivity of compounds, datasets from the publicly accessible ChEMBL database were used. This database includes a wide variety of information on over 2.4 million drug-like molecules, such as chemical nature, genomic data, and bioactivity, which have been manually curated by experts.32

From the datasets obtained via ChEMBL, data on bioactive compounds with simultaneous inhibitory activity against Trypanosoma cruzi and Trypanosoma brucei were compiled and preprocessed (Table 1). Among all the bioactivity-related information, such as the median inhibitory concentration (IC50), the concentration of the drug that induces half of the maximum effect (EC50), the minimum inhibitory concentration (MIC), the inhibition constant (Ki), and other data provided by the database, IC50 values were selected as quantitative variables for subsequent analyses.

Table 1
Datasets of Trypanosoma spp. organisms selected from ChEMBL

For screening candidate drugs using QSAR and machine learning models, the ZINC-22 database was utilized.33 This database is a public and updated collection of commercially available chemical compounds, containing millions of molecular structures in 3D format, ready for use in virtual screening studies. This process facilitates the rapid identification of repositionable molecules as potential new drug candidates.33 In total, 1,576 ZINC-22 IDs obtained from this database were evaluated using predictive bioactivity models against T. cruzi and T. brucei (see Supplementary Information (SI) section).

Dataset pre-processing and exploratory analysis

After processing the initial datasets (n = 31,190) by removing irrelevant information, such as compounds lacking SMILES (Simplified Molecular Input Line Entry System) annotations, preprocessing was performed to reduce dimensionality and enhance data quality. This made the data usable and standardized for subsequent analysis by the models. The preprocessing of bioactivity data from the ChEMBL database included removing salts, standardizing tautomers, and categorizing the compounds as active, inactive, or intermediate in relation to their bioactivity against Trypanosoma spp.

For exploratory analysis and visualization of the chemical space, Lipinski’s Rule of Five was used. This set of parameters is employed in medicinal chemistry to assess the viability of compounds as oral drugs based on their physicochemical characteristics. These rules help predict the absorption and permeability of a molecule, which are essential properties for its effectiveness as a medication.34,35 Additionally, a multivariate exploratory analysis was conducted using PCA, employing 881 fingerprint descriptors obtained from PubChem.36 PCA can reveal patterns among molecules, indicating similarities based on variations in chemical characteristics.37,38
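As an illustration of how the Rule of Five can be applied to a compound's descriptors, the following is a minimal sketch in plain Python. The class and function names are hypothetical, the descriptor values are invented for demonstration, and the tolerance of one violation is a common convention rather than a setting reported in this study.

```python
from dataclasses import dataclass


@dataclass
class LipinskiProfile:
    """Four Lipinski descriptors for one compound (supplied by the caller)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen bond donors
    h_acceptors: int    # number of hydrogen bond acceptors


def lipinski_violations(p: LipinskiProfile) -> int:
    """Count how many of the four Rule of Five criteria are violated."""
    checks = [
        p.mol_weight > 500,   # molecular weight should not exceed 500 Da
        p.logp > 5,           # LogP should not exceed 5
        p.h_donors > 5,       # at most 5 hydrogen bond donors
        p.h_acceptors > 10,   # at most 10 hydrogen bond acceptors
    ]
    return sum(checks)


def passes_rule_of_five(p: LipinskiProfile, max_violations: int = 1) -> bool:
    """Assumed convention: a compound passes with at most one violation."""
    return lipinski_violations(p) <= max_violations


# Hypothetical descriptor values, for illustration only.
small = LipinskiProfile(mol_weight=340.4, logp=2.9, h_donors=2, h_acceptors=5)
large = LipinskiProfile(mol_weight=812.0, logp=6.3, h_donors=4, h_acceptors=13)

print(lipinski_violations(small), passes_rule_of_five(small))
print(lipinski_violations(large), passes_rule_of_five(large))
```

In practice the four descriptors would be computed from each SMILES string with a cheminformatics toolkit; that step is omitted here.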

After classifying the compounds according to their druggability, IC50 values were standardized and categorized (in molar scale, M). Compounds with IC50 values < 10⁻⁵ M were classified as active, based on criteria for hit compounds for Chagas disease (IC50 < 10 μM),39 those between 10⁻⁵ M and 10⁻⁴ M as intermediate, and those with IC50 values > 10⁻⁴ M as inactive against Trypanosoma spp. Subsequently, the IC50 values were transformed into pIC50 (the negative base-10 logarithm of the molar IC50), to facilitate better interpretation and comparison of the information. To reduce potential bias effects arising from feature intercorrelation in regression models, descriptors with low variance (variance < 0.1) were removed from the dataset.
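The conversion and labeling steps described above can be sketched as follows. This is an illustrative snippet, not the study's code: the function names are hypothetical, and the cutoffs are exposed as parameters (defaulting to the molar thresholds stated in this paragraph) so that other cutoffs can be substituted.

```python
import math


def pic50_from_molar(ic50_molar: float) -> float:
    """pIC50 is the negative base-10 logarithm of IC50 expressed in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def activity_class(
    ic50_molar: float,
    active_below: float = 1e-5,    # 10 uM hit criterion quoted in the text
    inactive_above: float = 1e-4,  # upper bound of the intermediate band
) -> str:
    """Label a compound from its molar IC50 using the two cutoffs."""
    if ic50_molar < active_below:
        return "active"
    if ic50_molar <= inactive_above:
        return "intermediate"
    return "inactive"


# Hypothetical IC50 values in mol/L, for illustration only.
for ic50 in (2e-7, 5e-5, 3e-4):
    print(f"{ic50:.1e} M  pIC50={pic50_from_molar(ic50):.2f}  {activity_class(ic50)}")
```

Because the transform is logarithmic, a tenfold decrease in IC50 corresponds to an increase of exactly one pIC50 unit.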

Multitarget QSAR-based machine learning models

In developing QSAR models for predicting multitarget bioactivity against Trypanosoma cruzi, Trypanosoma brucei, Trypanosoma brucei brucei, Trypanosoma brucei rhodesiense, and Trypanosoma brucei gambiense, 244 descriptors from PubChem were used as independent variables for each bioactive compound available in the ChEMBL database after data filtering. The dependent variable was the bioactivity measured by pIC50. In model development, Random Forest and Extreme Gradient Boosting (XGBoost) regression algorithms were trained on the data and used to identify potential outliers. The final dataset was randomly divided into 70% for training the models and 30% for testing. This strategy aims to keep both groups representative of the data, providing a more robust evaluation of the models’ performance in realistic scenarios.18,40,41
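A random 70/30 partition of this kind can be sketched with the standard library alone. The helper name `random_split` and the seed value are assumptions made for illustration; the study itself reports using scikit-learn packages, whose split utilities follow the same idea.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    rows: Sequence[T], test_fraction: float = 0.30, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle row indices reproducibly and split them into train and test."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # seeded for reproducibility
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# Ten placeholder compound identifiers, for illustration only.
compounds = [f"compound_{i}" for i in range(10)]
train_set, test_set = random_split(compounds)
print(len(train_set), len(test_set))
```

Fixing the seed makes the partition repeatable, which matters when models are retrained after outlier removal and compared on the same split.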

The Random Forest algorithm combines multiple decision trees to make more robust predictions, reducing variability and mitigating the risk of overfitting by creating independent trees that are combined to form the final prediction. One of its main advantages is its ability to handle large volumes of data efficiently without the need for intensive preprocessing. The ability of Random Forest to work well with noisy data and missing values is also a major advantage in drug discovery contexts, where databases may be incomplete or imprecise.22,42,43,44

Gradient Boosting builds a sequence of decision trees, where each new tree attempts to correct the errors made by the previous ones, with the ability to model complex nonlinear relationships between data features. One of the main advantages of this model is its continuous optimization capability: being trained sequentially, the model progressively adjusts its predictions, which is useful for constantly refining bioactivity estimates of new molecules. However, one challenge of Gradient Boosting, especially with large datasets, is the training time and the risk of computational overload. To overcome this, optimized implementations like Extreme Gradient Boosting (XGBoost) have emerged, offering better computational performance. XGBoost is known not only for its efficiency in terms of time and memory but also for its ability to parallelize and optimize the training process. Additionally, it features advanced regularization techniques that help prevent overfitting even with a large number of trees and high data volume in predicting bioactivity data of compounds.44,45,46,47

Outliers were defined as compounds whose residuals relative to the regression line exceeded ± two standard deviations.48 Subsequently, these outliers were removed using a supervised outlier detection approach, which involves excluding test data points with the highest residuals (the difference between experimental and predicted values). The models were then retrained to evaluate changes in the regression models’ performance metrics, including the coefficient of determination (R2), mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). Using the results from each model (Random Forest and XGBoost), the mean pIC50 value for each compound, the standard deviation (SD), and the coefficient of variation (CV%) were calculated across the models to select the most promising active candidates in terms of potential trypanocidal activity (CV < 5%).
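The two-standard-deviation rule for residuals can be illustrated with a short sketch. The function name is hypothetical, the data are invented, and the use of the sample standard deviation of the residuals is an assumption, since the text does not specify which estimator was used.

```python
import statistics
from typing import List, Sequence, Tuple


def split_by_residual(
    y_true: Sequence[float], y_pred: Sequence[float], n_sd: float = 2.0
) -> Tuple[List[int], List[int]]:
    """Return (kept, outlier) indices; outliers have |residual| > n_sd * SD."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sd = statistics.stdev(residuals)  # assumption: sample SD of the residuals
    kept: List[int] = []
    outliers: List[int] = []
    for i, r in enumerate(residuals):
        (outliers if abs(r) > n_sd * sd else kept).append(i)
    return kept, outliers


# Hypothetical experimental and predicted pIC50 values.
experimental = [5.0, 6.0, 7.0, 5.5, 6.5, 7.5, 6.2, 9.0]
predicted = [5.1, 5.9, 7.1, 5.4, 6.6, 7.4, 6.3, 6.0]

kept_idx, outlier_idx = split_by_residual(experimental, predicted)
print("kept:", kept_idx)
print("outliers:", outlier_idx)
```

After the flagged points are dropped, the model would be refitted and its metrics recomputed on the remaining data.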

Hyperparameter optimization and cross-validation

Hyperparameter optimization was performed for both machine learning models (Random Forest and XGBoost), with the goal of improving predictive performance by adjusting the parameters that control the model’s structure and learning. For this, three optimization approaches were tested: Random Search, Bayesian Optimization, and Grid Search. Each of these techniques was evaluated based on its ability to efficiently explore the hyperparameter space, aiming to find the best combinations to maximize the models’ generalization. Random Search samples hyperparameter combinations at random, making it time-efficient for high-dimensional spaces, while Grid Search exhaustively evaluates every combination in a predefined grid. Bayesian Optimization stands out by building a probabilistic model of the objective function to identify promising regions of the hyperparameter space, balancing exploration and exploitation to achieve optimal results in fewer iterations.49,50,51
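The difference between the grid and random strategies can be made concrete with a small sketch. The search space, the helper names, and the toy scoring function are all hypothetical stand-ins; in a real workflow the score would be a cross-validated model metric. Bayesian optimization is not shown because it requires a surrogate model.

```python
import itertools
import random
from typing import Callable, Dict, Iterator, List

Space = Dict[str, List[int]]
Params = Dict[str, int]


def grid_candidates(space: Space) -> Iterator[Params]:
    """Grid search: enumerate every combination in the predefined grid."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_candidates(space: Space, n_iter: int, seed: int = 0) -> Iterator[Params]:
    """Random search: draw n_iter combinations independently at random."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        yield {k: rng.choice(v) for k, v in space.items()}


def best_candidate(candidates: Iterator[Params], score: Callable[[Params], float]) -> Params:
    """Return the candidate with the highest score."""
    return max(candidates, key=score)


# Hypothetical search space for a tree ensemble.
space = {"n_estimators": [100, 300, 500], "max_depth": [4, 8, 16]}


def toy_score(params: Params) -> float:
    """Stand-in objective that peaks at n_estimators=300, max_depth=8."""
    return -abs(params["n_estimators"] - 300) / 100 - abs(params["max_depth"] - 8)


grid = list(grid_candidates(space))
sampled = list(random_candidates(space, n_iter=4))
best_grid = best_candidate(iter(grid), toy_score)
print(len(grid), len(sampled), best_grid)
```

The grid grows multiplicatively with each added hyperparameter (here 3 x 3 = 9 candidates), whereas the cost of random search is fixed by `n_iter`.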

For cross-validation, k-fold is an approach that partitions the dataset into k subsets of approximately equal size, or folds. In each iteration, one fold is reserved for testing while the remaining folds are used for training, cycling through until every fold has been used as the test set. This iterative process provides a comprehensive evaluation of model performance, helping to detect overfitting and giving more reliable insights into how well the model generalizes to unseen data. Five-fold cross-validation was applied, where the dataset is split into five folds. This approach balances computational efficiency with thorough assessment, making it particularly suitable for datasets of moderate size while maintaining accuracy and reliability in performance estimation.49,50,52
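The fold rotation described above can be sketched as an index generator. The function name and the seed are illustrative assumptions; library implementations differ in details such as how leftover samples are distributed across folds.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(23, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

Every sample appears in exactly one test fold, so averaging the per-fold metric uses each data point for validation once.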

Identification of important features

To identify the variables that drive the models’ predictions of inhibitory activity against Trypanosoma spp. parasites, the chemical variations present in active, intermediate, and inactive compounds were evaluated using loadings plots from PCA (visualization in exploratory analysis) and SHAP values for each model. SHAP is an explanation technique used in machine learning models to understand how individual features contribute to predictions.53,54 In this study, SHAP values quantitatively assess the contribution of structural descriptors (PubChem fingerprints) to the predicted bioactivity of compounds. Positive SHAP values indicate that a feature increases the model’s prediction, while negative values signify the opposite. In plots, features are arranged vertically, with the first feature having the greatest magnitude of impact. Color encodes the feature value rather than the size of the contribution: red marks high feature values and blue marks low feature values.
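To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a toy three-bit "fingerprint" by enumerating all feature coalitions. The model, its coefficients, and the all-zero baseline are hypothetical. The SHAP library uses efficient tree-specific algorithms and a background dataset instead of this brute-force enumeration with a single baseline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of each feature for one prediction."""
    n = len(x)

    def value(subset: Set[int]) -> float:
        # Features in the coalition take their value from x, others from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(bits: Sequence[float]) -> float:
    """Hypothetical pIC50 predictor with one interaction term (bits 0 and 2)."""
    return 5.0 + 1.2 * bits[0] - 0.8 * bits[1] + 0.5 * bits[0] * bits[2]


compound = [1, 1, 1]   # all three substructure bits present
reference = [0, 0, 0]  # assumed baseline: no bits present
phi = shapley_values(toy_model, compound, reference)
print([round(v, 3) for v in phi])
```

The contributions sum to the difference between the prediction for the compound and the prediction for the baseline, and the interaction term is shared equally between the two bits involved.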

Results

Ensemble algorithm selection for prediction of trypanocidal compounds

For the prediction of compounds with inhibitory activity against T. cruzi and T. brucei, we selected the Random Forest and XGBoost regression algorithms. These ensemble methods are widely applicable and perform well in drug discovery studies due to their ability to handle high variability in training data and capture nonlinear relationships between variables by combining predictions from multiple base estimators. These models are known to deliver robust predictive performance in QSAR studies when dealing with complex and high-dimensional data.45,46,55,56

Exploratory analysis of univariate chemical space

According to the results from the exploratory analysis of the ChEMBL database datasets, using Lipinski descriptors (Figure S1, SI section), all four descriptors (molecular weight, LogP, number of hydrogen bond donors, and number of hydrogen bond acceptors), along with pIC50 values, were statistically different between active and inactive groups (p < 0.0001), with an observed threshold of pIC50 > 6 for active compounds. Inactive compounds also tend to comply with Lipinski’s parameters, and the median values of each parameter (Mann-Whitney test) suggest that the selected compounds meet Lipinski’s criteria, generally exhibiting good hydrogen bonding profiles, suitable molecular dimensions, and good cell membrane permeability, supporting their drug-like properties.

Exploratory analysis of the multivariate chemical space

The multivariate analysis of the chemical space was used to distinguish between active compounds (pIC50 > 6), inactive compounds (pIC50 < 5), and those with intermediate bioactivity (pIC50 between 5 and 6) against the Trypanosoma species and subspecies studied (Trypanosoma cruzi, Trypanosoma brucei, Trypanosoma brucei brucei, Trypanosoma brucei rhodesiense, and Trypanosoma brucei gambiense). PCA was performed using the 881 PubChem fingerprint descriptors. This method identified three principal components sufficient for capturing the chemical variation among the molecular structures of the compounds, with a total cumulative variance of 88.31%. These principal components allowed an effective separation of groups of compounds with different levels of bioactivity, as evidenced by the score plot (Figure 2) of PC1, PC2, and PC3 in terms of parasite inhibition.

Figure 2
Multivariate exploration of the chemical space and bioactivity of compounds against different Trypanosoma species. Red triangles, green circles, and blue squares represent active, intermediate, and inactive compounds, respectively.

To verify the variables that most contributed to the differentiation between compound classes concerning bioactivity against T. cruzi and T. brucei, a loading plot of the PCA model was constructed. In this plot, fingerprint descriptors related to PubChem structures such as PubChem11 (≥ 8 carbons), PubChem14 (≥ 1 nitrogen), PubChem18 (≥ 1 oxygen), PubChem143 (≥ 1 any ring size), PubChem146 (≥ 1 saturated or aromatic heteroatom-containing ring size), PubChem178 (≥ 1 any ring size), PubChem179 (≥ 1 saturated or aromatic carbon-only ring size), PubChem185 (≥ 2 any ring size), PubChem255 (≥ 1 aromatic ring), PubChem284 (C-C), PubChem300 (N-N), and those in the range from PubChem320 (Si-H) to PubChem700 (O-C-C-C-C-C-O-C) were identified as most important in distinguishing active compounds (Figure 3).

Figure 3
Loading plot of the PCA model, showing the most important variables responsible for the discrimination of the three bioactivity classes against T. cruzi and T. brucei.

QSAR-based machine learning models

The evaluation metrics for the Random Forest and XGBoost regression models, before and after the removal of outliers, are summarized in Table 2. Initially, descriptor selection was performed to optimize the construction of the models; out of the 881 descriptors calculated from PubChem, 244 were retained after removing descriptors with a variance below 0.1, aiming to reduce multicollinearity effects.
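A variance filter of this kind can be sketched as follows. The function name and the toy fingerprint matrix are hypothetical, and the use of the population variance is an assumption. For a binary bit that is set in a fraction p of compounds the variance is p(1 - p), so a cutoff of 0.1 keeps bits set in roughly 11% to 89% of compounds.

```python
import statistics
from typing import List, Sequence


def high_variance_columns(
    matrix: Sequence[Sequence[int]], threshold: float = 0.1
) -> List[int]:
    """Return indices of columns whose variance is at least the threshold."""
    n_cols = len(matrix[0])
    keep: List[int] = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        # Assumption: population variance, which equals p * (1 - p) for 0/1 bits.
        if statistics.pvariance(column) >= threshold:
            keep.append(j)
    return keep


# Toy fingerprint matrix: 10 compounds x 3 bits (illustration only).
# Bit 0 is always set, bit 1 is set in half the compounds, bit 2 in one of ten.
fingerprints = [[1, 1 if i < 5 else 0, 1 if i == 0 else 0] for i in range(10)]

kept_bits = high_variance_columns(fingerprints)
print(kept_bits)
```

Bits that are nearly always present or nearly always absent carry little information for distinguishing compounds, which is why they are dropped before model fitting.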

Table 2
Metrics of the regression models

Residual analysis is a technique used to evaluate the quality and performance of a regression model (see residual profiles of each model in Figure S2 of the SI section). Removing outliers improved the performance metrics of both models. The test MAE decreased after the removal of outliers (RF: 0.3952 vs. 0.4872; XGBoost: 0.4278 vs. 0.5056), indicating smaller average prediction errors. Both models improved, but RF without outliers performed slightly better than XGBoost, with marginally better R2 and MSE. This may indicate that RF benefits more from the removal of outliers.
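For reference, the four metrics compared here follow directly from the residuals. The sketch below uses the standard definitions; the function name and the example values are hypothetical and are not taken from Table 2.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R2, MSE, RMSE, and MAE from experimental and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
    }


# Hypothetical experimental and predicted pIC50 values.
metrics = regression_metrics([5.0, 6.0, 7.0, 8.0], [5.5, 5.5, 7.5, 7.5])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE and RMSE are expressed in pIC50 units, so an MAE near 0.4 corresponds to predictions that are off by about 0.4 log units on average.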

Hyperparameter optimization and cross-validation

Three hyperparameter optimization techniques (random search, Bayesian optimization, and grid search) were evaluated for both machine learning models (Random Forest and XGBoost), and 5-fold cross-validation was performed. Among the three approaches, Bayesian optimization outperformed the others and was chosen for model tuning in subsequent analyses (Tables S1 and S2, SI section).

Identification of important features using SHAP values

SHAP values are useful for interpreting and ranking the importance of variables in ML models, especially in tree-based models such as Random Forest and XGBoost. The SHAP values analysis, presented in Figure 4, revealed that PubChemFP418 (C=N) has the most significant impact on the Random Forest model. Its SHAP values are primarily distributed towards the positive side of the x-axis, indicating that higher values of this feature are associated with an increase in the model’s predicted output. On the other hand, PubChemFP16 (≥ 4 nitrogens) is the highest-ranked feature for XGBoost.

Figure 4
Violin plots of SHAP values for each model. (a) Random Forest; (b) Extreme Gradient Boosting (XGBoost).

PubChemFP12 (≥ 16 carbons) emerged as the most significantly positive feature in predicting bioactivity for both machine learning models. Additionally, PubChemFP24 is ranked as one of the most important features in both models. For both PubChemFP12 and PubChemFP24, the SHAP values are distributed across both the positive and negative sides of the x-axis, suggesting that variations in these features can have both positive and negative effects on the model’s output, depending on the specific feature values.

Application of machine learning models for drug repurposing

Optimized machine learning algorithms (Random Forest and XGBoost) were used to screen an external database (ZINC-22 database) containing Food and Drug Administration (FDA)-approved drugs, to identify candidates for repositioning against T. cruzi and T. brucei. In the screening, pentamidine, a drug already used clinically for the treatment of sleeping sickness (see Table S3 in the SI section), was identified by the models as the second most active candidate against both parasites (mean pIC50 = 7.87).

In addition to pentamidine, the models also predicted that 40 other drugs, indicated for treatment of different diseases, have potential multitarget activity for treating Chagas disease and sleeping sickness (Table 3). Among these are primarily some antineoplastic agents (32%), used for various types of cancer, such as vincristine (mean pIC50 = 7.04) and paclitaxel (mean pIC50 = 6.75); and azole antifungal medications (19%), such as posaconazole, which is the compound with the highest predicted bioactivity (pIC50 = 8.79), and clotrimazole (pIC50 = 6.27).
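The consensus selection applied in this screening (mean pIC50 > 6 and CV < 5% across the two models) can be sketched as below. The function names, the compound labels, and the predicted values are hypothetical, and using the sample standard deviation for two predictions is an assumption about how the SD was computed.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def consensus_stats(predictions: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, SD, CV%) of the pIC50 values predicted by the models."""
    mean = statistics.mean(predictions)
    sd = statistics.stdev(predictions)  # assumption: sample standard deviation
    cv_percent = 100.0 * sd / mean
    return mean, sd, cv_percent


def is_candidate(
    predictions: Sequence[float], min_pic50: float = 6.0, max_cv_percent: float = 5.0
) -> bool:
    """Keep compounds predicted as potent with close agreement between models."""
    mean, _, cv_percent = consensus_stats(predictions)
    return mean > min_pic50 and cv_percent < max_cv_percent


# Hypothetical (Random Forest, XGBoost) predictions for placeholder compounds.
screen: Dict[str, List[float]] = {
    "compound_A": [7.9, 7.8],  # potent, models agree
    "compound_B": [6.9, 5.6],  # models disagree
    "compound_C": [5.2, 5.3],  # models agree, but below the potency cutoff
}

selected = [name for name, preds in screen.items() if is_candidate(preds)]
print(selected)
```

Requiring a low CV filters out compounds for which a high mean is driven by only one of the two models.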

Table 3
41 drugs with potential for repositioning for multitarget treatment against T. cruzi and T. brucei

Discussion

The multitarget approach in drug discovery provides several advantages for addressing complex and infectious diseases. By targeting multiple biological sites, it reduces the likelihood of resistance and minimizes side effects, thereby improving therapeutic response, especially for chronic and progressive diseases. Furthermore, drugs that target multiple sites can have synergistic effects, where the combined actions lead to greater therapeutic efficacy than the sum of individual effects.26,57,58

In drug discovery, QSAR, a ligand-based computational method, allows prediction of biological activity of new promising chemical compounds even before they are synthesized and experimentally tested. The application of artificial intelligence (AI) and techniques related to the QSAR approach, unlike traditional methods, enables multivariate analysis of multiple models simultaneously and the management of large volumes of complex data with high efficiency. This approach saves time and resources in the development of new drugs, being particularly useful in the context of neglected diseases. Moreover, these studies, along with other in silico approaches, have been conducted to enable rapid screening of new potential treatment candidates for various diseases, as well as to search for new trypanocidal compounds.20,59,60,61,62

This study focused on using supervised machine learning models combined with QSAR techniques to explore drug repurposing candidates for treatment of American and African trypanosomiases. The performance of predictive models is directly related to the quality of the dataset. Despite the intrinsic variability from different in vitro assays that produced the bioactivity data, ensemble methods based on decision trees are effective in identifying complex patterns and interactions between features, even compared to more advanced deep learning models.22,42,43,44,63,64

Our QSAR-based ML models were developed using a comprehensive experimental dataset that includes molecular descriptors extracted from the PubChem database. Compared to other sources of descriptors, PubChem provides reliable and accessible information, capturing a wide range of molecular characteristics, including molecular structure, atomic connectivity, functional groups, and substructure patterns. This detailed representation of molecules offers robustness and ease of use, making it highly applicable for large-scale modeling and QSAR studies.32,36

The multivariate exploratory analysis through PCA provided insights into the chemical diversity of the molecules. By capturing 88.31% of the chemical variance with three principal components, PCA facilitated the differentiation of compounds with varying levels of biological activity against different Trypanosoma spp. The application of SHAP values further elucidated molecular features with significant predictive impact in both ML models. Identifying chemical descriptors as features is valuable in drug discovery with ensemble models based on decision trees, as it offers a comprehensive understanding of the factors influencing bioactivity prediction. These descriptors are useful for further experimental validation and for developing drug optimization strategies, contributing to significant advances in the discovery of new medications.44,65,66

Using the developed Random Forest and XGBoost models, 41 FDA-approved drug candidates with potential multitarget activity against both T. cruzi and T. brucei, including pentamidine, were identified through the ZINC-22 database. Pentamidine is a medication already used in clinical settings for the treatment of sleeping sickness. This drug, which is also a therapeutic option for leishmaniasis, inhibits the synthesis of nucleic acids and proteins in Trypanosoma brucei gambiense, particularly during the early stages of infection.17,67,68

Among the pharmacological classes of the 41 identified drugs (Table S4, SI section), approximately 32% are anticancer agents used for various types of cancer. Cytostatic compounds function by inhibiting cell division in a non-specific manner, thereby preventing cellular growth and proliferation.69,70 In this context, compounds with kinase activity, genetic material repair ability, and epigenetic regulation, used in antitumor therapy, have been evaluated for trypanosomiasis, as the parasites proliferate similarly to cancer cells.17,29,67,71

The second most represented pharmacological class among the screened drugs is antifungals (19%), mainly azoles, such as posaconazole (mean pIC50 = 8.79), which showed the highest predicted activity against both T. cruzi and T. brucei. The mechanism of action of these compounds is well-known and similar to their antifungal effect, involving the inhibition of the enzyme lanosterol 14-α-demethylase. This enzyme is crucial in the biosynthesis of ergosterol, an essential component of the cell membrane in fungi and some protozoa. In trypanosomatids, inhibition of this enzyme leads to ergosterol depletion and accumulation of toxic sterol intermediates, resulting in membrane dysfunction. This impairs membrane integrity, cellular homeostasis, and parasite proliferation, compromising its survival.29,72,73 The ability of these medications to target different species of trypanosomatids, including Leishmania spp., has already been the focus of repositioning studies, as it suggests a comprehensive therapeutic solution that could act synergistically with existing trypanocidal therapies, particularly in endemic areas.6,29,74,75,76,77

Although our study shows promising results, several limitations warrant consideration. The reliance on computational predictions necessitates subsequent experimental validation of the identified drug candidates. Additionally, the generalization of our models to different populations and geographic regions requires further investigation. Future research efforts should focus on refining QSAR models with larger and more diverse datasets, in line with advancements in AI, integrating additional biological and pharmacological data, and conducting comprehensive in vitro and in vivo studies to validate the efficacy and safety of repositioned drugs for individuals with trypanosomiasis.

The findings of this work have several implications for the continuation of studies, particularly in the context of public health policies and the initial screening of new therapeutic options for Chagas disease and sleeping sickness.78,79 Identifying new applications for existing medications, especially those with established safety profiles, can accelerate the availability of new treatments for the affected population. Additionally, the integration of machine learning with traditional drug discovery approaches, such as QSAR, enhances the efficiency and cost-effectiveness of drug development pipelines.18,61,80

Conclusions

Through the application of Random Forest and Extreme Gradient Boosting models, our study identified 41 FDA-approved compounds with potential multitarget activity against Trypanosoma cruzi and Trypanosoma brucei, which are the causative agents of two neglected tropical diseases included in the WHO’s road map for eradication and control by 2030. The identification of pentamidine, along with other repositioned drugs, demonstrates not only the robustness of our predictive models, but also the innovative potential of polypharmacological approaches in repurposing existing therapies for new indications, leveraging existing knowledge about the safety and clinical efficacy profiles of these drugs.

As the next step, it is crucial to experimentally validate the results of this study to confirm the efficacy of the drug candidates under conditions closer to the biological and clinical environment. Moreover, future studies could explore drug combinations and further optimizations of predictive models to maximize therapeutic efficacy for neglected diseases.

Supplementary Information

Supplementary data are available free of charge at http://jbcs.sbq.org.br as a PDF file.

Data Availability Statement

The code used in this study is available in the GitHub repository at https://github.com/KarimeZeraik/QSAR-and-ML.

Acknowledgments

The authors express their gratitude to the Brazilian National Council of Technological and Scientific Development (CNPq) and CAPES (Brazilian Federal Agency for Support and Evaluation of Graduate Education within the Ministry of Education of Brazil) for research funding - Finance Code 001.

References

  • 1 Martín-Escolano, J.; Marín, C.; Rosales, M. J.; Tsaousis, A. D.; Medina-Carmona, E.; Martín-Escolano, R.; ACS Infect. Dis. 2022, 8, 1107. [Crossref] [PubMed]
  • 2 Lidani, K. C. F.; Andrade, F. A.; Bavia, L.; Damasceno, F. S.; Beltrame, M. H.; Messias-Reason, I. J.; Sandri, T. L.; Front. Public Health 2019, 49, 166. [Crossref] [PubMed]
  • 3 World Health Organization (WHO); Ending the Neglect to Attain the Sustainable Development Goals: A Road Map for Neglected Tropical Diseases 2021-2030, https://www.who.int/publications/i/item/9789240010352, accessed in February 2025.
  • 4 Marin-Neto, J. A.; Cunha-Neto, E.; Maciel, B. C.; Simões, M. V.; Circulation 2007, 115, 1109. [Crossref] [PubMed]
  • 5 Marin-Neto, J. A.; Rassi Jr., A.; Oliveira, G. M. M.; Correia, L. C. L.; Ramos, A. N.; Luquetti, A. O.; Hasslocher-Moreno, A. M.; de Sousa, A. S.; de Paola, A. A. V.; Sousa, A. C. S.; Ribeiro, A. L. P.; Filho, D. C.; de Souza, D. D. S. M.; Cunha-Neto, E.; Ramires, F. J. A.; Bacal, F.; Nunes, M. D. C. P.; Filho, M. M.; Scanavacca, M. I.; Saraiva, R. M.; de Oliveira, W. A.; Lorga-Filho, A. M.; Guimarães, A. D. J. B. A.; Braga, A. L. L.; de Oliveira, A. S.; Sarabanda, A. V. L.; das Neves Pinto, A. Y.; do Carmo, A. A. L.; Schmidt, A.; da Costa, A. R.; Ianni, B. M.; Filho, B. M.; Rochitte, C. E.; Macêdo, C. T.; Mady, C.; Chevillard, C.; das Virgens, C. M. B.; de Castro, C. N.; de Carvalho Britto, C. F. D. P.; Pisani, C.; do Carmo Rassi, D.; Filho, D. C. S.; de Almeida, D. R.; Bocchi, E. A.; Mesquita, E. T.; Mendes, F. S. N. S.; Gondim, F. T. P.; da Silva, G. M. S.; de Lima Peixoto, G.; de Lima, G. G.; Veloso, H. H.; Moreira, H. T.; Lopes, H. B.; Pinto, I. M. F.; Ferreira, J. M. B. B.; Nunes, J. P. S.; Barreto-Filho, J. A. S.; Saraiva, J. F. K.; Lannes-Vieira, J.; Oliveira, J. L. M.; Armaganijan, L. V.; Martins, L. C.; Sangenis, L. H. C.; Barbosa, M. P. T.; Almeida-Santos, M. A.; Simões, M. V.; Yasuda, M. A. S.; Moreira, M. D. C. V.; de Lourdes Higuchi, M.; de Cassia Costa Monteiro, M. R.; Mediano, M. F. F.; Lima, M. M.; de Oliveira, M. T.; Romano, M. M. D.; de Araujo, N. N. S. L.; Medeiros, P. T. J.; Alves, R. V.; Teixeira, R. A.; Pedrosa, R. C.; Aras, R.; Torres, R. M.; Povoa, R. M. S.; Rassi, S. G.; Alves, S. M. M.; Tavares, S. B. N.; Palmeira, S. L.; da Silva, T. L.; da Rocha Rodrigues, T.; Madrini, V.; da Costa Brant, V. M.; Dutra, W. O.; Dias, J. C. P.; Arq. Bras. Cardiol. 2023, 120, e20230269. [Crossref] [PubMed]
  • 6 Okello, I.; Mafie, E.; Eastwood, G.; Nzalawahe, J.; Mboera, L. E. G.; J. Med. Entomol. 2022, 59, 1099. [Crossref] [PubMed]
  • 7 World Health Organization (WHO); Trypanosomiasis, human African (sleeping sickness), https://www.who.int/news-room/fact-sheets/detail/trypanosomiasis-human-african-(sleeping-sickness), accessed in January 2025.
  • 8 Franco, J. R.; Priotto, G.; Paone, M.; Cecchi, G.; Ebeja, A. K.; Simarro, P. P.; Sankara, D.; Metwally, S. B. A.; Argaw, D. D.; PLoS Neglected Trop. Dis. 2024, 18, e0012111. [Crossref] [PubMed]
  • 9 Bernhard, S.; Kaiser, M.; Burri, C.; Mäser, P.; Diseases 2022, 10, 90. [Crossref]
  • 10 Barrett, M. P.; Priotto, G.; Franco, J. R.; Lejon, V.; Lindner, A. K.; PLoS Neglected Trop. Dis. 2024, 18, e0012091. [Crossref] [PubMed]
  • 11 Pereira-Silva, F. S.; de Mello, M. L. B. C.; de Araújo-Jorge, T. C.; Cienc. Saude Coletiva 2022, 27, 1939. [Crossref] [PubMed]
  • 12 Ramos, L. G.; de Souza, K. R.; Sales Jr., P. A.; Câmara, C. C.; Castelo-Branco, F. S.; Boechat, N.; Carvalho, S. A.; Acta Trop. 2024, 256, 107264. [Crossref] [PubMed]
  • 13 Schijman, A. G.; Alonso-Padilla, J.; Britto, C.; Herrera Bernal, C. P.; Lancet Reg. Health 2024, 36, 100821. [Crossref]
  • 14 Urbina, J. A.; J. Eukaryot. Microbiol. 2015, 62, 149. [Crossref] [PubMed]
  • 15 Cristovão-Silva, A. C.; Brelaz-De-Castro, M. C. A.; Lima Leite, A. C.; Alves Pereira, V. R.; Hernandes, M. Z.; Front. Pharmacol. 2019, 10, 873. [Crossref]
  • 16 García-Huertas, P.; Cardona-Castro, N.; Biomed. Pharmacother. 2021, 142, 112020. [Crossref] [PubMed]
  • 17 Jamabo, M.; Mahlalela, M.; Edkins, A. L.; Boshoff, A.; Int. J. Mol. Sci. 2023, 24, 12529. [Crossref]
  • 18 Cobre, A. F.; Ara, A.; Alves, A. C.; Maia Neto, M.; Fachi, M. M.; Beca, L. S. A. B.; Tonin, F. S.; Pontarolo, R.; Chemom. Intell. Lab. Syst. 2024, 250, 105145. [Crossref]
  • 19 Liu, K.; Chen, X.; Ren, Y.; Liu, C.; Lv, T.; Liu, Y.; Zhang, Y.; Chem.-Biol. Interact. 2022, 368, 110239. [Crossref] [PubMed]
  • 20 Parvatikar, P. P.; Patil, S.; Khaparkhuntikar, K.; Patil, S.; Singh, P. K.; Sahana, R.; Kulkarni, R. V.; Raghu, A. V.; Antiviral Res. 2023, 220, 105740. [Crossref] [PubMed]
  • 21 Klambauer, G.; Hochreiter, S.; Rarey, M.; J. Chem. Inf. Model. 2019, 59, 945. [Crossref] [PubMed]
  • 22 Dara, S.; Dhamercherla, S.; Jadav, S. S.; Babu, C. M.; Ahsan, M. J.; Machine Learning in Drug Discovery: A Review; Springer: Netherlands, 2022.
  • 23 Huang, K.; Xiao, C.; Glass, L. M.; Critchlow, C. W.; Gibson, G.; Sun, J.; Patterns 2021, 2, 100328. [Crossref]
  • 24 Quazi, S.; Med. Oncol. 2022, 39, 120. [Crossref] [PubMed]
  • 25 Zhao, L.; Ciallella, H. L.; Aleksunes, L. M.; Zhu, H.; Drug Discovery Today 2021, 25, 1624. [Crossref]
  • 26 Kabir, A.; Muth, A.; Pharmacol. Res. 2022, 176, 106055. [Crossref] [PubMed]
  • 27 Ryszkiewicz, P.; Malinowska, B.; Schlicker, E.; Pharmacol. Rep. 2023, 75, 755. [Crossref] [PubMed]
  • 28 Shim, J.; MacKerell, A. D.; MedChemComm 2011, 2, 356. [Crossref]
  • 29 Porta, E. O. J.; Kalesh, K.; Steel, P. G.; Front. Pharmacol. 2023, 14, 1233253. [Crossref]
  • 30 Sterkel, M.; Haines, L. R.; Casas-Sánchez, A.; Adung’a, V. O.; Vionette-Amaral, R. J.; Quek, S.; Rose, C.; dos Santos, M. S.; Escude, N. G.; Ismail, H. M.; Paine, M. I.; Barribeau, S. M.; Wagstaff, S.; MacRae, J. I.; Masiga, D.; Yakob, L.; Oliveira, P. L.; Acosta-Serrano, Á.; PLoS Biol. 2021, 19, e3000796. [Crossref] [PubMed]
  • 31 Jupyter Notebooks, version 0.5.0, 2021-2022, https://jupyter.org/try-jupyter/lab/index.html, accessed in February 2025.
  • 32 Zdrazil, B.; Felix, E.; Hunter, F.; Manners, E. J.; Blackshaw, J.; Corbett, S.; de Veij, M.; Ioannidis, H.; Lopez, D. M.; Mosquera, J. F.; Magarinos, M. P.; Bosc, N.; Arcila, R.; Kizilören, T.; Gaulton, A.; Bento, A. P.; Adasme, M. F.; Monecke, P.; Landrum, G. A.; Leach, A. R.; Nucleic Acids Res. 2023, 52, 1180. [Crossref]
  • 33 Tingle, B. I.; Tang, K. G.; Castanon, M.; Gutierrez, J. J.; Khurelbaatar, M.; Dandarchuluun, C.; Moroz, Y. S.; Irwin, J. J.; J. Chem. Inf. Model. 2023, 63, 1166 [Crossref]; ZINC, https://zinc.docking.org/, accessed in January 2025.
  • 34 Lipinski, C. A.; Adv. Drug Delivery Rev. 2016, 101, 34. [Crossref]
  • 35 Lipinski, C. A.; Drug Discovery 2004, 337. [Crossref]
  • 36 Yap, C. W.; J. Comput. Chem. 2010, 32, 1466. [Crossref] [PubMed]
  • 37 Pavlovi, N.; Sopta, N. M.; Mitrovic, D.; Zaklan, D.; Petrovic, A. T.; Stilinovic, N.; Vukmirovic, S.; Int. J. Mol. Sci. 2024, 25, 192. [Crossref]
  • 38 Jolliffe, I. T.; Cadima, J.; Philos. Trans. R. Soc., A 2016, 374, 1. [Crossref]
  • 39 Katsuno, K.; Burrows, J. N.; Duncan, K.; Van Huijsduijnen, R. H.; Kaneko, T.; Kita, K.; Mowbray, C. E.; Schmatz, D.; Warner, P.; Slingsby, B. T.; Nat. Rev. Drug Discovery 2015, 14, 751. [Crossref] [PubMed]
  • 40 Singh, V.; Pencina, M.; Einstein, A. J.; Liang, J. X.; Berman, D. S.; Slomka, P.; Sci. Rep. 2021, 11, 14490. [Crossref] [PubMed]
  • 41 Xu, Y.; Goodacre, R.; J. Anal. Test. 2018, 2, 249. [Crossref]
  • 42 Lind, A. P.; Anderson, P. C.; PLoS One 2019, 14, e0219774. [Crossref]
  • 43 Yu, T.; Huang, T.; Yu, L.; Nantasenamat, C.; Anuwongcharoen, N.; Piacham, T.; Ren, R.; Chiang, Y.; Molecules 2023, 28, 1679. [Crossref]
  • 44 Singh, S.; Kumar, R.; Payra, S.; Singh, S. K.; Cureus 2023, 15, e44359. [Crossref]
  • 45 Boldini, D.; Grisoni, F.; Kuhn, D.; Friedrich, L.; Sieber, S. A.; J. Cheminform. 2023, 15, 73. [Crossref]
  • 46 Sikander, R.; Ghulam, A.; Ali, F.; Sci. Rep. 2022, 12, 5505. [Crossref]
  • 47 Wu, J.; Kong, L.; Yi, M.; Chen, Q.; Cheng, Z.; Zuo, H.; Yang, Y.; Comput. Intell. Neurosci. 2022, 2022, 14. [Crossref]
  • 48 Dutta, D.; Guha, R.; Wild, D.; Chen, T.; J. Chem. Inf. Model. 2007, 47, 989. [Crossref]
  • 49 Awal, M. A.; Masud, M.; Hossain, M. S.; Bulbul, A. A. M.; Mahmud, S. M. H.; Bairagi, A. K.; IEEE Access 2021, 9, 10263. [Crossref]
  • 50 Azar, A. S.; Samimi, T.; Tavassoli, G.; Naemi, A.; Rahimi, B.; Hadianfard, Z.; Wiil, U. K.; Nazarbaghi, S.; Bagherzadeh Mohasefi, J.; Lotfnezhad Afshar, H.; Eur. J. Med. Res. 2024, 29, 547. [Crossref] [PubMed]
  • 51 Yang, K.; Liu, L.; Wen, Y.; Sci. Rep. 2024, 14, 3948. [Crossref] [PubMed]
  • 52 Jung, Y.; Hu, J.; J. Nonparametr. Stat. 2015, 27, 167. [Crossref]
  • 53 Lundberg, S. M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J. M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I.; Nature Mach. Intell. 2020, 2, 56. [Crossref]
  • 54 Tang, T.; Song, D.; Chen, J.; Chen, Z.; Du, Y.; Dang, Z.; Lu, G.; Processes 2024, 12, 384. [Crossref]
  • 55 Toplak, M.; Mocnik, R.; Polajnar, M.; Bosnic, Z.; Carlsson, L.; Hasselgren, C.; Demsar, J.; Boyer, S.; Zupan, B.; Stalring, J.; J. Chem. Inf. Model. 2014, 54, 431. [Crossref]
  • 56 Neves, B. J.; Moreira-Filho, J. T.; Silva, A. C.; Borba, J. V. V. B.; Mottin, M.; Alves, V. M.; Braga, R. C.; Muratov, E. N.; Andrade, C. H.; J. Braz. Chem. Soc. 2021, 32, 110. [Crossref]
  • 57 Ramsay, R. R.; Nikolic, M. R. P.; Nikolic, K.; Uliassi, E.; Bolognesi, M. L.; Clin. Transl. Med. 2018, 7, e3. [Crossref]
  • 58 Zimmermann, G. R.; Lehár, J.; Keith, C. T.; Drug Discovery Today 2007, 12, 34. [Crossref] [PubMed]
  • 59 Castillo-Garit, J. A.; Abad, C.; Rodríguez-Borges, J. E.; Marrero-Ponce, Y.; Torrens, F.; Curr. Top. Med. Chem. 2012, 12, 852. [Crossref] [PubMed]
  • 60 Zanni, R.; Gálvez-Llompart, M.; Gálvez, J.; García-Domenech, R.; Curr. Comput.-Aided Drug Des. 2014, 10, 129. [Crossref] [PubMed]
  • 61 Tropsha, A.; Isayev, O.; Varnek, A.; Schneider, G.; Cherkasov, A.; Nat. Rev. Drug Discovery 2024, 23, 141. [Crossref]
  • 62 Kryshchyshyn, A.; Devinyak, O.; Kaminskyy, D.; Grellier, P.; Lesyk, R.; Mol. Inf. 2017, 37, 1700078. [Crossref]
  • 63 Gawriljuk, V. O.; Zin, P. P. K.; Puhl, A. C.; Zorn, K. M.; Foil, D. H.; Lane, T. R.; Hurst, B.; Tavella, T. A.; Costa, F. T. M.; Lakshmanane, P.; Bernatchez, J.; Godoy, A. S.; Oliva, G.; Siqueira-Neto, J. L.; Madrid, P. B.; Ekins, S.; J. Chem. Inf. Model. 2021, 61, 4224. [Crossref]
  • 64 Van Tilborg, D.; Alenicheva, A.; Grisoni, F.; J. Chem. Inf. Model. 2022, 62, 5938. [Crossref] [PubMed]
  • 65 Wang, H.; Liang, Q.; Hancock, J. T.; Khoshgoftaar, T. M.; J. Big Data 2024, 11, 44. [Crossref]
  • 66 Giuliani, A.; Drug Discovery Today 2017, 22, 1069. [Crossref]
  • 67 De Rycker, M.; Wyllie, S.; Horn, D.; Read, K. D.; Gilbert, I. H.; Nat. Rev. Microbiol. 2023, 21, 35. [Crossref] [PubMed]
  • 68 de Koning, H. P.; Trop. Med. Infect. Dis. 2020, 5, 14. [Crossref]
  • 69 Suski, J. M.; Braun, M.; Strmiska, V.; Sicinski, P.; Cancer Cell 2021, 39, 759. [Crossref]
  • 70 Zhong, L.; Li, Y.; Xiong, L.; Wang, W.; Wu, M.; Yuan, T.; Yang, W.; Tian, C.; Miao, Z.; Wang, T.; Yang, S.; Signal Transduction Targeted Ther. 2021, 6, 201. [Crossref]
  • 71 Reimão, J. Q.; Miguel, D. C.; Taniwaki, N. N.; Trinconi, C. T.; Yokoyama-Yasunaka, J. K. U.; Uliana, S. R. B.; PLoS Negl. Trop. Dis. 2014, 8, e2842. [Crossref] [PubMed]
  • 72 Lepesheva, G. I.; Waterman, M. R.; Curr. Top. Med. Chem. 2011, 11, 2060. [Crossref]
  • 73 da Silva Santos-Júnior, P. F.; Schmitt, M.; de Araújo-Júnior, J. X.; da Silva-Júnior, E. F.; Curr. Top. Med. Chem. 2021, 21, 1900. [Crossref] [PubMed]
  • 74 Talevi, A.; Bellera, C. L.; Expert Opin. Drug Discovery 2020, 15, 397. [Crossref]
  • 75 Reigada, C.; Sayé, M.; Valera-Vera, E.; Miranda, M. R.; Pereira, C. A.; Heliyon 2019, 5, e01947. [Crossref]
  • 76 Planer, J. D.; Hulverson, M. A.; Arif, J. A.; Ranade, R. M.; Don, R.; Buckner, F. S.; PLoS Negl. Trop. Dis. 2014, 8, e2977. [Crossref] [PubMed]
  • 77 Silva-Jardim, I.; Thiemann, O. H.; Anibal, F. F.; J. Braz. Chem. Soc. 2014, 25, 1810. [Crossref]
  • 78 Engels, D.; Zhou, X.; Infect. Dis. Poverty 2020, 9, 10. [Crossref] [PubMed]
  • 79 Bhattacharya, A.; Corbeil, A.; Do Monte-Neto, R. L.; Fernandez-Prada, C.; Genes 2020, 11, 722. [Crossref] [PubMed]
  • 80 Tanoli, Z.; Vähä-koskela, M.; Aittokallio, T.; Expert Opin. Drug Discovery 2021, 16, 977. [Crossref]

Edited by

  • Editor handled this article:
    Paula Homem-de-Mello (Associate)

Publication Dates

  • Publication in this collection
    21 Mar 2025
  • Date of issue
    2025

History

  • Received
    15 Aug 2024
  • Accepted
    13 Feb 2025
Sociedade Brasileira de Química Instituto de Química - UNICAMP, Caixa Postal 6154, 13083-970 Campinas SP - Brazil, Tel./FAX.: +55 19 3521-3151 - São Paulo - SP - Brazil
E-mail: office@jbcs.sbq.org.br