Open-access Journal of the Brazilian Chemical Society

Publication of: Sociedade Brasileira de Química
Area: Exact And Earth Sciences
ISSN printed version: 0103-5053
ISSN online version: 1678-4790
Creative Commons Attribution 4.0 (CC BY 4.0)

Table of contents

Journal of the Brazilian Chemical Society, Volume: 36, Issue: 8, Published: 2025

Document list
Review
Recent Advances in Natural Language Processing in Chemistry and Materials Science
Prati, Ronaldo Cristiano

Abstract in English:

Natural Language Processing (NLP) in chemistry and materials science enables computers to understand, analyze, and generate human-readable output related to chemical concepts and materials. With the latest advances in NLP, near-human text processing has become possible in various tasks. Large Language Models (LLMs) have demonstrated exceptional proficiency in text generation, which has led to many specific NLP tasks being redefined as text generation problems. This review explores recent progress in applying LLMs to specialized domains such as chemistry and materials science. It discusses how LLMs overcome the limitations of traditional NLP methods (such as rigid rule-based systems and shallow statistical models) by enabling context-aware interpretation of unstructured literature, flexible entity recognition (e.g., compounds, reactions), and generative tasks. By using the capabilities of LLMs, researchers in these fields can benefit from enhanced text processing, more accurate information extraction, and improved understanding of complex chemical concepts, making LLMs a pivotal tool for accelerating discovery in chemically complex spaces and paving the way for novel tasks such as reaction prediction and molecular design.
Full Paper
Computational Investigations on Inhibitors of Mycobacterium tuberculosis Shikimate Kinase: Machine Learning, Docking, Molecular Dynamics and Free Energy Calculations
Santos, Anderson J. A. B. dos; Netz, Paulo A.

Abstract in English:

Shikimate kinase has emerged as an intriguing macromolecular target for the development of novel pharmaceutical agents for the treatment of tuberculosis. This study aimed to develop a neural network (NN) for the discovery of potential inhibitors of Mycobacterium tuberculosis shikimate kinase and to conduct molecular docking and molecular dynamics (MD) simulations. The NN model pointed to a set of 810 molecules with anti-tuberculosis activity, 86% of which also gave positive results in docking calculations. Among these, 54 molecules exhibited docking scores ranging from -9 to -9.8 kcal mol-1. Subsequently, a subset of molecules was selected for molecular dynamics studies and molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) calculations. Furthermore, the analysis of global descriptors (electronic chemical potential, hardness, and electrophilicity) showed that the higher-affinity subset shared a similar electronic profile. The molecules with the lowest binding Gibbs free energy (ΔGbinding) values, and therefore the highest affinity, were identified as CHEMBL1229147, CHEMBL4095667, and CHEMBL120640.
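The global descriptors named in this abstract are commonly estimated from frontier orbital energies. The sketch below is a minimal illustration, assuming Koopmans-type approximations (ionization energy I ≈ -E_HOMO, electron affinity A ≈ -E_LUMO) and the convention η = (I - A)/2; some authors omit the factor of 1/2, and the abstract does not state which convention or level of theory was used. The function name and orbital energies are hypothetical.

```python
from typing import Dict


def global_descriptors(e_homo: float, e_lumo: float) -> Dict[str, float]:
    """Frontier-orbital estimates of conceptual-DFT global descriptors.

    Assumes I = -E_HOMO and A = -E_LUMO (Koopmans-type approximation) and
    hardness eta = (I - A) / 2; energies in eV give descriptors in eV.
    """
    if e_lumo <= e_homo:
        raise ValueError("E_LUMO must lie above E_HOMO")
    mu = (e_homo + e_lumo) / 2.0    # electronic chemical potential, -(I + A)/2
    eta = (e_lumo - e_homo) / 2.0   # chemical hardness, (I - A)/2
    omega = mu ** 2 / (2.0 * eta)   # electrophilicity index, mu^2 / (2 eta)
    return {"mu": mu, "eta": eta, "omega": omega}


# Hypothetical orbital energies in eV, for illustration only
print(global_descriptors(-6.0, -2.0))  # {'mu': -4.0, 'eta': 2.0, 'omega': 4.0}
```

Comparing these three numbers across docked molecules is one simple way to check whether a high-affinity subset shares a similar electronic profile.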
Full Paper
Deep Reinforcement Learning and Structure-Based Approaches in the de novo Design of a New Potential Inhibitor of F13 Protein from Monkeypox Virus
Alencar Filho, Edilson B.; Oliveira Neto, Rosalvo F.; Santos, Vanessa C.; Ferreira, Allysson L. S.

Abstract in English:

Monkeypox (MPOX) is a zoonotic infectious disease caused by the monkeypox virus (MPXV) and has recently emerged as a significant concern for public health organizations worldwide. In 2022, the World Health Organization (WHO) reported thousands of laboratory-confirmed cases, mobilizing the scientific community to control the outbreak given its emergency nature. Tecovirimat (TPOXX), a drug primarily recognized for the treatment of smallpox, has also been recommended for managing MPOX. It works by inhibiting the viral F13 protein (VP37), a critical component of the viral replication cycle. Concerns about possible viral drug resistance, the intrinsic chemical complexity of this molecule, and the limited availability of therapeutic alternatives highlight the urgent need to explore and identify new effective compounds. In this paper, we propose combining modern machine learning techniques (deep reinforcement learning) with structure-based drug design (SBDD) approaches (molecular docking and dynamics) for the de novo design of molecular scaffolds that have affinity for the F13 protein, lower structural complexity than TPOXX, and easy synthetic accessibility, contributing to efforts to find therapeutic alternatives for MPOX.
Full Paper
Computational Modeling and Biological Evaluation of Benzophenone Derivatives as Antileishmanial Agents
Farias, Bárbara F.; Ferreira, Miller S.; Miranda, Daniel O.; Nunes, Tayná R.; Pereira, Natália F.; Espuri, Patrícia F.; Januario, Jaqueline P.; Colombo, Fábio A.; Marques, Marcos J.; Zanin, João L. B.; Soares, Marisi G.; Souza, Thiago B. de; Carvalho, Diogo T.; Chagas-Paula, Daniela A.; Dias, Danielle F.

Abstract in English:

Leishmaniasis is a neglected tropical disease whose limited therapeutic options are marked by high toxicity, adverse side effects, and growing resistance. In this study, machine learning (ML) methods were employed to design and evaluate benzophenone and xanthone derivatives as potential antileishmanial agents. A dataset of 73 compounds was curated, and Quantitative Structure-Activity Relationship (QSAR) models were developed using artificial neural network (ANN), Random Forest (RF), and J48 decision tree classifiers. The ANN model achieved the highest accuracy (86.2%) in predicting antileishmanial activity, and its predictions were validated through in vitro assays. Among 14 newly synthesized benzophenones, compounds 5 and 7 showed significant biological activity, with half-maximal inhibitory concentration (IC50) values of 10.19 and 14.35 μM, respectively, and favorable selectivity indices compared with the reference drugs pentamidine and amphotericin B. Structural analysis highlighted the importance of thiosemicarbazone and 4-methyl groups, together with electronegative substituents at position 11, in enhancing activity. This study underscores the potential of computational tools to streamline the discovery of novel, effective, and selective antileishmanial agents.
Full Paper
Drug Repurposing for Trypanosomiasis: Using Machine Learning Models and Polypharmacology to Identify Multitarget Candidates
Domingues, Karime Zeraik A.; Cobre, Alexandre de F.; Fachi, Mariana M.; Lazo, Raul Edison L.; Ferreira, Luana M.; Pontarolo, Roberto

Abstract in English:

Chagas disease and African sleeping sickness are neglected tropical diseases (NTD) caused by Trypanosoma parasites, and current treatments face challenges such as toxicity and resistance. This study integrates machine learning and Quantitative Structure-Activity Relationship (QSAR) models to repurpose Food and Drug Administration (FDA)-approved drugs as potential treatments for these diseases. A dataset of 21,608 compounds with inhibitory activity against Trypanosoma cruzi and Trypanosoma brucei was analyzed using PubChem fingerprints. Random Forest and Extreme Gradient Boosting models were trained and applied to screen the ZINC-22 database for new therapeutic options. Posaconazole was predicted to be the top candidate for multitarget activity against both Trypanosoma species, followed by pentamidine, a drug already approved for sleeping sickness. Additionally, the models identified 40 other drug candidates (pIC50 > 6 and coefficient of variation < 0.05), mainly antineoplastics (32%) and antifungals (19%). This approach demonstrates the potential of computational techniques to accelerate the discovery of drug candidates for neglected infectious diseases.
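The selection rule quoted in this abstract (pIC50 > 6 and coefficient of variation < 0.05) can be expressed as a simple filter. This is a minimal sketch, assuming pIC50 = -log10(IC50 in mol L-1) and assuming the coefficient of variation is taken over several predicted pIC50 values for the same drug (for example, from different models); the abstract does not say what the CV is computed over. Drug names, values, and function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def pic50_from_molar(ic50_molar: float) -> float:
    """pIC50 = -log10(IC50), with IC50 in mol L^-1 (1 uM -> 6.0)."""
    return -math.log10(ic50_molar)


def select_candidates(predictions: Dict[str, Sequence[float]],
                      min_pic50: float = 6.0,
                      max_cv: float = 0.05) -> List[str]:
    """Keep drugs whose mean predicted pIC50 exceeds min_pic50 and whose
    coefficient of variation (sample SD / mean) is below max_cv."""
    selected = []
    for drug, values in predictions.items():
        mean = statistics.mean(values)
        cv = statistics.stdev(values) / mean  # requires at least two predictions
        if mean > min_pic50 and cv < max_cv:
            selected.append(drug)
    return selected


# Hypothetical predicted pIC50 values for three unnamed drugs
preds = {"drug_A": [6.5, 6.6, 6.4], "drug_B": [7.0, 5.0], "drug_C": [5.5, 5.6]}
print(select_candidates(preds))  # ['drug_A']
```

A low CV acts as an agreement check: a drug only passes if its predictions are both potent on average and consistent with one another.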
Full Paper
Discrimination between COVID-19 Positive and Negative Blood Sera Using an Unmodified Disposable Impedimetric Sensor and Multivariate Analysis
Cruz, Ingrid G. B. L.; Sales, Flávia R. P.; Fragoso, Wallace D.; Castellano, Lúcio R. C.; Beltrão, Fabyan E. L.; Cardoso, Talita N.; Oliveira, Maísa S. de; Lemos, Sherlan G.

Abstract in English:

The present study introduces a direct approach for classifying blood serum samples as positive or negative for coronavirus disease (COVID-19) by combining electrochemical impedance data of the sample with multivariate analysis. The hypothesis is that systematic alterations in blood composition resulting from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection give rise to a distinct impedance spectrum when infected serum is analyzed. A total of 201 serum samples were analyzed using the gold-standard method, reverse transcription-polymerase chain reaction (RT-PCR), whose results served to train and validate the classification models. Two variants of discriminant analysis (partial least squares discriminant analysis (PLS-DA) and principal component analysis-discriminant analysis (PCA-DA)) and a one-class modeling approach (soft independent modeling of class analogies (SIMCA)) were used to classify impedance data in different formats (as complex or real numbers). PCA-DA applied to imaginary impedance spectra proved to be the most effective strategy, achieving sensitivity, specificity, and precision of 94, 94, and 91%, respectively, with classification error rates as low as 6%. These findings are encouraging and could facilitate the development of an inexpensive and reliable screening method for COVID-19.
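The figures of merit reported above are derived from a confusion matrix built against the RT-PCR reference labels. The sketch below shows the standard definitions; it assumes the error rate is the overall fraction of misclassified samples, which is one common definition (the abstract does not state which one was used). Labels and the function name are hypothetical.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                      positive: str = "positive") -> Dict[str, float]:
    """Sensitivity, specificity, precision and error rate of a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),         # fraction of true positives detected
        "specificity": tn / (tn + fp),         # fraction of true negatives detected
        "precision": tp / (tp + fp),           # fraction of predicted positives that are correct
        "error_rate": (fp + fn) / len(pairs),  # overall fraction misclassified
    }


# Hypothetical reference (RT-PCR) and predicted labels for eight sera
y_true = ["positive"] * 3 + ["negative"] * 5
y_pred = ["positive", "positive", "negative",
          "negative", "negative", "negative", "negative", "positive"]
print(screening_metrics(y_true, y_pred))
```

In this toy case TP = 2, FN = 1, TN = 4 and FP = 1, so sensitivity is 2/3, specificity 0.8, precision 2/3 and the error rate 0.25.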
Full Paper
Machine Learning Prediction of the Most Intense Peak of the Absorption Spectra of Organic Molecules
Souza, Rubens C.; Duarte, Julio C.; Goldschmidt, Ronaldo R.; Borges Jr., Itamar

Abstract in English:

Accurate knowledge of the electronic properties of molecular excited states is fundamental for understanding the behavior of functional materials for organic electronics and sensors. In this work, we focus on determining the properties of the most intense peak in the electronic absorption spectra of organic molecules. For this purpose, we employed the quantum chemistry QM-symex dataset, which contains approximately 173,000 organic molecules and time-dependent density functional theory (TD-DFT) data for the first ten electronic absorption transitions. Each molecule is identified by its Cartesian coordinates. From the original QM-symex data, we built a new dataset named QM symex-modif that contains the molecules in simplified molecular input line entry system (SMILES) format together with properties related to the main electronic transition. We then employed twenty machine learning (ML) algorithms to predict oscillator strengths, excitation energies, transition orbitals, and the highest occupied molecular orbitals (HOMOs). As inputs for the ML algorithms, we used several chemical descriptors for each molecule, generated with the RDKit tool from the corresponding SMILES strings. These input descriptors significantly improved the accuracy of the ML predictions for these key photophysical properties. Low mean absolute errors (MAEs) were obtained for the test set of 45,056 molecules: 0.035 for oscillator strengths, 0.09 eV for excitation energies, 1.24 and 0.62 for the initial and final transition molecular orbital (MO) numbers (i.e., for each molecule, their position in the MO listing), respectively, and 0.014 for HOMO numbers, with coefficient of determination (R2) values consistently exceeding 0.94, demonstrating the accuracy of the models. Additionally, a Shapley additive explanations (SHAP) analysis was carried out to evaluate the importance of the input parameters for the investigated ML models.
We found several interesting relationships involving the input parameters. In particular, molecular weight holds significant importance in our ML models for determining the target HOMO numbers and the transition orbitals.
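The MAE and R2 values quoted in this abstract follow the standard definitions MAE = (1/n) Σ|yᵢ - ŷᵢ| and R2 = 1 - SS_res/SS_tot. A minimal sketch of both metrics follows; the function names and toy values are illustrative and are not taken from the QM symex-modif test set.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum |y_i - yhat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy reference and predicted values (e.g., excitation energies in eV)
ref = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(round(mean_absolute_error(ref, pred), 3), round(r_squared(ref, pred), 3))  # 0.15 0.98
```

MAE keeps the units of the target (eV for excitation energies, dimensionless for oscillator strengths), while R2 is unitless, which is why both are usually reported together.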
Full Paper
Machine Learning to Treat Data for the Design and Improvement of Electrochemical Sensors: Application for a Cancer Biomarker
Redín, Gisela Ibáñez; Braz, Daniel C.; Gonçalves, Débora; Oliveira Jr., Osvaldo N.

Abstract in English:

Label-free immunosensors based on screen-printed carbon electrodes offer a promising platform for the detection of cancer biomarkers. Herein, we explore the use of machine learning techniques to improve the performance of these immunosensors. We evaluate the influence of various redox probes on the analytical response in detecting the cancer biomarker protein p53. Ascorbic acid (AA) was found to be the optimal redox probe, exhibiting a sensitivity of 0.26 ng mL-1, attributed to its strong affinity for proteins through hydrogen bonds and electrostatic interactions. We also extracted analytical information from the voltammograms, such as shifts in peak potential and changes in peak width, to construct datasets for supervised machine learning. Using different algorithms, including logistic regression, linear discriminant analysis, k-nearest neighbors, Gaussian naive Bayes, decision trees, and support vector machines, we identified positive samples spiked with p53 in artificial urine and saliva samples. By comparing immunosensors with distinct molecular architectures, we found that the choice of redox probe plays a critical role, proving more significant for performance than modification of the working electrodes. Furthermore, immunosensors with inferior inherent detection ability can achieve performance comparable to those with superior analytical characteristics when feature selection and machine learning algorithms are applied to the voltammograms. These findings illustrate the value of extracting information from differential pulse voltammograms beyond peak current intensity. Moreover, machine learning techniques make it possible to design biosensors capable of distinguishing biomarkers even in complex samples.
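The workflow described above (extracting peak potential and peak width from voltammograms, then classifying samples) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a single baseline-corrected peak, measures the width at half height on the sampled grid without interpolation, omits feature scaling, and uses only k-nearest neighbors out of the classifiers listed. All data and function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def peak_features(potentials: Sequence[float],
                  currents: Sequence[float]) -> Tuple[float, float, float]:
    """Return (peak potential, peak current, peak width at half height).

    Assumes one baseline-corrected peak; the width is measured between the
    first grid points at or below half height on each side (no interpolation).
    """
    i_peak = max(range(len(currents)), key=lambda i: currents[i])
    half = currents[i_peak] / 2.0
    left = i_peak
    while left > 0 and currents[left] > half:
        left -= 1
    right = i_peak
    while right < len(currents) - 1 and currents[right] > half:
        right += 1
    return potentials[i_peak], currents[i_peak], potentials[right] - potentials[left]


def knn_predict(train: List[Tuple[float, ...]], labels: List[str],
                query: Tuple[float, ...], k: int = 3) -> str:
    """Majority vote among the k training feature vectors closest to query."""
    ranked = sorted(zip((math.dist(f, query) for f in train), labels))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Synthetic, noise-free peak (hypothetical): centre 0.20 V, standard deviation 0.05 V
potentials_v = [-0.2 + 0.01 * i for i in range(81)]
currents_ua = [math.exp(-0.5 * ((e - 0.2) / 0.05) ** 2) for e in potentials_v]
print(peak_features(potentials_v, currents_ua))  # width close to 0.12 V on this grid

# Hypothetical (peak shift, width change) features for blank vs. p53-spiked samples
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
y = ["blank", "blank", "spiked", "spiked"]
print(knn_predict(X, y, (1.05, 0.9)))  # spiked
```

With real voltammograms the features would typically be standardized before computing distances, since peak shifts and widths can differ in scale.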
Full Paper
Assessing Emissions of Biogenic Volatile Organic Compounds and Their Correlation with Abiotic Factors in an Atlantic Forest Reserve Using Supervised Learning Methods
Figueiredo, Ana Paula S.; Botelho, Junio R.; Nascimento, Marcia Helena C.; Canela, Maria Cristina; Goodacre, Royston; Filgueiras, Paulo R.; Souza, Murilo O.

Abstract in English:

This study investigates the emissions of biogenic volatile organic compounds (BVOCs) in an Atlantic Forest fragment and examines their correlation with abiotic factors. Despite its rich biodiversity, the Atlantic Forest remains under threat, and BVOC emissions there have received little research attention. Using supervised learning methods (traditional Partial Least Squares Discriminant Analysis (PLS-DA), PLS-DA with bootstrap resampling, and Support Vector Machine Ensemble (SVM ensemble)), 10 BVOCs were analyzed and correlated with environmental variables. The results reveal emission patterns linked to abiotic conditions, with the models achieving high classification accuracy. The variables α-pinene, linalool, and isobornyl acetate contribute more significantly to evening/morning samples, whereas temperature and wind gusts contribute more to afternoon samples. The PLS-DA and SVM ensemble models classified the samples effectively, with minimal misclassification errors. The PLS-DA bootstrap model enhanced the robustness and reliability of sample classifications, confirming consistent and distinct differences between sample classes. To the best of our knowledge, this is the first study to apply the SVM ensemble method to BVOCs in air samples and their correlation with abiotic factors, highlighting its potential to improve understanding of atmospheric dynamics and inform conservation strategies for the Caparaó region and other similarly threatened biomes.
Full Paper
Quantum Active Learning for Structural Determination of Doped Nanoparticles - A Case Study of 4Al@Si11
Lourenço, Maicon Pierre; Naseri, Mosayeb; Herrera, Lizandra Barrios; Zadeh-Haghighi, Hadi; Gaur, Daya; Simon, Christoph; Salahub, Dennis R.

Abstract in English:

Active learning (AL) has been widely applied in chemistry and materials science. In this work, we propose a quantum active learning (QAL) method for the automatic structural determination of doped nanoparticles, in which quantum machine learning (QML) regression models are used iteratively to indicate new structures to be calculated by Density Functional Theory (DFT) or Density Functional Based Tight Binding (DFTB), and the newly acquired data are used to retrain the QML models. The QAL method is implemented in the Quantum Machine Learning Software/Agent for Material Design and Discovery (QMLMaterial), which uses an artificial agent (defined by QML regression algorithms) to choose the next doped configuration to be calculated, namely the one with the highest probability of leading to the optimum structure. The QAL uses a quantum Gaussian process with a fidelity quantum kernel as well as the projected quantum kernel, with different quantum circuits. For comparison, classical AL was performed using a classical Gaussian process with different classical kernels. The QAL method was applied to the structural determination of Si11 doped with four Al atoms (4Al@Si11), and the results indicate that it is able to find the optimum 4Al@Si11 structure. The aim of this work is to present the QAL method, formulated in a noise-free quantum computing framework, for the automatic structural determination of doped nanoparticles and materials defects.
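The iterative loop described in this abstract (fit a regression model, pick the next configuration, run an expensive calculation, refit) can be sketched for the classical baseline it mentions. This is a minimal sketch under several assumptions not stated in the abstract: a classical squared-exponential kernel stands in for the fidelity or projected quantum kernels, a lower-confidence-bound rule is used as the acquisition function, the "configurations" are 1D toy points rather than dopant placements, and a toy energy function replaces the DFT/DFTB calculation. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def rbf(a: Point, b: Point, length_scale: float = 1.0) -> float:
    """Classical squared-exponential kernel (stand-in for a quantum kernel)."""
    return math.exp(-0.5 * math.dist(a, b) ** 2 / length_scale ** 2)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def gp_predict(xs: Sequence[Point], ys: Sequence[float], q: Point,
               noise: float = 1e-6) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a GP regressor at point q."""
    y_mean = sum(ys) / len(ys)  # centre targets so a zero-mean prior is reasonable
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, q) for a in xs]
    alpha = solve(k, [y - y_mean for y in ys])
    v = solve(k, k_star)
    mean = y_mean + sum(ks * al for ks, al in zip(k_star, alpha))
    var = rbf(q, q) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


def active_learning(candidates: Sequence[Point], energy: Callable[[Point], float],
                    n_initial: int = 2, budget: int = 6,
                    kappa: float = 2.0) -> Tuple[int, float, List[int]]:
    """Refit the GP after every new calculation and pick the next candidate
    by a lower confidence bound (mean - kappa * std), since energy is minimised."""
    chosen = list(range(n_initial))  # hypothetical seed: the first n_initial candidates
    energies = [energy(candidates[i]) for i in chosen]
    while len(chosen) < min(budget, len(candidates)):
        xs = [candidates[i] for i in chosen]
        scores = {}
        for idx, cand in enumerate(candidates):
            if idx not in chosen:
                mean, std = gp_predict(xs, energies, cand)
                scores[idx] = mean - kappa * std
        nxt = min(scores, key=scores.get)
        chosen.append(nxt)
        energies.append(energy(candidates[nxt]))  # placeholder for a DFT/DFTB calculation
    best = min(range(len(chosen)), key=lambda i: energies[i])
    return chosen[best], energies[best], chosen


def toy_energy(p: Point) -> float:
    """Hypothetical energy surface with its minimum near x = 6.3."""
    return (p[0] - 6.3) ** 2


cands = [(float(x),) for x in range(11)]
best_idx, best_e, order = active_learning(cands, toy_energy, budget=6)
print(best_idx, best_e, order)
```

The design choice in such loops is the acquisition rule: a larger `kappa` favours exploring uncertain configurations, a smaller one favours refining around the current best estimate.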
Full Paper
QSAR-Lit: A No-Code Platform for Predictive QSAR Model Development - From Data Curation to Virtual Screening
Sanches, Igor H.; Feitosa, Francisco L.; Lemos, Jade M.; Silva-Mendonça, Sabrina; Souza, Ester; Cabral, Victoria F.; Moreira-Filho, José T.; Gil, Henric; Neves, Bruno J.; Braga, Rodolpho C.; Borba, Joyce V. V. B.; Andrade, Carolina H.

Abstract in English:

The development of predictive quantitative structure-activity relationship (QSAR) models using machine learning (ML) algorithms has become increasingly feasible due to the growing availability of chemical libraries with experimental data. These models can accelerate the drug discovery process and reduce failure rates by enabling data-driven decision-making. However, existing standalone software often lacks several critical components necessary for effective data preparation and modeling. Here, we introduce QSAR-Lit, an innovative, no-code, and comprehensive workflow designed for curating chemical and biological data, generating QSAR models, and performing virtual screening through an interactive Python-based Streamlit dashboard. The QSAR model development process begins with data curation, collecting and cleaning data on chemical structures and their biological activities. The next step is model building, where the curated data is used to train and optimize QSAR models. Finally, QSAR-Lit provides virtual screening, allowing QSAR models to predict the activity of new chemical structures. This application efficiently screens libraries of chemical compounds, assisting researchers in identifying and prioritizing potential candidates for further investigation.
Short Report
Effect of the Alkyl Side Chain of Antitrypanosomal Cinnamate, p-Coumarate, and Ferulate n-Alkyl Esters Using Multivariate Analysis and Computer-Aided Drug Design
Silva, Matheus L.; Baldim, João L.; Costa-Silva, Thais A.; Amaral, Maiara; Romanelli, Maiara M.; Levatti, Erica V. C.; Tempone, Andre G.; Lago, João Henrique G.

Abstract in English:

In the present work, three series of cinnamic (1), p-coumaric (2), and ferulic (3) esters containing different side chains, namely ethyl (1a-3a), n-propyl (1b-3b), n-butyl (1c-3c), n-pentyl (1d-3d), n-hexyl (1e-3e), and n-heptyl (1f-3f), were prepared and tested for activity against trypomastigote forms of the parasite Trypanosoma cruzi and for toxicity against NCTC cells. The results indicated that p-coumaric or ferulic moieties combined with C4-C7 linear side chains play an important role in the bioactivity against T. cruzi, since compounds 2c-2f and 3d-3f were the most active derivatives, with half maximal effective concentration (EC50) values ranging from 1.7 to 12.8 μM, lower (i.e., more potent) than that determined for the positive control benznidazole (EC50 = 16.4 μM). Additionally, machine learning and multivariate statistical analyses identified molecular features correlated with biological activity, emphasizing the importance of side-chain length and lipophilicity and highlighting the significance of the molecular structure of phenylpropanoid derivatives for activity against T. cruzi.
Sociedade Brasileira de Química Instituto de Química - UNICAMP, Caixa Postal 6154, 13083-970 Campinas SP - Brazil, Tel./FAX.: +55 19 3521-3151 - São Paulo - SP - Brazil
E-mail: office@jbcs.sbq.org.br