A Study of Neolignan Compounds with Biological Activity Against Paracoccidioides brasiliensis by Using Quantum Chemical and Chemometric Methods

Chemometric (statistical) methods are employed to classify a set of neolignan derivatives with biological activity against Paracoccidioides brasiliensis. The AM1 (Austin Model 1) method was used to calculate a set of molecular descriptors (properties) for the compounds under study. The descriptors were then analyzed with the following pattern recognition methods: Principal Component Analysis (PCA), Hierarchical Cluster Analysis (HCA) and the K-nearest neighbor method (KNN). The PCA and HCA methods proved quite efficient in classifying the compounds studied into two groups (active and inactive). Three molecular descriptors were responsible for the separation between active and inactive compounds: the energy of the highest occupied molecular orbital (E_HOMO), the bond order between the C1'-R7 atoms (L_14) and the bond order between the C5'-R6 atoms (L_22). Since the variables responsible for the separation between active and inactive compounds are electronic descriptors, it is concluded that electronic effects may play an important role in the interaction between the biological receptor and neolignan derivatives with activity against Paracoccidioides brasiliensis.


Introduction
Paracoccidioides brasiliensis is a human pathogenic fungus that constitutes a major medical problem in Latin America. The microorganism is the etiological agent of paracoccidioidomycosis (PCM), a systemic disease with a high incidence among the rural population of Latin America. 1 This disease is usually chronic and involves several organs and systems, predominantly the lungs, which are considered the primary site of infection. 2 Effective treatment regimens are available to control the infectious process, but most patients (60-80%) develop fibrotic sequelae that may severely hamper respiratory function and limit the patient's well-being. 2 Antifungal medications are the mainstay of treatment for PCM because their mechanism of action may involve an alteration of RNA or DNA metabolism or an intracellular accumulation of peroxide that is toxic to the fungal cell. 3

Some experimental and clinical investigations have indicated the relevance of humoral and/or cellular immune responses in the pathogenesis and evolution of PCM. Specific cell-mediated immune responses seem to play an important role in the resistance to P. brasiliensis. Patients with systemic PCM tend to show depressed cellular immune responses compared to those with localized disease. Also, the most severe forms of infection are associated with high levels of specific antibodies. 4

Several compounds are used to inhibit P. brasiliensis, and neolignan compounds have been used for this purpose. 5 Neolignans are organic dimers derived from oxidative coupling of allyl and propenyl phenols that occur in the Myristicaceae [6][7][8] and other primitive plant families. [10][11] The resin of several Virola species supplies a hallucinogenic powder used in rituals, 12 and the genus Virola is a source of lignans and neolignans with recognized bioactivity. 13

The aim of the present work is to investigate the relationship between molecular properties and the activity against P. brasiliensis of synthetic neolignans and analogues by using chemometric methods.

Quantum chemical analysis
Neolignans are molecules with several degrees of rotational freedom and can therefore adopt many geometric conformations (Figure 1 shows the general structure and numbering used in the neolignan compounds studied). In the absence of crystallographic structures, it is necessary to carry out a conformational search in order to find the conformation of lowest energy. In this work we used a careful procedure to obtain the molecular conformation associated with the lowest total energy. Initially, a molecular mechanics (MM) conformational search was carried out with the Tripos 5.2 force field 14 by using the software Spartan 5.0. 15 Next, the final molecular conformation of each compound studied was obtained with the AM1 semi-empirical method 16 as implemented in the molecular package AMPAC 5.0. 17 The molecular geometries were fully optimized by using the Precise keyword (which increases the precision of the calculations) and, after the initial conformational study, the molecular properties (variables) were calculated. The chemical structures of the 11 neolignan compounds studied in this work are presented in Figure 2.
Molecular properties of chemical compounds are usually correlated with biological activity. [23][24] Structure-activity relationship (SAR) methods have been used with success in pharmaceutical applications, 25 and in this work we calculated the following molecular properties to be correlated with the biological activity under study: log P, whose values were obtained from the hydrophobic parameters of the substituents by using the molecular package Spartan 5.0; 26 molecular surface area (A) and volume (V), evaluated with the molecular package HyperChem 5.0; 27 partial atomic charges (Q_n) and bond orders (L_n), derived from the NBO analysis; 28,29 energies of the HOMO (E_HOMO) and LUMO (E_LUMO) frontier orbitals; hardness (η), obtained from the equation $\eta = (E_{LUMO} - E_{HOMO})/2$; Mulliken electronegativity (χ), calculated from the equation $\chi = -(E_{HOMO} + E_{LUMO})/2$; other electronic properties, namely total energy (E_T), heat of formation (ΔH_f), ionization potential (IP), dipole moment (µ) and polarizability (α), obtained with the molecular package AMPAC 5.0; 17 and dihedral (D_n) and bond (A_n) angles, also obtained with the AMPAC 5.0 program.
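As an illustration of how the two derived descriptors follow from the frontier orbital energies, the sketch below assumes the common Koopmans-type expressions η = (E_LUMO − E_HOMO)/2 and χ = −(E_HOMO + E_LUMO)/2. The function names and the orbital energies are hypothetical and are not values from this study.

```python
def hardness(e_homo: float, e_lumo: float) -> float:
    """Chemical hardness, assuming eta = (E_LUMO - E_HOMO) / 2."""
    return (e_lumo - e_homo) / 2.0


def mulliken_electronegativity(e_homo: float, e_lumo: float) -> float:
    """Mulliken electronegativity, assuming chi = -(E_HOMO + E_LUMO) / 2."""
    return -(e_homo + e_lumo) / 2.0


# Hypothetical frontier orbital energies in eV (illustrative only).
e_homo, e_lumo = -9.0, -0.5

eta = hardness(e_homo, e_lumo)
chi = mulliken_electronegativity(e_homo, e_lumo)
print(f"hardness = {eta:.2f} eV, electronegativity = {chi:.2f} eV")
```

Both quantities carry the same energy unit as the orbital energies supplied.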

Chemometric analyses
The correlation between molecular properties and biological activity was investigated with the following pattern recognition methods, as implemented in the computational package Pirouette. 30

Principal Component Analysis (PCA). PCA is a multivariate statistical technique whose central idea is to reduce the dimensionality of a data set (training set) that contains a large number of interrelated variables, while retaining as much as possible of the variation in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated and ordered so that the first PC retains most of the variation present in all of the original variables. 19

Hierarchical Cluster Analysis (HCA). This technique examines the distances between the samples in a data set and represents this information as a two-dimensional plot called a dendrogram. The HCA method is an excellent tool for preliminary data analysis. It is informative to examine the dendrogram in conjunction with PCA, as they give similar information in different forms. In HCA each point initially forms a cluster of its own and then the similarity matrix is analyzed. The most similar points are grouped into one cluster and the process is repeated until all points belong to a single group. 20,31

K-Nearest Neighbor Analysis (KNN). The KNN method classifies a new object (compound) according to its distances to the objects of the training set. The closest neighbors in the training set are found and the object is assigned to the class that holds the majority of its nearest neighbors. This method is self-validating because in the training set each sample (object) is compared with all of the objects in the set but not with itself. The best value of K can be chosen based on the results from the training set alone. 32 The classical KNN approach does not have outlier detection capability, i.e. a classification is always made whether or not the unknown is a member of any class in the training set.
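The PCA description above can be sketched with a singular value decomposition of the mean-centered data. This is a minimal NumPy illustration on synthetic data, not the Pirouette implementation used in this work; the function name `pca` and the data are assumptions made for the example.

```python
import numpy as np


def pca(X):
    """Return scores, loadings and explained-variance fractions.

    X has one row per sample and one column per variable.
    """
    Xc = X - X.mean(axis=0)                       # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                                # sample coordinates on the PCs
    explained = s**2 / np.sum(s**2)               # variance fraction per PC
    return scores, Vt, explained


rng = np.random.default_rng(0)
# Synthetic data: 11 samples, 3 strongly correlated variables (illustrative only).
t = rng.normal(size=(11, 1))
X = np.hstack([t, 0.8 * t, -0.5 * t]) + 0.1 * rng.normal(size=(11, 3))

scores, loadings, explained = pca(X)
print("explained variance fractions:", np.round(explained, 4))
```

The rows of `loadings` are the weights of the original variables in each PC, and the columns of `scores` are mutually uncorrelated, which is the property the text refers to.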

Pre-processing of the molecular descriptors
Before applying the pattern recognition methods to the 11 compounds under study, each calculated property (variable or descriptor) was autoscaled. In the autoscaling method each variable is scaled to a mean of zero and a variance of unity. This step is very important because it gives each variable equal weight, which provides a measure of the ability of a descriptor to discriminate classes of compounds. 32 With this method all of the variables can be compared on the same footing even though they are expressed in different units.
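Autoscaling as described above amounts to subtracting each column mean and dividing by the column standard deviation. The sketch below uses NumPy with the sample standard deviation (`ddof=1`), which is an assumption since the text does not state which estimator was used; the descriptor values are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Scale each column to zero mean and unit variance (sample std, ddof=1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical descriptors with very different magnitudes and units.
X = np.array([[-9.1, 0.95, 310.0],
              [-8.7, 1.02, 295.0],
              [-9.4, 0.98, 330.0],
              [-8.9, 1.05, 305.0]])

Z = autoscale(X)
print(np.round(Z, 3))
```

After scaling, a column measured in hundreds no longer dominates distance-based methods such as HCA and KNN over a column measured in fractions of a unit.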

PCA and HCA analyses
After several analyses, the best separation was obtained by using the following variables: E_HOMO, L_14 and L_22, whose calculated values are presented in Table 1. This suggests that the other variables are not significant for the classification of the compounds studied.
The PCA results show that the first component (PC1) accounts for 83.48% of the variance of the data. Considering the first (PC1) and second (PC2) components, the accumulated variance increases to 96.07%. Figure 3 shows that PC1 is in fact responsible for the discrimination between active (1, 2 and 4) and inactive (3, 5, 6, 7, 8, 9, 10 and 11) compounds. Equation (1) presents the loading values of each variable in PC1.

(1)

Figure 3 shows the separation of the training set into two groups, molecules active and inactive against P. brasiliensis, obtained with the variables E_HOMO, L_14 and L_22. The active compounds present negative values for PC1 while the inactive compounds present positive values for PC1.
From equation (1) we can see that for a compound to be active it needs to present large and negative values for the highest occupied molecular orbital energy (E_HOMO) along with small values for the bond order between the C1' and R7 atoms (L_14) and the bond order between the C5' and R6 atoms (L_22).
It is also interesting to notice that the variables responsible for the separation between active and inactive compounds, i.e. E_HOMO, L_14 and L_22, are all electronic variables. Therefore we can conclude that electronic effects should have an important role when one is trying to understand the activity of neolignan compounds against P. brasiliensis.
The energies of the frontier orbitals are important properties in several chemical and pharmacological processes, because they give information on the electron-donating and electron-accepting character of a compound and, consequently, on the formation of a charge transfer complex (CTC). [34][35][36][37][38] From these definitions, we have that: (a) the greater the E_HOMO, the greater the electron-donating capability; (b) the smaller the E_LUMO, the smaller the resistance to accept electrons. 39 For the active compounds studied in this work we can say that their E_HOMO must present negative values whereas the inactive compounds must present positive values. This means that the inactive compounds are more efficient electron donors than the active ones, i.e. the inactive compounds may interact through a charge transfer mechanism with other compounds before reaching the biological receptor.
Regarding the bond order descriptor, we can define it as half of the difference between the numbers of electrons in bonding and antibonding molecular orbitals. The greater the bond order, the greater the dissociation energy and the smaller the bond length. Thus, for the active compounds studied we can conclude that some groups at the C1' and C5' positions are required so that the electronic density between C1'-R7 and C5'-R6 presents a low value and, consequently, a low bond order. So, the smaller the L_14 (C1'-R7) and L_22 (C5'-R6) bond orders, the greater the possibility of interaction between the compounds studied and the biological receptor. Figure 4 illustrates the clear separation between active and inactive compounds by using the bond orders obtained in this work (L_14 and L_22), confirming the importance of these descriptors in the discrimination of neolignan compounds with biological activity against P. brasiliensis.
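The molecular orbital definition of bond order quoted above can be written compactly and checked on two textbook diatomic molecules. The symbols n_b and n_a are introduced here for illustration only. The L_14 and L_22 values discussed in this work come from the NBO analysis, so they need not be integers or half-integers.

```latex
\[
  \text{bond order} = \frac{n_{\mathrm{b}} - n_{\mathrm{a}}}{2}
\]
% n_b: number of electrons in bonding molecular orbitals
% n_a: number of electrons in antibonding molecular orbitals
%
% Valence-electron counts for two textbook cases:
%   N2: n_b = 8, n_a = 2  ->  (8 - 2)/2 = 3  (triple bond)
%   O2: n_b = 8, n_a = 4  ->  (8 - 4)/2 = 2  (double bond)
```

The trend stated in the text follows from the same examples: N2, with the higher bond order, has the shorter and stronger bond.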
The HCA results are summarized in the dendrogram shown in Figure 5. From this dendrogram we can see that the similarity between the groups of active (A) and inactive (B) molecules is small, which means that these two classes are distinct.
The HCA and PCA methods are complementary, and for the 11 neolignan compounds studied in this work they gave the same result, i.e. HCA and PCA classified the 11 neolignan compounds in exactly the same way.
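The agglomerative procedure behind a dendrogram can be sketched in a few lines. The example below is a naive illustration that assumes Euclidean distances and single linkage, since the linkage criterion used in this work is not stated. The function name `agglomerate` and the data are hypothetical.

```python
import numpy as np


def agglomerate(X, n_clusters=2):
    """Naive single-linkage agglomerative clustering.

    Returns a list of sets of sample indices, one set per cluster.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [{i} for i in range(len(X))]    # each point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]             # merge the two most similar clusters
        del clusters[b]
    return clusters


# Hypothetical autoscaled descriptors: three samples near the origin, four far away.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
     [3.0, 3.1], [3.2, 2.9], [2.9, 3.0], [3.1, 3.2]]

groups = agglomerate(X)
print(sorted(sorted(g) for g in groups))
```

Stopping at two clusters corresponds to cutting the dendrogram just below its final merge; recording the merge distances at every step would give the full tree.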

KNN analysis
The KNN method was used for the validation of the initial data set, and Table 2 presents the results obtained with one (1NN), three (3NN) and five (5NN) nearest neighbors. For 1NN, the percentage of correct classifications was 100%, while for 3NN and 5NN it decreased considerably (81.8% and 72.7%, i.e. 9 and 8 of the 11 compounds, respectively). Thus, the use of 1NN leads to the highest percentage of correct classifications.
With 1NN, the KNN results agreed with those obtained with the PCA and HCA methods, so that all three classification methods (PCA, HCA and KNN) classified the data set with 100% success.
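The self-validation described for KNN corresponds to a leave-one-out test: each compound is classified by a majority vote of its K nearest neighbors among the remaining compounds. The sketch below assumes Euclidean distances and an unweighted vote. The data are hypothetical and only mimic the class sizes of this study (3 active, 8 inactive).

```python
import numpy as np
from collections import Counter


def loo_knn_accuracy(X, y, k):
    """Leave-one-out accuracy of an unweighted k-nearest-neighbor vote."""
    X = np.asarray(X, dtype=float)
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # a sample is never its own neighbor
        nearest = np.argsort(d)[:k]
        vote = Counter(y[j] for j in nearest).most_common(1)[0][0]
        correct += (vote == y[i])
    return correct / len(X)


# Hypothetical, well-separated data: 3 "active" and 8 "inactive" samples.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
     [4.9, 5.0], [5.0, 4.9], [4.9, 4.9], [5.2, 5.0]]
y = ["active"] * 3 + ["inactive"] * 8

for k in (1, 3, 5):
    print(f"{k}NN: {100 * loo_knn_accuracy(X, y, k):.1f}% correct")
```

In this toy set a held-out active sample has only two other active neighbors, so a five-neighbor vote can never favor its class and at most 8 of 11 samples are classified correctly. That ceiling is consistent with the 72.7% reported for 5NN, although the sketch does not reproduce the actual descriptor data.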

Conclusions
The application of three pattern recognition methods (PCA, HCA and KNN) to neolignan compounds with activity against P. brasiliensis showed that the compounds studied in this work can be correctly classified into two groups: active and inactive molecules. The PCA results showed that the variables E_HOMO, L_14 and L_22 are responsible for the separation between the active and inactive compounds. The HCA and KNN results were similar to those obtained with PCA, i.e. both methods classified the neolignan compounds in exactly the same way as PCA. From the results obtained with the three chemometric (statistical) methods, we can see that the variables responsible for the separation between active and inactive compounds, i.e. E_HOMO, L_14 and L_22, are all electronic variables. Therefore we can conclude that electronic effects should have an important role when one is trying to understand the activity of neolignan compounds against P. brasiliensis.

Figure 1. General structure and numbering used in the neolignan compounds studied.

Figure 3. The separation of the training set into two groups: active and inactive compounds. Notice that the first component (PC1) is responsible for the separation.

Figure 5. Dendrogram obtained with the HCA method. The training set was classified into two groups: A (active compounds) and B (inactive compounds).

Figure 4. Plot of L_14 versus L_22, which illustrates the discrimination between active and inactive neolignan compounds.

Table 1. The three most important properties (descriptors) that classified the 11 compounds of the training set as active and inactive molecules

Table 2. Classification obtained with the KNN method