A Structure-Activity Relationship (SAR) Study of Neolignan Compounds with Anti-schistosomiasis Activity

A set of eighteen neolignan compounds with anti-schistosomiasis activity was studied with the semi-empirical PM3 method and other theoretical methods in order to evaluate selected molecular properties (variables or descriptors) and correlate them with the biological activity. Exploratory data analysis (principal component analysis, PCA, and hierarchical cluster analysis, HCA), discriminant analysis (DA) and the KNN method were used to obtain possible correlations between the calculated descriptors and the biological activity in question, and to predict the anti-schistosomiasis activity of some test molecules. The molecular descriptors responsible for the separation between active and inactive compounds were: hydration energy (HE), molecular refractivity (MR) and the charge on atom C19 (Q19). These descriptors give information about the kind of interaction that can occur between the compounds and their respective biological receptor. After the model for active and inactive compounds was built, the PCA, HCA, DA and KNN methods were employed in a prediction study. Ten new compounds were studied and only five of them were classified as active against schistosomiasis.


Introduction
Neolignans are dimers obtained from the oxidative coupling of allyl and propenyl phenols, occurring in the Myristicaceae and other primitive plant families. Virola is the most representative Myristicaceae found throughout the Americas. 1,2 In 1970, initial studies of leaves of Virola surinamensis showed high efficacy in the cercariae blockage tests of Schistosoma mansoni in mice. 3 The active substances responsible for protection were isolated and identified as the natural neolignans virolin and surinamensin.
In order to determine the biological activity of neolignans, Barata et al. 4 and Santos 5 synthesized eighteen analogues of neolignan compounds, which were submitted to biological tests against fungi, bacteria, leishmaniasis, schistosomiasis, cancer and PAF (platelet activating factor). 6 Of the eighteen neolignan compounds synthesized, five were classified as active and thirteen as inactive against schistosomiasis (all tests in vitro). 4,5 In the present work we calculated selected molecular descriptors of the eighteen neolignan derivatives synthesized 4,5 and then used statistical methods (principal component analysis, PCA, hierarchical cluster analysis, HCA, and discriminant analysis, DA) to obtain the relationship between the molecular descriptors and the biological activity. The results obtained with PCA, HCA and DA were tested on a new set of neolignan compounds, and the KNN method was used for activity prediction of these new compounds. The molecular descriptors were selected so that steric, electronic and hydrophobic characteristics of these compounds could be taken into account, since each of them can contribute to the biological activity and give information about the interactions between the compounds and their respective biological receptor.
All geometry calculations were performed with the PM3 method 7 and the geometries were fully optimized with the EF algorithm of the AMPAC 6.5 molecular package. 8 Among the many descriptors (variables) that have been used in SAR studies, [9][10][11] we chose the following to be evaluated: HOMO, the energy of the highest occupied molecular orbital (eV); LUMO, the energy of the lowest unoccupied molecular orbital (eV); χ, Mulliken's electronegativity (eV); POL, molecular polarizability (a.u.); QN, net atomic charge on atom N; t, torsional angle (see Figure 1); d, bond angle (see Figure 1); HE, hydration energy (kcal mol⁻¹); MR, molecular refractivity (Å³); VOL, molecular volume (Å³); Log P, partition coefficient.
The calculated descriptors were selected so that they could represent electronic (HOMO, LUMO, χ, POL, Q2, Q3, Q12, Q13, Q19, MR and HE), steric (t, d and VOL) and hydrophobic (Log P) properties of the compounds studied. These properties (descriptors) are expected to be important in explaining the anti-schistosomiasis activity of the neolignan molecules under study here, 9 and the number of calculated descriptors was limited by the software used in the calculations.
The structural descriptors t and d were obtained during the optimization procedure and the most stable structures were used to obtain the other descriptors. The descriptors HOMO, LUMO, χ, POL and Log P were obtained with the HyperChem/Chemplus molecular package 12 and the atomic charges were obtained with the electrostatic potential method of the Spartan program. 13 The electrostatic potential method calculates a set of point atomic charges that best reproduce the quantum mechanical molecular electrostatic potential at a set of points defined around the molecule. 14,15 The routine developed by Connolly 16 was employed; this methodology uses a density of 1 point per Å² in four layers placed at distances of 1.4, 1.6, 1.8 and 2.0 times the van der Waals radii. 16 The charges derived from the electrostatic potential method are physically more satisfactory than Mulliken charges, 17 especially when related to biological activity.
All the statistical analyses (PCA, HCA, DA and KNN) employed here were performed with the program MATLAB 6.0. 18

Results and Discussion
Principal component analysis (PCA)
The central idea of PCA is to reduce the dimensionality of the data set while explaining its variance-covariance structure. This is achieved by a linear transformation of the original set of variables into a smaller number of uncorrelated principal components (PCs). Geometrically, this transformation represents a rotation of the original coordinate system. The direction of maximum variance is given by the first principal component axis. The second principal component, orthogonal to the first one, has the second maximum variance, and so on. [21] Before applying the PCA method, each variable was autoscaled so that all variables could be compared on the same scale. Table 1 shows the correlation matrix for all calculated variables; it was initially obtained in order to eliminate the correlated variables.
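The autoscaling and correlation screening described above can be sketched as follows. This is only an illustration: the descriptor values are hypothetical (they are not the data of Table 1), and the 0.9 cutoff used to flag redundant variables is an assumption, since the text does not state the threshold that was applied.

```python
from statistics import mean, stdev

def autoscale(column):
    """Mean-center a variable and scale it to unit (sample) standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

def correlation(x, y):
    """Pearson correlation, computed as the covariance of the autoscaled columns."""
    zx, zy = autoscale(x), autoscale(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical descriptor values for five compounds (NOT the paper's data).
descriptors = {
    "MR":  [88.1, 92.4, 97.0, 101.3, 95.2],
    "VOL": [880.0, 925.0, 968.0, 1015.0, 950.0],
    "HE":  [-6.2, -9.8, -5.1, -8.7, -7.4],
}

names = list(descriptors)
corr = {(a, b): correlation(descriptors[a], descriptors[b])
        for a in names for b in names}

# Flag strongly correlated pairs; the 0.9 cutoff is an illustrative assumption.
redundant = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
             if abs(corr[(a, b)]) > 0.9]
print(redundant)
```

In this toy set MR and VOL carry almost the same information, so one of the two would be a candidate for removal before PCA.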
After several attempts to obtain a good classification of the compounds (separation between the anti-schistosomiasis active and inactive compounds), the best separation was obtained with 3 variables (see Table 2) out of the 15 we had initially. This suggests that the other 12 variables are not important for classifying these compounds according to their anti-schistosomiasis activity. Table 3 shows the correlation matrix for the three variables used for the separation between active and inactive compounds.
The PCA results show that the first two principal components (PC1 and PC2) describe 84.63% of the overall variance: PC1 = 67.11% and PC2 = 17.52%. Since most of the variance is explained by the first two PCs, their score plot is a reliable representation of the spatial distribution of the points for the data set studied.
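How the explained-variance percentages, loadings and scores arise can be illustrated with a small self-contained sketch. It runs PCA on autoscaled data using power iteration with deflation, a simple eigenvalue technique swapped in here so that the example needs no external library; it is not the MATLAB routine used in this work. The three-column data set is hypothetical, and all function names are illustrative.

```python
from statistics import mean, stdev

def autoscale(rows):
    """Autoscale each column: zero mean, unit sample standard deviation."""
    cols = list(zip(*rows))
    scaled = [[(x - mean(c)) / stdev(c) for x in c] for c in cols]
    return [list(r) for r in zip(*scaled)]

def correlation_matrix(z):
    """Covariance matrix of autoscaled data, i.e. the correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(a, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    p = len(a)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v

def pca(rows, n_components=2):
    """Return explained-variance fraction, loadings and scores for each PC."""
    z = autoscale(rows)
    a = correlation_matrix(z)
    p = len(a)
    total = sum(a[i][i] for i in range(p))  # trace = total variance
    components = []
    for _ in range(n_components):
        lam, v = leading_eigenpair(a)
        scores = [sum(r[j] * v[j] for j in range(p)) for r in z]
        components.append({"explained": lam / total, "loadings": v, "scores": scores})
        # Deflation: subtract the component just found before searching for the next.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return components

# Hypothetical (MR, HE, Q19) values for six compounds; NOT the data of Table 2.
data = [
    [88.1, -6.2, 0.10], [92.4, -9.8, 0.25], [97.0, -5.1, 0.05],
    [101.3, -8.7, 0.30], [95.2, -7.4, 0.18], [90.0, -6.0, 0.12],
]
pcs = pca(data)
for k, pc in enumerate(pcs, start=1):
    print(f"PC{k}: {100 * pc['explained']:.2f}% of the variance")
```

The loadings play the role of the coefficients in Table 4, and plotting the PC1 scores against the PC2 scores gives a score plot of the kind shown in Figure 3.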
The score plots were examined and the most informative one, PC1 against PC2, is presented in Figure 3. Table 4 shows the loading vectors for PC1 and PC2. Looking at Figure 3 we can see that the eighteen neolignan compounds studied are separated into two groups, A and B. Group A contains the active compounds (5, 6, 8, 9 and 17) and group B contains the inactive compounds (1, 2, 3, 4, 7, 10, 11, 12, 13, 14, 15, 16 and 18). Also from Figure 3, we can see that PC1 alone is responsible for the separation between the active and inactive compounds. Figure 4 displays the plot of the loading vectors for the first two principal components (PC1 and PC2). According to Table 4, PC1 is a linear combination of MR, Q19 and HE. From this PC1 equation, we can see that active molecules can be obtained when we have higher values for MR combined with more positive charges on C19 and lower values for the variable HE (notice that HE is negative in the PC1 equation). These characteristics can be useful in the design of new neolignan compounds with effective anti-schistosomiasis activity. It is interesting to mention that the variables Q19, HE and MR are all electronic descriptors and represent the strength of a molecular association by electrostatic interaction.

Hierarchical cluster analysis (HCA)
Another technique widely used for analyzing complex data is hierarchical cluster analysis (HCA). In HCA, each object (here, each of the 18 molecules studied) is initially assumed to be a lone cluster. A similarity matrix is built, generally by calculating the Euclidean distance between all the objects, and scanned for the smallest values. The corresponding objects are clustered together and treated as a single cluster. Successive iterations lead to the total clustering of all objects, generating a dendrogram with the objects clustered together according to their similarity level.
Figure 5 shows the results obtained from the HCA analysis. The horizontal lines in Figure 5 represent the compounds and the vertical lines the similarity values between pairs of compounds, between a compound and a group of compounds, and between groups of compounds. The similarity value between the two classes of compounds was 0.15, which means these two classes are distinct. From Figure 5, we can see that the HCA results are very similar to those obtained with the PCA analysis, i.e. the compounds studied were grouped into two categories: actives (5, 6, 8, 9 and 17) and inactives (1, 2, 3, 4, 7, 10, 11, 12, 13, 14, 15, 16 and 18).
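The agglomerative procedure behind a dendrogram such as Figure 5 can be sketched in a few lines. Two points are assumptions, because the text does not specify them: single linkage is used as the rule for the distance between clusters, and the similarity index is taken as 1 - d/d_max, a common convention. The five points are hypothetical autoscaled coordinates, not the compounds of this study.

```python
from itertools import combinations
from math import dist

def hca_single_linkage(points):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, similarity)."""
    d_max = max(dist(p, q) for p, q in combinations(points, 2))
    clusters = [(i,) for i in range(len(points))]  # every object starts as a lone cluster

    def linkage(pair):
        # Single linkage (assumed): distance between the two closest members.
        a, b = pair
        return min(dist(points[i], points[j]) for i in a for j in b)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        similarity = 1.0 - linkage((a, b)) / d_max  # assumed similarity index
        merges.append((a, b, similarity))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges

# Hypothetical autoscaled coordinates: objects 0-2 form one group, 3-4 another.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (3.0, 3.0), (3.1, 3.2)]
merges = hca_single_linkage(points)
for a, b, s in merges:
    print(a, b, round(s, 3))
```

The last merge joins the two groups at a low similarity value, which is the situation described above for the active and inactive classes.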

Discriminant analysis (DA)
Stepwise discriminant analysis is a linear discriminant method based on the Fisher test (F-test) for the significance of the variables. 22 In each step, one variable is selected on the basis of its significance. After two steps, the two most significant variables were extracted from the fifteen variables under investigation: MR and HE. The discriminant functions are as follows:
Group A: -76.67 + 1.16 MR - 4.71 HE
Group B: -45.44 + 0.93 MR - 3.22 HE
The variables MR and HE represent the strength of a molecular association by electrostatic interaction. Using the discriminant functions above, we obtain the classification summary shown in Table 5.
The classification error rate was 0%, resulting in a satisfactory separation of the two groups. The allocation rule derived from the DA results, when the anti-schistosomiasis activity of a new neolignan compound is investigated, is: (a) calculate, for the new neolignan compound, the values of the two most important variables obtained with the DA methodology (MR and HE); (b) substitute these values in the two discriminant functions obtained in this work; (c) check which discriminant function (Group A, anti-schistosomiasis active compounds, or Group B, anti-schistosomiasis inactive compounds) presents the higher value. The new neolignan compound is classified as active if the higher value corresponds to the discriminant function of Group A, and as inactive otherwise.
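Steps (a) to (c) of the allocation rule translate directly into code. The coefficients below are those of the two discriminant functions given in the text; the function names and the (MR, HE) input values in the usage lines are hypothetical and serve only to exercise the rule.

```python
def discriminant_scores(mr, he):
    """Evaluate the two discriminant functions (MR in Å^3, HE in kcal/mol)."""
    group_a = -76.67 + 1.16 * mr - 4.71 * he  # Group A: active compounds
    group_b = -45.44 + 0.93 * mr - 3.22 * he  # Group B: inactive compounds
    return group_a, group_b

def classify(mr, he):
    """Assign the compound to the group whose discriminant function is higher."""
    group_a, group_b = discriminant_scores(mr, he)
    return "active" if group_a > group_b else "inactive"

# Hypothetical descriptor values, chosen only for illustration.
print(classify(100.0, -10.0))
print(classify(80.0, -5.0))
```

Subtracting the two functions shows that a compound is allocated to Group A when 0.23 MR - 1.49 HE exceeds 31.23, which agrees with the PCA finding that higher MR and lower (more negative) HE favor activity.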
Comparing the results obtained with the DA and PCA methodologies, we notice that the variables MR and HE are important in both. Recalling the PC1 equation, one sees that MR and HE are the two variables with the highest weights. Thus, combining the results obtained with DA and PCA, we can say that MR and HE are key variables for explaining the anti-schistosomiasis activity of the neolignan compounds studied here, and that Q19 is also an important variable to be considered when designing neolignan compounds with anti-schistosomiasis activity.
It is interesting to notice that all three variables (MR, HE and Q19) found here to have an important role in anti-schistosomiasis activity are electronic descriptors; therefore we can conclude that electronic properties have a very important role in the anti-schistosomiasis activity of neolignan compounds. In particular, as the descriptors MR, HE and Q19 represent the strength of a molecular association by electronic interaction, it is reasonable to suggest that electrostatic interactions play an important role in the mechanism of the anti-schistosomiasis activity.

Kth nearest neighbor (KNN)
The KNN method classifies a new compound (object) according to its distance to the objects of the training set. The nearest neighbors in the training set are found and the object is assigned to the class that has the majority of its nearest neighbors. This method is self-validating because in the training set each sample (object) is compared with all of the others in the set but not with itself. The best value of K can be chosen based on the results from the training set alone. 23 The classical KNN approach does not have outlier detection capability, i.e. a classification is always made whether or not the unknown object is a member of any class in the training set.
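A minimal sketch of KNN classification with the self-validation described above (each training object is classified by all the others, never by itself) is given below. The two-dimensional coordinates and labels are hypothetical, Euclidean distance is assumed, and the function names are illustrative.

```python
from collections import Counter
from math import dist

def knn_predict(train, labels, query, k):
    """Majority vote among the k training objects closest to `query`."""
    order = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(train, labels, k):
    """Classify each training object using all the others, never itself."""
    hits = 0
    for i, (x, y) in enumerate(zip(train, labels)):
        rest = train[:i] + train[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += knn_predict(rest, rest_labels, x, k) == y
    return hits / len(train)

# Hypothetical autoscaled coordinates for a training set (NOT the paper's data).
train = [(2.0, 0.1), (2.2, -0.3), (1.8, 0.4), (2.5, 0.0), (1.9, -0.2),
         (-1.0, 0.2), (-1.2, -0.1), (-0.8, 0.5), (-1.5, 0.0), (-0.9, -0.4), (-1.1, 0.3)]
labels = ["active"] * 5 + ["inactive"] * 6

for k in (1, 3, 5):
    print(k, leave_one_out_accuracy(train, labels, k))
print(knn_predict(train, labels, (2.1, 0.0), 5))
```

Odd values of K are used so that a two-class vote cannot end in a tie.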
This method was used for the validation of the initial data set, and Table 6 presents the results obtained with 1, 3 and 5 nearest neighbors. For 1 nearest neighbor (1NN), the percentage of correct classifications was 100%, while for 3 and 5 nearest neighbors (3NN and 5NN) it was 94.4%. We decided to use 5NN instead of 1NN because the percentage of correct classifications is still high (94.4%) and the greater the number of nearest neighbors, the better the reliability of the KNN method. Since the experimental group we are working with had proposed ten new neolignan structures not yet synthesized (see Figure 6), we decided to apply our PCA, HCA and DA results, along with the KNN method, to these ten new neolignan compounds with the aim of predicting the anti-schistosomiasis activity of this new set of molecules (test set). Table 7 shows the values calculated for the test set with the variables HE, Q19 and MR.
The predictions obtained with the PCA, HCA, DA and KNN methods for the molecules shown in Figure 6 were similar. Taken together, these four methods classify compounds 3T, 5T, 7T, 8T and 9T as active against schistosomiasis and compounds 1T, 2T, 4T, 6T and 10T as inactive.
It is interesting to notice that compounds 3T and 8T were classified as active molecules with PCA, DA and KNN and as inactive molecules with HCA, while compound 10T was classified as an inactive molecule with PCA, DA and KNN and as an active molecule with HCA. However, compounds 3T and 8T have a higher probability of being active and compound 10T a higher probability of being inactive, since three of the four methods used in our prediction study gave the same result for compounds 3T, 8T and 10T. The prediction results obtained with the four methods are summarized in Table 8.

Conclusions
The principal component analysis (PCA) and hierarchical cluster analysis (HCA) show that the neolignan derivative compounds studied here can be correctly classified into two groups (A and B) according to their anti-schistosomiasis activity. The PCA results show that the variables MR, HE and Q19 are responsible for the separation between active and inactive compounds.
The discriminant analysis (DA) shows that the two groups A (active compounds) and B (inactive compounds) are well separated and that only two variables, MR and HE, are responsible for the separation between the active and inactive compounds. The classification error rate was 0%, and the DA results provide an allocation rule to classify new neolignan compounds as active or inactive against schistosomiasis.
Since MR, HE and Q19 are all electronic descriptors, we conclude that electronic properties have an important role in the anti-schistosomiasis activity of neolignan compounds. In particular, as the descriptors MR, HE and Q19 represent the strength of a molecular association by electronic interaction, it is reasonable to suggest that electrostatic interactions play an important role in the mechanism of the anti-schistosomiasis activity.
The PCA, HCA, DA and KNN methods were applied to ten new neolignan derivative compounds, and five of them were classified as active against schistosomiasis.

Figure 1. The central chemical structure and numbering used in all neolignan compounds studied. The letters t and d are, respectively, the dihedral and bond angles assessed in the calculations.

Figure 2. The chemical structure and anti-schistosomiasis activity indication of the eighteen synthetic neolignan compounds studied.

Figure 3. PCA scores (PC1 and PC2) for the eighteen compounds with anti-schistosomiasis activity. The PCA methodology leads to a separation between two groups: active (Group A) and inactive (Group B).
Figure 4. Plot of the loading vectors for the first two principal components (PC1 and PC2).

Figure 5. Dendrogram obtained for the eighteen neolignan compounds studied. The compounds are grouped into two categories: A (actives) and B (inactives).

Table 2. Values of the three most important properties (variables) that classify the eighteen neolignan compounds studied. * 1 cal = 4.18 J

Table 3. Correlation matrix between the three most important properties (variables).

Table 1. Correlation matrix between all properties (variables) calculated.

Table 4. Loadings of the two principal components.

Figure 6. The test set (the ten new neolignan compounds).
Table 6. Classification obtained with the KNN method.

Table 7. Values obtained for the properties (variables) of the ten new neolignan compounds (test set).

Table 8. The prediction results obtained with the four pattern recognition methods for the 10 new compounds: active compound (+) and inactive compound (-).