Chalcones and N-acylhydrazones: direct analogues? Exploratory data analysis applied to potential novel antileishmanial agents

Leishmaniasis is an important health and social problem for which there is limited effective therapy. Chalcones and N-acylhydrazones have been studied as promising antileishmanial agents in enzymatic inhibition and in vitro assays. Since these chemical classes of compounds also resemble each other structurally, it would be useful to investigate whether they share direct analogy. Exploratory data analysis was applied to a library of chalcones and nitrated N-acylhydrazones assayed against Leishmania donovani to investigate their similarity. Under the conditions applied in the present study, the two classes did not present functional or structural analogy.


INTRODUCTION
Leishmaniasis is a neglected disease that affects millions of individuals worldwide, causing significant morbidity and mortality, particularly in developing countries. Available drug therapy, however, is limited and frequently inappropriate (Chung et al., 2008). Of the 1393 new drugs marketed between 1975 and 1999, only 13 were targeted at tropical diseases (Trouiller et al., 2002). Moreover, the effective drugs are toxic and often require parenteral administration over long treatment courses, increasing the chances of failure due to the emergence of resistance (Croft, 2005). This scenario emphasizes the importance of searching for entirely new active compounds.
Screening of natural products offers the promise of discovering new molecules with unique structures and both high activity and selectivity (Kayser et al., 2003). Chalcones are natural 1,3-diarylpropenones that exhibit a broad spectrum of potential applications, including antiprotozoal activity (Nielsen et al., 1998; Liu et al., 2001). This activity is likely derived from a cysteine protease inhibition mechanism. Cysteine proteases are key enzymes in many parasitic biochemical pathways and therefore constitute potential targets in the search for drugs against several tropical infectious diseases (McKerrow et al., 1999; McKerrow, 1999). For these reasons, sets of chalcone analogues have also been tested as potential cysteine protease inhibitors.
Studies comparing the inhibitory effect of N-acylhydrazones and chalcones on well-known parasitic cysteine proteases, such as cruzain, falcipain, and trypanopain, have been conducted (Li et al., 1995, 1996; Troeberg et al., 2000). Both classes of compounds present IC50 values in the micromolar range, suggesting that they could serve as prototypes in the development of new inhibitors. Since these chemical classes also resemble each other structurally, it would be interesting to verify whether they share direct analogy.
Direct analogues are defined as chemical entities which present, simultaneously, chemical and pharmacological similarities (Wermuth, 2006).
Structural and functional analogy between chalcones and N-acylhydrazones would bring some advantages in the search for new antileishmanial agents. 3D- and 4D-QSAR studies applied to both chemical classes could provide information on ligand-receptor interactions and a common pharmacophore. Those findings could be useful for designing new, more potent compounds that combine the significant molecular features of both chalcones and N-acylhydrazones.
Several chemometric methods have broadened the arsenal of tools that can be applied to QSAR studies. Among them are the exploratory data analysis methods Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA) (Beebe et al., 1998).
HCA is an important multivariate method whose primary purpose is to reveal the clusters and patterns in the investigated data, displaying them as a dendrogram, which allows the samples or variables to be visualized in a 2D space. The distances between samples or variables are calculated and transformed into a similarity matrix whose elements correspond to the similarity indexes (Ferreira, 2002).
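As a minimal sketch of how such a similarity matrix can be obtained, the snippet below uses Euclidean distances and the index s_ij = 1 - d_ij/d_max, where d_max is the largest distance in the data set. This definition is assumed here as a common convention in chemometrics software, not taken from the text; the function name `similarity_matrix` and the toy data are hypothetical.

```python
import numpy as np

def similarity_matrix(X):
    """Similarity indexes s_ij = 1 - d_ij / d_max (hypothetical helper).

    Rows of X are samples (e.g. compounds); columns are descriptors.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]      # all pairwise differences
    d = np.sqrt((diff ** 2).sum(axis=-1))     # Euclidean distance matrix
    return 1.0 - d / d.max()                  # 1 = identical, 0 = most distant pair

# three hypothetical samples described by two variables
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
S = similarity_matrix(X)
```

Identical samples score 1 and the most distant pair in the set scores 0, so the index is always relative to the spread of the whole data set.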
PCA, however, is a data compression multivariate method based on the correlation among variables. Its objective is to group correlated original independent variables, or descriptors, and replace them with a new set of variables, called principal components (PCs), onto which the data are projected. The PCs are mutually uncorrelated and are built as linear combinations of the original variables. Furthermore, the PCs contain most of the variability of the data set within a much smaller dimensional space (Ferreira, 2002).
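The construction of PCs can be illustrated with a short NumPy sketch based on the singular value decomposition (SVD) of the column-centred data matrix. It is a generic PCA, not the Pirouette implementation used in the study; in the study the data were also autoscaled beforehand, a step omitted here for brevity. The function `pca` and the synthetic data are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Generic PCA sketch: centre the columns, then take the SVD.

    Returns scores (samples projected onto the PCs), loadings (one row per PC)
    and the fraction of total variance explained by each retained PC.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # column-centred data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)           # variance fraction per PC
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]                  # PCs as linear combinations
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(0)
t = rng.normal(size=50)
# two strongly correlated variables plus one independent variable
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=50), rng.normal(size=50)])
scores, loadings, explained = pca(X, n_components=3)
```

Here the two correlated columns collapse onto essentially one PC, which is the compression effect described above; the scores on different PCs are uncorrelated by construction.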
Thus, in this study, an exploratory data analysis using the above-mentioned methods was applied to a set of ninety-four chalcones and nitrated N-acylhydrazones to verify whether they indeed share direct analogy. This is fundamental for the subsequent construction of 3D-QSAR models, for explaining their antileishmanial behavior, and for finding a common pharmacophore in order to design new antiparasitic agents.

METHODOLOGY
A complete series of ninety-four compounds was investigated; the compounds are listed in Figure 1. The set encompasses forty-nine 5-nitroheterocyclic benzhydrazides, synthesized by Rando et al. (2008), and forty-five chalcone derivatives selected from the work of Nielsen et al. (1998).
Biological activity was evaluated as the in vitro concentration of each compound capable of inhibiting 50 percent of Leishmania donovani promastigote proliferation (IC50, µM) (Figure 1). The IC50 values were expressed in negative logarithmic units, pIC50 (-log IC50), and constitute the dependent variable.
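A minimal sketch of the conversion, assuming the µM values are first converted to mol/L (the text does not state which unit enters the logarithm). Because a change of unit only shifts every pIC50 by the same constant, the choice would not affect the autoscaled analyses. The function name is hypothetical.

```python
import math

def pic50_from_micromolar(ic50_um):
    """pIC50 = -log10(IC50), with IC50 converted from µM to mol/L first (assumption)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_um * 1e-6)

# a compound with IC50 = 10 µM
p = pic50_from_micromolar(10.0)
```

Higher pIC50 therefore means a more potent compound: a tenfold lower IC50 raises pIC50 by one unit.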
The three-dimensional structures of each of the ninety-four compounds, or ligands, in their neutral forms, were constructed with the HyperChem 7 software (HyperChem, 2002). The crystal structures of nitrofurazone and 2'-hydroxy-4''-dimethylamino-chalcone were retrieved from the Brookhaven Protein Data Bank (PDB entry code 1yki, resolution 1.70 Å; Race et al., 2005) and from file ob1067.cif (resolution 1.01 Å; Liu et al., 2002), respectively, and were used as geometry references in building all ligands. Crystallographic information for a nitrothiophene 2-carbaldehyde was retrieved from file lh6379.cif (resolution 1.18 Å; McBurney et al., 2005) and was also used as a reference, particularly in constructing the nitrothiophene ring moiety.
Energy minimization was carried out with the HyperChem 7 MM+ force field without any restraints. The MOLSIM 3.2 program (Doherty, 1997) was also used for geometry optimization of each structure. Partial atomic charges were calculated using the PM3 semi-empirical method (Stewart, 1989), also implemented in HyperChem.
The structures modeled as described above were used as the starting structures for the molecular dynamics (MD) simulations (Van Gunsteren, Berendsen, 1990), which were employed to generate the conformational ensemble profile of each ligand (MOLSIM 3.2).
The MD simulation protocol comprised 100,000 steps with a step size of 0.001 ps (1 fs), i.e., 100 ps of simulation, at a temperature of 300 K. Trajectory frames were recorded every 20 simulation steps, yielding 5,000 conformations for each molecule. The lowest-energy conformation from the MD simulation was selected for each ligand, and electrostatic potential-derived partial atomic charges (ChelpG) were computed at the HF/6-31G* level, as implemented in the Gaussian 03W program (Gaussian, 2003).
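The sampling arithmetic and the selection of the lowest-energy frame can be sketched as follows. The per-frame energies are random placeholders standing in for the MOLSIM trajectory output, whose format is not described here; the helper name is hypothetical.

```python
import numpy as np

N_STEPS = 100_000   # MD steps
DT_PS = 0.001       # step size in ps (1 fs)
SAVE_EVERY = 20     # one trajectory frame written every 20 steps

n_frames = N_STEPS // SAVE_EVERY    # conformations stored per ligand
total_time_ps = N_STEPS * DT_PS     # simulated time in ps

def lowest_energy_frame(energies):
    """Index of the lowest-energy conformation in a trajectory (hypothetical helper)."""
    return int(np.argmin(energies))

# placeholder per-frame total energies (arbitrary units) for one ligand
rng = np.random.default_rng(1)
energies = rng.normal(loc=50.0, scale=5.0, size=n_frames)
best = lowest_energy_frame(energies)
```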
HCA and PCA (Beebe et al., 1998) were carried out with the Pirouette 3.11 program (Pirouette, 2003). Autoscaling was applied as the preprocessing method; that is, the mean of its column was subtracted from each element of the data matrix, and the result was divided by the standard deviation of that column. After screening different linkage methods, HCA was performed with the centroid linkage method. PCA was run with up to seven factors.
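The preprocessing and clustering steps can be sketched in NumPy as below. This is a naive, illustrative implementation, not Pirouette's: it assumes the sample standard deviation (ddof=1) and Euclidean distances, since the exact conventions are not given in the text, and both function names and the toy data are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Subtract each column's mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def centroid_hca(X):
    """Naive agglomerative clustering with centroid linkage.

    Returns a list of merges (members_a, members_b, distance), where the
    distance is the Euclidean distance between the two cluster centroids,
    each centroid being the mean of the cluster's original samples.
    """
    X = np.asarray(X, dtype=float)
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                ca = X[clusters[a]].mean(axis=0)
                cb = X[clusters[b]].mean(axis=0)
                d = float(np.linalg.norm(ca - cb))
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)   # fuse b into a
    return merges

# hypothetical descriptors on very different scales for four compounds
X = [[1.0, 100.0], [1.1, 110.0], [5.0, 500.0], [5.2, 510.0]]
Z = autoscale(X)
merges = centroid_hca(Z)
```

Without autoscaling, the second column would dominate every distance simply because of its units; after autoscaling both descriptors weigh equally. SciPy's `scipy.cluster.hierarchy.linkage(..., method="centroid")` provides an optimized equivalent of the clustering step.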

RESULTS AND DISCUSSION
Table I shows the complete list of independent variables calculated for the compounds together with their descriptions and other relevant information.
The thermodynamic descriptors obtained for the lowest-energy conformation of each ligand from the MD simulation included the following energy contributions: stretching energy (ELstretch), bending energy (ELbend), torsional energy (ELtors), Lennard-Jones or 1,4 interaction energy (EL1,4), electrostatic energy (ELel), van der Waals energy (ELvdW), hydrogen bonding energy (ELHb), and solvation energy (ELsolv). The total energy of each ligand (ELtot) corresponds to the sum of all these energy contributions.
The relationship between the biological data (pIC50 values) and the calculated independent variables, or descriptors, was visualized through the respective scatter plots. This is an important step in exploratory data analysis for acquiring reliable information about the behavior of the data set. Correlation coefficients were not employed as a cutoff criterion because nonlinear relationships, besides linear ones, must also be considered in similarity studies.
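The point about nonlinear relationships can be illustrated with a synthetic example: for a monotonic but strongly curved trend, Pearson's r stays well below 1 while Spearman's rank correlation is 1. This is only an illustration of why a linear-correlation cutoff can discard informative descriptors; the study itself relied on visual inspection of the scatter plots, and relationships that are not monotonic would escape both coefficients. The data are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson (linear) correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks (no ties assumed)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return pearson(rx, ry)

x = np.linspace(0.0, 5.0, 30)
y = np.exp(x)          # monotonic but strongly non-linear trend
r_lin = pearson(x, y)
rho = spearman(x, y)
```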
The D4 parameter, for instance, describes the total distance between the two aromatic rings in both subsets of compounds. The D4 values found for the nitrated N-acylhydrazones and the chalcone derivatives ranged from 4.80 to 4.90 Å and from 4.90 to 5.20 Å, respectively. These ranges suggest that, although the intermediate chain of the N-acylhydrazones has four atoms, their 3D models are comparable to those of the chalcone derivatives, whose intermediate chain has only three atoms.
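The sketch below assumes that D4 is measured between the geometric centres (centroids) of the two aromatic rings; the text only speaks of the "total distance" between the rings, so this definition is an assumption. The idealized hexagonal rings are hypothetical stand-ins for the coordinates of the modeled ligands.

```python
import numpy as np

def ring_centroid(coords):
    """Geometric centre of the atoms of one aromatic ring (Å)."""
    return np.asarray(coords, dtype=float).mean(axis=0)

def centroid_distance(ring_a, ring_b):
    """Euclidean distance between two ring centroids (Å)."""
    return float(np.linalg.norm(ring_centroid(ring_a) - ring_centroid(ring_b)))

# hypothetical planar benzene-like rings (C-C = 1.39 Å) with centres 5.0 Å apart
angles = np.deg2rad(np.arange(0, 360, 60))
hexagon = np.column_stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)])
ring_1 = hexagon
ring_2 = hexagon + np.array([5.0, 0.0, 0.0])
d4 = centroid_distance(ring_1, ring_2)
```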
However, the scatter plot of pIC50 versus D4 indicates that this structural similarity does not translate into functional similarity (Figure 2). The nitrated N-acylhydrazones present a narrow range of D4 values but a wide range of pIC50 values, which indicates that factors other than D4 govern the biological action of these compounds. Similar behavior was observed for the other distances and angles measured, particularly the D2, A1, and A2 descriptors (Figure 2 and Table II).
Visual inspection of the scatter plots of biological activity versus each calculated descriptor was also used as a filter for selecting the independent variables for HCA and PCA. Based on their scatter-plot behavior, the independent variables can be classified as shown in Table III.
Among the forty-seven calculated descriptors, thirteen (Ehb, EHOMO, ChelpG 4, D1, D2, D3, A1, A2, A3, HDonor, Dreiding, Hind 1, Hind 2) were practically constant and did not vary significantly with the pIC50 values. Moreover, six parameters (ELstretch, ELUMO, GAP, CLogPHyper, D4, PSA) formed isolated groups in their scatter plots instead of showing a regular dispersion, and were also excluded from further analysis. Thus, twenty-eight descriptors, which presented good or moderate correlation with biological activity, were employed in the exploratory data analysis (Table III). Volume and atomic mass were excluded from the analysis because they were not considered informative.
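The "practically constant" criterion was applied by visual inspection. As a hypothetical numerical proxy, not the authors' procedure, a descriptor could be flagged when its relative standard deviation falls below an arbitrary threshold, as sketched here. The threshold, function name, descriptor names, and data are all assumptions, and the ratio is only meaningful for descriptors measured on a scale with a natural zero.

```python
import numpy as np

def near_constant(X, names, rel_tol=0.01):
    """Flag descriptors whose relative spread (std / |mean|) is below rel_tol.

    rel_tol is an arbitrary, hypothetical threshold; columns with zero mean
    fall back to their absolute standard deviation.
    """
    X = np.asarray(X, dtype=float)
    mean_abs = np.abs(X.mean(axis=0))
    std = X.std(axis=0, ddof=1)
    safe_mean = np.where(mean_abs > 0, mean_abs, 1.0)   # avoid division by zero
    rel = std / safe_mean
    return [name for name, r in zip(names, rel) if r < rel_tol]

# hypothetical values of two descriptors for four compounds
X = [[4.85, 1.0],
     [4.86, 3.0],
     [4.85, 2.0],
     [4.86, 5.0]]
flagged = near_constant(X, ["dist_like", "varying"])
```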
Preliminary HCA grouped chalcones and N-acylhydrazones into two main sub-clusters (Figure 3): I, which comprises exclusively nitrated N-acylhydrazones, and II, which comprises only chalcones. The similarity index between sub-clusters I and II is 0.617 (70 selected descendants), suggesting low similarity. Additionally, sub-cluster III is composed of chalcone derivatives that differ significantly from those grouped in sub-cluster II. Moreover, in the PCA 3D scores plot, the N-acylhydrazones are distributed mainly along PC2 and PC3, whereas the chalcones are spread over all three factors (PC1, PC2, and PC3), indicating heterogeneous behavior among the compounds of this subset.
The PCA scores plot (Figure 3) also reveals three chalcone derivatives (n-53, n-72, and n-77) that possibly differ from the others. These three compounds could be acting as outliers, interfering with the overall behavior of the data set and distorting the analysis. A 2D outlier diagnostic plot was therefore constructed to test this hypothesis (Figure 4).
In the outlier diagnostic plot (sample residual versus Mahalanobis distance), samples falling outside one or both thresholds are potential outliers. The sample residual threshold is based on a 95% probability limit (Pirouette, 2006); thus, 5% of normal samples would be expected to fall outside this cutoff. For this reason, samples slightly exceeding only one threshold may be normal, whereas samples lying either significantly beyond one threshold or beyond both are more likely to be outliers. Accordingly, only compound n-53 can be considered an outlier, although compound 32/S can also be identified as a potential outlier. HCA was then repeated after excluding compounds n-53 and 32/S, first individually and then together, and the similarity indexes between the previous sub-clusters I and II were computed and evaluated. When compound 32/S alone was eliminated, the similarity index obtained (0.618) was practically identical to that obtained before the exclusions (0.617). In contrast, when only compound n-53 was excluded, the similarity index decreased from 0.617 to 0.464, indicating that n-53 significantly affects the behavior of the chalcone derivatives. A similar result was found when both n-53 and 32/S were excluded (0.460). This finding suggests that compound n-53 has a greater impact on the behavior of the overall set than compound 32/S, and it should be considered the only outlier.
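A sketch of the two diagnostics, computed from a PCA model in the generic way: the Mahalanobis distance of each sample in score space and a per-sample residual from the variation left unexplained by the retained PCs. Pirouette's exact threshold formulas are not reproduced; the empirical 95th-percentile cutoffs, the function names, and the synthetic data are hypothetical simplifications.

```python
import numpy as np

def pca_outlier_diagnostics(X, k):
    """Mahalanobis distance (score space) and sample residual for each row.

    X is assumed to be autoscaled already; k is the number of PCs retained.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :k] * s[:k]                          # scores
    P = Vt[:k]                                    # loadings
    score_var = T.var(axis=0, ddof=1)
    mahal = np.sqrt(np.sum(T ** 2 / score_var, axis=1))
    E = Xc - T @ P                                # part not explained by k PCs
    residual = np.sqrt(np.sum(E ** 2, axis=1) / (X.shape[1] - k))
    return mahal, residual

def flag_outliers(mahal, residual, q=95):
    """Hypothetical empirical cutoffs: the q-th percentile of each statistic."""
    beyond_m = mahal > np.percentile(mahal, q)
    beyond_r = residual > np.percentile(residual, q)
    return beyond_m & beyond_r, beyond_m | beyond_r   # (both, either)

rng = np.random.default_rng(42)
X = rng.normal(size=(40, 6))
X[0] += 8.0                                        # one sample shifted far from the rest
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscaling
mahal, residual = pca_outlier_diagnostics(X, k=2)
both, either = flag_outliers(mahal, residual)
```

A sample exceeding only one cutoff is not automatically an outlier; note also that a single extreme sample can pull a principal component toward itself, which lowers its own residual while raising its Mahalanobis distance.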
The HCA and PCA procedures were then optimized by removing only compound n-53. The HCA results are shown in Figure 5. The new dendrogram reveals two major clusters: the first is composed of sub-clusters I, II, III, IV, V, and VI, whereas the second, called cluster VII, is distinct from the rest of the investigated data. Sub-cluster II corresponds to the main group of N-acylhydrazones, while sub-cluster III represents the main group of chalcones. As mentioned above, the similarity index between II and III was 0.464 (Figure 5).
Sub-cluster I consists of seven N-acylhydrazones containing more than one nitro group in their molecular structures (9/O, 9/S, 17/S, 23/O, 23/S, 24/O, and 24/S). Sub-clusters IV and V comprise chalcone derivatives that differ substantially from the chalcones of sub-cluster III. Finally, the dendrogram also revealed that compounds 32/O and 32/S share no similarity with either the chalcones or the other nitrated N-acylhydrazones, forming a distinct sub-cluster (VI). These compounds possess a sulfonamide group in their molecular structures. This chemical group can confer quite different physicochemical features, such as greater hydrophilicity and ionization potential, which could be responsible for the distinct behavior of these ligands compared with the rest of the investigated set.
It is noteworthy that cluster VII is composed of a single ligand, n-65, which shares no similarity with the rest of the investigated set. Among the calculated parameters, this derivative differs significantly from the others in its torsional energy contribution (ELtors) (see Table A, Supplementary Material).
PCA performed without compound n-53 further clarified the HCA results. The PCA procedure was run up to a maximum of seven PCs because computing additional PCs left the total explained variance practically unchanged.
As shown in Table IV, the first three PCs explained 62% of the total variance. The relative importance of each descriptor for each factor or PC (the loadings) is also listed in this table.
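The bookkeeping behind choosing the number of PCs and reading the loadings can be sketched as follows. The eigenvalues (PC variances), loadings, and descriptor names (`d1` to `d4`) are hypothetical and do not reproduce Table IV; the 95% target is likewise an arbitrary example, whereas the study stopped where extra PCs left the explained variance practically unchanged.

```python
import numpy as np

def cumulative_variance(eigenvalues):
    """Cumulative fraction of total variance explained by the first k PCs."""
    ev = np.asarray(eigenvalues, dtype=float)
    return np.cumsum(ev) / ev.sum()

def top_descriptors(loadings, names, pc, n=2):
    """Names of the n descriptors with the largest |loading| on PC index `pc` (0-based)."""
    order = np.argsort(-np.abs(np.asarray(loadings, dtype=float)[pc]))
    return [names[i] for i in order[:n]]

# hypothetical PC variances (eigenvalues), largest first
eigenvalues = [5.0, 2.0, 1.0, 0.5, 0.3, 0.1, 0.05, 0.05]
cum = cumulative_variance(eigenvalues)
n_pcs = int(np.searchsorted(cum, 0.95) + 1)   # smallest k reaching 95%

# hypothetical loadings of four descriptors on the first two PCs
names = ["d1", "d2", "d3", "d4"]
loadings = [[0.70, -0.10, 0.65, 0.20],    # PC1
            [0.05, 0.90, -0.10, 0.40]]    # PC2
top_pc1 = top_descriptors(loadings, names, pc=0)
```

Descriptors with large absolute loadings on the PCs along which a subset spreads are the ones that discriminate its members, which is the reasoning applied to the loadings in the discussion that follows.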
The 3D scores plot confirmed the division of the investigated set into two main separate subsets: N-acylhydrazones and chalcones. This separation, which can be visualized in Figure 5, is much more evident after the outlier exclusion, corroborating the substantial impact of compound n-53 on the behavior of the set. Moreover, this partitioning of the investigated set also indicates that chalcones and N-acylhydrazones do not share direct analogy or, in other words, do not show both structural and functional similarity, at least with regard to the methodology applied in this study.
The clusters found with the HCA technique can also be observed in the PCA results. For instance, sub-clusters I, II, and VI and cluster VII (compound n-65) are clearly discernible in view 1 of the 3D scores plot.
A parallel analysis of the loadings plot together with the scores (Table B, Supplementary Material) and loadings (Table IV) revealed that the biological behavior of the nitrated N-acylhydrazones depends chiefly on thermodynamic and electronic descriptors, such as total energy (ELtot), bending energy (ELbend), total dipole moment (µtot), and the number of hydrogen bond acceptor sites (HAccep).
It is important to consider that nitro derivatives can also exert their action through a nitro group reduction pathway (Orna, Mason, 1989), and this could be the mechanism highlighted by the methodology applied here. Non-nitrated N-acylhydrazones could be more suitable for demonstrating a common mechanism of action between the two classes of compounds investigated. However, as previously mentioned, a set of non-nitrated N-acylhydrazones assayed under the same biological conditions as the classes evaluated in this study was not available in the literature.
Nonetheless, if a second mechanism of action similar to that of chalcones were operating, descriptors of some other nature would be expected to be common to both classes.
Chalcones, on the other hand, presented a broader distribution in PC coordinates than the N-acylhydrazones (see View 2, Figure 6). The van der Waals energy (ELvdW) and electronic descriptors, such as the dipole moment along the X axis (µx) and the electrostatic potential-derived partial atomic charges on the carbonyl carbon (ChelpG 1) and oxygen (ChelpG 2), are related to chalcone biological behavior. The latter descriptors could indicate the importance of the carbonyl group to chalcone action and also support the cysteine protease inhibition mechanism.
Interestingly, sub-cluster V of chalcones differs from the other chalcone derivatives in descriptors related to molecular volume, such as van der Waals surface area (SAVDW), polarizability (α), and molar refractivity (MR). All compounds in this group have larger or more branched side chains (Figure 1). In fact, the Connolly surfaces of these compounds may show volume differences compared with the unsubstituted chalcone analogues (sub-cluster III) (Figure 7).

FIGURE 3 - Preliminary PCA and HCA of the investigated set of compounds.

FIGURE 7 - Electrostatic potential maps calculated on the Connolly surfaces of the chalcone derivatives from sub-clusters III and V. Negative electrostatic potential regions (high electron density) are shown in red, and positive electrostatic potential regions (low electron density) in dark blue.

FIGURE 6 - Scores and loadings on 3D plots found for the first three factors or PCs.

TABLE I - Descriptors calculated for the set studied


TABLE II - Topological and structural parameters calculated for the investigated compounds

TABLE III - Descriptor classification according to the tendency observed in scatter plots versus pIC50 values

TABLE IV - Principal component analysis run for 93 samples and 25 independent variables. The loadings of the independent variables are listed below.