Core Electron Binding Energy (CEBE) Shifts Applied to Structure Activity Relationship (SAR) Analysis of Neolignans

The core electron binding energy shift (∆CEBE) and the CEBE of carbon atoms, calculated with the semi-empirical HAM/3 method, were used as descriptors in a structure-activity relationship (SAR) study of six neolignans. The results demonstrated the efficiency of this type of descriptor in SAR analyses. Using five selected CEBE values of carbons in the phenyl rings of the neolignans, it was possible to classify the compounds into two categories, active and inactive, using the HCA, PCA, KNN and SIMCA methods.


Introduction
The choice of molecular descriptors is one of the most crucial steps in SAR/QSAR work. Many descriptors have been suggested and employed, and many of them are successful and well accepted. Some descriptors that can be calculated by quantum mechanical methods have been recognized as useful in QSAR. However, there is still room to search for alternative or better descriptors than those in use, especially descriptors that can be evaluated theoretically. One of our objectives is to look for more useful descriptors than those used thus far, employing mainly quantum mechanical and other theoretical and computational methods.
Lindberg et al. showed that core electron binding energies correlate linearly with the Hammett sigma constants (σ) in substituted benzene derivatives.¹ Consider, as an example, electrophilic aromatic substitution at the para position of a monosubstituted benzene Ph-X, where X is a substituent such as OH, NH₂ or NO₂. Lindberg et al.¹ demonstrated the validity of an equation similar to equation 1:
$$\mathrm{CEBE}(\text{Ph-X}) - \mathrm{CEBE}(\text{Ph-H}) \approx \kappa\,\sigma_p^X \qquad (1)$$
The left-hand side of equation 1 is the difference between the core electron binding energy (CEBE) of the para carbon of Ph-X and the CEBE of a carbon atom of benzene, Ph-H, which is the reference molecule. This difference is called the CEBE shift, or ∆CEBE. The right-hand side of equation 1 is the product of a constant κ and the Hammett sigma constant at the para position of Ph-X, σ_p^X. Equation 1 is an approximate equation. Similar equations hold for other carbon atoms in the molecule, such as those at the ortho and meta positions of Ph-X. The linear relationship between ∆CEBE and the Hammett sigma constants is not limited to monosubstituted benzenes; there are corresponding equations for multiply substituted benzenes. Straight lines were obtained by plotting experimentally observed CEBE values of Ph-X against the corresponding Hammett sigma constants σ_X,¹ which demonstrates the validity of equation 1.
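The linear relation ∆CEBE ≈ κσ described above can be illustrated with a short sketch. It assumes a least-squares line forced through the origin, which is a choice made here because the reference molecule Ph-H has σ = 0 and ∆CEBE = 0 by definition; the function name `fit_kappa` is hypothetical, and the numbers are synthetic values generated only to exercise the function, not data from the literature.

```python
def fit_kappa(sigmas, shifts):
    """Least-squares slope of a line through the origin: shift ~ kappa * sigma."""
    if len(sigmas) != len(shifts):
        raise ValueError("sigmas and shifts must have the same length")
    numerator = sum(s * d for s, d in zip(sigmas, shifts))
    denominator = sum(s * s for s in sigmas)
    if denominator == 0:
        raise ValueError("at least one non-zero sigma is required")
    return numerator / denominator


# Synthetic, exactly linear data (NOT measured values): shifts built from kappa = 1.2.
sigmas = [-0.5, 0.0, 0.5, 1.0]
shifts = [1.2 * s for s in sigmas]
kappa = fit_kappa(sigmas, shifts)  # recovers the slope used to build the data
```

With real data, the scatter of the points around the fitted line would indicate how well the approximate relation holds for a given ring position.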
Recently we reconfirmed this linear relationship² by calculating accurate ∆CEBE values of the ring carbons in monosubstituted benzenes (Ph-X) relative to the ring carbon in Ph-H. Density functional theory (DFT) was employed for the calculation of ∆CEBE. A good correlation between the calculated CEBE and the Hammett σ constant³ of the corresponding atom was obtained.² Since the Hammett sigma constant is one of the important descriptors in QSAR analysis,⁴ we can expect that theoretically calculated ∆CEBE can also be a useful descriptor in QSAR. Hammett sigma constants are usually determined experimentally. However, for many drug molecules Hammett sigma constants are not available. The object of the present work is, first, to calculate ∆CEBE for a set of selected molecules whose Hammett sigma constants are not known and, second, to investigate whether or not ∆CEBE is related to the biological activity of the molecules.
We chose six neolignans, three of which are inactive and three of which are active against leishmaniasis. Figure 1 shows the skeleton of the neolignans, and Table 1 lists the six selected molecules and their classes of biological response (active or inactive). They were taken from our previous publication.⁵ All six molecules share the same basic skeleton.

Method of Calculation
We used the molecular geometries calculated previously by the MM2 method.⁶ The semi-empirical HAM/3 (Hydrogenic Atoms in Molecules, version 3)⁷ method was used to calculate the CEBE's of the compounds. As far as we know, HAM/3 is the only semi-empirical method that can calculate the CEBE's of a molecule. Widely used and well-known semi-empirical methods such as AM1 and ZINDO are not capable of calculating them. From our previous experience, the average absolute deviation of CEBE's calculated by HAM/3 is expected to be about 1.50 eV.⁸ This is much larger than the 0.3 eV attained by non-empirical DFT.² The only advantage of HAM/3 is its much higher speed of calculation in comparison with non-empirical DFT. Since a neolignan is a fairly large molecule and the calculation of CEBE has to be done one atom at a time, HAM/3 is the method of choice. We calculated the CEBE's of 14 carbon atoms, C1-C14, in each molecule; these comprise all 12 carbon atoms in the two benzene rings plus the two carbons that bridge the two benzene rings (see Figure 1). The ∆CEBE's were calculated by equation 2, as the difference between the calculated CEBE of each carbon atom C_i and 286.20 eV, which is the CEBE of a carbon atom in an isolated benzene molecule calculated by HAM/3:
$$\Delta\mathrm{CEBE}(\mathrm{C}_i) = \mathrm{CEBE}(\mathrm{C}_i) - \mathrm{CEBE}(\text{Ph-H}), \qquad \mathrm{CEBE}(\text{Ph-H}) = 286.20\ \text{eV} \qquad (2)$$
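The shift calculation described above amounts to subtracting the benzene reference from each calculated CEBE. A minimal sketch follows; the function name `delta_cebe` and the two example CEBE values are hypothetical and serve only as an illustration, while the reference value of 286.20 eV is the one quoted in the text.

```python
# CEBE of a carbon atom in isolated benzene calculated by HAM/3 (value quoted in the text), in eV.
CEBE_BENZENE_HAM3 = 286.20


def delta_cebe(cebe_by_atom, reference=CEBE_BENZENE_HAM3):
    """Return the CEBE shift (eV) of each atom relative to the benzene reference."""
    return {atom: value - reference for atom, value in cebe_by_atom.items()}


# Hypothetical CEBE values (eV) for illustration only; they are NOT results from this work.
example = {"C1": 287.00, "C2": 286.20}
shifts = delta_cebe(example)  # C1 is shifted by about +0.80 eV, C2 is unshifted
```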
Then, the Fisher's weights of the ∆CEBE's were calculated. The descriptors with the largest weights were selected as useful descriptors for SAR analysis. Pattern recognition methods⁹ such as principal component analysis (PCA), hierarchical clustering analysis (HCA), K-nearest neighbors (KNN) and SIMCA were employed to study the relation between the selected descriptors and the biological activity (SAR). The data were preprocessed by autoscaling before being used in the pattern recognition methods.
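The descriptor selection step can be sketched as follows. The text does not state which definition of the Fisher's weight was used, so this sketch assumes one common two-class form, the squared difference of the class means divided by the sum of the class variances; the function names and the input layout (a dictionary of descriptor columns plus a list of activity flags) are hypothetical.

```python
from statistics import mean, variance


def fisher_weight(active, inactive):
    """Assumed two-class Fisher weight: (mean_a - mean_i)^2 / (var_a + var_i)."""
    spread = variance(active) + variance(inactive)  # sample variances
    if spread == 0:
        raise ValueError("both classes have zero variance for this descriptor")
    return (mean(active) - mean(inactive)) ** 2 / spread


def rank_descriptors(table, is_active, top=5):
    """Return the names of the `top` descriptors with the largest Fisher weights.

    table: dict mapping descriptor name -> list of values, one per compound.
    is_active: list of booleans, one per compound, in the same order.
    """
    weights = {}
    for name, values in table.items():
        active = [v for v, flag in zip(values, is_active) if flag]
        inactive = [v for v, flag in zip(values, is_active) if not flag]
        weights[name] = fisher_weight(active, inactive)
    return sorted(weights, key=weights.get, reverse=True)[:top]
```

A descriptor whose values separate the two classes well relative to the scatter within each class receives a large weight and is ranked first.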

Results and Discussions
Table 2 and Table 3 list the calculated CEBE's and ∆CEBE's, respectively, of the five selected carbon atoms C1, C2, C4, C9 and C11 (see Figure 1). These carbon atoms have the five largest Fisher's weights among the whole set of 14 carbon atoms. The first three atoms, C1, C2 and C4, belong to the A-ring; the other two, C9 and C11, belong to the B-ring of the neolignan. Figure 2 shows a dendrogram produced with HCA using the ∆CEBE's in Table 3. The linkage method used was single linkage. The scale at the top of the figure indicates similarity. The active compounds (4, 5 and 6) are grouped together in the upper part of the figure, while the inactive ones (1, 2 and 3) are grouped together in the lower part. The two groups are well separated. Figure 3 shows the score plot produced by PCA using the ∆CEBE's in Table 3. The x-axis represents PC1, which explains 79.0% of the variance, and the y-axis represents PC2, which explains 15.0%. The cumulative variance up to PC2 is, therefore, 94.0%. Equation 3 indicates that all five selected carbon atoms contribute with more or less the same magnitude. The outstanding descriptors in equation 4 are C9 and C1. Figure 4 shows the loading plot for the five descriptors. The ∆CEBE's at C2 and C4 are mainly responsible for pulling the active group (4, 5 and 6) towards the right-hand side of the score plot. The ∆CEBE's of the three inactive compounds (1, 2 and 3) are generally smaller than those of the active compounds. This is especially true at C2 (Table 3). These are the reasons why the three inactive compounds (1, 2 and 3) are located at the extreme left in Figure 3. We also applied the KNN and SIMCA methods using the five selected descriptors. All six compounds were correctly classified by both KNN and SIMCA.
Instead of ∆CEBE (Table 3), we also used the CEBE values themselves (Table 2) to see whether they work as useful descriptors in SAR analysis with the pattern recognition methods. The results were completely identical to those obtained with ∆CEBE. This is because CEBE and ∆CEBE differ only by a constant (equations 1 and 2). After preprocessing (autoscaling), the processed CEBE and ∆CEBE data sets become identical. Equation 1 can be rewritten in the form of equation 5:
$$\mathrm{CEBE}(\text{Ph-X}) \approx \kappa\,\sigma_p^X + \mathrm{CEBE}(\text{Ph-H}) \qquad (5)$$
In equation 5, CEBE(Ph-H) is a constant because it is the CEBE of benzene, which is the reference molecule. CEBE(Ph-X) is therefore a linear function of the variable σ_p^X with slope κ. The linearity of equation 5 was shown in figures in the literature.¹,² Equation 5 shows that if the ∆CEBE's work as descriptors in SAR analysis, the CEBE(Ph-X)'s themselves work equally well. This is what we have confirmed numerically. We used monosubstituted benzenes (Ph-X) to discuss equations 1 and 5, but the discussion can be extended to multiply substituted benzenes without loss of generality.
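The statement that CEBE and ∆CEBE give identical data sets after preprocessing can be checked directly: autoscaling subtracts the column mean, so any constant offset cancels. The sketch below assumes autoscaling means centering each descriptor column and dividing by its sample standard deviation; the six CEBE values are hypothetical placeholders, not the entries of Table 2.

```python
from statistics import mean, stdev

REFERENCE = 286.20  # eV, benzene carbon CEBE from HAM/3 as quoted in the text


def autoscale(column):
    """Center a descriptor column to zero mean and scale it to unit sample variance."""
    m, s = mean(column), stdev(column)
    if s == 0:
        raise ValueError("cannot autoscale a constant column")
    return [(x - m) / s for x in column]


# Hypothetical CEBE values (eV) of one carbon atom in six compounds; NOT data from this work.
cebe = [286.95, 287.10, 286.80, 287.60, 287.75, 287.50]
shift = [x - REFERENCE for x in cebe]  # the corresponding CEBE shifts

# The two autoscaled columns agree to within floating-point rounding.
same = all(abs(a - b) < 1e-9 for a, b in zip(autoscale(cebe), autoscale(shift)))
```

Because every pattern recognition method in this work operates on the autoscaled matrix, identical autoscaled columns necessarily lead to identical HCA, PCA, KNN and SIMCA results.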
The Hammett equation, equation 6, correlates the equilibrium (or rate) constant K_X of a system bearing substituent X with the substituent constant σ_X:
$$\log\frac{K_X}{K_0} = \rho\,\sigma_X \qquad (6)$$
Here the subscript 0 denotes a reference system, and ρ, the reaction constant, is specific to the reaction considered. For a chemical equilibrium at constant temperature, the left-hand side of equation 6 is linearly proportional to the difference in the Gibbs free energy change (∆∆G) of the chemical or biological reaction between the system concerned and its reference system. ∆∆G is directly related to the relative affinity of the interaction between the ligand and the biological target in the system concerned. This is the reason why Hammett sigma constants σ are so widely employed in areas where chemical and/or biological reactions are concerned. Comparison between equation 1 and equation 6 immediately reveals that ∆CEBE is a quantity that is linearly proportional to ∆∆G. ∆CEBE therefore has an interpretability similar to that of the Hammett sigma constant σ.
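The proportionality between ∆CEBE and ∆∆G can be written out explicitly. The short derivation below is a sketch under stated assumptions: the Hammett equation is taken in its standard form with a base-10 logarithm, log(K_X/K_0) = ρσ_X; the equilibrium constants are related to standard Gibbs free energy changes by ∆G = −RT ln K; and the approximate relation ∆CEBE ≈ κσ_X is taken to hold for the ring position considered, with κ ≠ 0.

```latex
\begin{align}
\Delta G_X &= -RT \ln K_X, \qquad \Delta G_0 = -RT \ln K_0 \\
\Delta\Delta G &\equiv \Delta G_X - \Delta G_0
  = -RT \ln\frac{K_X}{K_0}
  = -2.303\,RT\,\log\frac{K_X}{K_0} \\
\Delta\Delta G &= -2.303\,RT\,\rho\,\sigma_X \\
\Delta\Delta G &\approx -\frac{2.303\,RT\,\rho}{\kappa}\,\Delta\mathrm{CEBE}
\end{align}
```

At fixed temperature and for a given reaction, the prefactor is a constant, so ∆∆G and ∆CEBE differ only by a scale factor, which autoscaling removes.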
Equation 5 indicates that CEBE(Ph-X) itself has an interpretability similar to that of ∆CEBE. In SAR studies it is the relative value of ∆∆G that is important; the absolute value of ∆∆G is not necessary in most cases. Both ∆CEBE and CEBE are approximately linearly related to ∆∆G. This is the reason why they work in SAR analysis.
The number of compounds treated in the present work is only six, which is very small. The first reason why we worked with such a small set of molecules is that we wanted a quick, preliminary test of whether ∆CEBE (and CEBE) calculated with HAM/3 would serve as a useful descriptor for SAR. Second, PCA works well even when the number of compounds is as small as six, as demonstrated in our previous publication.¹⁰ We are currently working with a larger number of compounds in order to see whether ∆CEBE can really be a useful descriptor in SAR/QSAR.

Conclusion
∆CEBE (and CEBE) calculated with the HAM/3 method was shown to serve as a useful descriptor for SAR analysis of the six neolignans studied. Using five selected ∆CEBE's, the active and inactive compounds were well separated by the HCA, PCA, KNN and SIMCA methods. The CEBE of an atom in a molecule and its shift (∆CEBE) faithfully reflect the chemical environment of that atom. Since ∆CEBE is linearly proportional to the Hammett sigma constant, it is no surprise that ∆CEBE (and CEBE) proved useful in SAR. The conclusion described here is provisional, because the number of samples treated is very limited. A definite and general conclusion can be drawn only when a large number of samples covering different types of molecules has been treated.

Figure 2. Dendrogram of the six neolignans obtained with the HCA method. The numerical values on the scale at the top of the figure indicate similarity.

Figure 3. PCA score plot for the six neolignans.

Table 1. The six neolignans studied. See Figure 1 for the positions of the substituents "R" listed in the first row of the table.