EMPLOYING CONFORMATIONAL ANALYSIS IN THE MOLECULAR MODELING OF AGROCHEMICALS : INSIGHTS ON QSAR PARAMETERS OF 2 , 4-D

A common practice to compute ligand conformations of compounds with various degrees of freedom to be used in molecular modeling (QSAR and docking studies) is to perform a conformational distribution based on repeated random sampling, such as Monte-Carlo methods. Further calculations are often required. This short review describes some methods used for conformational analysis and the implications of using selected conformations in QSAR. A case study is developed for 2,4-dichlorophenoxyacetic acid (2,4-D), a widely used herbicide which binds to TIR1 ubiquitin ligase enzyme. The use of such an approach and semi-empirical calculations did not achieve all possible minima for 2,4-D. In addition, the conformations and respective energies obtained by the semi-empirical AM1 method do not match the calculated trends obtained by a high level DFT method. Similar findings were obtained for the carboxylate anion, which is the bioactive form. Finally, the crystal bioactive structure of 2,4-D was not found as a minimum when using Monte-Carlo/AM1 and is similarly populated with another conformer in implicit water solution according to optimization at the B3LYP/aug-cc-pVDZ level. Therefore, quantitative structure-activity relationship (QSAR) methods based on three dimensional chemical structures are not fundamental to provide predictive models for 2,4-D congeners as TIR1 ubiquitin ligase ligands, since they do not necessarily reflect the bioactive conformation of this molecule. This probably extends to other systems.


INTRODUCTION
The stereochemical control of molecules governs many micro and macroscopic properties of substances, like the biological activity of drug like compounds. Molecules with one or more rotational degrees of freedom (rotatable bonds) exhibit conformational isomerism, where the stable conformers correspond to energy minima and, therefore, to the most probable structures to be found in materials, pharmaceuticals and agrochemicals (FREITAS; . The three dimensional shape of molecules is ruled by intramolecular interactions, but it is also influenced by the medium (solvent, for instance). Certainly, hydrogen bonding is the main interaction stabilizing conformations, but others also take place in many cases, such as electrostatic effects and hyperconjugation. For example, the double helix in DNA is due to intermolecular hydrogen bonding between nucleotides of different strands (VOET et al., 2000), while the anomeric effect (a widely known effect operating in carbohydrate derivatives) (EDWARD, 1955) has been claimed to be originated either from electrostatic effects (the dipolar repulsion in the  anomer is supposed to be stronger than in the  anomer) (MO, 2011) or hyperconjugation (electron donation from the endocyclic oxygen lone pair toward the vicinal axial antibonding orbital) (KIRBY, 1983).
Because conformation plays a key role in biological activity, since most of action mechanisms are based on the correct fit between the active site of an enzyme and the substrate, many quantitative structure-activity relationship (QSAR) methods have been developed to account for the three dimensional structure of drug like molecules. In QSAR, descriptors representing molecular structures are usually calculated to be correlated with their biological properties; thus, a correlation model is built and the biological property of new molecules can be estimated using the calculated molecular descriptors in the equation obtained. Consequently, there is no need to synthesize and biologically test a new compound to obtain its biological property, since it can be predicted using the QSAR model. However, three dimensional QSAR methods and higher order analogs are usually difficult to be handled, especially because of the conformational screening and three dimensional alignment rules required by these methods to obtain molecular descriptors. In addition, there is not any evidence that a given conformation instead of another can affect the quality of QSAR models, despite the profound effects of conformation on the biological activity. Thus, simpler methods, like those based on connectivity indices and bidimensional shape, are supposed to perform appropriately, at least in many practical cases (BROWN; MARTIN, 1997;ESTRADA et al., 2001).
The most widely known three dimensional method is based on comparative molecular field analysis (CoMFA) (CRAMER III et al., 1988), in which aligned molecular structures of a data set are put inside a virtual grid cell, where probes reflecting steric, electrostatic and other effects pass through the grid intersections to generate thousands of descriptors. A similar approach including other indices, like hydrogen bond, was further developed and named Comparative Molecular Similarity Indices Analysis (CoMSIA) (KLEBE et al., 1994), but no significant improvement in comparison to CoMFA was achieved. In order to account for new dimensions, namely conformational and alignment freedom, receptor insight and solvation effects, 4D (HOPFINGER et al., 1997), 5D (VEDANI;DOBLER, 2002) and 6D QSAR (VEDANI et al., 2005) methods have been developed. In all cases, the three dimensional structure of a possible ligand is required and, consequently, several methods to generate and optimize molecular structures should be used to give rise an input geometry for the QSAR calculations. Thus, an overview about some classes of theoretical models used to obtain molecular conformations are briefly described as follows.

COMPUTATIONAL MOLECULAR MODELS
The conformational isomerism of organic compounds can be determined by means of experimental techniques, with emphasis on spectroscopic methods. For example, the conformational preferences of a given compound can be estimated by the relative band intensities of conformers possessing a functional group using infrared spectroscopy. Alternatively, since coupling constants in nuclear magnetic resonance is often an average value among conformers, their populations can be obtained if the corresponding intrinsic coupling constant is determined through low temperature experiments or using rigid derivatives. However, because computational estimation of geometries and energies has been progressively improved in the recent years (both in terms of computing time and accuracy), molecular modeling has replaced or at least corroborated experimental approaches in many research examples. The use of a variety of computational models to represent molecular properties is especially dependent upon the system size and/or type, the number of molecules to be analyzed and the availability of computational resource. The computational models are briefly described as follows.
The simplest computational model is based on molecular mechanics, which is quite used in the modeling of large molecular systems. Molecular mechanics (MM) methods use principles from classical physics to predict the structures and properties of molecules, while the different MM methods differ from one another by their force fields (FORESMAN;FRISCH, 1996). A force field has a set of equations defining how the potential energy of a molecule varies with the locations of its component atoms. The potential energy is dependent on the parameters included in the routine program, which are values used in the equations to relate atomic characteristics to energy Ciênc. agrotec., Lavras, v. 37, n. 6, p. 485-494, nov./dez., 2013 components, and structural data such as bond lengths and angles. Since QSAR models are usually built using large data sets, MM methods are frequently applied to obtain the molecular structures used in 3D QSAR. An optimal geometry is probably not generated using this approach, because its theoretical background is often not sufficiently rigorous to provide very accurate results, which is better achieved using electronic structure methods.
Electronic structure methods include semi-empirical and ab initio methods, whose similarity lies on trying to solve the Schrödinger equation (H  = E  ). On the other hand, the main difference between these methods is that semi-empirical methods use parameters derived from experimental data to simplify the computation (different semi-empirical methods are largely characterized by their differing parameter sets), while ab initio method do not use any parameter derived from experiment, but they are based solely on the laws of quantum mechanics (FORESMAN;FRISCH, 1996). Another class of electronic structure methods are the density functional methods (DFT), which are similar to ab initio methods in many ways, but they present some advantages, such as the inclusion of effects of electron correlation (the fact that electrons in a molecular system react to one another´s motion and attempt to keep out of one another´s way). Since the electron correlation approximation in DFT is quite simpler than in Hartree-Fock calculations, DFT methods are computationally less expensive and, therefore, have been preferred in many studies. For years, the Becke´s DFT method using the electron correlation approach of Lee, Yang and Parr (B3LYP) (BECKE, 1988;LEE et al., 1988) has been widely used in many studies about molecular structure. More recently, DFT analyses have been improved by including dispersion effects, becoming the so called DFT-D methods as accurate as the high level method CCSD(T) in some cases, even for highly dispersive systems (JOSA et al., 2013).
Thus, geometries and energies obtained by different methods not always converge to similar results; important discrepancies are observed between MM and electronic structure methods, but also between semiempirical and ab initio or DFT methods . Considering that chemical structures used in 3D QSAR derived from a simple optimization step can be obtained either by MM or electronic structure methods, the chosen molecular framework used in the alignment step may not be only different from the bioactive structure, but the order of stabilities in the energy minima will probably vary with the theoretical level. Consequently, the choice for a given conformer to be used in 3D QSAR from a simple optimization step is arbitrary, making traditional QSAR feasible to be revisited. In fact, the identification of bioactive conformations is an important part of understanding the relationship between the structure and the biological activity of a molecule. Consequently, derived experimental data on the bioactive conformation should serve as verification wherever possible ( CUNHA et al., 2005). It should be kept in mind that the MD resulting structures can also be applied to construct the conformational ensemble profile (CEP) of each ligand into the active site, thus overcoming the scenario of a poor molecular conformational sampling (CUNHA et al., 2012;SODERO et al., 2012). Furthermore, crystallographic data of the enzyme-inhibitor complex can generate useful information about the enzyme-inhibitor interaction, as well as on the bioactive conformation of ligand in order to validate the chemical structures in 3D QSAR (CUNHA et al., 2013).

QSAR METHODS
The main advantage of 3D QSAR methods over traditional QSAR, according to their developers, is that chemical information can be obtained from the surfaces of molecular force fields, which indicate the groups leading to increase or decrease in biological activity. Since such surfaces depend on the ligand conformation, which is dependent on the theoretical level of calculation (which can lead or not the structure to the bioactive-like conformation), the value of 3D QSAR methods is not superior to those of traditional QSAR approaches. The systematic in QSAR was first introduced in the sixties by Hansch and Fujita (HANSCH;FUJITA, 1964), in which hydrophobicity was found to correlate with the biological activity of various compounds against mosquito larvae, bacteria, houseflies, rodents, guinea pigs and mice. Consequently, the biological activity could be indirectly determined by calculating logP (the logarithm of the octanol-water partition coefficient) and other constants. These parameters are independent of the three dimensional structure of a molecule.
Many molecular descriptors were subsequently investigated to provide a better correlation with bioactivity. For example, physicochemical descriptors representing topological and connectivity indices, constitutional parameters, charge, walk and path counts, etc. are widely used to create QSAR models (TODESCHINI; CONSONNI, 2000). Images representing bidimensional molecular structures have also been used to generate descriptors to use in QSAR; descriptors in the so called MIA-QSAR (Multivariate Image Analysis applied to QSAR) (FREITAS et al., 2005) are pixels. Because pixels can be managed numerically as binaries, the white color corresponds to 765, while black pixels are zero, according to the RGB system of colors. In MIA-QSAR, images correspond to chemical structures drawn with the aid of a software like ChemDraw and CheSketch. The structural changes or changes in the substituent positions in a congeneric series of compounds correspond to alterations in the pixel coordinates; these alterations explain the variance in the y block, which is the dependent variable block (e.g. biological activity). More recently, the MIA-QSAR method (a genuine 2D approach) was augmented by introducing new dimensions to better encode chemical information: different atomic radii and colors were found to represent size (like in steric effects) and the diverse periodic properties of elements (e.g. different colors encode the different atomic electronegativities) (NUNES; FREITAS, 2013).
In the next section, a case study is developed to show the dependence of structural conformation with various theoretical levels for geometry optimization, as well as the implications of this in QSAR.

CASE STUDY: 2,4-DICHLOROPHENOXYACETIC ACID (2,4-D)
2,4-Dichlorophenoxyacetic acid (2,4-D) is a synthetic auxin (a class of plant hormones) widely used in the control of broadleaf weeds. 2,4-D complexes with TIR1 ubiquitin ligase enzyme, then controlling plant growth and development (TAN et al., 2007). A variety of conformers can take place upon rotation of the four torsional angles of 2,4-D, but one corresponds to its crystal, bioactive conformation ( Figure 1).
Conformational search consists of finding lowenergy conformers from a pool of possibilities. A Monte-Carlo search follows a path that biases in favor of lowenergy conformers (but that does not completely exclude high-energy conformers). While there is no guarantee that the lowest-energy conformer (the global minimum) will actually be located, it can be shown that the set of con former s exam ined a pproaches a Boltzm ann distribution. Semi-empirical calculations are often coupled to Monte-Carlo to search for conformations of possible ligands, due to the lower computational cost compared to ab initio or DFT methods. This practice is particularly common to generate structures to be used in three dimensional (3D) QSAR, such as molecular field fit techniques (CRAMER III et al., 1988;KLEBE et al., 1994) and in docking studies.
Certainly, conformation plays a key role in describing the biological activity of a given molecule. However, QSAR models based on 3D descriptors are not fundamental to produce highly predictive models, since the most stable conformation of a given ligand may not correspond to the best choice, which is the bioactive conformation. Such QSAR methods are very informative, but one and two dimensional descriptors are supposed to be capable of estimating biological properties similarly well (TIAN et al., 2007;FREITAS et al., 2005, TODESCHINI;CONSONNI, 2000). In order to show the dependence of a conformational search method and the level of theory with conformer energies and structures, the conformations of 2,4-D (both undissociated and dissociated forms) were analyzed using semi-empirical and DFT methods and then compared with the crystal structure of this herbicide (in the carboxylate form).
Monte-Carlo search tries to find energy minima by random sampling; thus, the whole set of conformers is not necessarily found. Such a search was performed using the semi-empirical Austin Model 1 (AM1) method and 12 conformers were found for the undissociated 2,4-D, but two of them are equivalent (9 and 10, figure 2), resulting in 11 different conformers. Conformer 1 was calculated to be the most stable form in the gas phase at the AM1 level, while optimization of the conformers obtained by AM1 gave a different conformational profile when using density functional theory (B3LYP/aug-cc-pVDZ). In the gas phase, conformer 2 was found as the global minimum and ca. 1 kcal mol -1 more stable than 1 according to DFT; this preference remains in water solution (to approach a physiological medium), despite the changes in relative energies of the other conformers (Table 1). Moreover, conformers 6, 7, 9 and 11 found by AM1 were not located as energy minima at the B3LYP/aug-cc-pVDZ level, which converted those conformers to 6', 7', 9' and 11' (2 and 11', and 9' and 12 are structurally similar to each other; thus, 10 different conformers are shown in figure 3).
Figure 2 -Conformations obtained by the Monte-Carlo search at the AM1 level (relative energies are given in kcal mol -1 outside parenthesis). Conformers 9 and 10 are equivalent. Conformers 6, 7, 9 and 11 converge to different structures when optimized at the B3LYP/aug-cc-pVDZ level.
According to the stability profiles given in table 1 for the DFT results, 2 is the main conformer because of an intramolecular hydrogen bond O--H ... O; this is confirmed by the energy of stabilization of the corresponding hyperconjugative interaction n O   * OH of 4 kcal mol -1 , obtained by natural bond orbital (NBO) analysis. Not in a biological medium, the conformational equilibrium of undissociated 2,4-D is dictated by a balance of hyperconjugation and Lewis-type interactions. The full energy of a molecule can be decomposed into hyperconjugative (attractive) and Lewis-type (repulsive) interactions; NBO analysis provides individual interactions and the total energy of hyperconjugation by deleting twoelectron/two-orbital interactions and computing the resulting energy of the system. Therefore, given the relative energy of the conformers, the Lewis-type contribution can also be obtained. Importantly, those conformers with the C-O-C portion coplanar to the aromatic ring are capable of exhibiting strong n O   * C1C2 interactions, owing to the resonance of the lone pairs of the ether oxygen; for example, this interaction is ca. 28 kcal mol -1 stabilizing for conformer 2 and is similarly observed Figure 3 -Conformations optimized at the B3LYP/aug-cc-pVDZ level from that obtained by the semi-empirical AM1 calculations (relative energies are given in kcal mol -1 outside parenthesis). Conformers 2 and 11', as well as 9' and 12, are equivalent. Ciênc. agrotec., Lavras, v. 37, n. 6, p. 485-494, nov./dez., 2013 for conformers 4, 8, 10 and the crystal structure. On the other hand, such geometry imposes syn coplanarity to the fragment C-O-C=C and steric repulsion then takes place. Table 2 gives the contribution of hyperconjugative and Lewis-type interactions for the conformational stabilities of undissociated 2,4-D, both in the gas phase and implicit water. Table 1 -Relative energies (in kcal mol-1) in conformers of 2,4-dichlorophenoxyacetic acid (2,4-D). a a Equal letters (a, b and c) correspond to structures that converged to equivalent conformers. d Conformation obtained by AM1 which is different from that obtained by DFT The bioactive conformation of 2,4-D is in the carboxylate form, due to the lack of a proton when bound to the active site of TIR1 ubiquitin ligase enzyme (PDB code: 2P1N) (Figure 4). In order to compare the calculated conformations of 2,4-D with the bioactive crystal structure, a Monte Carlo conformational search at the semi-empirical AM1 level was performed for the dissociated 2,4-D, and two conformers with an energy difference of 3.6 kcal mol -1 were found ( Figure 5). These two conformers do not correspond to the bioactive-like geometry, while further optimization of the AM1 structures at the B3LYP/aug-cc-pVDZ level converged to other two structures ( Figure 5), with the most stable one (by 6.2 kcal mol -1 ) being very similar to the experimental, crystal 2-4-D. However, in water solution (using implicit solvent with the PCM model), the relative energy of the least stable conformer decreased to only 0.3 kcal mol -1 , indicating that it should be significantly populated in aqueous solution (mimicking a real situation). Thus, it would not be improbable to select the second stable conformer in a QSAR modeling, but it does n ot corr espon d to th e a ct ua l bioa ct ive conformation, which is induced by the enzyme active site, because of intermolecular hydrogen bond between the carboxylate oxygens and hydrogens in Arg403 and Ser438 amino acid residues. found by a density functional method. Some minima found by AM1 are not equivalent to that obtained by DFT. An additional conformational screening using molecular mechanics encountered 25 structures and the most stable one corresponds to the global minimum found by DFT; however, many structures correspond to saddle points. The crystal structure of the 2,4-D ligand (carboxylate) is not coincident with any conformation found by Monte-Carlo search and is not even an energy minimum at the AM1 level. At the B3LYP/aug-cc-pVDZ level, the global minimum is similar to the experimental crystal structure in the gas phase, but the other conformer must be similarly populated in water solution.
In mol ecul ar model in g, especi a ll y 3D QSAR, conformations (poses) that are obtained by simple optimization steps can not necessarily reflect the bioactive conformation; even though, the use of such three dimensional structures provide significant correlation with biological properties, at least in many practical cases. Obviously, molecular conformation plays a key role in describing biological properties of molecules, but the results of this work suggest that, in QSAR studies, satisfactory correlations can be obtained in dependen tl y of t he con form at ion used. Th is corroborates the fact that QSAR methods based on bidimensional representation of molecules, such as MIA-QSAR (FREITAS et al., 2005), are so predictive as the n-dimensional (n  3) techniques, while 3D methodologies are usually time and cost demanding, due to the necessity of conformational screening and 3D alignment of ligands.

COMPUTATIONAL METHODS
A conformer distribution was carried out for 2,4-D using the Monte-Carlo method at the semi-empirical AM1 level (DEWAR et al., 1985), available in the Spartan program (2000). Each minimum found was subsequently optimized at the B3LYP/aug-cc-pVDZ level (BECKE, 1988;LEE et al., 1988;KENDALL et al., 1992) of theory, both in the gas phase and water (using the polarizable continuum model, PCM, and using a cavity built up using the UFFradii with spheres around each solute atom) (TOMASI, et al., 2005). Frequency calculations were performed to guarantee that no imaginary frequencies were present. Natural bond orbital (NBO) calculations (GLENDENING et al., 2001) were performed at the same level of theory, also using the Gaussian09 program (FRISCH et al., 2009). The crystal structure of 2,4-D was obtained from the Protein Data Bank (PDB code: 2P1N) and submitted to optimization and NBO calculations at the B3LYP/aug-cc-pVDZ level.

CONCLUSIONS
Bioactivity of organic molecules is conformationdependent. However, theoretical conformational analysis is dependent on the method and level of theory. Intramolecular effects can drive the conformer stability and, therefore, the mode of interaction with an enzyme. On the other hand, the biological environment can also induce conformational changes, such as demonstrated in this work for 2,4-D. 3D QSAR considers the spatial structure of a molecule to generate variables to be used in the modeling, such as steric and electrostatic descriptors. Since the conformer chosen in 3D QSAR may not correspond to the effective, bioactive structure, the good correlation obtained by some 3D QSAR methods in som e cases could pr obabl y be not due to the conformation used; 2D shape, connectivity indices and fragment/atomic properties are some parameters supposed to be more descriptive, at least for the example of this work.