Introduction
Human growing population and prosperity around the world is driving an increased demand for food. Therefore, agriculture has developed to achieve higher yields and better-quality products. To yield high crop production, it is important to protect the cultures from pests and diseases.^{1}^{,}^{2} Weeds produce the highest potential loss overall among the pests, which can be avoided by applying physical, biological and chemical measures.^{2+} Crop protection by chemical measures spreads worldwide since the commercialization of the two phenoxyacid herbicides: 2,4-dichlorophenoxyacetic acid (usually called 2,4-D) and the 2-methyl-4-chlorophenoxyacetic acid (MCPA).^{2}^{,}^{3} The use of chemicals in crops has become the main method of weed control because of its low cost and high efficiency, encouraging reliance on them.^{4} However, frequent use of chemicals with similar biological mechanism creates a selective pressure on weeds, leading to the development of resistance.^{1}^{,}^{4}^{,}^{5} Herbicide resistance was firstly reported in 1970 and confirmed in 495 unique cases until 2017.^{5}^{,}^{6} Usually, the resistance acquired by a weed species is a result of a mutation in the active site. The development of herbicides with a different mechanism of action is desirable, because it will generally be able to eliminate the herbicide-resistant weed.^{7} However, Duke,^{7} in 2012, stated that “no new major herbicide mode of action has been introduced in a commercial herbicide active ingredient in the last 20 years”. Moreover, environmental and health hazards concerns related to pesticides have grown since 1960’, which led to stringent regulation of pesticides.^{4} It is imperative to find new substances with herbicide properties to fight weed resistance and that should be in agreement with the environmental protection agencies regulations.
There are different experimental strategies in the search for novel herbicides. They usually involve the measurement and analysis of herbicide activity of sets of analogue compounds. These experimental activities could be further exploited using quantitative structure-activity relationship (QSAR) studies. For that, molecular descriptors, that could be related to biological properties based on the chemical structure of the set of compounds, are calculated and mathematical models are built to predict the biological property using the desirable molecular descriptors. The interpretation of such models could provide information about the ligand-receptor interactions and the mechanism of action of the set of molecules. QSAR models can be also used to predict the biological activity of compounds with similar structures.^{8}^{-}^{11} Therefore, QSAR study is an important research tool when applied to the development of chemicals with biological activity, such as herbicides.^{8}^{,}^{12}
A large number of QSAR studies applied to novel herbicides was published in last few years. Karacan et al.^{12} reported a QSAR analysis on perfluoroisopropyldinitrobenzene derivatives, a type of dinitrophenolic compound, which have been used in crop protection. A 2D-QSAR model of a series of 1H-1,4-benzodiazepine-2,5-dione analogues was proposed by Banjare et al.^{13} and used to generate a library of new compounds with comparable and better expected activity. Teixeira et al.^{14} used photosynthetic inhibition activity data (IC_{50}) for 19 nostoclide analogues to build and analyze a 2D-QSAR model, which indicated that their activity is associated with their polarity. A series of rubrolide analogues were synthesized by Barbosa et al.^{15} and 2D-QSAR models were built and analyzed using their photosynthetic inhibition activity, leading to insights about the most effective substituents among the set. Funar-Timofei et al.^{16} published a QSAR study in combination with molecular docking applied to fused heterocyclic herbicides that act as inhibitors of the D1 protein of photosystem II. A 3D-QSAR study of series of 3-(pyridin-2-yl)benzenesulfonamide derivatives was made by Xie et al.^{17} based on CoMFA and CoMSIA models, which were then used to successfully design new derivatives with superior herbicide activity. In recent a work, Teixeira et al.^{18} synthesized and measured the photosynthetic inhibitory activity of a set of trifluoromethyl arylamides, which were also used to build a 2D-QSAR model.
Herbicides that target the photosystem II compete with the plastoquinone Q_{B} for its binding site and block the electron transfer to plastoquinone Q_{A}, thus inhibiting the photosynthetic electron transport.^{19} Trebst^{20} proposed the classification of the herbicides that act on photosystem II into two families: the serine family, whose herbicides bind oriented towards the D1-Ser264 residue; and the histidine family, whose herbicides bind oriented towards the D1-His215 residue.
Cyanobacterin (Figure 1a) is a natural occurring lactone that has the property of specifically inhibit photosynthetic electron transport.^{21}^{,}^{22} Nostoclides (Figure 1b) are another natural occurring lactones that resembles cyanobacterin, but their herbicide properties were not fully investigated yet.^{1} Teixeira et al.^{1} synthesized and measured the photosynthetic inhibitory activity of a series of nostoclide analogues to investigate their potential phytotoxicity.
In the present study, the reported experimental results were used to build QSAR models to predict the photosynthetic inhibitory activity of 34 nostoclide analogues.^{1} The best models were interpreted in order to understand the ligand-receptor interactions and provide some insight on the mechanism of action of this class of compounds.
Methodology
Data set
A set of 34 3-benzyl-5-(arylmethylene)furan-2(5H)-ones (Figure 2) and their photosynthetic inhibitory activity were used. The aromatic ring of the arylidene portion presented different substitution patterns. The percentages of photosynthetic inhibitory activity (PI) are relative to untreated control for an analogue concentration of 10 µmol L^{-1}. Table 1 shows the nostoclide analogues and the respective PI values. The molecules that presented PI lower than 5% were originally considered ineffective.^{1} In this work, such compounds were included in the QSAR study considering their PI as 5%.
Molecule | R_{1} | R_{2} | R_{3} | R_{4} | R_{5} | PI^{b} / % |
m01 | –H | –H | –H | –H | –H | 29.2 |
m02 | –CH_{3} | –H | –H | –H | –H | 26.0 |
m02i | –H | –H | –H | –H | –CH_{3} | |
m03 | –H | –CH_{3} | –H | –H | –H | 16.7 |
m03i | –H | –H | –H | –CH_{3} | –H | |
m04 | –H | –H | –CH_{3} | –H | –H | 8.0 |
m05 | –H | –H | –CH_{2}CH_{3} | –H | –H | 50.7 |
m06 | –H | –CN | –H | –H | –H | NE^{c} |
m06i | –H | –H | –H | –CN | –H | |
m07 | –H | –H | –CN | –H | –H | NE^{c} |
m08 | –CF_{3} | –H | –H | –H | –H | 44.9 |
m08i | –H | –H | –H | –H | –CF_{3} | |
m09 | –H | –CF_{3} | –H | –H | –H | 49.9 |
m09i | –H | –H | –H | –CF_{3} | –H | |
m10 | –H | –H | –CF_{3} | –H | –H | 55.5 |
m11 | –H | –H | –Ph | –H | –H | NE^{c} |
m12 | –H | –N(CH_{3})_{2} | –H | –H | –H | 38.6 |
m12i | –H | –H | –H | –N(CH_{3})_{2} | –H | |
m13 | –H | –H | –N(CH_{3})_{2} | –H | –H | 7.1 |
m14 | –Cl | –H | –N(CH_{3})_{2} | –H | –H | 25.4 |
m14i | –H | –H | –N(CH_{3})_{2} | –H | –Cl | |
m15 | –H | –NO_{2} | –H | –H | –H | 39.5 |
m15i | –H | –H | –H | –NO_{2} | –H | |
m16 | –H | –H | –NO_{2} | –H | –H | 57.8 |
m17 | –OH | –H | –H | –H | –H | NE^{c} |
m17i | –H | –H | –H | –H | –OH | |
m18 | –H | –OH | –H | –H | –H | 6.9 |
m18i | –H | –H | –H | –OH | –H | |
m19 | –H | –H | –OH | –H | –H | 6.5 |
m20 | –H | –OH | –OCH_{3} | –H | –H | NE^{c} |
m20i | –H | –H | –OCH_{3} | –OH | –H | |
m21 | –H | –H | –OCH_{3} | –H | –H | 15.0 |
m22 | –OCH_{3} | –H | –H | –OCH_{3} | –H | 43.5 |
m22i | –H | –OCH_{3} | –H | –H | –OCH_{3} | |
m23 | –OCH_{3} | –H | –OCH_{3} | –H | –OCH_{3} | 43.6 |
m24 | –H | –OCH_{3}O– | –H | –H | NE^{c} | |
m24i | –H | –H | –OCH_{3}O– | –H | ||
m25 | –F | –H | –H | –H | –H | 49.5 |
m25i | –H | –H | –H | –H | –F | |
m26 | –H | –F | –H | –H | –H | 28.8 |
m26i | –H | –H | –H | –F | –H | |
m27 | –H | –H | –F | –H | –H | 25.6 |
m28 | –F | –F | –F | –F | –F | 6.6 |
m29 | –Cl | –H | –H | –H | –H | 30.8 |
m29i | –H | –H | –H | –H | –Cl | |
m30 | –H | –Cl | –H | –H | –H | 20.3 |
m30i | –H | –H | –H | –Cl | –H | |
m31 | –H | –H | –Cl | –H | –H | 5.8 |
m32 | –Br | –H | –H | –H | –H | 21.8 |
m32i | –H | –H | –H | –H | –Br | |
m33 | –H | –Br | –H | –H | –H | 22.6 |
m33i | –H | –H | –H | –Br | –H | |
m34 | –H | –H | –Br | –H | –H | NE^{c} |
^{a}The analogues highlighted in grey are those used to build the final QSAR models. The substituents R_{1}, R_{2}, R_{3}, R_{4} and R_{5} are related to Figure 2;
^{b}percentage of photosynthetic inhibitory activity;
^{c}not effective: PI < 5%.
Molecular descriptor calculation
The crystallographic structure of compound m03 (Table 1) was retrieved from Cambridge Structural Database (CSD) (code CONPEB).^{23} Geometry optimization was carried out in Gaussian 09 software (G09Rev-D.01) using DFT/B3LYP method and Def2-TZVPP basis set.^{24}^{-}^{27}
Some molecules presented substituents at the meta and ortho positions of the aromatic ring of the arylidene portion. In these cases, several conformations arise from rotation around C1-C2 bond (Figure 2) and there are two preferential ones (Figure 3). Both conformations were optimized and one of them was selected for further analysis, as described in the next sub-sections.
The 2D molecular descriptors were calculated at the same theory level in Gaussian 09 software (G09Rev-D.01)^{24} for the optimized structures. They were: total and molecular orbital energies; molecular orbitals energy gaps; dipole and quadrupole moments; mean polarizability; Mulliken and CHELPG (charges from electrostatic potentials using a grid based method) partial atomic charges; and sum of squares of the wave functions’ molecular orbital coefficients (for the frontier orbitals ranging from highest occupied molecular orbital (HOMO)-4 to lowest unoccupied molecular orbital (LUMO)+4). In addition, the sums of partial atomic charges for certain sets of atoms were also calculated.
The 3D molecular interaction field descriptors were calculated by LQTAgrid software^{8} using a 21 × 15 × 9 Å grid with 1 Å increment and a NH_{3}^{+} probe. The alignment was made using the optimized geometries. The reference structure was compound m01 and the atoms aligned were C8, O9, C10, C11, C12 and O13 (Figure 2). For each grid point, electrostatic and Lennard-Jones potential energies were calculated.^{8}
Conformer selection
From the 34 compounds, 19 assume two different preferential conformations by rotation around the C1-C2 bond, resulting in 19 pairs of conformers. However, only one conformer of each pair was used in the final QSAR models. The selection of a specific conformer in each pair was carried out in two steps. Firstly, a conformational equilibrium at 25 ºC was considered to calculate the population of each conformer according to equation 1. This equation is derived from Boltzmann distribution, where P is the population percentage for each conformer.^{28} E and E_{count.} are the optimization energies of the conformer and its counterpart, respectively; k is Boltzmann’s constant and T is the temperature. The same procedure was applied to each one of the 19 pairs of conformers.
The conformers with P < 1% compared to its counterparts were eliminated. For example, the compound mXXi would be excluded if it presented P < 1% compared to its counterpart, mXX.
Considering the remaining 11 pairs of conformers, one of each was excluded and the remaining compounds were used to build a partial least squares (PLS) regression model with the 2D descriptors. This procedure was repeated exchanging the excluded conformers to test all possible combinations. These regression models were internally validated by leave-one-out cross-validation (LOOCV) and the combination of conformers corresponding to the models with the best statistics was selected.
Descriptor selection
The descriptors selection was carried out on the conformer combination selected in “Conformer selection” sub-section. The 2D descriptors were inspected to remove those degenerated. The following step was to eliminate descriptors that had the absolute value of Pearson’s correlation coefficient (|r|) with biological activity lower than 0.3. The ordered predictors selection algorithm (OPS) implemented in the public domain QSAR modeling software was used to reduce the number of descriptors to be kept in the regression model.^{29}^{,}^{30} Finally, they were manually selected in Pirouette-3.11^{31} software by building PLS regressions and comparing their statistics.
The 3D descriptors were selected by applying the digital filter for molecular interaction field descriptors (comparative distribution detection algorithm, CDDA).^{32} OPS algorithm and Pirouette-3.11 software were used in the next step of the selection. Descriptors’ probes and molecules were visualized using Chimera-1.11.^{33}
All the QSAR-models were built using PLS regressions method on autoscaled data (mean centered and scaled to unity variance) and internally validated by LOOCV.^{34}
Conformer reselection
To ensure that the best combination of conformers was selected, they were retested following the same method described in “Conformer selection” sub-section, but using only the descriptors selected in “Descriptor selection” sub-section. The same procedure was applied for the 2D and 3D molecular descriptors. PLS regressions were performed to choose the best combination of conformers.
The selected combination of conformers and their molecular descriptors were used to build 2D and 3D QSAR models. The 2D and 3D selected descriptors were also combined and reselected manually in Pirouette-3.11^{31} to build a hybrid QSAR model. All the QSAR-models were built using PLS regressions and internally validated by LOOCV (in MATLAB software).^{35} Other validation tests such as leave-N-out (LNO) cross-validation and y-randomization were carried out in QSAR modelling software.^{36}^{,}^{37}
The molecular structure figures were created using ChemDraw Ultra 12.0,^{38} Chimera-1.11,^{33} GaussView 5.0^{39} and MATLAB software^{35} and the graphics were built using Pirouette-3.11^{31} and MATLAB software.^{35}
Results and Discussion
Conformers and molecular descriptors selection
Since conformation at the active site is unknown, two preferential conformations of nostoclide analogues were initially considered (Figure 3). However, only one conformer from each pair was selected for model building. In a first step, the optimization energies of each pair of conformers were compared using equation 1. Only those conformers significantly less stable than its counterpart (P < 1%) were eliminated. The pairs of conformers for which the optimization energies were not significantly different (1 ≤ P ≤ 99%) were selected in the next step.
Conformers m02i, m08i, m14i, m17i, m22i, m25i, m29i and m32i presented P < 1%, thus were excluded, whereas their counterparts, m02, m08, m14, m17, m22, m25, m29 and m32, were kept. The eliminated conformers have an R_{5} substituent, suggesting instability at this position. Similar observation was previously proposed in a structural and conformational study of nostoclide analogues; it was assumed that the preferential conformation was due to a non-bonding steric repulsion between O13 and the substituent at R_{5} position.^{23}
The preferred conformer from the remaining pairs were selected in order to choose those molecular descriptors which provided better predictions based on LOOCV of PLS regressions. Thus, the selected conformers are expected to have a conformation which is closer to the one they assume in its active site, or that better correlates with it.
Molecular descriptors selection was carried out using only the analogues that remained after the first conformer selection. When PLS models were built, 5 analogues appeared to be outliers. Two of them, molecules m23 and m28, are the only structures having substituents in both positions R_{1} and R_{5}. The non-bonding steric repulsion between O13 and the substituent at R_{5} position is unavoidable, causing the arylidene and furanone rings to be in different planes. Although it does not imply a problem in the experimental biological activity data, all the other analogues presented an optimized structure with those rings in the same plane (Figure 4). Therefore, the molecular descriptors calculated for molecules m23 and m28 are significantly different from the other compounds. The plots scores and sample’s residues versus Mahalanobis distance from principal component analysis, presented in Figure 5, confirm the atypical behavior of these two molecules. The remaining outliers were molecules m05, m25 and m34, which presented atypical behavior compared to tendency of similar nostoclide analogues, e.g., m02-m04, m26-m27 and m32-m33, respectively.
The conformer selection did not consider the descriptors selection and the exclusion of the outliers. Therefore, the excluded conformers were re-included and a new conformers selection was carried out (conformer reselection). A similar methodology was applied, but excluding outliers and using only the molecular descriptors selected. The final set of molecules used to build the QSAR models are highlighted in grey in Table 1.
2D, 3D and hybrid QSAR models’ statistics
A QSAR model predictivity can be evaluated by the values of Q^{2+}, R^{2}, and k_{C}, which are, respectively, the cross-validated correlation coefficient, correlation coefficient of multiple determination, Pearson’s correlation coefficient of calibration and slope of measured versus predicted regression line through the origin. According to the literature, a QSAR model can be considered predictive if:^{36}^{,}^{37}
The statistics of the 2D, 3D and hybrid QSAR models are presented in Table 2. All the models satisfy the conditions of inequalities 2 to 6. The root mean square error of calibration (RMSEC) and root mean square error of cross-validation (RMSECV) values of the hybrid QSAR models were slightly better than the 3D QSAR’s, while the 2D QSAR presented higher RMSEC and RMSECV.
Model | 2D-QSAR | 3D-QSAR | Hybrid QSAR |
Number of descriptors | 5 | 7 | 7 |
Latent variables | 1 | 2 | 2 |
Explained variance / % | 57.4 | 63.4 | 61.2 |
k_{C}^{a} | 0.901 | 0.935 | 0.939 |
RMSEC^{b} / % | 9.11 | 7.55 | 7.33 |
^{c} | 0.717 | 0.813 | 0.824 |
R^{2 d} | 0.717 | 0.813 | 0.824 |
RMSECV^{e} / % | 9.60 | 8.52 | 8.38 |
Q^{2 f} | 0.663 | 0.734 | 0.743 |
^{a}Slope of measured versus predicted regression line through the origin;
^{b}root mean square error of calibration;
^{c}Pearson correlation coefficient of calibration;
^{d}correlation coefficient of multiple determination;
^{e}root mean square error of cross-validation;
^{f}cross-validated correlation coefficient.
The measured versus predicted percentages of inhibition plots are presented in Figures 6, 7 and 8 for the 2D, 3D and hybrid QSAR models, respectively. The 2D-QSAR model plot presented absolute deviations greater than 10% for 13 molecules, most of them with measured values of PI between 5 and 27%. The 3D and hybrid QSAR models showed better predictions in this range, although not for all molecules (6 of them with absolute deviation larger than 10%). However, the hybrid QSAR model presented lower absolute deviations overall, resulting in a more evident linearity.
Although statistical parameters previously calculated are necessary to indicate the quality of QSAR models, further validations are required to evaluate its robustness and the existence of a real structure-activity relationship. This can be done through the tests of leave-N-out (LNO) cross-validation and of y-randomization, which are presented in Figures 9, 10 and 11.^{36}^{,}^{37}
The LNO cross-validation test was repeated 10 times for each value of N. The rows from the data matrix were randomized for each replicate, resulting in average values of Q^{2}_{LNO}. For a robust QSAR model, the average Q^{2}_{LNO} is expected to be close to Q^{2}_{LOO} even for a large number of molecules (20-30% from original set) removed from the set. It is also desirable that two times the standard deviation for each N (including the critical one) presents values lower than 0.1.^{36}^{,}^{37} In this work, N varied from 2 to 10. Figures 9a, 10a and 11a show the mean value and standard deviation of Q^{2} for each N and the limits Q^{2}_{LOO} ± 0.05. All the models presented little fluctuation for N varying from 1 to 10. Therefore, all the QSAR models can be considered robust.
The y-randomization test consists in building several QSAR models (50 in this work) for which the vectors of biological activity (y vector) is randomly shuffled. For a good QSAR model, these randomized models are expected to be of poor quality and without real meaning. The basic LOO statistics of the randomized models should be worse than the original model: low Q^{2}_{yrand} and values. A QSAR model that presents relatively high values of Q^{2}_{yrand} and Q^{2}_{yrand} (> 0.2 and > 0.4, respectively) indicates the existence of chance correlation, which means that the real model may contain descriptors which are statistically well correlated to y, but in reality there is no cause-effect relationship encoded in such correlations.^{36}^{,}^{37}^{,}^{40} Figures 9b, 10b and 11b show the plots of R^{2} versus Q^{2} for the y-randomization test for the three models: 2D, 3D and hybrid QSAR, respectively. The original model is labeled as y and the dashed lines indicate the cutoff = 0.4. Based on these results, none of the QSAR models built presented chance correlation.
Analysis of the selected molecular descriptors
The correlation matrix and regression coefficients for the 2D, 3D and hybrid QSAR models are shown in Tables 3, 4 and 5. The correlation matrices of these tables present the descriptor-PI and descriptor-descriptor correlation coefficients for each molecular descriptor in the model. Comparison between the regression and the descriptor-PI correlation coefficients for each molecular descriptor shows that they present the same sign. This implies that the contribution of each descriptor to its model is consistent with its correlation to the PI. Table 6 presents the details of the 2D descriptors and Figure 12 shows a tridimensional plot of the aligned nostoclide analogues and the position of the selected 3D descriptors, where “L” and “E” stands for the Lennard-Jones and the electrostatic potentials, respectively.
CO+CM | wHO02 | QMafYY | QMbYY | QMbZZ | |
wHO02 | 0.310 | ||||
QMafYY | 0.107 | 0.439 | |||
QMbYY | –0.679 | –0.212 | –0.232 | ||
QMbZZ | 0.778 | 0.558 | 0.297 | –0.794 | |
PI^{a} | –0.683 | –0.521 | –0.439 | 0.705 | –0.792 |
R. coef.^{b} | –0.238 | –0.182 | –0.153 | 0.246 | –0.276 |
R. model^{c} | PI = –498(CO + CM) – 283(wHO02) – 0.358(QMafYY) + 75.6(QMbYY) – 82.2(QMbZZ) – 335 |
^{a}Percentage of photosynthetic inhibitory activity;
^{b}regression coefficients for descriptors in autoscaled form;
^{c}regression mathematical model.
E1 | E2 | E3 | E4 | E5 | L1 | L2 | |
E2 | –0.622 | ||||||
E3 | –0.340 | 0.391 | |||||
E4 | –0.333 | 0.407 | 0.991 | ||||
E5 | –0.0848 | 0.0630 | 0.165 | 0.190 | |||
L1 | –0.293 | –0.0468 | 0.213 | 0.239 | 0.487 | ||
L2 | –0.335 | 0.526 | 0.381 | 0.382 | 0.403 | –0.0209 | |
PI^{a} | –0.429 | 0.567 | 0.571 | 0.550 | –0.344 | –0.400 | 0.407 |
R. coef.^{b} | –0.180 | 0.244 | 0.248 | 0.228 | –0.343 | –0.373 | 0.157 |
R. model^{c} | PI = –0.109(E1) + 0.151(E2) + 0.142(E3) + 0.124(E4) – 0.143(E5) – 44.2(L1) + 4.75(L2) – 293 |
^{a}Percentage of photosynthetic inhibitory activity;
^{b}regression coefficients for descriptors in autoscaled form;
^{c}regression mathematical model.
CO+CM | wHO02 | QMafYY | QMbYY | E1 | E3 | L1 | |
wHO02 | 0.310 | ||||||
QMafYY | 0.107 | 0.439 | |||||
QMbYY | –0.679 | –0.212 | –0.232 | ||||
E1 | 0.184 | 0.430 | 0.322 | –0.289 | |||
E3 | –0.285 | –0.499 | –0.440 | 0.441 | –0.340 | ||
L1 | 0.353 | 0.0225 | –0.0269 | –0.278 | –0.293 | 0.213 | |
PI^{a} | –0.683 | –0.521 | –0.439 | 0.705 | –0.429 | 0.571 | –0.400 |
R. coef.^{b} | –0.284 | –0.154 | –0.137 | 0.275 | –0.149 | 0.201 | –0.292 |
R. model^{c} | PI = –593(CO+CM) – 240(wHO02) – 0.320(QMafYY) + 84.6(QMbYY) – 0.0907(E1) + 0.115(E3) – 34.6(L1) – 335 |
^{a}Percentage of photosynthetic inhibitory activity;
^{b}regression coefficients for descriptors in autoscaled form;
^{c}regression mathematical model.
Detail | |
CO+CM | sum of CHELPG atomic charges for ortho and meta carbons of the benzyl’s ring, i.e., C16, C17, C19, C20 |
HO+HM | sum of CHELPG atomic charges for ortho and meta hydrogens of the benzyl’s ring, i.e., H23, H24, H26, H27 |
wHO2 | sum of squares of the wave functions’ molecular orbital coefficients for C02 carbon |
QMafYY^{a} | YY component of the electrostatic quadrupole moment from arylidene and furanone, i.e., atoms 1-13 and 28-34 |
QMbYY^{a} | YY component of the electrostatic quadrupole moment from benzyl’s ring, i.e., atoms 15-20 and 23-27 |
QMbZZ^{a} | ZZ component of the electrostatic quadrupole moment from benzyl’s ring, i.e., atoms 15-20 and 23-27 |
^{a}Regression coefficients for descriptors in autoscaled form. CHELPG: charges from electrostatic potentials using a grid based method.
The most important molecular descriptor for the 2D model (see regression coefficients in Table 3) is QMbZZ, which is one of the components of the electrostatic quadrupole moment of the aromatic ring of the benzyl group. This descriptor is negatively correlated to the PI and its values are negative. It means that higher absolute values of QMbZZ correspond to higher inhibition of the photosystem II. The QMbYY molecular descriptor, another component of the same quadrupole moment, was also selected for the 2D QSAR model. It is positively correlated to the PI and its values are positive, resulting in the same behavior of the QMbZZ. The quadrupole moment of an aromatic ring can be associated with aromatic stacking, sometimes called “pi-stacking”. This is an interaction that is important to many biological systems.^{41}^{,}^{42} At the photosystem II, these interactions were observed only between the plastoquinone Q_{A} and its site in the D2 protein.^{43}^{,}^{44} However, based on a QSAR study, Karacan et al.^{12} proposed the existence of aromatic stacking interactions between a set of perfluoroisopropyl-dinitrobenzen derivatives and the plastoquinone Q_{B} site in the D1 protein. Likewise, the molecular descriptors QMbZZ and QMbYY indicate the possibility of aromatic stacking interactions between the benzyl group of the nostoclide analogues and the residues of the plastoquinone Q_{B} site. Figure 13 shows the contribution of the benzyl quadrupole moment to the electrostatic potential for the m01 analogue, which presents no substituents and medium PI. The blue and red isosurfaces correspond to the same absolute value of electric potential, but with positive and negative signs, respectively.
The positive isosurface of this quadrupole moment is longer in the Y axis than in the X axis. It also means that there is a dipole moment in the aromatic ring of the benzyl group. Studies regarding the toluene aromatic stacking have shown that its interactions occur preferably in a parallel offset manner as a consequence of its dipole moment.^{42} Therefore, the quadrupole moment presented in Figure 13 suggests the occurrence of an offset aromatic stacking, which is consistent with the interaction proposed by Karacan et al.^{12} and with the one observed in the plastoquinone Q_{A} site.^{43}^{,}^{44} The molecular descriptor CO+CM presents reasonable correlation with the descriptors QMbZZ and QMbYY (ca. ± 0.7), because they were calculated based on the CHELPG partial atomic charges of the aromatic ring of the benzyl group. The CO+CM descriptor is relevant for the model and indicates that the proposed aromatic stacking is more sensitive to the electronic density of the ortho and meta carbon atoms of the aromatic ring.
The molecular descriptor QMafYY is also a quadrupole moment, but calculated using only the arylidene and the furanone groups. This descriptor is illustrated in Figure 14 for the m01 analogue, where blue stands for positive and red for negative values of the isosurface. It is negatively correlated to the PI and, consequently, a higher inhibitory activity corresponds to a more negative value of QMafYY. As a result, there is an approximation of the red portion of the electric potential isosurface to the Y axis. This descriptor is also reasonably well correlated to both charge of the atom at position R_{3} and charge of the oxygen O13 (correlation coefficients of –0.8598 and –0.5139, respectively). The substitution at position R_{3} is the most relevant for the QMafYY descriptor, which influences the atomic charge of the oxygen O13 and, consequently, the PI. For instance, an increase in the atomic charge of the atom in R_{3}, such as in m10 and m16, leads to a higher charge in oxygen O13 and an increase in PI. Several studies^{12}^{,}^{16}^{,}^{21} of herbicides targeting the D1 protein of photosystem II have shown the importance of the hydrogen bond interaction between these herbicides and the His215 residue for their photosynthetic inhibition activity. Moreover, a voltammetric study of rubrolide analogues showed the possibility of these compounds to act as hydrogen acceptor at photosystem I, being reduced in the process.^{45} More recently, Teixeira et al.^{18} proposed a photoelectron transfer to the herbicide via His215, forming a radical anion. Based on these results and the QMafYY descriptor, it is reasonable to propose a similar photoelectron transfer to the nostoclide analogues via a hydrogen bond interaction with the oxygen O13, which is favored by an electron deficiency at the atom in position R_{3}.
The wHO02 molecular descriptor indicates the electronic density of the HOMO located at carbon C2. However, the HOMO electron densities at the other atoms are also dependent on the wHO02 descriptor. Figure 15 shows the HOMO electronic density for molecules m13 and m16, one of the least and one of the most active analogues, respectively; carbon C2 is highlighted. The electron density in this atom is greater for m13 than for m16, as it is expected by the negative correlation between the wHO02 descriptor and PI. The change in electron density is observed in both arylidene and furanone rings, reaching the methylene of the benzyl. Although wHO02 have shown to be significant to the 2D QSAR model, it was not possible to attribute a conclusive interpretation for its concrete relation to the photosynthetic inhibition of the nostoclides.
Among the 3D descriptors, the most important to the 3D QSAR model was L1, a Lennard-Jones potential energy for the NH_{3}^{+} probe located in the proximity of the aromatic ring of the arylidene portions (Figure 16). Its correlation with the PI is negative. On the other hand, the more negative the energy values of the van der Waals potential results in a more stable interaction. The L1 descriptor’s probe is positioned above the aromatic ring of the arylidene portions, which is consistent with an aromatic stacking, known to have a contribution from the van der Waals potential.^{46}
The E5 descriptor has the second highest absolute regression coefficient, it is negatively correlated to PI and its values are negative. The NH_{3}^{+} probe is located very close to carbon C16 of the benzyl group, which is one of the ortho ones. This descriptor is possibly related to the QMbYY 2D descriptor, located in the same ring, because their correlation coefficient is –0.6164 and both have electrostatic nature. Their interpretation is also similar: a more negative value of E5 leads to more negative electrostatic density close to the probe, which results in a more positive value of the QMbYY. E5 might be related to aromatic stacking interactions between the nostoclide analogues and the plastoquinone Q_{B} site, as well as the QMbYY was theorized to be. Similarly, the descriptors E3 and E4 show some correlation to QMbZZ (correlation coefficients of –0.5964 for E3, and –0.5792 for E4), thus, could be related to the aromatic stacking of the benzyl’s ring.
The E2 descriptor’s probe is in the proximity of oxygen O9 and the carbonyl C10=O13 and its values are negative, which corresponds to a negative electrostatic density. It is positively correlated to the PI, a similar behavior compared to QMafYY molecular descriptor regarding positions R_{3} and the oxygen O13. The E2 descriptor suggests a similar chemical interpretation as the QMafYY, which could be related to the photoelectron transfer to the nostoclide analogues via His215.
The probe of E1 descriptor is located between the R_{3} and R_{4} positions of the aromatic ring of the arylidene portions. This descriptor’s values vary from negative to positive depending on the substituent, but it is negatively correlated with PI. Consequently, substituents with more negative electrostatic density at positions R_{3} and R_{4} (i.e., CF_{3}, NO_{2} and F) favor the photosynthetic inhibition property. Barbosa et al.^{15} observed that these substituents are also important for photosynthetic inhibition of rubrolide analogues. L2 is a Lennard-Jones potential descriptor located close to the benzyl group, whose atoms do not present significant variance of their positions. Therefore, it was not possible to make a consistent interpretation for this molecular descriptor.
Descriptors included in the hybrid QSAR model were selected among those used to build both the 2D and the 3D QSAR models. The selection was performed using PLS regressions and analyzing which combination would give rise to a QSAR model of good quality. The molecular descriptors CO+CM, QMbYY, and E3 were selected over the QMbZZ, E4 and E5, regarding the electrostatic interactions of the benzyl group. Also, the molecular descriptor QMafYY was selected over the E2 to take into account the role of the electrostatic density of the oxygens from the furanone ring. The descriptors E1, L1 and wHO02 presented complementary importance for the hybrid QSAR model, and thus, they were all kept in the model. The descriptor L2 was excluded. As can be seen from Table 2, Figures 8 and 11, the statistics of this final model is somewhat better than the previous.
Conclusions
Experimental data of the percentage of inhibition of a set of 34 nostoclide analogues, published by Teixeira et al. in 2008, was used to build QSAR models based on 2D and 3D molecular descriptors. Although the 2D QSAR model presented higher errors in the leave-one-out cross-validation, the selected descriptors showed superior correlation coefficients with PI. Furthermore, its chemical interpretation gave some insight regarding the interactions between the analogues and the plastoquinone Q_{B} active site from photosystem II. On the other hand, the 3D QSAR model showed lower errors in the internal validations and some complementary information about the ligand-receptor interactions regarding the photosynthetic inhibitory activity of the studied compounds. The hybrid QSAR model, presenting the best statistics, highlighted the most important molecular descriptors for the PI. Analysis of the molecular descriptors used to build the QSAR models showed the major importance of electrostatic interactions for the activity of the nostoclide analogues. These interactions were observed, primarily, in the aromatic ring of the benzyl group, and associated with an aromatic stacking interaction due to the electrostatic quadrupole moment of the ring. For the studied nostoclides, strong electron withdrawing substituents, such as CF_{3} and NO_{2} (present in the several of the most actives analogues, i.e., m08-10 and m15-16), tend to increase the quadrupole moment of the aromatic ring of the benzyl group. Another important interaction proposed in this work was the photoelectron transfer to the nostoclide analogues via hydrogen bond between their carbonyl and the His215 residue of the D1 protein. Van der Waals interaction observed near the arylidene group presented correlation with the PI. The analyses of the chemical properties of the nostoclide analogues and their relation to the inhibition of the photosystem II presented important insights about the ligand-receptor interactions of these herbicides. The results obtained in this work are expected to be useful in future researches of herbicides targeting the photosystem II, especially for the further experimental development of novel nostoclide analogues.