Molecular Graphics-Structural and Molecular Graphics Descriptors in a QSAR Study of 17-α-Acetoxyprogesterones

In this work, a study of quantitative relationships between structure and biological activity was carried out for 21 orally administered progesterone derivatives, 19 of which are 17-α-acetoxyprogesterones. The partial least squares method was used to build regression models of good quality, with Q = 0.707 and R = 0.811, using two principal components and four descriptors. Most of the molecular descriptors were obtained from molecular graphics of geometries optimized by ab initio calculations at the DFT 6-31G** level (molecular graphics descriptors). The other descriptors were obtained by combining the former descriptors with experimental structural parameters extracted from the progesterone receptor-progesterone complex (molecular graphics-structural descriptors, or molecular graphics and modeling descriptors). Regression models employing only five molecular graphics descriptors and three principal components were satisfactory, Q = 0.556 and R = 0.718, demonstrating the usefulness of these descriptors in QSAR studies. In this study of progesterone derivatives, it became evident that the molecular graphics descriptors successfully described the conformational, steric and electronic effects of the different substituents.


Introduction
Progestogens, which are progesterone derivatives, are now widely known as oral contraceptives. Beyond contraception, health research [1][2][3][4] (hormone replacement and anti-cancer therapies, gynecological disorders, etc.) and veterinary practice (animal birth control) 5,6 are today the two most promising areas of progestogen application. It is difficult to obtain an entirely clear picture of progestogen behavior at the atomic level because progestogen activity data are neither abundant nor homogeneous. Progesterone derivatives have been the target of various Structure-Activity Relationship (SAR) and Quantitative SAR (QSAR) studies for the past four decades. 7 Progesterone (Figure 1), although it has a relatively simple structure, is a rather complicated molecular system. Researchers therefore had to confront the difficulty of quantifying progestogen molecular properties without knowing the 3D receptor-drug structure, and also had to develop appropriate methods for treating the nonlinearity of steroid QSAR. 8 Recently, the crystal structure of the progesterone receptor (PR)-progesterone complex 9 made it possible to explain mutations at the atomic level 10 and to perform more advanced drug design.
This work continues the SAR 11,12 and QSAR 13 approach of relating molecular descriptors of 21 oral progesterones, 19 of which are 17-α-acetoxyprogesterones (Figure 2, Table 1), to their oral progestational activity (relative to norethisterone, IC) 14 at the level of prediction, using Partial Least Squares (PLS) 15 regression models. In previous work 13 various classes of molecular descriptors were calculated (a priori, 16 computed at the DFT ab initio level, and some molecular graphics-based descriptors), and PLS models were constructed and validated. It was observed that the new class of descriptors, the molecular graphics-based descriptors, contributed significantly to the PLS models, but this finding has not yet been studied more extensively.
High-quality 2D projections of molecules or molecular aggregates obtained by current molecular graphics techniques can be an extensive source of quantitative information on molecular properties. In general, quantities directly "measured" from pictures using some digital or analogue technique can be 1D (linear, like molecular dimensions), 2D (surface areas of various molecular fragments projected onto the plane of projection or screen) or 3D (such as molecular volume in some cases). Such measured descriptors, and their combinations or functions, can be named molecular graphics descriptors. Combining these descriptors with structural information from other sources (such as data from experimental structure determination or molecular modeling) yields composite functions which can be called molecular graphics-structural descriptors or molecular graphics and modeling descriptors. Both classes of descriptors can be global (describing the entire molecule) or local (related to some molecular fragment). Using molecular graphics descriptors and structural information from PR-progestogen complex modeling 13 based on the crystal structure of the PR-progesterone complex, 9 three sets of molecular graphics-structural descriptors were generated in this work. PLS models were built and validated for each data set, and the activity of three progestogens was predicted (Figure 2). Finally, two composite descriptors, the unweighted 3D-Morse signals 4 and 11, 17 were added to the data set, and new PLS models were constructed and validated and the predictions were repeated. The meaning of the molecular graphics-based descriptors, as well as their usefulness, is discussed. The main goal of the regression analysis is to estimate the predictive power of PLS models based exclusively on molecular graphics-based descriptors, or on these together with other types of molecular descriptors.

Molecular graphics descriptors
In previous work 13 the molecular geometries of progestogens 1-24 were optimized at the DFT 6-31G** level, and high-quality figures of the molecules were constructed by viewing them along the C6(sp2)-substituent or C6(sp3)-α-substituent bond. Projected surface areas were measured by an analogue, empirical method 16 as shown in Figure 3: the projected surface area of the substituent at C6 (S 6 including, and S 6 ' excluding, hydrogens), and the projected surface area S of the atoms or groups describing the structural variations of the set 1-21 (H6β and substituents at C1, C2, C6-β, C9-C13, C21). S 6 ' was set to zero for the C6-α hydrogen atom. The choice of the surfaces S 6 and S as molecular descriptors seemed reasonable, as their structural variations are in accordance with the induced fit model. 7 Most of the substitutions in 1-21 are at position C6-α, and it was observed in a preliminary study 13 that the biological activity of this set of compounds is a quadratic function of the substituent size. On the other hand, groups at positions C10, C13 or C21 also affect the activity. The third phenomenon to be taken into account is saturation, i.e. the presence of the double bonds C1=C2, C4=C5 and/or C6=C7, as was observed almost four decades ago. 14 All these phenomena define the active conformation of the steroids, as Zeelen concluded more than two decades ago. 18 Molecular graphics observations on the studied compounds 13 suggested that most of these structural changes can be better viewed along certain directions. The following composite molecular graphics descriptors, P 1 -P 4 (equations 1-4), were calculated: where w a = c  2.
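The idea of a projected surface area can be illustrated with a small digital sketch. The source measured S 6, S 6 ' and S by an analogue, empirical method, so the grid-counting approach below is a swapped-in alternative and not the authors' procedure. It assumes that each atom appears in the projection as a disc of its van der Waals radius; the function name `projected_area` and all coordinates are hypothetical.

```python
import math

def projected_area(discs, step=0.02):
    """Approximate the area (in A^2) of the union of discs given as (x, y, r).

    The bounding box is divided into square cells of side `step`; a cell
    counts as covered when its centre lies inside at least one disc, so
    overlapping atoms are counted only once.
    """
    x_min = min(x - r for x, y, r in discs)
    x_max = max(x + r for x, y, r in discs)
    y_min = min(y - r for x, y, r in discs)
    y_max = max(y + r for x, y, r in discs)
    nx = int(math.ceil((x_max - x_min) / step))
    ny = int(math.ceil((y_max - y_min) / step))
    covered = 0
    for i in range(nx):
        cx = x_min + (i + 0.5) * step
        for j in range(ny):
            cy = y_min + (j + 0.5) * step
            if any((cx - x) ** 2 + (cy - y) ** 2 <= r * r for x, y, r in discs):
                covered += 1
    return covered * step * step

# Hypothetical projection of a methyl substituent viewed along the C6-C bond:
# one carbon disc (1.70 A) and three hydrogen discs (1.20 A).
carbon = [(0.0, 0.0, 1.70)]
hydrogens = [(1.0, 0.0, 1.20), (-0.5, 0.87, 1.20), (-0.5, -0.87, 1.20)]

s6 = projected_area(carbon + hydrogens)   # analogue of S 6 (hydrogens included)
s6_prime = projected_area(carbon)         # analogue of S 6 ' (hydrogens excluded)
print(f"S6 = {s6:.2f} A^2, S6' = {s6_prime:.2f} A^2")
```

A finer `step` gives a more accurate area at the cost of more cells; the default here is only a convenient illustration value.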

Molecular graphics-structural descriptors
A preliminary molecular graphics and modeling study of PR-progestogen complexes 13 including compounds 1-21 revealed that the nonlinear character of progestogen activity is mainly related to the steric relationship between the substituent at C6 (especially the substituent atom bound to C6) and the sulfur atom of the methionine 801 residue (Figure 4). The most appropriate substituents at C6 are Cl and CH3, while small (H, F) and large (Br) ones reduce the activity. Electronic relationships were not so clearly observed, but the polar NO2 group, which is of the appropriate size to fit in the hole between S(Met801) and C6, significantly reduces the activity with respect to that of 1. S(Met801) participates in interactions with the C7-H2 group, which can be disturbed by large C6 substituents. On the other hand, the hydrophobic Met801 residue prefers to interact with substituents of similar hydrophobicity, i.e. with non-polar or slightly polar groups such as CH3 or the heavier halogens. The steric effects of C6 substituents were incorporated into new molecular graphics-structural descriptors in the following way. Interatomic distances between S(Met801) and the atoms of the C6 substituent were measured: 13 D XS, the X…S distance (X is the substituent atom covalently bound to C6); D YS, the Y…S distance (Y is the substituent atom closest to the Met801 sulfur); and D ZS, which equals D XS.
Van der Waals radii determined by Bondi 18 were used: R X, the vdW radius of atom X; R Y, the vdW radius of Y; R Z, equal to R X except for the CH3 group (2.0 instead of 1.70 Å); and R S = 1.80 Å for S. The measure of S-X,Y proximity, ∆ (equation 5), was calculated for T = X, Y, Z. The three sets of D TS and R T values and the experimental biological activities 11 are given in Table 2. In this way, the steric effect of the substituent was linearized. Six molecular graphics-structural descriptors, M 1T -M 4T , P 5T and P 6T (equations 6-11), were calculated for each data set.
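The proximity measure described above can be illustrated with a short sketch. The exact form of equation 5 is not reproduced in this text, so the code assumes the simplest reading, ∆ T = D TS - (R T + R S), i.e. the measured distance minus the sum of the van der Waals radii, and takes its absolute value as the linearized quantity. The 0.2 Å tolerance, R S = 1.80 Å, the carbon radius of 1.70 Å and the methyl radius of 2.0 Å come from the text; the remaining radii are Bondi values, and the distances in the example are hypothetical.

```python
R_S = 1.80        # vdW radius of sulfur in A (value quoted in the text)
R_METHYL = 2.0    # effective radius used for the CH3 group in set Z (from the text)
BONDI = {"H": 1.20, "F": 1.47, "Cl": 1.75, "Br": 1.85, "C": 1.70}  # Bondi radii, A

def proximity(d_ts, r_t, r_s=R_S):
    """Assumed form of equation 5: distance minus the sum of vdW radii."""
    return d_ts - (r_t + r_s)

def abs_proximity(d_ts, r_t, r_s=R_S):
    """Absolute deviation from ideal vdW contact (the linearized quantity)."""
    return abs(proximity(d_ts, r_t, r_s))

def classify(d_ts, r_t, tol=0.2):
    """Label the S...T relationship using the 0.2 A tolerance from the text."""
    delta = proximity(d_ts, r_t)
    if delta > tol:
        return "no contact"
    if delta < -tol:
        return "steric clash"
    return "contact"

# Hypothetical S(Met801)...X distances in A, for illustration only.
for atom, distance in [("H", 4.00), ("Cl", 3.55), ("Br", 3.00)]:
    r = BONDI[atom]
    print(atom, round(proximity(distance, r), 2), classify(distance, r))
```

Under this assumption both too-small and too-large substituents give a large |∆|, which is how a quadratic size-activity dependence can be turned into an approximately linear one.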

Partial Least Squares models
Three data sets, I, II and III, were generated such that the descriptors S 6 , S, S 6 ' and P 1 -P 4 were common to all of them. In addition, the descriptors M 1T -M 4T , P 5T and P 6T (for T = X, Y and Z) were calculated for data sets I, II and III, respectively. Variable selection, validation of the models (leave-one-out cross-validation) and prediction were performed using the Pirouette software. 20 The data sets were treated independently. The main purpose of this data analysis was to evaluate the binding power and usability of molecular graphics and molecular graphics-structural descriptors for progestogen QSAR. Finally, knowing that these descriptors cannot entirely describe the 17-α-acetoxyprogesterone activity, two more descriptors calculated previously, 13 the unweighted 3D-Morse signals 4 (M 04 ) and 11 (M 11 ), 17 were added to each data set. The final models were compared with the molecular graphics-based models.
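The modeling workflow described here (a PLS regression validated by leave-one-out cross-validation) can be sketched in a few lines. The source used the Pirouette software, so the following is only an illustrative stand-in: a plain-Python PLS1 fitted by the NIPALS algorithm on mean-centered data, with no autoscaling. The function names and the small data matrix are hypothetical and do not reproduce any descriptor values from the paper.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS) on mean-centered data."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X space of maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y loading
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict y for one sample by passing it through each component in turn."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

def loo_predictions(X, y, n_components):
    """Leave-one-out: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        preds.append(pls1_predict(model, X[i]))
    return preds

# Hypothetical descriptor matrix (rows: compounds) and activities.
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y_demo = [0.9, 4.2, 1.8, 6.1, 3.2, 8.9]
print([round(v, 2) for v in loo_predictions(X_demo, y_demo, 1)])
```

Each compound is predicted by a model that never saw it, which is what makes the cross-validated statistic a measure of predictive rather than fitting quality.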

Results and Discussion
Correlations of all molecular descriptors with the biological activity (log IC) are presented in Table 3. PLS models for data sets I, II and III and predictions for 22-24 are in Table 4; the models are named (a) when using all molecular graphics-based descriptors, (b) after variable selection of these descriptors, (c) all descriptors from (a) plus two 3D-Morse signal descriptors, (d) models analogous to the best model from our previous work 13 which is presented in Figure 5 (Id).
The correlation coefficients in Table 3 reveal interesting structure-activity relationships. The correlations for S, S 6 and S 6 ' are low, moderate and high, respectively. In other words, the activity of 1-21 is determined more by the C6 substituents than by any others (at the C18, C19 and C21 substitution sites), which is expected since the active site hole of PR, even after complexation with progesterone, has its largest unoccupied space around C6. 13 Excluding hydrogens, as in S 6 ', proved to be an even better choice, as hydrogens can be considered soft atoms, so that methyl and ethyl groups can be approximated as one or two carbon atoms. Weighted and unweighted linear combinations of descriptors, P 1 -P 4 (equations 1-4), result in better descriptors, which is in accordance with the above-mentioned induced fit model, in which all the changes (substitutions, saturations, etc.) together determine the active conformation of a progestogen. The M 1 -M 4 descriptors (equations 6-9) represent an extended weighting scheme for the linear combination of S and S 6 '; variable D includes information about the proximity between the sulfur of the Met801 residue and some substituent atom. If the distance between the atoms is much greater than the sum of their van der Waals radii (usually with a 0.2 Å tolerance), the atoms are not in contact and so no interaction occurs. If the atoms penetrate each other beyond this tolerance, electron correlation would set them at an equilibrium distance, interfering with other drug-protein interactions; in other words, the steric effect would reduce the activity. This substituent size-activity relationship proved to be strictly quadratic at substitution sites C6 and C21. 13 Thus the absolute value of ∆ (equation 5) would linearize this effect. M 1 -M 4 do not seem to be better than S 6 ' and P 1 -P 4 (Table 3). Inclusion of ∆ in P 5 and P 6 (equations 10 and 11) gives new variables which are much better than P 1 -P 4 and even more suitable than S 6 ' (correlation coefficients reaching 0.79). One could conclude a priori that S 6 ', P 5 and P 6 would be the best variables for QSAR models, but this should be examined more precisely through variable selection.
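The screening step discussed above, in which each descriptor is compared by its correlation with log IC, can be sketched as follows. This is a minimal illustration assuming the Pearson correlation coefficient is the quantity tabulated; the descriptor names reuse the paper's symbols, but every number below is hypothetical and none is taken from Table 3.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_descriptors(table, activity):
    """Sort descriptors by the absolute value of their correlation with activity."""
    scores = {name: pearson(values, activity) for name, values in table.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical descriptor columns and log IC values for five compounds.
log_ic = [-1.0, -0.2, 0.3, 0.9, 1.4]
descriptors = {
    "S": [10.1, 9.8, 10.4, 9.9, 10.2],
    "S6": [2.0, 4.5, 4.0, 6.5, 6.0],
    "S6'": [0.0, 2.1, 3.0, 4.2, 5.1],
}
for name, r in rank_descriptors(descriptors, log_ic):
    print(f"{name}: r = {r:+.2f}")
```

A univariate ranking of this kind only suggests candidate variables; as the text notes, the final choice still has to come from variable selection within the multivariate model.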
The PLS study presented in Table 4 suggests which descriptors are the most appropriate for progestogen QSAR, and also shows that models of good quality can be achieved using only these molecular graphics-based descriptors. Two cases should be distinguished among the twelve models: models based only on molecular graphics and molecular graphics-structural descriptors, and models including the 3D-Morse variables. In the first case, data compression yields three significant Principal Components (models Ia, IIa, IIIa) and reaches R = 0.841, Q = 0.695 (model IIa). After variable selection, further compression is achieved for Ib and IIIb but not for IIb; these models include two more variables than those suggested above (P 1 and P 3 prove to be the best descriptors, Table 3). These models are not much better than the previous ones (R = 0.847, Q = 0.746), but they nicely illustrate the meaning and usefulness of molecular graphics-based descriptors: five of them are necessary for a QSAR model, and they can describe PR-progestogen binding as a 2D phenomenon. These facts encourage the search for other types of descriptors which can bring new information on molecular properties, so that new PLS models would be more quantitative. Models Ic, IIc and IIIc are such an attempt. The best values are R = 0.932 and Q = 0.829 (IIc), and the standard error of prediction (SEP) is reduced with respect to the previous models. On the other hand, these models use three or more PCs and all the variables. Descriptors M 04 and M 11 contain some information in common with the molecular graphics-based descriptors, so only four descriptors (S 6 ', P 5 , M 04 , M 11 ) are sufficient to build a good PLS model (Id, IId, IIId) with only two PCs (the best values being R = 0.909, Q = 0.845 in model IId). These three models seem to be the best ones for QSAR studies. The prediction of activities for 22-24, compared with expectations 13 (22: non-active due to the lack of Me at C18 and C19; 23: less active than its chlorine analogue; 24: highly active due to Et placed to the left of S from Met801), is an additional criterion in the search for the best models. Molecule 22 should be even less active than 2 (IC<0.07), the activity of 23 should lie between those of 19 and 21 (IC=23 to 50), and 24 is expected to be far more active than 10 (IC>3) and even more active than 21 (IC>50). Models Ic, IIc and IIIc predicted the activities in the decreasing order 24 - 22 - 23 instead of the expected 24 - 23 - 22. Considering all the parameters, Id (Figure 5), IId and IIId can be used as the best PLS models. Of course, when only molecular graphics-based descriptors are used, Ia, IIa and IIIa are the recommended models.
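The validation statistics compared above can be made concrete with a short sketch. The text does not define how Q and SEP were computed, so the code below uses one common convention as an assumption: a cross-validated coefficient Q² = 1 - PRESS/SS, where PRESS is the sum of squared leave-one-out prediction errors and SS is the total sum of squares about the mean, and SEP = sqrt(PRESS/n). Pirouette's exact definitions may differ; the function names and numbers are hypothetical.

```python
import math

def press(observed, predicted):
    """Predictive residual sum of squares."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def q_squared(observed, cv_predicted):
    """Assumed convention: Q^2 = 1 - PRESS / total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_total = sum((o - mean) ** 2 for o in observed)
    return 1.0 - press(observed, cv_predicted) / ss_total

def sep(observed, cv_predicted):
    """Assumed convention: SEP = sqrt(PRESS / n)."""
    return math.sqrt(press(observed, cv_predicted) / len(observed))

# Hypothetical observed log IC values and leave-one-out predictions.
obs = [1.0, 2.0, 3.0, 4.0]
cv = [1.1, 1.9, 3.2, 3.8]
print(f"Q^2 = {q_squared(obs, cv):.3f}, SEP = {sep(obs, cv):.3f}")
```

Applying the same functions to fitted rather than cross-validated predictions gives the corresponding fitting statistic, which is why a model can show a high R and a noticeably lower Q.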

Conclusions
New molecular graphics-based descriptors were defined and calculated for 24 progestogens in a QSAR study. Biological activities (oral progestational activities relative to norethisterone) were calculated for 1-21 and predicted for 22-24 employing various PLS models. The best PLS models include molecular graphics-based and 3D-Morse descriptors, and reproduce the biological activities of 1-21 satisfactorily. The PLS models show that molecular graphics and molecular graphics-structural descriptors, although they make the prevalent contribution, are not sufficient on their own to build a high-quality PLS model. The chemical meaning of molecular graphics-based descriptors for progestogens is fully understandable in terms of the induced fit model. The prediction of activity for 22-24 using the best models is in accord with expectations.

Figure 4. PR-6α-chloro-progesterone complex at the active site of the protein. The closest amino-acid residues are shown. The drug (dark) was modeled by replacing 6α-H of progesterone by a chlorine atom in the crystal structure of the PR-progesterone complex (PDB code: 1A28 9 ). Sulfur from methionine 801 (light) is in close contact with the Cl atom of the drug.

Figure 3. Definition of molecular graphics descriptors S 6 and S for molecule 23.

Figure 5. PLS plot for model Id.

Table 3. Correlation coefficients between the biological activity (log IC) and molecular descriptors