Descriptor- and Fragment-based QSAR Models for a Series of Schistosoma mansoni Purine Nucleoside Inhibitors

The enzyme purine nucleoside phosphorylase from Schistosoma mansoni (SmPNP) is an attractive molecular target for the treatment of major parasitic infectious diseases, with special emphasis on its role in the discovery of new drugs against schistosomiasis, a tropical disease that affects about 200 million people in 74 endemic areas worldwide. In the present work, inhibitory potency was determined, and descriptor- and fragment-based quantitative structure-activity relationship (QSAR) studies were developed for a series of 9-deazaguanines that act as SmPNP inhibitors. Significant statistical parameters were obtained (descriptor-based model: r² = 0.79, q² = 0.62, r²pred = 0.52; fragment-based model: r² = 0.95, q² = 0.81, r²pred = 0.80), indicating the potential of the models for untested compounds. The fragment-based model was then used to predict the inhibitory potency of a test set of compounds, and the predicted values are in good agreement with the experimental results.


Introduction
Purine nucleoside phosphorylase (PNP, EC 2.4.2.1) plays an important role in the purine salvage pathway and has long been explored in drug design for the therapy of cancer and autoimmune diseases. 1-9 In this context, the use of selective inhibitors of S. mansoni PNP (SmPNP) can cause purine starvation, leading to the death of the parasite. Schistosomiasis is a major infectious disease that affects 200 million people in 74 endemic areas worldwide. 4,12
Vol. 22, No. 9, 2011
This scenario prompted us to investigate several 9-deazaguanine analogs, which have been described as promising SmPNP inhibitors. 10 In the present study, we collected IC50 values for a series of ground-state inhibitors of SmPNP and used the data to create descriptor- and fragment-based quantitative structure-activity relationship (QSAR) models that show substantial predictive promise. Our strategy took advantage of previous structure-based drug design (SBDD) studies that revealed essential requirements for SmPNP binding affinity and selectivity (e.g., binding to the hydrophobic pocket near Phe161, H-bonding to Tyr201). 10 The results reported herein reveal important molecular requirements for the design of new PNP inhibitors with improved potency.

Biochemical assays and data set composition
14-16 The reaction mixture contained 5 nmol L⁻¹ SmPNP (as the monomer), 50 mmol L⁻¹ phosphate buffer (K3PO4, pH 7.4), 10 µmol L⁻¹ inosine, and 40 milliunits mL⁻¹ xanthine oxidase. Uric acid formation was monitored at 293 nm, in triplicate, at 25 °C (extinction coefficient for uric acid, ε293 = 12.9 L mmol⁻¹ cm⁻¹). 15 The percentage of inhibition was calculated according to the following equation:
\[ \%\,\text{inhibition} = 100 \times \left(1 - \frac{V_i}{V_0}\right) \]
where V_i and V_0 are the initial velocities (enzyme activities) determined in the presence and in the absence of inhibitor, respectively. Compound 3, a known SmPNP inhibitor, was used as a positive control for enzyme inhibition. 10
The data set was divided into training (compounds 1-19, Table 1) and test (compounds 20-26, Table 1) sets so that both sets present structural diversity and cover the whole potency range of the data set.
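The two quantities used in the assay can be sketched in a few lines of Python. This is only an illustration: the function names and the 1 cm path length are assumptions, the percentage of inhibition is taken in its usual form, 100 × (1 − V_i/V_0), and the extinction coefficient is the value quoted above for uric acid at 293 nm.

```python
def initial_velocity(slope_abs_per_min, epsilon=12.9, path_cm=1.0):
    """Convert an absorbance slope (A293 per minute) into mmol L^-1 min^-1.

    Beer-Lambert law: A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l).
    epsilon is in L mmol^-1 cm^-1 (12.9 for uric acid at 293 nm, from the text);
    the 1 cm path length is an assumption, not a value given in the paper.
    """
    return slope_abs_per_min / (epsilon * path_cm)


def percent_inhibition(v_i, v_0):
    """Percentage of inhibition from velocities with (v_i) and without (v_0) inhibitor."""
    if v_0 <= 0:
        raise ValueError("v_0 must be positive")
    return 100.0 * (1.0 - v_i / v_0)
```

Because the same extinction coefficient and path length apply to both velocities, the percentage of inhibition can equally be computed from the raw absorbance slopes.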

Descriptor-based QSAR approach
About 2,500 2D molecular descriptors, including topological descriptors, connectivity indices, 2D autocorrelations and physicochemical descriptors, were computed using the DRAGON 5.5 software (Talete SRL, Milan, Italy) and then pre-selected as follows: descriptors with high inter-correlation (≥ 97%) or poorly related to the biological property (r² < 0.10) were discarded. This strategy yielded 218 descriptors that were employed to build multiple linear regression (MLR) models with up to 3 descriptors per model, as available in the MOBYDIGS 1.0 software (Talete SRL, Milan, Italy). The MLR models were generated by a genetic algorithm using the following fitting criteria: QUIK rule (0.005), asymptotic Q2 rule (−0.005), redundancy RP rule (0.1) and overfitting RN rule (0.01). 18 Due to the stochastic nature of the genetic algorithm, the search was carried out using ten independent populations of 100 models each, which evolved for more than 1000 generations or at least one million steps. The descriptors found in the 10 best models of each population were pooled together, autoscaled and employed to develop partial least squares (PLS) models, as implemented in the PIROUETTE 4.0 software (Infometrix, Washington, USA).
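The pre-selection step can be illustrated with a short NumPy sketch. It is not the DRAGON or MOBYDIGS procedure: the function name is hypothetical, the 97% threshold is assumed to refer to the absolute pairwise correlation coefficient, and the greedy order (most activity-related descriptor first) is a design choice made here for the example.

```python
import numpy as np


def preselect_descriptors(X, y, max_intercorr=0.97, min_r2=0.10):
    """Return the column indices of X that survive both filters.

    X: (n_compounds, n_descriptors) array; y: activities (e.g. pIC50).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Constant descriptors carry no information and break the correlation.
    candidates = [j for j in range(X.shape[1]) if np.std(X[:, j]) > 0]

    # Relevance filter: squared correlation with the biological property.
    r2_y = {j: np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in candidates}
    candidates = [j for j in candidates if r2_y[j] >= min_r2]

    # Redundancy filter: visit descriptors from most to least relevant and
    # keep one only if it is not too correlated with any descriptor kept so far.
    candidates.sort(key=lambda j: r2_y[j], reverse=True)
    selected = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_intercorr
               for k in selected):
            selected.append(j)
    return sorted(selected)
```

With a table of descriptor values for the training set, the returned indices would identify the reduced descriptor pool passed on to model building.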

Fragment-based QSAR strategy
20,21 Briefly, each molecule in the data set is broken down into several unique structural fragments (linear, branched, and overlapping), which are arranged within the bins of a fixed-length array (53 to 401 bins) to form a molecular hologram. The bin occupancies can be considered as structural descriptors encoding compositional and topological molecular information. Parameters that affect hologram generation, such as hologram length, fragment size and fragment distinction (atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch), and donor/acceptor (DA)), were evaluated during model development, using the default fragment size (4-7) over the 12 default hologram lengths. Next, the influence of fragment size was further investigated for the best models. All models generated in this study were evaluated by partial least squares (PLS) with full leave-one-out (LOO) cross-validation, which yields the cross-validated r² (q²).
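The idea of a molecular hologram can be sketched as follows. This is a simplified illustration, not the HQSAR implementation: fragment enumeration is omitted, the fragments are assumed to arrive as arbitrary strings (the ones in the usage note are hypothetical), and `zlib.crc32` is used here only as a convenient deterministic hash.

```python
import zlib
from collections import Counter


def molecular_hologram(fragments, length=53):
    """Hash fragment strings into a fixed-length array of bin counts.

    fragments: iterable of fragment identifiers (strings), one per occurrence.
    length: number of bins in the hologram (53 is the shortest length in the text).
    Different fragments may collide in the same bin; this is expected.
    """
    hologram = [0] * length
    for fragment, count in Counter(fragments).items():
        bin_index = zlib.crc32(fragment.encode("utf-8")) % length
        hologram[bin_index] += count
    return hologram
```

For example, `molecular_hologram(["C-C-N-C", "C=O", "C=O", "N-C=O"])` returns a 53-element list whose entries sum to 4. The bin counts of all training set molecules would then form the descriptor matrix for the PLS analysis.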

QSAR model validation
External validation was carried out using a test set of seven compounds that were not used in QSAR model development. The predictive ability of the models was estimated as described previously. 22
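The text defers to reference 22 for the exact procedure. As an illustration only, a commonly used definition of the predictive r² is assumed here: r²pred = 1 − PRESS/SD, where PRESS is the sum of squared prediction errors for the test set and SD is the sum of squared deviations of the test set activities from the training set mean. The function name is hypothetical.

```python
def r2_pred(y_test, y_test_pred, y_train_mean):
    """Predictive r^2 for an external test set (assumed definition, see lead-in).

    y_test: experimental activities of the test set compounds.
    y_test_pred: activities predicted by the model for the same compounds.
    y_train_mean: mean experimental activity of the training set.
    """
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_test_pred))
    sd = sum((obs - y_train_mean) ** 2 for obs in y_test)
    return 1.0 - press / sd
```

Under this definition, a model that simply predicts the training set mean for every test compound scores 0, and perfect predictions score 1.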

Results and Discussion
In the present work, a series of twenty-six structurally diverse compounds (Table 1 and Supplementary Information) was evaluated to determine the in vitro potency (IC50) through kinetic studies. As expected based on previous studies, 10,17 these are competitive inhibitors of SmPNP. For instance, double reciprocal plots of velocity as a function of substrate concentration for compounds 15 and 16 show that Vmax (given by the intercept on the 1/v0 axis) is constant at all inhibitor concentrations, whereas the apparent value of KM (x-intercept, −1/KM) changes with increasing inhibitor concentration (Figure 1). This experimental behavior is observed for all SmPNP inhibitors, whose IC50 values range from 0.1 to 200 mM, a 2000-fold range in potency.
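The signature of competitive inhibition in a double reciprocal plot follows from the standard Michaelis-Menten treatment, in which the apparent KM equals KM(1 + [I]/Ki) while Vmax is unchanged. The sketch below illustrates this; the function name and the numerical values in the usage note are hypothetical and are not data from this study.

```python
def double_reciprocal_line(vmax, km, inhibitor=0.0, ki=1.0):
    """Slope and intercepts of 1/v versus 1/[S] for a competitive inhibitor.

    1/v = (km_app / vmax) * (1/[S]) + 1/vmax, with km_app = km * (1 + [I]/ki).
    Returns (slope, y_intercept, x_intercept).
    """
    km_app = km * (1.0 + inhibitor / ki)
    slope = km_app / vmax
    y_intercept = 1.0 / vmax       # unchanged by a competitive inhibitor
    x_intercept = -1.0 / km_app    # moves toward zero as [I] increases
    return slope, y_intercept, x_intercept
```

For instance, with `vmax=2` and `km=10`, adding inhibitor at a concentration equal to `ki` doubles the slope and halves the magnitude of the x-intercept, while the y-intercept stays at 0.5.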
Although structure-activity relationships (SAR) have been widely described in the last decades for ground-state mammalian PNP inhibitors, the opposite is true for SmPNP inhibitors. Only recently were the first SAR studies reported in the literature, describing key structural requirements for SmPNP affinity and selectivity. 10,17 These studies suggest that hydrophobic interactions in the active site of SmPNP play an important role in the binding affinity of the inhibitors. In spite of its significance and usefulness, this SAR information is qualitative in nature and would gain strategic advantages in drug design through the incorporation of statistical predictive modeling capabilities. 23 In this context, QSAR models are useful tools for the quantitative analysis of the internal consistency and predictive ability of different data sets of compounds, with the advantage of revealing important molecular features associated with biological activities. 24,25
The synergy between descriptor-based and fragment-based QSAR models has been a valuable approach to boost SAR studies, due to the complementary nature of these ligand-based drug design (LBDD) strategies. 26,27 Thus, our initial efforts focused on the development of QSAR models by means of topological descriptors that account for molecular size, shape and branching through graph theoretical invariants (using the DRAGON 5.5 software). Additional information regarding molecular charge and polarizability was also considered through the weighting of the descriptors. 28
A total of 2489 descriptors were calculated, and highly inter-correlated descriptors and those conveying no information about the biological activity (constant values or r² < 0.10) were excluded from further consideration. This protocol afforded 218 descriptors that were employed to build a number of preliminary QSAR models by multiple linear regression (MLR), containing up to 3 descriptors. While the best MLR model obtained showed good internal statistical parameters (n = 19, r² = 0.82, q² = 0.78), its predictive ability was poor (r²pred = 0.17). This suggests that the chemical and structural features captured in the model do not extend beyond the chemical space of the training-set compounds, limiting its usefulness in drug design. Therefore, we resorted to more powerful statistical tools, such as PLS. For this purpose, the descriptors found in the 10 best models from each population were gathered, autoscaled and used for further independent QSAR modeling.
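The difference between the fitted r² and the leave-one-out q² quoted for the MLR model can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the MOBYDIGS procedure: ordinary least squares with an intercept is used, q² is computed as 1 − PRESS/SS with SS taken around the mean of all training activities, and the function names are hypothetical.

```python
import numpy as np


def fit_predict(X_train, y_train, X_new):
    """Fit ordinary least squares with an intercept and predict for X_new."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def r2_and_q2_loo(X, y):
    """Return (r2, q2) for an MLR model; q2 uses leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_total = np.sum((y - y.mean()) ** 2)

    # r2: every compound is predicted by a model that has already seen it.
    r2 = 1.0 - np.sum((y - fit_predict(X, y, X)) ** 2) / ss_total

    # q2: each compound is predicted by a model fitted without it.
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        prediction = fit_predict(X[mask], y[mask], X[i:i + 1])[0]
        press += (y[i] - prediction) ** 2
    q2 = 1.0 - press / ss_total
    return r2, q2
```

Here `X` is a two-dimensional array with one row per compound and one column per descriptor. Neither statistic involves the test set, which is why good internal values can coexist with a low r²pred.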
Although our initial PLS-based QSAR models showed weaker statistical parameters (r² = 0.64, q² = 0.51, 3 components), the iterative exclusion of the descriptors with the lowest contributions to the regression vector led to improved models. The final QSAR model (r² = 0.79, q² = 0.62, 2 principal components) (Table 2) showed an increased predictive ability (r²pred = 0.52) when compared to the MLR models (Figure 2 and Table 3), though still insufficient for guiding the design of more potent SmPNP inhibitors.
Thus, an analysis of the descriptors with the largest contributions to the QSAR regression vector would depict misleading structure-activity relationships that hold true only for the training set compounds. In fact, the low predictive ability of the descriptor-based QSAR models may suggest that compounds 22 and 24 are outliers. However, as can be seen below, a careful investigation indicates that their high residual values are a consequence of shortcomings of the topological descriptors, such as ineffective sampling of the chemical space of the deazapurine analogs.
As part of our strategies in medicinal chemistry, we employed the fragment-based hologram QSAR (HQSAR) approach to investigate the crucial structural features related to SmPNP inhibition. HQSAR is an interesting method for this particular study, as no 3D structural information is required (e.g., macromolecular target, putative binding information). 20,21 HQSAR investigations require the evaluation of parameters that specify the length of the hologram, as well as the size and type of the fragments to be encoded. The generation of molecular fragments was carried out using the following fragment distinctions: atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch), and donor and acceptor (DA). In order to assess the process of hologram generation, several combinations of these fragment distinctions were considered using the default fragment size (4-7) (Table 4). The patterns of fragment counts from the training set inhibitors were then related to the experimental biological data using PLS, as summarized in Table 4.
Fragment distinction has a considerable effect on the quality of the models. As can be seen in Table 4, the best statistical results among all models were obtained for models 5 (q² = 0.79, r² = 0.96, 4 components) and 8 (q² = 0.81, r² = 0.95, 4 components), which were derived using A/B/H and A/B/H/Ch as fragment distinctions, respectively. The use of other fragment distinctions in the molecular holograms did not improve the statistical quality of the models (Table 4). It is worth noting that, because data sets differ and can be highly diverse, several combinations of fragment distinctions must be considered in order to arrive at the best final HQSAR model. 29 Previously, it has been shown that an extensive H-bonding network is responsible for the binding affinity of the 9-deazaguanine derivatives in the active site of SmPNP. 10 This is in good agreement with our present studies, in which the fragment distinction H is present in both of the best models (5 and 8).
The influence of fragment size on the statistical parameters was further investigated for the two best HQSAR models (models 5 and 8, Table 4). Fragment size parameters control the minimum and maximum length of the fragments to be included in the hologram fingerprint. Table 5 summarizes the statistical results for the distinct fragment sizes used to generate the QSAR models. As can be seen, varying the fragment size did not lead to better HQSAR models; therefore, the best statistical results were obtained with the default fragment size (4-7) in both cases (A/B/H, model 5; and A/B/H/Ch, model 8).
It is important to note that the high q² values obtained for the best HQSAR models do not automatically imply that these models possess high predictive ability for external compounds. 30 The most important test of a QSAR model is its ability to predict the property value for new, structurally related compounds. The predictive power of the best HQSAR model derived using the training set molecules (model 8; fragment distinction A/B/H/Ch, fragment size 4-7) was assessed by predicting pIC50 values for the 7 test set molecules (compounds 20-26, Table 1), which were completely excluded during the training of the model. The results are listed in Table 3, and the plot of experimental vs. predicted activities for both training and test sets is displayed in Figure 3. The good agreement between experimental and predicted values for the test set compounds indicates the reliability of the constructed HQSAR model (r²pred = 0.80). The plot further shows the consistency between experimental and predicted pIC50 values for both training and test sets. The low residual values shown in Table 3 suggest that the HQSAR model obtained can be used to predict the biological activity of novel compounds within this structural class. The predicted values fall close to the experimental pIC50 values, deviating by less than 0.7 log units. The results show that the test set compounds are well predicted, without any outliers (Figure 3). On the other hand, the external prediction obtained for model 5 (r²pred = 0.71), under similar conditions, was inferior to that of model 8 (results not shown).
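To put a residual expressed in log units into perspective, the following sketch converts between IC50 and pIC50 and expresses a residual as a fold error in IC50. It assumes the usual definition pIC50 = −log10(IC50), with IC50 in mol L⁻¹; the function names are hypothetical.

```python
import math


def pic50(ic50_molar):
    """pIC50 from an IC50 value given in mol L^-1 (assumed definition)."""
    return -math.log10(ic50_molar)


def fold_error(residual_log_units):
    """Factor by which the predicted IC50 differs from the experimental one."""
    return 10 ** abs(residual_log_units)
```

Under this definition, a residual of 0.7 log units corresponds to a predicted IC50 within about a factor of five of the experimental value.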
Useful fragment-based QSAR models should not only have statistical quality and predictive power, but also provide hints about which molecular fragments may be important for activity. Usually, the interpretation of the descriptors found in QSAR equations gives some clues about the key electronic and steric components that are essential for the biological property. In addition, HQSAR has the advantage of offering an alternative and easier way to analyze the individual atomic contributions through a visual assessment of the different molecules of the data set. During the HQSAR analysis, the molecules can be colored to reflect the contribution of each atom (e.g., positive, neutral or detrimental) to the biological activity of interest. The colors reflecting poor contributions are at the red end of the spectrum (red, red-orange, and orange), while the colors reflecting favorable contributions are at the green end (yellow, green-blue, and blue). Atoms colored white reflect neutral contributions. 31
Surprisingly, comparison of the contribution maps of compounds 14, 15 and 26 reveals that the purine ring might have opposing effects on potency (Figure 4). This result can be explained by the H-bonding requirements in the SmPNP active site. On one hand, it has been proposed that compounds possessing aryl groups in the 9 position of the purine ring (such as 15 and 26) can reach the hydrophobic pocket in the vicinity of Phe161. 17 On the other hand, 9-substituted compounds with shorter and non-planar chains can bind loosely, being easily displaced by water molecules. Taken together, these observations clarify the opposite role of the fragments of compound 14 in the H-bonding to Asn245 and Glu203 (colored reddish, poor H-bonding capability) in comparison with the corresponding fragments in compounds 15 and 26 (colored green, stronger H-bonding network).

Conclusions
In spite of the urgent need for novel drugs for tropical infectious diseases, the investments in research and development (R&D) have been inadequate, as a consequence of the lack of interest shown by the major pharmaceutical and biotechnological companies. As a result, most of the efforts devoted to the area of neglected diseases are found in academia and non-governmental organizations, through public-private partnerships. 32 However, the main focus is on the early efforts to identify good targets or new leads for individual diseases, leaving a crucial gap in the current research and development pipeline. In this work, we have generated descriptor- and fragment-based QSAR models for a series of 9-deazaguanines as potent inhibitors of SmPNP, showing high internal and external consistency. In addition, the fragment-based model exhibited high predictive power for new compounds within this structural class. The molecular information gathered in this study should be useful for future efforts in the design of new inhibitors having increased affinity and selectivity.
in reference 10; b biological data available in reference 17.

Figure 2. Plot of predicted vs. experimental values of pIC50 for the 26 SmPNP inhibitors (training and test sets) according to the 2D descriptor-based QSAR model.

Figure 3. Plot of predicted vs. experimental values of pIC50 for the 26 SmPNP inhibitors (training and test sets) for the best HQSAR model (A/B/H/Ch).

Table 1. Chemical structure and biological activity of deazaguanine analogs employed in QSAR model development

Table 2. Descriptors considered in the final QSAR model
... 1 of Burden matrix / weighted by atomic van der Waals volumes
BEHv6: highest eigenvalue n. 6 of Burden matrix / weighted by atomic van der Waals volumes
BELv6: lowest eigenvalue n. 6 of Burden matrix / weighted by atomic van der Waals volumes
GGI10: topological charge index of order 10
SEigv: eigenvalue sum from van der Waals weighted distance matrix
SEige: eigenvalue sum from electronegativity weighted distance matrix

Table 3. Predicted pIC50 values according to the descriptor-based and fragment-based QSAR models

Table 5. Influence of different fragment sizes on the statistical parameters of the two best fragment-based models