
Journal of the Brazilian Chemical Society

Print version ISSN 0103-5053

J. Braz. Chem. Soc. vol.22 no.9 São Paulo Sept. 2011



Descriptor- and fragment-based QSAR models for a series of Schistosoma mansoni purine nucleoside inhibitors



Humberto F. Freitas (I); Matheus P. Postigo (II); Adriano D. Andricopulo (II, *); Marcelo S. Castilho (I, *)

(I) Departamento do Medicamento, Faculdade de Farmácia, Universidade Federal da Bahia, 40170-115 Salvador-BA, Brazil
(II) Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, 13566-970 São Carlos-SP, Brazil




The enzyme purine nucleoside phosphorylase from Schistosoma mansoni (SmPNP) is an attractive molecular target for the treatment of major parasitic infectious diseases, with special emphasis on its role in the discovery of new drugs against schistosomiasis, a tropical disease that affects millions of people worldwide. In the present work, we have determined the inhibitory potency and developed descriptor- and fragment-based quantitative structure-activity relationships (QSAR) for a series of 9-deazaguanine analogs as inhibitors of SmPNP. Significant statistical parameters (descriptor-based model: r² = 0.79, q² = 0.62, r²pred = 0.52; fragment-based model: r² = 0.95, q² = 0.81, r²pred = 0.80) were obtained, indicating the potential of the models for untested compounds. The fragment-based model was then used to predict the inhibitory potency of a test set of compounds, and the predicted values are in good agreement with the experimental results.

Keywords: purine nucleoside phosphorylase, schistosomiasis, fragment-based, descriptors, QSAR






Purine nucleoside phosphorylase (PNP, EC) plays an important role in the purine salvage pathway and has long been explored in drug design for the therapy of cancer and autoimmune diseases.1 More recently, PNP has also been investigated as a potential target for the treatment of parasitic infectious diseases such as malaria and schistosomiasis.2-4 In particular, Schistosoma mansoni, one of the etiologic agents of human schistosomiasis, lacks the de novo pathway for purine biosynthesis and depends entirely on the salvage pathway for the purines it needs to synthesize RNA and DNA.5-9 Selective inhibitors of S. mansoni PNP (SmPNP) could therefore cause purine starvation, leading to the death of the parasite. Schistosomiasis is a major infectious disease that affects 200 million people in 74 endemic areas worldwide.4 Praziquantel, the only effective drug for its treatment, has been in use for more than two decades, and significant resistance has emerged in different geographic regions.10-12

This scenario prompted us to investigate several 9-deazaguanine analogs, which have been described as promising SmPNP inhibitors.10 In the present study, we have collected values of IC50 for a series of ground-state inhibitors of SmPNP and used the data to create descriptor- and fragment-based quantitative structure-activity relationship (QSAR) models which show substantial predictive promise. Our strategy took advantage of previous structure-based drug design (SBDD) studies that revealed essential requirements for SmPNP binding affinity and selectivity (e.g., binding to the hydrophobic pocket near Phe161, H-bonding to Tyr201).10 The results reported herein revealed important molecular requirements for the design of new PNP inhibitors with improved potency.



Biochemical assays and data set composition

The data set of twenty-six SmPNP inhibitors (1-26, Table 1) employed in this work consists of one guanine (18), one 9-substituted guanine (21), one 9-substituted oxadiazolo-guanine (23) and several 9-substituted 9-deazaguanine derivatives, kindly supplied by BioCryst Pharmaceuticals Inc. Kinetic measurements were carried out spectrophotometrically on a Cary 100 UV-Vis spectrophotometer, using a standard coupled assay as previously described.10,13-16 The reaction mixture contained 5 nmol L⁻¹ SmPNP (as the monomer), 50 mmol L⁻¹ phosphate buffer (K₃PO₄, pH 7.4), 10 µmol L⁻¹ inosine and 40 milliunits mL⁻¹ xanthine oxidase. Uric acid formation was monitored at 293 nm, in triplicate, at 25 °C (extinction coefficient for uric acid, ε293 = 12.9 L mmol⁻¹ cm⁻¹).15 The percentage of inhibition was calculated according to the following equation:

$$\%\ \text{inhibition} = 100 \times \left(1 - \frac{V_i}{V_0}\right)$$

where Vi and V0 are the initial velocities (enzyme activities) determined in the presence and in the absence of inhibitor, respectively. Compound 3, a known SmPNP inhibitor, was used as a positive control for enzyme inhibition.10 IC50 values (the concentration of compound required for 50% inhibition of SmPNP) were determined independently for the whole series of inhibitors from rate measurements at no fewer than six inhibitor concentrations. The type of inhibition was determined for a subset of potent inhibitors as described previously. All kinetic parameters were obtained by nonlinear regression of the collected data with the SigmaPlot enzyme kinetics module, and the values represent means of at least three independent experiments. IC50 values for inhibitors 1-3, 5-9, 11, 12, 14 and 19-26, measured at 10 mmol L⁻¹ inosine, are in good agreement with those previously described,17 whereas comparable values are not available for the other inhibitors of the data set.

The chemical structures of all SmPNP inhibitors used in the modeling studies were built in the SYBYL 8.0 package (Tripos Inc., St. Louis, USA), and their energies were computed in a single-point calculation using the AM1 semiempirical method (keywords: 1SCF XYZ ESP NOINTER SCALE=1.4 NSURF=2 SCINCR=0.4 NOMM) as implemented in the MOPAC module. A hierarchical cluster analysis (HCA), carried out with Pirouette 4.0 (Infometrix, Washington, USA) using complete linkage and Euclidean distances, guided the division of the complete data set into training (compounds 1-19, Table 1) and test (compounds 20-26, Table 1) sets, so that both sets are structurally diverse and span the full potency range of the data set.
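The kinetic and clustering steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SigmaPlot or Pirouette procedure: the IC50 fit assumes a simple single-site model, Vi/V0 = 1/(1 + [I]/IC50), fitted by a one-parameter least-squares (golden-section) search on log10(IC50), and the complete-linkage clustering runs on hypothetical descriptor vectors. All function names and inputs are illustrative.

```python
import math


def percent_inhibition(v_i, v_0):
    """% inhibition = 100 * (1 - Vi/V0); Vi with inhibitor, V0 without."""
    return 100.0 * (1.0 - v_i / v_0)


def fit_ic50(conc, frac_activity, lo=1e-3, hi=1e4, iters=200):
    """Least-squares IC50 for the ASSUMED model Vi/V0 = 1 / (1 + [I]/IC50).

    Golden-section search on log10(IC50) between lo and hi; conc and the
    returned IC50 share the same concentration unit.
    """
    def sse(log_ic50):
        ic50 = 10.0 ** log_ic50
        return sum((f - 1.0 / (1.0 + c / ic50)) ** 2
                   for c, f in zip(conc, frac_activity))

    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse(c) < sse(d):   # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                 # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 10.0 ** ((a + b) / 2.0)


def complete_linkage(points, n_clusters):
    """Agglomerative HCA with complete linkage on Euclidean distances."""
    clusters = [[i] for i in range(len(points))]

    def linkage(ca, cb):
        # complete linkage = largest pairwise distance between two clusters
        return max(math.dist(points[i], points[j]) for i in ca for j in cb)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] += clusters[y]  # merge the closest pair
        del clusters[y]             # y > x, so index x stays valid
    return clusters
```

The clusters would then be sampled so that training and test sets both cover the structural and potency range; the exact selection rule is not described in the text, so none is implemented here.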

Descriptor-based QSAR approach

About 2,500 2D molecular descriptors, including topological descriptors, connectivity indices, 2D autocorrelation descriptors and physicochemical descriptors, were computed with the DRAGON 5.5 software (Talete SRL, Milan, Italy) and then pre-selected as follows: descriptors with high pairwise inter-correlation (> 0.97) or poorly related to the biological property (r² < 0.10) were discarded. This strategy yielded 218 descriptors, which were employed to build multiple linear regression (MLR) models with up to 3 descriptors per model, as available in the MOBYDIGS 1.0 software (Talete SRL, Milan, Italy). The MLR models were generated by a genetic algorithm using the following fitting criteria: QUIK rule (0.005), asymptotic Q² rule (-0.005), redundancy RP rule (0.1) and overfitting RN rule (0.01).18 Because of the stochastic nature of the genetic algorithm, the search was carried out with ten independent populations of 100 models each, which evolved for more than 1000 generations or at least one million steps. The descriptors found in the 10 best models of each population were pooled, autoscaled and used to develop partial least squares (PLS) models in the Pirouette 4.0 software (Infometrix, Washington, USA).
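The two pre-selection filters can be sketched as follows. This is an illustrative pure-Python version, not DRAGON or MOBYDIGS: it assumes the 97% inter-correlation cutoff refers to the absolute pairwise Pearson r, and, as a hypothetical tie-breaking rule, keeps the member of a correlated pair that correlates better with activity. Autoscaling is included because the pooled descriptors were autoscaled before PLS.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: carries no information
    return sxy / math.sqrt(sxx * syy)


def prefilter(columns, activity, r2_min=0.10, r_max=0.97):
    """Indices of descriptor columns surviving both filters."""
    # 1) drop constant descriptors and those with r^2 < r2_min vs. activity
    relevant = [j for j, col in enumerate(columns)
                if pearson_r(col, activity) ** 2 >= r2_min]
    # 2) greedy redundancy filter, best-correlated descriptors first (assumed rule)
    relevant.sort(key=lambda j: -abs(pearson_r(columns[j], activity)))
    kept = []
    for j in relevant:
        if all(abs(pearson_r(columns[j], columns[k])) <= r_max for k in kept):
            kept.append(j)
    return sorted(kept)


def autoscale(col):
    """Mean-center and divide by the sample standard deviation."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
    return [(v - m) / sd for v in col]
```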

Fragment-based QSAR strategy

Statistical HQSAR modeling was carried out as previously described.19-21 Briefly, each molecule in the data set is broken down into unique structural fragments (linear, branched and overlapping), which are hashed into the bins of a fixed-length array (53 to 401 bins) to form a molecular hologram. The bin occupancies can be regarded as structural descriptors encoding compositional and topological molecular information. Parameters that affect hologram generation, such as hologram length, fragment size and fragment distinction (atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch) and donor/acceptor (DA)), were evaluated during model development, using the default fragment size (4-7 atoms) over the 12 default hologram lengths. The influence of fragment size was then further investigated for the best models. All models generated in this study were evaluated by PLS with leave-one-out (LOO) cross-validation, expressed as the cross-validated correlation coefficient q².
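How fragments are folded into a hologram can be illustrated with a toy example. This is a simplified sketch, not the SYBYL HQSAR implementation: it enumerates only linear (unbranched) paths of 4-7 atoms, uses element symbols as the sole fragment distinction (roughly the "atoms" option), and assigns bins with a CRC32 checksum modulo the hologram length. The molecule encoding (atom labels plus adjacency lists) and the hash choice are assumptions.

```python
import zlib


def linear_fragments(atoms, adj, min_size=4, max_size=7):
    """Labels of all linear fragments with min_size..max_size atoms.

    atoms: element symbol per atom; adj: neighbour indices per atom.
    Overlapping fragments are all kept; branched ones are NOT generated.
    """
    found = {}

    def extend(path):
        if len(path) >= min_size:
            key = min(tuple(path), tuple(reversed(path)))  # one entry per path
            labels = [atoms[i] for i in path]
            found[key] = min("-".join(labels), "-".join(reversed(labels)))
        if len(path) == max_size:
            return
        for nb in adj[path[-1]]:
            if nb not in path:  # simple paths only
                extend(path + [nb])

    for start in range(len(atoms)):
        extend([start])
    return list(found.values())


def hologram(atoms, adj, length=53, min_size=4, max_size=7):
    """Fold fragment counts into `length` bins (bin = CRC32 mod length)."""
    bins = [0] * length
    for label in linear_fragments(atoms, adj, min_size, max_size):
        bins[zlib.crc32(label.encode()) % length] += 1
    return bins
```

For a five-carbon chain this yields three fragments (two 4-atom paths and one 5-atom path); because both 4-atom paths share the label "C-C-C-C", they land in the same bin, which is how holograms count repeated substructures.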

QSAR model validation

External validation was carried out using a test set of seven compounds, which were not considered for the purpose of QSAR model development. The predictive ability of the models was estimated as described previously.22
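The text defers the definition of predictive ability to ref. 22. A commonly used form, assumed here, compares the squared test-set prediction errors with the spread of the test activities around the training-set mean; variants that use the test-set mean instead also exist, so this sketch should not be read as the exact formula used in the paper.

```python
def r2_pred(y_test, y_pred, y_train_mean):
    """r2_pred = 1 - PRESS / SD (assumed definition).

    PRESS: sum of squared test-set prediction errors.
    SD:    sum of squared deviations of test activities from the
           training-set mean activity.
    """
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    sd = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / sd
```

Perfect predictions give 1.0, and predicting every test compound at the training mean gives 0.0, so the statistic measures how much better the model does than that trivial baseline.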


Results and Discussion

In the present work, the in vitro potency (IC50) of a series of twenty-six structurally diverse compounds (Table 1 and Supplementary Information) was determined through kinetic studies. As expected from previous studies,10,17 these compounds are competitive inhibitors of SmPNP. For instance, double-reciprocal plots of velocity versus substrate concentration for compounds 15 and 16 show that Vmax (given by the y-intercept, 1/Vmax) is constant at all inhibitor concentrations, whereas the apparent KM (obtained from the x-intercept, -1/KM) increases with inhibitor concentration (Figure 1). This behavior is observed for all SmPNP inhibitors, whose IC50 values range from 0.1 to 200 mM, a 2000-fold range in potency.
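The double-reciprocal diagnostic described above (constant 1/Vmax intercept, shifting x-intercept) can be reproduced on simulated data. In this sketch the kinetic constants (Vmax = 1, KM = 50, Ki = 5, arbitrary units) and the substrate concentrations are hypothetical, and the rate law assumes classical competitive inhibition, v = Vmax[S]/(KM(1 + [I]/Ki) + [S]).

```python
def rate_competitive(s, i, vmax, km, ki):
    """Initial rate for classical competitive inhibition."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def double_reciprocal_fit(substrate, rates):
    """Least-squares line 1/v = slope * (1/[S]) + intercept."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def apparent_constants(substrate, rates):
    """Vmax = 1/intercept; apparent KM = slope/intercept (x-intercept = -1/KM)."""
    slope, intercept = double_reciprocal_fit(substrate, rates)
    return 1.0 / intercept, slope / intercept


substrate = [10.0, 20.0, 40.0, 80.0, 160.0]
for inhibitor in (0.0, 5.0, 10.0):
    rates = [rate_competitive(s, inhibitor, 1.0, 50.0, 5.0) for s in substrate]
    vmax_app, km_app = apparent_constants(substrate, rates)
    print(f"[I] = {inhibitor:4.1f}: Vmax = {vmax_app:.3f}, KM,app = {km_app:.1f}")
```

With these inputs, Vmax stays at 1 for every inhibitor concentration while the apparent KM grows as KM(1 + [I]/Ki), that is 50, 100 and 150: the signature of competitive inhibition.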



Although structure-activity relationships (SAR) have been widely described over the last decades for ground-state inhibitors of mammalian PNP, the same is not true for SmPNP inhibitors. Only recently were the first SAR studies reported, describing key structural requirements for SmPNP affinity and selectivity.10,17 These studies suggest that hydrophobic interactions in the SmPNP active site play an important role in inhibitor binding affinity. Useful as it is, this qualitative SAR information would be of greater strategic value in drug design if combined with statistically validated predictive models.23 QSAR models are useful tools for quantitatively analyzing the internal consistency and predictive ability of different data sets of compounds, with the added advantage of revealing important molecular features associated with biological activity.24,25

Combining descriptor-based and fragment-based QSAR models has proven a valuable approach in SAR studies, owing to the complementary nature of these ligand-based drug design (LBDD) strategies.26,27 Our initial efforts therefore focused on developing QSAR models from topological descriptors that account for molecular size, shape and branching through graph-theoretical invariants (computed with DRAGON 5.5). Information on molecular charge and polarizability was also included through descriptor weighting.28 A total of 2489 descriptors were calculated; highly inter-correlated descriptors and those conveying no information on the biological activity (constant, or r² < 0.10) were excluded from further consideration. This protocol afforded 218 descriptors, which were used to build a number of preliminary QSAR models by multiple linear regression (MLR), each containing up to 3 descriptors. Although the best MLR model showed good internal statistics (n = 19, r² = 0.82, q² = 0.78), its predictive ability was poor (r²pred = 0.17). This suggests that the chemical and structural features captured by the model do not extend beyond the chemical space of the training set, which limits its usefulness in drug design. We therefore turned to a more powerful statistical tool, PLS. For this purpose, the descriptors found in the 10 best models from each population were gathered, autoscaled and used for further, independent QSAR modeling.
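The gap between internal (q²) and external (r²pred) statistics is easier to reason about with the LOO procedure written out. The sketch below fits MLR models by ordinary least squares and scores them by leave-one-out q². As a swapped-in technique, it searches descriptor subsets of up to 3 exhaustively instead of using the MOBYDIGS genetic algorithm; this is only feasible for small descriptor pools (218 descriptors would already give about 1.7 million three-descriptor subsets). Function names and data are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(X, y):
    """Coefficients [b0, b1, ...] of y = b0 + sum_j bj * xj (normal equations)."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def q2_loo(X, y):
    """Leave-one-out q2 = 1 - PRESS / sum((y - mean(y))**2)."""
    press = 0.0
    for i in range(len(y)):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)


def best_subset(columns, y, max_size=3):
    """Exhaustive search (stand-in for a GA) for the subset with highest q2."""
    best = (float("-inf"), ())
    for size in range(1, max_size + 1):
        for idx in combinations(range(len(columns)), size):
            X = [[columns[j][i] for j in idx] for i in range(len(y))]
            try:
                best = max(best, (q2_loo(X, y), idx))
            except ValueError:
                continue  # skip collinear subsets
    return best
```

Note that a high q² computed this way only reflects the training compounds; as the text shows for the MLR model (q² = 0.78 versus r²pred = 0.17), it says little about compounds outside that chemical space.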

Although our initial PLS models showed modest statistical parameters (r² = 0.64, q² = 0.51, 3 components), the iterative exclusion of the descriptors with the smallest contributions to the regression vector led to improved models. The final QSAR model (r² = 0.79, q² = 0.62, 2 PLS components; Table 2) showed greater predictive ability (r²pred = 0.52) than the MLR models (Figure 2 and Table 3), though still insufficient to guide the design of more potent SmPNP inhibitors.





Consequently, interpreting the descriptors with the largest contributions to the QSAR regression vector would yield misleading structure-activity relationships that hold only for the training set compounds. The low predictive ability of the descriptor-based QSAR models might suggest that compounds 22 and 24 are outliers; however, as discussed below, a careful investigation indicates that their large residuals result from shortcomings of the topological descriptors, such as ineffective sampling of the chemical space of the deazapurine analogs.
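The iterative exclusion of weak descriptors from the PLS model can be sketched with a small PLS1 (NIPALS) implementation. Assumptions: the descriptor matrix is already autoscaled, so regression coefficients are comparable; the exclusion rule is "drop the descriptor with the smallest |coefficient| as long as LOO q² does not decrease" (the paper does not state its stopping rule); and all function names are illustrative. This is not the Pirouette implementation.

```python
import math


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1; returns (x_mean, y_mean, [(w, p, q), ...])."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no X-y covariance left to model
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]  # deflate y
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


def regression_vector(model):
    """The model is affine in x, so b_j = f(mean + unit_j) - f(mean)."""
    x_mean = model[0]
    base = pls1_predict(model, x_mean)
    coefs = []
    for j in range(len(x_mean)):
        x = list(x_mean)
        x[j] += 1.0
        coefs.append(pls1_predict(model, x) - base)
    return coefs


def q2_loo(X, y, n_comp):
    press = 0.0
    for i in range(len(y)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)


def backward_eliminate(X, y, n_comp):
    """Drop the smallest-|b_j| descriptor while LOO q2 does not decrease."""
    def columns(cols):
        return [[row[j] for j in cols] for row in X]

    cols = list(range(len(X[0])))
    best_q2 = q2_loo(columns(cols), y, n_comp)
    while len(cols) > n_comp:
        b = regression_vector(pls1_fit(columns(cols), y, n_comp))
        weakest = min(range(len(cols)), key=lambda k: abs(b[k]))
        trial = cols[:weakest] + cols[weakest + 1:]
        q2 = q2_loo(columns(trial), y, n_comp)
        if q2 < best_q2:
            break
        cols, best_q2 = trial, q2
    return cols, best_q2
```

Computing each coefficient by probing the fitted model with a unit step avoids inverting the weight/loading matrix; it is exact here because a PLS prediction is an affine function of the descriptors.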

As part of our medicinal chemistry strategy, we employed the fragment-based hologram QSAR (HQSAR) approach to investigate the structural features crucial for SmPNP inhibition. HQSAR is well suited to this study because it requires no 3D structural information (e.g., the structure of the macromolecular target or a putative binding mode).20,21 HQSAR investigations require the evaluation of parameters that specify the hologram length as well as the size and type of the fragments to be encoded. Molecular fragments were generated using the following fragment distinctions: atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch), and donors and acceptors (DA). To assess their effect on hologram generation, several combinations of these parameters were evaluated with the default fragment size (4-7): A/B/C, A/B/C/H, A/B/C/H/Ch, A/B/C/H/Ch/DA, A/B/H, A/B/Ch, A/B/DA, A/B/H/Ch, A/B/Ch/DA, A/B/H/DA and A/B/H/Ch/DA. The fragment-count patterns of the training set inhibitors were then related to the experimental biological data by PLS, as summarized in Table 4.

The fragment distinction parameters had a considerable effect on model quality. As shown in Table 4, the best statistical results were obtained for models 5 (q² = 0.79, r² = 0.96, 4 components) and 8 (q² = 0.81, r² = 0.95, 4 components), derived with the A/B/H and A/B/H/Ch fragment distinctions, respectively. Including other fragment distinctions in the molecular holograms did not improve the statistical quality of the models (Table 4). Because data sets differ in nature and in structural diversity, several combinations of fragment distinctions must usually be considered to generate the best final HQSAR model.29

It has previously been shown that an extensive H-bonding network is responsible for the binding affinity of the 9-deazaguanine derivatives in the SmPNP active site.10 This agrees with the present results, in which the hydrogen-atom fragment distinction (H) is present in both of the best models (5 and 8). The influence of fragment size on the statistical parameters was further investigated for these two models (Table 4). The fragment size parameters control the minimum and maximum length of the fragments included in the hologram fingerprint. Table 5 summarizes the statistical results obtained with the different fragment sizes. Varying the fragment size did not yield better HQSAR models; in both cases (A/B/H, model 5; A/B/H/Ch, model 8), the best results were obtained with the default fragment size (4-7).

It is important to note that the high q² values of the best HQSAR models do not automatically imply high predictive ability for external compounds.30 The most important test of a QSAR model is its ability to predict the property of new, structurally related compounds. The predictive power of the best HQSAR model (model 8; fragment distinction A/B/H/Ch, fragment size 4-7) was therefore assessed by predicting pIC50 values for the 7 test set molecules (compounds 20-26, Table 1), which were completely excluded from model training. The results are listed in Table 3, and the experimental vs. predicted activities for both training and test sets are plotted in Figure 2. The good agreement between experimental and predicted values for the test set compounds (r²pred = 0.80) indicates the reliability of the HQSAR model, and the plot shows consistency between experimental and predicted pIC50 values for both sets. The low residuals in Table 3 suggest that the model can be used to predict the biological activity of novel compounds within this structural class: predicted values deviate from the experimental pIC50 values by less than 0.7 log units, and no test set compound is an outlier (Figure 3). By comparison, external prediction with model 5 under the same conditions was poorer (r²pred = 0.71; detailed results not shown).
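The residual criterion quoted above is easy to reproduce once IC50 values are expressed as pIC50 = -log10(IC50 / mol L⁻¹). The sketch below assumes IC50 values are supplied in µmol L⁻¹ and uses hypothetical numbers; note that a 0.7 log-unit tolerance corresponds to roughly a 5-fold error in IC50 (10^0.7 ≈ 5.0).

```python
import math


def pic50_from_micromolar(ic50_um):
    """pIC50 = -log10(IC50 in mol/L); input assumed in umol/L."""
    return -math.log10(ic50_um * 1e-6)


def residuals(experimental, predicted):
    return [e - p for e, p in zip(experimental, predicted)]


def all_within(experimental, predicted, tol=0.7):
    """True if every |experimental - predicted| is below tol log units."""
    return all(abs(r) < tol for r in residuals(experimental, predicted))
```

Working in log units keeps the residuals symmetric: over-predicting or under-predicting IC50 by the same factor gives residuals of the same magnitude.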



Useful fragment-based QSAR models should not only have statistical quality and predictive power, but also indicate which molecular fragments may be important for activity. The interpretation of the descriptors in QSAR equations usually provides clues about key electronic and steric features essential for the biological property. In addition, HQSAR offers an alternative and simpler way to analyze individual atomic contributions through visual inspection of the molecules in the data set. During HQSAR analysis, the molecules can be colored to reflect the contribution of each atom (positive, neutral or detrimental) to the biological activity of interest. Colors reflecting poor contributions are at the red end of the spectrum (red, red orange and orange), while colors reflecting favorable contributions are at the green end (yellow, green blue and blue); atoms colored white make neutral contributions.31 Surprisingly, comparison of the contribution maps of compounds 14, 15 and 26 reveals that the purine ring may have opposing effects on potency (Figure 4). This result can be explained by the H-bonding requirements of the SmPNP active site. On the one hand, it has been proposed that compounds bearing aryl groups at position 9 of the purine ring (such as 15 and 26) can reach the hydrophobic pocket in the vicinity of Phe161.17 On the other hand, 9-substituted compounds with shorter, non-planar chains may bind loosely and be easily displaced by water molecules. Taken together, these findings explain the opposite roles of the fragments involved in H-bonding to Asn245 and Glu203 in compound 14 (reddish, poor H-bonding capability) and of the corresponding fragments in compounds 15 and 26 (green, stronger H-bonding network).



Despite the urgent need for novel drugs against tropical infectious diseases, investment in research and development (R&D) has been inadequate because of the lack of interest shown by the major pharmaceutical and biotechnology companies. To circumvent this problem, most of the effort devoted to neglected diseases comes from academia and non-governmental organizations, through public-private partnerships.32 However, this effort focuses mainly on the early stages of identifying good targets or new leads for individual diseases, leaving a crucial gap in the current R&D pipeline. In this work, we have generated descriptor- and fragment-based QSAR models for a series of 9-deazaguanines as potent inhibitors of SmPNP, showing high internal and external consistency. In addition, the fragment-based model exhibited high predictive power for new compounds within this structural class. The molecular information gathered in this study should be useful in future efforts to design new inhibitors with increased affinity and selectivity.


Supplementary Information

Supplementary data are available free of charge as a PDF file.



We gratefully acknowledge financial support from the Fundação de Amparo à Pesquisa do Estado da Bahia (FAPESB), São Paulo Research Foundation (FAPESP) and the National Council for Scientific and Technological Development (CNPq), Brazil. We are also grateful to BioCryst Pharmaceuticals, Inc. for the gift of the inhibitors employed in this work.



1. Castilho, M. S.; Postigo, M. P.; de Paula, C. B.; Montanari, C. A.; Oliva, G.; Andricopulo, A. D.; Bioorg. Med. Chem. 2006, 14, 516.

2. Bzowska, A.; Kulikowska, E.; Shugar, D.; Pharmacol. Ther. 2000, 88, 349.

3. Pereira, H. M.; Cleasby, A.; Pena, S. D. J.; Franco, G. R.; Garratt, R. C.; Acta Crystallogr., Sect. D: Biol. Crystallogr. 2003, 59, 1096.

4. Accessed in December 2010.

5. Senft, A. W.; Crabtree, G. W.; Biochem. Pharmacol. 1977, 26, 1847.

6. Shi, W.; Ting, L.; Kicska, G. A.; Lewandowicz, A.; Tyler, P. C.; Evans, G. B.; Furneaux, R. H.; Kim, K.; Almo, S. C.; Schramm, V. L.; J. Biol. Chem. 2004, 279, 18103.

7. Pereira, H. D. M.; Franco, G. R.; Cleasby, A.; Garratt, R. C.; J. Mol. Biol. 2005, 353, 584.

8. Guido, R. V. C.; Oliva, G.; Andricopulo, A. D.; Curr. Med. Chem. 2008, 15, 37.

9. Azevedo Jr., W. F.; Soares, M. B.; Curr. Drug Targets 2009, 10, 193.

10. Castilho, M. S.; Postigo, M. P.; Pereira, H. M.; Oliva, G.; Andricopulo, A. D.; Bioorg. Med. Chem. 2010, 18, 1421.

11. Webster, M.; Fallon, P. G.; Fulford, A. J.; Butterworth, A. E.; Ouma, J. H.; Kimani, G.; Dunne, D. W.; Parasite Immunol. 1997, 19, 333.

12. Sabra, A. N.; Botros, S. S.; J. Parasitol. 2008, 94, 537.

13. Postigo, M. P.; Guido, R. V. C.; Oliva, G.; Castilho, M. S.; Pitta, I. R.; Albuquerque, J. F. C.; Andricopulo, A. D.; J. Chem. Inf. Model. 2010, 50, 1693.

14. Farutin, V.; Masterson, L.; Andricopulo, A. D.; Cheng, J.; Riley, B.; Hakimi, R.; Frazer, J. W.; Cordes, E. H.; J. Med. Chem. 1999, 42, 2422.

15. Andricopulo, A. D.; Yunes, R. A.; Chem. Pharm. Bull. 2001, 49, 10.

16. Kim, B. K.; Cha, S.; Parks Jr., R. E.; J. Biol. Chem. 1968, 243, 1771.

17. Postigo, M. P.; Krogh, R.; Terni, M. F.; Pereira, H. M.; Oliva, G.; Castilho, M. S.; Andricopulo, A. D.; J. Braz. Chem. Soc. 2011, 3, 583.

18. Todeschini, R.; Consonni, V.; Mauri, A.; Pavan, M.; Anal. Chim. Acta 2004, 515, 199.

19. Guido, R. V. C.; Castilho, M. S.; Mota, S. G. R.; Oliva, G.; Andricopulo, A. D.; QSAR Comb. Sci. 2008, 27, 768.

20. Borchhardt, D.; Castilho, M. S.; Andricopulo, A. D.; Lett. Drug Des. Discovery 2008, 5, 57.

21. Moda, T. L.; Montanari, C. A.; Andricopulo, A. D.; Bioorg. Med. Chem. 2007, 15, 7738.

22. Schüürmann, G.; Ebert, R.; Chen, J.; Wang, B.; Kühne, R.; J. Chem. Inf. Model. 2008, 48, 2140.

23. Andricopulo, A. D.; Salum, L. B.; Abraham, D. J.; Curr. Top. Med. Chem. 2009, 9, 771.

24. Salum, L. B.; Polikarpov, I.; Andricopulo, A. D.; J. Chem. Inf. Model. 2008, 48, 2243.

25. Honorio, K. M.; Polikarpov, I.; Garratt, R. C.; Andricopulo, A. D.; J. Mol. Graph. Model. 2007, 25, 921.

26. Salum, L. B.; Andricopulo, A. D.; Molec. Divers. 2009, 13, 277.

27. Mota, S. G. R.; Barros, T. F.; Castilho, M. S.; J. Braz. Chem. Soc. 2009, 20, 451.

28. Caballero, J.; Garriga, M.; Fernandez, M.; J. Comput. Aided Mol. Des. 2005, 19, 771.

29. Castilho, M. S.; Guido, R. V. C.; Andricopulo, A. D.; Lett. Drug Des. Discovery 2007, 4, 106.

30. Golbraikh, A.; Tropsha, A.; J. Mol. Graph. Model. 2002, 20, 269.

31. Accessed in December 2010.

32. Nwaka, S.; Ridley, R. G.; Nat. Rev. Drug Discovery 2003, 2, 919; Nwaka, S.; Hudson, A.; Nat. Rev. Drug Discovery 2006, 5, 941.



Submitted: January 13, 2011
Published online: June 16, 2011
FAPESP has sponsored the publication of this article.






Supplementary Information


Figure S1