Two-Dimensional Quantitative Structure-Activity Relationship Studies on Bioactive Ligands of Peroxisome Proliferator-Activated Receptor δ

PPARs form a subclass of the nuclear receptor superfamily and are considered important targets for the development of new therapeutic agents for the treatment of several metabolic disorders, such as dyslipidemia and type 2 diabetes mellitus. In this work, studies using the hologram QSAR (HQSAR) method were carried out for a series of potent ligands of the PPARδ isoform. Significant statistical results (r² = 0.947 and q² = 0.791) were obtained, indicating the reliability of the 2D QSAR model generated. The 2D model was then used to predict the biological activity of a set of test compounds, and the values predicted by the HQSAR model are in good agreement with the experimental results. Thus, the two-dimensional QSAR model obtained in this work, together with the information extracted from 2D atomic contribution maps, can be very useful for the development of new PPARδ ligands for the treatment of metabolic diseases.


Introduction
Metabolic syndrome is considered a highly prevalent disease due to the sedentary lifestyle of the population and can be defined as a group of metabolic disturbances, such as abnormal lipid and carbohydrate metabolism, and a pro-inflammatory state of the body.[1,2] Associated with the metabolic syndrome are multiple related clinical disorders, such as obesity, hypertension, cardiovascular diseases and type 2 diabetes mellitus. This last disorder affects about 6% of the adult population in Western society, and it is expected that this number will increase 6% annually to reach 200-300 million cases in 2010.[3] Several drug therapies have been employed to treat cases of type 2 diabetes with the aim of reducing hyperglycemia, including sulfonylureas (which increase insulin release from pancreatic islets) and metformin (which reduces hepatic glucose production). However, these therapies have limited efficacy and tolerability as well as significant mechanism-based side effects.[3,4] Potential molecular targets to treat and prevent these metabolic disorders include peroxisome proliferator-activated receptors (PPARs), which are important members of the nuclear receptor superfamily and play a key role in lipid metabolism, glucose homeostasis, cell differentiation, obesity and cancer,[5][6][7] besides having the ability to regulate inflammation and the immune response.[8] There are three isoforms of PPAR: PPARα, PPARγ and PPARδ (or PPARβ), which have distinct tissue expressions and can be considered important therapeutic targets. The α isoform is activated by polyunsaturated fatty acids and fibrates and is related to the regulation of lipid metabolism, lipoprotein synthesis and metabolism as well as inflammatory responses in the liver and other tissues. PPARγ is an important regulator of the proliferation and differentiation of several cell types, including adipose cells, and it is activated by thiazolidinediones, resulting in insulin sensitization and antidiabetic action.[9]
Physiological functions of PPARδ are not yet fully known, and currently there are no marketed PPARδ drugs. However, some studies have indicated that PPARδ is a key regulator of lipid homeostasis and glucose disposal.[10] PPARδ is a member of the nuclear hormone receptor family that enables cells to respond to the presence of small molecules such as steroids, fat-soluble vitamins, fatty acids and xenobiotics through transcriptional regulation of gene expression.[8,11] Some potential endogenous candidates to activate PPARδ are fatty acids, triglycerides and prostacyclin. One synthetic PPARδ ligand (GW501516, Figure 1) has been shown to improve insulin resistance and to reduce plasma glucose levels in rodent models of type 2 diabetes and to correct metabolic syndrome in obese primates. This ligand has also recently been shown to reduce serum triglycerides and prevent the decrease of HDL-c and apoA-1 levels in sedentary human volunteers.[10] These data strongly suggest that PPARδ is an important biological target for the treatment of several metabolic diseases, and its agonists may have therapeutic usefulness in treating these disorders. It is important to note that a large number of PPAR agonists have been described in the literature, but few PPARδ-selective activators are available.[20][21][22][23][24][25][26]

Data set
The data set of 34 PPARδ ligands used for the HQSAR analyses was selected from the literature and consisted of anthranilic acid-based, tetrahydroisoquinoline and indole sulfonamide analogues.[28] The chemical diversity of the data set is considerable, since three main molecular regions were substituted: the anthranilic acid ring (substitution of this ring with small substituents ortho, meta or para to the carboxylic acid group); the tetrahydroisoquinoline region; and the indole sulfonamide group, as can be observed in Figure 2.
The generation of the molecular structures, as well as all QSAR modeling analyses, calculations and visualizations, were performed using the SYBYL 8.0 package (Tripos Inc., St. Louis, USA). The chemical structure and the biological property value (EC50, the molar concentration of a substance that produces 50% of the maximum biological response) for all compounds studied are listed in Table 1. The EC50 values were taken from the literature and were measured under the same experimental conditions,[28] which is considered a fundamental requirement for successful QSAR studies.[29,30] The EC50 values were converted to the corresponding pEC50 (-log EC50) values and used as dependent variables in the HQSAR analyses. Table 1 shows that the pEC50 values span approximately three orders of magnitude and are acceptably distributed across this range.
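The conversion from EC50 to pEC50 described above can be sketched as follows. This is a minimal illustration, assuming EC50 values expressed in mol/L and a base-10 logarithm; the function names are hypothetical and are not part of SYBYL, and the numerical value is illustrative rather than taken from Table 1.

```python
import math


def pec50_from_molar(ec50_molar):
    """Return pEC50 = -log10(EC50) for an EC50 given in mol/L."""
    if ec50_molar <= 0:
        raise ValueError("EC50 must be a positive concentration")
    return -math.log10(ec50_molar)


def pec50_from_nanomolar(ec50_nm):
    """Convenience wrapper (assumption: the input is given in nmol/L)."""
    return pec50_from_molar(ec50_nm * 1e-9)


# Illustrative value only, not taken from Table 1: a 10 nM agonist
# corresponds to a pEC50 of about 8.
example_pec50 = pec50_from_nanomolar(10.0)
```

On this scale, a span of three orders of magnitude in EC50 (for example from 1 nM to 1 µM) corresponds to a difference of about three pEC50 units.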
Another important aspect of generating reliable statistical models is the choice of appropriate training and test sets. For this purpose we employed hierarchical cluster analyses, which were performed with Tsar 3D (Accelrys, San Diego, USA). Training and test sets were selected in such a way that structurally diverse molecules having a wide range of biological activities were included in both sets. From the original data set of 34 PPARδ ligands, 27 compounds were selected as members of the training set for model construction (1-27, Table 1), and the other 7 molecules (28-34, Table 1) were defined as members of the test set for external model validation. Thus, the data set is appropriate for the purpose of QSAR model development.

HQSAR analysis
In this work, we have explored the 2D molecular features related to the biological activity presented by a series of PPARδ agonists using hologram QSAR (HQSAR).[33] The strategy used in HQSAR is to translate chemical structures into binary bit strings, known as fingerprints.
HQSAR uses an extended form of fingerprint, known as a molecular hologram,[34,35] which encodes more information (e.g., branched and cyclic fragments, stereochemistry) than the traditional 2D fingerprint. The key difference, however, is that a molecular hologram contains all possible molecular fragments within a molecule, including overlapping fragments, and maintains a count of the number of times each fragment occurs. In the fingerprint approach, the molecular structures are converted to all possible linear, branched and overlapping fragments of size between M and N atoms. These fragments are then assigned a specific integer by using a cyclic redundancy check (CRC) algorithm.[36] These integers are then hashed to a bin in an integer array of fixed length, which can vary between 50 and 500. These arrays are known as molecular holograms, and the bin occupancies of the molecular holograms are used as the descriptors in statistical analyses.[35] The descriptors (molecular holograms) are expected to encode the chemical and topological information of molecules. As a result, a molecular hologram is presented as a string of integers.[33] Figure 3 displays the main procedure employed in the generation of molecular holograms.
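The hashing step described above can be illustrated with a short sketch. It assumes that the fragments have already been enumerated and written as text strings (the fragment strings below are hypothetical), and it uses the CRC-32 function from the Python standard library as a stand-in for the CRC algorithm. It does not reproduce the fragment enumeration or the exact hashing scheme implemented in SYBYL.

```python
import zlib


def molecular_hologram(fragments, length=53):
    """Fold a list of fragment strings into a fixed-length array of counts.

    Each fragment is mapped to an integer with CRC-32, and that integer is
    reduced modulo `length` to choose a bin. Different fragments may collide
    in the same bin, in which case their counts are added together.
    """
    hologram = [0] * length
    for fragment in fragments:
        fragment_id = zlib.crc32(fragment.encode("utf-8"))  # integer ID from CRC
        hologram[fragment_id % length] += 1                 # bin occupancy count
    return hologram


# Hypothetical fragment strings for one molecule; "C-C-O" occurs twice.
fragments = ["C-C-O", "C-C-N", "C-C-O", "c:c:c:c"]
hologram = molecular_hologram(fragments, length=53)
```

The resulting list of bin occupancies plays the role of the descriptor vector for one compound. Changing the hologram length changes the pattern of bin collisions, which is one reason why several lengths are usually screened.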
The methodology employed in HQSAR consists of some basic steps: (i) data set preparation, (ii) substructural fragmentation of the training set molecules, (iii) molecular hologram generation, (iv) statistical analysis (model generation), and (v) test set prediction (external validation).[37] In the HQSAR method, each compound is hashed to a molecular fingerprint encoding the frequency of occurrence of various molecular fragment types using a predefined set of rules. One important feature of the HQSAR methodology is the process of incorporating information about each fragment and each of its constituent sub-fragments, as this process implicitly encodes 3D structural information (e.g., hybridization and chirality).[35] During the HQSAR analyses, several parameters can be varied, such as hologram length (the variable that controls the number of bins in the hologram), fragment size (the parameter that controls the minimum and maximum length of fragments to be included in the hologram) and fragment distinction.[38] In our studies, holograms were generated using the standard parameters implemented in SYBYL 8.0.

Results and Discussion
An initial HQSAR analysis involves varying the fragment distinctions used during the generation of the molecular fragments. The distinctions used in this study were atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch) and donor and acceptor (DA), and several combinations of these parameters were considered during the generation of 2D QSAR models. The HQSAR analyses were performed by screening the 12 default hologram length values, ranging from 53 to 401 bins. Afterwards, the partial least squares (PLS) method was employed to relate the patterns of fragment counts of the data set compounds to their experimental biological activity. The statistical results obtained from PLS analyses using several fragment distinction combinations and the default fragment size (4-7) are presented in Table 2. According to Table 2, the best statistical results among all models using the training set compounds were obtained for model 10 (q² = 0.719), which was derived using the following combination of fragment distinctions: A, B, H and DA, with 5 being the optimum number of PLS components. This indicates that atom types, bonds, hydrogen atoms and donor and acceptor atoms are essential features of the molecular structures for biological activity. This finding is in agreement with experimental evidence, because there are possible hydrogen bonds between a cocrystallized ligand (GW9371) and important residues in the binding site.[28]
The next stage in an HQSAR analysis is studying the influence of different fragment sizes on the key statistical parameters. Fragment size parameters control the minimum and maximum length of fragments to be included in the hologram fingerprint and can be varied to incorporate larger or smaller fragments during the analyses.[35] The HQSAR results obtained for several fragment sizes, using the best statistical model (model 10, Table 2), are displayed in Table 3. Table 3 shows that varying the fragment size provided substantial improvements in the statistical parameters, as can be observed for the model with a fragment size equal to 7-10. This model presents a high cross-validated correlation coefficient (q² = 0.791) associated with a low cross-validated standard error (SEP = 0.433), which indicates a high predictive capability of the HQSAR model.
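For readers less familiar with the cross-validated statistics quoted above, the sketch below shows one common way to compute them from experimental values and the corresponding cross-validated predictions. It assumes the usual definitions q² = 1 - PRESS/SS, where PRESS is the sum of squared prediction errors and SS is the sum of squared deviations of the experimental values from their mean, and SEP = sqrt(PRESS/(n - c - 1)) for n compounds and c PLS components. The SEP denominator in particular is an assumption and may differ from the convention used by SYBYL. The function names and the numerical values are hypothetical.

```python
import math


def press(observed, predicted):
    """Predictive residual sum of squares."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))


def q2(observed, cv_predicted):
    """Cross-validated q2 = 1 - PRESS / SS about the mean of the observations."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - press(observed, cv_predicted) / ss_total


def sep(observed, cv_predicted, n_components):
    """Cross-validated standard error (assumed form: sqrt(PRESS / (n - c - 1)))."""
    n = len(observed)
    return math.sqrt(press(observed, cv_predicted) / (n - n_components - 1))


# Hypothetical pEC50 values and cross-validated predictions, for illustration only.
observed = [6.0, 7.0, 8.0, 9.0]
cv_predicted = [6.5, 7.0, 8.0, 8.5]
q2_value = q2(observed, cv_predicted)
sep_value = sep(observed, cv_predicted, n_components=1)
```

With these definitions, a q² close to 1 together with a small SEP indicates that compounds left out during cross-validation are predicted almost as well as they are fitted.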
After the construction of a robust HQSAR model, we validated this model using an external set of compounds to predict their biological property values. This external validation process can be considered the most valuable method of validation, as the new compounds were completely excluded during the training of the model. In this way, the predictive power of the best HQSAR model derived from the training set molecules (fragment distinction A/B/H/DA; fragment size 7-10, Table 3) was assessed by predicting the pEC50 values for the test set compounds (28-34, Table 1). The external validation results are listed in Table 4, and the graphic results for the experimental versus predicted activities of both compound sets (training and test sets) are displayed in Figure 4.
From Table 4 and Figure 4, we can see that the test set compounds are well predicted without any outliers, i.e., there is good agreement between experimental and predicted values for the seven test set compounds. From the low residual values, it is possible to say that the HQSAR model obtained is highly reliable and can be used to predict the biological property of new untested compounds. The predicted pEC50 values fall very close to the experimental values, deviating by less than 0.55 log units.
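The residual check described above can be sketched as follows. This is a minimal illustration, assuming that a residual is defined as the experimental minus the predicted pEC50; the function names and the values are hypothetical and are not those of Table 4.

```python
def residuals(experimental, predicted):
    """Residual for each compound (assumed: experimental minus predicted)."""
    return [exp - pred for exp, pred in zip(experimental, predicted)]


def max_abs_residual(experimental, predicted):
    """Largest absolute deviation between experimental and predicted values."""
    return max(abs(r) for r in residuals(experimental, predicted))


# Hypothetical pEC50 values for three test compounds, for illustration only.
experimental = [7.2, 6.5, 8.1]
predicted = [7.0, 6.9, 8.0]

worst = max_abs_residual(experimental, predicted)
within_limit = worst < 0.55  # threshold quoted in the text for the test set
```

A single compound with a large residual would show up as an outlier in a plot of predicted versus experimental values, so the largest absolute residual is a convenient summary of the external validation.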
Finally, a complete HQSAR analysis involves identifying the molecular fragments that are directly related to biological activity or responsible for the low biological potency of the compounds, and proposing structural modifications. To this end, one can obtain contribution maps that indicate the individual contribution to activity of each atom in a given molecule of the data set and analyze the most relevant structural fragments incorporated into the hologram-based QSAR models, which can indicate the possible molecular mechanisms of interaction between a ligand and the biological receptor. The contribution map obtained from the HQSAR module implemented in SYBYL 8.0 presents a color system that discriminates the main atomic contributions to activity, i.e., the colors at the red end of the spectrum (red, red-orange and orange) reflect poor contributions, whereas colors at the green end (yellow, green-blue and green) reflect favorable contributions. Atoms with intermediate contributions are colored white. The individual atomic contributions for the most potent compound (24) of the data set are presented in Figure 5, and we can observe important structural features such as regions with poor contributions (colored in orange and red) that can be identified as potential targets for molecular modification and further SAR studies. The main regions that negatively contribute to biological activity include the methyl group linked to the anthranilic acid ring, the sulfur atom of the sulfonamide moiety and the five-membered ring of the indole group. These groups could be replaced by other substituents with different structural and physicochemical features with the aim of increasing the affinity and potency of the compounds studied in this work. Additionally, the main molecular fragments strongly related to biological potency (colored in green and yellow) are the carboxylate group (in agreement with an experimental study)[28] and the benzene ring of the sulfonamide group.
Furthermore, the main fragments highlighted by the HQSAR model are directly related to important interactions that determine the preferred binding mode between the compounds studied and PPARδ: (i) substitution of the anthranilic acid with small substituents meta or para to the carboxylic acid led to a significant increase in PPARδ affinity, and the X-ray structure of PPARδ with a ligand (GW9371) reveals a lipophilic region below the anthranilic acid group; (ii) several PPAR agonists have an acidic group that usually forms hydrogen bonds with Tyr473, His323, and/or His449 (numbering based on PPARγ); (iii) the variation of substituents at R4 produces compounds with low potencies relative to binding affinity, which can be associated with a reduced number of interactions of the indole ring with the AF-2 helix due to unfavorable steric interactions between the R4 substituents and the sulfonamide group.[28] Therefore, the 2D contribution maps reveal the molecular determinants of biological activity and highlight important regions where modifications of molecular groups can be strongly favorable for improving the biological activity.
The 2D QSAR method employed in this work gives important insights into the structural requirements for the biological activity presented by the compounds studied, but the integration of information obtained using other approaches (e.g., physicochemical analyses, 3D QSAR methods and docking techniques) should be useful in the design of new PPARδ activators having an improved biological profile.

Conclusions
The HQSAR model obtained in this work shows both good internal and external consistency (r² = 0.947 and q² = 0.791), indicating the reliability of the 2D QSAR model in predicting the biological activity of untested compounds, which represents an important contribution to the QSAR field in the area of PPARs, specifically related to isoform δ. A good correlation between experimental and predicted pEC50 values for the test set compounds further supported the reliability of the constructed HQSAR model. In addition, the HQSAR analysis provided important insights into the molecular fragments directly related to biological activity, i.e., the main regions that negatively contribute to biological activity included the methyl group linked to the anthranilic acid ring, the sulfur atom of the sulfonamide moiety and the five-membered ring of the indole group. This indicates that these groups could be replaced by other substituents with different structural and physicochemical features with the aim of increasing the affinity and potency of the compounds studied in this work. Additionally, the main molecular fragments strongly related to biological potency were the carboxylate group and the benzene ring of the sulfonamide group. Therefore, the HQSAR model and the information obtained from the 2D contribution maps should be useful

Figure 2. Chemical diversity of the data set.

Figure 5. Individual atomic contributions obtained from the HQSAR model for the most potent PPARδ ligand of the series (compound 24).

Table 1. Chemical structure and biological property of all compounds studied

Table 2. HQSAR results using various fragment distinctions and the default fragment size (4-7)

Table 3. Influence of various fragment sizes on key statistical parameters using the best fragment distinction (A, B, H and DA)

Table 4. Experimental and predicted biological property (pEC50), along with residual values, for the test set containing 7 PPARδ ligands

Figure 4. Predicted vs. experimental values of pEC50 for all PPARδ ligands studied (training and test sets).