
QSAR Study of the Inhibitors of the Acetyl-CoA Carboxylase 1 and 2 using Bayesian Regularized Genetic Neural Networks: A Comparative Study

Abstract

Linear and non-linear quantitative structure-activity relationship (QSAR) models are presented for modeling and predicting the anti-diabetic activities of a set of inhibitors of acetyl-CoA carboxylase 1 and 2 (ACC1 and ACC2). Different algorithms were used to choose the best variables from a large number of descriptors, and the selected descriptors were then used for non-linear (artificial neural network) and linear (multiple linear regression) modeling. The variable selection methods consisted of stepwise multiple linear regression (stepwise-MLR), the successive projections algorithm (SPA), genetic algorithm-multiple linear regression (GA-MLR) and Bayesian regularized genetic neural networks (BRGNNs). The prediction abilities of the models were evaluated by Monte Carlo cross-validation (MCCV) in both the variable selection and modeling steps. The results revealed that the best variables for describing the inhibition mechanism of ACC were among the topological charge indices, radial distribution function, geometrical, and autocorrelation descriptors. The R² and root mean square error (RMSE) statistics indicated that BRGNN is superior to the other methods for modeling the inhibitory activity of ACC modulators. The sensitivity analysis, together with the frequency of the selected molecular descriptors, can help establish an understanding of the mechanism of the ACC inhibitory activity of small molecules.

Keywords: acetyl-CoA carboxylase; quantitative structure-activity relationships; Bayesian regularized genetic neural networks; successive projections algorithm


Introduction

Diabetes mellitus is a metabolic disorder characterized by abnormally high levels of plasma glucose, or hyperglycemia [1: Havale, S. H.; Pal, M.; Bio. Med. Chem. 2009, 17, 1783]. This disease has been described as an “epidemic” of contemporary society and threatens to become a global health scourge. The total number of people with diabetes is projected to rise from 171 million in 2000 to 366 million in 2030 [2: Wild, S.; Roglic, G.; Green, A.; Diabetes Care 2004, 27, 1047]. Furthermore, the World Health Organization (WHO) estimates that diabetes accounts for 5% of global deaths and that diabetes deaths are likely to increase by more than 50% in the next 10 years without urgent action [3: http://www.who.int/mediacentre/factsheets/fs312/en/, accessed in February 2015]. Two distinct clinical forms of diabetes are mainly recognized: type 1, or insulin-dependent, diabetes, which is usually diagnosed in children and young adolescents and is caused by destruction of the insulin-producing beta cells in the pancreas, leading to a deficiency of insulin; and type 2, or non-insulin-dependent, diabetes, which is the most common form of diabetes mellitus and is caused by target cell resistance to insulin.

Many recent studies have demonstrated that obesity is one of the most important risk factors for type 2 diabetes and its related co-morbid conditions [4: Kramer, H.; Cao, G.; Dugas, L.; Luke, A.; J. Diabetes Complications 2010, 24, 368]. In obese individuals, adipose tissue releases increased amounts of non-esterified fatty acids, glycerol, hormones, pro-inflammatory cytokines and other factors that are involved in the development of insulin resistance [5: Kahn, S. E.; Hull, R. L.; Utzschneider, K. M.; Nature 2006, 444, 840]. A drug that acts on both type 2 diabetes and obesity would therefore have the potential to improve health outcomes for diabetic and obese patients. Acetyl-coenzyme A carboxylase (ACC) plays a crucial role in fatty acid metabolism in most living organisms and represents an attractive target for drug discovery. This heterodimeric protein, which has two known isoforms, ACC1 and ACC2, is composed of carboxyltransferase (CT), biotin carboxyl carrier protein (BCCP) and biotin carboxylase (BC) domains, and its purpose is the synthesis of malonyl-CoA (m-CoA) from acetyl-CoA in an ATP-dependent reaction via the fixation of bicarbonate [6: Polakis, S. E.; Guchhait, R. B.; Zwergel, E. E.; Lane, M. D.; Cooper, T. G.; J. Biol. Chem. 1974, 249, 6657]. Inhibition of ACC offers the ability to inhibit de novo fatty acid production in lipogenic tissues (liver and adipose) while at the same time stimulating fatty acid oxidation in oxidative tissues (heart and skeletal muscle), where it plays a role in modulating energy expenditure [7: Tong, L.; Harwood, H. J.; J. Cell. Biochem. 2006, 99, 1476]. Therefore, an ACC1/2 isozyme-nonselective inhibitor would be predicted to reduce fatty acid synthesis in liver and adipose tissue while increasing fatty acid oxidation in liver and muscle tissue, resulting in increased energy expenditure.
The ACC1 and ACC2 enzymes are two important targets in modern drug discovery projects [8: Castle, J. C.; Hara, Y.; Raymond, Ch. K.; Garrett-Engele, Ph.; Ohwaki, K.; Kan, Zh.; Kusunoki, J.; Johnson, J. M.; Plos One 2009, 4, 4369]. Designing effective dual ACC1 and ACC2 inhibitors is of great importance for increasing insulin sensitivity and for the treatment of metabolic syndrome and type 2 diabetes mellitus [9: Freeman-Cook, K.; Amor, P.; Bader, S.; Buzon, L. M.; Coffey, S. B.; Corbett, J. W.; Dirico, K. J.; Doran, S. D.; Elliott, R. L.; Esler, W.; Guzman-Perez, A.; Henegar, K. E.; Houser, J. A.; Jones, C. A.; Limberakis, C.; Loomis, K.; McPherson, K.; Murdande, S.; Nelson, K. L.; Phillion, D.; Pierce, B. S.; Song, W.; Sugarman, E.; Tapley, S.; Tu, M.; Zhao, Z.; J. Med. Chem. 2012, 55, 935].

Recently, screening of Pfizer's compound library resulted in the identification of a total of 60 ACC inhibitors that exhibited good rat and human pharmacokinetics [10: Corbett, J. W.; Freeman-Cook, K. D.; Elliott, R.; Vajdos, F.; Rajamohan, F.; Kohls, D.; Marr, E.; Zhang, E.; Tong, L.; Tu, M.; Murdande, S.; Doran, S. D.; Houser, J. A.; Song, W.; Jones, C. J.; Coffey, C. B.; Buzon, L.; Minich, M. L.; Dirico, K. J.; Tapley, S.; McPherson, R. K.; Sugarman, E.; Harwood, H. J.; Esler, W.; Bio. Med. Chem. Lett. 2010, 20, 2383]. To discover and design new and more effective compounds, it is necessary to make use of computational techniques and QSAR methods. In this way, the activities of newly designed compounds can be predicted before deciding whether they should actually be synthesized and tested. The construction of a QSAR model often entails the problem of selecting the most relevant predictors from the overall set of predictors. Effective variable selection is therefore an integral part of the QSAR modeling process.

In the present contribution, four different variable selection techniques followed by neural network modeling were used for describing and predicting the ACC1 and ACC2 inhibitory activities of a series of arylquinoline amide derivatives. We explored stepwise multiple linear regression (stepwise-MLR), the successive projections algorithm (SPA), genetic algorithm-multiple linear regression (GA-MLR) and Bayesian regularized genetic neural networks (BRGNN). In each of these four methods, sampling was done by shuffling the raw data. To deal with overfitting in the variable selection and modeling procedures, Monte Carlo cross-validation (MCCV) and the Bayesian regularization formalism (during the training procedure) were used, respectively. The detailed theory behind artificial neural networks (ANN) and the Bayesian regularization algorithm can be found in the literature [11: Jalali-Heravi, M.; Mani-Varnosfaderani, A.; QSAR Comb. Sci. 2009, 28, 946]. The Methodology section is devoted to the description of the variable selection, sensitivity analysis and MCCV algorithms. The prediction abilities of the corresponding modeling algorithms are reported in the Results and Discussion section, where the overall results of the various techniques are compared with each other. This paper helps to shed light on the inhibition mechanism of ACC1 and ACC2 inhibitors and on the design of new molecules as potent energy modulators for the treatment of type 2 diabetes mellitus and metabolic syndrome. The molecular descriptors selected in this work can be considered informative markers for defining new molecular scaffolds with potent ACC inhibitory activities.

Methodology

Variable selection

To build a good QSAR model, a minimal set of information-rich descriptors is required. A large number of candidate descriptors creates several problems for the modeling procedure, such as: (a) multicollinearity; (b) poor models generated from poor descriptors; (c) overfitting of the model; (d) lack of relevant molecular information in some of the descriptors (some variables may be irrelevant or unreliable); (e) chance correlation [12: Winkler, D. A.; Brief Bioinform. 2002, 1, 73]. Consequently, selecting the best descriptors from a large set of variables is crucial for improving model performance and making better predictions. Many different variable selection methods are well known and fully described in the literature [13: Anderson, C. M.; Bro, R.; J. Chemometr. 2010, 24, 728]. In the present article, four of them are briefly described, applied and compared with each other.

Stepwise multiple linear regression (stepwise-MLR)

A detailed description of stepwise regression can be found in the literature [14: Brereton, R. G.; Chemometrics: Data Analysis for the Laboratory and Chemical Plant; Wiley: New York, 1995]. In short, stepwise regression is an iterative selection procedure that starts from the variable with the largest empirical correlation with the dependent variable. Each iteration includes two phases. In the inclusion phase, each of the remaining variables is subjected to a partial F-test; if the largest F-value is larger than a critical “F-to-enter” value, the corresponding variable is inserted in the model. In the exclusion phase, each of the variables already in the model is subjected to a partial F-test; if the smallest F-value is smaller than a critical “F-to-remove” value, the corresponding variable is removed from the model and returned to the pool of variables still available for selection [15: Brown, S. D.; Tauler, R.; Walczak, B.; Comprehensive Chemometrics; Elsevier, 2009].
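The inclusion test can be illustrated with a minimal Python sketch (the original work used MATLAB m-files that are not reproduced here). The function names and the synthetic data are assumptions made for illustration only; the sketch computes the partial F statistic for adding one candidate descriptor to an existing least-squares model.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def partial_f_to_enter(X_in, x_new, y):
    """Partial F statistic for adding x_new to a model that already contains X_in."""
    n, k = X_in.shape
    rss_old = rss(X_in, y)
    rss_new = rss(np.column_stack([X_in, x_new]), y)
    dof = n - (k + 1) - 1  # residual degrees of freedom of the enlarged model
    return (rss_old - rss_new) / (rss_new / dof)

# Synthetic data (hypothetical): y depends on x1 and x2, while x3 is pure noise.
rng = np.random.default_rng(0)
x1, x2, x3 = rng.normal(size=(3, 40))
y = x1 + 2.0 * x2 + 0.1 * rng.normal(size=40)

f_signal = partial_f_to_enter(x1[:, None], x2, y)  # informative candidate
f_noise = partial_f_to_enter(x1[:, None], x3, y)   # uninformative candidate
print(f_signal, f_noise)
```

In a full stepwise loop, the candidate with the largest statistic would be compared with the chosen “F-to-enter” threshold, and the same statistic computed for variables already in the model would be compared with “F-to-remove”.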

Successive projection algorithm (SPA)

SPA is a technique specially designed to select subsets of variables with small collinearity and appropriate prediction power for use in MLR models. The method comprises three phases. In the first phase, candidate subsets of variables are constructed according to a collinearity minimization criterion. In the second phase, the best candidate subset is chosen according to a criterion that evaluates the prediction ability of the resulting MLR model. In the last phase, the selected subset undergoes an elimination procedure to determine whether any variables can be removed without significant loss of prediction ability. More detail on the successive projections algorithm can be found elsewhere [15].
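As an illustration of the first phase only, the following Python sketch builds one candidate chain of weakly collinear variables by successive orthogonal projections. The function name, its arguments and the toy matrix are assumptions; the evaluation and elimination phases are not shown.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Build one SPA candidate chain starting from column `start`."""
    Xp = X.astype(float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last selected one.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never pick an already selected column again
        selected.append(int(np.argmax(norms)))
    return selected

# Toy matrix (hypothetical): column 1 is almost collinear with column 0.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.1, 0.0]])
print(spa_chain(X, start=0, n_select=2))  # → [0, 2]
```

Starting from column 0, the nearly collinear column 1 loses almost all of its norm after projection, so column 2 is chosen next.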

Genetic algorithm-multiple linear regression (GA-MLR)

A genetic algorithm (GA) is a stochastic optimization technique inspired by Darwin's theory of evolution. Several authors have published papers on feature selection by GAs [13], [15], [16: Leardi, R.; J. Chemometr. 2001, 15, 559]. Briefly, the GA consists of the following basic steps: (1) a vector (chromosome) of zeros and ones (genes) is defined, with a length equal to the number of variables; (2) an initial population of chromosomes is randomly created; (3) the value of the fitness function (here based on MLR and ANN models) is evaluated for each newly produced chromosome; (4) the chromosomes with the best predictions (according to their fitness function values) are used to produce new populations by operations such as selection, crossover and mutation. The fitness function here is the root mean square error of the inhibitory activities calculated by the MLR and ANN methods.
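The four steps can be sketched in Python as follows. This is a simplified illustration, not the configuration used in this work (whose GA parameters are given in Table S1): the population size, mutation rate, single-point crossover, truncation selection and the use of a single hold-out split for the RMSE fitness are all assumptions.

```python
import numpy as np

def holdout_rmse(X, y, mask, train, test):
    """Fitness: RMSE on held-out rows of an MLR model built on the flagged descriptors."""
    if not mask.any():
        return np.inf  # an empty chromosome cannot be modeled
    A = np.column_stack([np.ones(len(y)), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    return float(np.sqrt(np.mean((y[test] - A[test] @ coef) ** 2)))

def ga_select(X, y, train, test, pop_size=30, generations=40, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < 0.5                # steps 1-2: random 0/1 chromosomes
    for _ in range(generations):
        scores = np.array([holdout_rmse(X, y, m, train, test) for m in pop])  # step 3
        parents = pop[np.argsort(scores)[: pop_size // 2]]   # step 4: keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_var)                      # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_var) < p_mut                # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([holdout_rmse(X, y, m, train, test) for m in pop])
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])

# Synthetic data (hypothetical): only descriptors 1 and 5 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = X[:, 1] - 2.0 * X[:, 5] + 0.05 * rng.normal(size=60)
mask, rmse = ga_select(X, y, train=np.arange(45), test=np.arange(45, 60))
print(np.flatnonzero(mask), rmse)
```

Replacing `holdout_rmse` with the error of a trained neural network gives the ANN-based variant of the same search.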

Bayesian regularized genetic neural networks (BRGNN)

BRGNN is another combination of a genetic algorithm with a regression technique that is used for feature selection. This method has been used by Fernández and co-workers for modeling enzyme inhibitors, calcium entry blockers and protein conformational stability [17: Fernández, M.; Caballero, J.; Bioorg. Med. Chem. 2007, 15, 6298; 18: Caballero, J.; Tundidor-Camba, A.; Fernández, M.; QSAR Comb. Sci. 2007, 26, 27; 19: Caballero, J.; Fernández, M.; González-Nilo, F. D.; Bioorg. Med. Chem. 2008, 16, 6103; 20: Caballero, J.; Garriga, M.; Fernández, M.; Bioorg. Med. Chem. 2006, 14, 3330; 21: Fernández, M.; Caballero, J.; Fernandez, L.; Abreu, J. L.; Garriga, M.; J. Mol. Graphics Modell. 2007, 26, 748; 22: Fernández, M.; Abreu, J. L.; Caballero, J.; Garriga, M.; Fernández, L.; Mol. Simul. 2007, 33, 1045; 23: Fernández, M.; Caballero, J.; Chem. Biol. Drug Des. 2006, 68, 201; 24: Fernández, M.; Carreiras, M. C.; Marco, J. L.; Caballero, J.; J. Enzyme Inhib. Med. Chem. 2006, 21, 647]. In the BRGNN approach, the error function of Bayesian regularized artificial neural networks (BRANN) is used as the objective function of the genetic algorithm. Each time the GA produces a new population, the descriptors encoded by the created chromosomes serve as inputs to the ANN and are related to the biological activities by determining the weights and biases of the BRANN model. Thus, the GA searches for the chromosomes that minimize the residual error between the experimental inhibitory activity values and their calculated counterparts. It will be shown that this variable selection method performs better than the others, since it selects the descriptors on the basis of their non-linear relationship with the dependent variables.
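The BRANN error function is not written out in this article. In the usual Bayesian regularization formulation it is commonly expressed as a weighted sum of the squared residuals and the squared network weights, F = βE_D + αE_W. The short Python sketch below evaluates this quantity for given residuals and weights; the function name is hypothetical, and α and β are treated as fixed inputs although Bayesian regularization re-estimates them during training.

```python
def regularized_objective(errors, weights, alpha, beta):
    """Return beta * E_D + alpha * E_W for given residuals and network weights."""
    e_d = sum(e * e for e in errors)    # data misfit: sum of squared residuals
    e_w = sum(w * w for w in weights)   # weight penalty: sum of squared weights
    return beta * e_d + alpha * e_w

# Two residuals of magnitude 1 and two weights of 0.5 (hypothetical values).
print(regularized_objective([1.0, -1.0], [0.5, 0.5], alpha=0.1, beta=2.0))
```

A GA that minimizes an objective of this kind favors descriptor subsets that fit the activities well with small network weights, which is the compromise that limits overfitting.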

Monte Carlo cross-validation

To assess the utility of an estimated model, the model must be validated. One of the most effective methods for this purpose is Monte Carlo cross-validation. This technique involves a large number of repeated random splits of the dataset, in each of which the available data are divided into two groups used for fitting and testing, respectively. The criterion, e.g. the root mean square error, is averaged over all repeated splits so as not to tie the measure to one particular division of the data. Moreover, it is essential that the repeats incorporate all steps involved in the modeling process, including both feature selection and model development [25: Hawkins, D. M.; Kraker, J.; J. Chemometr. 2010, 24, 188], [26: Esbensen, K. H.; Geladi, P.; J. Chemometr. 2010, 24, 168].
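A minimal Python sketch of the splitting and averaging is given below. The least-squares model, the function name and the default settings are assumptions chosen to keep the example short; feature selection, which should also sit inside the loop, is omitted here.

```python
import numpy as np

def mccv_rmse(X, y, n_splits=200, test_fraction=0.2, seed=0):
    """Average test RMSE of a least-squares model over repeated random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_fraction * n)))
    A = np.column_stack([np.ones(n), X])
    errors = []
    for _ in range(n_splits):
        idx = rng.permutation(n)                  # a fresh random division of the data
        test, train = idx[:n_test], idx[n_test:]
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        errors.append(np.sqrt(np.mean((y[test] - A[test] @ coef) ** 2)))
    return float(np.mean(errors)), float(np.std(errors))

# Synthetic data (hypothetical) with a noise level of 0.1.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = 1.5 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=60)
mean_rmse, sd_rmse = mccv_rmse(X, y)
print(mean_rmse, sd_rmse)
```

The spread of the per-split errors is returned alongside the mean because it shows how strongly the estimate depends on the particular division of the data.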

Sensitivity analysis

Because of the complicated structure of neural network models, there have been many attempts to extract meaning from them. Interpretation of an ANN can be considered in broad and detailed forms. Broad interpretation characterizes how important an input neuron is to the predictive ability of the model and therefore ranks the input descriptors in order of importance, whereas detailed interpretation aims to extract the structure-activity trends in an ANN model, indicating how an input descriptor correlates with the predicted value. Broad interpretation is essentially a “sensitivity analysis” of the neural network. For this purpose, Guha and Jurs [27: Guha, R.; Jurs, P. C.; J. Chem. Inform. Model. 2005, 45, 800] have presented a method for measuring the importance of the descriptors in a QSAR model.

Data set

The dataset consists of 60 arylquinoline amide derivatives together with their inhibitory activities, which were taken from an article published by Corbett et al. in 2010 [10]. In this work, both ACC1 and ACC2 inhibitory activities were considered as dependent variables for the modeling. The activities comprise IC50 and ligand efficiency (LE) values for rat ACC1 (rACC1) and IC50 values for human ACC2 (hACC2). The IC50 values of the rACC1 and hACC2 inhibitors were converted to the logarithmic scale (pIC50). The LE values for rACC1, together with the pIC50 values for hACC2 and rACC1, were then used as dependent variables in our QSAR study. The main skeleton of the arylquinoline analogues is given in Figure 1, and the inhibitory activities are listed in Table 1. Prior to the calculation of the descriptors, the studied compounds were geometrically optimized using the semi-empirical AM1 method implemented in the Hyperchem software [28: Hypercube, Inc.; Hyperchem 7; A professional molecular modeling environment, USA, 2007]. The 3-dimensional structures of the molecules were encoded as a diverse set of molecular descriptors (up to 1497) using the Dragon software [29: Todeschini R.; Consonni V.; Mauri A.; Pavan M.; Dragon 3 web version: Calculation of Molecular Descriptors, Department of Environmental Sciences, University of Milan - Bicocca, Italy, 2003]. The first action taken on the dataset was to remove uninformative and highly correlated descriptors: descriptors with zero or constant values and descriptors with pairwise correlation coefficients higher than 0.90 were removed from the whole set of independent variables. The resulting data set, consisting of 60 compounds and 657 descriptors, was used for further analysis.
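The descriptor pre-filtering can be sketched in Python as below. The article does not state which member of a highly correlated pair was discarded, so keeping the first descriptor encountered is an assumption, as are the function name and the toy matrix.

```python
import numpy as np

def filter_descriptors(X, threshold=0.90):
    """Return the indices of descriptors kept after removing constant and redundant columns."""
    varying = np.flatnonzero(X.max(axis=0) - X.min(axis=0) > 0)  # drop zero/constant columns
    corr = np.abs(np.corrcoef(X[:, varying], rowvar=False))
    kept = []  # positions within `varying`
    for pos in range(len(varying)):
        # Keep a descriptor only if it is not highly correlated with one already kept.
        if all(corr[pos, k] <= threshold for k in kept):
            kept.append(pos)
    return varying[kept].tolist()

# Toy matrix (hypothetical): column 1 is constant, column 2 is a linear copy of column 0.
c0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([c0, np.full(5, 7.0), 2.0 * c0 + 1.0, [5.0, 1.0, 4.0, 2.0, 3.0]])
print(filter_descriptors(X))  # → [0, 3]
```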

Figure 1. The main skeleton of the ACC inhibitors.

Table 1. The structural information of the ACC inhibitors together with the experimental and calculated pIC50 values.

Different variable selection techniques and BRANN modeling

Four different variable selection algorithms were used to choose the best variables from the large number of descriptors, and the selected descriptors were then used for modeling with the BRANN approach. The corresponding m-files of the algorithms were prepared and run in MATLAB, version R2010a [30: Matlab 2010a; A sophisticated mathematical calculation environment; Mathworks; USA; 2010]. In each of these methods, Monte Carlo cross-validation and the Bayesian regularization formalism were used to deal with overfitting in the variable selection and modeling procedures, respectively. The “neural network” and “global optimization” toolboxes in MATLAB were used for running BRANN and GA, respectively. The parameters of the GA and BRANN used for optimization are specified in Table S1 in the Supplementary Information (SI) section. To run the BRGNN algorithm, the BRANN models must be stable inside the GA paradigm. The stability of the BRANN during the GA procedure has been investigated in our previous works [11], [31: Jalali-Heravi, M.; Mani-Varnosfaderani, A.; SAR QSAR Env. Res. 2011, 22, 293]. As in those works, we observed thorough reproducibility, with near-zero standard deviations, for the BRANN models in the present study.

Figure 2 shows a flowchart of the overall feature selection and modeling procedure used in this work. The dataset is first split randomly into a prediction set (20% of the raw data) and a calibration set (the remaining 80%). This split is repeated 1000 times. Within each split, variable selection on the calibration set is in turn repeated 200 times. In each variable selection step, the calibration set is randomly divided into two subsets (an 80% training set and a 20% test set), which makes overfitting unlikely. Through this sampling algorithm, the best and most frequently selected variables are chosen. In each of the 1000 runs, after the 200 repetitions of variable selection are finished, the best selected variables are supplied as inputs to the regression algorithm, and the prediction set is used to evaluate the model's prediction ability; the algorithm then continues until all 1000 runs are completed.
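The nested sampling scheme can be outlined in Python as follows. This is a sketch under several assumptions: the repetition counts are reduced from 1000 and 200 to keep it fast, a simple correlation ranking stands in for the actual variable selection methods, a least-squares model stands in for BRANN, and the final model is assumed to be fitted on the calibration set and evaluated on the prediction set.

```python
import numpy as np
from collections import Counter

def top_k_by_correlation(X, y, k):
    """Hypothetical placeholder selector: the k descriptors most correlated with y."""
    r = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return np.argsort(r)[::-1][:k].tolist()

def nested_monte_carlo(X, y, n_outer=20, n_inner=10, k=2, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    overall, rmse_pred = Counter(), []
    for _ in range(n_outer):
        idx = rng.permutation(n)
        n_pred = int(round(0.2 * n))
        pred, calib = idx[:n_pred], idx[n_pred:]      # 20% prediction, 80% calibration
        counts = Counter()
        for _ in range(n_inner):
            sub = rng.permutation(calib)
            train = sub[int(round(0.2 * len(sub))):]  # 80% training part of the calibration set
            # The placeholder selector ignores the inner 20% test part; a real
            # selector would score candidate subsets on it.
            counts.update(top_k_by_correlation(X[train], y[train], k))
        best = [j for j, _ in counts.most_common(k)]   # most frequently selected variables
        overall.update(best)
        A = np.column_stack([np.ones(n), X[:, best]])
        coef, *_ = np.linalg.lstsq(A[calib], y[calib], rcond=None)
        rmse_pred.append(np.sqrt(np.mean((y[pred] - A[pred] @ coef) ** 2)))
    return overall, float(np.mean(rmse_pred))

# Synthetic data (hypothetical): only descriptors 2 and 4 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = 3.0 * X[:, 2] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=60)
freq, rmse = nested_monte_carlo(X, y)
print(freq.most_common(2), rmse)
```

The frequency counter plays the role of the frequency plots discussed later: descriptors that survive many random divisions of the data are less likely to have been selected by chance.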

It should be noted that, for the GA-MLR and BRGNN feature selection methods, the algorithm was set up to save and consider the chromosomes of a large number of the last generations (out of 200 generations in each GA run) instead of only the final chromosome. In this way, adequate information was available for selecting the best variables from the whole pool. Moreover, this gives confidence that the selected variables include all descriptors essential for the regression model and that no important variable is missed.

Figure 2. The flowchart of the Monte Carlo sampling and modeling procedure used in the present contribution.

Calculation of the importance of molecular descriptors

Following Guha and Jurs [27], an algorithm was developed for the sensitivity analysis of the constructed ANN models. After the variable selection methods have been run for the ANN-based algorithms, the set of best selected descriptors is identified. In the sensitivity analysis, each of these input descriptors is then replaced, one at a time, with a random vector. At each iteration, the neural network is trained and validated with the new set of input descriptors, and the root mean square error (RMSE) and correlation coefficient are calculated. The RMSE of these new predictions is expected to be larger than the original RMSE (obtained when no descriptor was replaced with a random vector). Each time a descriptor is replaced with random numbers, the difference between these two RMSE values indicates the importance of that descriptor to the model's predictive ability. That is, if a descriptor plays a major role in the prediction of the model, replacing it leads to a greater loss in predictive ability (a higher RMSE) than replacing a descriptor that does not play such an important role. After repeating this procedure for all of the descriptors present in the model, the descriptors can be ranked in order of importance.
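The procedure described above can be sketched in Python as follows. A least-squares model is used here as a stand-in for the neural network, which is an assumption made to keep the example self-contained; the function names and synthetic data are likewise hypothetical.

```python
import numpy as np

def fit_predict_rmse(X_train, y_train, X_test, y_test):
    """Stand-in model (ordinary least squares); the article trains a BRANN at this step."""
    A_tr = np.column_stack([np.ones(len(y_train)), X_train])
    A_te = np.column_stack([np.ones(len(y_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_tr, y_train, rcond=None)
    return float(np.sqrt(np.mean((y_test - A_te @ coef) ** 2)))

def descriptor_importance(X_train, y_train, X_test, y_test, seed=0):
    """RMSE increase observed when each descriptor is replaced with a random vector."""
    rng = np.random.default_rng(seed)
    base = fit_predict_rmse(X_train, y_train, X_test, y_test)
    importance = []
    for j in range(X_train.shape[1]):
        Xtr, Xte = X_train.copy(), X_test.copy()
        Xtr[:, j] = rng.normal(size=len(Xtr))  # replace descriptor j with random numbers
        Xte[:, j] = rng.normal(size=len(Xte))
        importance.append(fit_predict_rmse(Xtr, y_train, Xte, y_test) - base)
    return base, importance

# Synthetic data (hypothetical): descriptor 0 matters most, descriptor 2 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=80)
base, imp = descriptor_importance(X[:60], y[:60], X[60:], y[60:])
ranking = sorted(range(3), key=lambda j: -imp[j])
print(base, imp, ranking)
```

Sorting the descriptors by the RMSE increase gives the importance ranking; an increase near zero marks a descriptor that the model can do without.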

Results and Discussion

Comparison of different feature selection methods

To investigate the effect of linear and non-linear variable selection, two groups of methods were considered: (1) stepwise-MLR, GA-MLR and SPA as the linear methods; (2) BRGNN as the non-linear one. The results demonstrated that the variables chosen when non-linear interactions between the descriptors and the activity of the compounds were taken into account were entirely different from those chosen by the linear selection methods, and showed better correlation with the activities. Table 2 presents the calibration and prediction set results for the four feature selection methods followed by BRANN modeling. The results revealed that, for modeling both the pIC50 (rACC1, hACC2) and LE (rACC1) values, BRGNN is a robust algorithm that performs variable selection and modeling simultaneously. The best selected descriptors for modeling the pIC50 and LE values are given in Table S2 in the Supplementary Information (SI) section.

Table 2. The results of the linear and non-linear methods for modeling the ACC inhibitory activities.

Moreover, it is important to note that, for modeling the rACC1 and hACC2 activities, the best variables selected after repeating the GA-MLR-BRANN and stepwise-MLR-BRANN procedures were approximately the same. This indicates that, with a large number of repetitions (1000 Monte Carlo samplings), the effect of the genetic algorithm on MLR becomes negligible. A possible explanation is that, since the dataset in the stepwise-MLR-ANN algorithm was split randomly and with many repetitions (1000 × 200), its results approach those of GA-MLR-ANN, whose search for the solution is itself random-based. In addition, we found that in the early generations of GA-MLR the selected variables are entirely different from those of stepwise-MLR. However, as the algorithm continues, in the later generations the two methods give more or less the same descriptors, revealing the essential effect of crossover and mutation on finding the best solution in genetic algorithms.

Robustness of BRGNN

The combination of genetic algorithms and BRANN is named BRGNN and has properties superior to those of the previous BRANN techniques. In the BRGNN algorithm overfitting is no longer a major concern, since a compromise is made between the weights and the RMSE through Bayes' theorem, as mentioned earlier. Moreover, this method is able to deal with non-linear interactions in complex systems and has the potential to correlate the molecular descriptors of the compounds with their biological activities. The better prediction ability, reproducibility and generalization power of BRGNN confirm the advantages of this technique (see Table 2). The mean values of the predicted pIC50 (rACC1 and hACC2) and LE (rACC1) values obtained using the BRGNN models are listed in Table 1. This table shows that the calculated values of pIC50 and LE are good estimates of the experimental ones. The correlations between the experimental and calculated values of pIC50 and LE for the calibration and prediction sets are shown in Figure 3. For further evaluation of the BRGNN models, the values of the modified correlation coefficient (r²m) [32: Roy, P. P.; Paul, S.; Mitra, I.; Roy, K.; Molecules 2009, 14, 1660] for the molecules of the prediction set were calculated and are given in Table 2. According to the statistical summary presented in Table 2, the high values of r² and r²m together with the low RMSE values for BRGNN confirm the robustness of this technique relative to the other methods.

Figure 3. Plot of the calculated pIC50 values against the experimental ones for the calibration and prediction sets: (a) rACC1 (pIC50); (b) hACC2 (pIC50); (c) rACC1 (LE) as dependent variable.

To check for chance correlation, a Y-randomization test was performed. The vector of activities for the calibration set was randomly shuffled and used as the dependent variable for the modeling. The developed models were then used to calculate the ACC inhibitory activities of the molecules of the prediction set. The mean values of the corrected correlation coefficient (cR²) [33: Mitra, I.; Saha, A.; Roy, K.; Mol. Simul. 2010, 13, 1067] over 100 Y-randomization runs were 0.385, 0.442 and 0.359 for the rACC1 (pIC50), hACC2 (pIC50) and rACC1 (LE) activities, respectively. These poor cR² values for the prediction set revealed that the results of the BRGNN models are not due to chance correlation and that the developed models are reliable.
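The idea behind the test can be sketched in Python as below. For brevity the sketch uses a least-squares model and compares the fitted R² of the true and shuffled activities, which is a simplification of the procedure described above (the BRGNN models and the prediction-set cR² statistic are not reproduced); all names and data are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

def y_randomization(X, y, n_rounds=100, seed=0):
    """Compare the R² of the true activities with the mean R² of shuffled activities."""
    rng = np.random.default_rng(seed)
    true_r2 = r_squared(X, y)
    shuffled_r2 = [r_squared(X, rng.permutation(y)) for _ in range(n_rounds)]
    return true_r2, float(np.mean(shuffled_r2))

# Synthetic data (hypothetical) with a genuine structure-activity relationship.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)
true_r2, mean_shuffled_r2 = y_randomization(X, y)
print(true_r2, mean_shuffled_r2)
```

A large gap between the two values suggests that the relationship found with the true activities is unlikely to be a chance correlation.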

The residuals of the calculated pIC50 and LE values are plotted against the experimental values in Figure 4. This figure shows that the residuals are normally distributed around zero, and therefore the models are not biased by a systematic error. Further visual descriptions of the BRGNN results can be seen in Figures S1-S3 in the SI section. Figure S1a is a frequency plot of the variables, showing how many times each descriptor participated in the ANN model over 1000 repetitions of the BRGNN algorithm. The higher a variable's frequency, the more often it appeared in the GA generations and the better it correlates with the activities. Figures S2 and S3 are histograms of the correlations obtained from 1000 repetitions of BRGNN on the calibration and prediction sets, respectively.

Figure 4. The residuals of the calculated pIC50 values against the experimental values for the calibration and prediction sets: (a) rACC1 (pIC50); (b) hACC2 (pIC50); (c) rACC1 (LE) as dependent variable.

Comparison of linear MLR and non-linear ANN modeling

For further investigation, we applied linear as well as non-linear modeling to the descriptors chosen by the linear and non-linear variable selection methods. Again, the data were randomly divided into two subsets (calibration and prediction sets), and the training set was then subjected to the different feature selection techniques (stepwise-MLR, SPA and GA-MLR) to find the variables that best correlate with the activities. This process continued with multiple linear regression modeling. The statistical results and the best selected descriptors for this approach are shown in Table 2 and Table S2, respectively. The higher correlation values of the ANN-based methods, compared with those of MLR, indicate that the neural networks performed more accurately than the linear models.

In QSAR studies, ANN models are generally used as purely predictive tools rather than as an aid to understanding structure-activity trends. The greater flexibility of ANNs makes them powerful predictive tools, whereas MLR models are capable of modeling only linear functions and therefore give less accurate predictions. On the other hand, interpretability is a serious concern for ANN models, while linear models can be interpreted in a simple manner.

The statistics of SPA-MLR in Table 2 indicate that this variable selection method was better than the others for MLR modeling, so the relative mean effects (RME) of the variables selected by this method were calculated and are given in Table 3. The calculated RME values reveal the importance of the mean Sanderson electronegativity, polarity and positive charge of the molecules for their rACC1 and hACC2 inhibitory activities, which is in agreement with the results of the other variable selection and modeling algorithms.

Table 3. The relative mean effects of the SPA-MLR selected molecular descriptors.

The coefficients of the selected molecular descriptors for the SPA-MLR models, together with their intercept values, are given in Table 4. This information provides a simple and efficient view of the rACC1 and hACC2 inhibitory activities and of the rACC1 LE of the molecules. The calculated coefficients in this table reveal that increasing the positive charge of the molecules enhances their ACC inhibitory activities.

Table 4. The summary of the SPA-MLR models for modeling the rACC1 and hACC2 inhibitory activities.

Sensitivity analysis

At the end of the BRGNN modeling process, a set of best selected variables was identified on the basis of how frequently they were incorporated in the model development. These selected descriptors were then subjected to “sensitivity analysis” to measure their relative importance to the predictive ability of the model. The results are shown in Table 5. The information contained in this table is more easily seen in the descriptor importance plot shown in Figure 5. The larger the increase in RMSE for a variable, the more important that descriptor is to the predictive ability of the model.

Table 5
The results of sensitivity analysis on BRGNN's selected molecular descriptors

The results of the sensitivity analysis indicate that the Galvez topological charge index of order 6 (GGI6)³⁴ is the most important descriptor for describing the rACC1 inhibitory activity, which approximately agrees with the frequency plot in Figure S1a mentioned earlier. The high importance of GGI6 in the BRGNN model emphasizes its role in modeling the rACC1 activity. Moreover, the results show that although the linear and non-linear models have no descriptors in common, the most important descriptors in both cases are of the same type, namely charge indices. Inspection of Figures 5b and 5c shows that the charge descriptors play a major role in describing the hACC2 inhibitory activities and also the rACC1 ligand efficiencies. This observation may serve as a guide to a trend among the rACC1 inhibitors, whose inhibitory activity is thereby related to the local charge of the compounds.

Figure 5
Importance plot of the variables in BRGNN model. (a) rACC1(pIC50); (b) hACC2(pIC50); (c) rACC1 (LE) as dependent variable.

The appearance of the molecular descriptors total positive charge (TPC), relative positive charge (RPC), fragment-based polar surface area (F-PSA), unipolarity³⁶ (UniP), polarity number³⁷ (Pol) and GGI6 in the linear and non-linear QSAR models in this work shows the critical role of the charge and polarity of the molecules in their inhibition behavior.

The TPC and RPC descriptors stand for the “total” and “relative” positive charge of the molecules. The data in Table 3 imply that the contributions of these molecular descriptors to the rACC1 (LE) and hACC2 (pIC50) activities are 62% and 74%, respectively. The positive coefficients in Table 4 suggest that the TPC and RPC descriptors have a positive effect on ACC bioactivities. Moreover, the most important variable selected by the BRGNN algorithm is GGI6, which represents the total amount of charge transfer in the molecules.³⁴ The selection of GGI6 together with TPC and RPC in this work implies that the inhibition mechanism is an electrostatically driven process. From the definitions of these molecular descriptors, it can be concluded that the positive charge of the molecules has a considerable effect on their ACC inhibitory activities; this molecular property should therefore be taken into account when designing novel and potent compounds for the treatment of type II diabetes and metabolic syndrome.

The F-PSA is the fragment-based polar surface area and encodes the part of the molecular surface belonging to polar atoms. This molecular descriptor has been shown to correlate well with passive molecular transport through membranes and therefore allows prediction of the transport properties of drugs (penetration and intestinal absorption).³⁵ The appearance of this descriptor in this work highlights the significant role of “diffusion” in ACC inhibitory activity.

The UniP and Pol descriptors are the unipolarity³⁶ and polarity number³⁷ of molecules. These molecular descriptors can be easily calculated from the distance matrix of the H-depleted molecular graph.³⁷,³⁸ The polarity number is the number of pairs of vertices at a topological distance equal to three.³⁷,³⁸ It is usually assumed that the polarity number accounts for the flexibility of acyclic structures in a molecule. The unipolarity is the minimum value of the vertex distance degrees in a molecular graph. This parameter is inversely related to the local flexibility in molecules.³⁹,⁴⁰ The data in Table 4 suggest a positive effect of UniP on ACC inhibitory activity and a negative effect of the polarity number. These two molecular descriptors emphasize the role of bond flexibility in the inhibitory activity of the studied compounds.
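The two definitions above translate directly into a few lines of code. The sketch below builds the topological distance matrix of an H-depleted graph by breadth-first search, counts the vertex pairs at distance three (polarity number) and takes the smallest row sum of the distance matrix (unipolarity). The function names and the example skeleton (2-methylbutane) are chosen here for illustration, and a connected graph is assumed.

```python
from collections import deque


def distance_matrix(adjacency):
    """All-pairs topological distances of a connected graph via BFS."""
    n = len(adjacency)
    dist = [[-1] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] < 0:  # vertex not visited yet
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def polarity_number(dist):
    """Number of unordered vertex pairs at topological distance three."""
    n = len(dist)
    return sum(1 for i in range(n) for j in range(i + 1, n) if dist[i][j] == 3)


def unipolarity(dist):
    """Minimum vertex distance degree (smallest row sum of the distance matrix)."""
    return min(sum(row) for row in dist)


# H-depleted graph of 2-methylbutane: chain C0-C1-C2-C3 with a methyl C4 on C1.
adjacency = [[1], [0, 2, 4], [1, 3], [2], [1]]
dist = distance_matrix(adjacency)
pol = polarity_number(dist)
unip = unipolarity(dist)
```

For this skeleton the pairs (C0, C3) and (C3, C4) are three bonds apart, so the polarity number is 2, and the branched carbon C1 has the smallest distance sum, 5, which is the unipolarity.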

Previous structure-activity relationship studies, together with homology modeling and docking investigations, have reached similar conclusions about ACC inhibitory activity.⁴¹⁻⁴³ These works suggest a strong hydrogen bond between the carbonyl group adjacent to the anthracene of CP-640186 and the main-chain amide N atom of Glu2230 in the ACC structure. A further, weak hydrogen bond was observed between the carbonyl group adjacent to the morpholine and the amide N atom of Gly2162 in the ACC structure. Singh et al.⁴⁴ reported comparative molecular field analysis (CoMFA) and comparative molecular similarity analysis (CoMSIA) models of the inhibitory activity of ACC inhibitors. They suggest a critical role for the electropositive potential near the carbamoyl functional group in ACC inhibitory activity. They also identified some important steric features that affect the activity.⁴⁴ These studies point to the role of the positive charge and shape of the molecules in the activity of the compounds, which is in agreement with our findings.

Conclusions

The main aim of the present work was to develop a QSAR model for predicting the inhibitory activity of ACC inhibitors. Since variable selection is a critical step in every QSAR study, four different algorithms based on Monte Carlo cross-validation techniques were investigated. Comparing the results of stepwise-MLR, MLR-ANN, SPA-ANN, GA-MLR, GA-MLR-ANN and BRGNN indicates that the last model selects the best variables for predicting the inhibitory action of ACC inhibitors. In addition to non-linear ANN modeling, the same procedure was repeated for linear MLR modeling. A sensitivity analysis was carried out to characterize the relative importance of the descriptors. By identifying the most important descriptor in ANN modeling and ranking the variables by importance, we have reduced the black-box limitation of the neural network methodology to some extent. The sensitivity analysis of the models and the relative mean effects of the molecular descriptors in this work have shown that the positive charge of the molecules has a considerable effect on their ACC inhibitory activities. This is in agreement with previous studies of ACC inhibitors, which have emphasized the role of electrostatic interactions in ACC inhibitory activity. In general, the results of the present contribution should help in better understanding the mechanism of the ACC inhibitory activity of arylquinoline amide derivatives and should be useful for medicinal chemists dealing with the optimization of this series of compounds.

Acknowledgments

I would like to thank Ms. Elham Bakhtiary for her great comments and helpful guidelines in this project.

References

  • 1
    Havale, S. H.; Pal, M.; Bio. Med. Chem. 2009, 17, 1783.
  • 2
    Wild, S.; Roglic, G.; Green, A.; Diabetes Care 2004, 27, 1047.
  • 3
    http://www.who.int/mediacentre/factsheets/fs312/en/, accessed in February 2015.
  • 4
    Kramer, H.; Cao, G.; Dugas, L.; Luke, A.; J. Diabetes Complications 2010, 24, 368.
  • 5
    Kahn, S. E.; Hull, R. L.; Utzschneider, K. M.; Nature 2006, 444, 840.
  • 6
    Polakis, S. E.; Guchhait, R. B.; Zwergel, E. E.; Lane, M. D.; Cooper, T. G.; J. Biol. Chem. 1974, 249, 6657.
  • 7
    Tong, L.; Harwood, H. J.; J. Cell. Biochem. 2006, 99, 1476.
  • 8
    Castle, J. C.; Hara, Y.; Raymond, Ch. K.; Garrett-Engele, Ph.; Ohwaki, K.; Kan, Zh.; Kusunoki, J.; Johnson, J. M.; PLoS One 2009, 4, 4369.
  • 9
    Freeman-Cook, K.; Amor, P.; Bader, S.; Buzon, L. M.; Coffey, S. B.; Corbett, J. W.; Dirico, K. J.; Doran, S. D.; Elliott, R. L.; Esler, W.; Guzman-Perez, A.; Henegar, K. E.; Houser, J. A.; Jones, C. A.; Limberakis, C.; Loomis, K.; McPherson, K.; Murdande, S.; Nelson, K. L.; Phillion, D.; Pierce, B. S.; Song, W.; Sugarman, E.; Tapley, S.; Tu, M.; Zhao, Z.; J. Med. Chem. 2012, 55, 935.
  • 10
    Corbett, J. W.; Freeman-Cook, K. D.; Elliott, R.; Vajdos, F.; Rajamohan, F.; Kohls, D.; Marr, E.; Zhang, E.; Tong, L.; Tu, M.; Murdande, S.; Doran, S. D.; Houser, J. A.; Song, W.; Jones, C. J.; Coffey, C. B.; Buzon, L.; Minich, M. L.; Dirico, K. J.; Tapley, S.; McPherson, R. K.; Sugarman, E.; Harwood, H. J.; Esler, W.; Bio. Med. Chem. Lett. 2010, 20, 2383.
  • 11
    Jalali-Heravi, M.; Mani-Varnosfaderani, A.; QSAR Comb. Sci. 2009, 28, 946.
  • 12
    Winkler, D. A.; Brief Bioinform. 2002, 1, 73.
  • 13
    Anderson, C. M.; Bro, R.; J. Chemometr. 2010, 24, 728.
  • 14
    Brereton, R. G.; Chemometrics: Data Analysis for the Laboratory and Chemical Plant; Wiley: New York, 1995.
  • 15
    Brown, S. D.; Tauler, R.; Walczak, B.; Comprehensive Chemometrics; Elsevier, 2009.
  • 16
    Leardi, R.; J. Chemometr. 2001, 15, 559.
  • 17
    Fernández, M.; Caballero, J.; Bioorg. Med. Chem. 2007, 15, 6298.
  • 18
    Caballero, J.; Tundidor-Camba, A.; Fernández, M.; QSAR Comb. Sci. 2007, 26, 27.
  • 19
    Caballero, J.; Fernández, M.; González-Nilo, F. D.; Bioorg. Med. Chem. 2008, 16, 6103.
  • 20
    Caballero, J.; Garriga, M.; Fernández, M.; Bioorg. Med. Chem. 2006, 14, 3330.
  • 21
    Fernández, M.; Caballero, J.; Fernandez, L.; Abreu, J. L.; Garriga, M.; J. Mol. Graphics Modell. 2007, 26, 748.
  • 22
    Fernández, M.; Abreu, J. L.; Caballero, J.; Garriga, M.; Fernández, L.; Mol. Simul. 2007, 33, 1045.
  • 23
    Fernández, M.; Caballero, J.; Chem. Biol. Drug Des. 2006, 68, 201.
  • 24
    Fernández, M.; Carreiras, M. C.; Marco, J. L.; Caballero, J.; J. Enzyme Inhib. Med. Chem. 2006, 21, 647.
  • 25
    Hawkins, D. M.; Kraker, J.; J. Chemometr. 2010, 24, 188.
  • 26
    Esbensen, K. H.; Geladi, P.; J. Chemometr. 2010, 24, 168.
  • 27
    Guha, R.; Jurs, P. C.; J. Chem. Inform. Model. 2005, 45, 800.
  • 28
    Hypercube, Inc.; Hyperchem 7; A professional molecular modeling environment, USA, 2007.
  • 29
    Todeschini, R.; Consonni, V.; Mauri, A.; Pavan, M.; Dragon 3 web version: Calculation of Molecular Descriptors, Department of Environmental Sciences, University of Milan - Bicocca, Italy, 2003.
  • 30
    Matlab 2010a; A sophisticated mathematical calculation environment; Mathworks; USA; 2010.
  • 31
    Jalali-Heravi, M.; Mani-Varnosfaderani, A.; SAR QSAR Env. Res. 2011, 22, 293.
  • 32
    Roy, P. P.; Paul, S.; Mitra, I.; Roy, K.; Molecules 2009, 14, 1660.
  • 33
    Mitra, I.; Saha, A.; Roy, K.; Mol. Simul. 2010, 13, 1067.
  • 34
    Deconinck, E.; Zhang, M. H.; Petitet, F.; Dubus, E.; Ijjaali, I.; Coomans, D.; Anal. Chim. Acta 2008, 609, 13.
  • 35
    Ertl, P.; Rohde, B.; Selzer, P.; J. Med. Chem. 2000, 43, 3714.
  • 36
    Todeschini, R.; Consonni, V.; Handbook of molecular descriptors, methods and principles in medicinal chemistry, Wiley-VCH: Weinheim, 2000.
  • 37
    Platt, R. J.; J. Phys. Chem. 1952, 56, 328.
  • 38
    Wiener, H.; J. Phys. Chem. 1947, 69, 17.
  • 39
    Lieth, C. W.; Stumpf-Nothof, K.; Prior, U.; J. Chem. Inf. Comput. Sci. 1996, 36, 711.
  • 40
    Fechner, N.; Jahn, A.; Hinselmann, G.; Zell, A.; J. Chem. Inf. Model. 2009, 49, 549.
  • 41
    Bhadauriya, A.; Dhoke, G. V.; Gangwal, R. P.; Damre, M. V.; Sangamwar, A. T.; Mol. Diversity 2013, 17, 139.
  • 42
    Chonan, T.; Tanaka, H.; Yamamoto, D.; Yashiro, M.; Oi, T.; Wakasugi, D.; Ohoka-Sugita, A.; Io, F.; Koretsune, H.; Hiratate, A.; Bioorg. Med. Chem. Lett. 2010, 20, 3965.
  • 43
    Yamashita, T.; Kamata, M.; Endo, S.; Yamamoto, M.; Kakegawa, K.; Watanabe, H.; Miwa, K.; Yamano, T.; Funata, M.; Sakamoto, J. I.; Tani, A.; Mol, C. D.; Zou, H.; Dougan, D. R.; Sang, B.; Snell, G.; Fukatsu, K.; Bioorg. Med. Chem. Lett. 2011, 21, 6314.
  • 44
    Singh, U.; Gangwal, R. P.; Dhoke, G. V.; Prajapati, R.; Damre, M.; Sangamwar, A. T.; Arab. J. Chem., in press, DOI: 10.1016/j.arabjc.2012.10.023.

Publication Dates

  • Publication in this collection
    Mar 2015

History

  • Received
    19 Sept 2014
  • Accepted
    03 Nov 2015
Sociedade Brasileira de Química Instituto de Química - UNICAMP, Caixa Postal 6154, 13083-970 Campinas SP - Brazil, Tel./FAX.: +55 19 3521-3151 - São Paulo - SP - Brazil
E-mail: office@jbcs.sbq.org.br