Interactions between Activin-Like Kinase 5 ( ALK 5 ) Receptor and its Inhibitors and the Construction of a Docking Descriptor-Based QSAR Model

Neste estudo, simulações de dinâmica molecular (MD) e docking foram realizadas a fim de investigar o sítio de ligação e as interações de 61 inibidores com o receptor de kinase-5 do tipo ativina (ALK5). Uma simulação MD foi realizada sobre o receptor para obter sua conformação em água. A análise de docking revelou que interações do tipo ligação hidrogênio e hidrofóbicas desempenham papéis importantes no complexo inibidor-ALK5. Estas interações foram confirmadas por cristalografia de raios X. Além disso, para estudar a estabilidade da conformação do receptor, uma segunda simulação MD sobre o complexo foi realizada em um ambiente aquoso. O raio de giro para o complexo mostrou que a conformação de ALK5 não muda na presença do inibidor. Foram calculados 134 descritores obtidos da estrutura molecular e docking; os mais viáveis foram usados em relações quantitativas entre estrutura química e atividade biológica (QSAR). A regressão de vetores-suporte baseada em mínimos quadrados (LS-SVR) apresentou um modelo confiável com Q = 0,837 e R = 0,917. Finalmente, os tipos de interações e propriedades dos descritores foram usados para propor novos inibidores.


Introduction
][5] Deregulation of TGF-β signaling causes a variety of human diseases including cancer, 6 pancreatic diseases, 7 and hematological malignancies. 8TGF-βs signal is initiated through a heterotetrameric receptor complex that consists of two transmembrane serine/threonine kinase receptor, 'type I' receptor (TbRI) or activin-like kinase 5 (ALK5) and 'type II' (TbRII), which is a transmembrane serine/ threonine kinase receptor. 9Upon TGF-β binding, TbRII phosphorylates ALK5 in the GS region.The activated ALK5 in turn phosphorylates Smad2/Smad3 proteins at the C-terminal SSXS-motif, thereby causing dissociation from the receptor and heteromeric complex formation with Smad4.1][12] Clearly, inhibition of phosphorylation of Smad2/Smad3 by ALK5 could reduce TGF-β-induced pathological fibrosis. 13Studies have shown that several small molecule inhibitors block ALK5 catalytic activity.Ren et al. 14 developed 3D pharmacophore models on 106 ALK5 inhibitors reported in the literature. 14They used the best quantitative pharmacophore model (Hypo1) as a 3D search query on the known ALK5 inhibitors and, finally, selected several compounds for further in vitro and in vivo assay studies.Geldenhuys and Nakamura 15 presented a 3D-QSAR (quantitative structure-activity relationship ) model, comparative molecular field analysis (COMFA), for 23 compounds of ALK5 inhibitors, all of which share a similar central core consisting of imidazole with a pyridine ring. 15Docking results showed that hydrogen bonding groups play an important role in affinity for kinase.Jin et al., 16 Gellibert et al. 17 and Yakymovych et al. 18 have reported different classes of ALK5 inhibitors.Modeling methods such as QSAR models have been found to be valuable in further optimization and development of novel inhibitors in drug design. 19ocking descriptors are calculated based on the major interactions between inhibitors and receptor such as hydrophobic, hydrogen bonding, salt bridge, close contact and π interactions.Thus, the obtained QSAR model from these descriptors could be very useful for rational drug design.It is the objective of the present study to identify the interactions between the ALK5 receptor and these inhibitors.A structure-based QSAR model is also developed to represent relationship between descriptors originating from docking and the activities of the inhibitors.Finally, new potent inhibitors will be proposed based on the results obtained on inhibitor-receptor interactions and the properties of the descriptors.

Preparation of data set
The data set (Table 1) containing the inhibitory activities in the logarithmic scale (pIC 50 = log 1/IC 50 ) was obtained from Jin et al., 16 Gellibert et al. 17 and Yakymovych et al. 18 The basic structures of these inhibitors are shown in Scheme 1.The data set was randomly divided into two groups, the training and the test sets, consisting of 49 and 12 inhibitors, respectively.The training set was used for model generation and the test set for model evaluation.The structure of the inhibitors was drawn in the Hyperchem package version 7.0 (Hypercube, Inc.). 20Energy minimization of these inhibitors was accomplished using the molecular mechanics method with Polak-Ribiere algorithm up to a root mean square gradient of 0.1 kcal mol -1 .The optimized molecular structures were used for docking studies.

Molecular dynamics simulation
Molecular dynamics simulation was performed in two steps.In the first step, the structure of ALK5 was simulated in a water box.In the second, the result of docking of the most potent inhibitor, i.e. 3-(6-methylpyridin-2-yl)-4-( [1,2,4]  triazolo [1,5-α]pyridin-6-yl) pyrazoles-1-carbothioic acid (4-methoxy-phenyl)-amide (inhibitor 14a), was loaded into the MD simulation to obtain receptor conformation.The topology parameters of inhibitor 14a were built using the Dundee PRODRG2.5 server (beta). 22The MD simulation process was carried out by the GROMACS 4.5.1 package using the GROMOS-96 force field. 23Water molecules were added using a simple point charge (SPC216) model. 24Counterions of Na + were added by replacing solvent molecules in order to neutralize the system.The system was then placed in a cubic box with the dimensions 8.63 × 8.63 × 8.63 nm 3 containing 63413 atoms in total.Periodic boundary conditions were applied in the xyz space.Initially, energy minimization was performed before implementing the position restraint procedure.Then, NVT and NPT simulations were carried out.The NVT simulation was performed at a constant 310 K with a Nose-Hoover thermostat.Once the temperature was stabilized, the NPT simulation was performed using the Parrinello-Rahman pressure coupling under a pressure of 1 bar.The particle mesh Ewald (PME) method interaction was used. 25or numerical integrations, the velocity verlet algorithm was used 26 and the initial atomic velocities were generated using a Maxwellian distribution at the given absolute temperature. 27inally, the full system was subjected to 6000 ps MD at 310 K under 1 bar pressure.The MD simulation and result analyses were performed on the open SUSE 11.3 Linux on an Intel Core 2 Quad Q6600 2.4 GHz with 4 GB of RAM.

Molecular images and evaluation of molecular dynamics
All molecular images and animations were produced using molegro virtual docker (MVD) 28  dynamics (VMD) 29 packages.Schematic two-dimensional representations of the docking results were performed using LIGPLOT. 30

Molecular docking
Docking was performed by AutoDock 4.0 program using the Lamarckian genetic algorithm (LGA) method. 31utodock is a suitable program for actual docking simulation. 32The most potent inhibitor reported in Table 1, inhibitor 14a, was docked into the structure extracted from MD trajectories of the receptor.All inhibitors were subsequently docked into the binding site obtained from the receptor and conformation of the inhibitors with the lowest binding free energy was used to calculate molecular descriptors.

Descriptor calculation
Descriptors were calculated based on inhibitor-receptor interactions using AutoDockTools and BINANA (BINding ANAlyzer).AutoDockTools are capable of calculating 8 types of energy values that consist of: (i) estimated free energy of binding (E FreeBind ); (ii) estimated inhibition constant (k i ); (iii) final intermolecular energy (E InterMol ), which is the sum of the following three types of energy (iv) vdW + Hbond + desolv Energy (E VHD ); (v) electrostatic energy (E Elec ); (vi) final total energy (E FTot ); (vii) torsional free energy (E Tors ); (viii) unbounded system's energy (E Unb ), 33 and the Gasteiger charge descriptor.Then, the docking conformer at its minimum E FreeBind was loaded into BINANA to calculate the descriptors.BINANA is a python-implemented algorithm to characterize the binding of inhibitor-receptor complex.The BINANA accepts receptor and inhibitor files generated by AutodockTools in the PDBQT format. 34BINANA descriptors consist of (i) close contacts, BINANA determines all ligand and protein atoms that come within 2.
Theory of least squares support vector regression Support vector regression (SVR) was originally introduced by Cortes and Vapnik. 36Least squares support vector regression (LS-SVR) is an alternative method of SVR described in Suykens et al. 37 LS-SVR always fits a linear relation, y = wx + b, between the independent variable (x) and the dependent one (y), where w is a weight vector and b is the bias term.Consider a given training set {x i ,y i }, i = 1, 2,…, N, with input data x i Є R and output data y i Є R. The error in the prediction for each sample i is defined as: (1)   As in SVR, it is necessary to minimize a cost function C containing a penalized regression error, as in equation 2: (2) Subject to (3)   The parameter γ, which has to be optimized by the user, gives the relative weight of this part as compared to the first part. 38The Lagrangian for this problem is defined as: (4)   where α refers to Lagrange multipliers, which can be either positive or negative due to the equality constraints in the LS-SVR algorithm.Optimum conditions can be obtained by setting all derivatives of L(w, b, e; α) equal to zero.Upon eliminating the variables w and e, one gets the following set of linear equations: (5) (6)   The training samples are only present in equation 5 as the inner product.This allows this standard inner product to be substituted with the most popular choice of the kernel types, i.e. the radial basis kernel function (RBF), defined as: (7)   where σ is the radial basis function of the kernel parameter.RBF function can also handle the nonlinear relationship between the input and the output. 39The free LS-SVM toolbox (LS-SVM V-1.5, Suykens, Leuven, Belgium) was used in MATLAB Version 7.6 to derive all the LS-SVM models. 40

Applicability domain
In order to obtain a reliable prediction for the test compounds, the exploration of applicability domain (AD) for a QSAR model is necessary.Here a leverage approach was used to verify the prediction reliability.A simple measure of the applicability domain of the model is its leverage h i , which is defined as: (8)   where x i is the descriptor row-vector of the query compound i and X is the n × k matrix of the training set (k is the number of model descriptors, and n is the number of training set samples).[43]

Molecular dynamics simulation of the ALK5 receptor
To provide conformation of ALK5 in a water environment, a 6000 ps MD simulation was carried out on ALK5 in a water box.The stability of the system (protein, water, and ions) was tested by means of the root mean square deviations (RMSD's) of protein with respect to protein's initial structure.The RMSD values of ALK5 were plotted from 0 to 6000 ps which imply that the RMSD of the system reaches equilibrium and oscillates around 0.45 nm after 3800 ps of the simulation onset.The average RMSD value of protein backbone was calculated to be 0.45 ± 0.03 nm.The RMSD value indicates that ALK5 conformation reached equilibrium after 3800 ps in a water environment.The equilibrated conformation of the receptor was used for docking.

Molecular docking
In order to determine the binding site in ALK5 receptor, blind docking was performed on receptor with the most potent inhibitor (14a).In blind docking the grid map set to 92 × 92 × 92 Å along the x, y and z axes with 0.375 Å grid spacing.The center of grid map set to 9.218, 1.277 and 12.584 Å.The grid maps were obtained using AutoGrid.The obtained binding site from blind docking was used for docking of all inhibitors.In these dockings, a grid map of the size 60 × 60 × 60 Å with a grid-point spacing of 0.375 Å was generated.The maps were centered on the inhibitor's binding site in a center grid box 8.994, 5.928 and 12.693 Å for the x, y and z centers of the receptor, respectively.For conformational search, docking calculations were carried out using the Lamarckian genetic algorithm (LGA) method with default parameters.The residues of the binding site obtained were Ala230, Ala350, Asp351, Gly286, His283, Ile211, Leu260, Leu278, Leu340, Lys232, Ser287, Ser280, Tyr282, Val219 and Val231 in the cavity of the receptor.The LIGPLOT software was employed to show the hydrophobic and hydrogen bonding interactions.These interactions are shown in Figure 1.The hydrogen bonds (Figure 1) are generated between the His238 residue and the inhibitors.Furthermore, in the X-ray crystal structure of inhibitor 61 with ALK5, 17 there is a hydrogen bond between N1 nitrogen of the inhibitor 61 and the backbone NH of His283.His 283 is a key residue in most reported ALK5 complexes by X-ray in protein data bank (PDB ID: 2WOT and 2WOU).Therefore, obtained binding site from docking is in agreement with X-ray crystallography.Results of molecular docking indicated that the interactions between the inhibitor and ALK5 are dominated by hydrophobic and hydrogen bond.The hydrophobic interactions are generated by the close contacts between the non-polar amino acid side chains of the receptor and the lipophilic groups of the inhibitor.It may be suggested that more hydrophobic interactions at the binding site could improve the inhibitory activity of the inhibitors.

Molecular dynamics simulation of the ALK5-inhibitor complex
In order to determine the effect of inhibitor on the receptor conformation we decided to perform a MD simulation.The simulation procedure was assessed by carrying out a 6000 ps MD simulation on the 14a-ALK5 complex in a water box.Trajectory stability was investigated by analyzing RMSD.The average RMSD values of the protein backbone of the complex and ALK5, Figure 2, were calculated to be 0.47 ± 0.05 nm and 0.45 ± 0.03 nm, respectively.Furthermore, to investigate the conformational changes of the receptor, radius of gyration (Rg) values for the complex and ALK5 were also compared.The average values of Rg, Figure 3, for the complex and ALK5 were calculated to be 0.60 ± 0.05 and 0.59 ± 0.03 nm, respectively.These results indicate that the receptor conformation has not changed in the presence of inhibitor 14a.The stability of the receptor conformation in the presence of the inhibitor confirmed the docking results, since the hydrophobic intermolecular interactions between the inhibitor and the receptor are weak interactions and these interactions do not affect the receptor conformation.

Descriptor calculation and selection
As already mentioned, 132 descriptors were calculated for the inhibitors based on inhibitor-receptor interactions using AutoDockTools, BINANA and Hyperchem software.Once the descriptors had been calculated, those that remained constant for all the molecules were eliminated and pairs of variables with a Pearson correlation coefficient > 0.80 in bivariate correlations were classified as intercorrelated, one of which was eliminated.Subsequently, the stepwise multiple linear regression was carried out on the training set for selecting the most effective descriptors by SPSS software (version 13; SPSS Inc.).Since the number of descriptors should be five times less than that of the molecules in the training set, 44 eight descriptors were selected for constructing the QSAR model.
These descriptors were volume (V), estimated free energy of binding (DG), hydrophobic contacts carboncarbon β-sidechain (HCβ.C-C), summed electrostatic energy acceptor-carbon atoms (SEE.A-C), summed electrostatic energy acceptor-nitrogen atoms (SEE.A-N), summed electrostatic energy nitrogen-accepting nitrogen atoms (SEE.N-NA), summed electrostatic energy acceptor nitrogen-acceptor sulfur atoms (SEE.NA-SA) and number of close contacts between carbon and acceptor oxygen atoms in distance of 4.0 Å of each other (NCC4.C-OA).The correlation matrix obtained for the selected descriptors are given in Table 2.It can be observed that the correlation coefficient between each pair of the descriptors is less than 0.68.

LS-SVR model
The selected descriptors were used as input for constructing the LS-SVR model.In order to generate  a LS-SVR model, an appropriate set of regularization parameters, γ, and kernel parameters such as σ 2 for the radial basis function (RBF) kernel should be chosen.In this work, the radial basis kernel function was used because it is a non-linear function that can decrease the computational complexity of the training procedure. 45,46The next step in the construction of the LS-SVR model is optimization of its parameters, including γ and σ 2 , which were found to be 191.4 and 214.8, respectively, using the grid search method.The values of correlation coefficient (R) for training and test set in the LS-SVR model as a non-linear model were found to be 0.929 and 0.917, respectively.

LS-SVR model validation
The generated QSAR model was validated by root mean square error (RMSE) and cross-validation (Q 2 ) of the test set.RMSE was calculated as in equation 9, (9)   where N is the number of data patterns in the independent data set, y pre,i designates the predicted value, and y ex,i is the experimental value of one data point i.The values of RMSE for the training and test set in LS-SVR model were calculated to be 0.453 and 0.560, respectively.As recommended by Tropsha and co-workers, 47,48 several statistical parameters were used for evaluating the significance and predictive power of QSAR models.A proper and predictive model should have a Q 2 value larger than 0.5 and a correlation coefficient of prediction (r 2 ) above 0.6.The Q 2 value is calculated from equation 10, (10)   where yi and ŷ i are the measured and predicted values of the dependent variable (over the test set), respectively, andy tr is the average value of the dependent variable for the training set.The summation runs over all compounds in the test set.
The calculated statistical parameters of LS-SVR model are presented in Table 3.As can be seen in Table 3, the statistical parameters for the LS-SVR model show the excellent performance of the model.Table 1 presents the predicted values inhibitory activity.The predicted LS-SVR were plotted versus their experimental values, Figure 4, indicating a good agreement the experimental and predicted values.
Moreover, in order assess the predictive power of QSAR model, correlation coefficients between predicted and experimental values of inhibitory activity of compounds from an external prediction set (r 2 p ), correlation coefficients for regressions through the origin (predicted versus experimental inhibitory activities, or experimental versus predicted activities, i.e., r 2 p or r '2 p , respectively) and the slope of the regression lines through the origin (k and k', respectively) were also calculated.As suggested by Tropsha, a suitable QSAR model should satisfy all  49 For a model with good external predictability, the value for r m 2 should be > 0.5.The value of r m 2 was found to be 0.766 in the LS-SVR model.The results are given in Table 3.According to recommendation of Tropsha and Golbraikh, 47,48 if difference between R 2 and Q 2 did not exceed 0.3, there is no over-fitting in the model.In the present work, the difference between R 2 and Q 2 was 0.004, indicating no over-fitting in the LS-SVR model.

Applicability domain
The predictive reliability of LS-SVR model was carried out using applicability domain.In this study, the value of warning leverage was calculated (h* = 0.55).A leverage value greater than h* for the training set (h > h*) means that the inhibitor seriously influences on the model, while for the test set, it means that the prediction is the result of substantial extrapolation of the model and could not be reliable.To visualize the applicability domain of LS-SVR model, the standardized residuals versus leverage values were plotted and the results are shown in Figure 5.A compound with cross-validated standardized residual greater than 3σ (3 standard deviation units) is recognized as Y outlier.Figure 5 shows that all predictions are reliable for LS-SVR model and there is no outlier compound both for training and test sets, indicating the reliability of the predictions.
The results of statistical parameters (Table 3) and applicability domain represented that the LS-SVR model had a high predictive ability and was able to fulfill all the criteria to be accepted as a predictive model for the inhibitory activity.

Sensitivity analysis
In order to determine the relative importance of each descriptor in the LS-SVR model, a developed sensitivity analysis approach was used to rank the importance of the input descriptors of the LS-SVR model. 50The performed sensitivity analysis is based on consecutive removal of descriptors by zeroing the specific connection weights for an input descriptor in the LS-SVR model.According to this method, the differences between the root-mean-square error (RMSE) of the complete model's prediction and the RMSE obtained when the ith variable is excluded from the trained model (RMSE i ) reveals the importance of the ith input descriptor: It is obvious that the most important descriptor is the one that leads to the highest value of Rmdiff.The values of Rmdiffi for the LS-SVR model are plotted in Figure 6   For each atom-type pair of atoms that come within 4.0 Å of each other, a summed electrostatic energy is calculated using the Gasteiger partial charges assigned by AutoDockTools: (12)   where V (a,b) is the summed electrostatic interaction energy of all the atoms of types a and b, respectively, q a i is the partial atomic charge of an atom of type a, q b i is the partial atomic charge of an atom of type b, and r a i b i is the distance between a and b atoms. 34Another descriptor in the model is numbers of atom-type pair counts at a distance of 4.0Å from C-OA.This descriptor shows the number of contacts between carbon-oxygen atoms of the receptor and the inhibitor located at a distance of 4.0Å. 35In this distance there are the van der Waals interactions between carbon atoms with oxygen.The Van der Waals attractions operate over only a very limited range of interaction distances between 3 and 6 Å. 51,52 The next descriptor that represents a significant effect on the inhibitory activity of the inhibitors is hydrophobic contacts (β-sidechain) of carbon-carbon interaction between the receptor and the inhibitors obtained from BINANA.Volume is another important descriptor in interaction between receptor with inhibitor which can affect on the inhibitory activity of inhibitors.The final descriptor in this work is free energy of binding defined as follows: (13)   where, ∆G vdw = van der Waals or Lennard-Jones potential, ∆G elec = electrostatic factor with distance-dependent dielectric, ∆G hbond = H-bonding potential with directionality, ∆G desolv = charge-dependent variant of volume-based atomic solvation, ∆G tors = torsional energy based on the number of rotatable bonds and ∆G intermol = intermolecular energy of protein and inhibitor molecules. 53

Proposed potent inhibitors
New potent inhibitors were proposed based on the inhibitor-receptor interactions and descriptor properties.A new inhibitor was proposed based on the structure of inhibitor a.In this inhibitor, R was replaced by a pyrrole ring in para position, which exhibited the highest activity in this category of the inhibitors.In addition, two new inhibitors were proposed based on inhibitor b with substituents H and F in the R position with the highest activity.The values of predicted pIC 50 for these inhibitors are given in Table 1.

Conclusions
In this work molecular docking was carried out to explore the binding site.Docking analysis showed that hydrogen bonding and hydrophobic interactions are important factors in the interactions between the inhibitors and receptor.After docking of the most potent inhibitor with receptor, the residues of the binding site obtained were Ala230, Ala350, Asp351, Gly286, His283, Ile211, Leu260, Leu278, Leu340, Lys232, Ser287, Ser280, Tyr282, Val219 and Val231 in the cavity of the receptor.X-ray crystallography of ALK5-inhibitor 61 complex and ALK5-inhibitor complexes with PDB ID: 2WOT and 2WOU in protein data bank confirmed the results of the docking studies.Molecular dynamics Simulation of the complex demonstrated that the receptor conformation was not changed in the presence of the inhibitor.
Finally, 3 new potent inhibitors were proposed based on the results of the inhibitor-receptor interactions and properties of the descriptors.

Figure 2 .
Figure 2. RMSD values of protein backbone for ALK5 and ALK5 complex during 6000 ps MD simulation.

Figure 3 .
Figure 3.Time evolution of the radius of gyration (Rg) during 6000 ps of MD simulation of ALK5 and ALK5 complex.
. As shown in this Figure, the order of importance is SEE.A-C > SEE.A-N > SEE.NA-SA > NCC4.C-OA > HC β.

Figure 6 .
Figure 6.Result of sensitivity analysis for the LS-SVR model.

Table 1 .
and visual molecular Vol.23, No. 11, 2012Inhibitors and the predicted pIC 50 by the LS-SVR model (see scheme 1)

Table 2 .
The correlation matrix of the selected descriptors

Table 3 .
Statistical parameters of the external test set for LS-SVR model