A Fragment-Based Approach for the in Silico Prediction of Blood-Brain Barrier Permeation

Blood-brain barrier (BBB) permeability is a fundamental property in the design of drugs that act on the central nervous system (CNS) to treat diseases such as epilepsy, depression, Alzheimer's disease, Parkinson's disease and schizophrenia, among others. In the present work, quantitative structure-property relationship (QSPR) studies were carried out to develop and validate in silico models for the prediction of BBB permeability. The data set has significant chemical diversity and a broad distribution of values of the target property. The QSPR models showed good statistical parameters and were successfully used to predict a test set of 48 compounds. The models developed are useful for identifying, selecting and designing new drug candidates with optimized pharmacokinetic properties.


Introduction
The challenges facing the pharmaceutical industry are tremendous at every step of the drug discovery and development process. Technology-based discovery is certainly one of the most important elements for increasing research and development (R&D) productivity.[2,3] New chemical entities (NCEs) expected to advance into clinical trials should have a good balance of pharmacodynamic and pharmacokinetic properties.[6] Traditionally, the pharmaceutical industry uses in vivo and in vitro models to evaluate pharmacokinetic parameters.[15]
Drugs to treat human central nervous system (CNS) diseases and disorders, such as epilepsy, Alzheimer's disease, Parkinson's disease, schizophrenia, depression and brain tumors, must cross the blood-brain barrier (BBB), either by passive diffusion or with the help of transporters. In contrast, drugs that do not target the CNS should have a limited capacity to cross the BBB in order to avoid drug-induced side effects in the brain.[16] Delivering drugs into the brain is a complex process that depends on multiple factors, such as logP, the numbers of hydrogen-bond acceptors and donors, molecular weight, polar surface area and other molecular properties.[17] The tight junctions between the endothelial cells of the brain capillaries make it almost impossible for compounds to reach the brain by passing between the cells. In addition, efflux pumps such as P-glycoprotein (P-gp) and the multidrug resistance-associated protein (MRP) family significantly hinder permeation across the BBB by pumping compounds back into the blood.[16,18] Nonetheless, the complexity, cost, resources and time involved in experimental BBB permeability assays[20,21] have increased the importance of in silico approaches for predicting the BBB permeability of lead compounds that selectively target the CNS.[7,8,22,23]
In the present work, robust QSPR models were developed for the consensus prediction of BBB permeation using the fragment-based hologram QSAR (HQSAR) approach.[10,14] To the best of our knowledge, most of the models reported in the literature are based on qualitative data (crosses/does not cross the BBB), which give only an imprecise measure of BBB permeability. The quantitative nature of the models generated in this work is therefore of considerable importance in medicinal chemistry and drug design.

Data set
The relative affinity of a drug for brain tissue versus blood can be expressed as the blood-brain partition coefficient, $\log(C_{\mathrm{brain}}/C_{\mathrm{blood}})$, also known as logBB. Here $C_{\mathrm{brain}}$ and $C_{\mathrm{blood}}$ are the equilibrium concentrations of the drug in the brain and the blood, respectively. A data set of 255 structurally diverse molecules with known logBB values was collected from the literature and from PK/DB, a database of pharmacokinetic properties (http://www.pkdb.ifsc.usp.br). The data consist of in vivo measurements in rats of each compound's brain/blood partition coefficient. Some compounds contain one asymmetric (chiral) center and had their BBB permeation determined for the racemate. These were represented as the individual enantiomers and modeled accordingly, as previously described.[10,14] The list of compounds with the corresponding logBB data is given in Table S1 of the Supplementary Information (SI). This structurally diverse data set (Figure 1) includes several important therapeutic classes, such as anxiolytics (e.g., alprazolam), anti-ulcer agents (e.g., cimetidine), analgesics (e.g., acetylsalicylic acid), sedatives (e.g., diazepam, flunitrazepam), anti-inflammatories (e.g., ibuprofen and indometacin), antivirals (e.g., nevirapine, zidovudine, indinavir), antihypertensives (e.g., verapamil and clonidine), antihistamines (e.g., mepyramine) and antidepressants (e.g., mianserin).
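As a small worked illustration of the logBB definition, the Python sketch below computes the partition coefficient from a pair of equilibrium concentrations. It assumes the conventional base-10 logarithm. The function name `log_bb` and the concentration values are hypothetical and are not taken from the data set.

```python
import math


def log_bb(c_brain: float, c_blood: float) -> float:
    """Blood-brain partition coefficient, logBB = log10(C_brain / C_blood).

    Both concentrations are equilibrium values expressed in the same units.
    """
    if c_brain <= 0 or c_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_blood)


# Hypothetical concentrations (same units), not taken from Table S1.
print(log_bb(2.0, 1.0))  # brain-enriched compound: logBB > 0
print(log_bb(0.1, 1.0))  # brain-excluded compound: logBB < 0
```

A positive logBB means the compound accumulates in brain tissue relative to blood, and a negative value means it is relatively excluded.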
The 3D structures of the molecules were built with CONCORD and the standard geometric parameters available in the Sybyl 8.0 molecular modeling package (Tripos, St. Louis, USA), and stored as SDF files.[49] The chemical structures were then standardized with several standard operations available in ChemAxon Standardizer. These included 3D depiction layout, hydrogen addition and correction, salt and solvent removal, chirality and bond-type normalization, and harmonization of the representation of aromatic rings, among others.[50] Each molecule was energy-minimized using the Tripos force field.[48] In this study, the original data set of 255 compounds was divided into a training set (compounds 001-207) and a test set (compounds 208-255; Table S1, SI), corresponding to approximately 80% and 20% of the data set, respectively. Structurally diverse molecules covering a broad range of property values were included in both sets, as depicted in Figure 2, which makes the data set suitable for QSPR model development. The training set was used to generate the models, and the test set was held out for external validation.
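The index-based split described above, and the arithmetic behind the approximate 80%/20% proportions, can be sketched as follows. The helper `split_by_index` is a hypothetical name introduced for illustration. Only the compound numbering comes from this work: 001-207 for training and 208-255 for testing.

```python
def split_by_index(compound_ids, train_last=207):
    """Hypothetical helper: IDs up to train_last go to training, the rest to test."""
    train = [cid for cid in compound_ids if cid <= train_last]
    test = [cid for cid in compound_ids if cid > train_last]
    return train, test


ids = list(range(1, 256))  # compounds 001-255, numbered as in Table S1
train, test = split_by_index(ids)
print(len(train), len(test))  # 207 48
print(f"{len(train) / len(ids):.1%} / {len(test) / len(ids):.1%}")  # 81.2% / 18.8%
```

The exact proportions (81.2% and 18.8%) are what the text rounds to "approximately 80% and 20%".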

QSPR studies
All 2D QSPR (HQSAR) calculations and analyses were performed with the Sybyl 8.0 package,[48] as previously described. The HQSAR technique requires only the 2D structures and the property values (logBB) as input. The molecular holograms were initially generated using the default parameters implemented in Sybyl 8.0.[10,14,51] All models were built by partial least squares (PLS) regression and evaluated by full cross-validation with the leave-one-out (LOO) and leave-many-out (LMO) methods, which yield the cross-validated correlation coefficient (q²). The internal predictive ability of the models was assessed by their q² values.
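The cross-validation statistic mentioned above can be illustrated with a small, dependency-free sketch. It fits a PLS1 regression with the NIPALS algorithm and computes q² = 1 - PRESS/SS over leave-one-out or leave-many-out folds. This is not Sybyl's implementation. The function names, the round-robin group assignment used for LMO, and the toy data are all assumptions made for illustration.

```python
# Minimal, dependency-free sketch of PLS1 (NIPALS) with cross-validated q2.
# Not Sybyl's implementation; names and the LMO fold scheme are illustrative.


def pls1_fit(X, y, n_comp):
    """Fit a PLS1 model by NIPALS; returns means, weights, X loadings and y loadings."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # y is fully explained; no further components
        w = [v / norm for v in w]
        t = [sum(row[j] * w[j] for j in range(m)) for row in Xc]  # scores
        tt = sum(v * v for v in t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the component just extracted.
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return x_mean, y_mean, W, P, Q


def pls1_predict(model, x):
    """Predict y for one descriptor vector by replaying the deflation steps."""
    x_mean, y_mean, W, P, Q = model
    xr = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in zip(W, P, Q):
        t = sum(a * b for a, b in zip(xr, w))
        y_hat += q * t
        xr = [a - t * b for a, b in zip(xr, p)]
    return y_hat


def cross_validated_q2(X, y, n_comp, folds):
    """q2 = 1 - PRESS / SS; each fold is predicted by a model fitted without it."""
    press = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_comp)
        for i in held_out:
            press += (y[i] - pls1_predict(model, X[i])) ** 2
    y_mean = sum(y) / len(y)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss


def loo_folds(n):
    """Leave-one-out: every compound forms its own fold."""
    return [[i] for i in range(n)]


def lmo_folds(n, k):
    """Leave-many-out: k groups, compound i assigned to group i % k (one simple choice)."""
    return [list(range(g, n, k)) for g in range(k)]


# Toy data: two hypothetical descriptors and a noise-free linear response.
X = [[float(i), float(i % 3)] for i in range(12)]
y = [0.2 * a - 0.5 * b + 0.1 for a, b in X]
print(round(cross_validated_q2(X, y, 2, loo_folds(len(y))), 6))
print(round(cross_validated_q2(X, y, 2, lmo_folds(len(y), 3)), 6))
```

In HQSAR the descriptor matrix is the set of hologram bin counts rather than two toy columns. The number of PLS components is then usually chosen as the one giving the highest q².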

HQSAR analyses
Molecular fragments for the data set compounds were generated using the following fragment distinctions: atoms (A), bonds (B), connections (C), hydrogen atoms (H), chirality (Ch), and donor and acceptor (DA). To evaluate hologram generation and identify the best predictive models, several combinations of these parameters were tested using the default fragment size (4-7) (Table 1). The Ch distinction was included in every combination because the data set contains several (R) and (S) enantiomers. Without it, the two enantiomers of a compound would have identical holograms and the same logBB value. They would then be treated as one compound counted twice, which could over-train the models. The best statistical results (predefined accuracy thresholds of training r² ≥ 0.80 and q² ≥ 0.60) were obtained with the following fragment distinctions:
A/B/C/H/Ch (model 3, q² = 0.66 and r² = 0.87)
A/H/Ch/DA (model 5, q² = 0.66 and r² = 0.86)
A/B/H/Ch/DA (model 7, q² = 0.68 and r² = 0.88)
A/B/C/H/Ch/DA (model 8, q² = 0.69 and r² = 0.91)
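The argument for always including the Ch distinction can be made concrete with a toy example. In the sketch below, atom labels are built only from the enabled distinctions. An (R)/(S) pair then yields identical descriptors unless Ch is switched on. The label encoding, the single-atom "fragments" and the example atoms are hypothetical simplifications and do not reproduce Sybyl's HQSAR encoding.

```python
from collections import Counter


def atom_label(atom, distinctions):
    """Hypothetical atom label built only from the enabled fragment distinctions."""
    parts = []
    if "A" in distinctions:
        parts.append(atom["element"])
    if "H" in distinctions:
        parts.append(f"H{atom['h']}")
    if "DA" in distinctions and atom.get("da"):
        parts.append(atom["da"])  # "d" = donor, "a" = acceptor
    if "Ch" in distinctions and atom.get("chirality"):
        parts.append("@" + atom["chirality"])  # R/S stereodescriptor
    return "".join(parts)


def atom_fragments(atoms, distinctions):
    """Count single-atom fragments (a deliberately tiny stand-in for HQSAR fragments)."""
    return Counter(atom_label(a, distinctions) for a in atoms)


# Hypothetical (R)/(S) pair derived from one racemate; both inherit its logBB.
r_form = [{"element": "C", "h": 1, "chirality": "R"},
          {"element": "N", "h": 2, "da": "d"},
          {"element": "C", "h": 3}]
s_form = [dict(a, chirality="S") if a.get("chirality") else a for a in r_form]

for distinctions in ({"A", "H", "DA"}, {"A", "H", "DA", "Ch"}):
    same = atom_fragments(r_form, distinctions) == atom_fragments(s_form, distinctions)
    print(sorted(distinctions), "-> duplicate entries" if same else "-> distinct entries")
```

Without Ch, the two rows would have the same descriptors and the same response. Each enantiomer would therefore have an exact twin in the training data, which inflates LOO statistics.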
The influence of different fragment sizes on the statistical parameters was further investigated for the four best HQSAR models, marked with an asterisk in Table 1 (models 3, 5, 7 and 8). The results are summarized in Table 2. The fragment size parameters set the minimum and maximum number of atoms of the fragments included in the hologram fingerprint. These parameters are fundamental to this fragment-based approach, because they determine whether larger or smaller fragments are encoded in the molecular holograms.[10,14,51] The results show that varying the fragment size improved the statistics of most of the models (marked with an asterisk in Table 2) relative to the default fragment size (4-7). The exception was model 5, for which none of the fragment sizes tested gave an improvement. Model 3 showed improved cross-validated correlation coefficients, with q² values of 0.68 and 0.71 compared with 0.66 for the default fragment size. For model 7, q² increased from 0.68 to 0.71. For model 8, q² increased slightly from 0.69 to 0.70, whereas r² remained unchanged.
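The roles of fragment size and hologram length can be sketched in a few lines of Python. Linear atom paths within a minimum and maximum size are enumerated, and their counts are then folded into a fixed number of bins. This is a deliberate simplification. Real HQSAR fragments also include branched and cyclic substructures. Here `zlib.crc32` stands in for Sybyl's fragment hashing, which is not reproduced, and the function names and toy molecule are hypothetical.

```python
import zlib
from collections import Counter


def linear_fragments(labels, bonds, min_size, max_size):
    """Count linear (unbranched) paths containing min_size..max_size atoms."""
    neighbours = {i: set() for i in range(len(labels))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
    counts = Counter()

    def extend(path):
        # Each path of 2+ atoms is reached from both ends; keep one direction only.
        if len(path) >= min_size and (len(path) == 1 or path[0] < path[-1]):
            forward = "-".join(labels[k] for k in path)
            backward = "-".join(labels[k] for k in reversed(path))
            counts[min(forward, backward)] += 1  # direction-independent key
        if len(path) < max_size:
            for nb in sorted(neighbours[path[-1]]):
                if nb not in path:
                    extend(path + [nb])

    for start in range(len(labels)):
        extend([start])
    return counts


def hologram(fragment_counts, length):
    """Fold fragment counts into `length` bins (CRC32 as a stand-in hash)."""
    bins = [0] * length
    for fragment, count in fragment_counts.items():
        bins[zlib.crc32(fragment.encode("utf-8")) % length] += count
    return bins


# Toy heavy-atom graph C-C-O (hypothetical), fragment size 2-3, hologram length 53.
frags = linear_fragments(["C", "C", "O"], [(0, 1), (1, 2)], 2, 3)
print(dict(frags))
holo = hologram(frags, 53)
print(len(holo), sum(holo))
```

Raising `max_size` admits longer paths into the fingerprint. The hologram length controls how often different fragments collide in the same bin, which is why several (prime) lengths are usually screened.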
The structural information encoded in the 2D holograms is related to the property values of the training set molecules. An HQSAR model should therefore be able to predict logBB for new compounds from their fingerprints. The LOO q² gives a reasonable measure of the internal consistency and predictive power of the models. However, the true predictive ability of the HQSAR models derived from the 207 training set molecules was assessed by predicting the logBB values of an external test set of 48 molecules (compounds 208-255, Table S1, SI). Before prediction, the test set compounds were processed in the same way as the training set compounds. External validation can be considered the most valuable validation method, because these compounds were completely excluded from model training. The results, listed in Table S1, show that the four selected HQSAR models (marked with an asterisk in Table 2) predict the test set compounds reasonably well. These compounds represent the different structural features present in the training set. The good agreement between experimental and predicted BBB permeation values indicates the robustness of the HQSAR models.
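External predictive ability is commonly summarized as r²pred = 1 - PRESS/SD. PRESS is the sum of squared prediction errors over the test set. SD is the sum of squared deviations of the observed test values from the training-set mean. The sketch below assumes this common definition; the text does not state the exact formula used. The logBB values are hypothetical, not taken from Table S1.

```python
def r2_pred(y_obs_test, y_pred_test, y_mean_train):
    """External predictive r2 = 1 - PRESS / SD (one common definition).

    PRESS: sum of squared prediction errors over the test set.
    SD: sum of squared deviations of observed test values from the training-set mean.
    """
    if len(y_obs_test) != len(y_pred_test):
        raise ValueError("observed and predicted lists differ in length")
    press = sum((o - p) ** 2 for o, p in zip(y_obs_test, y_pred_test))
    sd = sum((o - y_mean_train) ** 2 for o in y_obs_test)
    return 1.0 - press / sd


# Hypothetical logBB values for four test compounds (not from Table S1).
observed = [0.5, -0.3, 1.0, -1.2]
predicted = [0.4, -0.1, 0.8, -1.0]
print(round(r2_pred(observed, predicted, y_mean_train=0.0), 3))  # 0.953
```

Using the training-set mean in SD means r²pred measures improvement over simply predicting the training average for every new compound.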
The predictive power (r²pred) of models 3, 5, 7 and 8 and of the consensus model is also shown in Table S1. Model 7 exhibited higher predictive ability (r²pred = 0.79) than models 3, 5 and 8 (r²pred = 0.72, 0.69 and 0.62, respectively), and the consensus approach gave an r²pred of 0.75. These results indicate that models 3 and 7 and the consensus model give the best predictions of the property for new compounds. Figure 3 plots experimental versus predicted BBB permeation for model 7, for both the training set (model generation) and the test set (external evaluation). Similar plots were obtained for model 3 and the consensus model (not shown).
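The text does not specify how the consensus prediction is formed. A common choice, assumed in the sketch below, is the unweighted mean of the individual models' predictions for each compound. The function name and the per-model prediction values are hypothetical.

```python
from statistics import fmean


def consensus_predict(per_model_predictions):
    """Unweighted mean of several models' predictions, compound by compound.

    per_model_predictions: one list per model, all aligned on the same compounds.
    """
    lengths = {len(preds) for preds in per_model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same compounds")
    return [fmean(values) for values in zip(*per_model_predictions)]


# Hypothetical logBB predictions of four models for three test compounds.
model_preds = [
    [0.40, -0.60, 1.10],  # model 3
    [0.20, -0.40, 0.90],  # model 5
    [0.30, -0.50, 1.00],  # model 7
    [0.50, -0.70, 1.20],  # model 8
]
print([round(v, 2) for v in consensus_predict(model_preds)])
```

Averaging tends to cancel errors that individual models make in different regions of chemical space. This is consistent with the consensus model scoring between the best and worst single models here.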
The models were successfully validated (Table S1 and Figure 3), which is notable given the complexity of the BBB as a biological system. Although it does not have the highest r²pred, the consensus approach is also an attractive tool for predicting logBB. An ensemble of models can cover a larger portion of chemical space, which could in turn help select and design new compounds with improved logBB properties. Additionally, the HQSAR technique can provide predictions for a broader range of molecules than other methods, because molecular fragmentation can represent a much larger variety of scaffolds.

Conclusions
A key challenge in developing drugs that act on the CNS to treat a variety of human diseases and disorders is their transport across the BBB. The final HQSAR models described here have high internal and external consistency. In addition, the quantitative models showed good predictive power and could be used to assist chemical library design and virtual screening. Compound libraries usually have broad chemical diversity, so in silico ADME models used to screen them must cover a substantial portion of chemical space. This is difficult to achieve with a model trained on only a few hundred compounds. However, this limitation may be overcome by using similarity analyses to select appropriate compounds for screening, which avoids predictions for compounds that differ substantially from the training set molecules.[12] Overall, the predictive models generated in this work are useful for early compound identification and selection, as well as for designing lead compounds with improved BBB permeability.
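One way to apply the similarity analysis mentioned above is a nearest-neighbour check. A query compound is flagged as outside the model's domain when its highest Tanimoto similarity to any training compound falls below a threshold. The sketch below compares sets of fragment keys. The function names, the fragment sets and the 0.5 threshold are hypothetical choices for illustration, not values from this work.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fragment keys."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nearest_neighbour_check(query, training_sets, threshold=0.5):
    """Return (inside_domain, best_similarity) for a query compound.

    The 0.5 threshold is an arbitrary illustrative value.
    """
    best = max(tanimoto(query, ref) for ref in training_sets)
    return best >= threshold, best


# Hypothetical fragment sets for two training compounds.
training = [{"C-C", "C-O", "C-C-O"}, {"C-N", "C=O", "C-C-N"}]
print(nearest_neighbour_check({"C-C", "C-O", "C-C-N"}, training))
print(nearest_neighbour_check({"S-S", "C-S"}, training))
```

Compounds that fail the check can still be scored, but their predicted logBB should be treated as an extrapolation rather than a reliable estimate.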

Figure 1. Chemical structures and therapeutic classes of representative drugs included in the data set.

Table 1. Results of HQSAR analyses for various fragment distinctions: key statistical parameters obtained with the default fragment size (4-7) over the twelve default hologram lengths of 53, 59, 61, 71, 83, 97, 151, 199, 257, 307, 353 and 401 bins. The fragment-count patterns of the training set compounds were related to the experimental BBB permeation data.

Table 2. HQSAR analysis of the influence of various fragment sizes on the key statistical parameters for the four selected fragment distinctions: A/B/C/H/Ch, A/H/Ch/DA, A/B/H/Ch/DA and A/B/C/H/Ch/DA.