TLC-Based Lipophilicity Assessment of Some Natural and Synthetic Coumarins

The lipophilic character of twelve coumarins was investigated by reversed-phase thin-layer chromatography (RP TLC) on RP-18 silica. Three different binary solvent systems composed of water and an organic modifier (methanol, tetrahydrofuran or acetonitrile) were used to determine the retention parameter (R_M) and the octanol-water partition coefficient (log P_OW) as measures of the lipophilicity of the tested compounds. The lipophilicity parameter (log P_OW) was determined experimentally using eight standard solutes with known log P_OW values, which were analyzed under the same chromatographic conditions as the target substances. The lipophilicity parameters, together with 2D molecular descriptors, were subjected to multivariate statistical analysis (principal component analysis (PCA) and partial least squares regression (PLS)) to determine the most important factors for retention, i.e., the lipophilicity of the investigated compounds. The quantitative structure-retention relationship models reveal the importance of descriptors related to the size and shape of the molecule as well as to its polar properties.


Introduction
The methods of relating the molecular structure of solutes (expressed via descriptors) to their chromatographic (retention) behavior are commonly denoted as quantitative structure-retention relationships (QSRRs). Similarly, the aim of quantitative structure-property relationship (QSPR) research is to find a functional dependence between molecular structure and physicochemical properties.
Lipophilicity is a very important molecular parameter used in QSR(P)R studies and plays an important role in drug discovery. Knowing the lipophilicity of potential drugs helps in understanding their absorption, distribution, metabolism, excretion and toxicity (ADMET). [1] Lipophilicity is expressed by the logarithm of the partition coefficient (log P), which represents the tendency of a molecule to distribute between water and a water-immiscible solvent. Liquid chromatographic (LC) techniques can be considered a traditional approach to the fast estimation of lipophilicity.
Recently, a comparative study of several approaches to the determination of lipophilicity by means of thin-layer chromatography (TLC) was presented by Komsta et al. [2] In the case of TLC, QSRR studies are usually based on the R_M value defined by the Bate-Smith and Westall equation, [3]
$$R_M = \log\left(\frac{1}{R_F} - 1\right) \qquad (1)$$
where R_F is the retardation factor. Generally, the R_M values determined by means of reversed-phase thin-layer chromatography (RP TLC) are linearly dependent on the concentration of the organic modifier (φ) in the mobile phase,
$$R_M = R_M^0 + m\varphi \qquad (2)$$
where m and R_M^0 are, respectively, the slope and the intercept of equation 2.
The extrapolation of the R_M value to pure water based on the Soczewinski-Wachtmeister model [4] allows the estimation of lipophilicity. [5] The OECD (Organization for Economic Cooperation and Development) Guidelines for the Testing of Chemicals (Test 117) [6] describe the method for the determination of the partition coefficient (log P_OW) using reversed-phase high performance liquid chromatography (HPLC). Appropriate reference substances with log P_OW values that encompass the log P_OW of the test substances (i.e., at least one reference substance has a P_OW above that of the test substance and another a P_OW below it) need to be selected and chromatographed under the same conditions as the test substance in isocratic mode. A calibration graph obtained by correlating the measured retention data of the reference substances with their partition coefficients is used for the determination of the log P_OW value of the test substances. In many articles, the HPLC method is replaced by thin-layer chromatography, [7] keeping the same principles as in Test 117, with an RP-18 silica stationary phase and the mobile phase composition that provides the best selectivity (in accordance with the isocratic HPLC mode).
In the past decade, our research was focused on the QSRR of various organic compounds that are believed to exhibit biological activity. [10][11] In a previous publication, the results on the chromatographic behavior of 4-hydroxycoumarin rodenticides (coumatetralyl, bromadiolone and brodifacoum) and biocidal material impurities in various normal- and reversed-phase chromatographic systems were reported. [12] The results proved RP TLC to be suitable for the estimation of the relative lipophilicity of coumarin derivatives.
Coumarins are naturally occurring benzopyrone derivatives identified in plants and are characterized by extensive chemodiversity and various pharmacological activities. The majority of coumarins have been isolated from green plants. The genus Seseli (part of the Apiaceae family) is a well-known source of linear or angular pyranocoumarins, an interesting subclass of coumarins possessing antiproliferative, [13] antiviral [14] and antibacterial activities. [15] Numerous species of the genus have been used in folk medicine since ancient times.
Continuing research in this field, we selected Seseli montanum subsp. tommasinii as a source of some natural coumarins. From the aerial parts of the plant, five known coumarins were isolated. They were studied together with two other natural coumarins (isolated from the roots of Seseli annuum and Achillea tanacetifolia) and five synthetic coumarins. The present study deals with several topics: (i) the retention behavior of coumarins in reversed-phase chromatographic systems using different organic modifiers, (ii) a comparison of different modifiers in lipophilicity assessment, (iii) a comparison of two experimentally obtained lipophilicity parameters (R_M^0 and log P_OW) in terms of better lipophilicity evaluation and (iv) the selection of a subset of descriptors that are the most relevant for the retention of coumarins. Principal component analysis (PCA) and partial least squares (PLS) were selected as two of the most widely used chemometric methods for building QSRR models.

Isolation procedure
The chemical structures of the investigated coumarins 1-12 are presented in Figure 1.
The plant material was collected at Gorica Hill (area of Podgorica City, Montenegro, Serbia) in autumn 2009. A voucher specimen (P167/09) was deposited at the Herbarium of the Faculty of Natural Sciences and Mathematics, University of Montenegro (Podgorica City).
All relevant 1H and 13C NMR data and 1H NMR spectra of compounds 1-6 are given in the Supplementary Information (SI).

Reversed-phase thin-layer chromatography
The TLC experiments were performed on commercially available RP-18 TLC plates (Art. 5559, E. Merck, Germany). The plates were spotted with 1 μL aliquots of 2 mg mL^-1 solutions of each compound (dissolved in CH2Cl2) and developed by the ascending technique, without preconditioning. The zones were detected under UV light (λ = 254 nm). The R_F values were determined as the average of three chromatograms. Three solvent systems were used as mobile phases: methanol-water, acetonitrile-water and tetrahydrofuran-water binary mixtures, with a varying content of organic modifier (from 100 to 60 vol.% for methanol and acetonitrile, and from 100 to 40 vol.% for tetrahydrofuran, in increments of 10 vol.%). All components of the mobile phases were of analytical grade purity. All experiments were performed at ambient temperature (22 ± 2 °C).

Calculations
For the geometry optimization, the structures were processed with the Hyperchem program (version 7.0, Hypercube). The three-dimensional structures were optimized by semi-empirical quantum chemical calculations with the AM1 Hamiltonian. A set of molecular descriptors was selected to reflect the geometrical, electronic and physicochemical properties of the investigated compounds. Hyperchem calculates electronic properties, optimized geometries, total energy and QSAR properties. A set of additional physicochemical parameters was generated from the optimized structures by the Molecular Modeling Program Plus (MMP Plus). The Virtual Computational Chemistry Laboratory at the website http://www.vcclab.org was used to calculate the lipophilicity of the compounds by various methods based on different theoretical procedures.

Multivariate statistical analysis and modeling
PCA and PLS were performed using the demo version of the PLS Toolbox statistical package (Eigenvectors, Inc., version 5.7) for MATLAB version 7.4.0.287 (R2007a) (MathWorks, Inc., Natick, MA, USA). The data were mean-centered and scaled to unit variance before any statistical operations in order to prevent highly abundant components from dominating the final result over components present in much smaller quantities.
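The preprocessing step described above (mean-centering and scaling to unit variance, often called autoscaling) can be sketched as follows. This is a minimal illustration in Python, not the PLS Toolbox routine used in the study; the descriptor values are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption.

```python
from statistics import mean, stdev

def autoscale(column):
    """Mean-center one descriptor column and scale it to unit variance."""
    mu = mean(column)
    sd = stdev(column)  # sample standard deviation (n - 1); an assumption
    if sd == 0:
        raise ValueError("a constant column cannot be scaled to unit variance")
    return [(value - mu) / sd for value in column]

# Hypothetical descriptor values on very different numerical scales
# (illustrative only, not taken from the study).
molecular_weight = [162.1, 176.2, 192.2, 230.3, 328.4]
dipole_moment = [4.1, 4.6, 3.9, 5.2, 4.4]

scaled_mw = autoscale(molecular_weight)
scaled_dm = autoscale(dipole_moment)
```

After autoscaling, both columns have zero mean and unit variance, so neither dominates a subsequent PCA or PLS model merely because of its units.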
PCA was carried out as an exploratory data analysis using the singular value decomposition (SVD) algorithm and a 0.95 confidence level for the Q and Hotelling T^2 limits for outliers. A limited number of PCs reduces the dimensionality of the retention data space, simplifying further analysis and grouping the substances according to their intrinsic ability for specific interactions. [23] Validation of the models was performed by the leave-one-out cross-validation procedure. The quality of the models was monitored with the following parameters: R^2_cal(cum) (the cumulative sum of squares of the Ys explained by all extracted components) and R^2_CV(cum) (the cumulative fraction of the total variation of the Ys that can be predicted by all extracted components), which should be as high as possible, and the root mean square error of calibration (RMSEC) and root mean square error of cross-validation (RMSECV), which should be as low as possible, with the smallest possible difference between them. A low value of RMSEC is desirable, but if it is accompanied by a high value of RMSECV, this can indicate poor predictive ability of the calibration model. [24,25] Among the multivariate linear regression techniques, which also include multiple linear regression (MLR) and principal component regression (PCR), PLS was chosen as the target analysis because of a number of advantages. Namely, the number of predictor variables is greater than the number of compounds, and it is better to reduce their number to just a few latent variables (using PLS or PCR) than to select a few predictor variables, as in MLR. In addition, many of the variables are correlated and have constant values, so MLR would not be an appropriate method. An important feature of PLS is that it takes into account errors in both the independent and the response variables, while PCR assumes that the estimates of the molecular descriptors are error-free. [26]
As previously mentioned, the best selectivity was obtained with the methanol-water mobile phase, and these results were used to evaluate the possible relationship between the lipophilicity characteristics and the physicochemical parameters of the molecules. The lipophilicity parameter R_M^0 (chromatographic system RP-18/methanol-water) and log P_OW were the response variables in the QSRR study. These values were regressed against the molecular structural descriptors as independent variables.
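The validation scheme described above (a PLS model judged by RMSEC and leave-one-out RMSECV) can be illustrated with a small sketch. It assumes NumPy and a textbook NIPALS PLS1 algorithm for a single response, which is not necessarily the algorithm implemented in the PLS Toolbox; the data are synthetic, and all function names are hypothetical. RMSEC is computed here as a plain root mean square of the residuals, without a degrees-of-freedom correction.

```python
import numpy as np

def pls1_coefficients(X, y, n_lv):
    """NIPALS PLS1 on autoscaled X and centered y; returns the regression vector."""
    X = X.copy()
    y = y.copy()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_lv))   # weight vectors
    P = np.zeros((n_vars, n_lv))   # X loadings
    q = np.zeros(n_lv)             # y loadings
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                  # scores of latent variable a
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = y @ t / tt
        W[:, a] = w
        X -= np.outer(t, P[:, a])  # deflate X
        y -= q[a] * t              # deflate y
    return W @ np.linalg.solve(P.T @ W, q)

def fit_predict(X_train, y_train, X_new, n_lv):
    """Autoscale with training statistics, fit PLS1, predict the response for X_new."""
    x_mean = X_train.mean(axis=0)
    x_std = X_train.std(axis=0, ddof=1)
    y_mean = y_train.mean()
    b = pls1_coefficients((X_train - x_mean) / x_std, y_train - y_mean, n_lv)
    return y_mean + ((X_new - x_mean) / x_std) @ b

def rmsec_rmsecv(X, y, n_lv):
    """Return (RMSEC, RMSECV) using leave-one-out cross-validation."""
    n = len(y)
    fitted = fit_predict(X, y, X, n_lv)
    rmsec = np.sqrt(np.mean((y - fitted) ** 2))
    cv_pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i   # leave sample i out
        cv_pred[i] = fit_predict(X[keep], y[keep], X[i:i + 1], n_lv)[0]
    rmsecv = np.sqrt(np.mean((y - cv_pred) ** 2))
    return rmsec, rmsecv

# Synthetic data: 12 "compounds" and 5 "descriptors" (not the values of this study).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 5))
y_demo = X_demo @ np.array([1.0, -0.5, 0.3, 0.0, 0.8]) + rng.normal(scale=0.1, size=12)

for n_lv in (1, 2, 3):
    rmsec, rmsecv = rmsec_rmsecv(X_demo, y_demo, n_lv)
    print(n_lv, round(float(rmsec), 3), round(float(rmsecv), 3))
```

Comparing the two errors across the number of latent variables mirrors the selection criterion used in the text: a model is preferred when RMSECV is low and close to RMSEC.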

Lipophilicity of the analytes
The retention parameters (R_F and R_M) of the coumarins were determined at several compositions of the three different binary solvent systems composed of organic modifier and water: methanol-water, acetonitrile-water and tetrahydrofuran-water. For each compound, the R_M value was extrapolated to zero volume fraction of the organic modifier using equation 2, thus obtaining the lipophilicity parameter (R_M^0). The slope (m) and intercept (R_M^0) values, and the statistical data (correlation coefficient (r) and standard deviation (s)) for each binary system are listed in Table 1.
The R_M values were linearly dependent on the concentration of organic modifier in the mobile phase, with r ≥ 0.99. Also, the majority of substances show the highest R_M^0 values in methanol, which has the lowest elution strength among all the organic modifiers applied on RP-18 silica.
Taking into account the observed retention, it can be concluded that the tricyclic compounds (1-4 and 6) exhibited stronger retention than the bicyclic coumarins (5, 7-12). Also, the increased retention of coumarins 1, 2 and 6 can be ascribed to the presence of the 2-butenoyl and 3-methylbut-2-enyloxy groups. Similar chromatographic behavior was observed for compounds 6 and 7, which have an identical side-chain substituent, indicating that the presence of the bulky 3-methylbut-2-enyloxy group defines their chromatographic behavior. Among all investigated coumarins, the bicyclic compounds with hydroxy (9 and 10) and methoxy groups (11 and 12) showed decreased retention.
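As an illustration of this extrapolation, the following sketch converts R_F values to R_M with the Bate-Smith and Westall relation, R_M = log(1/R_F − 1), and fits the straight line R_M = R_M^0 + mφ by ordinary least squares. The R_F values are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative.

```python
import math

def rm_from_rf(rf):
    """Equation 1: R_M = log10(1/R_F - 1)."""
    if not 0.0 < rf < 1.0:
        raise ValueError("R_F must lie strictly between 0 and 1")
    return math.log10(1.0 / rf - 1.0)

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Hypothetical R_F values of one compound at 60-100 vol.% methanol
# (illustrative only, not measured data from the study).
phi = [60, 70, 80, 90, 100]
rf = [0.12, 0.22, 0.38, 0.56, 0.73]

rm = [rm_from_rf(value) for value in rf]
rm0, m, r = fit_line(phi, rm)   # rm0 is R_M extrapolated to 0 vol.% modifier
print(f"R_M0 = {rm0:.2f}, m = {m:.4f}, r = {r:.4f}")
```

The intercept of the fit is the extrapolated lipophilicity parameter R_M^0, and the slope m is negative because retention decreases as the modifier content increases.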
The determination of log P_OW by TLC is based on the linear relationship between the chromatographic retention R_M and the octanol-water partition coefficient determined by the shake-flask method for a set of standard compounds. For that purpose, the investigated coumarins were chromatographed simultaneously with the standard solutes, and the retention parameters were determined (R_M values are given in parentheses): 4-methoxyphenon (−0.45), 2,6-dimethylphenol (−0.13), 1,3,5-trihydroxybenzene (−1.19), anthracene (0.69), 4-hydroxybenzaldehyde (−0.57), 1-naphthol (−0.10), benzophenone (0.21) and phenol (−0.52). As the best selectivity was obtained with methanol-water (75:25%, v/v), this mobile phase was chosen for the determination of log P_OW. To characterize the lipophilicity of the coumarins, a linear calibration between the R_M values of the eight standards and their literature log P_OW values was used:
$$R_M = -1.176 + 0.423 \log P_{OW} \qquad (3)$$
(r = 0.992, N = 8, SD = 0.078, P < 0.0001). The R_M values of the studied compounds were substituted into equation 3 to calculate the log P_OW values listed in Table 2. The same table contains the calculated log P values of the selected coumarins.
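The use of the calibration line can be sketched in a few lines of Python. The function below simply inverts equation 3 (R_M = −1.176 + 0.423 log P_OW) to obtain log P_OW from a measured R_M; the function name is hypothetical, and the R_M values are those reported above for the standards, used here only as a check of the arithmetic.

```python
def log_pow_from_rm(rm, intercept=-1.176, slope=0.423):
    """Invert the calibration R_M = intercept + slope * log P_OW (equation 3)."""
    return (rm - intercept) / slope

# R_M values of three of the standards, as reported in the text.
standards_rm = {
    "anthracene": 0.69,
    "benzophenone": 0.21,
    "phenol": -0.52,
}

for name, rm in standards_rm.items():
    print(f"{name}: back-calculated log P_OW = {log_pow_from_rm(rm):.2f}")
```

For anthracene (R_M = 0.69), for example, the back-calculated value is about 4.41. The same substitution applied to the R_M values of the coumarins gives the experimental log P_OW values.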
The determination of linear dependences between the lipophilicity parameters obtained in chromatographic investigations and calculated log P values is an indispensable step for QSRR. These correlations provide evidence that the chromatography-based measurements of lipophilicity are valid. A number of methods based on different approaches for calculating log P from chemical structures are available. The extrapolated R_M^0 values for the chromatographic system RP-18/methanol-water and the experimentally established log P_OW values were compared with calculated log P (log P_calc), and the statistical parameters of these dependences are given in Table 3. Although a linear dependence exists in most cases, with satisfactory correlation coefficients above 0.93, the slopes and intercepts of the relevant equations show that the deviations from the ideal correlation (slope ca. 1 and intercept ca. 0) are more pronounced in the case of the experimentally obtained log P_OW values, i.e., R_M^0 is the better lipophilicity estimate. The determined lipophilicity of the investigated compounds is in accordance with their chromatographic behavior. An additional pyran or furan ring attached to the 2-benzopyran-1-one aromatic core provides increased lipophilicity compared with the corresponding derivatives possessing no extra ring. Incorporation of polar hydroxy and methoxy groups has a more pronounced negative impact on lipophilicity. Lipophilicity also rises with increasing substitution on the basic benzopyranone, i.e., derivatives with 2-butenoyl and 3-methylbut-2-enyloxy groups are more hydrophobic than compounds that possess methyl, methoxy, hydroxy, acetyl and epoxide substituents.

Principal component analysis (PCA)
PCA carried out on the set of calculated molecular descriptors and retention data can reveal similarities among the studied compounds governed by both their intrinsic structural properties and the specific interactions that occur in different chromatographic systems. Loading plots highlight the most influential variables responsible for such clustering and provide a picture of the similarity between the R_M^0 values and the other molecular descriptors.
PCA applied to the set of molecular descriptors resulted in a three-component model explaining 91.79% of the data variation (the first principal component comprises 71.94% of the variance). The score plot of the three principal components (Figure 2) indicates that all data lie inside the Hotelling T^2 ellipse, suggesting that there are no outliers among the analytes.
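How the explained variance and the Hotelling T^2 values of such a PCA model arise from the singular value decomposition can be sketched as follows. The example assumes NumPy and uses synthetic data, so the numbers it prints are unrelated to those reported above; the function name is hypothetical, and the 95% T^2 limit (which requires an F-distribution quantile) is deliberately omitted.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of autoscaled data via SVD.

    Returns scores, loadings, explained variance (%) and Hotelling T^2 per sample.
    """
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscaling
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = 100.0 * s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    eigenvalues = s[:n_components] ** 2 / (n - 1)      # variance of each score column
    t2 = np.sum(scores**2 / eigenvalues, axis=1)       # Hotelling T^2
    return scores, loadings, explained[:n_components], t2

# Synthetic data: 12 "compounds", 5 "descriptors", three of them driven by
# one common "size" factor (illustrative only).
rng = np.random.default_rng(42)
size_factor = rng.normal(size=12)
X_demo = np.column_stack([
    size_factor + 0.1 * rng.normal(size=12),
    2.0 * size_factor + 0.1 * rng.normal(size=12),
    -size_factor + 0.2 * rng.normal(size=12),
    rng.normal(size=12),
    rng.normal(size=12),
])

scores, loadings, explained, t2 = pca_svd(X_demo, n_components=3)
print("explained variance (%):", np.round(explained, 2))
print("cumulative (%):", round(float(explained.sum()), 2))
```

Samples whose T^2 value exceeds the chosen confidence limit would fall outside the ellipse on the score plot and be flagged as potential outliers.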
Considering the score plot, PCA reveals a different classification. The samples are clustered into two main separate groups: coumarins 5 and 7-12, with different substituents attached to 2-benzopyran-1-one, are positioned in one group, while coumarins with one more pyran or furan ring connected to the benzopyranone core are in the second group (compounds 1-4 and 6). The first principal component distinguishes the samples according to the number of rings present in the molecule (bicyclic and tricyclic compounds). The second principal component separates the compounds with a hydroxyl group in the molecule (3, 4, 9 and 10) from the others. The mutual projections of the loading vectors are shown in Figure 3. The highest positive impact on PC1 comes from the parameters that describe the size and shape of the molecule. PC2 separates the compounds mainly according to their polar properties, i.e., physicochemical descriptors such as the count of hydrogen-bond donors, hydrophilic-lipophilic balance, solubility parameter, dipole moment, etc. On the loading plot, the three R_M^0 variables are grouped with the descriptors related to the size and shape of a molecule, such as refractivity, polarizability, surface area, molecular volume, molecular weight, molecular depth and molecular width. These facts could indicate the most influential factors for the observed chromatographic behavior of the coumarins.

Quantitative structure-retention relationship (QSRR)
PLS modeling was performed in order to quantify the relationships between the factors governing lipophilicity. The number of latent variables was selected on the basis of the minimum RMSECV and the minimum difference between RMSEC and RMSECV. In both models, the minimum value of RMSECV was obtained with two latent variables. The obtained models are summarized in Table 4.
The application of the PLS method revealed that the statistical results of the two models are comparable and that both are statistically significant. The main descriptors in both PLS models are those related to the size and shape of a molecule, such as refractivity, polarizability, surface area, molecular volume, molecular weight and parachor. From the X loading plots of the models, it was supposed that simpler PLS models could be obtained by removing some variables.
The contribution of the descriptors that are most influential on the chromatographic behavior was assessed using variable importance in projection (VIP) scores. The variables with VIP scores higher than 1 were considered the most relevant for explaining the response variable Y, while the others make an extremely low or almost no contribution. After removing the variables that only contribute to noise (variables with low coefficient values and low VIP values), simpler and better PLS models were obtained. The descriptors included in the final models are presented in Table 4 in order from the highest to the lowest value of their regression coefficient, with the sign of their contribution to the response variable indicated. Taking into account the parameters that represent the quality of the model, it can be concluded that both PLS models are statistically significant. The descriptors included in the final models are of similar nature and significance.
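For orientation, one commonly used definition of the VIP score of descriptor j in a PLS model with A latent variables is given below. The text does not state which definition the software applied, so this is an assumption; the symbols are introduced here for illustration only: p is the number of descriptors, w_a the weight vector, t_a the score vector and q_a the y-loading of latent variable a.

```latex
\mathrm{VIP}_j = \sqrt{\, p \;
  \frac{\displaystyle\sum_{a=1}^{A} q_a^{2}\, \mathbf{t}_a^{\top}\mathbf{t}_a
        \left( \frac{w_{ja}}{\lVert \mathbf{w}_a \rVert} \right)^{2}}
       {\displaystyle\sum_{a=1}^{A} q_a^{2}\, \mathbf{t}_a^{\top}\mathbf{t}_a} }
```

With this definition, the mean of the squared VIP scores over all p descriptors equals 1, which is why a value of 1 is a natural threshold for separating influential from less influential variables.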
The results obtained indicate that the most relevant descriptors influencing the lipophilicity parameters are surface area, molecular length, density, solubility parameter, Hansen polarity, Hansen dispersion and hydrophilic-lipophilic balance. From the sign of the regression coefficients, it can be observed that the descriptors describing the polarity of the investigated compounds, i.e., their ability for hydrophilic interactions, make a negative contribution to the R_M^0 values. Solubility parameter, Hansen polarity and Hansen dispersion provide a numerical estimate of the degree of intermolecular attraction between molecules (i.e., the existence of dispersion, polar and hydrogen-bonding forces), and indicate that the stronger the intermolecular interactions between the molecules and the mobile phase, the less the analytes are retained on the stationary phase and the lower the R_M^0 and log P_OW values obtained. Surface area and molecular length influence the lipophilicity parameters in the opposite way. They have positive coefficients in the models, so higher values of these descriptors give higher values of R_M^0 and log P_OW. The surface area of a substance is the sum of all areas that cover the surface of the molecule. A higher value of this descriptor indicates a larger molecule, which is more strongly retained on the stationary phase, causing a higher value of R_M^0, i.e., log P_OW. Molecular length characterizes the size of the molecule and influences the lipophilicity parameters in the same way as the previous descriptor. The hydrophilic-lipophilic balance of a solute is a measure of the extent to which its hydrophilic or lipophilic properties are expressed. Its negative regression coefficients reveal that the lower the value of this balance, the greater the observed values of R_M^0 and log P_OW, suggesting that more hydrophobic solutes, which exhibit stronger nonspecific dispersive interactions between their own nonpolar moieties and those of the stationary phase, are more strongly retained under the applied chromatographic conditions.

Conclusions
The focus of the present study was the estimation of the lipophilicity of twelve coumarins by chromatographing them simultaneously with standard substances with known log P_OW values. PCA was used for the data overview, while PLS was chosen as the multivariate regression technique for the structure-lipophilicity correlations.
From the presented results, it can be concluded that: (i) all reversed-phase thin-layer chromatographic systems used proved to be suitable for lipophilicity estimation, (ii) the two proposed PLS models are statistically significant and their statistical quality is comparable and (iii) descriptors that describe the size and shape of the molecule, as well as its polar properties, determine the lipophilic behavior of the investigated compounds.
In terms of the various model performance criteria considered here, the obtained PLS models could be suitable for predicting the chromatographic behavior of coumarins.

Figure 2. Score values of the first, second and third principal components.

Figure 3. Projection of loading vectors for the first two PCs.

Table 1. Lipophilicity and statistical parameters obtained from equation 2

Table 2. The calculated log P values and experimental log P_OW values

Table 3. Linear relationships between experimental and calculated lipophilicity