QSPR Modeling using Catalan Solvent and Solute Parameters

The field of quantitative structure-property relationships (QSPR) can benefit from molecular descriptors that represent intermolecular interactions. Catalan developed a set of solvatochromic scales for solvents that can be exploited for this purpose. In this work, Catalan solvent scales were used as molecular descriptors for the development of QSPR models and for the calculation of new solute descriptors for later use in QSPR. The Catalan solvent scales and the derived solute descriptors were then compared with the Abraham descriptor method in terms of the quality of the QSPRs developed. The Catalan solvent parameters, which showed a modest correlation with the corresponding Abraham descriptors, were successful in modeling the melting point, boiling point, flash point, refractive index, surface tension, density and solubility parameter of the solvents, with geometric means of the relative deviations (GMRD) of 7.1, 6.6, 4.9, 3.8, 9.1, 6.0 and 4.2%, respectively. The solute descriptors were obtained from regression equations for the solubility of a solute in different solvents, with an overall GMRD of 30.0%. The solute descriptors obtained in this way outperform Abraham's general solvation model in calculating the aqueous solubility of 27 solutes from various chemical families. The Catalan descriptors can be regarded as a valuable resource for QSPR modeling.


Introduction
Solubility of a compound in different solvents such as water and 1-octanol can be used in quantitative structure-property relationships (QSPRs) as a measure of its property in phases similar to those solvents. 1 Solubility can not only be used directly as a molecular descriptor; other parameters can also be derived from solubility and employed as molecular descriptors in QSPR. [3][4][5] The parameter set was later extended to the corresponding solute descriptors of hydrogen-bonding acidity (A) and basicity (B) scales, and polarity/polarizability (S) scale. 9,10 In addition to these parameters, the general solvation equation proposed by Abraham and co-workers 9,10 (equation 1) also includes excess molar refraction (E) and one percent of the McGowan molar volume (V).
$$PCP = c + eE + sS + aA + bB + vV \qquad (1)$$
In equation 1, PCP is the property under study; c, e, s, a, b, and v are the coefficients of the model, determined by multiple linear regression analysis. Abraham parameters have found many applications in chemistry and pharmacy-related fields, for example estimation of solubility, 6 partitioning, 11 chromatographic retention parameters, 12 toxicity, 13,14 and intestinal absorption. 15 [17][18] Moreover, a method has recently been suggested for the back-calculation of solute Abraham parameters. It employs the calculated E and V parameters along with the experimental solubility of solutes in several organic solvents and the previously determined solvent coefficients of equation 1 (c, e, s, a, b, and v) for partitioning in a large number of water/solvent systems, followed by fitting the appropriate values of S, A and B. 19 Catalan has developed another set of solvatochromic parameters for a generalized treatment of the effects of solvents. 7
Catalan parameters consist of the solvent polarity/polarizability scale (SPP), the solvent basicity scale (SB, denoted cb in this work), and the solvent acidity scale (SA, denoted ca in this work). 8,[20][21][22][23] The SPP parameter was recently split into two separate scales: solvent dipolarity (SdP, denoted cd in this work) and solvent polarizability (SP, denoted cp in this work). 7 [20][21][22][23] In formulating the independent solvent scales, the choice of an appropriate probe for the experimental determination of the scales is the major challenge. The selected probe should measure the effect of a single solvent property, for example hydrogen-bonding basicity, without interference from any other solvent effects. The solvatochromic scales of Catalan employ different probes from those used for the development of Kamlet and Taft's scales.
This investigation explored the suitability of Catalan solvent parameters for use in the QSPR field and the possibility of deriving new solute parameters from the original Catalan scales. First, Abraham and Catalan solvent parameters were compared by investigating the relationships between the two sets of parameters. Secondly, Catalan solvent parameters were used for the development of QSPR models for several solvent properties, and the validity of the resulting QSPRs was investigated. The solvent properties included melting point, boiling point, flash point, refractive index, surface tension, viscosity, density, and solubility parameter. In the next step, Catalan solute parameters were derived based on the correlations between a solute's solubility in several nonaqueous solvents and the Catalan solvent scales for those solvents. Finally, the applicability of these newly defined solute parameters for the prediction of the molar aqueous solubility of some compounds was investigated, and the resulting QSPR was compared with the QSPR models developed using Abraham parameters.

Materials and methods
Solvent properties, Abraham and Catalan parameters were collected from the literature, as detailed below, and multiple linear regression analysis was used to investigate the relationships and to develop the QSPR models using Catalan and Abraham parameters (for more details see Table S1 of electronic supplementary information).
Development of QSPR models using Catalan solvent parameters: Melting point, boiling point, flash point, refractive index, surface tension, viscosity, density, and solubility parameter of 54 common solvents with known Catalan solvent parameters were obtained from the literature. 43 Catalan descriptors were used to develop regression models for the above-mentioned physicochemical properties.
Determination of Catalan solute descriptors: Mole fraction solubility of a large set of compounds in several nonaqueous solvents was obtained from the Handbook of Solubility Data for Pharmaceuticals. 44 The inclusion criteria for the collected nonaqueous solubility data in this study were:
(i) Only the solubility values measured at room temperature (25 ± 1 °C) were included.
(ii) Only solubility values reported in mole fraction, mole per liter or those that were convertible to one of these units were used.
(iii) For inclusion in the analysis, the solubility of a solute had to be available in a minimum of eleven nonaqueous solvents.
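The three inclusion criteria can be expressed as a simple filter. The sketch below is only an illustration: the record layout (`solute`, `solvent`, `temp_c`, `unit`, `value`) and the function name are hypothetical, and the unit conversion allowed by criterion (ii) is left out.

```python
from collections import defaultdict

ACCEPTED_UNITS = {"mole fraction", "mol/L"}  # criterion (ii), conversion step omitted
MIN_SOLVENTS = 11                            # criterion (iii)

def select_solutes(records):
    """Return {solute: {solvent: value}} for solutes passing all three criteria."""
    by_solute = defaultdict(dict)
    for rec in records:
        # Criterion (i): keep only measurements at 25 +/- 1 degrees C.
        if abs(rec["temp_c"] - 25.0) > 1.0:
            continue
        # Criterion (ii): keep only the accepted concentration units.
        if rec["unit"] not in ACCEPTED_UNITS:
            continue
        by_solute[rec["solute"]][rec["solvent"]] = rec["value"]
    # Criterion (iii): at least eleven distinct nonaqueous solvents per solute.
    return {s: d for s, d in by_solute.items() if len(d) >= MIN_SOLVENTS}

def _make(solute, n_solvents, temp_c=25.0):
    """Build hypothetical records for one solute (illustration only)."""
    return [
        {"solute": solute, "solvent": f"solvent_{i}", "temp_c": temp_c,
         "unit": "mole fraction", "value": 0.01 * (i + 1)}
        for i in range(n_solvents)
    ]

records = _make("A", 11) + _make("B", 10) + _make("C", 11, temp_c=30.0)
selected = select_solutes(records)
```

With these hypothetical records only solute "A" is retained: "B" has too few solvents and "C" was measured outside the temperature window.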
For each solute, the logarithm of solubility in different solvents was regressed against the Catalan parameters of the solvents, and the regression equations were collected in the form:
$$\log X = i_{Solute} + CP \cdot cp + CD \cdot cd + CA \cdot cb + CB \cdot ca \qquad (2)$$
In equation 2, logX is the logarithm of the mole fraction solubility of a solute in different solvents; cp, cd, cb, and ca are the Catalan polarizability, dipolarity, hydrogen-bonding basicity, and acidity scales for the solvents; $i_{Solute}$ is the intercept; and CP, CD, CA, and CB are the coefficients of the regression equation. The coefficients of the regression equations for each solute were recorded to be used as the solute polarizability, dipolarity, hydrogen-bonding acidity, and basicity scales.
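The per-solute regression can be sketched with ordinary least squares. This is a minimal illustration, not the authors' code: it assumes equation 2 has the form logX = i_Solute + CP·cp + CD·cd + CA·cb + CB·ca (the pairing of solute acidity with solvent basicity is inferred from the order in which the symbols are defined), and the solvent scales and "true" coefficients below are synthetic values.

```python
import numpy as np

def fit_catalan_solute_parameters(log_x, cp, cd, cb, ca):
    """Least-squares fit of logX = i_Solute + CP*cp + CD*cd + CA*cb + CB*ca."""
    log_x = np.asarray(log_x, dtype=float)
    # Design matrix: intercept column followed by the four solvent scales.
    design = np.column_stack([np.ones_like(log_x), cp, cd, cb, ca])
    coef, *_ = np.linalg.lstsq(design, log_x, rcond=None)
    return dict(zip(["i_Solute", "CP", "CD", "CA", "CB"], coef))

# Synthetic check with eleven hypothetical solvents (the stated minimum).
rng = np.random.default_rng(0)
cp, cd, cb, ca = rng.uniform(0.0, 1.0, size=(4, 11))
true_params = {"i_Solute": -3.0, "CP": 1.2, "CD": 0.8, "CA": 0.5, "CB": -0.4}
log_x = (true_params["i_Solute"] + true_params["CP"] * cp + true_params["CD"] * cd
         + true_params["CA"] * cb + true_params["CB"] * ca)

fitted = fit_catalan_solute_parameters(log_x, cp, cd, cb, ca)
```

Because the synthetic data are noise-free, the fit recovers the chosen coefficients; with real solubility data the residuals would feed the error criteria used in this study.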

Application of Catalan and Abraham solute parameters in QSPR model development for aqueous solubility
Solute descriptors were calculated using Catalan solvent parameters (as explained above) for 27 solutes for which aqueous solubility and Abraham solute descriptors [24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42] were available through recent publications. For these solutes, the new solute parameters were compared with Abraham solute descriptors in terms of the accuracy of the original equation used for the estimation of solute parameters, and the accuracy of the models developed for the estimation of the aqueous solubility of the 27 solutes. For this purpose, the Catalan model was:
$$\log S_w = i_W + a_P\,CP + a_D\,CD + a_A\,CB + a_B\,CA + i_{Solute} \qquad (3)$$
Rearranging the equation as below allows a regression analysis to be performed:
$$\log S_w - i_{Solute} = i_W + a_P\,CP + a_D\,CD + a_A\,CB + a_B\,CA \qquad (4)$$
where $i_W$ is the intercept of the regression of aqueous solubility data against Catalan solute parameters computed from equation 2; $a_P$, $a_D$, $a_B$, and $a_A$ are the regression coefficients, which correspond to the calculated Catalan solvent scales of polarizability, dipolarity, basicity, and acidity for water.
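The rearranged form (equation 4) is again a multiple linear regression, this time across solutes rather than solvents. The following sketch assumes that the solute parameters and intercepts from equation 2 are already available as arrays; all numbers are synthetic, and the function name is hypothetical.

```python
import numpy as np

def fit_water_coefficients(log_sw, i_solute, cp_s, cd_s, ca_s, cb_s):
    """Fit logS_w - i_Solute = i_W + a_P*CP + a_D*CD + a_A*CB + a_B*CA."""
    y = np.asarray(log_sw, dtype=float) - np.asarray(i_solute, dtype=float)
    # Water acidity a_A multiplies solute basicity CB; a_B multiplies CA.
    design = np.column_stack([np.ones_like(y), cp_s, cd_s, cb_s, ca_s])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(["i_W", "a_P", "a_D", "a_A", "a_B"], coef))

# Synthetic data for 27 hypothetical solutes.
rng = np.random.default_rng(1)
i_solute, cp_s, cd_s, ca_s, cb_s = rng.normal(size=(5, 27))
true_water = {"i_W": -0.9, "a_P": 0.5, "a_D": 1.6, "a_A": 0.75, "a_B": 0.3}
log_sw = (i_solute + true_water["i_W"] + true_water["a_P"] * cp_s
          + true_water["a_D"] * cd_s + true_water["a_A"] * cb_s
          + true_water["a_B"] * ca_s)

water = fit_water_coefficients(log_sw, i_solute, cp_s, cd_s, ca_s, cb_s)
```

Moving i_Solute to the left-hand side fixes its coefficient at one, so only the intercept and the four water coefficients are estimated.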
The comparable Abraham solvation model reported in the literature 27 for aqueous solubility is referred to as equation 5. Equations 4 and 5 were compared in terms of the accuracy of the calculation of aqueous solubility. In the analyses of this study, relative deviation (RD), mean relative deviation (MRD), geometric MRD (GMRD) and absolute error (AE) were used as error criteria, as defined in equation 6, where n is the number of data points in each analysis, and $PCP_{Exp}$ and $PCP_{Cal}$ are the experimental and calculated PCP.
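The explicit expressions of equation 6 are not reproduced in this text. The sketch below therefore assumes the conventional forms: RD as the percentage absolute difference relative to the experimental value, MRD as the arithmetic mean of the RDs, GMRD as their geometric mean, and AE as the absolute difference. These definitions, and the function names, are assumptions rather than the authors' stated formulas.

```python
import numpy as np

def relative_deviations(pcp_exp, pcp_cal):
    """Assumed RD (%): 100 * |PCP_Cal - PCP_Exp| / |PCP_Exp| for each data point."""
    pcp_exp = np.asarray(pcp_exp, dtype=float)
    pcp_cal = np.asarray(pcp_cal, dtype=float)
    return 100.0 * np.abs(pcp_cal - pcp_exp) / np.abs(pcp_exp)

def mrd(pcp_exp, pcp_cal):
    """Assumed MRD (%): arithmetic mean of the n relative deviations."""
    return float(np.mean(relative_deviations(pcp_exp, pcp_cal)))

def gmrd(pcp_exp, pcp_cal):
    """Assumed GMRD (%): geometric mean of the RDs (all RDs must be > 0)."""
    rd = relative_deviations(pcp_exp, pcp_cal)
    return float(np.exp(np.mean(np.log(rd))))

def absolute_errors(pcp_exp, pcp_cal):
    """Assumed AE: |PCP_Exp - PCP_Cal| for each data point."""
    return np.abs(np.asarray(pcp_exp, dtype=float) - np.asarray(pcp_cal, dtype=float))

# Hypothetical experimental and calculated values.
exp_values = [100.0, 200.0, 50.0]
cal_values = [110.0, 190.0, 60.0]
rd = relative_deviations(exp_values, cal_values)  # 10 %, 5 %, 20 %
```

The geometric mean is less sensitive than the arithmetic mean to a few very large deviations, which is one plausible reason for reporting GMRD alongside MRD.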

Results and Discussion
Table S2 of the electronic supplementary information (SI) tabulates 41 solvents for which Catalan solvent parameters and Abraham solvent parameters were available from the literature. The Catalan and Abraham solvent parameters for these 41 solvents showed only modest correlation coefficients (Table 1).
[19][20][21][22][23] The Abraham solvent parameters s, a, and b are the interaction terms of the solvents with S, A, and B of the solute, respectively. Since S, A, and B are indicators of the solute's polarity, acidity, and basicity, s, a, and b are indicators of solvent polarity, basicity, and acidity, respectively. 45 All investigated correlations reported in Table 1 were statistically significant (p < 0.05).
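A pairwise comparison of the two parameter sets amounts to computing a correlation coefficient for each Catalan scale against each Abraham solvent coefficient. The sketch below uses made-up numbers rather than the values behind Table 1, and it reports only Pearson's r, not the significance test.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical values for five solvents (not data from this study).
catalan_scale = [0.0, 0.2, 0.4, 0.6, 0.8]
abraham_coeff = [0.1, 0.3, 0.2, 0.7, 0.6]
r = pearson_r(catalan_scale, abraham_coeff)
```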
Melting point, boiling point, flash point, refractive index, surface tension, viscosity, density, and solubility parameter of 54 common solvents with known Catalan solvent parameters are listed in Table S3 in the SI. The QSPRs developed using Catalan solvent scales for these physicochemical properties are reported in Table 2. Careful examination of these results reveals very good model fits for melting point, boiling point, flash point, refractive index, surface tension, density, and solubility parameter of the solvents. However, viscosity did not fit well into the Catalan model. Figure 1 shows the correlation between experimental and calculated solubility parameters for the studied solvents.
Table 3 presents, for each solute, the equations derived for the solubility in several nonaqueous solvents. The data reported in Table 3 are the coefficients of the multiple linear regression equations between the compounds' solubility in nonaqueous solvents and Catalan solvent parameters (data fitted to equation 2) for 37 different compounds, with solubility expressed as mole fraction. Table 3 also includes the coefficients of determination ($r^2$) of the regression equations, the number of solvents used for each solute, and the AE and MRD values.
We propose that the coefficients of these multiple regression equations are associated with the characteristics of the solutes and can be used as the corresponding solute parameters. It can be seen in Table 3 that the MRD values of the equations vary between 2.6% for methandienone solubility in 11 solvents and 776.9% for niflumic acid solubility in 23 solvents, and the GMRD is 30.0%. Despite the low correlation coefficients of the models for some solutes such as niflumic acid, piroxicam and ibuprofen, the equations were statistically significant, with p-values below 0.05 for the equation and p-values for the significant descriptors < 0.3. One explanation for the poor correlations observed for some solutes could be the dominant effect of crystal packing energy on the solubility of such solutes. These effects cannot be explained solely by simple parameters such as those used here, and are assumed to be related to the specific three-dimensional arrangements of molecules within the crystals. A similar pattern was observed for AE.
In assessing the resulting Catalan solute parameters, one must consider that: (i) the resulting acidic and basic scales are based on the behavior of the solute in nonaqueous solvents, which means that a compound that is an acid in water could act differently, i.e. as a neutral or basic compound, in organic solvents; (ii) the coefficients of the Catalan solute parameters might indicate the effect of acidic or even basic functional groups of the compound on its solubility in organic solvents, so the numerical values of the coefficients can be either positive or negative.
In order to examine the suitability of the new Catalan solute parameters for QSPR modeling, the parameters were used for the estimation of aqueous solubility. Moreover, the model was compared with the model developed using Abraham solute parameters obtained using a similar back-calculation procedure, [24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42] and also with the Abraham aqueous solubility model reported in the literature. 27 Listed in Table 4 are the molar aqueous solubility data and the Abraham and Catalan solute parameters for the 27 compounds. The coefficients in equation 8 might be related to the effects of the solvent used (in this case water). Catalan solvent parameters for water are cp = 0.681, cd = 0.997, cb = 0.025, and ca = 1.062, which show a similar trend in comparison with the coefficients of equation 8. This could indicate the validity and reliability of the suggested method for the calculation of Catalan solute parameters. It has also been shown that aqueous solubility is inversely correlated with the molecular volume of the compounds. 46 Based on this fact, the following equation was proposed:
$$\log S_w = -0.902 + i_{Solute} + 0.521\,CP + 1.670\,CD + 0.289\,CA + 0.757\,CB - 1.851\,V \qquad (9)$$
with $r^2 = 0.986$. The coefficients of the regression are similar to those of equation 8, and the negative coefficient of the volume variable is meaningful.
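For illustration, the volume-extended equation can be evaluated directly once a solute's Catalan parameters are known. The coefficients below are those quoted in the text; the solute values are hypothetical, and V is assumed to be the same McGowan volume term used in the Abraham model.

```python
def log_sw_with_volume(i_solute, cp, cd, ca, cb, v):
    """Evaluate logS_w from the volume-extended Catalan solute model.

    cp, cd, ca, cb are the solute parameters CP, CD, CA, CB from equation 2;
    v is assumed to be the McGowan volume term V.
    """
    return (-0.902 + i_solute + 0.521 * cp + 1.670 * cd
            + 0.289 * ca + 0.757 * cb - 1.851 * v)

# Hypothetical solute parameters (not taken from Table 4).
example = log_sw_with_volume(i_solute=-2.0, cp=1.0, cd=0.5, ca=0.2, cb=0.4, v=1.0)
```

Because the volume coefficient is negative, increasing V while holding the other parameters fixed lowers the calculated aqueous solubility.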
Table 5 gives the calculated $\log S_w$ and relative deviations (RD) from equations 5, 7, 8 and 9, as well as the GMRD value.
It can be seen that Abraham's general solvation model (equation 5) gives the highest error, with an average of 162.0%.

Conclusions
In this study, we showed that Catalan and Abraham solvent parameters are rather different solvatochromic scales of solvents, although similar procedures are employed for their experimental determination. The applicability of both sets of solvent parameters in QSPR analyses was evident from the results obtained for solvents and solutes. A methodology was introduced for the calculation of new solvatochromic solute parameters based on Catalan solvent parameters. The method takes advantage of the coefficients of Catalan solvent parameters in multiple linear regression models of solute solubility in several nonaqueous solvents. The new solute parameters compared well with Abraham solute parameters for the estimation of the aqueous solubility of compounds. The back-calculated Catalan parameters for water (coefficients of the model developed for aqueous solubility) were close to the experimental Catalan water parameters in their trend, which might confirm the suitability of the suggested method for the calculation of solute and solvent parameters.
The results of this study suggest that Catalan solvent parameters and the new solute parameters can be regarded as a valuable resource for applications in QSPR modeling. A further advantage of Catalan parameters is the large number of solvents for which they have already been measured, which amounts to more than 150 solvents to date. For example, propylene glycol, which is among these solvents, is an important solvent of pharmaceutical interest.

Figure 1. Correlation between experimental and calculated solubility parameters using Catalan solvent scales for the studied solvents.

Table 1. Correlation of Abraham solvent parameters vs. Catalan solvent parameters for 41 solvents

Table 2. Coefficients of $PCP = a_1\,cp + a_2\,cd + a_3\,ca + a_4\,cb$ (Catalan model) for calculating some solvents' PCP
a NS: not significant.

Table 3. Catalan solute parameters for the studied solutes with mole fraction solubilities, coefficients of determination, mean relative deviation (MRD) and absolute error (AE) values
a NS: not significant.

Table 4. Abraham and Catalan solute parameters and logarithm of molar aqueous solubility data for 27 chemical and pharmaceutical compounds
a NS: not significant.

Table 5. Relative deviations (RD) and absolute errors (AE) of calculated aqueous solubility using different equations

Table S1. List of parameters used in this study