Pedotransfer functions to estimate water retention parameters of soils in northeastern Brazil

Pedotransfer functions (PTFs) were developed to estimate the parameters (α, n, θr and θs) of the van Genuchten (1980) model, used to describe soil water retention curves. The data came from several sources, mainly studies carried out in the Northeast Region by universities, Embrapa and Codevasf, totalling 786 retention curves, which were divided into two datasets: 85 % for PTF development and 15 % for testing and validation, considered as independent data. In addition to general PTFs for all soils together, specific PTFs were developed for the soil classes Argissolos, Latossolos, Neossolos and Planossolos, using multiple regression with the stepwise procedure (forward and backward) to select the best predictors. Two types of PTFs were developed: the first includes all predictors (bulk density and sand, silt, clay and organic matter contents), and the second only the sand, silt and clay contents. The adequacy of the PTFs was evaluated with the correlation coefficient (R) and the Willmott index (d). The root mean square error (RMSE) was used to evaluate the PTFs for water content at specific matric potentials. The PTF prediction of the retention curve is relatively poor, except for the residual water content. Including organic matter content as a PTF predictor improves the prediction of the van Genuchten α parameter. Soil-class-specific PTFs did not perform better than a general PTF. Except for the saturated water content estimated from grain size distribution, the models predict water content at specific matric potentials well. Predictions of water content at matric potentials more negative than -0.6 m using a PTF based on grain size distribution are only slightly inferior to those obtained by PTFs that include bulk density and organic matter content.


INTRODUCTION
The use of simulation models in agricultural sciences has increased significantly over the last decades. However, one of the major bottlenecks hampering model application is the lack of input data. In water and solute balance modelling, data on soil hydraulic properties are the most relevant information (van Diepen et al., 1991; Pachepsky & Rawls, 1999). The direct determination of hydraulic conductivity and water retention characteristics is time-consuming and depends on expensive laboratory equipment (Wösten & van Genuchten, 1988). Therefore, indirect methods have been developed, e.g., pedotransfer functions, or PTFs (Minasny, 2000; Cornelis et al., 2001; Rawls & Pachepsky, 2002; Tomasella et al., 2003), that correlate easily available information such as grain size distribution and organic matter content (OM) with soil hydraulic properties. Vaz et al. (2005) validated the Arya & Paris (1981) model with 104 samples of representative soils in the South and Southeast of Brazil, and concluded that the retention curves estimated by the model are satisfactory for those soils.
In reviews on PTFs, Pachepsky & Rawls (1999, 2004) recommended the use of PTFs for regions or soil types similar to those in which they were developed. The application of the available PTFs to tropical soils would be inefficient, since these functions were developed and tested for soils of temperate climates, aside from other factors related to the different mineralogy of the clay fraction and distinct properties of OM components in tropical soils (Tomasella et al., 2000). Tomasella and co-workers developed specific PTFs for the prediction of soil water retention curves for the tropical soils of Brazil (Tomasella & Hodnett, 1998; Tomasella et al., 2000, 2003, 2008). Silva et al. (1990) proposed PTFs for the estimation of the field capacity and permanent wilting point of soils of the semi-arid region of northeastern Brazil. Based on a large dataset, Oliveira et al. (2002) developed PTFs to estimate the water content at field capacity (-33 kPa) and permanent wilting point (-1500 kPa) for the State of Pernambuco, in the northeastern region of Brazil. Another important contribution to knowledge on tropical soils was made by van den Berg et al. (1997), who developed PTFs to estimate available water content between pressure heads of -1 and -150 m based on texture and density of Oxisols in 10 tropical countries.
Considering the importance of extending the use of hydrological and agronomic models to tropical regions, in this paper we developed and validated PTFs for the prediction of water retention characteristics from sand, silt, clay, organic matter content and bulk density data for soils from northeastern Brazil.

Data
The sampling points were distributed on representative soils in northeastern Brazil, with higher concentrations in the States of Pernambuco and Alagoas (Figure 1).
A total of 786 datasets of soil water retention, grain size distribution, OM content and bulk density were selected from studies and reports from universities, Embrapa (Brazilian Agricultural Research Corporation) and Codevasf (Corporation for the Development of the São Francisco and Parnaíba River Basins). Grain size was classified according to USDA Soil Taxonomy into clay (< 0.002 mm), silt (0.002 to 0.05 mm), fine sand (0.05 to 0.20 mm), and coarse sand (0.2 to 2 mm). Data were selected according to the similarity of determination methods. Specifically, only water retention data of undisturbed samples were used. To improve comparability, only water retention data corresponding to the pressure heads -0.6, -1, -3, -5, -10 and -150 m were used. The saturated water content θs was estimated from bulk density (ρ) by:

$$\theta_s = 1 - \frac{\rho}{\rho_s} \qquad (1)$$

where ρs is the particle density, assumed as 2700 kg m⁻³.

Estimation of van Genuchten model parameters
The van Genuchten (1980) model (VG, equation 2) was fitted to the soil water retention data of each of the 786 locations:

$$S_e(h) = \frac{\theta(h) - \theta_r}{\theta_s - \theta_r} = \left[1 + (\alpha |h|)^{n}\right]^{-(1 - 1/n)} \qquad (2)$$

where S_e(h) is the effective saturation corresponding to pressure head h (m); θ, θr and θs are water content, residual water content and saturated water content (m³ m⁻³), respectively; α (m⁻¹) and n are shape-fitting parameters. Saturated water content θs was calculated with equation 1, while θr was assumed to be equal to the observed water content at h = -150 m.
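The two equations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' fitting code: it assumes the common restriction m = 1 - 1/n for the exponent, the function names are hypothetical, and the parameter values in the usage lines are made up for demonstration rather than taken from the paper's tables.

```python
PARTICLE_DENSITY = 2700.0  # kg m^-3, the particle density assumed in the text


def saturated_water_content(bulk_density, particle_density=PARTICLE_DENSITY):
    """Equation 1: theta_s = 1 - rho / rho_s."""
    return 1.0 - bulk_density / particle_density


def effective_saturation(h, alpha, n):
    """Equation 2, assuming m = 1 - 1/n; h is the pressure head in m (<= 0)."""
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * abs(h)) ** n) ** (-m)


def water_content(h, alpha, n, theta_r, theta_s):
    """Water content (m^3 m^-3) at pressure head h (m)."""
    return theta_r + (theta_s - theta_r) * effective_saturation(h, alpha, n)


# Illustrative values only (not from the paper): bulk density 1700 kg m^-3,
# alpha = 2 m^-1, n = 1.5, theta_r = 0.05 m^3 m^-3.
theta_s = saturated_water_content(1700.0)
for h in (0.0, -0.6, -1.0, -3.0, -5.0, -10.0, -150.0):
    theta = water_content(h, alpha=2.0, n=1.5, theta_r=0.05, theta_s=theta_s)
    print(f"h = {h:7.1f} m  theta = {theta:.3f}")
```

At h = 0 the effective saturation is 1, so the curve returns θs by construction, which matches the statement that θs is fixed by bulk density rather than fitted.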

Regression model fitting
Two types of PTFs were fitted and evaluated. The first (PTF-4v) included four soil variables as candidate predictors: content of sand (S, kg kg⁻¹), clay (C, kg kg⁻¹), organic matter (O, kg kg⁻¹) and bulk density ρ (kg m⁻³):

$$y_i = \beta_{i,0} + \beta_{i,1} S + \beta_{i,2} C + \beta_{i,3} O + \beta_{i,4} \rho + \varepsilon_i \qquad (3)$$

The second (PTF-2v) used only two predictors, sand and clay content:

$$y_i = \beta_{i,0} + \beta_{i,1} S + \beta_{i,2} C + \varepsilon_i \qquad (4)$$

where $y_i$ corresponded to the respective van Genuchten model (Equation 2) parameters, here treated as PTF response variables: $\alpha = 10^{y_1}$, $n = y_2$, $\theta_r = y_3$ and $\theta_s = y_4$; and $\beta_{i,j}$ represented the linear model coefficients (parameters): $\beta_{i,0}$ the intercept, and $\beta_{i,1}$, $\beta_{i,2}$, $\beta_{i,3}$ and $\beta_{i,4}$ the parameters referring to sand, clay and OM content and bulk density, respectively. $\varepsilon_i$ was the random error associated with each observation. As proposed by Vereecken et al. (1989), the response variable log(α) was used instead of α directly, to reduce variability. The response variable $y_4$ (corresponding to θs) was used only for PTF-2v, whereas for PTF-4v the θs value was calculated as a deterministic function of ρ (Equation 1).
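To make the role of the response variables concrete, the sketch below applies a two-predictor PTF to one soil sample and back-transforms log(α). The coefficient values are hypothetical placeholders chosen only for illustration; the fitted coefficients are those reported in the paper's tables and are not reproduced here. The function and dictionary names are likewise assumptions.

```python
# Hypothetical coefficients (intercept, sand, clay) for each response variable.
# These are NOT the fitted values from the paper.
HYPOTHETICAL_PTF_2V = {
    "log_alpha": (0.2, 0.5, -0.3),
    "n": (1.2, 0.4, -0.2),
    "theta_r": (0.02, -0.01, 0.30),
    "theta_s": (0.45, -0.10, 0.05),
}


def predict_vg_parameters(sand, clay, coefficients):
    """Apply a linear two-predictor PTF: y_i = b_i0 + b_i1 * S + b_i2 * C."""
    y = {
        name: b0 + b1 * sand + b2 * clay
        for name, (b0, b1, b2) in coefficients.items()
    }
    return {
        "alpha": 10.0 ** y["log_alpha"],  # back-transform: alpha = 10^y1
        "n": y["n"],
        "theta_r": y["theta_r"],
        "theta_s": y["theta_s"],
    }


# Sand and clay contents in kg kg^-1, close to the dataset means quoted in the text.
params = predict_vg_parameters(sand=0.64, clay=0.21, coefficients=HYPOTHETICAL_PTF_2V)
print(params)
```

A predictor dropped by the stepwise procedure would simply carry a coefficient of zero in such a table.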

Figure 1. Northeastern region of Brazil (inset: Brazil) showing locations where data of water retention and the PTF predictors sand, clay and organic matter contents and bulk density were measured.

PTFs were developed for all four soil types together, as well as specifically for each soil class. The fitting was performed using the software Statistica 8 (StatSoft, 2007). Prior to the analysis, the dataset was divided into two subsets (subset random procedure in Statistica 8, step 1 in figure 2). Both data subsets were considered to be independent. Subset 1, containing 85 % of the data (673 locations), was used for PTF development (Table 1). Subset 2, with 15 % of the data (113 locations), was used for model validation. PTFs per soil class were developed for Ultisols, Oxisols, Entisols, and Alfisols. Cambisols and Luvisols were also represented in the dataset, but the number of locations for these soil types was insufficient for specific PTF fitting.
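The random division into development and validation subsets can be sketched as follows. This is only an illustration of the idea using Python's standard library; the study used the subset random procedure in Statistica 8, whose internals are not described, and the function name and seed are assumptions.

```python
import random


def split_dataset(location_ids, n_validation, seed=0):
    """Randomly split locations into development and validation subsets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(location_ids)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


# 786 locations in total, 113 of them held out for validation as in the text.
development, validation = split_dataset(range(786), n_validation=113)
print(len(development), len(validation))
```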
Fitting the van Genuchten (1980) model (Equation 2) to the observed data resulted in 673 (development) + 113 (validation) sets of data containing S, C, O and ρ and the respective values of α, n, θr and θs (step 2 in figure 2). Each PTF predictor and response variable (VG parameters) was then checked for possible outliers using the graphical exploratory tools of Statistica 8 (StatSoft, 2007). As proposed by Tukey (1977), an observation is classified as an outlier if it does not fall in the interval between the cut-offs $F_L - k(F_U - F_L)$ and $F_U + k(F_U - F_L)$, where $F_L$ and $F_U$ are the lower and upper fourths (quartiles) of the sample and k is the outlier coefficient, customarily assumed as 1.5. Values of PTF predictors or response variables classified as outliers were excluded if consistency criteria related to soil physical properties were not satisfied.
The parameters of equations 3 and 4 (PTF-4v and PTF-2v) for the prediction of each of the parameters α, n and θr of equation 2 (step 3 in figure 2) were fitted using the stepwise procedure at a significance level of 5 %. Parameters were estimated for the complete dataset as well as per soil class.
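The outlier rule described above can be illustrated with a short sketch. It approximates Tukey's fourths with the quartiles returned by Python's `statistics.quantiles` (inclusive method), which is an assumption; the graphical tools of Statistica 8 may compute the quartiles slightly differently. The bulk density values are invented for the example.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the cut-offs F_L - k(F_U - F_L) and F_U + k(F_U - F_L)."""
    lower, _, upper = statistics.quantiles(values, n=4, method="inclusive")
    spread = upper - lower
    return lower - k * spread, upper + k * spread


def flag_outliers(values, k=1.5):
    """List the observations that fall outside the Tukey cut-offs."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


# Invented bulk densities (Mg m^-3); the last value is deliberately implausible.
bulk_density = [1.40, 1.50, 1.55, 1.60, 1.65, 1.70, 1.75, 1.80, 2.90]
print(flag_outliers(bulk_density))
```

Flagged values are only candidates for removal: in the procedure described in the text, they were excluded only when they also failed the soil physical consistency criteria.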

Goodness of fit (internal validation) and external validation of fitted PTFs
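The agreement between VG parameters estimated by PTFs (step 4 in figure 2) and original VG parameters (step 5 in figure 2) was quantified using the validation subset (step 6 in figure 2). The following summary measures were adopted: the mean absolute error (MAE), the correlation coefficient (r) and the index of agreement d (Willmott, 1982), given by the following expressions:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| E_i - M_i \right|$$

$$r = \frac{\sum_{i=1}^{N} (E_i - \bar{E})(M_i - \bar{M})}{\sqrt{\sum_{i=1}^{N} (E_i - \bar{E})^2 \, \sum_{i=1}^{N} (M_i - \bar{M})^2}}$$

$$d = 1 - \frac{\sum_{i=1}^{N} (E_i - M_i)^2}{\sum_{i=1}^{N} \left( \left| E_i - \bar{M} \right| + \left| M_i - \bar{M} \right| \right)^2}$$

where $E_i$ is the VG parameter estimated by PTF for location i from the validation subset, $M_i$ is the respective value obtained from the original VG model fitting, N is the number of locations, and $\bar{E}$ and $\bar{M}$ are the respective means. The PTF performance in estimating water content at specific pressure heads (0, -0.6, -1, -3 and -150 m) was also evaluated (Figure 3). At each pressure head, the performance of the original VG model, expressed by its RMSE (RMSE Fit x Obs), was compared to the performance of the VG model arising from the PTF estimation (RMSE PTF x Obs), calculating the RMSE by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{\theta}_i - \theta_i \right)^2}$$

where $\hat{\theta}_i$ is the water content calculated by the respective VG model and $\theta_i$ is the observed water content at location i for the pressure head considered.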
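The summary measures named above (MAE, r, Willmott's d and RMSE) can be computed with a few standard-library functions. This is a minimal sketch following their usual definitions; the function names and the sample values are illustrative assumptions, not data from the study.

```python
import math


def mean_absolute_error(estimated, measured):
    return sum(abs(e - m) for e, m in zip(estimated, measured)) / len(measured)


def correlation_coefficient(estimated, measured):
    e_mean = sum(estimated) / len(estimated)
    m_mean = sum(measured) / len(measured)
    cov = sum((e - e_mean) * (m - m_mean) for e, m in zip(estimated, measured))
    ss_e = sum((e - e_mean) ** 2 for e in estimated)
    ss_m = sum((m - m_mean) ** 2 for m in measured)
    return cov / math.sqrt(ss_e * ss_m)


def willmott_d(estimated, measured):
    """Index of agreement (Willmott, 1982); 1 indicates perfect agreement."""
    m_mean = sum(measured) / len(measured)
    num = sum((e - m) ** 2 for e, m in zip(estimated, measured))
    den = sum((abs(e - m_mean) + abs(m - m_mean)) ** 2
              for e, m in zip(estimated, measured))
    return 1.0 - num / den


def rmse(estimated, measured):
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured))
                     / len(measured))


# Invented water contents (m^3 m^-3) for illustration only.
measured = [0.10, 0.20, 0.30, 0.40]
estimated = [0.12, 0.18, 0.33, 0.37]
print(mean_absolute_error(estimated, measured), rmse(estimated, measured))
print(correlation_coefficient(estimated, measured), willmott_d(estimated, measured))
```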

Model evaluation
In most soils used for PTF construction in this study the sand content was high, a common trait in the Northeast of Brazil, as can be seen in the texture triangles (Figure 4). Texture classes were predominantly loam, loamy sand, sandy loam, sandy clay loam and sandy clay. Some clay, sand and clay loam textures also occurred.
More statistical information on grain size distribution, organic matter content and bulk densities for the two subsets can be found in table 2. The subsets used for PTF development and validation were very similar, with comparable means and standard deviations. Clay, silt and sand mean contents were close to 0.21, 0.15 and 0.64 kg kg⁻¹, respectively. Average bulk density was almost 1700 kg m⁻³. Organic matter contents were low (0.006 kg kg⁻¹ on average), a common feature for soils with a low clay content from this semi-arid region. Silt contents are also very low, much lower than those observed in most soils from temperate climates. Tomasella et al. (2000) evaluated PTFs for soils from several Brazilian regions and reported silt contents between 0.15 and 0.20 kg kg⁻¹, rarely higher than 0.50 kg kg⁻¹. These low silt contents are considered one of the reasons that PTFs developed in temperate climates are inefficient when applied to tropical soils. Figure 5 shows box-plots for both subsets of the observed (measured) water contents at pressure heads of 0, -0.6, -1, -3, -5, -10 and -150 m, used to establish the van Genuchten (1980) parameters (Equation 2). Mean and standard deviation were very similar in both subsets. Results of the statistical analysis of the estimates of the VG model parameters (Equation 2) are presented in table 3.

Fitted PTFs: predictive capacity for retention curve parameters
Parameter estimates for PTF-4v (Equation 3) and PTF-2v (Equation 4), obtained with the complete dataset (General PTF) and for each of the soil classes Ultisols, Oxisols, Entisols, and Alfisols separately (specific PTFs), are listed in tables 4 and 5. Table 6 shows the performance indices for PTF-4v and PTF-2v. For the General PTF-4v, performance is worst for parameter n (i = 2) and best for θr. The prediction of log(α) was much better with the General PTF-4v than with PTF-2v, as shown by all indicators of predictive capacity. Predictions of n and θr differ only slightly between PTF-4v and PTF-2v, which means that there was almost no correlation between these parameters and OM content or bulk density. There was no clear advantage in using the soil-specific instead of the General PTFs, in agreement with findings by Pachepsky & Rawls (1999) and Hodnett & Tomasella (2002).
In the case of PTF-2v, the predictive capacity for θs was poor. A possible explanation is the low OM content of these tropical soils, which is not clearly correlated with soil structure, texture or water retention properties. Difficulties in finding adequate PTFs for estimating parameters α and n have been reported by several other authors (Scheinost et al., 1997; Wösten et al., 2001; Pachepsky & Rawls, 2004) and should be interpreted in light of the fact that these fitting parameters are not real soil properties and their values are very sensitive to the fitting method and criteria (Wösten & van Genuchten, 1988).

Fitted PTFs: predictive capacity for water contents at specific pressure heads
The agreement between water content at specific pressure heads predicted by PTF-derived VG models (VG PTF models) and measured water content was quantified by the root mean square error, RMSE (Table 7). In this table, these RMSEs are shown together with the RMSE from the original fitted VG models. While the original and the VG PTF-4v models show, by construction, no error for estimating water content at h = 0 corresponding to θs (θs was assumed to be equal to the value for h = 0 calculated from bulk density), for VG PTF-2v predictions the error was highest at saturation. From -0.6 to -150 m, RMSE values decreased for both VG PTF models, and were generally around two to three times higher than those from the original curves. The range of the RMSE was 0.02 to 0.046 m³ m⁻³ for VG PTF-4v and 0.029 to 0.051 m³ m⁻³ for VG PTF-2v, similar to the range found by Tomasella (2000) for tropical soils. Tomasella & Hodnett (1998) reported a range of 0.04 to 0.06 m³ m⁻³ when using texture information alone. Reference values for errors in water content estimation for several parts of the world were reported by Pachepsky & Rawls (2004) and Wösten et al. (2001), showing errors between 0.02 and 0.11 m³ m⁻³.

Table 2. Descriptive statistics (mean, maximum, minimum and standard deviation, SD) for organic matter (OM), sand, silt and clay contents and bulk density (BD), for the PTF development (N = 673) and validation subsets (N = 113)
Figure 5. Box-plots of water content at specific pressure heads for the development and validation subsets. Bar minimum and maximum represent the smallest and largest observation, respectively; box minimum and maximum represent the lower and upper water content sample quartile, respectively; and the dot represents the median value.

Correlations between observed water contents and water contents predicted by the VG models fitted to the original data and by the retention curves estimated by PTF-4v or PTF-2v are graphically represented in figure 6 (external validation) and figure 7 (internal validation). As in table 7, the correspondence between PTF predictions and observed values was reasonably good, except for θs estimated by VG PTF-2v.

Table 3. Descriptive statistics for estimates of the equation 2 parameters α, n, θr and θs of the fitted original van Genuchten model for the PTF development (N = 673) and validation subsets (N = 113). SD: standard deviation.

Based on the model quality summary measures (Table 6), it was observed that the predictive capacity for water contents at specific pressure heads (r values from figures 6 and 7), with the exception of θs estimated by PTF-2v, was much better than the VG parameter prediction. This is important for the interpretation of the overall results of PTF development: although the predictive capacity for fitting parameters may seem low, water contents calculated using these parameters agree fairly well with observed water contents, encouraging the use of PTF-derived water retention models for soil water prediction. Generally, PTFs for specific pressure heads (h) lead to better results than PTFs based on estimation of parameters of water retention curves (Pachepsky & Rawls, 2004). Tomasella et al. (2008) suggested that, even when fitting PTFs for retention parameters, the water content should be estimated for specific h and those estimates used to obtain water retention curves via interpolation methods to reduce uncertainty.

CONCLUSIONS
1. The PTF prediction of retention curve parameters is generally relatively poor, with best estimates for the residual water content.
2. Including organic matter content as a PTF predictor improves predictions of the van Genuchten α parameter.
3. The performance of soil-class-specific PTFs is not clearly better than that of the general PTF.
4. Except for the saturated water content estimated by grain size distribution alone, the model performance for water content prediction at specific pressure heads was good, with r values of 0.89 to 0.94, versus r of 0.94 to 1 for the original water retention model.
5. Predictions of water content for pressure heads more negative than -0.6 m using a PTF based on grain size distribution alone are only slightly inferior to those obtained by PTFs including bulk density and organic matter content.