MORE ADEQUATE PROBABILITY DISTRIBUTIONS TO REPRESENT THE SATURATED SOIL HYDRAULIC CONDUCTIVITY

The saturated soil hydraulic conductivity (Ksat) is one of the most relevant variables in studies of water and solute movement in the soil. Its determination in the laboratory and in the field yields highly dispersed results, which could indicate that this variable has a nonsymmetrical distribution. The fit of the normal, lognormal, gamma and beta distributions was examined in order to find a probability density function that would more adequately describe the distribution of this variable. The experiment consisted of determining the saturated hydraulic conductivity by the constant head permeameter method in undisturbed samples of three soils of different textures from the central western region of the State of São Paulo, Brazil, and submitting the results to statistical tests to identify the most adequate asymmetrical distribution to represent them. Ksat showed high variability and a nonnormal distribution, and the lognormal, gamma and beta distributions fit the data. The lognormal probability density function was the most suitable to describe the variable, owing to its greater agreement with the observed data.


INTRODUCTION
The lack or inadequacy of information on variables related to water and solute flow in soils makes the rational use of agricultural resources a difficult endeavor. Among the variables that affect this flow, a very prominent one is the hydraulic conductivity (K), which represents the ability of the soil to transmit water. In general, the greater the hydraulic conductivity, the more easily water moves from one site to another. K reaches its maximum value when the soil is saturated, and it is then referred to as the saturated hydraulic conductivity (Reichardt, 1990).
Using the saturated hydraulic conductivity (Ksat) and mathematical models, it is possible to determine the soil hydraulic conductivity at other water contents and thus to follow water and solute movement.
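As an illustration of how such a model uses Ksat, the sketch below implements the Mualem-van Genuchten relation, one widely used option. The text does not say which model it has in mind, so the choice of model, the function name `mualem_van_genuchten_k` and the parameter values are assumptions made only for illustration.

```python
def mualem_van_genuchten_k(ksat: float, se: float, n: float, pore_exponent: float = 0.5) -> float:
    """K(Se) = Ksat * Se^L * [1 - (1 - Se^(1/m))^m]^2, with m = 1 - 1/n (assumed model).

    ksat: saturated hydraulic conductivity (m s^-1)
    se: effective saturation, from 0 (residual) to 1 (saturated)
    n: van Genuchten shape parameter, must exceed 1
    pore_exponent: pore-connectivity exponent L, commonly set to 0.5
    """
    if not 0.0 <= se <= 1.0:
        raise ValueError("effective saturation must lie in [0, 1]")
    if n <= 1.0:
        raise ValueError("n must be greater than 1")
    if se == 0.0:
        return 0.0
    m = 1.0 - 1.0 / n  # Mualem restriction
    return ksat * se ** pore_exponent * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2


# Hypothetical example: Ksat of 1.57e-4 m s^-1 and n = 1.5 (illustrative values only).
for se in (1.0, 0.8, 0.5):
    print(se, mualem_van_genuchten_k(1.57e-4, se, 1.5))
```

At full saturation the function returns Ksat itself, which is why the quality of the Ksat estimate propagates to every unsaturated value computed from it.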
The population probability curves that describe a phenomenon are unknown and must be estimated from a sample frequency curve (Assis et al., 1996). This process always contains errors; therefore, the problem consists of finding a probability function that minimizes this estimation error. The normal distribution is the solution generally adopted a priori, but if the data do not follow this distribution, the results can lead to erroneous conclusions. Quantitative results can be obtained more safely only when frequency distributions are analyzed (Biggar & Nielsen, 1976).
Scientia Agricola, v.59, n.4, p.789-793, out./dez.2002
The distribution of Ksat can fit the gamma and beta functions (Moura et al., 1999) and also the lognormal distribution (Logston et al., 1990; Mohanty et al., 1991; Jarvis & Messing, 1995; Clausnitzer et al., 1998), which justifies a more detailed study of the adequacy of these distributions, enabling a better characterization of the Ksat variable and of its representative parameters.
The objective of this study is to characterize the saturated soil hydraulic conductivity by fitting the data to the Gaussian, lognormal, gamma and beta probability density functions, in order to indicate which of them best represents this variable measured in a given area.

MATERIAL AND METHODS
Three soils of different textural classes were used in this study: a Typic Hapludox (LVAd), sandy-clayey texture; a Rhodic Hapludox (LVdf), very clayey texture; and a Typic Quartzipsament (RQo), sandy texture. The soils came from the central western region of the State of São Paulo, Brazil, at approximately 22° 41' S latitude, 47° 39' W longitude and 550 m above sea level. Undisturbed samples were collected from the 0 to 0.20 m soil layer with a Uhland-type sampler, using metal cylinders with a mean diameter and height of 72 mm. Seventy samples were collected from soil 1, and 30 samples from soils 2 and 3.
The saturated hydraulic conductivity was determined by the constant head permeameter method (Youngs, 1991), using distilled and de-aerated water, according to Faybishenko (1995) and Moraes (1991). Three replicate Ksat determinations were made for each sample, and their arithmetic mean was used as the sample value.
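The permeameter computation rests on Darcy's law. The sketch below is a minimal illustration, assuming the usual constant head relation Ksat = V L / (A t ΔH) and using the 72 mm cylinder dimensions given above. The head difference, the replicate volumes and times, and the helper name `ksat_constant_head` are hypothetical; the apparatus described by Faybishenko (1995) and Moraes (1991) may define the head differently.

```python
import math
from statistics import mean

# Cylinder dimensions reported for the undisturbed samples: 72 mm diameter and height.
SAMPLE_DIAMETER_M = 0.072
SAMPLE_LENGTH_M = 0.072
SAMPLE_AREA_M2 = math.pi * (SAMPLE_DIAMETER_M / 2) ** 2


def ksat_constant_head(volume_m3: float, time_s: float, head_m: float,
                       length_m: float = SAMPLE_LENGTH_M,
                       area_m2: float = SAMPLE_AREA_M2) -> float:
    """Darcy's law for a constant head test: Ksat = V * L / (A * t * dH).

    volume_m3: outflow volume collected during time_s seconds
    head_m: constant total hydraulic head difference across the sample (setup dependent)
    """
    return (volume_m3 * length_m) / (area_m2 * time_s * head_m)


# Hypothetical replicate readings for one sample: (volume in m^3, time in s, head in m).
replicates = [(5.0e-5, 60.0, 0.092), (4.8e-5, 60.0, 0.092), (5.1e-5, 60.0, 0.092)]

# Arithmetic mean of the three replicate determinations, as described in the text.
sample_ksat = mean(ksat_constant_head(v, t, h) for v, t, h in replicates)
print(f"Ksat = {sample_ksat:.3e} m/s")
```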
The statistical analyses consisted of a descriptive study of the data (Clark & Hosking, 1986), followed by the Kolmogorov-Smirnov test and graphical analyses of the fit of the normal, lognormal, gamma and beta distributions (Isaaks & Srivastava, 1989), and, finally, robust techniques for model comparison as discussed by Zacharias et al. (1996) and Sentelhas et al. (1997).
The UMVUE (Uniformly Minimum Variance Unbiased Estimator) method was used to calculate the lognormal distribution parameters, as recommended by Parkin et al. (1988), and the methodology indicated by Parkin et al. (1990) was used to calculate the confidence limits.
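The lognormal UMVUEs are usually written in terms of Finney's series ψ_n(t). The sketch below uses the standard forms mean = exp(ȳ) ψ_n(s²/2), median = exp(ȳ) ψ_n(−s²/(2(n−1))) and variance = exp(2ȳ)[ψ_n(2s²) − ψ_n((n−2)s²/(n−1))], where ȳ and s² are the sample mean and variance of ln(Ksat). It is an assumed re-implementation for illustration, not the authors' code; the function names and the synthetic sample are hypothetical, and the confidence limits of Parkin et al. (1990) are not reproduced.

```python
import math
import random
from statistics import mean, variance


def finney_psi(n: int, t: float, tol: float = 1e-14, max_terms: int = 1000) -> float:
    """Finney's series:
    psi_n(t) = 1 + (n-1) t / n
             + sum_{j>=2} (n-1)^(2j-1) t^j / (n^j (n+1)(n+3)...(n+2j-3) j!)
    """
    term = (n - 1) * t / n          # j = 1 term
    total = 1.0 + term
    for j in range(1, max_terms):
        # Ratio between the (j+1)-th and the j-th term of the series.
        term *= (n - 1) ** 2 * t / (n * (n + 2 * j - 1) * (j + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def lognormal_umvue(sample):
    """UMVUEs of the mean, median and standard deviation of a lognormal population."""
    logs = [math.log(x) for x in sample]
    n = len(logs)
    if n < 3:
        raise ValueError("need at least three observations")
    y_bar = mean(logs)
    s2 = variance(logs)             # sample variance of ln(x), divisor n - 1
    mean_hat = math.exp(y_bar) * finney_psi(n, s2 / 2)
    median_hat = math.exp(y_bar) * finney_psi(n, -s2 / (2 * (n - 1)))
    var_hat = math.exp(2 * y_bar) * (finney_psi(n, 2 * s2) - finney_psi(n, (n - 2) * s2 / (n - 1)))
    sd_hat = math.sqrt(var_hat)
    return {"mean": mean_hat, "median": median_hat, "sd": sd_hat, "cv_pct": 100 * sd_hat / mean_hat}


# Hypothetical lognormal sample of 70 values (m s^-1), not the measured data.
random.seed(42)
ksat_sample = [random.lognormvariate(math.log(1.3e-4), 0.65) for _ in range(70)]
print(lognormal_umvue(ksat_sample))
```

As n grows, ψ_n(t) approaches exp(t), so the estimators tend to the familiar exp(ȳ + s²/2) for the mean and exp(ȳ) for the median; the series corrects the small-sample bias of those plug-in values.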

RESULTS AND DISCUSSION
The difference between the mean and median values of Ksat was substantial (Table 1). For the LVAd, the mean is nearly 25% greater than the median; for the LVdf, approximately 75% greater; and for the RQo, 14% greater. Such differences between the measures of position indicate a high dispersion of the data. Ksat is known to have high variability (Warrick & Nielsen, 1980; Kutilek & Nielsen, 1994), with high coefficients of variation, as found in this experiment.
The high, positive coefficient of skewness shows that the distribution is nonsymmetrical, which by itself characterizes the distribution as nonnormal. This condition is further reinforced by the high coefficient of kurtosis, greater than 3, the reference value for the normal distribution.
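The indicators used in the two preceding paragraphs can be computed as follows. This is a minimal sketch assuming the moment coefficients of skewness and kurtosis (kurtosis not reduced by 3, so the normal reference is 3), since the exact estimators behind Table 1 are not stated; the function name and the example values are hypothetical.

```python
from statistics import mean, median, pstdev, stdev


def describe(values):
    """Position, dispersion and shape statistics used to flag nonnormality."""
    n = len(values)
    m = mean(values)
    med = median(values)
    s0 = pstdev(values)  # population standard deviation, used in the moment ratios
    return {
        "mean": m,
        "median": med,
        "mean_above_median_pct": 100 * (m - med) / med,
        "cv_pct": 100 * stdev(values) / m,  # sample standard deviation (n - 1)
        "skewness": sum((x - m) ** 3 for x in values) / (n * s0 ** 3),
        "kurtosis": sum((x - m) ** 4 for x in values) / (n * s0 ** 4),  # 3 for a normal
    }


# Hypothetical right-skewed values (10^-2 m s^-1), not the measured data.
print(describe([0.006, 0.009, 0.011, 0.012, 0.013, 0.015, 0.018, 0.024, 0.041]))
```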
Once the nonnormality of the data had been demonstrated, a different distribution to describe the property had to be sought. The Kolmogorov-Smirnov test was applied to the normal distribution and to other asymmetrical distributions cited in the literature in order to determine which of them is the most suitable. According to this test, the hypothesis that the data follow a normal distribution was rejected at the 1% level (P < 0.01**) for LVAd and RQo and at the 5% level (P < 0.05*) for LVdf. This confirms that, regardless of the soil studied, the data do not have the characteristics required to be considered normally distributed, and therefore the normal distribution cannot be considered representative of the variable.
The differences between the observed frequencies and those expected under the lognormal, gamma and beta distributions were not significant for soils LVAd, LVdf and RQo; that is, according to the Kolmogorov-Smirnov test, the data fit all three probability distributions.
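The Kolmogorov-Smirnov comparison can be sketched with the standard library alone. This assumes the classical asymptotic 5% critical value 1.36/√n and distribution parameters estimated from the same sample (which makes the test conservative; a Lilliefors-type correction would be stricter). The gamma fit uses the method of moments, the beta distribution is left out of this sketch, and the function names and the synthetic sample are hypothetical; the paper does not state which critical values or estimators were used.

```python
import math
import random
from statistics import NormalDist, mean, stdev, variance


def ks_statistic(sample, cdf):
    """One-sample, two-sided Kolmogorov-Smirnov statistic D = max |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (shape + k)
        total += term
    return min(1.0, total * math.exp(shape * math.log(z) - z - math.lgamma(shape)))


def ks_table(sample):
    """D for normal, lognormal and gamma fits, with parameters estimated from the sample."""
    m, v = mean(sample), variance(sample)
    logs = [math.log(x) for x in sample]
    log_fit = NormalDist(mean(logs), stdev(logs))
    cdfs = {
        "normal": NormalDist(m, math.sqrt(v)).cdf,
        "lognormal": lambda x: log_fit.cdf(math.log(x)),
        "gamma": lambda x: gamma_cdf(x, m * m / v, v / m),  # method-of-moments fit
    }
    return {name: ks_statistic(sample, cdf) for name, cdf in cdfs.items()}


# Hypothetical sample of 70 values (m s^-1), not the measured data.
random.seed(1)
sample = [random.lognormvariate(math.log(1.3e-4), 0.65) for _ in range(70)]
d_crit = 1.36 / math.sqrt(len(sample))  # asymptotic 5% critical value
for name, d in ks_table(sample).items():
    print(f"{name:9s} D = {d:.3f}  reject at 5%: {d > d_crit}")
```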
The fact that all three distributions can represent the samples leads us to consider other criteria for choosing one of them and thus obtaining the parameters needed to represent the variable. One immediate criterion is the ease with which the data can be interpreted and handled with the chosen distribution. By this criterion, the beta distribution is the most complex in its basic formulation, making data manipulation and parameter calculation more difficult; its use is therefore less desirable for the practical purpose of obtaining information to be applied in agricultural projects. For these reasons, and because of its greater departure from the observed data, as expressed by the Kolmogorov-Smirnov statistic, we decided to discard this distribution as an option for expressing the Ksat distribution.
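One source of the practical difficulty mentioned above is that the beta distribution lives on a bounded interval, so the data must first be rescaled with support bounds that the analyst has to choose. The minimal method-of-moments sketch below makes this concrete; the bounds, the data and the function name are hypothetical, and the estimation procedure used in the study is not described.

```python
from statistics import mean, variance


def beta_moments_fit(values, lower, upper):
    """Method-of-moments beta parameters (alpha, beta) after rescaling the data to [0, 1].

    lower and upper are assumed support bounds; they must enclose all data.
    """
    if not all(lower < x < upper for x in values):
        raise ValueError("all values must lie strictly inside (lower, upper)")
    u = [(x - lower) / (upper - lower) for x in values]
    u_bar, s2 = mean(u), variance(u)
    common = u_bar * (1 - u_bar) / s2 - 1
    if common <= 0:
        raise ValueError("sample variance too large for a beta distribution on these bounds")
    return u_bar * common, (1 - u_bar) * common


# Hypothetical values (10^-2 m s^-1): the estimated shape changes with the chosen bounds.
data = [0.006, 0.009, 0.011, 0.012, 0.013, 0.015, 0.018, 0.024, 0.041]
for bounds in [(0.0, 0.05), (0.0, 0.10)]:
    print(bounds, beta_moments_fit(data, *bounds))
```

The lognormal and gamma fits need no such choice, since both are defined on all positive values.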
With the lognormal and gamma distributions remaining, the former frequently cited in the literature and the latter mentioned in recent work such as that of Moura et al. (1999), the next criterion for choosing between them was the use of robust techniques (Zacharias et al., 1996; Sentelhas et al., 1997) to verify the agreement between the empirical Ksat distribution and the lognormal and gamma distributions, based on the probabilities of occurrence estimated in each case (Table 2). According to these techniques, perfect agreement between the observed values and the values predicted by the adopted distribution model corresponds to an agreement index (AI), coefficient of determination (CD) and efficiency (EF) equal to 1, and to a mean absolute error (MAE), maximum error (ME), coefficient of residual mass (CRM) and square root of the normalized quadratic mean error (SRME) equal to zero.
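The seven indices can be computed from paired observed and model-predicted probabilities. This sketch assumes the forms commonly used in model-evaluation work (for example, Willmott's index of agreement for AI and the root mean square error divided by the observed mean for SRME); Zacharias et al. (1996) and Sentelhas et al. (1997) may define some of them slightly differently. The function name and the example numbers are hypothetical.

```python
import math
from statistics import mean


def agreement_indices(observed, predicted):
    """Model-comparison indices; perfect agreement gives AI = CD = EF = 1 and zeros elsewhere."""
    pairs = list(zip(observed, predicted))
    n = len(pairs)
    o_bar = mean(o for o, _ in pairs)
    sse = sum((p - o) ** 2 for o, p in pairs)
    ss_obs = sum((o - o_bar) ** 2 for o, _ in pairs)
    ss_pred = sum((p - o_bar) ** 2 for _, p in pairs)
    ai_denominator = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in pairs)
    total_obs = sum(o for o, _ in pairs)
    return {
        "AI": 1 - sse / ai_denominator,            # index of agreement
        "CD": ss_obs / ss_pred,                    # coefficient of determination
        "EF": (ss_obs - sse) / ss_obs,             # modelling efficiency
        "MAE": sum(abs(p - o) for o, p in pairs) / n,
        "ME": max(abs(p - o) for o, p in pairs),   # maximum error
        "CRM": (total_obs - sum(p for _, p in pairs)) / total_obs,
        "SRME": math.sqrt(sse / n) / o_bar,        # normalized root mean square error
    }


# Hypothetical observed cumulative frequencies and model CDF values at the same points.
observed = [0.10, 0.25, 0.45, 0.60, 0.80, 0.95]
predicted = [0.12, 0.24, 0.47, 0.58, 0.79, 0.96]
print(agreement_indices(observed, predicted))
```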
The lognormal probability density function yielded AI, CD and EF values closer to 1, and MAE, ME, CRM and SRME values closer to zero, than the corresponding coefficients of the gamma distribution (Table 2). This allows us to conclude that the data are better fitted by the lognormal distribution.
The gamma probability density function yielded AI, CD and EF values near 1; however, these coefficients departed more from the reference value than those of the lognormal probability density function. MAE was much greater for the gamma distribution than for the lognormal, which indicates that the gamma distribution, although not significantly different from the data by the Kolmogorov-Smirnov test, is farther from the observed data than the lognormal. This is also evidenced by the CRM, which, although close to zero, is greater than the CRM of the lognormal distribution. ME and SRME were similar to, but greater than, those of the lognormal. These results confirm the better fit of the lognormal distribution.
Once the most adequate function to represent the distribution for the three soils had been defined, the rest of the discussion is restricted to the soil with intermediate texture (LVAd) to avoid unnecessary repetition, since the same comments apply to the other two soils. Figures 1a and 2a show, respectively, the frequency histogram with the lognormal probability curve and the QQ-plot, for visual inspection of the adequacy of the lognormal distribution to represent the Ksat distribution of the LVAd; Figures 1b and 2b show the same for the gamma distribution. The lognormal distribution covers the area represented by the histogram bars better than the gamma distribution (Figures 1a and b), supporting the previous results of the Kolmogorov-Smirnov test and the comparative indices shown in Table 2.
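The points of a lognormal QQ-plot such as the one described above can be generated as follows. This is a sketch assuming the plotting position (i − 0.5)/n and lognormal parameters estimated from the log-transformed sample; the figures may have used a different convention, and the function name and values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_qq_points(sample):
    """(theoretical lognormal quantile, observed value) pairs for a QQ-plot."""
    xs = sorted(sample)
    n = len(xs)
    logs = [math.log(x) for x in xs]
    fitted = NormalDist(mean(logs), stdev(logs))
    # (i - 0.5) / n is one common plotting position; others, e.g. i / (n + 1), also exist.
    return [(math.exp(fitted.inv_cdf((i - 0.5) / n)), x) for i, x in enumerate(xs, start=1)]


# Hypothetical values (10^-2 m s^-1); points close to the 1:1 line indicate a good fit.
values = [0.006, 0.009, 0.011, 0.012, 0.013, 0.015, 0.018, 0.024, 0.041]
for theoretical, observed in lognormal_qq_points(values):
    print(f"{theoretical:.4f}  {observed:.4f}")
```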
Even though the QQ-plot is a recommended technique for comparing distributions, visual inspection of Figures 2a and b does not show differences as clear as those observed between Figures 1a and b. Using a single criterion to decide on the adequacy of distributions can be rather unsatisfactory. However, the set of criteria used (Kolmogorov-Smirnov, graphical and robust techniques) clearly establishes the superiority of the lognormal distribution under these statistical criteria.
In addition, the fit of the gamma function depends on the precision with which parameters α and β are estimated; these parameters are directly linked to the shape of the distribution of the observed values, which makes the gamma distribution difficult to use.
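To illustrate this sensitivity, the sketch compares two common estimators of the gamma parameters: the method of moments and Thom's approximation to the maximum-likelihood shape estimate. Which estimator the study used is not stated; β is taken here as a scale parameter (some texts use a rate), and the function names and the synthetic sample are hypothetical.

```python
import math
import random
from statistics import mean, variance


def gamma_moments(values):
    """Method-of-moments estimates: shape alpha = mean^2 / var, scale beta = var / mean."""
    m, v = mean(values), variance(values)
    return m * m / v, v / m


def gamma_thom(values):
    """Thom's approximation to the maximum-likelihood estimates (shape alpha, scale beta)."""
    m = mean(values)
    a = math.log(m) - mean(math.log(x) for x in values)  # > 0 unless all values are equal
    alpha = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)
    return alpha, m / alpha


# Hypothetical small sample drawn from a gamma with shape 2 and scale 3:
# the two estimators can disagree noticeably at this sample size.
random.seed(7)
small = [random.gammavariate(2.0, 3.0) for _ in range(30)]
print("moments:", gamma_moments(small))
print("Thom:   ", gamma_thom(small))
```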
Soil properties with nonnormal distributions are common, and statistical procedures have been applied without the full attention that their foundations and limitations require (Menk & Nagai, 1983). Data are often assumed to be normally distributed without appropriate scrutiny. In the present case, this would be equivalent to accepting the mean, median and standard deviation values presented in Table 1, which were obtained on the basis of the normal distribution; since the observed Ksat data are not normally distributed, those values should not be used, as they could lead to erroneous conclusions. Parkin et al. (1988) and Parkin & Robinson (1992), evaluating methods for estimating the parameters of a lognormal population from sample data, concluded that the UMVUE method yields the estimates with the smallest errors. By this method, considering Ksat lognormally distributed, its characteristic values are: mean 0.0157 × 10⁻² m s⁻¹, median 0.0127 × 10⁻² m s⁻¹, standard deviation 0.0114 × 10⁻² m s⁻¹, coefficient of variation 73%, and lower and upper limits of the 95% confidence interval of the mean 0.0127 × 10⁻² m s⁻¹ and 0.0175 × 10⁻² m s⁻¹, respectively. These values should therefore be analyzed and used in future work as the statistical parameters of the variable.
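As a quick consistency check on the reported summary, the coefficient of variation follows from the standard deviation and the mean. The check below uses only the figures quoted in the paragraph above; the variable names are illustrative.

```python
# UMVUE values reported above for Ksat, in units of 10^-2 m s^-1.
mean_ksat, median_ksat, sd_ksat = 0.0157, 0.0127, 0.0114
ci_lower, ci_upper = 0.0127, 0.0175

cv_pct = 100 * sd_ksat / mean_ksat
print(f"CV = {cv_pct:.1f}%")  # prints "CV = 72.6%", which rounds to the 73% reported
assert ci_lower <= mean_ksat <= ci_upper  # the mean lies inside its confidence interval
print(f"mean/median = {mean_ksat / median_ksat:.2f}")
```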
If the sampling values are lognormally distributed, we must choose between the position parameters (mean and median) the one to be used as a statistical summary, because the values are not the same and

Table 1 - Descriptive statistics for the Ksat variable for the soils under study.

Table 2 - Robust techniques for model comparison for the Ksat variable (m s⁻¹) for the soils under study.

Where: LVAd = Typic Hapludox; LVdf = Rhodic Hapludox; RQo = Typic Quartzipsament. P.D.F. = probability density function; AI = agreement index; CD = coefficient of determination; EF = efficiency; MAE = mean absolute error; CRM = coefficient of residual mass; ME = maximum error; SRME = square root of the normalized quadratic mean error.