Multivariate analysis and Geostatistics of the fertility of a huMic rhodic hapludox under coffee cultivation ( 1 )

the spatial variability of soil and plant properties exerts great influence on the yeld of agricultural crops. this study analyzed the spatial variability of the fertility of a humic rhodic hapludox with arabic coffee, using principal component analysis, cluster analysis and geostatistics in combination. the experiment was carried out in an area under Coffea arabica l., variety catucai 20/15 – 479. the soil was sampled at a depth 0.20 m, at 50 points of a sampling grid. the following chemical properties were determined: p, K+, ca2+, Mg2+, na+, s, al3+, ph, h + al, sB, t, t, v, m, oM, na saturation index (ssi), remaining phosphorus (p-rem), and micronutrients (Zn, fe, Mn, cu and B). the data were analyzed with descriptive statistics, followed by principal component and cluster analyses. Geostatistics were used to check and quantify the degree of spatial dependence of properties, represented by principal components. the principal component analysis allowed a dimensional reduction of the problem, providing interpretable components, with little information loss. despite the characteristic information loss of principal component analysis, the combination of this technique with geostatistical analysis was efficient for the quantification and determination of the structure of spatial dependence of soil fertility. in general, the availability of soil mineral nutrients was low and the levels of acidity and exchangeable al were high.


introduction
Knowledge on the spatial variability of soil and plant properties that control crop yields is indispensable in modern agriculture, since minor changes can lead to great yield differences. the behavior of these properties in the cultivated areas was characterized by significant changes, associated with significant spatial and temporal variability, due to the agricultural management and other factors (silva et al., 2010a).
an analysis of the variability of soil properties in defined landscape, geomorphic surface and soil mapping units, must take the influence of crop management into account (sanchez et al., 2005).this variability affects the course of boundaries between soil classes, the true potential of production areas (Marques Júnior & lepsch, 2000) and guides decision making for optimized yields.in the specific case of coffee production, the development of high yielding varieties, the search for specialty coffees and production on highly heterogeneous soils require a better understanding of the soil-plant system, including spatial nutrient variability and its dynamics (silva et al., 2010b).these measures can prevent nutritional deficiency, which considerably hamper the efficiency of fertilization programs, reducing yields (reis Jr. & Martinez, 2002).however, to analyze the structure of soil fertility and the nutritional status of plants, it is necessary to evaluate a number of nutrients involved, which often increases the size of the problem, for being performed by univariate methods.one possibility of analysis of this kind of data is multivariate analysis, in particular, principal component analysis (Pca), to reduce the problem and thus facilitate interpretation (Boruvka et al., 2005).
the Pca is a technique based on linear combinations of original variables.the first principal components explain most of the total variance contained in the data set and can be used to represent it (valladares et al., 2009).the practical utility, especially for soil mapping, is the possibility of identifying the most important variables in the space of principal components, generating interpretable components, to define and quantify the spatial behavior of soil fertility, using geostatistics and a reduced number of variables.
this study analyzed the spatial variability of fertility of a humic rhodic hapludox under coffee, using principal component analysis, cluster analysis and geostatistics in combination.

Material and Methods
the experiment was carried out in an area of approximately 1.2 ha of Coffea arabica l. variety r.Bras. ci. solo, 36:467-474, 2012 yellow catucai 20/15 -479, spaced 2.5 x 0.6 m, with steep slopes (30 cm m -1 ), located in the Zona da Mata of the state of Minas Gerais,in reduto (20º 45' 45,4' s and 41º 32' 9,75' W). the soil of the experimental area was classified as humic rhodic hapludox (soil survey staff, 2006).the climate of the region is aw, humid subtropical, with humid summers (mean temperature > 27 °c) and dry and cool Winters (mean temperature < 20 ºc) (Köppen & Geiger, 1928).soil fertility was mapped in a grid of 50 georeferenced sampling points (figure 1). the soil samples were collected under the canopy of three coffee trees in the soil layer 0-0.20 m, using a steel probe (internal diameter 0.05 m). the three soil samples from each point were mixed, corresponding to the representative sample of the sampling point.
the values were determined based on the position and dispersion in the descriptive and exploratory statistical analysis.data normality were evaluated by the shapiro-Wilk test (p < 0.05).
the principal component analysis (Pca) was performed based on the correlation matrix between the components and the actual properties, to identify new variables that explain most of the variability, generating new values for each sampling point corresponding to principal components.these components can be seen as "super variables", based on the combination of the correlation between variables and are extracted in decreasing order of importance, in terms of their contribution to the total data variance.
cluster analysis was performed according to the Ward method, to classify the soil chemical properties and principal components into homogeneous groups.this multivariate analysis does not take into account the data distribution, but clustering is based on the similarity measures between plants (Johnson & Wichern, 2002).
the number of principal components was determined based on the quality criterion of the correlation matrix, using the components associated with eigenvalues of over 1 (Johnson & Wichern, 2002).the correlation of the components with soil chemical properties was considered significant if greater than 0.7, according to Zwick & velicer (1986).
Geostatistics were used to check and if detected, to quantify the degree of spatial dependence of data generated by Pca. the fitting of theoretical functions to experimental variogram models was based on the assumption of stationarity of the intrinsic hypothesis and according to the equation: where N (h) is the semivariance, n (h) is the number of pairs of experimental observations Z (xi), Z (xi + h), separated by vector h. to fit theoretical models to experimental variograms the nugget effect (c 0 ), sill (c 0 + c 1 ), structural variance (c 1 ) and range (a) were determined.spherical, exponential, Gaussian and linear models were tested.the variogram was determined by the least square method and adopted as criterion for selecting the highest r 2 value (determination coefficient), the smallest ssr (sum of square residues) and the highest correlation coefficient obtained by the cross validation method.
after the identification of spatial dependence, the interpolation method of ordinary kriging was used to estimate values at unmeasured locations.

results and discussion
the table 1 shows the exploratory analysis of the values found for the properties.Measures of central tendency (mean and median) were similar for more than 80 % of the properties, indicating a data distribution with little variation around a central value.
Phosphorus, K + , na + , ca 2+ , Mg 2+ , sB, ssi, fe, and cu showed asymmetric right-sided distribution, indicating a deviation from the normal distribution, by the shapiro-Wilk test (p < 0.05).the remaining properties showed asymmetry coefficients close to zero, tending to normal distribution, as confirmed by the normality test.
Paz-Gonzalez et al. ( 2001) stated that, in the case of data normality, estimates using the kriging method are more efficient and obtain better results than other methods.according to cressie (1993), even for data with non-normal distribution, kriging is efficient, in situations where the tails of the normal distribution are not very long.
the kurtosis of all properties was nearly zero, except for ph, v %, and ssi. the distribution of ph and v % was platykurtic, but did not affect data normality, since their standard deviations were low. the distribution of ssi was leptokurtic, but non-normal, due to the non-zero asymmetry and high standard deviation.vieira (2000) observed the coefficient of kurtosis should be analyzed together with the errors and standard deviations, in order to confirm the trend to normal or non-normal distribution.
analyzing the variation coefficient (cv) according to the classification proposed by Warrick & nielsen (1980), the values were grouped as: low -cv < 12 %; moderate -12 % ≤ cv ≤ 60 %; and high -cv > 60 %.except for P (high), ph, t, v % and oM, all other properties were in the range 12-60 % and variation was therefore considered moderate.
the variations in soil chemical properties were related to successive changes caused by agricultural activities and, consequently, by erosion processes.the different behavior of the properties across the landscape can also be explained by successive and irregular fertilization and liming (silva & chaves, 2001).
the mean values of soil chemical properties were classified, according ribeiro et al. (1999) for the state of Minas Gerais, as: high acidity for ph, using principal component analysis, seven principal components were extracted that cumulatively explained 79.81 % of the total data variability and with an eigenvalue greater than 1 (table 2). the other components had an eigenvalue less than 1 and were not used.Generally, the first components account for most of the variability contained in the data set (Johnson & Wichern, 2002).
according to the selection criteria applied in this study, the first two principal components were used in the analysis, mainly due to the presence of such correlations with the real properties, as shown in the circle of correlations presented in figure 2. the components represent 38.14 % and 22.01 % respectively, both of which correlated with a satisfactory number of soil properties.Boruvka et al. (2005) reported that, in general, few components account for about 60-90 percent of the total variance in an analysis, which are the only components to be used for data representation.Johnson & Wichern (2002) stated that in principal component analysis, although reducing the data volume, the loss of information should be as small as possible, accepting a loss of less than 40 %.Greater losses affect the subsequent analysis and interpretation.
the first component correlated with 10 of the 22 soil properties analyzed.the properties al and "m" correlated negatively with the first component, while ph, K + , ca 2+ , Mg 2+ , sB, t, v, and Zn, positively (p < 0.05) (figure 2).this component represents soil fertility in terms of macronutrients, since many of these were correlated with the first component according to Johnson & Wichern (2002), the opposition of signals indicates that when one variable increases, the other decreases.the t is the product of sB + (h + al), and sB represents the bases K + , ca 2+ , Mg 2+ ; it is to be expected that increasing nutrient concentrations elevate the cec.
the second principal component represents the micronutrients, since, with the exception of B, all nutrients involved in this class were correlated with that component.Zn, fe and cu, followed by na and ssi, correlated negatively, while Mn, followed by s, positively.the technique of cluster analysis by Ward´s method (figure 3) allowed to efficiently complement the results of principal component analysis.this multivariate classifying can be used to explore the similarities between plants or variables.the set of 22 original properties (figure 3a), with the similarity measure given by the euclidean distance of 600 (cutoff), created two homogeneous groups.to form two homogeneous groups with seven principal components (figure 3b), explaining 79.81 % of the data variability, a cutoff distance of 25 was used.a shorter distance would represent a considerable increase in the percentage of disagreement, which would hamper the comparison between groups.figure 4 represents the distribution of sampling points, explaining the two groups formed by cluster analysis with real data (4a) and principal components (4b).figure 4 shows a similarity of 70 % between the actual set of properties and the seven principal components which together account for 79.81 % of the variability, thus validating the use of an analysis that reduced the number of variables, facilitating interpretation.comparatively, the cluster analysis across the seven principal components reduces the complexity of interpretation of soil properties of the analysis with the 22 original data. in this case, as most of the variability is contained in the first and second components, the analysis is even simpler.the geostatistical analysis was performed using the values of the first two principal components, and adjusted to the spherical variogram model (table 3 and figure 5).
the spatial dependence increased gradually until the sill (range of 17-23 m), depending on the principal component considered. in such cases, the points located in an area of radius less than or equal to the range are more similar; they depend on each other spatially and can be used to estimate values for unsampled locations.the range of values can influence the quality of the estimates, since it determines the number of values used in interpolation, so estimates based on a greater range of ordinary kriging values tend to be more reliable, with maps that represent reality (corá et al., 2004).
nugget effect values (c 0 ) were close to zero for the two principal components, which increased the accuracy of estimates by kriging in unmeasured areas.c 0 represents the component of spatial variability that cannot be related to a specific cause (random variability), and represents the variability at smaller than the shortest sampling distance (vieira et al., 1983).
the spatial dependence index (sdi) shows a strong dependence for the two principal components, according to the classification proposed by r.Bras.ci.solo, 36:467-474, 2012 cambardela et al. (1994). according to vieira et al. (1983), the higher the proportion of structural variance (c) for the sill (c 0 + c), the higher the similarity between the neighboring values, the continuity of the phenomenon and the smaller the  variance of the estimate, and therefore, the greater the confidence in estimates values in non-sampled areas by the interpolation method by ordinary kriging.
after defining the parameters of variogram models the data were interpolated by ordinary kriging for soil fertility mapping, based on the principal components (figure 6).
the variability in macronutrients (Pc1) is higher, as observed previously by its lower range of semivariance.the presence of patches of higher fertility characterize this variability.however, in general, fertility is high the upper portion of the area, despite reduced nutrient values, as shown in the descriptive analysis (table 1).for macronutrients, soil acidity is lowest in the upper part of the area.this underlies the distribution of acidity versus nutrient availability in the upper part of the area, since most nutrients (K + , ca 2+ , Mg 2+ , s, B and P) are less available at higher acidity.
the availability of all micronutrients, except Mn, was high in the area, especially for Zn, fe and B. higher acidity values favor the availability of Zn and fe.therefore, the Zn and fe distribution and low amplitude in the area (Pc2) were related to these acidity values.higher values can result in phytotoxicity, resulting in multiple physiological disturbances.figure 6. thematic maps based on the first two principal components (pc1, pc2).r.Bras. ci. solo, 36:467-474, 2012 often, low fertility is a constant in highly weathered soils such as rhodic.however, it is not a rule that in these soils fertility is low, cec low, acidity high, al 3+ levels toxic and the capacity high to retain P in forms unavailable to plants (fontes & carvalho Jr., 2005).these characteristics, according to oades & Waters (1991), are usually observed at greater intensity in inappropriately cultivated soils with oM reduction which contributes to the reduction of t. among other functions, oM could inactivate toxic al 3+ and avoid the adsorption of phosphate.conclusions 1. the principal component analysis allowed a dimensional reduction of the problem, providing interpretable components with low information loss.
2. despite the information loss, characteristic of principal component analysis, the combination of Pca with geostatistical analysis was efficient to quantify and determine the spatial dependence structure of soil fertility.
3. in the soil studied, reduced availability of mineral nutrients and high levels of acidity and exchangeable al were generally observed.literature cited

figure 1 .
figure 1. digital elevation model (deM) of study area with the distribution of sampling points.

figure 3 .
figure 3. dendrogram resulting from cluster analysis of soil properties (a) and the seven main components (b).

table 3 .
Models and parameters of the variograms adjusted for the two principal components c0: nugget effect; c0+c: sill; sdi: spatial dependence index + a: range; r 2 : determination coefficient the model; r 2 (vc): determination coefficient of cross validation.

table 2 . summary of the main components of the multivariate analysis of soil properties figure 2. circle of correlation of the plane formed by principal components 1 and 2 with the ori- ginal variables.
r.Bras.ci.solo, 36:467-474, 2012