Establishment of homogeneous zones in a soil of alluvial origin

Soils have the ability to maintain plant growth and biological activity due to their physical and chemical properties. The aim of this study was to observe the spatial distribution of some chemical properties of the soil, such as pH, organic matter (OM), electrical conductivity (EC), effective cation exchange capacity (ECEC), sulfur (S) and aluminum (Al) content and to establish zones with homogeneous chemical characteristics using the MULTISPATI-PCA technique and the fuzzy c-means algorithm. The study area was located in the Tundama and Sugamuxi Valleys (Boyacá, Colombia) with an area of 8,017 ha. Chemical properties such as pH, OM, EC, S, Al, and ECEC were indicators of the chemical degradation of these soils. Four homogeneous zones were identified. The first zone represents areas with acidity and excessive sulfur, with a pH of 4.54, 15.88% OM, 3.19 dS m-1 EC, 2.47 meq 100 g-1 Al and 365.59 meq 100 g-1 S. In contrast, the second zone represents areas with a high self-neutralizing capacity, with a pH of 5.98, 4.22% OM, 0.75 dS m-1 EC, 0.20 meq 100 g-1 Al and 44.64 meq 100 g-1 S. Zone three showed a high similarity with the first two, except for its EC and S contents. Finally, zone four showed similarity with the first, except in OM, EC and S contents. These data show that S and EC influenced the homogeneous zones because the soils in this area are called acid sulfate soils.


INTRODUCTION
The ability of soils to maintain plant growth and biological activity lies in their physical and chemical properties (Lal, 2002). These properties are the result of specific interactions among the five formation factors in a given place (McGraw, 1994) and of dozens of pedogenetic processes, thus generating spatial variability in their chemical, physical, biological, and mineralogical properties, among others (Jaramillo, 2014).
The study of the spatial variability of agricultural soil properties is important to make appropriate management decisions and to improve soil quality (Rosemary et al., 2017). The variability has a strong relationship with the soil use (Wang;Shao, 2013), i.e., a soil without human intervention shows less variation than one under agricultural use. Moreover, in the latter, management practices affect the change of soil properties (Jaramillo, 2012). Once the source of variation is known, a higher efficiency in establishing homogeneous zones is achieved (Mzuku et al., 2005), allowing the implementation of differential management. This management improves efficiency and sustainability in production through fertilization, irrigation and tillage practices, among others, that are specific to each place Sadeghian;Lince, 2013).
The use of several statistical tools to study the spatial variability of soil properties has been recorded (Reichardt;Timm, 2008). Classical statistics is the discipline that started these studies. However, classical statistical evaluations only include generalizations regarding the magnitude of the variation and cannot be used to evaluate autocorrelated data (Stoyan et al., 2000). As an alternative, Bachmeier and Bufa (1992) mention that the theory of regionalized variables allows the measurement of the spatial dependence of edaphic properties and that geostatistics provide a means to define an autocorrelation with semivariograms (Stoyan et al., 2000;Díaz, 2002).
The use of geostatistics to establish the variability of the chemical properties of soils has been widely studied in various types of soils and production systems worldwide (Rahal, 2015;Aghasi et al., 2017). Within geostatistical techniques, interpolation with ordinary kriging (OK) is the most widely used method, and it assumes intrinsic stationarity. However, if the spatial stochastic process shows a tendency, the assumption is unsustainable, and a more robust model that can better explain the variation is needed (Li et al., 2015). As a solution, Matheron (1969) introduced so-called universal kriging (UK), where the trend is removed using surface models that turn out to be linear combinations of spatial coordinates (Díaz, 2002).
Principal component analysis (PCA) is commonly used at the multivariate level to construct linear combinations with study variables (Moral;Terrón;Da Silva, 2010); however, this method does not consider the dependency structures expected in spatial data. As an alternative to this analysis, a method called MULTISPATI-PCA (Dray;Saïd;Débias, 2008) has been designed with good results in the study of soils. This method incorporates spatial information and uses the Moran index (MI) to measure the dependence or spatial correlation between observations at a site and the average in their neighborhood (Arrouays et al., 2011). Thus, this analysis considers spatial structure (autocorrelation) in the original variables to produce synthetic variables and the relationships between the variables is measured (covariability analysis) (Córdoba et al., 2013).
MULTISPATI-PCA has been used in the delineation of management zones based on soil and terrain variables (Peralta et al., 2015). In Córdoba, Argentina, this technique was used for multivariate zoning at the regional scale, involving edaphic and climatic data (Giannini et al., 2018). In France, it was used across the entire country to study the main soil characteristics of the topsoil and to assess their multivariate spatial patterns (Arrouays et al., 2011). Regarding the definition of management zones of the different cluster analyses, the fuzzy c-means algorithm has been widely used (Rodrigues;Corá, 2015;Tripathi et al., 2015;Gavioli et al., 2016). This algorithm has been determined to be preferable for grouping properties in the soil continuum (Odeh;McBratney;Chittleborough, 1992) and has been used in the grouping of similar points using the main spatial components as inputs for site classification (Córdoba et al., 2013).
Accordingly, the aim of this study was to observe the spatial distribution of some chemical properties of the soil, such as pH, organic matter (OM), electrical conductivity (EC), effective cation exchange capacity (ECEC), sulfur and aluminum contents. Furthermore, the study aims to establish zones with chemical homogeneous characteristics using the MULTISPATI-PCA technique and the fuzzy c-means algorithm in a soil of alluvial origin.

Description and location of the study area
This study was carried out in an area of 8,017 ha in the department of Boyacá (Colombia), specifically in the region known as Valles del Tundama and Sugamuxi. These areas are located geographically between the parallels 5º 43' 28.8228" N and 5º 50' 32.4162" N, and the meridians 73º 6' 38.4798" W and 72º 56' 0.3264" W of Greenwich, at an average altitude of 2,500 m a.s.l. (Figure 1). Soils within the study area are mostly used for livestock grazing, and only approximately one thousand hectares are used for agricultural production, where potato, wheat, corn, beans, peas, beans, carrots, and bulb onions, among others, are the main crops produced Castillo, 2016).
The study area remains flooded most of the time because it is located in a "valley" geomorphological position. These soils are characterized by continuous chemical degradation due to various forms of sulfur, high concentrations of soluble aluminum and iron, production of sulfuric acid, low availability of phosphorus and low base saturation. For this reason, they are known as acid sulfate soils (Bernal;Forero, 2014).

Soil sampling and laboratory analyses
The data set used in this study was obtained under the Agreement 20110060 (Internal Code 1723) between Ministerio de Agricultura y Desarrollo Rural de Colombia (MADR) and Corporación Colombiana de Investigación Agropecuaria (Agrosavia). The sampling design was stratified according to the type of soil, land uses, origin and timing of floodwaters and ponding. Two hundred ninety-five observation points were sampled at a depth between zero (0) and 20 cm, and their allocation was randomized in each unit. The chemical soil properties measured were pH (soil:water ratio of 1:2.5 determined by potentiometry) and organic matter (OM) measured by the Walkley and Black method. Available phosphorus (P) was measured by Bray II through VIS spectrophotometry, and calcium (Ca), magnesium (Mg), potassium (K) and sodium (Na) were measured using 1N ammonium acetate at pH 7.0, employing atomic absorption spectrophotometry. Exchangeable aluminum (Al) was measured by extraction with KCl when the pH was <5.5 through volumetry, available sulfur (S) was determined by calcium monobasic phosphate through spectrophotometry, and Fe, Mn, Cu and Zn contents were determined by the modified Olsen method through atomic absorption spectrophotometry. Available boron (B) was measured using calcium monobasic phosphate through VIS spectrophotometry, and electrical conductivity (EC) and effective cationic exchange capacity (ECEC) were calculated by the sum of the cations of Ca, Mg, Na and K.

Statistical analysis
Initially, a descriptive and exploratory analysis was carried out with all the study variables to calculate central tendency and dispersion measures (Rodríguez;Rubiano, 2016). Additionally, the significance of Pearson's product-moment correlation coefficients was evaluated using the Clifford, Richardson and Hémon (1989) procedure and Moran's index to estimate the spatial autocorrelation. The coefficient of variation (CV) was analyzed using the Warrick and Nielsen (1980) criterion, in which values lower than 12% are considered to have low variability, those between 12% and 60% to have medium variability, and those higher than 60% to have high variability.
Subsequently, an exploratory analysis was carried out through visual diagnostic tests to evaluate the assumptions of normality using box diagrams, histograms and qq-plots. Stationarity was evaluated with scatter plots of the variable versus latitude and longitude.
Stationarity was verified with a spatial trend analysis estimating a polynomial model through a multiple regression. In this regression, the variable under study was the dependent variable, and the sampling point coordinates were the independent variables (Jaramillo; Sadeghian; Lince, 2013). Once a significant regression model was obtained (p<0.05), residuals were extracted to carry out semivariance analysis.
A normality test was performed with the residuals employing the Shapiro-Wilks test at 5% significance; when this assumption was not met, the variable was transformed into a natural logarithm (Li;Webster;Shi, 2015). In this case, the analysis described above was performed again with respective trend removal according to each case (Jaramillo, 2009). Semivariograms were created, adjusting each one according to the model found for each variable. The first step was to determine the initial parameters with the eyefit function using the interactive Tcl-Tk interface. In these, different models were tested (exponential, spherical, Gaussian, Matern, among others).
The values obtained from the initial parameters were used to estimate the semivariance models, adjusting these according to the model found for each spatial stochastic process, establishing the sill (C), range (A) and nugget effect (C0) (Hernández et al., 2018). The estimated parameters were calculated through ordinary least squares, weighted "n pairs", weighted "cressie", maximum likelihood and maximum restricted likelihood (Cressie, 1993;Selby;Kockelman, 2013;Li;Webster;Shi, 2015).
To establish the goodness of the predictions made by different methods, cross-validations were made, and the best model was selected by the highest cross-validation coefficient (CVC), the smallest root of the mean square error (RMSE), the reduced error (RE) value closer to zero, the value of the standard deviation of the reduced errors (SDRE) closer to one (Faraco et al., 2008;Johann et al., 2010;Cortés;Giraldo, 2016) and the best degree of spatial dependence (DSD) of each of the chemical soil properties, according to the classification proposed by Cambardella et al. (1994). These authors consider the degree of spatial dependence as strong when DSD ≤ 25%, moderate when 25 < DSD ≤ 75%, and weak when DSD > 75%.
For interpolations, the polygon of the zone was used, and this was calculated from the sample size and total number of pixels with the equation of inspection density suggested by Hengl et al. (2006) used when predictions are created. Each map should have approximately an equal sample density per area. Therefore, an approximate pixel size (TP) of 40 was established for a total of 50,106 pixels used in the maps per variable, as well as the main interpolated spatial components. The predictions were carried out by implementing UK, and for variables transformed by natural logarithm, the inverse transformation was applied with the correction of Laurent (1963) to estimate the original scale. The correction consists of adding a value of 0.5 to the estimated regionalized variable and multiplying it by the estimated variance in the exponential function.
For the principal spatial components (sPC), the MULTISPATI-PCA method was used. A weighting matrix Wnxn, which is a mathematical representation of the geographical distribution of the study sites, was created. The neighborhood network was defined based on the Euclidean distance between adjacent neighboring points. Subsequently, sPCs were calculated, and the associated eigenvalues equivalent to the spatially structured variance were obtained. The presence of autocorrelation in sPCs was analyzed with the Moran index (MI). UK interpolation was applied based on the MULTISPATI-PCA sPC1 and sPC2 semivariograms to obtain multivariate spatial variability maps. (Dray;Said;Debias, 2008, Córdoba et al., 2012. The fuzzy c-means cluster algorithm was employed with the interpolations of the first two sPCs to classify the homogeneous zones using the Euclidean distance, a fuzziness exponent of 1.3 (Córdoba et al., 2016) and a maximum number of iterations of 500. To validate the results, internal measures of four indexes, partition density (PD), Xie and Beni (XB), Fukuyama and Sugeno (FS) and partition coefficient (PC) values were used. Between two and eight homogeneous zones were tested, which were compared using the lowest values of the first three indexes and the value closest to one for the PC (Meyer et al., 2017). The analyses described above were performed with the statistical package R version 3.4.4 (R Core Team, 2018) and Pearson's product-moment correlation with the software PASSaGE (v.2) (Rosenberg; Anderson, 2011).

Spatial analysis
In Table 1, descriptive statistics for the study properties can be observed. The CV showed a high variability in all the evaluated properties, except for pH and Fe, which showed medium variability; for the latter, its interpretation agrees with Hernández et al. (2018). Regarding pH, studies have shown that its variability range fluctuates between 2 and 15% (Cox;Gerard;Abshire, 2006); however, in the current study, a 21.43% range was found. This can be explained by the physical, mineralogical and biochemical changes in the acidity-related processes of this type of soil, which directly affect this property (Rosicky;Sullivan;Slavich, 2004).
Six chemical properties were selected: pH, OM, EC, S, Al and ECEC, for the univariate interpolation by UK, and their significant spatial correlation (p <0. The above agrees with studies of the area where changes in these properties have been highlighted, establishing that pH, Al and S are important indicators in the diagnosis of acidity and sulfation status of this type of soil (Castro et al., 2006;Rincón;Castro;Gómez, 2008). pH showed a lower value of skewness (0.17) in comparison with the values between 1.53 and 2.91 that OM, EC, S, Al and ECEC presented; these last variables were transformed by natural logarithm. Regarding stationarity, all the variables presented a tendency, and it was necessary to remove it through a model as a function of the coordinates. The semivariograms obtained for the chemical properties studied were adjusted to exponential models, with ranges above 345.9 m, being higher for S with 1013.5 m ( Table 2). The restricted maximum likelihood (REML) method was used to adjust the OM, S and ECEC, ordinary least squares (OLS) was used for pH and Al, and maximum likelihood (ML) for EC (Figure 2).
The degree of spatial dependence (DSD) was classified as strong (DSD ≤ 25%) for four properties (OM, EC, Al and ECEC) and moderate (25 < DSD ≤ 75%) for the two remaining properties; however, all of them were less than or equal to 37.0%. According to Cambardella et al. (1994), the variables with strong spatial dependence are more influenced by soil formation factors ( Table 2). The evaluation of UK interpolation was intermediate, with a range of cross-validation coefficients between 0.53 and 0.73, i.e., lower than those found by Varón-Ramírez, Camacho-Tamayo and González, (2018), and similar to those found by Cortés, Camacho-Tamayo and Giraldo (2016).  The pH interpolation range fluctuated between 3.5 and 7.4, with a mean of 5.49 and a standard deviation of 0.98. Figure 3A shows that some areas of the Tundama and Sugamuxi Valleys have soils with ultra-acidic to extremely acidic reactions (pH 3.1-4.0) (Castro et al., 2006). The Pearson's product-moment correlations were significant, namely, low pH values are related to high levels of Al (r = -0.75, p-value= <0.0001) ( Figure 3E), increases in OM (r = -0.52, p-value= 0.0002) ( Figure 3B), increases in EC (r = -0.29, p-value= 0.0995) ( Figure 3C) and excessive levels of S (r = -0.45, p-value=0.0079) ( Figure 3D), which generate undesirable conditions for vegetative development.
In contrast, areas with pH > 5.5 and lower contents of Al, S, OM, EC and ECEC were observed. In these areas, according to Welch et al. (2009), corrections have generally been used with CaCO 3 , with the purpose of neutralizing the acidity caused by hydrogen, aluminum, manganese, iron and organic matter acids, to allow the normal growth of plants. This implies that as the CaCO 3 doses increase, the pH stabilizes to 7.0, indicating that the high load of sulfates and iron exerts self-neutralization with joint formation of gypsum and Fe and Al hydroxides, a specific phenomenon of acid sulfate soils (Dent, 1986).

Homogeneous zones
A spatial principal component analysis (sPCA) was performed, which showed significant spatial autocorrelation values (p-values <0.001) and MI (0.49 and 0.26) for the first and second spatial components, respectively. Figure 4A shows eigenvalues that suggest two main structures. The eigenvalues are the product between the variance and the spatial autocorrelation of the components. These first two axes collected 78.89% of the cumulative spatial variance. SPC1 includes properties related to soil acidity pH, showing pH, Cu, Zn and Mn opposed to OM, Al, S, Ca, B, EC and ECEC, which explains 61.68% of the cumulative spatial variance. SPC2 is represented by the Al content of the soil, contrasted with pH, Ca, Mg, ECEC, which explains the 17.21% value ( Figure 4A). Finally, Figure 4B shows the sPC3 that is explained by Mn, Fe and Mg as opposed to the pH, explaining the 8.37% value.
After carrying out the sPCA, the semivariograms obtained were adjusted to exponential models, with the maximum likelihood (MV) and maximum restricted likelihood (REML) methods ( Figures  4C-D). The estimated parameters obtained and the validation measures are presented in Table 3. SPC1 showed a strong DSD and a 0.74 CVC, in contrast to sPC2, which exhibited a moderate DSD and a 0.46 CVC.
The spatial distribution of sPC1 ( Figure 4E) showed that areas with low values (blue scale) have acidic pH, low levels of Cu, Mn and Zn and high values of OM, Al, S, Ca, B, EC and ECEC.   In these areas, the excessive Ca content makes the absorption of all metallic micronutrients (Mn, Zn) difficult due to a decrease in their solubility (Gómez et al., 2007). SPC2 ( Figure 4F) recorded areas with negative values (blue scale) with slightly acidic pH, low Al content, and high Ca, Mg, ECEC values, contrary to areas with positive values (yellow and red scale) that showed high Al content, acidic pH, and low Ca, Mg and ECEC values.
Homogeneous zones for the Tundama and Sugamuxi valleys were defined through the fuzzy c-means cluster algorithm for the interpolations of the first two spatial principal components. The number of optimal homogeneous zones (HZ) was established according to the registered indexes for the algorithm, where a number of classes of four were selected due to the lowest values found in the first three indexes and the value closest to one of the partition coefficients (PC) ( Figure 5).
From this point, four homogeneous zones (HZs) were established ( Figure 6). Subsequently, the sampling points of the properties were located on the map of these areas, where the HZ1 concentrated 15%, HZ2 39%, HZ3 22% and HZ4 concentrated 25%. Table 4 shows the main group characteristics. The chemical property with the highest discrimination among the HZs was S, which agrees with what was reported by Combatt, Palencia and Marin (2003).
HZ1 was named "Areas with excessive acidity and sulfur levels". This type of area is characterized by having an acid pH (> 4.0), high Al content and excessive sulfur levels that appear neutralized by high Ca and ECEC concentrations. The highest limitations for this area are the high EC values and Na levels due to the effect of groundwater loaded with calcium and sodium sulfates in depressed areas (Castro et al., 2006). These zones are composed of materials rich in sulfides whose aeration  produces a decrease in the pH to levels not tolerable by plants. The extreme acidity has determinant implications for soil toxicity, including an increase in aluminum and iron solubility. For this reason, the nutritional quality of this soil can become severely deficient. Similarly, in the root zone, acidity causes a loss in plant productivity, and acid runoff causes adverse environmental impacts (Rincón;Castro;Gómez, 2008).
HZ2 was established as "Areas with a high capacity for self-neutralization". This zone differed because its average pH was 5.98, that is, slightly acid, its sulfur levels were lower, as well as its contents of all variables, such as Al, Ca, Na and OM, in contrast with higher values in P, Zn, Mn, Cu and Fe.
HZ3 was designated as "Sulfated areas with slightly acidic pH". Its main characteristics are high levels of S and an average pH of 6.11. Moreover, Ca levels are higher compared to those of HZ2, with a consequent decrease in Cu, Zn and Mn.
Finally, HZ4, "Areas with high acidity and sulfur levels" is characterized by having an average pH of 4.47, high sulfur levels, although lower compared to those of HZ1, high Al content and lower EC and ECEC values in relation to those of HZ1.

CONCLUSIONS
The homogeneous zones (HZs), considering the soil index of the two sPCs, were spatially related to the behavior of the chemical properties. The Moran index (MI) and the incorporation of a distance matrix determined the analysis of the spatial variability in the components generated. Because the soils in this area are called acid sulfate soils, the homogeneous zones were strongly influenced by S, as well as by properties associated with constant crop management, such as acidity (pH), Al contents, % OM and ECEC. In the case of the cultivable areas of HZs1 and HZs4, practices such as washing before liming, liming and green coverings are recommended.