MANAGEMENT ZONES USING FUZZY CLUSTERING BASED ON SPATIAL-TEMPORAL VARIABILITY OF SOIL AND CORN YIELD

Clustering soil and crop data can be used as a basis for the definition of management zones because the data are grouped into clusters based on the similar interaction of these variables. Therefore, the objective of this study was to identify management zones using fuzzy c-means clustering analysis based on the spatial and temporal variability of soil attributes and corn yield. The study site (18 by 250-m in size) was located in Jaboticabal, São Paulo/Brazil. Corn yield was measured in one hundred 4.5 by 10-m cells along four parallel transects (25 observations per transect) over five growing seasons between 2001 and 2010. Soil chemical and physical attributes were measured. SAS procedure MIXED was used to identify which variable(s) most influenced the spatial variability of corn yield over the five study years. Basis saturation (BS) was the variable that better related to corn yield, thus, semivariograms models were fitted for BS and corn yield and then, data values were krigged. Management Zone Analyst software was used to carry out the fuzzy cmeans clustering algorithm. The optimum number of management zones can change over time, as well as the degree of agreement between the BS and corn yield management zone maps. Thus, it is very important take into account the temporal variability of crop yield and soil attributes to delineate management zones accurately.


INTRODUCTION
A management zone in a crop field expresses a sub-region that has a relatively homogeneous combination of yield-limiting factors (VRINDTS et al., 2005;LI et al., 2007).Within-field management zones have many uses, for example, as an alternative for grid soil sampling and nutrient maps of fertilizer application variable rate or even to relate yield to soil parameters for crop-modeling evaluation (LI et al., 2007).
Several types of information have been used to delineate management zones, such as: terrain attributes, electrical conductivity, soil resistance to penetration, chemical soil attributes, crop indices like NDVI (Normalized Difference Vegetation Index ), and crop yield maps (FRAISSE et al., 2001;JIANG et al., 2011;ROSALEN et al., 2011;DELALIBERA et al., 2012).Among these types of information, some studies in Brazil (MOLIN, 2002;AMADO et al., 2007) have defined management zones based on yield maps from data collected over time.This is due to yield mapping being a simple, inexpensive tool for monitoring crop yield at fine spatial resolutions ; in addition, it provides the best information for time and spatial variability.Some studies (GUEDES FILHO et al., 2010;DIACONO et al., 2012) have correlated temporal crop yield maps with soil attributes maps to define management zones correctly.These studies have considered only the temporal variability of crop yield.However, the dominance of factors that influence crop yield variability can change from year to year because of seasonal weather variation.Therefore, temporal variability or stability of both crop yield and soil attributes should be taken into account when delineating management zones (LI et al., 2007).
One important question arises when considering field management by zones, which is, how many unique zones should a field be divided.This is because if the user defines a different number of classes, the results of the classification will be different and consequently, different management zones will be produced (FRAISSE et al., 2001).In order to overcome this problem, cluster analysis procedure, which groups similar individuals into distinct classes through an iterative process called clustering, can be used (FRIDGEN et al., 2004) and the number of management zones can be chosen based on the fuzziness performance index (ODEH et al., 1992) and/or normalized classification entropy (BEZDEK, 1981).
Of the different cluster analyses, fuzzy c-means algorithm has been widely used to delineate management zones (FRAISSE et al., 2001;LI et al., 2007;JIANG et al., 2011, VALENTE et al., 2012).In addition, software developed by FRIDGEN et al. (2004) on the basis of a fuzzy c-means clustering algorithm called Management Zone Analyst (MZA) can be used easily by researchers and farmers, who need only to input data into the software to obtain rapid results (JIANG et al., 2011).MZA uses the clustering of soil and crop data as a basis for the definition of management zones by grouping the data into clusters based on the similarity of interaction between soil and crop data (FRAISSE et al., 2001).
Therefore, the objectives of this study were 1) to identify management zones using fuzzy cmeans clustering analysis based on the spatial and temporal variability of soil attributes and corn yield, and 2) to test the reliability of using MZA software to delineate management zones.

Site description
The experiment was conducted at Jaboticabal, São Paulo State, Brazil (21º 14' 05'' S, 48º 17' 09''W, 613 m asl).Climatologically, the area belongs to the tropical/megathermal zone or Köppen Aw (a tropical climate with dry winters and average temperatures of the coldest month above 18 ºC).The mean annual rainfall  is 1417 mm, peaking in the period from October-March and a relatively dry period from April-September.The soil of the experimental area is a clayey Distroferric Red Oxisol.
The experimental area was managed in a corn-fallow rotation under no-tillage for 12 years.Before corn seeding, weeds were eliminated with non-selective herbicides.Liming was carried out before starting the experiment in order to obtain 60% soil base saturation.

Corn yield, soil sampling, and climatic data
The size of the experimental area was 18 by 250m with the longer dimension oriented in the East-West direction.Each of the 100 sampling cells had a dimension of 10 by 4.5 m and were arranged in a 25 by 4 m grid.Therefore, each sampling cell encompassed an area of 45 m 2 and consisted of five 10 m long crop rows (0.9 m), in which the geo-referenced points were the center.The experimental scheme is depicted in Figure 1.Corn (Zea mays -triple-hybrid Syngenta Master) was planted at 65,000 plants ha -1 with 0.9 m row spacing in early December between the 2001 and 2010 growing seasons, although the data were collected only in the 2001/2002, 2002/2003, 2007/2008, 2008/2009 and 2009/2010 growing seasons.The starter fertilization consisted of 30kg of N, 70 kg of P 2 O 5 and 50kg of K 2 O ha -1 .Nitrogen fertilizer (urea) was applied at 100 kg N ha -1 when plants had four to six leaves totally developed.Corn was harvested about 150 days after planting with a 1-row plot combine that deposited the grain into a burlap bag.Grain weights were obtained for each cell with a manual weighing scale in the field.The grain for each cell was sub-sampled for moisture and grain yields were determined at 13% gravimetric moisture.
Each year, five soil sub-samples were collected within each plot using a Dutch auger (0.1m depth) and were composited.One of the soil sub-samples was collected in the middle of the plot and the other four samples were collected 2 m apart from the middle in all four cardinal directions from the centroid.The support of soil measurements was assumed similar to the one of crop yield (45 m 2 ).Each soil composite sample was analyzed for particle size (pipette method) (GEE & DANI, 2002), pH (1:1 soil/water mixture), organic matter (OM) (Walkley-Black method), P (ion-exchange resin), K + , Ca 2+ and Mg 2+ (1 M NH 4 O Ac. extractable at buffered at pH 7) according to PAGE et al. (1982).From the analytical determinations, cation exchange capacity (CEC = K + + Ca 2+ + Mg 2+ + H + Al +3 ) and percentage of soil base saturation (BS = (K + + Ca 2+ + Mg 2+ / CEC) x 100) were calculated.Soil texture was analyzed once, since there are no significant changes in a short period.
Monthly cumulative rainfall, growing degree-days (base temperature of 10°C), average temperatures, relative humidity and number of days with rainfall were recorded by a weather station located 30 m from the experimental site from December 2001 to April 2010 (Figure 2a and b).

Descriptive statistics analyses and mixed model analysis
Descriptive statistical analyses (mean, standard deviation, minimum, maximum, coefficient o f skewness, and coefficient of kurtosis) were calculated.In order to test the hypothesis of normality, the SHAPIRO and WILK (1965) test was conducted.
The heteroscedastic spatial-temporal autocorrelation model was used to identify whic h variable(s) most influenced the spatial variability of corn yield over the five study years.This procedure was chosen because it takes into account the spatial and temporal variability of crop yields and soil attributes, as well as the heteroscedasticity of variables and because it shows better results in determining the yield-limiting factors than traditional regression analyses, using ordinary least square.A detailed description of this analysis for the present study area was given in RODRIGUES et al. (2013).

Data processing
Semivariogram models were fitted to soil attributes (chosen with mixed model) and corn yield data for all study years.The semivariograms were validated with cross-validation analysis, and isotropy in all adjusted models observed.The soil attributes and corn yield data were interpolated using ordinary kriging in a raster data format of 1 m pixels.All these analyses were computed using ArcGIS (Redlands, CA) with the Geostatistical Analyst Extension and afterwards data files were imported as a comma delimited text file with the first row as labels for the columns to MZA software.

Data analyses using MZA software
Fuzzy clustering algorithms were used to classify data into homogenous zones by MZA software (FRIDGEN et al., 2004).The options settings chosen were similar to those of Euclidean; fuzziness exponent = 1.3, maximum number of iterations = 300, convergence criterion = 0.0001, minimum number of zones = 2, and maximum number of zones = 6.
A number of validity statistics were used to determine the best combinations of fuzziness index and number of clusters, as well as the overall clustering performance.The fuzziness performance index (FPI) is a measure of the degree to which different classes share membership (fuzziness) and values are constrained between 0 and 1 (ODEH et al., 1992).The normalized classification entropy (NCE) is used when deciding how many clusters are most appropriate for creating management zones, which can help the user obtain management zones simply and quickly (BEZDEK, 1981).By minimizing the value for NCE and FPI, the optimum number of clusters can be found.

Analytical method for comparing the soil manage ment zone maps with the corn yield manage ment zone maps
In order to compare the relationship between soil attributes maps and corn yield in ever y study year, Kappa index coefficient was used (JENSEN, 1996).Kappa coefficient of agreement, a common index used for accuracy assessment in remote sensing, measures pair-wise agreement between the margin pixel in a cross-classification contingency table, then corrects for chance agreement (KITCHEN et al., 2005).Thus, cross-tabulation between the soil management zone map and its correspondent corn yield management zone map, with the same number of classes, was performed in order to obtain a confusing matrix of data (VALENTE et al., 2012).After that, the kappa coefficient was obtained as described by JENSEN (1996).

Descriptive statistics of soil attributes
All soil fertility data was classified as low, medium, or high according to criteria determined by RAIJ et al. (1997) and 2009/2010.The levels of soil K + (1.6-3.0 mmolc dm -3 ) as well as the soil P levels were medium (16-40 mg dm -3 ) in all the study years.The levels of soil Ca 2+ were high (greater than 7 mmolc dm -3 ) in all the study years, whereas soil Mg 2+ were high (greater than 8 mmolc dm -3 ) in the 2001/2002, and 2002/2003, and medium (5-8 mmolc dm -3 ) in the 2007/2008, 2008/2009, and 2009/2010.Medium values of pH (5.1-5.5) were observed in the 2001/2002 and 2002/2003 and low (4.4-5.0) in the 2007/2008, 2008/2009, and 2009/2010.Soil base saturation values were greater than 50% in the first two years, indicating eutrophic conditions.Medium soil organic matter contents (OM) were observed in all the years.Most variables showed large variability based on their range of variation (Table 1).In general, the so il fertility variables were normally distributed and coefficients of asymmetry and kurtosis varied over years.However, this did not prevent geostatistics analyses, since normality is not a requirement.The clay (332 g kg - 1 , standard deviation = 19 g kg -1 ) and sand (623 g kg -1 , standard deviation = 17 g kg -1 ) content of the surface samples did not vary substantially throughout the experimental area and were considered normally distributed according to the Shapiro-Wilk test.

Descriptive statistics of crop yield and mixed model analysis
Corn yields ranged from 6.9 to 7.8 Mg ha -1 (Table 2), and showed normal data distribution in all the years, except in the 2008/2009 growing season, as indicated by the Shapiro-Wilk test (Table 2).However, their coefficients of skewness and kurtosis were close to zero, indicating no substantial departure from normality so the yield was then assumed normal.The preliminary statistic analyses with mixed models indicated that pH was the variable that better related to corn yield spatially at all study years.High correlation between base saturation (BS) and soil pH is often observed in tropical soils (CATANI & GALLO, 1955) and is also found in our study with Pearson correlation coefficient (r) greater than 0.97 (data not shown).Therefore, we chose BS instead of soil pH because it is a more practical attribute for agronomic management purposes.A complete

Geostatistics analyses of soil attributes and corn yield
Spatial dependence was observed for corn yield for all study years (Table 3).The spherica l model was used for corn yield in all five years (Table 3).Also spatial dependence was observed for BS in all study years and the Gaussian model was used for these variables (Table 3).In order to match with the corn yield maps, five years of BS data were used for clustering analyses (Figure 3).

Fuzzy c-means clustering analyses
The minimum least membership sharing (FPI) and amount of organiza tion (NCE) were similar in that the minimum value was obtained in the same zone for BS and corn yield at all years, except for BS in the 2009/2010 growing season and for corn yield in the 2007/2008 growing season, which were dissimilar (Figure 4).The optimal number of clusters for each comp uted index is when the index is at the minimum, representing the FPI and NCE because of the clustering process.The optimum number of clustering zones, considering both indices, varied from 2 to 4. These results indicate that the optimum number of manageme nt zones can change over time, thus the choice of how many management zones should be based in the temporal data set.FRAISSE et al. (2001) also pointed out that the optimal number of zones may vary from year to year and is mainly a function of the weather and the crop planted.Although in general, the FPI and NCE index minimum values indicated two management zones, it was observed, based on zone maps and prior knowledge in this area, that only two management zones would not be sufficient to verify all corn yield variab ility.Therefore, three management zones were chosen instead of two.Moreover, it appears that there were little difference in the index values between the number of two and three zones, but a much greater difference in the index values between the number of three zones and other zones (Figure 4), so the choice of three management zones can be scientifically accepted.year, the kappa coefficient showed no agreement between the BS and corn yield management zone maps.This is likely to be because it was the first year after liming and therefore soil base saturation was not a yield-limiting factor and it did not determine yield spatial variability that year.The kappa coefficient was classified as having poor agreement in 2008/2009, and fair agreement in 2007/2008and 2009/2010, and good agreement in 2002/2003. KITCHEN et al. (2005)), who found that kappa coefficient values varied from 0.07 to 0.34 between some soil attributes (e.g., Electrical conductivity) and yield crop management zone maps in two crop fields in the USA, obtained similar results.The kappa coefficient between BS and corn yield management maps varied from -0.03 to 0.66 (Table 4).These findings demonstrate that the degree of agreement between the BS and corn yield management zone maps can change over time, which is in agreement with LI et al. (2007), who pointed out that the defining of management zones should rely on spatial information that is stable or predictable over time and is related to crop yield.Therefore, a temporal data set of crop yield and yield-limiting factors should be used to delineate management zones.
Based on the kappa coefficient, BS can be used to delineate management zones in the present study area, since the values were classified from having fair to good agreement in three out of the five study years (0.31; 0.4 and 0.66) (Table 4).This is expected because the main factor that impacts crop yields in most Brazilian agricultural fields is soil acidity (DALCHIAVON et al., 2011;NOGARA NETO et al., 2011).
The results of the kappa coefficient (Table 4) did not show any direct correlation to the climatic variables assessed.However, this dissimilarity between BS and corn yield zone maps was most likely due to temporal variability and many other variables such as germination rates, weeds, insects, diseases and climate, which are related to crop yield (RODRIGUES et al., 2012), but which were not assessed in this study.
For farmers to adopt site-specific management, the development of management zones must be simple, functional, and economically feasible (LI et al., 2007), thus, we proposed to divide the study field into three zones based on the zone maps obtained from the clustering analyses using MZA software for the five study years -depicted in Figure 5.
Zone 1 is the location in the study area with higher BS and corn yield values; zone 2 is the location with medium BS and corn yield values; and zone 3 is the location with lower BS and corn yield values.This zone delineation allows the use of site-specific management in these areas such as an alternative to grid soil sampling (LI et al., 2007) and developing lime requirement recommendation maps for variable rate fertilizer application.When compared to the size of commercial fields, the area in the present study may be considered small, however the fuzzy cmeans clustering analysis using a temporal data set showed potential to delineate management zones, which may be spread out over larger crop fields.In this study crop yield was related only to soil acidity (i.e., BS), however the fuzzy c-means technique can cluster many yield-limiting factors (i.e., soil attributes) at the same time as shown by FRIDGEN et al. (2004) andTAGARAKIS et al. (2013).Management zone analytics (MZA) software proved to be an easy and feasible tool to delineate management zones based on fuzzy c-means clustering as found by FRIDGEN et al. (2004), LI et al. (2007), and JIANG et al. (2011).

CONCLUSIONS
It was possible to delineate using fuzzy c-means cluster analysis based on the spatial and temporal variability of soil attributes and corn yield in a crop field in Brazil.Soil acidity, as assessed by soil base saturation, was the soil effect that most influenced corn yield over time in this study.It is very important to take into account the temporal variability of crop yield and soil attributes to delineate management zones accurately as these factors can change over time.Management zone analytics was a reliable and simple tool to delineate management zones and would be useful for agronomists and farmers.

FIGURE 1 .
FIGURE 1. Experimental field divided into 100 cells for corn yield and represented by their central points for sampling soil attributes and corn yield in an Distroferric Red Oxisol under no-tillage system.
FIGURE 2. Average monthly rainfall values for the period December-April of the studied years and 30 years average (a); Days with rainfall for the period December-April of the studied years (b).

FIGURE 3 .
FIGURE 3. Management zone maps using fuzzy c-means clustering analyses of base saturation (BS) and corn yield of five growing seasons in an Distroferric Red Oxisol under notillage system.

FIGURE 4 .
FIGURE 4. Fuzziness performance index (FPI) and normalized classification entropy (NCE) as calculated by Management Zone Analyst for base saturation and corn yield of five growing in an Distroferric Red Oxisol under no-tillage system.

FIGURE 5 .
FIGURE 5. Management zones based on the fuzzy c-means clustering analyses of soil base saturation and corn yield of five growing seasons in an Distroferric Red Oxisol under no-tillage system.*Zone 1: h igh yield potential; Zone 2: mediu m y ield potential; Zone 3: lo w yield potential.

TABLE 1 .
Descriptive statistics of soil chemical attributes from 0 to 0.1 m depth in an Distroferric Red Oxisol under no-tillage system over five growing seasons.

TABLE 2 .
Descriptive statistics of corn yield (Mg ha -1 ) of five growing season in an Distroferric Red Oxisol under no-tillage system.

TABLE 3 .
Variogram model parameters of corn yield and base saturation of five growing seasons in an Distroferric Red Oxisol under no-tillage system.

TABLE 4 .
Degree of agreement (kappa coefficient)between the base saturation (BS) and corn yield management zone maps of five growing seasons in an Distroferric Red Oxisol under notillage system.