REDUNDANT VARIABLES AND THE QUALITY OF MANAGEMENT ZONES

Precision agriculture (PA) allows farmers to identify and address variations in an agriculture field. Management zones (MZs) make PA more feasible and economical. The most important method for defining MZs is a fuzzy C-means algorithm, but selecting the variable for use as the input layer in the fuzzy process is problematic. BAZZI et al. (2013) used Moran’s bivariate spatial autocorrelation statistic to identify variables that are spatially correlated with yield while employing spatial autocorrelation. BAZZI et al. (2013) proposed that all redundant variables be eliminated and that the remaining variables would be considered appropriate on the MZ generation process. Thus, the objective of this work, a study case, was to test the hypothesis that redundant variables can harm the MZ delineation process. BAZZI This work was conducted in a 19.6-ha commercial field, and 15 MZ designs were generated by a fuzzy C-means algorithm and divided into two to five classes. Each design used a different composition of variables, including copper, silt, clay, and altitude. Some combinations of these variables produced superior MZs. None of the variable combinations produced statistically better performance that the MZ generated with no redundant variables. Thus, the other redundant variables can be discredited. The design with all variables did not provide a greater separation and organization of data among MZ classes and was not recommended.


INTRODUCTION
The spatial and temporal variability of soil attributes and weather conditions may affect soybean development (GUASTAFERRO et al., 2010).The spatial and temporal variations that affect production must be elucidated so that these variations can be considered in crop management.Precision agriculture (PA) is an agricultural management system based on spatial variations of soil and crops attributes in production fields.PA aims to optimize profitability, sustainability and environmental protection (MOLIN et al., 2010) by employing localized application of inputs to an agricultural field.Among studies seeking to evaluate the economic viability of PA, those that incorporate management zones (MZs) aim to analyze and provide recommendations on the division of production areas into smaller MZs that are treated differently.Such practices allow farmers to use the same systems used in conventional agriculture during crop management (RODRIGUES JUNIOR et al., 2011).According to ROUDIER et al. (2011), MZs simplify the representation of spatial variability in a cropping field.The management of such small and uniform regions (MZs) is considered by MOLIN & CASTRO (2008) one of the most challenging stages of PA.
Clustering methods are highly recommended for defining MZs (YAN et al., 2007;ILIADIS et al., 2010) and include the use of several attributes such as electrical conductivity, elevation, slope and soil texture, and nitrogen alone and in combination.Although any attribute may be related to crop yield, for DOERGE (2000), the ideal attribute is the correlation of predictable spatial information sources with yield.Clustering techniques for MZ generation include algorithms such as K-Means and Fuzzy C-Means (ILIADIS et al., 2010;VALENTE et al., 2012 andLI et al., 2013), which offer good results (VITHARANAet al., 2008;MORARI et al., 2009;MORAL et al., 2010;RODRIGUES JUNIOR et al., 2011;DAVATGAR et al., 2012;KWEON, 2012;BANSOD & PANDEY, 2013), which permit the automatic division of the studied field.In this approach, different data sources that are related to crop development factors can be used to generate MZs.
The most used unsupervised clustering algorithm is Fuzzy C-means (also known as Fuzzy Kmeans).This algorithm uses an iterative process to recalculate the cluster means and assign data points to clusters.Fuzzy C-means uses a weighting exponent to control the degree to which membership sharing occurs between classes (BEZDEK, 1981).This approach is important because it allows individuals to exhibit partial membership in each of a number of sets, thus enabling the study of continuous variability in natural phenomena (BURROUGH, 1989).Before a data cluster can be formed, an appropriate measure of similarity must be established, which is typically the normalized distance from an observation to the cluster mean in attribute space ( TOU & GONZALEZ, 1974;JOHNSON, 1998).One of the more frequently used measures of similarity is the Euclidean distance, which gives equal weight to all measured variables and is sensitive to correlated variables (BEZDEK, 1981).Geometrically, the Euclidean distance generates clusters with a spherical shape, which rarely occurs in a real soil system (ODEH et al., 1992).
The influence of a variable (attribute) on yield must be measured before it can be selected and considered in the MZ determination process.The criteria to select a variable must consider its required spatial autocorrelation and the spatial correlation of the variable with yield.BAZZI et al. ( 2013) used Moran's bivariate spatial autocorrelation statistic to propose a procedure for selecting variables as input data for the Fuzzy C-Means algorithm.
Thus, this trial aims to test the hypothesis that redundant variables can harm the MZ delineation process.

MATERIAL AND METHODS
This work was conducted in a 19.6-ha commercial field from April 2009 to January 2011.The experimental field was cropped with soybean.The field is located in southern Brazil, western Paraná, municipality of Cascavel (Figure 1).Its geographical coordinates are 24°57'19" S and 53°33'59" W, with an average altitude above sea level of 706 m.
Soil samples were collected at 55 different sites in the field (2.8 points ha -1 , Figure 1) to obtain data on soil resistance to penetration (SRP) and soil chemical properties at a depth of 0 to 0.20 m.SRP was measured with a penetrometer Falker SoloTrack PLG1020.The sampling elements of soil texture were collected from 45 of the 55 sites.Soil sampling was conducted with the aid of an auger at a depth of 0-0.2 m, and for each of the sampling points, eight sub-samples were collected within a radius of 3 m from the point determined by the sampling grid (adapted from WOLLENHAUPT et al., 1994).
The quantitative values obtained were potential hydrogen (pH), phosphorus (P), potassium (K), copper (Cu), zinc (Zn), iron (Fe), manganese (Mn), calcium (Ca), magnesium (Mg), carbon (C), aluminum (Al) and the contents of sand, clay and silt.The soybean yield data were collected from 130 sampling points manually collected and corrected at 12% water content.Each sample point was represented by the total mass collected on two lines in a path of one meter and, because the spacing was 0.45 m, each sample point was represented by an area of approximately 0.9 m 2 .Based on statistical descriptive analysis of the data, measures of central tendency (mean and median) and measures of dispersion (variance, standard deviation and coefficient of variation) were determined.To evaluate the data normality at 5% probability, Anderson-Darling and Kolmogorov-Smirnov tests were performed, and those with normality for at least one of the tests were considered normal.To evaluate the correlation among chemical and physical properties of soil, topography and soybean yield, Moran's bivariate spatial autocorrelation statistic I YZ (BONHAM et al., 1995), shown in [eq.( 1), was applied.The hypothesis of no spatial autocorrelation was tested at the 5% significance level.
where, w ij is an element of matrix W of spatial association, which measures the association among elements at positions i and j (calculated by w ij =(1/1+D ij )) so that D ij is the distance between i and j points; , where Y is the sampling mean of Y variable; Z j is the value of Z variable transformed at point j (j=1,…,n) to obtain a zero mean from the formula , where Z is the sample mean of Z variable; S is the sum of all degrees of spatial association, obtained from elements w ij for i ≠ j; 2 Y m is the sample variance of variable Y i , and 2 Z m is the sample variance of variable Z j .
The spatial correlation matrix was generated for the analyzed variables; in the matrix, the autocorrelation values of each variable are the elements on the main diagonal, whereas the offdiagonal elements are the cross-correlations among variables.The coefficients were tested at the 1% and 5% significance levels.BAZZI et al. (2013) proposed the following procedure for selecting variables as input data for the Fuzzy C-Means algorithm.1) Variables with spatial dependence and no significance at 95% significance are eliminated.2) From variables with spatial dependence, those with no correlation with soybean yield are removed.3) The descending order is calculated considering the degree of correlation with yield.4) The redundant variables are eliminated (those that correlate with each other), giving preference to those variables with lower correlation with yield.
In this work, the redundant variables were not discarded, but different combinations of the variables were built, and each combination was defined as a MZ design.The combinations were constructed to assess whether redundant variables can harm the MZ delineation process.
According to the sample data for the selected variables, values were predicted for nonsampled regions in the studied field by the inverse of square distance.The data for the selected variables were normalized (Equation 2, MIELKE & BERRY, 2007) to ensure that no variable had greater weight in the process of dividing the field into MZs.

Amplitude Median
where, P i is the pixel to be normalized and P n is the normalized pixel.
For each design, the field was divided into MZs according to an unsupervised classification fuzzy C-means algorithm (CANNON et al., 1986) using the software FuzMe (MINASNY & MCBRATNEY, 2002).The software parameters included a minimum of two and a maximum of five MZ classes, Euclidean distances, a fuzzy exponent of 1.30, a maximum number of 300 interactions, and stopping criterion or error ɛ = 0.0001.The Euclidean distance was adopted because the data were already normalized.The others constants were used as suggested.
The yields of the MZ designs were evaluated to determine the best way to divide the field.The following rating methods were used: 1) Variance Reduction: (PING; DOBERMANN, 2003;XIANG et al., 2007), (Equation 3) for yield was carried out, with expectations that the sum of the variances of the data of the MZs is lower than the total variance.
where, c is the number of management zones; i W is the proportion of area each management zone; i ZM V is the variance of data each management zone, and is the sample variance of data to field.
2) Analysis of variance: used to verify that there are significant mean differences in yield among MZ classes in each proposed design (XIANG et al., 2007;MOLIN & CASTRO, 2008;XIN-ZHONG et al., 2009).The HSD (Honest Significant Difference) Tukey test was applied at the 5% level of significance.In each internal MZ class, the yield was considered an independent variable and normally distributed.
3  2008;FU et al., 2010;GUASTAFERRO et al., 2010;ARNO et al., 2011).The cluster class (MZ) with the greatest differentiation was the one in which these two indices reached approximately the minimum in each design.
where, c is the number of clusters; n is the number of observations, and ij  is element ij of fuzzy matrix U.
4) The Kappa index has been used to measure the accuracy of thematic classifications (BAZZI et al., 2008;BASTIANI et al., 2012;DALPOSSO et al., 2012) and is recommended as an appropriate measure of accuracy for all elements from the error matrix.It was used to evaluate the spatial agreement among maps of more efficient MZs (Equation 6).r is the number of classes in the error matrix; x ii is the number of combinations of row i and column i; x i+ is the total observations of row i; x +i is the total observations of column i.
The classification degree for Kappa index was defined according to LANDIS & KOCH (1977) as follows: very strong, values between 0.81 to 1.00; strong, values from 0.61 to 0.80; moderate, values between 0.41 to 0.60; weak, values between 0.21 to 0.40; and no agreement, values between 0.00 and 0.20.

RESULTS AND DISCUSSION
The contents of C, pH, clay content and altitude (Table 1) were highly homogeneous (PIMENTEL GOMES & GARCIA, 2002), whereas soybean yield and the content attributes of Fe, Mn, Ca, silt, and soil resistance to penetration into layers at depths of 0 -0.20 m (SRP_0_20) and 0.10 -0.20 m (SRP_10_20) exhibited average homogeneity.The sample data for Mg content exhibited low homogeneity.Finally, Cu, Zn, P, Al, K, sand and soil resistance to penetration in the 0 -0.10 m layer (SRP_0_10) exhibited heterogeneity.
The soil texture (Table 1) was at least 60% clay, indicating a loamy soil.The field also exhibited significant compaction in the 0.10 -0.20 m layer and greater resistance to penetration, as indicated by an average value of 1.94 MPa and a maximum value of 2.40 MPa.The lowest soil compaction in the field was 0.31 MPa in the layer from 0 to 0.10 m.Most of the studied variables exhibited a normal distribution at 5% significance, including yield, Cu, Fe, Mn, C, pH, Ca, Mg, Al, K, SRP_0_10, SRP_0_20, SRP_10_20, altitude and silt (Table 1).The exceptions were the contents of Zn, P, clay and sand.Previous studies have also observed non-normality for clay (CORÁ et al., 2014), silt (ZUCOLOTO et al., 2011), P and Zn (LEÃO et al., 2010).No transformation was necessary for the variables that were not normal because these variables were not correlated with yield.
The average yield obtained in the study area was 3.29 t ha -1 .The yield map (Figure 2) was generated by ordinary kriging interpolation, based on the best semivariogram, obtained with the Gaussian model (nugget effect in 0215 and reach 397 m).For a soil with the same characteristics as the soil studied here, BAZZI et al. ( 2013) also observed autocorrelation among these attributes (Cu, clay and silt), but only clay and Cu were correlated spatially with soybean yield.
Based on the variables that exhibited both spatial autocorrelation spatial correlation with yield (copper, silt, altitude and clay, in order of decreasing importance), fifteen MZ designs (all possible combinations) were produced (Table 2).This table shows the calculated Moran's statistic.Because the values are not normalized, even small values of this statistic can be statistically significant (nonzero).The important outcome is whether the statistic is significant at 0.01 or 0.05.The generated maps represent the division of the field in two, three, four and five classes (Figures 4-7) after the design of the division into MZs with FuzME.Thus, the maps present the MZs based on the previous interpolated variables, and thus no new interpolation was necessary to generate the maps.Good correlation can be seen between the yield map (Figure 2) and the MZs.Each MZ class (Figures 4-7) represents a group of data based on the variables used in each design.Because the studied variables exhibited a significant spatial correlation with soybean yield, the formed sub-regions (classes) tend to represent different potentials of soybean yield within the total field.The use of different combinations of the selected variables to generate MZs influences how the field is divided.
The soybean yield data were analyzed using Tukey's test to confirm that the MZs classes had different production potentials.The MZs were divided into two classes with significantly different averages for some designs (Table 3).The MZs that belonged to different classes were 01-CuSiAlCl, 02-CuSiAl, 03-CuSi, 04-Cu, 05-CuSiCl, 06-CuAlCl, 07-SiAlCl, 08-CuAl, 09-CuCl and 12-ClAl.ARNO et al. (2011) also determined by ANOVA that differences in yield were evident only when the field was subdivided into two MZs.TABLE 3. Soybean yield averages in the MZ classes and comparison using Tukey's test at the 5% level of significance.As shown in Figure 8, none of the MZs exhibited RE values less than one.The following MZs had relative efficiencies of 1: 11-SiCl, which contains two classes; 13-Si, which contains 2, 3 and 4 classes; and 15-Cl, which contains two classes.Thus, these forms of division into MZs will not reduce the variance of the yield, in agreement with Tukey's test (Table 3), which revealed no significant differences among the yields.The other MZs with RE > 1 exhibited a variance reduction, suggesting that these divisions can be used as a source of recommendation and analysis.The division of the 09-CuCl MZs (Figure 9) into 5 classes resulted in the highest RE (1.16) among all the MZs.For division into four MZs, 04-Cu and 09-CuCl designs reduced the yield variance most effectively (RE = 1.13).Combinations 01-CuSiAlCl and 06-CuAlCl exhibited higher RE (1.12) compared to the other divisions in three classes.By contrast, for divisions into two classes, the 04-Cu design had the highest RE (1.10).In general, the RE values were highest in divisions with higher numbers of classes, although for 3, 4 and 5 classes, not all means differed from each other within each design (Table 3).
According to the calculated values of FPI and MPE (Figure 9), the designs in which fewer variables were used exhibited better separation and organization of data among the MZ classes, as indicated by the low values for these indices.Thus, designs such as 04-Cu, 13-Si, 14-Al and 15-Cl, in which only one variable was used to divide MZs, exhibited the lowest levels of FPI and MPE.However, only the 04-Cu design had a high RE, whereas the 13-Si, 14-Al and 15-Cl designs had, in general, the lowest RE values.
When the variable Cu was not included in the evaluation, the MZs were less effective (low RE, Figure 9) and did not all exhibit differences in means in each design (Tukey test), such as the following cases: 10-SiAl 11-SiCl, 12-ClAl, 13-Si, 14-Al and 15-Cl.An exception was the 07-SiAlCl design (Table 3).Six designs of MZs were compared using the Kappa index (Table 4).These MZs used the variable Cu and in general had higher RE values in divisions of 2 to 5 classes (01-CuSiAlCl, 02-CuSiAl, 03-CuSi, 04-Cu, 06-CuAlCl and 09-CuCl).The objective was to evaluate the degree of visual agreement among the maps of 2, 3, 4 and 5 classes.
For the MZs that were divided into two classes, there was greater visual agreement among the compared maps.Eight comparisons were in very strong agreement, whereas seven comparisons yielded strong agreement.There were lower levels of agreement among the visual maps for MZs divided into 3, 4 and 5 classes.However, the difference was greatest for MZs divided into 5 classes; two comparisons exhibited strong agreement, seven moderate and six weak.The 04-Cu experimental design displayed good results for RE, FPI and MPE, very strong agreement with the 03-CuSi and 09-CuCl designs and strong agreement with the 01-CuSiAlCl, 02-CuSiAl and 06-CuAlCl experimental designs for the division of MZs into 2 classes.For MZs divided into three classes, the Cu-04 design exhibited very strong agreement with the 03-CuSi design.For MZs divided into 4 and 5 classes, the 04-Cu design exhibited strong agreement with the 03-CuSi design.

CONCLUSIONS
Some combinations of these variables that were cross-correlated produced better MZs.None of the variable combinations produced a statistically better performance than the MZ generated using only copper, i.e., the design with only no redundant variables.Thus, the other redundant variables can be discredited.The design with all variables did not provide a greater separation and organization of data among MZ classes.

FIGURE 1 .
FIGURE 1. Experimental field and its sampling grid.All red and blue elements represent the 130-point grid ; red-only elements represent the 55-point grid.Municipality of Cascavel, Paraná, Brazil.

FIGURE 4 .
FIGURE 4. Designs of MZs divided into two classes.

FIGURE 7 .
FIGURE 7. Designs of MZs divided into five classes.

TABLE 1 .
Descriptive statistics of the data.

TABLE 2 .
Combination of variables to generate management zones.