MANAGEMENT ZONES DEFINITION USING SOIL CHEMICAL AND PHYSICAL ATTRIBUTES IN A SOYBEAN AREA

Several types of equipment and methodologies have been developed to make precision agriculture viable, especially considering the high cost of its implementation and sampling. An interesting possibility is to define management zones, which divide producing areas into smaller sub-regions that can be treated differently and serve as a source of recommendation and analysis. Thus, this trial used soil physical and chemical properties and yield to generate management zones and to identify whether they can be used for recommendation and analysis. Management zones were generated by the Fuzzy C-Means algorithm and evaluated by calculating the reduction of variance and performing means tests. The division of the area into two management zones was considered appropriate because it presented distinct averages for most soil properties and yield. The methodology allowed the generation of management zones that can serve as a source of recommendation and soil analysis; although the relative efficiency showed reduced variance for all attributes in the division into three sub-regions, ANOVA did not show significant differences among those management zones.


INTRODUCTION
The continued growth of precision agriculture (PA) technology has promoted the emergence of machines equipped with sensors and equipment that aim at reducing costs and improving the performance of production processes, as well as allowing more detailed analysis of both soil and plants (chlorophyll meters, penetrometers, electrical conductivity meters, yield monitors and vegetation index meters, among others).
According to KHOSLA et al. (2008), who conducted research on PA viability, producers still have doubts that ultimately hinder its wider use. Among the questions raised by the researchers are: 1) Is it possible to determine the spatial variability of attributes in a less costly way? 2) Is PA economically feasible? These questions are the focus of much research worldwide, which aims not only at answering them but also at developing techniques and procedures that allow them to be answered affirmatively. Thus, new lines of research on how to apply this technology have emerged to make PA practical and economical, studying lower sampling costs associated with sample size and the definition of management zones (TAYLOR et al., 2007; ROUDIER et al., 2008; RODRIGUES JR. et al., 2011; SUSZEK et al., 2012).
Research on management zones aims at dividing producing areas into smaller zones that should be treated differently, serving as a source of recommendation and analysis. Yield data, soil physical and chemical data, electrical conductivity, topography and combinations of them are routinely used to define these sub-regions, in addition to statistical modeling of such attributes (RODRIGUES JR. et al., 2011).
According to PEDROSO et al. (2010), several techniques to define management zones have been proposed in the literature, and XIANG et al. (2007) divide them into two approaches: 1) empirical methods, which use the frequency distribution of yield to divide the plot, usually into three or four management zones (SUSZEK et al., 2012); 2) cluster analysis methods such as K-means and Fuzzy C-means.
Empirical techniques (normalized and standardized yield) are quite feasible and give satisfactory results (SUSZEK et al., 2012). They use crop yield data to define management zones and assume that yield corresponds to the crop response, i.e., they do not aim at identifying the attributes that affect yield, which is a later stage of the process. They also consider that the techniques become more accurate the larger the number of crops used.
Clustering methods are highly recommended for defining management zones (YAN et al., 2007; ILIADIS et al., 2010) and allow the use of several attributes, such as electrical conductivity, slope, soil texture and nitrogen, among others, separately or simultaneously. Although any attribute may be related to crop yield, for DOERGE (2000) the ideal is to use predictable sources of spatial information that are correlated with yield.

MATERIAL AND METHODS
The study was carried out in a 19.8 ha commercial farming area in Serranópolis do Iguaçu/PR, at the geographical coordinates 25º26'49''S and 54º04'59''W and 280 m average altitude (Figure 1). The area boundary and the locations of the sampling points were obtained with a Trimble Geo Explorer XT 2005 GPS receiver and PathFinder software. Fifty-eight sampling points were defined (2.93 points ha⁻¹, roughly equivalent to the 2.2 points ha⁻¹ suggested by FRANZEN & PECK, 1995) using an irregular grid (Figure 1), with a minimum distance of 30 m and a maximum distance of 497 m between points. At these points, data were collected on soybean yield, soil chemical properties (C, pH, H + Al, Ca, Mg, K, Cu, Zn, Fe, Mn) and textural properties (silt, sand, clay), in addition to water content and soil penetration resistance in the layers 0 to 0.1 m (SPR010), 0.1 to 0.2 m (SPR1020) and 0.2 to 0.3 m (SPR2030), and average soil penetration resistance from 0 to 0.3 m (SPRAVG030). The irregular grid was created considering the planting rows.
Each sample point was represented by the total mass collected on two rows along a one-meter path and, as the row spacing was 0.45 m, each sample point represented an area of approximately 0.9 m² (TAVARES-SILVA et al., 2012). The harvested product was threshed and its water content was determined; yield values were then corrected to 13% water content. Soil samples were collected with a hoe and shovel, packed in bags and sent for chemical analysis.
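The text does not state the formula used to correct yield to 13% water content. The sketch below shows one common way to do it, assuming water content is expressed on a wet basis; the function name and arguments are hypothetical and are not taken from the original study.

```python
def correct_yield_to_standard_moisture(yield_wet, moisture_pct, standard_pct=13.0):
    """Rescale a yield measured at moisture_pct (wet basis, %) to standard_pct.

    Assumption: the dry matter is conserved and only the water fraction changes.
    """
    if not 0.0 <= moisture_pct < 100.0:
        raise ValueError("moisture_pct must be in [0, 100)")
    # Dry matter contained in the harvested mass.
    dry_matter = yield_wet * (100.0 - moisture_pct) / 100.0
    # Mass that the same dry matter would have at the standard water content.
    return dry_matter * 100.0 / (100.0 - standard_pct)
```

A sample harvested above 13% water content is scaled down and a drier sample is scaled up, so that all sampling points are compared on the same basis.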
Four measurements of soil penetration resistance were performed in the 0 to 0.30 m layer at each sampling point with a Falker PGL 1020 electronic soil penetration resistance meter.
Data were statistically analyzed by means of exploratory analysis, calculating the mean, median, minimum, maximum, standard deviation and coefficient of variation (CV). The CV was considered low (homoscedasticity) when CV ≤ 10%, medium when 10% < CV ≤ 20%, high when 20% < CV ≤ 30% and very high (heteroscedasticity) when CV > 30% (PIMENTEL & GARCIA, 2002). Data normality was evaluated at 5% probability with the Anderson-Darling and Kolmogorov-Smirnov tests, and attributes that were normal according to at least one of the tests were considered normal.
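The CV classes above can be written as a small helper. This is only an illustrative sketch: the function names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def classify_cv(cv_percent):
    """Classes of PIMENTEL & GARCIA (2002) as quoted in the text."""
    if cv_percent <= 10.0:
        return "low"
    if cv_percent <= 20.0:
        return "medium"
    if cv_percent <= 30.0:
        return "high"
    return "very high"
```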
Usually, a management unit is defined to serve as a source of recommendation and analysis for several years (DOERGE, 2000). Therefore, stable and predictable sources of spatial information that are also correlated with yield should be used. Non-stable sources of information can also be used to define management zones and adjust the amount of nutrients in a given year. Thus, two approaches were used to generate management units: 1) statistical, in which all the studied attributes (not all of which are considered stable) are used; and 2) agronomic, taking into account only the soil penetration resistance and texture attributes, which are considered stable.
Source of Figure 1: Google Earth software.
To evaluate the spatial correlation between the attributes and yield, the cross-correlation statistic was used (BONHAM et al., 1995; Equation 1). It makes it possible to verify which attributes influenced soybean yield positively or negatively, and whether a sample is spatially autocorrelated.
$$I_{YZ} = \frac{\sum_{i}\sum_{j} W_{ij}\, Y_i\, Z_j}{W \sqrt{m_Y^2\, m_Z^2}} \qquad (1)$$

where $I_{YZ}$ is the degree of spatial association between the Y and Z variables, ranging from -1 to 1, with positive correlation when $I_{YZ} > 0$ and negative correlation when $I_{YZ} < 0$; $W_{ij}$ is the ij element of the spatial association matrix, calculated from $D_{ij}$, the distance between points i and j; $Y_i$ is the transformed Y value at point i, where the transformation gives the variable a zero average using $\bar{Y}$, the sample average of Y; $Z_j$ is the transformed Z value at point j, where the transformation gives the variable a zero average using $\bar{Z}$, the sample average of Z; $W$ is the sum of the spatial association degrees in the matrix $W_{ij}$ for $i \neq j$; and $m_Y^2$ and $m_Z^2$ are the sample variances of the Y and Z variables.
After calculating $I_{YZ}$, the spatial correlation matrix was generated. In addition to the calculated index, it presents the significance of the test, obtained by data permutation: several correlation tests are performed by permuting the values of one of the variables being compared. The locations of the points are kept, and the value at each position is randomly replaced by the value from another point. This technique has a high computational cost. Under the hypothesis $H_0$, the random variables $Y_i$ are independent and identically distributed and thus all permutations of the $Y_i$ values among areas are equally likely. Therefore, the p-value of the test is given by Equation 2, where n is the number of permutations (999 by default) and I is the index calculated by Equation 1.
The null hypothesis is rejected at significance level α if p-value < α, i.e., $H_0$ is rejected at the 0.05 significance level if p-value < 0.05.
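The cross-correlation index and its permutation test can be sketched as follows. Several choices here are assumptions rather than details confirmed by the text: the weights are taken as 1 / (1 + D_ij) with Euclidean distance, the variances use the n denominator, the test is two-sided, and the p-value uses the usual pseudo p-value form (count + 1) / (n + 1). The function names are hypothetical.

```python
import math
import random


def cross_correlation(coords, y, z):
    """Spatial cross-correlation I_YZ between attributes y and z.

    Assumption: W_ij = 1 / (1 + D_ij) for i != j, with D_ij the Euclidean
    distance between sampling points i and j.
    """
    n = len(coords)
    mean_y = sum(y) / n
    mean_z = sum(z) / n
    yc = [v - mean_y for v in y]  # Y transformed to zero average
    zc = [v - mean_z for v in z]  # Z transformed to zero average
    var_y = sum(v * v for v in yc) / n
    var_z = sum(v * v for v in zc) / n
    numerator = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / (1.0 + math.dist(coords[i], coords[j]))
            numerator += w * yc[i] * zc[j]
            weight_sum += w
    return numerator / (weight_sum * math.sqrt(var_y * var_z))


def permutation_p_value(coords, y, z, n_perm=999, seed=0):
    """Return (observed index, pseudo p-value) by permuting y over the points."""
    rng = random.Random(seed)
    observed = cross_correlation(coords, y, z)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # point locations are kept, values are moved
        if abs(cross_correlation(coords, shuffled, z)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

Calling `cross_correlation(coords, y, y)` gives the spatial autocorrelation of a single layer under the same assumptions. The double loop is recomputed for every permutation, which illustrates why the text describes the technique as computationally demanding.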
For both studied situations (statistical and agronomic), spatial correlation matrices were generated. To select the layers that serve as input data for the Fuzzy C-Means algorithm, the following procedure was used:
1. Layers whose spatial dependence was not significant at the 5% level were eliminated;
2. From the layers with spatial dependence, those with no correlation with soybean yield were removed;
3. The remaining layers were sorted in descending order of their degree of correlation with yield;
4. Redundant layers (those correlated with each other) were eliminated, removing preferably those with lower correlation with yield.
To draw the thematic maps, the sample data of the layers selected in the spatial correlation analysis (spatially autocorrelated and influencing soybean yield) were interpolated. The surface was represented by 5 x 5 m polygons, each covering 25 m². The inverse distance interpolation method was used, with an interpolation window of 10 neighbors.
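The interpolation step described above can be sketched as below. The exponent of the distance is not given in the text, so `power=1.0` is an assumption, and the function names and the bounding-box arguments are hypothetical.

```python
import math


def idw_interpolate(samples, target, n_neighbors=10, power=1.0):
    """Inverse distance estimate at `target` from (x, y, value) samples.

    Only the n_neighbors samples closest to the target are used.
    """
    nearest = sorted(samples, key=lambda s: math.dist(s[:2], target))[:n_neighbors]
    numerator = 0.0
    denominator = 0.0
    for x, y, value in nearest:
        distance = math.dist((x, y), target)
        if distance == 0.0:
            return value  # target coincides with a sampling point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def grid_centers(x_min, y_min, x_max, y_max, cell=5.0):
    """Centers of the cell x cell polygons covering a bounding box."""
    centers = []
    y = y_min + cell / 2.0
    while y < y_max:
        x = x_min + cell / 2.0
        while x < x_max:
            centers.append((x, y))
            x += cell
        y += cell
    return centers
```

A thematic map is then the list of `idw_interpolate(samples, c)` values for every `c` in `grid_centers(...)`, restricted to the cells that fall inside the field boundary.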
The management zones were generated by the Fuzzy C-Means clustering technique (BEZDEK, 1981), which considers a data set $X = \{x_1, x_2, \ldots, x_n\}$, where $x_k$ is a feature vector $x_k = [x_{k1}, x_{k2}, \ldots, x_{kp}] \in \mathbb{R}^p$, $\mathbb{R}^p$ is the p-dimensional space and n is the number of elements (rows) of matrix X. The aim is to find a fuzzy pseudo-partition, i.e., a family of c fuzzy sets of X that represents the data structure as well as possible, denoted by $P = \{A_1, A_2, \ldots, A_c\}$, which satisfies $\sum_{i=1}^{c} A_i(x_k) = 1$ for every k and $0 < \sum_{k=1}^{n} A_i(x_k) < n$ for every i.
The algorithm is guided by parameters: the number of desired clusters (c), a distance parameter ($m > 1$) that defines the allowable distance between points and centroids, and the error used as the stopping criterion ($\varepsilon = 10^{-4}$). The position of each centroid is calculated considering the distance parameter passed initially (preferably m = 1.3). For each cluster, the centers $v_1^{(t)}, \ldots, v_c^{(t)}$ are calculated by Equation 3 for the partition $P^{(t)}$, where $t \in \{1, \ldots, n\}$ is the iteration.

$$v_i^{(t)} = \frac{\sum_{k=1}^{n} \left[A_i^{(t)}(x_k)\right]^m x_k}{\sum_{k=1}^{n} \left[A_i^{(t)}(x_k)\right]^m} \qquad (3)$$

The vector $v_i$ is the center of cluster $A_i$ and is the weighted average of the data in $A_i$. The weight of $x_k$ is the m-th power of its pertinence degree to the fuzzy set $A_i$.
The pertinence degree of $x_k$ to the class $A_i$ (Equation 4) is calculated for each $x_k \in X$ and for all $i \in \{1, \ldots, c\}$:

$$A_i^{(t+1)}(x_k) = \left[\sum_{j=1}^{c} \left(\frac{\|x_k - v_i^{(t)}\|^2}{\|x_k - v_j^{(t)}\|^2}\right)^{\frac{1}{m-1}}\right]^{-1} \qquad (4)$$

where $\|x_k - v_i^{(t)}\|^2$ represents the distance between $x_k$ and $v_i$.
As the stopping criterion, $P^{(t)}$ and $P^{(t+1)}$ are compared: if $|P^{(t+1)} - P^{(t)}| \leq \varepsilon$, the algorithm terminates and the classification is performed with the pertinence degrees generated in the last iteration. Since several data layers were used, and attributes measured on different scales can influence the clustering process, the data were normalized (Equation 5) (MIELKE & BERRY, 2007):

$$P_{iNorm} = \frac{P_i - \mathrm{Median}}{\mathrm{Range}} \qquad (5)$$

where $P_{iNorm}$ is the normalized pixel value and $P_i$ is the value of pixel i to be normalized.
Eight thematic maps were generated, classified according to the division into management zones, considering 2, 3, 4 and 5 sub-regions in the plot, respectively.
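The clustering procedure can be illustrated with a short, dependency-free sketch. It is not the SDUM implementation: the random initial pseudo-partition, the use of the largest absolute change in pertinence degree as the comparison between successive partitions, the `max_iter` safeguard and all function names are assumptions made for the example.

```python
import random


def normalize(values):
    """Median/range normalization of one data layer: (value - median) / range."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    value_range = ordered[-1] - ordered[0]
    return [(v - median) / value_range for v in values]


def fuzzy_c_means(data, c, m=1.3, eps=1e-4, max_iter=300, seed=0):
    """Cluster `data` (list of equal-length feature lists) into c fuzzy sets.

    Returns (centers, memberships); memberships[k][i] is the pertinence
    degree of point k to cluster i.
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    # Assumed initialization: random pseudo-partition, each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    exponent = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        # Cluster centers: averages weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[k] * data[k][d] for k in range(n)) / w_sum
                            for d in range(p)])
        # Pertinence degrees from squared distances to the centers.
        new_u = []
        for k in range(n):
            d2 = [sum((data[k][d] - v[d]) ** 2 for d in range(p)) for v in centers]
            if min(d2) == 0.0:
                # Point sits exactly on one or more centers.
                hits = [i for i in range(c) if d2[i] == 0.0]
                new_u.append([1.0 / len(hits) if i in hits else 0.0 for i in range(c)])
            else:
                new_u.append([1.0 / sum((d2[i] / d2[j]) ** exponent for j in range(c))
                              for i in range(c)])
        change = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if change <= eps:
            break
    return centers, u


def hard_zones(memberships):
    """Assign each point to the cluster with the highest pertinence degree."""
    return [row.index(max(row)) for row in memberships]
```

With a single selected layer, as in this study, each feature vector holds one normalized value, for example `fuzzy_c_means([[v] for v in normalize(layer)], c=2)`, where `layer` is a hypothetical list of interpolated pixel values.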
To evaluate the management zones, the physical and chemical data and soybean yield were compared among zones in order to identify whether the generated zones differ significantly for each attribute. Two parameters were used in this evaluation:
1. Relative Efficiency (RE): evaluates whether the division into management zones reduced the total variance of the evaluated attribute (Equation 6); the clustering was considered adequate when RE > 1, and the higher the RE, the more efficient the division;
2. Analysis of variance: evaluates whether the management zones are associated with statistically different attribute values, assuming that the data are normally distributed and internally independent in each sub-region.
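As an illustration of the first parameter, the sketch below computes RE as the variance of the whole area divided by the area-weighted sum of the within-zone variances. This form is an assumption based on the description of the terms (variance of each management unit weighted by its share of the total area, and variance of the whole area). The function name is hypothetical, and when no area shares are supplied they are approximated by the proportion of sampling points in each zone.

```python
import statistics


def relative_efficiency(values, zones, area_share=None):
    """RE = variance of the whole area / area-weighted within-zone variance.

    values: attribute value at each sampling point.
    zones: management zone label of each sampling point.
    area_share: optional {zone: fraction of total area}; assumed proportional
    to the number of points per zone when omitted.
    """
    total_variance = statistics.variance(values)
    groups = {}
    for value, zone in zip(values, zones):
        groups.setdefault(zone, []).append(value)
    if area_share is None:
        area_share = {zone: len(g) / len(values) for zone, g in groups.items()}
    zone_variance = sum(area_share[zone] * statistics.variance(g)
                        for zone, g in groups.items())
    return total_variance / zone_variance
```

A division that groups similar values gives RE well above 1, whereas an arbitrary division of the same points gives RE close to or below 1.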

RESULTS AND DISCUSSION
The attributes P, K and Zn were classified as heteroscedastic (CV > 30%), and Fe, soil penetration resistance from 0 to 10 cm (SPR010) and sand as having high CV. Among the other attributes, only pH could be classified as homoscedastic.
Ca, C, Cu, H+Al, Mg, Mn, yield, soil penetration resistance from 10 to 20 cm (SPR1020) and from 20 to 30 cm (SPR2030), average soil penetration resistance (SPRAVG030) (MPa), moisture, silt and clay were classified as having medium CV. Cu, P, K and Zn presented positive asymmetry, silt presented negative asymmetry, and the other attributes had a symmetrical distribution. Only C and average SPR showed platykurtic distributions, and Cu, P, K and Zn were classified as leptokurtic (Table 1). Only P did not show a normal distribution. Considering the classification proposed by COSTA & OLIVEIRA (2001) for soil chemistry, Ca, Cu, Mg and Mn presented high availability throughout the area (Table 2). C and Fe presented low availability at one sampling point, but at most sampling points (93% and 96.5%, respectively) they were classified at a medium level of availability. Zn also presented a medium level in 84.2% of the samples.

Selected Layers
Statistic approach: Using the spatial correlation matrix, the Mg layer was selected to generate management zones, considering that: 1. The layers Ca, C, Fe, P, H+Al, Mn, SPR010, pH and K were withdrawn because their spatial dependence was not significant (Figure 2); 2. The layers SPRAVG030, SPR1020, SPR2030 and silt were withdrawn because they were not correlated with soybean yield; 3. After ordering the remaining layers (Cu, Mg, moisture, Zn, clay, sand) in ascending order by degree of correlation with yield, redundant layers were eliminated.
Agronomic approach: Using the spatial correlation matrix (Figure 3), the sand layer was selected to generate management zones, considering that: 1. The layer SPR010 was withdrawn because its spatial dependence was not significant; 2. The layers SPRAVG030, SPR1020, SPR2030 and silt were withdrawn because they were not correlated with soybean yield; 3. After ordering the remaining layers (clay and sand) in ascending order by degree of correlation with yield, redundant layers were eliminated.
Four thematic maps were generated for each approach, for divisions into 2, 3, 4 and 5 management zones, respectively (Figures 4 and 5).
For all performed divisions, the relative efficiency (RE) was calculated (Tables 3 and 4). It showed a reduction of the total variance, i.e., a satisfactory division into management zones (RE > 1), only in the statistic approach. Even so, when yield data were compared by analysis of variance, only the divisions into 2 and 3 sub-regions had different averages, indicating that each class has a different productive potential. Thus, the divisions into 2 and 3 zones in the statistic approach were considered significant and were kept to evaluate whether they can serve as a source of recommendation and analysis for the other attributes.
FIGURE 3. Spatial correlation matrix for the agronomic approach.
The agronomic approach did not show good results for the studied divisions, since RE < 1 and ANOVA showed that the data sets had equal averages in almost all cases. For the division into two management zones (Table 5), there was no reduction in variance for P, K and all levels of compaction, i.e., with the area divided into two sub-regions, the zones are not a suitable basis for the analysis or recommendation of these attributes. Among these attributes, only K showed a significant, but negative, correlation with yield, reinforcing the conclusion obtained from the RE statistic. As for Fe, despite having RE = 1.03, the analysis of variance showed equal averages among the management zones. Silt, despite having RE > 1, also showed equal averages according to the analysis of variance. The other studied attributes all differed according to the analysis of variance and had RE > 1, indicating that the division can be taken as a source of recommendation and analysis.
Table 5 notes: Total Var. - Total variance; means followed by the same letter do not differ among themselves by the Tukey test at 5% probability of error; RE - Relative Efficiency; soil penetration resistance from 0 to 10 cm (SPR010), from 10 to 20 cm (SPR1020), from 20 to 30 cm (SPR2030), and average soil penetration resistance (SPRAVG030).
When three management zones were considered (Table 6), RE > 1 was verified for all the studied attributes except silt, i.e., the division reduced variance and was satisfactory for all other attributes. Even so, the data sets differed only for Ca, Mg, Mn, pH and moisture, as well as for silt. Thus, it is concluded that the division into three management zones did not show good results for analysis and recommendation, since most attributes had equal averages among management zones.
SD - Standard deviation; CV - Coefficient of Variation; N - Number of sample elements; Total Var. - Total variance; means followed by the same letter do not differ among themselves by the Tukey test at 5% probability of error; RE - Relative Efficiency; soil penetration resistance from 0 to 10 cm (SPR010), from 10 to 20 cm (SPR1020), from 20 to 30 cm (SPR2030), and average soil penetration resistance (SPRAVG030).

CONCLUSIONS
• The methodology used allowed the generation of management zones that can serve as a source of recommendation and soil analysis;
• The subdivision into two management zones was considered the best because it produced zones with a greater number of attributes differing among classes;
• The program "Software for Definition of Management Zones - SDUM" enabled quick and easy creation of management zones, despite the complexity of the applied methodology;
• Although the relative efficiency showed reduced variance for all attributes in the division into three sub-regions, ANOVA showed no consistent results;
• The agronomic approach did not show good results for the division into management units, so it was not possible to identify sites with different productive potential in the studied area.
FIGURE 1. Visualization of the experimental area and experimental grid.
Notation for Equation 1: sample variance of the Y variable; sample variance of the Z variable.
Notation for Equation 6: variance in yield of the management units, each calculated separately and weighted by the proportion of the total area that the management zone represents; $S^2_{AREA}$ - variance of yield related to the whole area.

FIGURE 5 .
FIGURE 4. Division of the area into management zones using Fuzzy logic and the Mg layer, selected by means of the spatial cross-correlation matrix, for the statistic approach.

TABLE 2. Levels of interpretation of soil chemical properties with the percentage of the studied sampling points.

TABLE 3. Descriptive statistics and relative efficiency of yield data (t ha⁻¹) separated by management zone generated by the Fuzzy C-Means technique for the statistic approach. SD - Standard deviation; CV - Coefficient of Variation; N - Number of sample elements; Total Var. - Total variance, obtained from the representation of each management unit and the variance found. Means followed by the same letter do not differ among themselves by the Tukey test at 5% probability of error.

TABLE 4. Descriptive statistics and relative efficiency of yield data (t ha⁻¹) separated by management zone generated by the Fuzzy C-Means technique for the agronomic approach. SD - Standard deviation; CV - Coefficient of Variation; N - Number of sample elements; Total Var. - Total variance, obtained from the representation of each management unit and the variance found. Means followed by the same letter do not differ among themselves by the Tukey test at 5% probability of error.

TABLE 5. Descriptive statistics of the data separated into two management zones. SD - Standard deviation; CV - Coefficient of Variation; N - Number of sample elements; Total Var. - Total variance.

TABLE 6. Descriptive statistics of the data separated into three management zones. SD - Standard deviation; CV - Coefficient of Variation; N - Number of sample elements; Total Var. - Total variance. Means followed by the same letter do not differ among themselves by the Tukey test at 5% probability of error; RE - Relative Efficiency.