SOYBEAN YIELD MAPS USING REGULAR AND OPTIMIZED SAMPLE WITH DIFFERENT CONFIGURATIONS BY SIMULATED ANNEALING

This study aimed to compare thematic maps of soybean yield for different sampling grids, using geostatistical methods (semivariance function and kriging). The analysis was performed with soybean yield data in t ha in a commercial area with regular grids with distances between points of 25x25 m, 50x50 m, 75x75 m, 100x100 m, with 549, 188, 66 and 44 sampling points respectively; and data obtained by yield monitors. Optimized sampling schemes were also generated with the algorithm called Simulated Annealing, using maximization of the overall accuracy measure as a criterion for optimization. The results showed that sample size and sample density influenced the description of the spatial distribution of soybean yield. When the sample size was increased, there was an increased efficiency of thematic maps used to describe the spatial variability of soybean yield (higher values of accuracy indices and lower values for the sum of squared estimation error). In addition, more accurate maps were obtained, especially considering the optimized sample configurations with 188 and 549 sample points.


INTRODUCTION
With the advance of mechanized technologies and the growth of worldwide agricultural production, many researchers have been investigating the production process searching for sustainable farming practices that can radically improve agricultural production, for instance by reducing the use of agricultural inputs.In particular, soybean production is of extreme economic importance to Brazil.In the 2010/11 crop-year, global soybean yield was 263.7 million tonnes, with a planted area of 103.5 million hectares, where Brazil is the second largest soybean producer (USDA, 2011).In the same year, Brazil had an average yield of 3.125 t ha -1 over an area greater than 24 million hectares, totaling a yield of 75 million tonnes (CONAB, 2011).
Research on the spatial dependence structure of agricultural georeferenced variables, such as chemical and physical soil properties and crop yield, is an analysis tool which provides information to support a decision in favour of better management of production areas (BORSSOI et al., 2011;GREGO et al., 2011;URIBE-OPAZO et al., 2012;NESI et al., 2013;ASSUMPÇÃO et al., 2014;BERNARDI et al., 2014).This can be accomplished by means of geostatistical techniques that retrieve from a set of sample elements, information about the spatial variation of the phenomenon in the whole area through the construction of thematic maps of variability (DIGGLE & RIBEIRO JUNIOR, 2007).
The number of sample elements available to conduct research on the spatial dependent variables and their respective sample configuration, may affect the description of the spatial dependence structure, or more, the spatial estimates of non-sampled values obtained by the kriging interpolation technique, and consequently the reliability of the results shown by the thematic map (URIBE- OPAZO et al., 2007;COELHO et al., 2009;ODA-SOUZA et al., 2010;GUEDES et al., 2011;RIFFEL et al., 2012;SOUZA et al., 2014).
When financial resources are limited, the definition of the shape and size of sampling strategy to be used to study the spatial dependence structure are crucial, in an effort to both minimize operating costs and maximize the results of spatial prediction.Thus, it is necessary to study how the sampling strategy used in the study area affects the estimation of the parameters of the spatial model that describes the spatial dependence structure of the georeferenced variable, the spatial estimation of this variable in non-observed locations and, as a consequence, the thematic maps to be generated for such estimation (SPOCK & HUSSAIN, 2012).
In this context, the aim of this study was to evaluate the influence of different sampling grids on the description of the spatial dependence structure of soybean yield.The configurations used in this study were regular grids and optimized configurations by the optimization method called Simulated Annealing, both with different sample densities.

MATERIAL AND METHODS
The soybean productivity (t ha -1 ) data used in this study were collected in a commercial area of 57.16 ha, located in Cascavel city, in western region of Paraná State, Brazil.The area has approximate geographic coordinates of 24.95° S and 53.57° W, with an average elevation of 650 m above sea level (Figure 1).The soil of the region is classified as Oxisols with clayey texture and deep soils with good water storage capacity, porosity and permeability.The climate in the region is very wet and classified as mesothermal, Cfa (Köeppen), with average annual temperature of 21ºC (IAPAR, 2007).
Regular sampling grids used in this study measured 25x25 m, with 549 sampling points, 50x50 m with 188 sampling points, 75x75 m with 66 sampling points, and 100x100 m with 44 sampling points (Figure 2), obtained from a database generated by harvest monitors, totaling 7588 points (Figure 1).Therefore, this database with a large number of sample points represents a discretization of the spatial distribution of the yield in the study area.First of all, analysis was made on the data sets of soybean yield in t ha -1 , obtained in different sampling grids and by the harvest monitor.They underwent descriptive statistics (measures of location, dispersion and shape).Then models that represent the spatial dependence structure were fitted by maximum likelihood method (ML), which determines the estimated nugget effect and range   a of the model, in order to maximize the log-likelihood Soybean yield maps using regular and optimized sample with different configurations by simulated annealing Eng.Agríc., Jaboticabal,v.36,n.1,  , which represents the intensity of spatial dependence (CAMBARDELLA et al., 1994).
Considering the locations of the original mesh (7588 points), yield values in these locations were estimated by ordinary kriging method, using values as the results of each regular grid (44, 66, 188 and 549 sampling points) and estimating the parameters selected by the models selection criteria.
The estimated yield values for the grids 100x100 m, 75x75 m, 50x50 m and 25x25 m were compared with the actual yield values obtained by the harvest monitor, using the sum of squared errors (SQE) of the spatial estimation and the accuracy measures described in Table 1.
To calculate the accuracy measures shown in Table 1, the error matrix (Table 2) as described by DE BASTIANI et al. (2012) was adapted in this study, considering ten classes of intervals of values (m = 10).Each element of the error matrix ( ) represents the number of estimations classified into class i ( ) of the model map (set of estimated values at the locations of the original mesh, obtained through the sampling points of the study grids) and class j ( ) of the reference map (values of the original mesh, obtained by the harvest monitor).
This study also developed optimized sample configurations with the same sample sizes used in regular grids, for its efficiency in spatial prediction at locations not sampled by the method called Simulated Annealing (SA) (RUIZ-CÁRDENAS et al., 2010), using the accuracy measure called Overall accuracy as an objective function to be maximized (Table 1).
The algorithm that determined the optimal sample configuration by Simulated Annealing was implemented through the following steps: Step 0: From i = 0, some measures of the algorithm were pre-determined, based on initial testing to ensure that the process avoids optimal locations and searches for more promising regions of the solution space.These measures are: stopping criterion equal to 1000 interactions, value for initial temperature equal to  : number of classes of error matrix.
Step 1: A random sampling configuration i S with reduced size d 0 (44, 66, 188 and 549 sampling points) was selected from the initial mesh.
Step 2: For this sample configuration, a spatial model was fitted by maximum likelihood method, and spatial estimation was performed of yield values at the locations of the original mesh, using the geostatistical interpolation technique called kriging.Next, the objective function was calculated for i S .
Step 3: A new random sampling configuration 1  i S was obtained, from the original mesh, and the objective function was calculated for 1 Step 4: The variation of the objective function that occurred between the two sampling schemes was calculated, expressed by S will be accepted with the probability described in [eq.( 1)]: (1) Step 5: The optimization process was completed if the stopping criterion was met.Otherwise, the value of the current temperature was decreased by the cooling schedule described in step 0, 1   i i was calculated, and step 3 was followed again.
For optimized sample configurations, a spatial model was fitted, following the criteria of cross-validation and LML.Then, based on the estimation of the spatial model and the optimized sample configuration, the soybean yield value was estimated at the locatio ns of the original mesh by the ordinary kriging method.
The estimated yield values for optimized grids were compared with the actual yield values, obtained by the harvest monitor, using the SQE of the spatial estimation and by the accuracy measures described in Table 1.
Regular sampling grids with different sample densities were compared with the sampling grids generated by the optimization process, considering the same sample sizes of regular grids.These sample configurations were compared for the following measures: estimates of accuracy measures described in Table 1, the SQE of spatial estimation and estimates of the parameters of the spatial model.

RESULTS AND DISCUSSION
Figure 3 shows the locations of the sampling configurations optimized for Simulated Annealing, considering the same sampling densities of regular grid (Figure 2).Comparing the arrangement of the points, between the regular and optimized grids, it is noted that the arrangement of sampling points in the optimized sampling settings shows that there is an attempt of the optimization process to distribute the points to gain a better coverage of the study area.The same results were observed in simulated datasets and chemical soil properties by GUEDES et al. (2011).Moreover, it is observed that with increasing sample size, there is a similarity between optimized sampling settings and regular grids as for the arrangement of the points.Table 3 shows descriptive statistics of the variable soybean yield in t ha -1 , for the different sampling grids analyzed, for the optimized sample configurations and for data from the harvest monitor.
It is important to note that some of these descriptive statistics were similar for all sampling grids and for the harvest monitor.The results show that soybean yield, at the different sampling grids, obtained by the monitor showed low dispersion (SD) and homogeneity in the frequency distribution of data from different sampling grids, with low (CV ≤ 10%, GOMES, 2000) and medium (10 % < CV ≤ 20%, GOMES, 2000) dispersions.
Table 4 shows the estimated parameters of the model fitted to the semivariance fucntion by ML.According to the results presented, the model chosen according to the criteria of crossvalidation, was the model of the Matérn family (parameter k = 2), for configurat ions with 44, 66 and 188 sampling points, and the exponential model was chosen for sampling configurations with 549 points.
The estimates presented for these models showed that there is a similarity in the estimates of the nugget effect and their standard deviations in all sample configurations and densities.It was also observed that most models fitted to different sampling patterns showed moderate spatial dependence (25% ≤ RNE ≤ 75%; CAMBARDELLA et al, 1994), with the exception of the model fitted to the data of the regular grid with the lowest sample size (44 points), which showed weak spatial dependence (RNE > 75%; CAMBARDELLA et al, 1994).The weak spatial dependence obtained for the data sets with the smallest sample size may have been influenced by the small sample size.This result corroborates the study of URIBE-OPAZO et al. (2007), which compared regular sampling grids with different sample densities and different methods of estimation of the spatial model, and found that the estimation of the parameters that describe the spatial dependence structure of a georeferenced variable, depends not only on the estimation method but also on the number of samples used for geostatistical analysis.In this way, the estimated models that best describe the spatial dependence structure of soybean yield were the models obtained for the optimized sample configurations, because these models had a higher estimated range, showing a larger radius of spatial dependence of yield in the area under study; less variability in the estimation of range (smaller standard deviations), which is Soybean yield maps using regular and optimized sample with different configurations by simulated annealing Eng.Agríc., Jaboticabal, v.36, n.1, p.114-125, jan./fev. 2016 121 indicative that the range was estimated with greater efficiency in these models; higher values for contribution and, as a consequence, lower values for estimates of the relative nugget effect, thus indicating an increase in intensity for the presence of spatial dependence.
Table 5 shows the results for accuracy measures and the SQE of spatial estimation, when comparing the productivity gained by harvest monitors with their estimates obtained by kriging.Using the configurations under study, it was observed that for the two types of sampling configuration under study (regular and optimized), spatial efficiency estimation was increased with increasing sample size because there was an increase of accuracy measures and a decrease in the SQE.This result confirms the conclusions obtained by HEDGER et al. (2001), in their study of regular sampling configurations as for their efficiency in the estimation of water quality; by SOUZA et al. (2014) in their study about the accuracy in geostatistical analysis and interpolation maps for precision agriculture in the sugarcane area; and by GUEDES et al. (2011), when studying the determination of efficient optimized sample configurations for the prediction of chemical soil properties in a smaller area with a much smaller number of sampling points in the initial mesh.ANDERSON et al. 1976;KRIPPENDORFF, 1980).These results are directly influenced by the low number of sampling points established in the present study, which respectively account for 0.58%, 0.87%, 2.48% and 7.24% of the total points obtained by the harvest monitor.
Despite these low levels of accuracy, it is noted that the optimized sample configurations showed the best values for these measures, i.e., they showed the highest values for the accuracy indices and the lowest values for the SQE of spatial estimation.
The maps that describe the spatial variability of yield in the study area also highlight the differences in the type of sampling configuration used and in the sampling density specified (Figure 4).These maps show a difference in the formation of sub-regions that describe the spatial variability of yield, when compared with the maps designed through the regular sampling configurations (Figures 4a to 4d) with the designed maps based on the optimized sampling configurations (Figures 4e to 4h).However, the most noticeable visual difference is when, in both types of sampling configuration, the resulting maps are compared with the sampling configurations whose sizes are larger.It was observed that the increasing of sample size is related to a more detailed characterization of sub-regions that describes the different intervals of yields.Studies conducted by URIBE-OPAZO et al. (2007) andCOELHO et al. (2009) using regular grids in soybean fields had similar conclusions about the best detailed information of the map that describes the spatial variability of georeferenced variables, with increasing sample size.
Soybean yield maps using regular and optimized sample with different configurations by simulated annealing Eng.Agríc., Jaboticabal, v.36, n.1, p.114-125, jan./fev. 2016 123 These same conclusions were also found by RIFFEL et al. (2012) in their study of the spatial and temporal distributions of pest insects on soybean crop.Moreover, RIFFEL et al. (2012) emphasized that better detailed thematic maps generates a greater amount of informations obtained for the study area, promoting a decision with greater scientific basis.

CONCLUSIONS
It was observed that factors such as sample size and sampling density influenced the description of the spatial distribution of soybean yield and consequently the information presented by the thematic map generated for the description of the productivity in the study area.It was also verified that the increasing of the sample size leads to the increasing of the spatial efficiency estimation (increase of accuracy measures and decrease of SQE) and a better decision regarding whether there is or not spatial dependence.
Moreover, for each sample size, the best results, in terms of efficiency in the spatial estimation of yield, were obtained by the optimized sampling configurations.
Thus, among the sample sizes and configurations studied, the optimized sampling configurations with 188 and 549 sampling points, and the regular configuration with 549 sampling points, showed the best results for the estimation of the spatial model that describes the structure of spatial dependence and the best results for the spatial estimation of soybean yield.

FIGURE 1 .FIGURE 2 .
FIGURE 1. Location of the commercial area and database generated by harvest monitors totaling 7588 points.
n : number of estimations classified into to class i of the model map and class j of the reference map; : number of estimations classified into class i of the model map; : number of estimations classified into class j of the reference map; : number of estimations classified into the same class in both maps; : total values in the original mesh (harvest monitor); The software R (R DEVELOPMENT CORETEAM, 2013) and its geoR package(RIBEIRO  JR. & DIGGLE, 2001;DIGGLE & RIBEIRO JR., 2007) were used as computational tools for the Soybean yield maps using regular and optimized sample with different configurations by simulated annealing Eng.Agríc.,Jaboticabal,v.36, n.1,  jan./fev.2016 119 estimation, fitting of models, spatial estimation and development of the optimization routine.This software program is open source and have GPL (General Public license).
jan./fev.2016 117 function.The criteria for model selection were cross-validation and maximum value of the loglikelihood function (LML).

TABLE 2 .
General error matrix.
 m

TABLE 3 .
Summary of statistics for the four regular grids analyzed, for the harvest monitor and for the optimized sample configurations.

TABLE 4 .
Parameters estimates for the spatial models, where the standard deviation values of such estimates are shown in parentheses.

TABLE 5 .
Overall accuracy ( ), Kappa's ( ) and Tau's ( ) concordance indices and sum of squared error (SQE) for the spatial estimation performed by the regular grids and optimized.