Model to estimate the sampling density for establishment of yield mapping

Yield mapping represents the spatial variability concerning the features of a productive area and allows intervening on the next year production, for example, on a site-specific input application. The trial aimed at verifying the influence of a sampling density and the type of interpolator on yield mapping precision to be produced by a manual sampling of grains. This solution is usually adopted when a combine with yield monitor can not be used. An yield map was developed using data obtained from a combine equipped with yield monitor during corn harvesting. From this map, 84 sample grids were established and through three interpolators: inverse of square distance, inverse of distance and ordinary kriging, 252 yield maps were created. Then they were compared with the original one using the coefficient of relative deviation (CRD) and the kappa index. The loss regarding yield mapping information increased as the sampling density decreased. Besides, it was also dependent on the interpolation method used. A multiple regression model was adjusted to the variable CRD, according to the following variables: spatial variability index and sampling density. This model aimed at aiding the farmer to define the sampling density, thus, allowing to obtain the manual yield mapping, during eventual problems in the yield monitor.


INTRODUCTION
The spatial variability of soil properties in an agricultural area is something already known by producers.With the advent of precision agriculture (PA), spatial and temporal variability can be considered at large scale aiming to improve the implementation and use of inputs, increase productivity, reduce production costs and environmental impact (Farias et al., 2003).PA provides necessary accuracy and precision in its tools and technologies, making it possible to increase productivity and profitability of crop production while lowering environmental impacts (Corwin & Plant, 2005;Koch et al., 2004).This variability is observed through yield maps that provide parameters to diagnose and correct the causes of lower yields in some areas and study why yield is high in other areas.
However, when a combine is operating in a given area and monitoring yield, problems may occur with the monitor, which prevent the monitoring of the remaining area.In this case, one solution would be to obtain the yield map from a manual sampling.However, there are doubts concerning the sampling density to be used and what would be the loss of information regarding the map that hypothetically would be produced by the monitor.This loss of information, however, can be measured from the comparison of maps produced by both methodologies.From a georeferenced database of yield and using a Geographic Information System (GIS) it is possible to generate yield maps through interpolation.However, one aspect still to be explored is the influence of different types of interpolators in the preparation of thematic maps.Jones et al. (2003) cites that many articles have been published comparing different interpolation methods in a variety of data types.Most of these studies involved comparisons of two-dimensional interpolation methods.The most studied methods were kriging and inverse distance weighting (IDW).Eight studies showed kriging as the best; even when kriging was better "on average", IDW was higher under certain circumstances.Two of the studies showed IDW superior to kriging, and six studies showed little difference between kriging and IDW.
There are algorithms available to construct maps but not for comparison among maps (Lourenço & Landim, 2004).One way is to compare maps by multiple linear regression analysis as Brower & Merriam (2001) when comparing structural contour maps in order to understand the geological history of a region.According to EMBRAPA (2007), another way of comparing maps is by using the kappa index of agreement, which tests the association among maps.The analysis of accuracy is achieved by confusion matrix or error matrix, and subsequently calculated the kappa index of agreement (Congalton & Green, 1993).According to Coelho et al. (2009), another parameter for comparing two thematic maps is the coefficient of relative deviation (CRD), which expresses the average difference, in module, of interpolated values in each map considering one of them as standard map.The lower the percentage found, the higher the similarity between them.
In this context, the objective was to determine the influence of sample density and the type of interpolator on the accuracy of yield maps to be produced from manual sampling of grains.

MATERIAL AND METHODS
Yield data used in this study were collected from a farm located in Cascavel, Paraná, with geographic center under the coordinates 24º 58 ' 44.4'' S and 53º 31' 26.4'' W, 650 m average altitude.The maize crop with physiological cycle of about 120 days was planted between 25 to 30 January 2004 under notillage system and 0.20 m plant spacing and 0.70 m row spacing.The crop was harvested from June 30 to July 2, 2004, separating from a 13.2 ha sub-area to be analyzed in this study.The whole area was harvested with combine and harvest monitor and data used to simulate a manual harvest.The yield monitor used was AgLeader® PF 3000®, installed in a combine harvester New Holland TC 57®, corn platform with six row.14,760 points were collected, which passed through filtration and removal of inconsistent data following the methodology used by Bazzi et al. (2008).In this analysis, spots with times of filling and emptying the harvester less than six seconds and delay time of less than twelve seconds were eliminated.It was also eliminated points with incorrect platform width, failure to GPS differential signal, outliers or zero water content of grains (below 12% and above 40%), positioning error, and discrepant yield values (through the box-plot graphs).When concluded the elimination of data considered inconsistent, remained 13,473 points, which originated the map (Figure 1A) of observed yield.From this, maps were constructed performing the points interpolation, resulting in contour maps which allowed a better analysis (Figure 1B).
Maize yield maps using manual sampling are performed by harvesting 1 to 2 linear meters of two adjacent rows.However, in this work, no manual samplings were performed, which were simulated using six grids (Figure 3) from the elimination of points of maps corresponding to areas 1, 2 and 3.This estimate considers that the manual sample (usually between 0.7 and 1.4 m 2 ) has average productivity similar to the sample taken with the yield monitor (15 to 20 m 2 ).For more information sources, four replications were performed in each area, taking care not to repeat the same points in each repetition and that they stay as far away as possible.Once four replications were performed in each one of the seven grids and at three areas, it reached a total of 84 data sets.The objective of this simulation was to study the sampling density needed to produce maps of productivity from manual sampling, when sampling for some reason cannot be carried out with a harvest monitor.
Data normality was verified by tests of Anderson-Darling and Kolmogorov-Smirnovs at 5% significance level.Data showing normality in at least one of the tests were considered as having normal probability.The outliers were checked through the box-plot graphs.The coefficient of variation (CV) was considered low when  10% (homoscedasticity), medium when 10% < CV  20%, high when 20% < CV  30%, and very high when > 30% (heteroscedasticity) according to Gomes & Garcia (2002).A.
In geostatistics analysis semivariograms were constructed to verify the spatial dependence on the data.To estimate the experimental semivariance function, the estimator proposed by Cressie & Hawkins (1980) was used.Experimental semivariograms were obtained by applying methods of ordinary least squares adjustment (OLS), adopting the isotropic model (unidirectional semivariogram) with 50% cut-off the maximum distance.In this analysis, the computer program VESPER ® 1.6 Demo was used.Similarly to the spatial dependence index (SDI), proposed by Cambardella et al. (1994), spatial variability index (SVI) Eq. 1, was defined in order to define a proportional index to the spatial variability and not inversely proportional as the SDI.The SVI classification adopted was: very low for SVI < 20%; low for 20  SVI < 40%; medium for 40  SVI < 60%; high for 60  SVI < 80%; and very high for SVI > 80%. .100 In generating thematic maps three types of interpolators were used: inverse of distance (ID), inverse of square distance (ISD), and ordinary kriging and, using the computer program SURFER ® 8.0 were prepared the yield maps.In comparing the maps generated from each grid with the original map were used the coefficient of relative deviation (CRD) (Coelho et al., 2009) Eq. 3, which expresses the average difference in module of interpolated values in each map, considering one of them as a standard map and the kappa index (Cohen, 1960) Eq. 4, which uses a confusion matrix for further index calculation.The aim was to evaluate the quality loss of yield maps when reducing the number of sampling points.n -number of points P ij -productivity at point i to the map j P ist -productivity at point i for the standard map (kriging with original data) r -number of rows in a cross-classification table x ii -number of combinations on the diagonal x i+ -total observations in row i x +i -total number of observations in column i n -total number of observations La ndi s & Koch (1977) suggested the foll owi ng interpretation for kappa index values (K): no agreement < 0, poor agreement for 0  K  0.19, partial agreement for 0.20  K  0.39, moderate agreement for 0.40  K  0.59, excellent agreement for 0.60  K  0.79, perfect agreement for 0.80 d" K d" 1.00.In practice, the kappa index represents the proportion of pixels that were coincident beyond those that would be by pure chance.To evaluate the behavior of similarity between maps, as measured by the coefficient of deviation (CRD), depending on the density and spatial variability, it was fitted the multiple regression model to the CRD variable, depending on the variables SDI (spatial dependence index) and SD (sampling density).The variable selection method used was the best subset.0.05 significance for F distribution was used to control entry and exit effects.The adjusted coefficient of multiple determination (adjusted R 2 ) was used as criterion for selection of the best models.

RESULTS AND DISCUSSION
For space reasons, only data for Area 1 (chosen by lottery) and those for the three areas together will be shown.The maize's average yield (second season) from Area 1 ranged from 5,795 to 6,082 kg ha -1 , with a mean value of 5,919 kg ha -1 , 38.7% above the average for Paraná state (4,266 kg ha -1 ) and 94.6% above the average for Brazil (3,041 kg ha -1 ) (SEAB, 2007).The coefficients of variation (CV) ranged from 13.8% (medium) to 25.2% (high), with 16.7% average (medium), thus characterizing relative data homogeneity (Pimentel- Gomes & Garcia, 2002).Through the normality tests performed was found that 50% yield data sets showed normal distribution at 5% significance level (Table 1).
Regarding the three areas studied, the average yield ranged from 4,491 to 6,082 kg ha -1 (Table 2), averaging 5,440 kg ha -1 , 78.9% above the national average and 27.5% above the Paraná average.CVs ranged from 11.8% (average) to 26.1% (high), with 18.5% average (medium), thus characterizing relative data homogeneity.Through the normality tests performed was found that 64% yield data sets showed normal distribution at 5% significance level.
Models and parameters adjusted to the semivariograms for Area 1 are shown in Table 3.
The nugget effect (C 0 ) for data set of the sampling scheme 61_A1_R1 (61-point grid of Area 1 and repetition 1) was 454,602, which compared to other nugget effects for different purposes sampling schemes showed the lowest value.However, the sampling scheme with data set 15_A1_R3 had showed highest C 0 , (1,969,444).The range was the greatest for 121_A1_R3, (361.6 m) while the least range was 25.9 m for 61_A1_R1.The sill ranged between 663,333 for the data set 30_A1_R2 and (1) Table 2. Global overview of the data descriptive analysis used in the three areas studied 2,223,757 for 15_A1_R3.The spatial variability was between very low (10.6%) and medium (45.0%) according to the classification adopted for the spatial dependence index.
Regarding the three areas studied, 77% cases used the spherical model while 23% used that exponential.The nugget effect (C 0 ) ranged between 434,977 and 1,969,444.The major and minor ranges were 361.6 m and 17.0 m respectively.The sill varied between 488,275 and 2,223,757.Spatial dependence showed between very low (5.4%) and high (61.7%).
Based on the fitting parameters and models fitted to individual semivariograms (Figure 4, Area 1 and repetition 1) thematic maps were constructed by applying the interpolation inverse of square distance, inverse of distance and kriging for the study variable (maize yield in kg ha 1 ).With seven grids made in each area (1, 2 and 3), four replicates per area, and three interpolators, 252 maps were obtained.For similarity analysis of thematic maps was calculated the average difference in module, ie the coefficient of relative deviation (CRD).As examples, Figure 5 shows yield maps generated by kriging method corresponding to the semivariograms in Figure 4, where one can observe the loss of information with decreased number of points used.
Coefficients of relative deviations (CRD) derived from the comparison between simulated and original grids show gradual increase of deviation as the amount of points decreases in the three interpolation methods, ie, CRD decreased with increasing sampling density (division of the number of sampling data and sampling area) (Figure 6).For the same sampling density, CRD, deviation from the original map, was smaller for the interpolation inverse of square distance (ISD), followed by the inverse of distance (ID) and lastly by kriging (KRIG) (except for higher density of points).This result is in agreement with Coelho et al. (2009) and Bazzi et al. (2008), who found better performance for interpolation using inverse distance weighting when compared to kriging.Comparing the explanation of the dependent variable CRD, R 2 assumed values of 0.32 (Kriging), 0.61 (ID), and 0.79 (ISD), ie, explanations ranged from 32 to 79%.Kappa indexes resulting from comparison between simulated and original grids (Figure 7) decreased progressively as sampling density for the three interpolation methods was reduced.The highest index was found in the ID method, i.e.ISD and Krig methods were more influenced by sampling point's elimination.As with the CRD, for the same sample density, kappa index showed higher concordance with the original map for the ISD interpolator; followed by ID and finally by Krig (except for the highest density of points).This result is in agreement with and Bazzi et al. (2008).Comparing the explanation of the dependent variable kappa index, the R 2 assumed values of 0.51 (Krig), 0.54 (ID) and 0.60 (ISD), ie the explanation ranged between 51 and 60%.

Mean
The kappa index was associated linearly with CRD (Figure 8), with R 2 between 0.95 and 0.97, ie, explanation of the kappa index above 95%.This means that comparisons of thematic maps performed by both CRD and kappa methods lead to similar results.The choice for sampling grid depends on the degree of similarity desired between the map that will build and the map that would be produced by the combine equipped with yield monitor.This similarity is estimated by the CRD and the lower the CRD the more similar the maps.To better understand its behavior, a model was developed to describe its dependence as function of the spatial variability index (SVI) and the sampling density (SD, points ha -1 ).For such an exploratory analysis was carried out initially, which resulted in the following model  SVI and SD -independent variables a, b,..., a, b, c, d, and e -model parameters to be estimated by the method of least squares e -random error, which is assumed with normal distribution, zero mean and constant variance  2 Applying the aforementioned model to the data collected, an explanation between 87% (ID) and 93% (KRIG) was obtained (Table 4).
In practice, these models serve to enable the farmer to obtain the yield map, even when problems occur on the harvest monitor preventing data storage.For such, the following procedure is suggested: 1) With the data already collected

Figure 7 .
Figure 7. Kappa index as function of sampling density according to the interpolation method: inverse of square distance (ISD), inverse of distance (ID) and kriging (Krig)

Figure 8 .
Kappa coefficient as function of the coefficient of relative deviation (CRD) interpolated by (A) inverse of square distance, (B) inverse of distance, (C) kriging and (D) three interpolation methods

Table 1 .
Descriptive analysis of yield for data sets used in Area 1

Table 3 .
Models and parameters adjusted to the semivariogram for yield data sets for Area 1

Table 4
Estimated coefficients of multiple regression for the coefficient of relative deviation (CRD) according to the spatial variability index (SVI) and the sampling density (SD, points ha -1 ) Note: All estimators were significant by t test at 5% probability