NUGGET EFFECT INFLUENCE ON SPATIAL VARIABILITY OF AGRICULTURAL DATA

Spatial variability description of soil chemical properties by thematic maps depends substantially on suitable geostatistical models. One of the parameters composing a geostatistical model is nugget effect. This study aimed to evaluate the simultaneous influence of nugget effect and sampling design on geostatistical model estimation and estimation of soil chemical properties at unsampled sites, considering simulated data. Our results will be used as scientific basis for spatial variability analyses of soil chemical properties in agricultural areas. Given the simulation results and agricultural data, we concluded that the high nugget effect values obtained here reduced spatial estimation efficiency. Moreover, a systematic sampling design promoted the least accurate estimates of geostatistical model and at non-sampled sites. Despite that, these nugget effect estimates should be kept in the analysis. However, further studies will be needed to investigate which factors are responsible for such high nugget effect values.


INTRODUCTION
Spatial analysis of a georeferenced variable using geostatistical models enables measuring the spatial dependence degree among samples within a determined area, thus describing its spatial dependence structure (Guedes et al., 2018). The spatial dependence analysis, mainly of soil chemical properties in farmlands, allows to know their values in subregions (management zones) within the area of interest, which, in turn, enables the application of agricultural inputs at specific points (Gazolla-Neto et al., 2016).
The spatial dependence structure of a certain georeferenced variable should be described considering a stochastic process, whose data are expressed by Z(s1), Z(s2), …, Z(sn), which are known in n sites si (i = 1, …, n), being that si = (xi, yi) T is a two-dimensional vector. The georeferenced variable can be expressed by a Gaussian spatial linear model: Z(si) = µ(si) + (si) , in which µ(si) = µ is the deterministic term, µ is a constant, and (si) represents the stochastic term with mean zero, i.e., E[(si)] = 0; and variation between points in space separated by an Euclidean distance hij = ||h||, so that h = si -sj (i, j = 1, …, n) is determined by a covariance function: C(hij) = cov[(si), (sj)] = σij, which depends only on h. Moreover, a covariance matrix is found from the covariance function, as follows: Σ = [(σij)] Guedes et al., 2011;.
One of the functions used to describe a spatial dependence structure is referred to as a semivariance function γ(hij). It measures dissimilarity between values sampled at sites separated by a distance hij, for stationary and isotropic processes, and is associated with the covariance function by γ(hij) = C(0) -C(hij) .
In the literature, different theoretical models have been developed to define spatial dependence structure (semivariance function), and several methods have been used to estimate these models (Cressie, 2015;Monego et al., 2015;Cortés-D et al., 2016). The model estimated for semivariance function has the following parameters: range (a), partial sill (C1), nugget effect (C0), and sill (C0+ C1). Range is the longest distance between sampling sites spatially correlated within an area. Sill is the semivariance value when distance equals range, corresponding to the variance of a georeferenced variable (Kestring et al., 2015).
Semivariance function has a minimum distance (hmin), within which its semivariance value (γ(hij)) is calculated. When it is high, the phenomenon has high variability over a small range of distances. In these situations, the semivariance function has a parameter called nugget effect (Peng & Wu, 2014;Cressie, 2015;Genton & Kleiber, 2015. Small-scale variability may be associated with features of the studied process and/ or Engenharia Agrícola, Jaboticabal, v.40, n.1, p.96-104, jan./feb. 2020 measurement errors. It may also occur due to data heterogeneity or sampling size and scheme (BOSSEW et al., 2014;Seidel & Oliveira, 2014;Lark & Marchant, 2018;Wadoux et al., 2019). Vallejos & Osorio (2014), Cressie (2015), Bassani et al. (2018) suggested a relationship between nugget effect and (a) features related to spatial prediction, (b) a structure that describes spatial dependence and (c) sample design. Chipeta et al. (2016) described that when nugget effect is not zero, a sampling design with closer pairs of sampling points should be considered to improve geostatistical model estimation and spatial prediction.
However, there is a gap about the implications of nugget effect and sample design on geostatistical model estimation and spatial prediction simultaneously (expressed by thematic maps). This is especially true if the evaluation considers the simultaneous influence of nugget effect and sampling design.
Thus, the goals of this study were: 1) to evaluate the influence of nugget effect on geostatistical model estimation and estimation at non-sampled sites of a georeferenced variable, using Monte Carlo simulated data and considering different sample designs (random, systematic, and lattice plus close pairs); 2) to analyze the spatial variability of the following soil chemical properties: carbon, calcium, magnesium, and pH, considering the sampling data of an area whose sampling design was a lattice plus close pairs.

MATERIAL AND METHODS
Simulated datasets originated from multivariate stochastic processes, assuming stationary variables, with an isotropic Gaussian linear model . Sampling designs with 100 sampling points arranged in a regular area with a maximum limit of coordinates equal to 100 m were considered. Three sample designs were simulated: systematic (lattice) 10 × 10 grid, random and lattice 19 × 19 grid added by 19 randomly chosen nearby points (lattice plus close pairs). Sample size and the latter sampling design were chosen due to the design used in the study area.
Twelve trials were considered for each sample design, totaling 36 sets of simulations. For each set of simulation were considered 100 simulations, totaling 3600 simulations for every trials. Each simulation attempt was made considering an exponential model for semivariate function, with fixed parameters for a practical range equal to 60 m and a sill of 10 m. Nugget effect was the parameter that varied between trials. Thus, 12 trials were performed considering the following nugget effect values: 0, 1, 2, 2.5, 3, 4, 5, 6, 7, 7.5, 8, and 9.
Semivariance function parameters were estimated for each simulation by maximum likelihood and respective asymptotic standard errors were calculated (De . Moreover, the following measures that quantify the spatial dependence intensity were estimated: relative nugget effect (RNE) (Equation 1) and spatial dependence index (SDI) (Equation 2).

(2)
in which: MF is the model factor and reflects the spatial dependence strength of each model (for exponential, spherical, and gaussian models, MF values are 0.317, 0.375, and 0.504 respectively); C0 is the nugget effect; C0 + C1 is the sill; a is the practical range; and q.MD is the fraction (q) of the maximum distance (MD) between sample points. In this study, we assumed q as 50% of the maximum distance. RNE value was proposed by Cambardella et al. (1994), which describes the proportion of sill represented by the nugget effect. Yet, SDI was proposed by Seidel & Oliveira (2014) and includes a greater amount of information in its calculation compared to the RNE (nugget effect, sill, practical range, and semivariance function model).
Scale of values of RNE and SDI indexes are different from each other, as well as their interpretation. While RNE ranges from 0 to 100% for all semivariance function models, SDI depends on semivariance function model. According to Seidel & Oliveira (2014), for exponential, spherical, and Gaussian functions, SDI varies respectively from 0 to 31.7%, from 0 to 37.5%, and from 0 to 50.4%. Regardless of the model describing semivariance function, the closer the SDI to its maximum value, the greater the spatial dependence of the variable under study.
Considering the estimated geostatistical models, the values of the georeferenced variables at unsampled sites were estimated using the kriging method. Spatial estimation mean variance or kriging variance ( ) was calculated since it is an estimation efficiency measure, wherein the lower its value, the better the spatial estimation efficiency (Cressie, 2015;Kleijnen & Mehdad, 2016).
Simulations using nugget effect values from 0 to 8 were compared to those with nugget effect equal to 9. This comparison encompassed the following measures: the sum of squared difference between spatial estimates and accuracy measure, known as overall accuracy (OA) (De Bastiani et al., 2012). Comparison methods were chosen to compare georeferenced spatial estimates with those generated with a lower degree of spatial dependence.
We also analyzed the spatial variability of a set of real data from a commercial grain area of 167.35 ha. The area is located in the city of Cascavel, western Paraná State, Brazil. The geographical coordinates are the following: 24.95º S latitude, 53.37º W longitude, and 650 m above sea level. Local soil is classified as Oxisol, with clayey texture and deep layers of good water storage capacity, porosity, and permeability (De Bastiani et al., 2012). Local climate is very wet and classified as mesothermal, Cfa (Köeppen), with an average annual temperature of 21ºC (Kestring et al., 2015).
Engenharia Agrícola, Jaboticabal, v.40, n.1, p.96-104, jan./feb. 2020 A lattice plus close pairs sampling was performed, with a maximum distance of 141 m between sampling sites. In some sites, random samples were taken at shorter distances: 75 and 50 m between sites, resulting in a total of 102 sampling points. All samples were georeferenced and located with the aid of a signal reception device of Global Positioning System (GPS) GEOEXPLORE 3, in the UTM spatial coordinate system.
Soybeans have been grown under no-till system since 1994. We used data from the 2010/2011 crop season, which are related to the following soil chemical properties: carbon (C -g dm -3 ), calcium (Ca -cmol dm -3 ), magnesium (Mg -cmol dm -3 ), and pH. The dataset of this study was acquired by routine chemical analysis, collecting a soil sample at each marked point and five subsamples of the 0.0 to 0.2 m depth range, close to the marked points, being mixed and placed in plastic bags of about 500 g, thus composing a representative sample of the portion. These samples were sent to the Laboratory of Soil Analysis of the Central Cooperative for Agricultural Research (COODETEC) for routine chemical analyses.
The best model was fit to the semivariance function for each variable under study, according to cross-validation criteria (Lu et al. 2012;Robinson et al., 2013). Asymptotic standard errors, RNE (Equation 1) and SDI (Equation 2), were calculated for each model. Moreover, using the kriging method, thematic maps of spatial variability of variables were created for the area under study. Simulated data sets and statistical and geostatistical analyses were performed by R software (R Development Core Team, 2018), using the geoR module (Ribeiro Junior & Diggle, 2001).

Simulated Data Analysis
Nugget effect estimation was the mostly influenced parameter by sample design changes in all simulations. On average, the worst results were obtained in systematic sampling, where the estimated values were more distant from the nominal value of this parameter (Table 1). Moreover, the standard error estimates of C0 were higher (Figure 1-a) compared to those of random design and lattice plus close pairs (Figure 1-b and 1-c).
Nugget effect was overestimated in simulations using lower values of this parameter, but underestimated when higher values were used. The best estimates were achieved in random design as it was one of the sampling designs. Nugget effect estimates were on average close to the nominal value, showing less variability (Table 1) and lower standard errors (Figure 1-b). Lattice plus close pairs was considered the secondbest design in terms of nugget effect estimation, as it also showed on average nugget effect estimates closer to the nominal value (Table 1) and low standard error estimates (Figure 1-c). But unlike the previous one, this model showed a greater variability of estimates compared to those of the random design.
All sampling designs in this study generated similar range and sill estimates. In this sense, relevant differences in C0 estimates influenced directly RNE and inversely SDI index calculations. When considering RNE and SDI indexes and all simulations, the lowest spatial dependence was observed in systematic sampling if compared with random design and lattice plus close pairs. Estimations of RNE and SDI indexes closer to the nominal values were obtained by random design.
These results corroborate the conclusion of Kestring et al. (2015), Zhao et al. (2016), and Bussel et al. (2016), who claimed who stated about sample design and size influences on geostatistical model estimation.  Table 2 also presents a descriptive summary of average kriging variance ( ) for a georeferenced variable at non-sampled sites, with different nugget effect values and sampling designs. Average kriging variance results were similar for all sampling designs and increased as nugget effect was raised.
According to Cressie (2015), georeferenced variables can be decomposed into two random terms: a second-order stationary process and a white-noise measurement process. In this case, when interpolation is done at an unsampled point, variance estimate exceeds stationary variance by the amount of white-noise measurement variance (corresponding to nugget effect variation) (Burgess & Webster, 2019). Therefore, the nugget effect and kriging variation are directly related. Average kriging variance shows how efficient a spatial estimation of unsampled sites was, thus, the smaller the spatial estimation the more efficient it is. These results (Table 2) showed that the higher the nugget effect (i.e., the lower the spatial dependence degree of a georeferenced variable), the lower the efficiency of spatial estimation by kriging.
Kriging spatial estimation results in unsampled sites for simulations with C0 between 0 and 8 were compared to those obtained for simulation with C0 equal to 9 (Table 2 and Figure 2) by the sum of squared difference (SSD) ( Table 2). In all sample designs, as the nugget effect increased, SSD between spatial estimates decreased.
This result indicates that the closer the nugget effect values are in a geostatistical model for spatial estimation, the more similar their results are in terms of estimation, regardless of the sampling design. The geostatistical model used for comparison (C0 = 9) represents a georeferenced variable with pure nugget effect, that is, without spatial dependence. Thus, the results of SSD (Table 2) showed that the stronger the spatial dependence of a georeferenced variable (lower C0 value), the greater the dissimilarity thereof with a georeferenced variable without spatial dependence, as far as spatial prediction is concerned.
Kriging equations depend on semivariance function, especially for nugget effect. Higher values of this parameter imply higher kriging weight for distant samples, which, in turn, produces a smoother thematic map (Bassani et al., 2018).
Engenharia Agrícola, Jaboticabal, v.40, n.1, p.96-104, jan./feb. 2020 These results evidenced the importance of accurately modelling the nugget effect, considering its association with kriging estimation and sampling design. For high nugget effect values (concerning its sill), it is therefore recommended to consider a sample design with many short-distance sites (Chipeta et al., 2016), to minimize uncertainties in semivariance function parameters and kriging estimates (Wadoux et al., 2019). Figure 2 shows the box plot graphs for each sampling design, with similarity measure OA by comparing spatial estimations, considering estimates from data simulated using nugget effect equals to 9 (as reference for mapping) and spatial estimation from data simulated using the other values of nugget effect (as a model maps).
For systematic sampling (Figures 2-a), the findings described above are valid for all simulations considering the nugget effect values that describe the georeferenced variable as having weak spatial dependence; and most nugget effect values that consider the georeferenced variable as having moderate spatial dependence.
The systematic sampling showed the highest measurements of accuracy compared with the others, especially for C0 above 5. For C0 = 8, which belongs to the same classification as spatial dependence intensity for the data simulated with nugget effect equal to 9, simulations increased (20%) with OA ≥ 0.85, which indicates high similarity between spatial estimations carried out at nonsampled sites (De Bastiani et al., 2012).
Based on the results, the systematic design had lower sensitivity to the spatial estimation process, with changes in nugget effect value, when compared with the random sampling and lattice plus close pairs. This is due to the poor quality of nugget effect estimation in systematic sampling (Table 1 and Figure 1), which hence produced a low quality in kriging estimation.

SPATIAL VARIABILITY ANALYSIS OF SOIL CHEMICAL PROPERTIES
Descriptive statistics for carbon (C), calcium (Ca), magnesium (Mg), and pH are given in Table 3. In analyzing these values, we can note that all parameters presented data with low dispersion and homogeneity. According to crossvalidation criteria, the Gaussian model was the best-estimated model of the semivariance function for C, Ca, and Mg. As for pH, the best-estimated model was the exponential one. The spatial dependence radius for C, Ca,Mg,and pH were 254.90,639.90,685.47, and 300 m, respectively. All parameters showed nugget effect values higher than sill ones. If we associate this with simulation results, we could notice that these high nugget effect values for all parameters generate loss of efficiency in spatial estimation by kriging. However, as reported by Webster & Oliver (2007), such values should not be disregarded because the model must be correctly estimated and incorporate the estimated nugget effect value. Furthermore, in analyzing spatial dependence degree, the parameters C, Ca, and pH are classified as with moderate (25% < RNE ≤ 75%) while Mg as with weak (RNE > 75%) (Cambardella et al., 1994).
Based on quartiles, the same analysis was performed for SDI since it makes up the strictest index to evaluate degree of spatial dependence. This approach was used to rank C, Ca, and Mg within a weak spatial dependence and pH within a moderate degree (Seidel & Oliveira, 2014). When comparing the two indices, we found a difference for C and Ca, which is related to the range value and the estimated model. Figure 3 displays the thematic maps of estimates for each factor. We observed that the thematic maps for carb2on (Figure 3-a) and pH (Figure 3-d) had less smoothing as for distribution of estimates in the area, whereas calcium (Figure 3-b) and magnesium (Figure 3-c) had more smoothed maps. Calcium and magnesium showed weak spatial dependence and nugget effect values higher than those of sill (Table 3). Such finding emphasizes the influence of nugget effect on the spatial estimation of georeferenced parameters in non-sampled sites, and as a result, a greater thematic-map smoothing for high nugget effect values.
According to Webster & Oliver (2007) and Hofmann et al. (2010), increasing values of nugget effect provide an improved distribution of weights in spatial estimation, which hence generates smoother thematic maps. The nugget effect can be decreased by shortening gaps between samples, i.e., increasing sample density (Kestring et al., 2015); however, it is usually unfeasible for soil properties due to the costly requirements.
The spatial variability map for carbon (Figure 3-a) showed that lower carbon contents are located midwestern, while high levels are mainly within the northern region. The thematic maps for calcium (Figure 3-b) and pH (Figure 3-d) presented reduced levels mainly within the southern region. Figure (3-c) presents the thematic map for magnesium, in which can be seen that the northern and southern regions have the lowest magnesium contents of the studied area.

CONCLUSIONS
The behavior of near-origin semivariograms described by the nugget effect strongly affects spatial estimation by kriging of georeferenced parameters at unsampled sites. Nugget effect has negative influence on stability of this type of modeling, that is, the higher the nugget effect, the lower the kriging efficiency of spatial Engenharia Agrícola, Jaboticabal, v.40, n.1, p.96-104, jan./feb. 2020 prediction. Nugget effect was also directly related to smoothed kriging estimates.
Systematic sampling exhibited the least accurate nugget effect estimation, the worst efficiency of spatial estimation, and the lowest degree of sensitivity to spatial changes as the nugget effect changes. Given the changes in nugget effect values, the best parameter estimates of the geostatistical model and estimates at unsampled sites occurred with the use of random design, followed by the lattice plus close pairs.
High nugget effect value was observed for all soil chemical properties. Low carbon contents were found in the midwestern region, while in the northern they were high. Calcium and pH presented the lowest values in the southern region, and the lowest magnesium levels were observed in the northern and southern regions.