INTRODUCTION
Spatial analysis of a georeferenced variable using geostatistical models enables measuring the spatial dependence degree among samples within a determined area, thus describing its spatial dependence structure (^{Guedes et al., 2018}). The spatial dependence analysis, mainly of soil chemical properties in farmlands, allows to know their values in subregions (management zones) within the area of interest, which, in turn, enables the application of agricultural inputs at specific points (^{GazollaNeto et al., 2016}).
The spatial dependence structure of a certain georeferenced variable should be described considering a stochastic process, whose data are expressed by Z(s_{1}), Z(s_{2}), …, Z(s_{n}), which are known in n sites s_{i} (i = 1, …, n), being that s_{i} = (x_{i}, y_{i})^{T} is a twodimensional vector. The georeferenced variable can be expressed by a Gaussian spatial linear model: Z(s_{i}) = µ(s_{i}) + ε(s_{i}) (^{UribeOpazo et al., 2012}), in which µ(s_{i}) = µ is the deterministic term, µ is a constant, and ε(s_{i}) represents the stochastic term with mean zero, i.e., E[ε(s_{i})] = 0; and variation between points in space separated by an Euclidean distance h_{ij} = h, so that h = s_{i}  s_{j} (i, j = 1, …, n) is determined by a covariance function: C(h_{ij}) = cov[ε(s_{i}), ε(s_{j})] = σ_{ij}, which depends only on h. Moreover, a covariance matrix is found from the covariance function, as follows: Σ = [(σ_{ij})] (^{Dalposso et al., 2018}; ^{Guedes et al., 2011}; ^{2018}).
One of the functions used to describe a spatial dependence structure is referred to as a semivariance function γ(h_{ij}). It measures dissimilarity between values sampled at sites separated by a distance h_{ij}, for stationary and isotropic processes, and is associated with the covariance function by γ(h_{ij}) = C(0) – C(h_{ij}) (^{UribeOpazo et al., 2012}).
In the literature, different theoretical models have been developed to define spatial dependence structure (semivariance function), and several methods have been used to estimate these models (^{Cressie, 2015}; ^{Monego et al., 2015}; ^{CortésD et al., 2016}). The model estimated for semivariance function has the following parameters: range (a), partial sill (C_{1}), nugget effect (C_{0}), and sill (C_{0}+ C_{1}). Range is the longest distance between sampling sites spatially correlated within an area. Sill is the semivariance value when distance equals range, corresponding to the variance of a georeferenced variable (^{Kestring et al., 2015}).
Semivariance function has a minimum distance (h_{min}), within which its semivariance value (γ(h_{ij})) is calculated. When it is high, the phenomenon has high variability over a small range of distances. In these situations, the semivariance function has a parameter called nugget effect (^{Peng & Wu, 2014}; ^{Cressie, 2015}; ^{Genton & Kleiber, 2015}. Smallscale variability may be associated with features of the studied process and/ or measurement errors. It may also occur due to data heterogeneity or sampling size and scheme (^{BOSSEW et al., 2014}; ^{Seidel & Oliveira, 2014}; ^{Lark & Marchant, 2018}; ^{Wadoux et al., 2019}).
^{Vallejos & Osorio (2014)}, ^{Cressie (2015)}, ^{Bassani et al. (2018)} suggested a relationship between nugget effect and (a) features related to spatial prediction, (b) a structure that describes spatial dependence and (c) sample design. ^{Chipeta et al. (2016)} described that when nugget effect is not zero, a sampling design with closer pairs of sampling points should be considered to improve geostatistical model estimation and spatial prediction.
However, there is a gap about the implications of nugget effect and sample design on geostatistical model estimation and spatial prediction simultaneously (expressed by thematic maps). This is especially true if the evaluation considers the simultaneous influence of nugget effect and sampling design.
Thus, the goals of this study were: 1) to evaluate the influence of nugget effect on geostatistical model estimation and estimation at nonsampled sites of a georeferenced variable, using Monte Carlo simulated data and considering different sample designs (random, systematic, and lattice plus close pairs); 2) to analyze the spatial variability of the following soil chemical properties: carbon, calcium, magnesium, and pH, considering the sampling data of an area whose sampling design was a lattice plus close pairs.
MATERIAL AND METHODS
Simulated datasets originated from multivariate stochastic processes, assuming stationary variables, with an isotropic Gaussian linear model (^{UribeOpazo et al., 2012}). Sampling designs with 100 sampling points arranged in a regular area with a maximum limit of coordinates equal to 100 m were considered. Three sample designs were simulated: systematic (lattice) 10 × 10 grid, random and lattice 19 × 19 grid added by 19 randomly chosen nearby points (lattice plus close pairs). Sample size and the latter sampling design were chosen due to the design used in the study area.
Twelve trials were considered for each sample design, totaling 36 sets of simulations. For each set of simulation were considered 100 simulations, totaling 3600 simulations for every trials. Each simulation attempt was made considering an exponential model for semivariate function, with fixed parameters for a practical range equal to 60 m and a sill of 10 m. Nugget effect was the parameter that varied between trials. Thus, 12 trials were performed considering the following nugget effect values: 0, 1, 2, 2.5, 3, 4, 5, 6, 7, 7.5, 8, and 9.
Semivariance function parameters were estimated for each simulation by maximum likelihood and respective asymptotic standard errors were calculated (^{De Bastiani et al., 2018}). Moreover, the following measures that quantify the spatial dependence intensity were estimated: relative nugget effect (RNE) (Equation 1) and spatial dependence index (SDI) (Equation 2).
in which: MF is the model factor and reflects the spatial dependence strength of each model (for exponential, spherical, and gaussian models, MF values are 0.317, 0.375, and 0.504 respectively); C_{0} is the nugget effect; C_{0} + C_{1} is the sill; a is the practical range; and q.MD is the fraction (q) of the maximum distance (MD) between sample points. In this study, we assumed q as 50% of the maximum distance. RNE value was proposed by ^{Cambardella et al. (1994)}, which describes the proportion of sill represented by the nugget effect. Yet, SDI was proposed by ^{Seidel & Oliveira (2014)} and includes a greater amount of information in its calculation compared to the RNE (nugget effect, sill, practical range, and semivariance function model).
Scale of values of RNE and SDI indexes are different from each other, as well as their interpretation. While RNE ranges from 0 to 100% for all semivariance function models, SDI depends on semivariance function model. According to ^{Seidel & Oliveira (2014)}, for exponential, spherical, and Gaussian functions, SDI varies respectively from 0 to 31.7%, from 0 to 37.5%, and from 0 to 50.4%. Regardless of the model describing semivariance function, the closer the SDI to its maximum value, the greater the spatial dependence of the variable under study.
Considering the estimated geostatistical models, the values of the georeferenced variables at unsampled sites were estimated using the kriging method. Spatial estimation mean variance or kriging variance (
Simulations using nugget effect values from 0 to 8 were compared to those with nugget effect equal to 9. This comparison encompassed the following measures: the sum of squared difference between spatial estimates and accuracy measure, known as overall accuracy (OA) (^{De Bastiani et al., 2012}). Comparison methods were chosen to compare georeferenced spatial estimates with those generated with a lower degree of spatial dependence.
We also analyzed the spatial variability of a set of real data from a commercial grain area of 167.35 ha. The area is located in the city of Cascavel, western Paraná State, Brazil. The geographical coordinates are the following: 24.95° S latitude, 53.37° W longitude, and 650 m above sea level. Local soil is classified as Oxisol, with clayey texture and deep layers of good water storage capacity, porosity, and permeability (^{De Bastiani et al., 2012}). Local climate is very wet and classified as mesothermal, Cfa (Köeppen), with an average annual temperature of 21°C (^{Kestring et al., 2015}).
A lattice plus close pairs sampling was performed, with a maximum distance of 141 m between sampling sites. In some sites, random samples were taken at shorter distances: 75 and 50 m between sites, resulting in a total of 102 sampling points. All samples were georeferenced and located with the aid of a signal reception device of Global Positioning System (GPS) GEOEXPLORE 3, in the UTM spatial coordinate system.
Soybeans have been grown under notill system since 1994. We used data from the 2010/2011 crop season, which are related to the following soil chemical properties: carbon (C – g dm^{−3}), calcium (Ca – cmol dm^{−3}), magnesium (Mg – cmol dm^{−3}), and pH. The dataset of this study was acquired by routine chemical analysis, collecting a soil sample at each marked point and five subsamples of the 0.0 to 0.2 m depth range, close to the marked points, being mixed and placed in plastic bags of about 500 g, thus composing a representative sample of the portion. These samples were sent to the Laboratory of Soil Analysis of the Central Cooperative for Agricultural Research (COODETEC) for routine chemical analyses.
The best model was fit to the semivariance function for each variable under study, according to crossvalidation criteria (^{Lu et al. 2012}; ^{Robinson et al., 2013}). Asymptotic standard errors, RNE (Equation 1) and SDI (Equation 2), were calculated for each model. Moreover, using the kriging method, thematic maps of spatial variability of variables were created for the area under study. Simulated data sets and statistical and geostatistical analyses were performed by R software (^{R Development Core Team, 2018}), using the geoR module (^{Ribeiro Junior & Diggle, 2001}).
Results and Discussion
Simulated Data Analysis
Nugget effect estimation was the mostly influenced parameter by sample design changes in all simulations. On average, the worst results were obtained in systematic sampling, where the estimated values were more distant from the nominal value of this parameter (Table 1). Moreover, the standard error estimates of C_{0} were higher (Figure 1a) compared to those of random design and lattice plus close pairs (Figure 1b and 1c).
C_{0} nominal value  Systematic  Random  Lattice Plus Close Pairs  

Mean  CV (%)  Mean  CV (%)  Mean  CV (%)  
0  0.297  225.670  0.068  210.730  0.254  202.170 
1  5.873  69.060  0.761  74.390  0.770  125.620 
2  4.531  82.690  1.600  53.850  1.396  99.400 
2.5  4.472  79.270  2.028  49.120  1.724  91.300 
3  4.225  78.970  2.455  46.200  2.108  82.880 
4  3.546  82.080  3.368  41.270  2.820  73.150 
5  2.747  91.110  4.202  41.970  3.525  67.370 
6  2.162  98.960  4.961  42.380  4.369  61.430 
7  1.581  113.460  5.887  40.130  5.026  59.940 
7.5  1.303  125.200  6.159  41.120  5.335  60.950 
8  1.058  137.860  6.589  41.850  5.606  62.170 
9  0.645  167.020  7.443  44.580  6.662  53.050 
CV: coefficient of variation (%).
Nugget effect was overestimated in simulations using lower values of this parameter, but underestimated when higher values were used. The best estimates were achieved in random design as it was one of the sampling designs. Nugget effect estimates were on average close to the nominal value, showing less variability (Table 1) and lower standard errors (Figure 1b).
Lattice plus close pairs was considered the second best design in terms of nugget effect estimation, as it also showed on average nugget effect estimates closer to the nominal value (Table 1) and low standard error estimates (Figure 1c). But unlike the previous one, this model showed a greater variability of estimates compared to those of the random design.
All sampling designs in this study generated similar range and sill estimates. In this sense, relevant differences in C_{0} estimates influenced directly RNE and inversely SDI index calculations. When considering RNE and SDI indexes and all simulations, the lowest spatial dependence was observed in systematic sampling if compared with random design and lattice plus close pairs. Estimations of RNE and SDI indexes closer to the nominal values were obtained by random design.
These results corroborate the conclusion of ^{Kestring et al. (2015)}, ^{Zhao et al. (2016)}, and ^{Bussel et al. (2016)}, who claimed who stated about sample design and size influences on geostatistical model estimation.
Table 2 also presents a descriptive summary of average kriging variance (
Nominal value of (C_{0})  Estimated Measures  Systematic  Random  Lattice Plus Close Pairs  

Mean  CV (%)  Mean  CV (%)  Mean  CV (%)  
0 

2.593  23.490  2.900  15.300  2.647  19.960 
SSD  11090  31.770  20110  43.160  20960  45.890  
1 

3.251  25.220  3.893  15.760  3.462  21.890 
SSD  9533  32.980  17560  45.540  18890  47.630  
2 

3.941  25.270  4.786  15.860  4.288  22.340 
SSD  7999  35.230  15140  48.170  16750  49.710  
2.5 

4.304  24.990  5.208  15.870  4.694  22.180 
SSD  7230  36.820  13990  49.620  15700  50.790  
3 

4.683  24.320  5.616  15.870  5.112  21.690 
SSD  6452  38.190  12890  51.170  14610  52.080  
4 

5.434  23.090  6.339  18.790  5.898  20.750 
SSD  4970  42.08  10770  54.710  12580  54.440  
5 

6.161  22.030  6.996  21.400  6.571  22.520 
SSD  3629  48.870  8778  58.950  10660  56.970  
6 

6.771  26.010  7.656  21.270  7.041  28.610 
SSD  2320  61.810  6978  63.240  8844  60.290  
7 

7.135  31.270  7.784  31.640  7.674  28.090 
SSD  1300  87.090  5326  68.010  7192  63.290  
7.5 

7.335  32.810  7.978  33.400  7.652  33.050 
SSD  905.700  109.580  4593  71.870  6373  65.920  
8 

7.160  40.890  7.580  45.270  7.328  42.580 
SSD  595.900  145.294  3925  73.030  5668  67.430  
9 

5.541  77.450  5.384  88.200  5.384  88.180 
According to ^{Cressie (2015)}, georeferenced variables can be decomposed into two random terms: a secondorder stationary process and a whitenoise measurement process. In this case, when interpolation is done at an unsampled point, variance estimate exceeds stationary variance by the amount of whitenoise measurement variance (corresponding to nugget effect variation) (^{Burgess & Webster, 2019}). Therefore, the nugget effect and kriging variation are directly related. Average kriging variance shows how efficient a spatial estimation of unsampled sites was, thus, the smaller the spatial estimation the more efficient it is. These results (Table 2) showed that the higher the nugget effect (i.e., the lower the spatial dependence degree of a georeferenced variable), the lower the efficiency of spatial estimation by kriging.
Kriging spatial estimation results in unsampled sites for simulations with C_{0} between 0 and 8 were compared to those obtained for simulation with C_{0} equal to 9 (Table 2 and Figure 2) by the sum of squared difference (SSD) (Table 2). In all sample designs, as the nugget effect increased, SSD between spatial estimates decreased.
This result indicates that the closer the nugget effect values are in a geostatistical model for spatial estimation, the more similar their results are in terms of estimation, regardless of the sampling design. The geostatistical model used for comparison (C_{0} = 9) represents a georeferenced variable with pure nugget effect, that is, without spatial dependence. Thus, the results of SSD (Table 2) showed that the stronger the spatial dependence of a georeferenced variable (lower C_{0} value), the greater the dissimilarity thereof with a georeferenced variable without spatial dependence, as far as spatial prediction is concerned.
Kriging equations depend on semivariance function, especially for nugget effect. Higher values of this parameter imply higher kriging weight for distant samples, which, in turn, produces a smoother thematic map (^{Bassani et al., 2018}).
With C_{0} changes, the systematic design showed less sensitivity to spatial estimation (lowest SSD (Table 2), underestimating nugget effect (Table 1)).
These results evidenced the importance of accurately modelling the nugget effect, considering its association with kriging estimation and sampling design. For high nugget effect values (concerning its sill), it is therefore recommended to consider a sample design with many shortdistance sites (^{Chipeta et al., 2016}), to minimize uncertainties in semivariance function parameters and kriging estimates (^{Wadoux et al., 2019}).
Figure 2 shows the box plot graphs for each sampling design, with similarity measure OA by comparing spatial estimations, considering estimates from data simulated using nugget effect equals to 9 (as reference for mapping) and spatial estimation from data simulated using the other values of nugget effect (as a model maps).
In both random (Figure 2b) and lattice plus close pairs (Figures 2c) designs, the increased nugget effect in the simulations generated a high percentage of simulations with low similarity between spatial estimates (OA < 0.85, ^{De Bastiani et al., 2012}). In considering C_{0} = {3, 4, 5, 6, 7.5} (moderate spatial dependence; ^{Cambardella et al., 1994}) and C_{0} = {0, 1, 2, 2.5} (strong spatial dependence; ^{Cambardella et al., 1994}), the spatial estimates of these sampling designs had low similarity in 100% of simulations with OA values.
For systematic sampling (Figures 2a), the findings described above are valid for all simulations considering the nugget effect values that describe the georeferenced variable as having weak spatial dependence; and most nugget effect values that consider the georeferenced variable as having moderate spatial dependence.
The systematic sampling showed the highest measurements of accuracy compared with the others, especially for C_{0} above 5. For C_{0} = 8, which belongs to the same classification as spatial dependence intensity for the data simulated with nugget effect equal to 9, simulations increased (20%) with OA ≥ 0.85, which indicates high similarity between spatial estimations carried out at nonsampled sites (^{De Bastiani et al., 2012}).
Based on the results, the systematic design had lower sensitivity to the spatial estimation process, with changes in nugget effect value, when compared with the random sampling and lattice plus close pairs. This is due to the poor quality of nugget effect estimation in systematic sampling (Table 1 and Figure 1), which hence produced a low quality in kriging estimation.
SPATIAL VARIABILITY ANALYSIS OF SOIL CHEMICAL PROPERTIES
Descriptive statistics for carbon (C), calcium (Ca), magnesium (Mg), and pH are given in Table 3. In analyzing these values, we can note that all parameters presented data with low dispersion and homogeneity. According to crossvalidation criteria, the Gaussian model was the bestestimated model of the semivariance function for C, Ca, and Mg. As for pH, the bestestimated model was the exponential one. The spatial dependence radius for C, Ca, Mg, and pH were 254.90, 639.90, 685.47, and 300 m, respectively.
Parameter  Mean  Model  Nugget (C_{0})  Sill (C_{1})  Range (a)  SDI  RNE 

Carbon (C)  26.93  Gaussian  5.512  4.994  254.900  6.92%  52.47% 
(3.25)  (1.063)  (1.993)  (64.207)  
Calcium (Ca)  5.20  Gaussian  1.432  0.5859  639.900  10.60%  70.96% 
(1.37)  (0.214)  (0.422)  (269.636)  
Magnesium (Mg)  2.25  Gaussian  0.401  0.076  685.474  6.25%  84.03% 
(0.69)  (0.059)  (0.066)  (416.130)  
pH  5.10  Exponential  0.099  0.053  300.000  8.26%  64.87% 
(0.39)  (0.027)  (0.033)  (264.550) 
All parameters showed nugget effect values higher than sill ones. If we associate this with simulation results, we could notice that these high nugget effect values for all parameters generate loss of efficiency in spatial estimation by kriging. However, as reported by ^{Webster & Oliver (2007)}, such values should not be disregarded because the model must be correctly estimated and incorporate the estimated nugget effect value. Furthermore, in analyzing spatial dependence degree, the parameters C, Ca, and pH are classified as with moderate (25% < RNE ≤ 75%) while Mg as with weak (RNE > 75%) (^{Cambardella et al., 1994}).
Based on quartiles, the same analysis was performed for SDI since it makes up the strictest index to evaluate degree of spatial dependence. This approach was used to rank C, Ca, and Mg within a weak spatial dependence and pH within a moderate degree (^{Seidel & Oliveira, 2014}). When comparing the two indices, we found a difference for C and Ca, which is related to the range value and the estimated model.
Figure 3 displays the thematic maps of estimates for each factor. We observed that the thematic maps for carb2on (Figure 3a) and pH (Figure 3d) had less smoothing as for distribution of estimates in the area, whereas calcium (Figure 3b) and magnesium (Figure 3c) had more smoothed maps.
Calcium and magnesium showed weak spatial dependence and nugget effect values higher than those of sill (Table 3). Such finding emphasizes the influence of nugget effect on the spatial estimation of georeferenced parameters in nonsampled sites, and as a result, a greater thematicmap smoothing for high nugget effect values.
According to ^{Webster & Oliver (2007)} and ^{Hofmann et al. (2010)}, increasing values of nugget effect provide an improved distribution of weights in spatial estimation, which hence generates smoother thematic maps. The nugget effect can be decreased by shortening gaps between samples, i.e., increasing sample density (^{Kestring et al., 2015}); however, it is usually unfeasible for soil properties due to the costly requirements.
The spatial variability map for carbon (Figure 3a) showed that lower carbon contents are located midwestern, while high levels are mainly within the northern region. The thematic maps for calcium (Figure 3b) and pH (Figure 3d) presented reduced levels mainly within the southern region. Figure (3c) presents the thematic map for magnesium, in which can be seen that the northern and southern regions have the lowest magnesium contents of the studied area.
CONCLUSIONS
The behavior of nearorigin semivariograms described by the nugget effect strongly affects spatial estimation by kriging of georeferenced parameters at unsampled sites. Nugget effect has negative influence on stability of this type of modeling, that is, the higher the nugget effect, the lower the kriging efficiency of spatial prediction. Nugget effect was also directly related to smoothed kriging estimates.
Systematic sampling exhibited the least accurate nugget effect estimation, the worst efficiency of spatial estimation, and the lowest degree of sensitivity to spatial changes as the nugget effect changes. Given the changes in nugget effect values, the best parameter estimates of the geostatistical model and estimates at unsampled sites occurred with the use of random design, followed by the lattice plus close pairs.
High nugget effect value was observed for all soil chemical properties. Low carbon contents were found in the midwestern region, while in the northern they were high. Calcium and pH presented the lowest values in the southern region, and the lowest magnesium levels were observed in the northern and southern regions.