INTRODUCTION
Knowledge of spatial variability of soybean yield and its relationship to soil chemical properties are essential for proper crop management (^{Sobjak et al., 2016}). However, many precision agriculture users get disappointed trying to find the ideal variablerate application of the nutrient based on the prescription map, because does not always correspond to the soybean yield map generated after the intervention. One of the main explanations is related to complexity of the soil, which is considered a dynamic system whose functionality arises from interactions between physical, chemical, and biological components that are specific for each crop (^{Pereira et al., 2016}). Thus, a localized management system for soybean requires accurate information on spatial variation and the interaction between soil chemical properties and their relationship to yield (^{Pagani and Mallarino, 2015}; ^{AlKaisi et al., 2016}; ^{Dalla Nora et al., 2017}).
One of the methods used in such characterization is geostatistics, which takes soil spatial variation patterns into account and provides techniques that enable construction of maps associated with one or more soil chemical properties (^{Cressie, 2015}). One of the different methods used in geostatistical studies is spatial linear models, which have been widely evaluated by assuming a Gaussian stochastic process (^{UribeOpazo et al., 2012}; ^{Grzegozewski et al., 2013}; ^{Nesi et al., 2013}; ^{De Bastiani et al., 2015}). This modeling enables estimation of spatial dependence parameters through the maximum likelihood method (ML), making inferential studies possible.
However, ^{Assumpção et al. (2014)} and ^{Schemmer et al. (2017)} reported on the sensitivity of a Gaussian distribution for atypical values and errors with heavier tails than normal and stated that such modeling may generate unrealistic maps. Thus, one of the alternatives to avoid this problem would be to use more robust models based on heavier tails. Following this line of reasoning, multivariate slash distribution is quite attractive because it has an additional parameter, here designated as η (0 < η < 1) which allows kurtosis adjustment, making the modeling more flexible in the presence of atypical values (^{Osorio et al., 2009}; ^{Alcantara and Cysneiros, 2013}). This distribution was first introduced by ^{Lange and Sinsheimer (1993)}, belonging to the class of scale mixtures of normal distribution. The basic idea behind this distribution class is to insert randomness in the covariance matrix, as well as in the mean vector of the multivariate normal distribution, through a strictly positive mixture variable. This allows generalization of a multivariate normal distribution, preserving the main properties.
Although it creates robust models, spatial modeling based on the slash distribution may be affected by influential observations. Therefore, it is important to evaluate its sensitivity through influence diagnostics, which may be carried out by different evaluation methods, one of them being local influence assessment. The aim of this assessment is to evaluate the goodness of fit of the model, and the robustness of its estimates when small perturbations are introduced in the model and/or in the dataset (^{Cook, 1986}; ^{Zhu and Lee, 2001}; ^{Jonathan et al., 2016}).
Regarding georeferenced data, ^{UribeOpazo et al. (2012)} evaluated the sensitivity of covariance function estimators and linear predictors under small dataset perturbations and/or a spatial linear model with a normal distribution. ^{Assumpção et al. (2011)} presented techniques for local influence for spatial analysis of soil physical properties and soybean yield using Student's tdistribution. ^{Grzegozewski et al. (2013)} considered a Gaussian spatial linear model (GSLM) and information about soil macro and micronutrients in evaluating the effects of influential observations on spatial model selection, parameter estimation by maximum likelihood, and characterization of spatial continuity of soybean yield. In these studies, the authors emphasized that influential values can modify the interpolated maps and may generate inaccurate predictions.
The aim of this study was to propose a spatial linear model based on the slash distribution (SSLM) to characterize the spatial variability of soybean yields as a function of soil chemical properties.
MATERIALS AND METHODS
Experimental site
We gathered data from a field experiment set up in a commercial grain production area in which the notillage system has been practiced since 1994. The area is located in the municipality of Cascavel, in the west of the state of Paraná, Brazil. It lies at the geographical coordinates of approximately 24.95° S, 53.57° W, with an average altitude of 650 m. Soil at the location is classified as a Latossolo Vermelho Distróférrico (^{Santos et al., 2013}), which corresponds to Rhodic Hapludox (^{Soil Survey Staff, 2014}), with an average slope of around 4 %. The temperate climate is mesothermal and highly humid, classified as Cfa (Köppen system); and mean annual temperature is 21 °C. Samples were taken during the 2014/2015 crop season in a 127.18 ha area. For this study, we used a lattice plus close pairs design (^{Diggle and Ribeiro Junior, 2007}), with a distance of 141 m between points belonging to the regular grid and, in some random locations, shorter distances of 75 and 50 m between points, thereby obtaining 78 locations. All the collected data was georeferenced by a GPS (Global Positioning System) receiver, Geoexplorer^{®} 3 (Trimble^{®}), under a UTM coordinate system, zone 22 south and datum WGS 84 (Figure 1).
The response variable considered in the model was soybean yield, which was estimated by the quantity of grains harvested in an area of 0.90 m^{2} centered on each georeferenced point. After harvest, the grain from each plot was weighed and moisture level was corrected to 13 %; this weight was then converted into t ha^{−1}. As explanatory variables (covariates), we selected P (mg dm^{3}), K (mg dm^{3}), pH, and organic matter (OM) (g dm^{3}). Five samples were collected at a depth of 0.000.20 m to determine the chemical contents.
These samples were taken near each georeferenced point and were then mixed in order to have a representative composed sample for each plot. Analysis was carried out in the laboratory of the Cooperativa Central de Pesquisa Agrícola (Coodetec, Brazil), whose methodology is available in ^{Costa and Oliveira (2001)}.
These covariates were chosen based on the following considerations: (i) high kurtosis associated with indexes outside the range of −2 to +2 (^{Casella and Berger, 2010}); (ii) structure of spatial dependence of soil chemical properties, identified from the construction of omnidirectional experimental semivariograms using the Matheron estimator (^{Soares, 2014}; ^{Cressie, 2015}); (iii) P increases crop yield at early reproductive stages and is applied in large amounts since Brazilian soils are poor in this element; (iv) K plays an important role in plant photosynthesis and respiration, assists in formation of starch and sugars, and produces vigorous and resistant plants; (v) Brazilian soils are acidic (low pH), which limits crop yield; and (vi) OM provides nutrients for sustainable crop production and increases water infiltration into the soil, reducing erosion (^{Santos et al., 2013}).
Spatial modeling and parameter estimation
In order to build the data model from the spatially correlated variables, we considered a stochastic process {Y(s_{i}), s_{i}) ∈ S} defined in a region S ⊂ ℛ^{2}, where each element Y (s_{i}) related to soybean yield has known locations s_{i}, i = 1, …, n. We also consider the process as stationary, wherein Y = (Y (s_{i}), …, Y(s_{n}))^{T} follows a multivariate slash distribution (^{Lange and Sinsheimer, 1993}) in which the second moment is finite, being that Y~SL_{n} (Χβ, Σ, η), where E(Y) = Χβ and Coν(Y) = Σ. In addition, X is a matrix formed by the vector 1's and the covariates P, K, pH, and OM; β = (β_{o}, …, β_{q})^{τ}, the vector of unknown parameters associated with each covariate; and η, the parameter of kurtosis adjustment. Under these conditions, the SSLM was expressed by equation 1:
in which ε(s_{i}) is the random [ε~SL_{n} (O, Σ, η)].
Spatial dependence was determined by the covariance matrix Σ, whose parametric form is given by Σ = ϕ_{1} l_{n} + ϕ_{2} R(ϕ_{3}), where l_{n} is the identify matrix; ϕ_{1} ≥0 is the nugget effect; ϕ_{2} ≥0 is the partial sill; ϕ_{3} is the parameter related to the spatial dependence radius (range); and R(ϕ_{3}) = [(r_{ij})] is the correlation matrix for the soybean yield dataset observed between the points located at s_{i} and at s_{j} (^{UribeOpazo et al., 2012}).
We considered the Matérn family covariance function (^{Matérn, 1986}) to explain spatial dependence with the smoothness parameter κ ∈ {0.5, 1.5, 2.5}, whose relationship to the practical range is given by 3 ϕ_{3}, 4.75 ϕ_{3}, and 5.92 ϕ_{3}, respectively (^{Diggle and Ribeiro Junior, 2007}).
We estimate the unknown vector of parameters of the model θ = (β^{T}, ϕ^{T})^{T}, with ϕ = (ϕ_{1}, ϕ_{2}, ϕ_{3})^{T}, by maximizing the loglikelihood function (ML), whose form is expressed by the equation 2:
in which a = (n/2 + η^{−1}) and b = c(η)δ/2, with δ = (Y  Χβ)^{T}
Σ^{−1} (Y – Χβ) and c(η) = (1 – η)^{−1}, for 0 < η < 1, where η stands for the parameter of kurtosis adjustment. Here, the ML method defines the estimator
The ML estimator was calculated using an expectationmaximization (EM) algorithm, which consists of a computational method that generates approaches iteratively by maximizing expectation of the logarithm of the likelihood function for the completed data set, called the Qfunction, which is expressed by the equation 3:
in which
To avoid identifiability problems in ML estimations, the parameters of kurtosis adjustment (η) and smoothing (κ) of the Matérn family model were determined through crossvalidation (CrV) (^{De Bastiani et al., 2015}) and Trace criterion (Tr) (^{Kano et al., 1993}).
Significance (hypothesis) testing on the estimated parameters β_{s} was carried out by the likelihood ratio statistic (LR), given by the equation 4:
where
Local influence analysis
After defining the parameters, an interpolation map was built by regressionkriging in order to visualize soybean yield variability as a function of the covariates (P, K, pH, and OM) (^{Soares, 2014}). To investigate the presence of observations that may have interfered in the process of interpolation, a local influence analysis was performed.
As proposed by ^{Cook (1986)}, we examined possible influential observations on the soybean yield response variable by considering the likelihood displacement (Equation 2) after we insert a perturbation vector ω (noise). We used the expression
In a similar manner, influence analysis was performed on the Qfunction (Equation 3), which was considered the reference measure h_{[Q]max}. For more details, see ^{Zhu and Lee (2001)} and ^{Assumpção et al. (2011}, ^{2014}).
To evaluate the influence on the linear predictor, we considered the methodology presented by ^{Assumpção et al. (2011)} applied to the SSLM (Equation 1). In this case, maximum influence directions were denoted by L_{p} and Q_{p}. Analysis was carried out using R software 3.3.2 for Windows (^{R Development Core Team, 2016}).
RESULTS AND DISCUSSIONS
Table 1 shows the exploratory data analysis for the response variable and covariates, i.e., the soil chemical properties (P, K, pH, and OM). Average soybean yield was 2.37 t ha^{−1}, which is lower than the national average (3.03 t ha^{−1}) for the 2014/2015 season (^{IBGE, 2016}). Standard deviation and coefficient of variation were 0.87 t ha^{−1} and 11.48 %, respectively, exhibiting low data dispersion around the mean and homogeneity (^{Carvalho et al., 2003}).
Variable  Y  P  K  pH(H_{2}O)  OM 

t ha^{1 (1)}  mg dm^{3}  mg dm^{3}  g dm^{3}  
Mean  2.37  19.19  0.31  4.82  50.63 
Standard deviation  0.07  123.56  0.02  0.17  40.05 
Coefficient of variation (%)  11.48  57.91  45.35  8.43  12.50 
Skewness  0.51  1.34  0.60  1.08  0.02 
Kurtosis  3.11  4.73  2.44  4.31  2.44 
^{(1)}The units of measure refer only to the mean and standard deviation. P and K: extractor Mehlich1; pH(H_{2}O): pH in water at a ratio of 1:2.5 v/v; OM: organic matter (^{Walkley and Black, 1934}).
Quartile analysis indicated an atypical value of 3.18 t ha^{−1}, corresponding to sample 33. Standard analysis of the directional semivariograms (omitted here) towards 0°, 45°, 90°, and 135° showed similar behavior for all directions, indicating that the spatial dependence structure is isotropic.
Descriptive statistics of soil chemical properties (Table 1) indicate very high levels of P and K for soybean (^{Costa and Oliveira, 2001}). Moreover, the standard deviation and coefficient of variation for both P and K suggested average dispersion around the mean and heterogeneity (^{Warrick and Nielsen, 1980}). High variations in P and K contents may be due to continuing fertilization at a fixed rate along the plant row, affecting micro and macro spatial variability (^{Amado et al., 2009}). Another factor may be the mobility of these chemical elements in the soil. For example, as a monovalent cation, K is easily leached absorbed, fixed, adsorbed, or stabilized in the soil solution, generating large variability. Therefore, these variations may reflect the effects of the complex interaction between soil chemical properties and crop management practices (^{Bottega et al., 2013}; ^{Pereira et al., 2016}).
Soil pH was classified as highly acidic (from 4.31 to 5.00), with low dispersion around the mean. Organic matter contents, in turn, were rated as high (from 35.01 to 60.00 g dm^{3}), with a coefficient of variation of 12.50 % (^{Warrick and Nielsen, 1980}; ^{Costa and Oliveira, 2001}). Large amounts of OM and high acidity might be associated with the notillage system (NTS) implemented in the area since 1994. The amount of straw increases on the soil surface in areas under NTS for long periods, which shifts the quantity and quality of OM and gradually alters pH due to basic cations and soluble organic carbon (^{Dalchiavon et al., 2013}). In addition, NTS may have limited the action of liming, restricting it to the areas of application, i.e., in the surface soil layers (^{Nunes et al., 2011}; ^{Paganiand Mallarino, 2015}; ^{Dalla Nora et al., 2017}).
All variables showed high kurtosis (Table 1) (^{Casella and Berger, 2010}). A process modeled under this condition may produce a model of interpolation with biased parameters and, consequently, generate overestimated/underestimated regions in the map. Our proposed model based on the slash distribution is able to reduce this impact by determining the shape parameter η, which will allow kurtosis adjustment of the data.
All the samples and the SSLM were considered in studying the dataset, assuming Y~SL_{n} (Χβ, Σ, η). For comparison, a Gaussian spatial linear model (GSLM) was also used, in which Y~N_{n} (Χβ, Σ). Thus, we could assess the model robustness and the effect of outliers and/or influential values on parameter estimates and mapping. Both criteria, CrV and Tr, indicated that for an SSLM, the adequate value for kurtosis adjustment was η = 0.25, and for the Matérn model, smoothing was κ = 0.25. For a GSLM, an adequate model for explaining spatial dependence was also the Matérn model, with a smoothing parameter of κ = 0.25.
Table 2 shows the parameter estimated by the ML estimator. The asymptotic standard errors (in brackets) were calculated from Fisher's information matrix. It may be noted that the estimated values of the parameter vector β were similar in both models, with the lowest standard deviations observed for the GSLM, except for
Model  Structure  Estimated parameter(1)  










GSLM  Matérn κ = 2.5  2.030  0.003  0.121  0.033  0.003  0.053  0.018  52.666 
(0.412)(2)  (0.003)  (0.228)  (0.076)  (0.005)  (0.059)  (0.059)  (0.007)  
SSLM η = 0.25  Matérn κ = 2.5  1.993  0.003  0.109  0.034  0.004  0.051  0.026  81.988 
(0.516)  (0.003)  (0.288)  (0.095)  (0.006)  (0.036)  (0.013)  (0.006) 
^{(1)}β_{i}: parameters associated with the variable i = [soybean yield (Y), P, K, pH, and OM];
^{(2)}The lowest values of the asymptotic standard deviations are underlined.
It is noteworthy that the large values of coefficient of the parameters were associated with K (
Another issue that may be related to pH is the low value of the
Comparing values estimated for parameters related to the covariance of the spatial process, relevant differences were found, with the lowest standard deviations being registered in the SSLM. By calculating the relationship ϕ_{1}/(ϕ_{1} + ϕ_{2}), we found that SSLM had an index of 0.66, while GSLM had an index of 0.75. The smoothing parameter was κ = 0.25 for both models, and this confirms that the practical range of SSLM was 485.37 m, whereas that of GSLM was merely 311.78 m. For ^{Cambardella et al. (1994)}, the best estimates are reached when models are based on covariance functions with the lowest “nugget effect/sill” ratio and the highest range.
Using data from table 2, we were able to establish the GSLM of the soybean yield at S_{i} for the area under study, which was expressed by equation 5:
with i = 1, …, 78 and the covariance matrix given by
with i = 1, …, 78 and the covariance matrix given by
In both cases, the correlation matrix elements R were determined by the correlation function of the Matérn family with κ = 0.25.
The hypothesis test to assess the significance of parameter vector β was applied jointly and individually for both models. Table 3 displays the results of the LR and respective pvalues. The null hypothesis β_{1} = β_{2} = β_{3} = β_{4} = 0 was rejected at 5 % probability for both models. Individual testing also showed the parameter values of (β's) as significant. Therefore, the covariates P, K, pH, and OM remained in the modeling.
Hypothesis(1)  GSLM  SSLM  

LR  pvalue  LR  pvalue  
β_{1} = 0  6.687  0.036*  11.659  0.004* 
β_{2} = 0  7.001  0.032*  15.215  0.001* 
β_{3} = 0  5.953  0.050*  8.567  0.016* 
β_{4} = 0  6.257  0.044*  9.113  0.013* 
β_{1} = β_{2} = β_{3} = β_{4} = 0  7.865  0.039*  13.897  0.003* 
^{(1)}Parameters associated with the variable: β_{1}  P, β_{2}  K, β_{3}  pH, and β_{4}  OM.
^{*}significant at 5 % probability.
Local influence was diagnosed to evaluate the sensitivity of estimates from the model and from the predictive process to atypical and/or influential values. For GSLM, we used a perturbation scheme fitted to a normal distribution, as presented by ^{De Bastiani et al. (2015)}. For the GSLM, the samples exerting influence on the response variables were 15 and 72 (Figure 2a). Conversely, for the linear predictor, the number of influencing samples was higher, namely, samples 15, 58, 65, and 72 (Figure 3a).
The figure regarding local influence on the response variable perturbing the SSLM shows that no observations were identified when considered measure h_{[L]max} (Figure 2b). However, sample 71 was influential for measure h_{[Q]max} (Figure 2c). Considering perturbation of the linear predictor, samples 1 and 71 were influential (Figure 3b) when measure L_{p} was used. And, for perturbation of the Qfunction, measure Q_{p}, samples 1, 3, 62, and 71 were identified as influential (Figure 3c).
It is noteworthy that atypical sample 33 was not identified as influential on soybean yield response in any of the cases. These results corroborate those of ^{UribeOpazo et al. (2012)}, who used a GSLM, tested different influence diagnostics techniques, and observed that not every atypical value could have an influence on model determination in a covariate study. ^{Assumpção et al. (2014)}, based on Student's tdistribution, also found differences between the atypical and the influential values; they highlighted the relevance of the local influence method as opposed to a simple boxplot analysis. Furthermore, the smaller number of influential cases found here when SSLM is fitted is consistent with the results of ^{De Bastiani et al. (2015)}; these authors used distributions with heavier tails and obtained greater modeling robustness.
Figure 4 displays the postplot graph of the experimental area, highlighting the position of each influential sample. For GSLM (Figure 4a), the yield value of sample 72 was from 1.87 to 2.18 t ha^{−1} and the yield values of samples 58 and 65 were from 2.18 to 2.33 t ha^{−1}. Note that nearest observations of these samples showed soybean yield highs values. Sample 15 was found within a region with high values. When considering SSLM (Figure 4b), we found that the values of samples 1, 62, and 71 were from 2.55 to 3.18 t ha^{−1} and that of sample 3 was from 2.33 to 2.55 t ha^{−1}. Spatial analysis of the samples showed that neighboring observations to the left of sample 71 had values lower than the values of samples 1 and 3, which are close to the contour of the plotting domain.
In diagnostic analysis, when one or more influential values are detected, they are removed from the dataset to understand how their removal affects model selection, parameter estimates, and construction of the interpolation maps (^{Assumpção et al., 2011}). For GSLM, samples 15 and 72 were removed from the dataset since they were identified as influential when applying measures h_{[L]max} and L_{p}. Using the same criterion for SSLM and considering measures h_{[L]max}, h_{[Q]max}, L_{p}, and Q_{p}, samples 1 and 71 were removed. Table 4 shows the values of η and κ, which were selected by CrV and Tr criteria; this table also shows the estimated values of parameters for both models after sample removal. Furthermore, it is notable that there was no change in the choice of the model to describe spatial dependence. Nevertheless, after removing the influential samples, the estimated values of parameters changed, and especially asymptotic standard deviation showed higher values compared to the model without sample exclusion (Table 2). Moreover, we note a reduction in ϕ_{1} values and in the estimated spatial dependence radius, which is 170.73 m for GSLM and 348.57 m for SSLM.
Model  Structure  Estimated parameter(1)  










GSLM  Matérn κ = 2.5  2.066  0.003  0.102  0.039  0.002  0.000  0.074  28.840 
(0.423)(2)  (0.003)  (0.238)  (0.078)  (0.006)  (0.986)  (0.987)  (0.003)  
SSLM η = 0.20  Matérn κ = 2.5  2.172  0.004  0.164  0.053  0.002  0.050  0.010  58.880 
(0.493)  (0.003)  (0.277)  (0.009)  (0.006)  (0.030)  (0.007)  (0.003) 
^{(1)}β_{i}: parameters associated with the variable i = [soybean yield (Y), P, K, pH, and OM];
^{(2)}The lowest values of the asymptotic standard deviations are underlined.
Figure 5 shows the interpolated maps of yield as a function of the covariates studied using regressionkriging, and the estimated values of parameters are shown in tables 2 and 4. The maps were generated based on the following scenarios: S1 using the entire dataset and the GSLM (Figure 5a); S2  using the entire dataset and the SSLM (Figure 5b); S3  removing the influential observations for GSLM (Figure 5c); and S4  removing the influential observations for SSLM (Figure 5d). Four classes of the same size were considered to build the interpolated maps; these classes were obtained by dividing the estimated range of yield variation.
Comparing figure 5b to 5a, similar regions could be observed. However, the SSLM generated map (with
To quantify the similarity between the interpolated maps, the global accuracy (GA) and kappa indexes were used. According to the classification of ^{Krippendorff (2004)}, maps show low similarity if kappa < 0.67, medium similarity if 0.67 ≤ kappa ≤ 0.80, and high similarity if kappa > 0.80. With respect to GA, as stated by ^{Anderson et al. (1976)}, maps are similar if the GA index is greater than 0.85. Table 5 shows the values associated with each index. Comparing scenario S_{1} to S_{2}, the kappa index indicated medium similarity, whereas the GA index assigned similarity to the maps. However, comparing S_{1} to S_{3}, the index values mark dissimilarity between the maps, i.e., upon removal of samples 15, 58, 65, and 72, the map outlines changed. In the case of comparing S_{2} to S_{4}, the maps were similar.
Maps analyzed  Index  

kappa  Global accuracy (GA)  
S_{1} against S_{2}  0.78  0.88 
S_{1} against S_{3}  0.59  0.67 
S_{2} against S_{4}  0.87  0.86 
S_{1}: using the complete dataset and the GSLM; S_{2}: using the complete dataset and the SSLM; S_{3}: removing the influential observations for the GSLM; S_{4}: removing the influential observations for the SSLM.
Therefore, the accuracy indexes of the maps showed the lower sensitivity of those generated by the slash distribution when removing an influential value, in contrast to a normal distribution. Thus, the SSLM enabled more robust modeling in the presence of influential observations, avoiding unnecessary exclusion of samples. From the interpolated maps, the yield potential of the area and its relationship to subregions can be investigated. This allows definition of better soil management and production system strategies, such as acidity correction.
CONCLUSIONS
The map of soybean yield variability as a function of soil chemical properties generated from the SSLM was less sensitive to the high kurtosis of the data set and the presence of influential and/or atypical values, which shows the robustness of the proposed model.
Diagnostic of local influence on the response variable and on the linear predictor based on the SSLM confirmed that an atypical value might not be influential since spatial modeling takes the position of the variable within the space into account. This finding may be a determining factor in the prediction process, avoiding exclusion of samples when using this method of modeling.