GAUSSIAN SPATIAL LINEAR MODEL OF SOYBEAN YIELD USING BOOTSTRAP METHODS

This study aims to quantify the uncertainties associated to the parameters of a Gaussian spatial linear model (GSLM) and the assumption of normality residuals in the modeling of the spatial dependence of the soybean yield as a function of soil chemical attributes. The spatial bootstrap methods were used to determine the point and interval estimators associated with the model parameters. Hypothesis tests were carried out on the significance of the model parameters and the quantile-quantile probability plot was elaborated to verify the data normality. The uncertainties associated to the parameters of the spatial dependence structure were quantified and the potassium content, phosphorus content and soil pH covariates were significant to explain the soybean yield mean. These covariates were used in the elaboration of a new model, which provided the elaboration of a contour map of soybean yield. Analysis of the quantile-quantile plot indicated that soybean yield data follow a normal probability distribution.


INTRODUCTION
Soybean is the most important legume species in the world (Gallon et al., 2016) and it stands out in the Brazilian agribusiness for its high productive potential in different regions (Cruz et al., 2016).Soybean is widely used for the elaboration of animal feed and oil and as Ávila & Albrecht (2010) explained, it has been emphasized as an alternative in the prevention of diseases and in human food, being able to be transformed into several protein foods.
Among the studies involving soybean yield, we highlight the geostatistics methods used to detect the spatial variability in the crops (Borssoi et al., 2011;Kestring et al., 2015;Guedes et al., 2016).Although geostatistics allows the understanding of spatial variability of soybean yield, the samples used in the analyzes are generally few and sparse (Pardo-Igúzquiza & Olea, 2012), then there are uncertainties associated with the results obtained.
The uncertainties occur because the spatial linear model is estimated by parametric methods, such as those presented by Mardia & Marshall (1984).To quantify the uncertainties associated with the results of a geostatistics analysis, an alternative to traditional inference methods is the use of spatial bootstrap resampling (SB) (Solow, 1985) and parametric spatial bootstrap (PSB) methods (Tang et al., 2006), adaptations of the bootstrap method (Efron, 1979) for spatially dependent data.
The bootstrap method is well known and has been used in studies involving independent samples of soybean yield (Dalposso et al., 2016;Gupta & Manjaya, 2016).The bootstrap methods for spatially dependent data have been highlighted in the literature, due to the importance of uncertainty modeling in the analyzes, as can be observed in the works of Kang et al. (2008), Schelin & Sjöstedt-De Luna (2010), Olea &Pardo-Igúzquiza (2011) andPardo-Igúzquiza &Olea (2012).
The aim of this study was to use spatial bootstrap (SB) and parametric spatial bootstrap (PSB) methods to quantify the uncertainties associated with spatial dependence structure parameters and the assumption of residuals normality distributed in spatial dependence modeling of soybean yield with chemical attributes of the soil considered as covariates.

Study area and data
The data set was collected in the 2012/2013 agricultural year and comes from an area (Figure 1) of 167.35 ha located in the western region of Paraná, Brazil, near the municipality of Cascavel, with central coordinates defined by latitude 24º57'25''S and longitude 53º34'29''W, and average altitude of 714 m.According to the Köppen classification, the climate of the agricultural area is the Cfa type (Aparecido et al., 2016) and the soil is classified as Oxisol (Embrapa, 2009).

Consider a
Gaussian stochastic process , with , being the twodimensional Euclidean space.Suppose that the data of this process, , are recorded in known spatial locations ( , and generated by the following model: (1) where, : an vector of the variable of interest; : an vector that represents the mean of the process, and : an vector of errors, with zero mean vector and covariance matrix , where , for .
The mean vector can be written as a linear model , where is a vector of unknown parameters and is an matrix formed with covariates at site .According to Mardia & Marshall (1984) Bastiani et al., 2015).In order to identify the spatial dependence structure of soybean yield, the omnidirectional experimental semivariogram was constructed using the Matheron estimator (Cressie, 2015) and to model the structure of spatial dependence were used models of the Matérn family (Matérn, 1986), presented in [eq.( 2)]: (2) where, : Bessel function of the third order type ; : Gamma function, and : Euclidean distance between points and .
For second-order stationary Gaussian and isotropic processes, the semivariance function has the following relation: (Uribe-Opazo et al.,

2012), for , being and the variance of
In [eq.( 2)], the form parameter k controls the behavior close to the origin and the process analytical smoothing.In this study, we considered the fixed values (Diggle & Ribeiro-Jr., 2007).The parameters estimation of the models was carried out using the maximum likelihood (ML) method and the criteria for selecting the geostatistics model for the covariance matrix were the cross-validation (CrV), the trace of the asymptotic covariance matrix of the estimated mean (Tr) and the maximum value of the loglikelihood function (LML) (De Bastiani et al., 2015).To investigate the significance of covariates, the likelihood ratio test (Rao, 1973) was used, and the kriging with external drift was used to obtain the predicted values (Wackernagel, 1995).

Bootstrap for spatial data
In this study, the spatial bootstrap (SB) methods, proposed by Solow (1985) and presented in Algorithm 1 and parametric spatial bootstrap (PSB), proposed by Tang et al. (2006) and presented in Algorithm 2 were used.
Algorithm 1: Spatial Bootstrap (Solow, 1985).As Tang et al. (2006) explained, the BSP method does not uncorrelated the residuals vector as in the BS method.Instead, the residuals are generated independent from a standard normal distribution.The theoretical basis of the BSP method can be seen in the study of Sjöstedt-De Luna & Young (2003).

Quantification of uncertainties in geostatistics analysis
The Algorithms 1 and 2 were used to determine B = 1000 bootstrap samples from the soybean yield dataset.For each sample, a model was adjusted, which allowed to construct the empirical distribution of the parameters of the model and, consequently, to determine point and interval estimators of the parameters using the percentile bootstrap method (Efron, 1982).
To verify the assumption of normality, the uncorrelated residuals obtained by the spatial bootstrap method (Algorithm 1) were ordered and plotted versus the theoretical quantis of the standard normal distribution, resulting in a quantile-quantile plot (q-q plot).

Computational resources
The calculations performed in this study were developed in the R software (R Core Team, 2017).The criteria used to choose the best model, the likelihood ratio test, the bootstrap methods and the multivariate plots q-q plots were implemented by the authors.
The calcium (Ca) contents varied from 2.30 cmolcdm -3 to 13.10 cmolc-dm -3 and the magnesium (Mg) contents ranged from 0.85 cmolc dm -3 to 5.75 cmolc dm -3 (Table 1), and they were classified as high (Tomé Jr., 1997).Analyzing the classification of the coefficient of variation (CV) (Table 1) proposed by Pimentel Gomes (1985), the high dispersion of potassium (K) and phosphorus (P) contents and homogeneity of pH data are highlighted.
In Table 2, in all models of spatial dependence fitted for covariance matrix ( ), the estimated values of the nugget effect parameter ( ) were equal to zero, the estimated values of the contribution parameter ( ) were similar and the estimated values of the range parameter indicated a radius of spatial dependence ranging from 109 to 112 m.According to the criteria CrV, LML and Tr (Table 2), the best fit for the covariance matrix (Σ) was provided by the Matérn model with parameter k = 4.5.practical range; LML: maximum value of the logarithm of the likelihood function; Tr: trace of covariance matrix.The values in bold indicate the best fit for the covariance matrix according to the used criterion.
According to the fitted Gaussian linear spatial model (GSLM), the value associated with the potassium content (K) (Table 3) was the highest and presented a positive sign, indicating that it was the covariate that contributed the most to an increase in the average of the soybean yield (t ha -1 ).This result corroborates with the study of Dos Passos et al. (2015), considering that this nutrient is the second most absorbed by the soybean plant, being essential for the growth and development of the plants (Li et al., 2015).The estimates of the parameters associated to the covariates of phosphorus (P) and manganese (Mn) contents indicated that these covariates contributed little to the estimated soybean yield (Table 3).
The fact that the nugget effect is null indicates that, at small distances, spatial dependence is also small (Vieira & Gonzalez, 2003), meaning that the distances considered among the samples were adequate.
To evaluate the reliability of the parameter estimates, the descriptive statistics of the bootstrap replicates obtained by the BS and BSP methods were calculated and the Efron percentile confidence intervals were determined (Table 4).
From the B = 1000 models adjusted by the BS method, two models (0.2%) were excluded.By the BSP method, only one model (0.1%) was excluded.These exclusions were carried out due to, in these settings; the radius of spatial dependence was greater than the maximum distance between samples (1766 m) in the monitored area.As pointed out by Dalposso et al. (2009), the models that present this behavior should be discarded, considering information that goes beyond the area under study.
For both bootstrap methods, the confidence intervals for the parameters associated with calcium (CA), magnesium (Mg) and manganese (Mn) contents were zero, indicating that these covariates may not be significant (Table 4).
Although the zero value is contained in the confidence intervals of the bootstrap replicates of the contribution ( ) (Table 4), there is evidence to assume that the contribution is not null, since in addition to the value obtained in the original sample, 0.18, 94% of the bootstrap replicates obtained by the SB method and 95.1% of the bootstrap replicates obtained by the PSB method presented contribution different than zero.Despite the bootstrap methods used to generate replicates of the soybean yield data set, some models provided high range parameters, emphasizing the positive asymmetry (Table 4).
When comparing the standard deviations of the model parameters (Table 3), with the respective standard deviations obtained by the bootstrap (Table 4), except for the standard deviation of the range parameter ( ), the others presented similar values.In the analysis of significance of the GSLM parameters (Table 5) the hypotheses , and are not rejected at 5% significance, demonstrating that spatial bootstrap methods are an alternative to test the individual effect of covariates.
Due to the parameters associated with calcium (Ca, cmolc dm -3 ), magnesium (Mg, cmolc dm -3 ) and manganese (Mn, mg dm -3 ) contents were not significant (Table 5), a new spatial linear model considering potassium (K, mg dm -3 ), phosphorus (P, mg dm -3 ) and soil pH was adjusted, which presented significant parameters.According to the CrV, LML and Tr criteria, the best fit for the GSLM was provided by the Matérn model with parameter of k form = 4.5 (Table 6).TABLE 6. Parameters estimated by ML for the linear spatial model considering the covariance function Matérn with form parameter k = 4.5 and the covariates K, P and pH.
Comparing the complete model (Table 3) with the model elaborated with the covariates identified as significant (Table 6), the value associated with potassium content (K) remained high and positive.The value near zero of the parameter associated with the phosphorus content indicates that its influence on the average yield is minimal.
In relation to soil pH, the estimated value of the parameter presented a loss of intensity, going from -0.44 to -0.16.The negative signal of this parameter evidences an inverse relation with the yield, which was already expected, since, in general, the increase of soil pH decreases the availability of the micronutrients (Sousa et al., 2007), which, consequently, can result in yield losses.
Observing the productivity contour map (Figure 2) elaborated with the parameters of the fitted spatial linear model (Table 6), the majority of the area (84%) presented yield values varying from 3.175 t ha -1 to 3.592 t ha -1 .FIGURE 2. Map of soybean yield generated using kriging with external drift.
Another characteristic observed in the soybean yield map (Figure 2) is the presence of circular regions centered on the sample points.These regions represent the phenomenon known as bull eyes effect that, as explained by Menezes et al. (2016), it occurs when the model has a spatial dependence radius less than the distance between neighboring sample points.Analyzing the distances between the pairs of sample points in the monitored area (Figure 1), only 33 pairs of points (0.68%) had a distance less than the spatial dependence radius (121 m).Thus, for presenting a behavior similar to a pure nugget effect, as highlighted by Margalho et al. (2014), the map was expected to show constant values.
Analyzing the q-q plot chart (Figure 3), traced points are on or near the straight line that passes through the ordered pairs ( and ( that and represent, respectively, the quantiles of order 0.25 and 0.75.As Fowlkes (1987) explained, this fact indicates that the theoretical quantis are in agreement with the quantis observed; therefore, it is coherent to assume that the data of soybean yield follow a normal distribution.FIGURE 3. Quantile-quantile plot of soybean yield (t ha -1 ).In particular, the straight line that passes through ordered pairs formed with quantiles of order 0.25 and 0.75.

CONCLUSIONS
The SB and PSB methods allowed to quantify the uncertainties associated to the spatial dependence structure and to evaluate the individual significance of the parameters associated to the average of the spatial linear model, allowing the determination of a model with a smaller number of parameters.
The fact that the linear spatial model formed with the potassium (K, mg dm -3 ), phosphorus (P, mg dm -3 ) and soil pH covariates had a spatial dependence radius close to the minimum distance between points, provided a map characterized by circular regions and a high amount of near-average values.
The use of the uncorrected residuals in the q-q plot elaboration allowed verifying the normality assumption of the soybean yield data, a presupposition that, in many cases, is simply assumed without verification.This finding is extremely important in the analysis since a violation of the normality assumption can invalidate statistical hypothesis tests.

FIGURE 1 .
FIGURE 1. Geographic location map of the study area.
of uncorrelated residuals; d) Considering the elements of the vector of uncorrelated residuals, carry out resamples with replacement to form the vector of bootstrap residuals ; e) The spatial bootstrap sample is obtained by recorrelating bootstrap residuals .Algorithm 2: Parametric Spatial Bootstrap (Tang et al., 2006).a) Considering the spatial data set { ,..., }, determine the residue vector being the ML estimator of and the ML estimator of ; b) Use the Cholesky decomposition method to obtain , that is a lower triangular matrix of order n; c) Use the standard normal distribution N (0.1) to create a vector of size , called the parametric bootstrap vector , that for ; d) The spatial bootstrap sample is obtained by recorrelating bootstrap residuals .

TABLE 1 .
Descriptive analysis of soybean yield and covariates.

TABLE 2 .
Criteria for selecting the soybean yield model elaborated with covariates considering the covariance function Matérn with different form parameters.
k: form parameter of the Matérn model; CrV: cross validation;: estimator of the nugget effect; : estimator of contribution; : estimator of the parameter that defines the range;

TABLE 3 .
Estimated parameters for the linear spatial model by the maximum likelihood method considering the covariance function Matérn with form parameter k = 4.5.

TABLE 5 .
Likelihood ratio test (LR) value and p-value for hypotheses , in the linear spatial model.
LR: likelihood ratio test, p-value: descriptive level of the test at 5% significance.