SPATIAL VARIABILITY OF SOYBEAN YIELD THROUGH A REPARAMETERIZED T-STUDENT MODEL

ABSTRACT:

The t-Student distribution has been used to model the spatial dependence of soybean yield as an alternative to the normal distribution for data with heavier tails or discrepant values. However, the usual t-Student distribution does not allow a direct comparison of geostatistical results with those obtained under a normal distribution. The aim of this study was to assess the spatial variability of soybean yield through a reparameterized t-Student linear model, comparing the results with those of a Gaussian linear model. For parameter estimation, the complete maximum likelihood (CML) method was used through an expectation-maximization (EM) algorithm. The maps constructed with the reparameterized t-Student and normal distributions are dissimilar, with a kappa index (K) of 0.64. The reparameterized t-Student distribution is an alternative for studying data with discrepant values, showing the ability to decrease the influence of these points.

KEYWORDS:
EM algorithm; spatial dependence; geostatistics; complete maximum likelihood

INTRODUCTION

Geostatistics can assist precision agriculture because its techniques allow the construction of maps that describe the spatial dependence structure of yield associated with soil and plant attributes. It thus helps the producer decide on the use of agricultural inputs in appropriate quantities and locations in order to increase yield, reduce losses, and maintain environmental quality. The technique is based on the regionalized variable theory proposed by Matheron, influenced by the observations made by Krige. According to Vieira (2000), Krige analyzed gold concentration data in South Africa and observed that the variances could not be interpreted without taking into account the distance between samples. Therefore, the values of a variable distributed in space are correlated within a radius of spatial dependence.

In a spatial variability study, the results obtained by geostatistical methods can be influenced by discrepant data, leading to biased predictions (Cressie, 2015). One solution to the presence of discrepant data is the use of robust models, whose parameter estimation is less sensitive to these data. According to Manghi et al. (2016), models based on the class of symmetric distributions reduce the influence of discrepant data by incorporating additional parameters that adjust the kurtosis of the data distribution. The t-Student distribution belongs to the class of symmetric distributions, offers greater flexibility regarding the degree of kurtosis, and has an additional shape parameter ν > 0, which defines the degrees of freedom of the distribution (Assumpção et al., 2011; 2014). Lange et al. (1989) propose a reparameterization of the t-Student distribution based on a transformation of the shape parameter ν, which allows assuming the existence of a finite second moment and thus a more direct comparison with the normal distribution. This reparameterization is relevant for spatial dependence modeling because the new shape parameter η is bounded, which allows estimating the parameters by maximum likelihood (Nesi et al., 2013) and implementing the iterative EM algorithm (Dempster et al., 1977; Assumpção et al., 2014).

This study aimed to assess the spatial variability of soybean yield by means of a reparameterized t-Student linear model, comparing the results with a Gaussian linear model. For estimating these model parameters, a complete maximum likelihood (CML) method was used through an expectation-maximization (EM) algorithm.

THEORETICAL FOUNDATION

Reparameterized t-Student distribution

Much of the statistical inference involving continuous random variables is based on the normal distribution. However, reasonable inference under normality requires conditions such as symmetry and a certain value of kurtosis. Among the symmetric alternatives to the normal distribution is the t-Student distribution, which has as an additional parameter the degrees of freedom ν (ν > 0), allowing the kurtosis to be modeled. This parameter can be fixed a priori; Lange et al. (1989) recommend fixing it at ν = 4 for a small data set and estimating it for a large data set. This distribution has been widely used with real data because its tails are heavier than those of the normal distribution, which allows it to accommodate discrepant points in the data set (Lange et al., 1989; Osorio et al., 2007). Galea et al. (2002) suggest the t-Student distribution as an alternative to the normal distribution because inference based on it combines conceptual and computational simplicity with generality and is applicable in a great variety of situations. An important feature of the t-Student distribution is that, as the degrees of freedom ν increase, it approaches the normal distribution.
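The convergence of the t-Student distribution to the normal distribution as the degrees of freedom grow can be checked numerically. The sketch below is only illustrative: it uses the standard univariate densities, and the grid limits and the set of ν values are arbitrary choices that do not come from the study.

```python
import math

def student_t_pdf(x, nu):
    """Density of the standard univariate t-Student distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def max_gap(nu, lo=-6.0, hi=6.0, steps=1200):
    """Largest absolute difference between the two densities on a regular grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(student_t_pdf(x, nu) - normal_pdf(x)) for x in grid)

# Illustrative degrees of freedom (arbitrary choice, not values from the study)
nus = (4, 10, 30, 100)
gaps = {nu: max_gap(nu) for nu in nus}
# How much more density the t-Student puts at x = 4 than the normal does
tail_ratio = {nu: student_t_pdf(4.0, nu) / normal_pdf(4.0) for nu in nus}

for nu in nus:
    print(f"nu={nu:4d}  max gap={gaps[nu]:.5f}  tail ratio at 4={tail_ratio[nu]:.2f}")
```

The gap shrinks as ν increases, while the tail ratio stays above one, which is the heavier-tail behavior that makes the distribution tolerant of discrepant values.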

Lange et al. (1989) state that if a random vector $Y = (Y_1, \ldots, Y_n)^T$ has a multivariate t-Student probability density function with location parameter μ, scale matrix $V$, and ν > 0 degrees of freedom, this is denoted by $Y \sim t_n(\mu\mathbf{1}, V, \nu)$. The expectation of the random vector $Y$ is $E(Y) = \mu\mathbf{1}$ for ν > 1, where $\mathbf{1}$ is an n × 1 vector of ones, and the n × n covariance matrix of $Y$ is $\mathrm{Cov}(Y) = \frac{\nu}{\nu - 2} V = \Sigma$ for ν > 2. For ν ≤ 2, the covariance matrix $\mathrm{Cov}(Y)$ is undefined.

Lange et al. (1989) suggest the reparameterization of the t-Student distribution because it allows a direct comparison of the estimates of the mean vector and the covariance matrix with those of the model assuming normality. The authors also mention that inference improves when the degrees of freedom are transformed as $\nu = 1/\eta$.
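As a short worked step, the transformation of the degrees of freedom can be connected to the condition ν > 2 required for the covariance matrix to exist. The identities below follow directly from substituting ν = 1/η into Cov(Y) and into the constant c(η) = η/(1 − 2η) that appears in the density; they are a sketch of the algebra rather than a statement taken from Lange et al. (1989).

```latex
\begin{aligned}
\nu = \frac{1}{\eta} > 2 \;&\Longleftrightarrow\; 0 < \eta < \tfrac{1}{2},\\[2pt]
\Sigma = \frac{\nu}{\nu - 2}\,V &= \frac{1}{1 - 2\eta}\,V,\\[2pt]
c(\eta) = \frac{\eta}{1 - 2\eta} &= \frac{1}{\nu - 2}.
\end{aligned}
```

For example, η = 0.05 corresponds to ν = 20 and Σ = V/0.9, and the limit η → 0 (ν → ∞) recovers the normal case with Σ = V.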

A random vector $Y = (Y_1, \ldots, Y_n)^T$ is said to have a reparameterized t-Student distribution with fixed shape parameter η, $0 < \eta < 1/2$, covariance matrix $\Sigma$, and mean vector $E(Y) = \mu\mathbf{1}$ if its probability density function is given by [eq. (1)]:

(1) $f_Y(y) = K_n(\eta)\,|\Sigma|^{-1/2}\,\left[1 + c(\eta)\,\delta^2\right]^{-\frac{1}{2\eta}(1 + n\eta)},$

where

$K_n(\eta) = \left(\frac{c(\eta)}{\pi}\right)^{n/2} \frac{\Gamma\left(\frac{1 + n\eta}{2\eta}\right)}{\Gamma\left(\frac{1}{2\eta}\right)},$

$\delta^2 = (Y - \mu\mathbf{1})^T \Sigma^{-1} (Y - \mu\mathbf{1})$ is the squared Mahalanobis distance, $c(\eta) = \eta/(1 - 2\eta)$, and $0 < \eta < 1/2$. The notation $Y \sim \tau_n(\mu\mathbf{1}, \Sigma, \eta)$ indicates that the vector $Y$ has an n-variate reparameterized t-Student distribution.
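A numerical check can make the role of Σ in the reparameterized density concrete. The sketch below evaluates the density for the univariate case n = 1, where Σ reduces to a scalar variance, and verifies by numerical integration that the density has unit mass and variance equal to that scalar. The values of μ, the variance, and η are hypothetical, and the integration limits are an arbitrary truncation.

```python
import math

def c(eta):
    """Constant c(eta) = eta / (1 - 2*eta), defined for 0 < eta < 1/2."""
    return eta / (1.0 - 2.0 * eta)

def reparam_t_pdf(y, mu, sigma2, eta):
    """Reparameterized t-Student density for n = 1 (Sigma is the scalar sigma2)."""
    log_k = (0.5 * math.log(c(eta) / math.pi)
             + math.lgamma((1.0 + eta) / (2.0 * eta))
             - math.lgamma(1.0 / (2.0 * eta)))
    delta2 = (y - mu) ** 2 / sigma2            # squared Mahalanobis distance
    exponent = -(1.0 + eta) / (2.0 * eta)
    return math.exp(log_k - 0.5 * math.log(sigma2)
                    + exponent * math.log1p(c(eta) * delta2))

def trapezoid(f, lo, hi, steps):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Hypothetical parameter values, chosen only for illustration
mu, sigma2, eta = 3.0, 0.4, 0.2

def pdf(y):
    return reparam_t_pdf(y, mu, sigma2, eta)

mass = trapezoid(pdf, mu - 100.0, mu + 100.0, 40000)
variance = trapezoid(lambda y: (y - mu) ** 2 * pdf(y), mu - 100.0, mu + 100.0, 40000)
print(mass, variance)
```

Because the variance of the distribution equals the Σ entry itself, estimates of Σ under this parameterization are on the same scale as those of a Gaussian model, which is what permits the direct comparison.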

Spatial linear model

For the study of spatial dependence, $\{Y(s_i), s_i \in S\}$ is considered a second-order stationary stochastic process, where $S \subset \mathbb{R}^2$ and $\mathbb{R}^2$ is the two-dimensional Euclidean space. Let $Y = (Y(s_1), \ldots, Y(s_n))^T$ be an n × 1 vector of the response variable at the known spatial locations $s_i$, with i = 1, …, n. The georeferenced variable $Y(s_i)$ can be written as:

(2) $Y(s_i) = \mu(s_i) + e(s_i),$

where the deterministic term is $\mu(s_i) = x_i^T \beta$, $x_i^T = (x_{i1}, \ldots, x_{ip})$ is a 1 × p vector of explanatory variables at position $s_i$, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the p × 1 vector of unknown parameters to be estimated, and $e(s_i)$ is a spatially correlated random component.

Equation (2) can be written in matrix form as:

(3) $Y = X\beta + \varepsilon,$

where $X$ is an n × p matrix of full column rank with rows $x_i^T$, and $\varepsilon = (e(s_1), \ldots, e(s_n))^T$, with i = 1, …, n. The random errors $e(s_i)$ are assumed to have zero mean, i.e. $E[e(s_i)] = 0$, and the variation between points in space is determined by a covariance function $\mathrm{Cov}[e(s_i), e(s_u)] = \mathrm{Cov}[Y(s_i), Y(s_u)] = C(s_i, s_u) = \sigma_{iu}$ for i, u = 1, …, n. The spatial modeling given in [eq. (3)] depends on the structure of the covariance matrix $\Sigma = [(\sigma_{iu})]$ of the stochastic process $Y$, where $\sigma_{iu} = C(s_i, s_u)$ for i, u = 1, …, n. The covariance function $C(s_i, s_u)$ is used to study the spatial dependence of the stationary process and is specified by a three-dimensional vector $\varphi = (\varphi_1, \varphi_2, \varphi_3)^T$ in the form given in [eq. (4)] (Uribe-Opazo et al., 2012):

(4) $\Sigma = \varphi_1 I_n + \varphi_2 R,$

where $\varphi_1$ is the nugget effect ($\varphi_1 \geq 0$), $\varphi_2$ is the sill ($\varphi_2 \geq 0$), and $R = R(\varphi_3) = [(r_{iu})]$ is an n × n symmetric matrix whose elements are a function of the parameter $\varphi_3 > 0$, with diagonal elements $r_{ii} = 1$, $r_{iu} = \varphi_2^{-1} C(s_i, s_u)$ for $\varphi_2 \neq 0$, and $r_{iu} = 0$ for $\varphi_2 = 0$, $i \neq u = 1, \ldots, n$. The element $r_{iu}$ depends on the Euclidean distance $h_{iu} = \lVert s_i - s_u \rVert$ between the points $s_i$ and $s_u$, and $I_n$ is the n × n identity matrix. The parametric form of the covariance matrix $\Sigma$ in [eq. (4)] holds for several stationary and isotropic processes in which the covariance $C(s_i, s_u) = C(h_{iu})$ is defined by the covariance function $C(h_{iu}) = \varphi_2 r_{iu}$. For these covariance functions, the variance of the reparameterized t-Student stochastic process $Y$ is $C(0) = \varphi_1 + \varphi_2$.
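The structure of the covariance matrix in [eq. (4)] can be illustrated with a small sketch. It assumes an exponential correlation function, r = exp(−h/φ3), which is the Matérn family member with κ = 0.5 in the parameterization commonly used in geostatistical software; the four coordinates and the parameter values are hypothetical and serve only to show the construction.

```python
import math

def exponential_corr(h, phi3):
    """Exponential correlation, assumed here as the form of r_iu (Matern with kappa = 0.5)."""
    return math.exp(-h / phi3)

def covariance_matrix(coords, phi1, phi2, phi3):
    """Build Sigma = phi1 * I_n + phi2 * R for a list of (x, y) coordinates in meters."""
    n = len(coords)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for u in range(n):
            if i == u:
                # r_ii = 1, so the diagonal is nugget plus sill: C(0) = phi1 + phi2
                sigma[i][u] = phi1 + phi2
            else:
                h = math.hypot(coords[i][0] - coords[u][0],
                               coords[i][1] - coords[u][1])
                sigma[i][u] = phi2 * exponential_corr(h, phi3)
    return sigma

# Hypothetical locations on a 75 m grid and hypothetical parameter values
coords = [(0.0, 0.0), (75.0, 0.0), (0.0, 75.0), (75.0, 75.0)]
sigma = covariance_matrix(coords, phi1=0.25, phi2=0.12, phi3=110.0)

for row in sigma:
    print(["%.4f" % v for v in row])
```

The nugget enters only on the diagonal, so the off-diagonal covariances decay with distance from φ2 toward zero, while the variance at every location is φ1 + φ2.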

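Assume that $Y \sim \tau_n(X\beta, \Sigma, \eta)$, where the shape parameter η is considered fixed. The unknown parameters of the model, $\theta = (\beta^T, \varphi^T)^T$, with $\beta = (\beta_1, \ldots, \beta_p)^T$ and $\varphi = (\varphi_1, \varphi_2, \varphi_3)^T$, can be estimated by maximizing the logarithm of the complete likelihood function, as defined by [eq. (5)]:

(5) $l_c(\hat{\theta}, Y_c) = \max\left(l_c(\theta, Y_c)\right),$

with

(6) $l_c(\theta, Y_c) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\delta^2\vartheta + \frac{1}{2\eta}\log\left(\frac{1}{2c(\eta)}\right) - \log\left(\Gamma\left(\frac{1}{2\eta}\right)\right) + \frac{n}{2}\log(\vartheta) + \frac{1}{2c(\eta)}\left[\log(\vartheta) - \vartheta\right],$

where $\delta^2 = (Y - X\beta)^T \Sigma^{-1} (Y - X\beta)$, $c(\eta) = \eta/(1 - 2\eta)$, $\vartheta > 0$, and $0 < \eta < 1/2$.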

The maximization of [eq. (6)] is carried out by an iterative process. Here, the EM (expectation-maximization) algorithm was applied, with the relative error (RE) as the stopping criterion, $RE_r = \lVert \theta^{(r)} - \theta^{(r-1)} \rVert / \lVert \theta^{(r)} \rVert < \epsilon$, with $\epsilon = 10^{-5}$. To choose the value of the shape parameter η, considered fixed, the cross-validation criterion (VC(η)), presented by De Bastiani et al. (2015), and the trace criterion (Tr(η)), proposed by Kano et al. (1993), were applied. For the reparameterized t-Student model, cross-validation is given by [eq. (7)]:

(7) $VC(\eta) = \frac{1}{n}\left[\sum_{i=1}^{n}\left(\frac{y(s_i) - \hat{y}_{-i}(s_i)}{1 - h_{ii}}\right)^2\right],$

where $\hat{y}_{-i}(s_i) = x_i^T \hat{\beta}_{-i}$, with $x_i^T = (x_{i1}, \ldots, x_{ip})$ being the i-th row of the matrix $X$, is the prediction at location $s_i$ computed without the observation $(y_i, x_i^T)$, $\hat{\beta}_{-i}$ is the maximum likelihood estimator of $\beta$ obtained without the i-th observation, and $h_{ii}$ is the i-th diagonal element of the hat matrix $H = X(X^T \hat{\Sigma}^{-1} X)^{-1} X^T \hat{\Sigma}^{-1}$, also called the projection matrix. The trace criterion consists of calculating the trace of the asymptotic covariance matrix of the estimated mean ($\hat{\mu} = X\hat{\beta}$), which for a given shape parameter is obtained by:

(8) $Tr(\eta) = \left(\frac{(1 - 2\eta)(1 + (n + 2)\eta)}{1 + n\eta}\right) \mathrm{tr}\left[(X^T X)(X^T \hat{\Sigma}^{-1} X)^{-1}\right],$

where $\hat{\Sigma} = \hat{\varphi}_1 I_n + \hat{\varphi}_2 \hat{R}$. For both criteria, the best shape parameter η is the one with the lowest values of cross-validation (VC(η)) and trace (Tr(η)). After η was chosen, the best model of the Matérn family was selected among different shape parameters κ (Matérn, 1986) as the one with the lowest standard errors. The map was constructed by the regression-kriging method (Michel & Kobiyama, 2015) since it allows the use of covariates. Finally, the maps constructed with the reparameterized t-Student distribution and the normal distribution were compared using the kappa index (K) (De Bastiani et al., 2012), which measures the accuracy of thematic classifications, i.e. it provides a measure of agreement between the reference map values and the model map values. This index is recommended as an adequate accuracy measure because it uses all elements of the error matrix, and it is defined by [eq. (9)]:

(9) $K = \frac{N^{*}\sum_{i=1}^{r} n_{ii} - \sum_{i=1}^{r}(n_{i+}\, n_{+i})}{(N^{*})^2 - \sum_{i=1}^{r}(n_{i+}\, n_{+i})},$

where $N^{*}$ is the total area, $n_{ii}$ is the area belonging to class i in both the model and reference maps, $n_{i+}$ is the area belonging to class i of the model map, $n_{+i}$ is the area belonging to class i of the reference map, and r is the number of classes. According to the classification of Krippendorff (2004), similarity is low if K < 0.67, medium if 0.67 < K < 0.80, and high if K > 0.80.
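The kappa index of [eq. (9)] and the similarity classes can be sketched as follows. The error matrix below is hypothetical (two classes, areas in arbitrary units), rows are taken as the model map and columns as the reference map, and the treatment of the exact boundary values 0.67 and 0.80 is an assumption, since the classification does not state to which class they belong.

```python
def kappa_index(error_matrix):
    """Kappa index computed from a square error matrix of areas (rows: model, columns: reference)."""
    r = len(error_matrix)
    total = sum(sum(row) for row in error_matrix)                      # N*
    agreement = sum(error_matrix[i][i] for i in range(r))              # sum of n_ii
    row_totals = [sum(error_matrix[i]) for i in range(r)]              # n_i+
    col_totals = [sum(error_matrix[j][i] for j in range(r)) for i in range(r)]  # n_+i
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    return (total * agreement - chance) / (total ** 2 - chance)

def similarity_class(k):
    """Similarity class by the thresholds 0.67 and 0.80 (boundary handling is an assumption)."""
    if k < 0.67:
        return "low"
    if k <= 0.80:
        return "medium"
    return "high"

# Hypothetical two-class error matrix of areas
areas = [[40.0, 10.0],
         [10.0, 40.0]]
k = kappa_index(areas)
print(k, similarity_class(k))
```

With this hypothetical matrix, 80% of the area agrees, but half of that agreement is expected by chance, so the index is lower than the raw proportion of agreement.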

MATERIAL AND METHODS

Location and characteristics of the study area

Data on soybean yield, plant height, and pods per plant were collected from an experimental area of 47.95 ha located in Cascavel, in the western region of Paraná, Brazil, with an approximate location of 24.83° S and 53.60° W and an average altitude of 650 meters. The soil of this area is classified as a clayey Oxisol (Haplorthox) (EMBRAPA, 2011), and the regional climate is temperate super-humid, type Cfa (Köppen), with an average annual temperature of 21 °C. All samples were georeferenced in the UTM coordinate system using a Trimble GPS25 (Global Positioning System) GEOEXPLORER 3 data receiver. Figure 1 shows the experimental area with a regular grid of 75 × 75 meters, totaling 83 observations for the 2006/2007 agricultural season.

FIGURE 1
Area location in the 2006/2007 agricultural season.

In 2006, soybean was cultivated in this area under the no-tillage system. In 2007, soybean yield data were collected; yield was estimated from the amount of soybean harvested from all plants in two rows one meter long, which represented a plot. Grains were weighed for each plot and the water content was measured for subsequent correction to 13%. The yield value was converted into t ha⁻¹. The average plant height (Hgt), in cm, was estimated at the soybean vegetative peak as the average of four plants over a linear meter. For the average number of pods per plant (N), four plants were chosen at each point and the number of pods per plant was counted at harvest time.
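The conversion from plot weight to yield can be sketched as below. The moisture correction uses the usual wet-basis formula, and the plot area is taken as two rows of one meter each multiplied by the row spacing. The row spacing, grain mass, and measured moisture are hypothetical, since the text does not report them.

```python
def yield_t_per_ha(grain_mass_g, moisture_pct, row_spacing_m,
                   n_rows=2, row_length_m=1.0, target_moisture_pct=13.0):
    """Convert the grain mass of one plot (g) into yield (t/ha) at the target moisture.

    row_spacing_m is a hypothetical input: the spacing between rows is not
    reported in the text.
    """
    # Wet-basis moisture correction to the target water content
    corrected_g = grain_mass_g * (100.0 - moisture_pct) / (100.0 - target_moisture_pct)
    plot_area_m2 = n_rows * row_length_m * row_spacing_m
    # 1 g per square meter equals 0.01 t per hectare
    return corrected_g / plot_area_m2 * 0.01

# Hypothetical plot: 270 g of grain already at 13% moisture, rows 0.45 m apart
example = yield_t_per_ha(grain_mass_g=270.0, moisture_pct=13.0, row_spacing_m=0.45)
# Same mass measured at 20% moisture gives a lower corrected yield
wetter = yield_t_per_ha(grain_mass_g=270.0, moisture_pct=20.0, row_spacing_m=0.45)
print(example, wetter)
```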

Statistical analyses were performed using the free software R, version 3.2.0 (R Core Team, 2016). The following packages were used: geoR (Ribeiro Junior & Diggle, 2016) for studying the spatial data, map construction by regression kriging interpolation, and comparison of thematic maps; matrixcalc (Novomestky, 2012) for trace calculation; e1071 (Meyer et al., 2015) for calculating the skewness and kurtosis; and classInt (Bivand, 2015) for choosing the class intervals for continuous numerical variables.

RESULTS AND DISCUSSION

Table 1 shows the exploratory analysis of the variables soybean yield (Prod) (t ha⁻¹), average plant height (Hgt) (cm), and average number of pods (N). The average soybean yield is 2.99 t ha⁻¹, with a minimum of 1.50 t ha⁻¹ and a maximum of 5.53 t ha⁻¹. Moreover, 75% of the area presents a yield lower than or equal to 3.35 t ha⁻¹. Soybean yield is classified as heterogeneous since the coefficient of variation (CV) is 21.27%.

TABLE 1
Descriptive statistics for the variable soybean yield (Prod) and the covariates average plant height (Hgt) and average number of pods (N).

The boxplot presented in Figure 2a shows a single discrepant point, which corresponds to sample element 6, with coordinates (236325, 7250475), and to the maximum yield value in the data set, 5.53 t ha⁻¹. According to the post-plot shown in Figure 2b, observation 6 lies in a region where the nearest neighbors have a soybean yield between 2.60 and 2.94 t ha⁻¹.

FIGURE 2
Box plot (a) and Postplot (b) graphs for soybean yield data

In order to identify the spatial dependence structure of soybean yield as a function of the average plant height (Hgt) and the average number of pods per plant (N), the mean soybean yield $\mu(s_i)$ at position $s_i \in S \subset \mathbb{R}^2$ was described by the spatial linear regression model:

(10) $\mu(s_i) = \beta_1 + \beta_2\,Hgt(s_i) + \beta_3\,N(s_i), \quad i = 1, \ldots, n,$

where β1, β2, and β3 are the unknown parameters to be estimated.

The parameters of the spatial linear model defined in [eq. (10)] and of the spatial dependence structure Σ given in [eq. (4)] were estimated by complete maximum likelihood (CML) using the EM algorithm, considering the Matérn family with shape parameters κ = 0.5, 1.0, 2.0, 5.0, 10, and 20 combined with the shape parameters of the reparameterized t-Student distribution η = 0.05, 0.067, 0.1, 0.143, and 0.2.
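The relative-error stopping rule adopted for the EM iterations (tolerance 10⁻⁵) can be sketched independently of the model. In the example below, `toy_update` is a hypothetical placeholder for one EM iteration: it is a simple contraction with a known fixed point, not the update used in the study, and the Euclidean norm is assumed for the relative error.

```python
import math

def relative_error(theta_new, theta_old):
    """Relative error ||theta_new - theta_old|| / ||theta_new|| with the Euclidean norm."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_new, theta_old)))
    den = math.sqrt(sum(a * a for a in theta_new))
    return num / den

def iterate_until_converged(update, theta0, tol=1e-5, max_iter=1000):
    """Apply `update` repeatedly until the relative error falls below `tol`."""
    theta = list(theta0)
    for r in range(1, max_iter + 1):
        theta_new = update(theta)
        if relative_error(theta_new, theta) < tol:
            return theta_new, r
        theta = theta_new
    raise RuntimeError("stopping criterion not met within max_iter iterations")

def toy_update(theta):
    """Hypothetical stand-in for one EM iteration: a contraction toward (2, 1)."""
    return [0.5 * theta[0] + 1.0, 0.5 * theta[1] + 0.5]

theta_hat, n_iter = iterate_until_converged(toy_update, [10.0, -4.0])
print(theta_hat, n_iter)
```

In an actual fit, the loop would be run once for each combination of κ and η listed above, with the vector θ holding the current values of β and φ.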

Table 2 shows the determination of the best shape parameter η of the reparameterized t-Student distribution associated with each shape parameter κ of the Matérn family using the cross-validation and trace criteria defined by Equations (7) and (8). The choice of η for each κ, with the lowest values of cross-validation (VC(η)) and trace (Tr(η)), is shown in bold.
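To make the candidate values of η easier to read, the sketch below converts each one into the corresponding degrees of freedom, ν = 1/η, and evaluates the scalar factor that multiplies the trace in Equation (8) for n = 83 observations. Only this factor is computed: the full criterion also depends on the estimated covariance matrix, which changes with η and κ, so the factor alone does not reproduce the ranking in Table 2.

```python
def trace_factor(eta, n):
    """Scalar factor (1 - 2*eta) * (1 + (n + 2)*eta) / (1 + n*eta) of the trace criterion."""
    return (1.0 - 2.0 * eta) * (1.0 + (n + 2) * eta) / (1.0 + n * eta)

n_obs = 83                                   # number of observations reported in the study
etas = [0.05, 0.067, 0.1, 0.143, 0.2]        # shape parameters considered in the study

# Each row: (eta, degrees of freedom nu = 1/eta, scalar factor)
table = [(eta, 1.0 / eta, trace_factor(eta, n_obs)) for eta in etas]

for eta, nu, factor in table:
    print(f"eta={eta:.3f}  nu={nu:5.1f}  factor={factor:.4f}")
```

The values of η therefore correspond to approximately 20, 15, 10, 7, and 5 degrees of freedom, and the factor equals 1 in the normal limit η = 0.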

TABLE 2
Cross-validation and trace for the choice of the best shape parameter η

Figure 3 shows the cross-validation VC(η) and trace Tr(η) graphs for each κ value of the Matérn family model related to those chosen in Table 3. For κ = 0.5 and 20, VC(η) and Tr(η) values increase as η value increases. For the other cases, when η values increase, VC(η) and Tr(η) values oscillate.

FIGURE 3
Graphs of cross-validation VC(η) and trace Tr(η).

TABLE 3
Estimation of the parameters β and φ via EM algorithm for different κ and η.

Table 3 shows the parameter estimates and their respective standard deviations for the η value selected for each κ in Table 2. The lowest standard deviations of the estimators correspond to η = 0.050 and κ = 0.5, with estimates $\hat{\beta}_1 = 0.993$, $\hat{\beta}_2 = 0.021$, $\hat{\beta}_3 = 0.030$, $\hat{\varphi}_1 = 0.248$, $\hat{\varphi}_2 = 0.121$, and $\hat{\varphi}_3 = 112.8$, and a practical range of approximately 338.0 m.

Figure 4a shows the soybean yield map constructed by regression kriging interpolation under the reparameterized t-Student distribution with η = 0.05 and Matérn shape parameter κ = 0.5, with the following parameters of the spatial linear regression model estimated by CML: $\hat{\beta}_1 = 0.993$, $\hat{\beta}_2 = 0.021$, $\hat{\beta}_3 = 0.030$, $\hat{\varphi}_1 = 0.248$, $\hat{\varphi}_2 = 0.121$, and $\hat{\varphi}_3 = 112.8$, with a practical range of approximately 338.0 m. Figure 4b shows the soybean yield map under the normal distribution and Matérn shape parameter κ = 0.5, with the following parameters of the spatial linear regression model estimated by maximum likelihood: $\hat{\beta}_1 = 0.957$, $\hat{\beta}_2 = 0.023$, $\hat{\beta}_3 = 0.030$, $\hat{\varphi}_1 = 0.298$, $\hat{\varphi}_2 = 0.132$, and $\hat{\varphi}_3 = 133.4$, with a practical range of 400.20 m.
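The relation between the estimated φ3 and the practical range can be checked with a short sketch. It assumes that, for κ = 0.5, the correlation is exponential, exp(−h/φ3), and that the practical range is the distance at which this correlation falls to 0.05, which gives roughly three times φ3; both the functional form and the 0.05 threshold are assumptions used for illustration.

```python
import math

def practical_range_exponential(phi3, threshold=0.05):
    """Distance at which the exponential correlation exp(-h/phi3) drops to `threshold`."""
    return -phi3 * math.log(threshold)

# Estimates of phi3 reported for the two models with kappa = 0.5
phi3_hat = {"reparameterized t-Student": 112.8, "normal": 133.4}

ranges = {name: practical_range_exponential(phi) for name, phi in phi3_hat.items()}
rule_of_thumb = {name: 3.0 * phi for name, phi in phi3_hat.items()}

for name in phi3_hat:
    print(f"{name}: {ranges[name]:.1f} m (3 * phi3 = {rule_of_thumb[name]:.1f} m)")
```

Under these assumptions, the shorter practical range obtained with the reparameterized t-Student model means that spatial dependence is modeled over a smaller radius than with the normal model.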

FIGURE 4
(a) Map 1: soybean yield with reparameterized t-Student distribution with η = 0.050 and Matérn family model with shape parameter κ = 0.5; (b) Map 2: soybean yield with normal distribution and Matérn family model with shape parameter κ = 0.5.

An increase in area percentage was observed in the 1st, 2nd, and 5th classes of the map constructed with the normal distribution (Map 2) when compared with the map constructed with the reparameterized t-Student distribution (Map 1) (Figure 4 and Table 4). Consequently, the 3rd and 4th classes were reduced, with the 3rd class showing the greatest reduction, 6.08 percentage points, from 37.33 to 31.25% of the area.

TABLE 4
Area percentage at each map class of soybean yield constructed with the reparameterized t-Student distribution and normal distribution.

For comparison between maps, the kappa accuracy index (K) was calculated. This index is considered an appropriate measure by Anderson et al. (2001) since it uses all elements of the error matrix constructed from omission and designation errors between maps (De Bastiani et al., 2012). The obtained value of K = 0.64 is classified as low similarity. Consequently, the maps constructed with the reparameterized t-Student and normal distributions are dissimilar due to the influence of the discrepant point.

As a complementary analysis, a new geostatistical study was carried out after removing point 6, which was considered discrepant, again assuming that the data followed the reparameterized t-Student distribution and the normal distribution. The maps constructed without the discrepant point are shown in Figure 5. The kappa accuracy index for the comparison between the new maps was K = 0.89, indicating high similarity between maps (Krippendorff, 2004). Therefore, this discrepant point has a relevant influence on the mapping.

FIGURE 5
(a) Map 1: soybean yield with reparameterized t-Student distribution with η = 0.050 and Matérn family model with shape parameter κ = 0.5 without point 6; (b) Map 2: soybean yield with normal distribution and Matérn family model with shape parameter κ = 1.0 without point 6.

CONCLUSIONS

When the methodology proposed in this study was applied to soybean yield data with the covariates average plant height and average number of pods per plant, the estimates obtained by complete maximum likelihood under the reparameterized t-Student distribution differed from those obtained under the normal distribution in the parameters that define the spatial dependence structure. Consequently, differences were observed between the soybean yield maps obtained with the two models. Thus, the reparameterized t-Student distribution is an alternative for studying data with discrepant values, showing the ability to decrease the influence of these points.

ACKNOWLEDGEMENTS

To the CNPq, CAPES, Araucária Foundation of the Paraná state and project FONDECYT 1150325 Chile, for the financial support to develop this research.

REFERENCES

  • Anderson J, Hardy E, Roach J, Witmer R (2001) A land use and land cover classification system for use with remote sensor data. US Geological Survey Professional, Washington, DC, US Geological Survey Professional. 41p. (Technical Report Paper 964)
  • Assumpção RAB, Uribe-Opazo MA, Galea M (2011) Local influence for spatial analysis of soil physical properties and soybean yield using Student's t-distribution. Revista Brasileira de Ciência do Solo 35(6):1917-1926. DOI: http://dx.doi.org/10.1590/S0100-06832011000600008
  • Assumpção RAB, Uribe-Opazo MA, Galea M (2014) Analysis of local influence in geostatistics using Student's t-distribution. Journal of Applied Statistics 41(3):615-630. DOI: http://dx.doi.org/10.1080/02664763.2014.909793
  • Bivand R (2015) classInt: Choose Univariate Class Intervals. R package version 0.1-23. Available: https://cran.r-project.org/package=classInt
  • Cressie NAC (2015) Statistics for spatial data. New York, John Wiley & Sons.
  • De Bastiani F, Uribe-Opazo MA, Dalposso GH (2012) Comparison of maps of spatial variability of soil resistance to penetration constructed with and without covariables using a spatial linear model. Engenharia Agrícola 32(2):394-404. DOI: http://dx.doi.org/10.1590/S0100-69162012000200019
  • De Bastiani F, Cysneiros AHMA, Uribe-Opazo MA, Galea M (2015) Influence diagnostics in elliptical spatial linear models. Test 24(2):322-340. DOI: http://dx.doi.org/10.1007/s11749-014-0409-z
  • Dempster A, Laird N, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39(1):1-38.
  • EMBRAPA - Empresa Brasileira de Pesquisa Agropecuária (2011) Manual de métodos de análise de solo. Rio de Janeiro, Embrapa Solos, 2 ed. p212.
  • Galea M, Bolfarine H, Labra FV (2002) Influence diagnostics in structural errors-in-variables model under Student-t-distribution. Journal of Applied Statistics 29(8):1191-1204. DOI: http://dx.doi.org/10.1080/0266476022000011265
  • Kano Y, Berkane M, Bentler P (1993) Statistical inference based on pseudo-maximum likelihood estimators in elliptical populations. Journal of the American Statistical Association 88(421):135-143. DOI: http://dx.doi.org/10.2307/2290706
  • Krippendorff K (2004) Content analysis: an introduction to its methodology. Beverly Hills, Sage Publications.
  • Lange KL, Little RJA, Taylor JMG (1989) Robust statistical modeling using the t distribution. Journal of the American Statistical Association 84(408):881-896. DOI: http://dx.doi.org/10.2307/2290063
  • Manghi RF, Paula GA, Cysneiros FJA (2016) On elliptical multilevel models. Journal of Applied Statistics 43(12):2150-2171. DOI: http://dx.doi.org/10.1080/02664763.2015.1134445
  • Matérn B (1986) Lecture notes in statistics. Springer, New York, p68-106.
  • Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2015) e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.67. Available: https://CRAN.R-project.org/package=e1071
  • Michel PG, Kobiyama M (2015) Estimativa da profundidade do solo: parte 2- métodos matemáticos. Revista Brasileira de Geografia Física 8(4):1225-1243.
  • Nesi CN, Ribeiro A, Bonat WH, Ribeiro Jr PJ (2013) Verossimilhança na seleção de modelos para predição espacial. Revista Brasileira de Ciência do Solo 37(2):352-358. DOI: http://dx.doi.org/10.1590/S0100-06832013000200006
  • Novomestky F (2012) matrixcalc: Collection of functions for matrix calculations. R package version 1.03. Available: https://CRAN.R-project.org/package=matrixcalc
  • Osorio F, Paula GA, Galea M (2007) Assessment of local influence in elliptical linear models with longitudinal structure. Computational Statistics & Data Analysis 51(9):4354-4368.
  • R Core Team (2016) A language and environment for statistical computing. Vienna, Foundation for Statistical Computing.
  • Ribeiro Junior PJ, Diggle PJ (2016) geoR: Analysis of Geostatistical Data. R package version 1.7-5.1. Available: https://CRAN.R-project.org/package=geoR
  • Uribe-Opazo MA, Borssoi JA, Galea M (2012) Influence Diagnostics in Gaussian Spatial Linear Models. Journal of Applied Statistics 39(3):615-630. DOI: http://dx.doi.org/10.1080/02664763.2011.607802
  • Vieira SR (2000) Geoestatística em estudos de variabilidade espacial do solo. Tópicos em Ciências do Solo. Revista Brasileira de Ciência do Solo 1:1-54. DOI: http://dx.doi.org/10.1590/S0100-06832005000200002

Publication Dates

  • Publication in this collection
    Jul-Aug 2017

History

  • Received
    28 Apr 2016
  • Accepted
    08 Mar 2017
Associação Brasileira de Engenharia Agrícola SBEA - Associação Brasileira de Engenharia Agrícola, Departamento de Engenharia e Ciências Exatas FCAV/UNESP, Prof. Paulo Donato Castellane, km 5, 14884.900 | Jaboticabal - SP, Tel./Fax: +55 16 3209 7619 - Jaboticabal - SP - Brazil
E-mail: revistasbea@sbea.org.br