SciELO - Scientific Electronic Library Online

vol.35 issue1Physical quality of an Oxisol under management systems evaluated by the Index "S"Soil air permeability in a Rhodic Hapludox under different tillage systems author indexsubject indexarticles search
Home Pagealphabetic serial listing  

Services on Demand




Related links


Revista Brasileira de Ciência do Solo

On-line version ISSN 1806-9657

Rev. Bras. Ciênc. Solo vol.35 no.1 Viçosa Jan./Feb. 2011 



Selecting statistical models to study the relationship between soybean yield and soil physical properties(1)


Seleção de modelos estatísticos para a relação entre produtividade da soja e atributos físicos do solo



Marcio Paulo de OliveiraI; Maria Hermínia Ferreira TavaresII; Miguel Angel Uribe-OpazoIII; Luis Carlos TimmIV

IMaster in Agricultural Engineering, Graduate Program in Agricultural Engineering, Western Parana State University (UNIOESTE), Universitaria Street, 2069, CEP 85819-110 Cascavel (PR), Brazil. E-mail:
IIProfessor of the Graduate Program in Agricultural Engineering at the Western Parana State University (UNIOESTE). CEP 85819-110 Cascavel (PR), Brazil. E-mail:
IIIProfessor of the Graduate Program in Agricultural Engineering at the Western Parana State University (UNIOESTE). CEP 85819-110 Cascavel (PR), Brazil. E-mail:
IVProfessor of Federal University of Pelotas. CEP 96001-970 Pelotas (RS), Brazil. E-mail:




Statistical models allow the representation of data sets and the estimation and/or prediction of the behavior of a given variable through its interaction with the other variables involved in a phenomenon. Among other different statistical models, are the autoregressive state-space models (ARSS) and the linear regression models (LR), which allow the quantification of the relationships among soil-plant-atmosphere system variables. To compare the quality of the ARSS and LR models for the modeling of the relationships between soybean yield and soil physical properties, Akaike's Information Criterion, which provides a coefficient for the selection of the best model, was used in this study. The data sets were sampled in a Rhodic Acrudox soil, along a spatial transect with 84 points spaced 3 m apart. At each sampling point, soybean samples were collected for yield quantification. At the same site, soil penetration resistance was also measured and soil samples were collected to measure soil bulk density in the 0-0.10 m and 0.10-0.20 m layers. Results showed autocorrelation and a cross correlation structure of soybean yield and soil penetration resistance data. Soil bulk density data, however, were only autocorrelated in the 0-0.10 m layer and not cross correlated with soybean yield. The results showed the higher efficiency of the autoregressive space-state models in relation to the equivalent simple and multiple linear regression models using Akaike's Information Criterion. The resulting values were comparatively lower than the values obtained by the regression models, for all combinations of explanatory variables.

Index terms: autocorrelation, cross correlation, linear regression, state-space model, soil and plant properties.


Modelos estatísticos permitem representar conjuntos de dados, assim como estimar e prever o comportamento de uma variável por meio da sua interação com as demais variáveis envolvidas no fenômeno. Entre os modelos estatísticos encontram-se modelos autorregressivos em espaço de estados (AEE) e modelos de regressão linear (RL) que permitem quantificar as relações entre variáveis do sistema solo-planta-atmosfera. Neste trabalho, com o objetivo de se comparar a qualidade dos modelos AEE e RL para a modelagem das relações entre a produtividade da soja e atributos físicos do solo, utilizou-se o critério de informação de Akaike, o qual fornece um coeficiente que permite a seleção do melhor modelo. O conjunto de dados foi amostrado, em um Latossolo Vermelho distroférrico, ao longo de uma transeção com 84 pontos espaçados de 3 m entre si. Em cada ponto, foram coletadas amostras de soja para quantificar a produtividade e mediu-se a resistência do solo à penetração, assim como retiradas amostras de solo, nas camadas de 00,10 e 0,100,20 m, para mensurar a sua densidade. Os resultados mostraram que os dados de produtividade da soja e resistência do solo à penetração apresentaram autocorrelação e estrutura de correlação cruzada. A densidade do solo, entretanto, apresentou autocorrelação apenas na camada de 00,10 m e não mostrou correlação cruzada com a produtividade da soja. Os resultados mostraram a maior eficácia dos modelos autorregressivos de Espaço de Estados em relação aos modelos equivalentes de regressão linear simples e múltipla com o emprego do Critério de Informação de Akaike, o qual resultou em valores comparativamente mais baixos do que os obtidos com os modelos de regressão, para todas as combinações das variáveis explicativas.

Termos de indexação: autocorrelação, correlação cruzada, regressão linear, modelo de espaço de estados, atributos do solo e da planta.




The spatial and temporal variability in physical and chemical properties and their relationship with crop yield should not be underestimated at the moment of agricultural management planning (Coelho et al., 1998; Shuai & Yost, 2004). Among the statistical techniques to study relationships among soil properties and crop yield are autoregressive state-space and linear regression models (Reichardt & Timm, 2008).

The state-space models, by means of the Kalman filter, provide an adequate representation of a spatial or temporal process of a variable of interest with coefficients, confidence intervals and model errors if the observation density supports the identification of the correlation length (Timm et al., 2003; Nielsen & Wendroth, 2003). Thus, this approach may be applied to identify and represent processes in agricultural systems aiming to improve the understanding of the relations of that system (Dourado-Neto et al., 1999). It may be a more effective research tool in comparison to other approaches to understand and explain landscape-scale variation in agricultural systems (Stevenson et al., 2001).

The objective of this study was to select statistical models to explain the relationships between soybean yield and the soil physical properties penetration resistance and soil bulk density in two layers (0-0.10 and 0.10-0.20 m).



The field experiment was carried out on the Fazenda Santo Antonio, Braganey County, State of Paraná, Brazil (latitude 24 º 54 ' 44.94 " South; longitude 53 º 08 ' 46.03 " West, 750 m a.s.l.). The regional climate is temperate meso-thermal and super-humid.

The soil of the experimental area was classified as Rhodic Acrudox (Embrapa, 2006), on basaltic substrate, with a slightly undulated relief. The crop was planted on a 6-7 % slope area, under continuous no-tillage cropping system for 8 years, alternating summer soybean with winter wheat. The spacing between soybean rows was 0.45 m, sown at a density of 17 seeds per meter. A spatial transect (length 252 m) was sampled, parallel to the plant rows, with 84 sampling sites spaced 3 m apart and geo-referenced by the Global Positioning System (GPS).

The following plant and soil properties were measured at each point: (a) soybean yield (SY, Mg ha-1), as response variable; (b) soil penetration resistance (SRP, MPa); and (c) soil bulk density (SD, Mg m-3), measured in the layers 0-0.10 m and 0.10-0.20 m. All soybean plants of each experimental unit were harvested, threshed and the seeds sieved. The mass of each sample was weighed on a 0.01 g precision digital scale and the results converted to Mg ha-1.

To determine soil bulk density, undisturbed soil samples were collected (layers 0-0.10 m and 0.10-0.20 m) at each site along the spatial transect. In the laboratory, the soil samples were dried at 105 ºC for a minimum period of 24 h and the dry mass was weighed on a 0.01 g precision digital scale.

Soil penetration resistance at each sampling site was determined by a Soil Control Penetrographer PAT SC-60 equipment, with four replications per site. Concomitantly with RSP measurements at each site, disturbed soil samples were collected to determine the gravimetric soil water content. Then, the mean soil water contents of the spatial transect in the layers 0-0.10 m and 0.10-0.20 m were calculated, resulting in 0.256 and 0.274 kg kg-1, respectively.

The sample autocorrelation function among the variables Zj(xi) at position xi with Zj(xi + h) at position xi + h may be determined by equations 1 and 2:

in which Cj(h) is the covariance between the variables Zj(xi) and Zj(xi + h); n is the number of Zj(xi) and j(xi + h) pairs separated by a specific distance h; j(x) is the arithmetic mean of variable Zj(xi); rj(h) is the sample autocorrelation coefficient for pairs of variable Zj(xi) separated by a specific distance h, ranging from 1 < rj(h) < 1; and S2 is the sample variance of Zj (Reichardt & Timm, 2008). The sample cross correlation coefficients for pairs of two different measured variables Zj(xi) and Zu(xi + h), separated by a specific distance h, were calculated by equation (3):

in which C(Zj(xi), Zu(xi + h )) is the covariance between the variables Zj(xi) and Zu(xi + h); is the sample variance of variable Zj(xi); and is the sample variance of variable Zu(xi) (Reichardt & Timm, 2008).

To calculate the significant confidence intervals (CI) to determine whether autocorrelation and cross correlation coefficients are significantly different from zero or not, the function of the cumulative probability (p = ±1.96 for 95 %) was used for a standardized normal distribution (Davis, 1986; Nielsen & Wendroth, 2003) and the number of observations n, as shown by equation 4:

All data sets for the state-space approach were transformed by equation 5:

in which Zj*(xi ) is the transformed data, Zj(xi) the measured data, m the arithmetic mean of the Zj(xi) data set and S the sample standard deviation of the Zj(xi) data set. The software Applied Statistical Time Series Analysis (ASTSA) (Shumway, 1988; Shumway & Stoffer, 2000) was used to apply the state-space approach.

Applied models

The space-state model is a way of representing a linear or non-linear system, starting from a system of two dynamic equations. The observation vector Y(xi) of the process is generated as a function of the non-observed state vector Z(xi), called observation equation, by equation 6 for i = 1, …, n:

in which the observation vector Y(xi) is related to the state vector Z(xi) by an observation matrix Mii and an observation error (or noise) vi.

The dynamic evolution of the non-observed state vector Z(xi), called state equation, is given by equation (7), for i = 1,…, n:

in which the state vector Z(xi) at position xi is related to its value at position xi-1 by a matrix of state coefficients ϕii (transition matrix) and an error (or noise) wi associated to the state. According to Hui et al. (1998), when all data sets are transformed by equation 5, the magnitudes of the state coefficients of the matrix f are directly proportional to their contribution to each state variable used in the analysis.

The mean of the noise values, which are not autocorrelated and normally distributed with constant variances, is assumed to be zero. If these Z variables were observable, this would be the usual structure of a vector autoregressive model, in which the coefficients of matrix ϕ could be estimated by multiple regression techniques, taking Z(xi) and Z(xi-1) as dependent and independent variables, respectively. The formulation of a linear regression model may be theoretically represented by equation 8:

where Y(xi) is the explanatory variable; β0, β1 and β2 are the regression coefficients, X1 and X2 are the explanatory variables and ei is the model error.

Criterion for model selection

A way to compare the quality of different models is defined by Akaike (1973), cited by Shumway & Stoffer (2000), and may be adapted to regression techniques. The criterion is based on the calculation of the residual sum of square (RSSavg) between observed (Y(xi)) and estimated (Y(xi)*) values, calculated by equation 9:

Akaike's Information Criterion (AIC) considers the number of regression coefficients k and the number of observations n, and is given by equation (10):

To correct Akaike's coefficient (AICc) as a function of the number of samples, equation (11) is used:

where the term (n + k) (n – k – 2) is the correction factor for the AIC coefficient.



Table 1 presents the descriptive statistical analysis of soybean yield data (SY), soil penetration resistance data (SRP) and soil bulk density data (SD), in the two studied layers, along the spatial transect. The Coefficients of Variation (CV) for SY and SD, in the 0-0.10 m layer and for SRP and SD, in the 0.10-0.20 m layer, were lower than 20 %, indicating data homogeneity. The normality tests of Kolgomorov-Smirnov and Anderson-Darling showed a trend of the SY, SRP, and SD data in the 0-0.10 m and 0.10-0.20 m layers to a normal distribution of probability, at a level of 5 %. The normal variable distributions allow the construction of regression models to explain variable relations.



The data spatial distribution of soybean yield, soil penetration resistance, and soil bulk density, in the 0-0.10 m and 0.10-0.20 m layers, is presented in the figures 1a,bc, respectively. This figure shows data oscillation, revealing a trend of addition/decrease around the mean, along the sampling elements indicating stationary behavior (Morettin & Toloi, 2004). Point-to-point fluctuations in all variables can also be observed due to soil natural spatial variability, which present local characteristics and may therefore be better represented by locally adaptable models (e.g. state-space model). In this case, global or space-independent models (e.g. classical multiple regression model) based on the assumption that the mean of each data set is constant along the entire transect, independent of local spatial variation, fail to express the soil spatial variability. These traditional statistical models to not take the spatial coordinates of the data of a studied variable into account, assuming that the coordinates are spatially independent from each other, randomly distributed over the entire field. According to Nielsen et al. (1998), these analyses still lack proven concepts and strategies to interpret the cause of spatial patterns, mainly because response functions between crop yield and different variables are neither constant nor consistent within an agricultural field.

Figure 2 presents the autocorrelographs of soybean yield, soil penetration resistance, and soil bulk density data sets in the 0-0.10 m and 0.10-0.20 m layers. In figure 2 it is possible to observe a third-order autocorrelation for the data set of soybean yield (Figure 2a) and soil penetration resistance in the 0.10-0.20 m layer (Figure 2c), indicating spatial dependence up to a distance of 9 m between adjacent observations of each variable. However, for soil penetration resistance in the 0-0.10 m layer (Figure 2b), a second-order autocorrelation was observed.

For adjacent observations of soil bulk density, in the 0-0.10 m layer (Figure 2d), there was a significant first-order autocorrelation while no autocorrelation was observed in the 0.10-0.20 m layer (Figure 2e).

Cross correlographs for soybean yield versus soil penetration resistance were designed (Figure 3). The cross correlation between soybean yield and soil penetration resistance in the 0-0.10 m layer (Figure 3a) is in the order of two lags (6 m in our study) and in the order of one lag in the 0.10-0.20 m layer (Figure 3b) (3 m in our study). Applying the "t" test at 5 %, there was no cross correlation between soybean yield and soil bulk density in both studied soil layers (Figure 3c,d).



Soybean yield was estimated by the autoregressive state-space equations using soil penetration resistance data as predictor variables (Table2). The variable soil bulk density was not considered, since there was no spatial correlation with the soybean yield. For each model, the values of Akaike's Information Coefficient (AIC) and corrected Akaike's Information Coefficient (AICc) were provided.



The AIC is based on the estimated variance of a model to select the best-performing model, since the smaller the coefficient values the better is the model performance, once the variance is also lower (Nielsen & Wendroth, 2003). Partial autocorrelation coefficients (PACF) for each studied variable were also calculated, which indicated that the PACF coefficients are significant up to 1 lag, i.e. autoregressive models of the order of 1 lag can be applied.

The model with the lowest AIC was the one that measured the contribution of the variables in the 0-0.10 m layer, in which the coefficients AIC and the AICc were -4.604 and -3.576, respectively. In this model, soybean yield and soil penetration resistance measurements at site i-1 contribute with 0.8727 and 0.1143, respectively, to soybean yield estimates at site i.

The AIC values were highest in the model that considered soil penetration resistance in the 0-0.10 m and 0.10-0.20 m layers. In this model, measurements at site i-1 of soybean yield, soil penetration resistance (0-0.10 m layer) and soil penetration resistance (0.10-0.20 m layer) contributed with 0.8578, 0.3374 and 0.1952 to the soybean yield estimates at site i.

For a better understanding of the spatial relations between SY and SRP, the state-space analysis and the classical multiple regression were compared using the same state variables. Classical multiple regression is based on mean values of each variable across the space under study, in which only the difference between a variable at a given location compared to its respective value at a previous location is considered. The simple and multiple linear regression models (Table 3) were combined with soybean yield estimates as a function of soil penetration resistance measurements in both soil layers.



The variance was lowest in the model with AIC and AICc coefficient values of -1.717 and -0.687, respectively, with a R2 of 0.090, so that no more than 9.0 % of the variance of the SY data along the spatial transect was explained by the SRP values in both studied layers (Table 3), independent of their position. Regression-estimated values are much less variable than the measured values and consistently underestimate larger and overestimate lower measured values. In this model, each addition/decrease of one unit resulted in a contribution of -0.1130 to the soil penetration resistance in the 0-0.10 m layer and of -0.1180 in the 0.10-0.20 m soil layer. The AIC and AICc coefficient values were highest in the model considering the variable soil penetration resistance in the 0-0.10 m layer, with a R2 of 0.063. In this model, at each addition/decrease of one unit in soil penetration resistance, soybean yield was reduced by -0.680. The better performance of state-space models compared to classical multiple regression models was also reported by Timm et al. (2004). These authors observed a better estimation of the soil water content by state-space than by the equivalent linear regression equations.

The benefit of a state-space analysis of the spatial processes of crop-dependent variables is that each variable is treated statistically as random and the spatial association and variation as a function of the distance between measurements. In our study, the variations in soil penetration resistance in the layers 0-0.10 m and 0.10-0.20 m across the spatial transect were most strongly related to the spatial distribution of soybean yield. From the agronomic point of view, this result is similar to findings of Beutler & Centurion (2003), among others. A spatial soil process is the change of a variable or a vector consisting of several variables across a spatial domain due to local conditions, e.g., the spatial process of soybean yield considered across a field can be mainly influenced by spatial changes in soil management, type, topography, and by rainfall. In other words, the state-space analysis could provide "site-specific" and/or "time-specific" management information without the disadvantage of considering the very general average values of an entire area or year. As a result, crop management practices could be more precisely adapted to be more effective and sustainable, with less deleterious effects on soil and water resources.



1. There was autocorrelation and cross correlation between soil penetration resistance in the two studied layers (0.0-0.10 and 0.10-0.20 m) and soybean yield measurements along the spatial transect. Soil bulk density was not auto and cross correlated with soybean yield.

2. The autoregressive state-space models described variations in soybean yield along the spatial transect better than the equivalent simple and multiple linear regression models using Akaike's Information Criterion.

3. In the application of the state-space analysis under field conditions, it was possible to consider the underlying processes between soybean productivity and soil resistance to penetration in every local neighborhood along the transect. It was also possible to identify the local relations between soybean productivity and soil resistance to penetration measurements and quantify this relationship, stochastically, taking measurement and model errors into account. The model represents a possibility of investigating the effect of values of crop yield and soil physical properties from adjacent sampling points in heterogeneous fields.



The authors are indebted to the Brazilian Federal Agency for Support and Evaluation of Graduate Education - CAPES and the Brazilian National Research Council - CNPq for the scholarships.



AKAIKE, H. Information theory and extension of the maximum likelihood principle. J. Royal Stat. Soc., 51:469-483, 1973.         [ Links ]

BEUTLER, A.N. & CENTURION, J.F. Efeito do conteúdo de água e da compactação do solo na produção de soja. Pesq. Agropec. Bras., 38:849-856, 2003.         [ Links ]

COELHO, A.M.; DORAN, J.W. & SCHEPERS, J.S. Irrigated corn yield as related to spatial variability of selected soil properties.In: ROBERT, P.C.; RUST, R.H. & LARSON, W.E., eds. In: INTERNATIONAL CONFERENCE ON PRECISION AGRICULTURE, 4., St Paul, 1998. Proceedings. St Paul, ASA-CSSA-SSSA, 1998. p.441-452.         [ Links ]

DAVIS, J.C. Statistics and data analysis in geology. 2.ed. New York, John Wiley & Sons, 1986. 646p.         [ Links ]

DOURADO-NETO, D.; TIMM, L.C.; OLIVEIRA, J.C.M.; REICHARDT, K.; BACCHI, O.O.S.; TOMINAGA, T.T. & CASSARO, F.A.M. State space approach for the analysis of soil water content and temperature in a sugarcane crop. Sci. Agric., 56:1215-1221, 1999.         [ Links ]

EMPRESA BRASILEIRA DE PESQUISA AGROPECUÁRIA - EMBRAPA. Sistema brasileiro de classificação de solos. Brasília, Embrapa-SPI; Rio de Janeiro, Embrapa-Solos, 2006. 306p.         [ Links ]

HUI, S.; WENDROTH, O.; PARLANGE, M.B. & NIELSEN, D.R. Soil variability: Infiltration relationships of agroecosytems. J. Balkan Ecol., 1:21-40, 1998.         [ Links ]

MORETTIN, P.A. & TOLOI, C.M.C. Análise de séries temporais. São Paulo, Edgard Blücher, 2004. 535p.         [ Links ]

NIELSEN, D.R.; WENDROTH, O. & PIERCE, F.J. Emerging concepts for solving the enigma of precision farm research. In: ROBERT, P.C.; RUST, R.H. & LARSON, W.E., eds. In: INTERNATIONAL CONFERENCE ON PRECISION AGRICULTURE, 4., St Paul, 1998. p.303-318.         [ Links ]

NIELSEN, D.R. & WENDROTH, O. Spatial and temporal statistics - Sampling field soils and their vegetation. Reiskirchen, Catena Verlag, 2003. 63p.         [ Links ]

REICHARDT, K. & TIMM, L.C. Solo, planta e atmosfera: Conceitos, processos e aplicações. Barueri, Manole, 2008. 478p.         [ Links ]

SHUAI, X. & YOST, R.S. State-space modeling to simplify soil phosphorus fractionation. Soil Sci. Soc. Am. J., 68:1437-1444, 2004.         [ Links ]

SHUMWAY, R.H. Applied statistical time series analyses. Englewood Cliffs, Prentice Hall, 1988. 379p.         [ Links ]

SHUMWAY, R.H. & STOFFER, D.S. Time series analysis and its applications. New York, Springer, 2000. 549p.         [ Links ]

STEVENSON, F.C.; KNIGHT, J.D.; WENDROTH, O.; van Kessel, C. & NIELSEN, D.R. A comparison of two methods to predict the landscape-scale variation of crop yield. Soil Till. Res., 58:163-181, 2001.         [ Links ]

TIMM, L.C.; REICHARDT, K.; OLIVEIRA, J.C.M.; CASSARO, F.A.M.; TOMINAGA, T.T.; BACCHI, O.O.S.; DOURADO-NETO, D. & NIELSEN, D.R. State-space approach to evaluate the relation between soil physical and chemical properties. R. Bras. Ci. Solo, 28:49-58, 2004.         [ Links ]

TIMM, L.C.; REICHARDT, K.; OLIVEIRA J.C.M.; CASSARO F.A.M.; TOMINAGA T.T.; BACCHI O.O.S. & DOURADO-NETO, D. Sugarcane production evaluated by the state-space approach. J. Hydrol., 272:226-237, 2003.         [ Links ] R. Bras. Ci. Solo, 35:97-104, 2011



Received for publication in November 2009 and approved in December 2010.



(1) Extracted from the dissertation of the first author.

Creative Commons License All the contents of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License