Open-access Bayesian modeling of the coffee tree growth curve

Modelagem bayesiana da curva de crescimento do cafeeiro

ABSTRACT:

When modeling growth curves, it should be considered that longitudinal data may show residual autocorrelation, and, if this characteristic is not considered, the results and inferences may be compromised. The Bayesian approach, which considers priori information about studied phenomenon has been shown to be efficient in estimating parameters. However, as it is generally not possible to obtain marginal distributions analytically, it is necessary to use some method, such as the weighted resampling method, to generate samples of these distributions and thus obtain an approximation. Among the advantages of this method, stand out the generation of independent samples and the fact that it is not necessary to evaluate convergence. In this context, the objective of this work research was: to present the Bayesian nonlinear modeling of the coffee tree height growth, irrigated and non-irrigated (NI), considering the residual autocorrelation and the nonlinear Logistic, Brody, von Bertalanffy and Richard models. Among the results, it was found that, for NI plants, the Deviance Information Criterion (DIC) and the Criterion of density Predictive Ordered (CPO), indicated that, among the evaluated models, the Logistic model is the one that best describes the height growth of the coffee tree over time. For irrigated plants, these same criteria indicated the Brody model. Thus, the growth of the non-irrigated and irrigated coffee tree followed different growth patterns, the height of the non-irrigated coffee tree showed sigmoidal growth with maximum growth rate at 726 days after planting and the irrigated coffee tree starts its development with high growth rates that gradually decrease over time.

Key words:  residual autocorrelation; nonlinear models; Logistic model; Brody model; Von Bertalanffy model; Richards model.

RESUMO:

Na modelagem de curvas de crescimento deve-se considerar que dados longitudinais podem apresentar autocorrelação residual, sendo que, se tal característica não é considerada, os resultados e inferências podem ser comprometidos. A abordagem bayesiana, que considera informações à priori sobre o fenômeno em estudo tem se mostrado eficiente na estimação de parâmetros. No entanto, como geralmente não é possível obter as distribuições marginais de forma analítica, faz-se necessário a utilização de algum método, como o método de reamostragem ponderada, para gerar amostras dessas distribuições e assim obter uma aproximação para as mesmas. Dentre as vantagens desse método, destaca-se a geração de amostras independentes e o fato de não ser necessário avaliar convergência. Diante desse contexto, o objetivo deste trabalho foi apresentar a modelagem não linear bayesiana do crescimento em altura de plantas do cafeeiro, irrigadas e não irrigadas (NI), considerando a autocorrelação residual e os modelos não lineares Logístico, Brody, von Bertalanffy e Richards. Em vista dos resultados, verificou-se que, para as plantas NI, o DIC e CPOc, indicaram que, dentre os modelos avaliados, o modelo Logístico é o que melhor descreve o crescimento em altura do cafeeiro ao longo do tempo. E, para as plantas irrigadas, esses mesmos critérios indicaram o modelo Brody. Assim, o crescimento da planta do cafeeiro não irrigado e irrigado seguiram padrões de crescimento distintos, a altura do cafeeiro não irrigado apresentou crescimento sigmoidal com taxa máxima de crescimento aos 726 dias após o plantio, já o cafeeiro irrigado inicia seu desenvolvimento com altas taxas de crescimento que vão diminuindo aos poucos com o tempo.

Palavras-chave:  autocorrelação residual; modelos não lineares; modelo Logístico; modelo Brody; modelo Von Bertalanffy; Modelo Richards

INTRODUCTION:

Due to the economic importance of coffee for Brazil, one of the main focuses of research in the agricultural sector refers to the development of coffee tree. A relevant variable to be considered in such research is the plant height, because in addition to being representative of vegetative development, it is also correlated with productivity (ASSIS et al., 2014; PEREIRA et al., 2016; SILVA et al., 2019).

Modeling of the coffee tree growth curve using nonlinear regression models allows the researcher to describe growth over time and obtain information of practical interest, such as the maximum height at adult stage and the speed of growth. They are important to characterize the development of plants (PEREIRA et al., 2014; REIS et al., 2014; FERNANDES et al., 2017; SARI et al., 2018; SILVA & SAVIAN, 2019; JANE et al., 2020; SILVA et al., 2021; MENDES et al., 2021).

To model the growth curve, it should be considered that the growth data taken over time may show residual autocorrelation (PEREIRA et al., 2016; JANE et al., 2020; SILVA et al., 2021). However, in several applied researches, it appears that this characteristic is not incorporated in the modeling, and may compromise the results and inferences (MUNIZ et al., 2017).

An approach that has been shown to be efficient in estimating parameters is the Bayesian approach, which considers both observations and model parameters as random variables. Thus, an a prior distribution for the parameters is specified, denoted by P (θ) - which represents in terms of probability the pre-existing knowledge about the parameters - and a joint probability distribution for the sampling data, called the likelihood function L (θ ǀ Y) (BOX & TIAO, 1992; PAULINO et al., 2018; BOLSTAD & CURRAN, 2016). The combination of this information, by Bayes’ theorem, results in the posterior distribution of θ. This distribution, indicated by P (θ ǀ Y) represents the updated knowledge about the parameters, associating the researcher initial knowledge with the information from the sample: Pθ| Y=Lθ| YPθPY=Lθ| YPθPθ, YdθLθ| YPθ, where P(Y) is a normalizing constant.

Among the advantages of the Bayesian approach, stands out satisfactory modeling even with a relatively small sample and obtaining credibility intervals (MARTINS FILHO et al., 2008; SILVA et al., 2020; MACEDO et al., 2017).

In order to make inferences about a given parameter θi, i= 1, 2, ..., p, the posterior marginal distribution of this parameter should be obtained, which is given by the integral of the posterior distribution in relation to the other parameters. Due to the complexity of these integrals, in most cases, obtaining the marginal distributions is not done in an analytical way, making it necessary to use some approximation method. Markov Chain Monte Carlo methods (MCMC) have been the most used in the generation of samples from a posterior distribution, thus making it possible to obtain an approximate density. However, as they are iterative, these methods may take time to converge and require the researcher to use convergence assessment diagnostics (COLE et al., 2012).

An alternative method to the MCMC, which generates independent samples, does not require convergence assessment and is computationally faster, is the weighted resampling method, proposed by RUBIN (1987) (COLE et al., 2012; LOPES et al., 2012). The generation of samples by this method, also known as sampling-importance resampling (SIR), is done in two stages: initially, the values of a candidate distribution are generated, and in the second stage, resampling is performed considering the weight (probability) assigned to each value generated. According with COLE et al. (2012) the SIR has a more attractive and easy-to-implement theory, so that researchers can focus their efforts on eliciting prior distributions.

Although, there are applied researches in the literature presenting the fitting of regression models using Bayesian methods, such researches generally use linear and nonlinear models . They are limited to the assumption of residual independence (ANDRADE FILHO et al.; 2010; MARTINS FILHO et al.; 2008; SILVA et al., 2020; MACEDO et al., 2017) or consider autocorrelation and use linear models (CHIB & GREENBERG, 1994; MENZEFRICKE, 1999). Specifically, in the study of coffee tree growth, there is no research in the literature using a Bayesian approach considering both nonlinear models and residual autocorrelation.

Given this context, the objective of this study was: to present the Bayesian nonlinear modeling of coffee tree height growth, irrigated and non-irrigated, considering residual autocorrelation and nonlinear Logistic, Brody, von Bertalanffy and Richards models.

MATERIALS AND METHODS:

Data analyzed comes from an experiment carried out with the cultivar Rubi MG1192, in the experimental area of the Department of Agriculture, Federal University of Lavras, in Lavras, state of Minas Gerais.

Coffee crop was planted in January 2001, using the 4m x 1m spacing, which corresponds to the planting of 2,500 plants per hectare. During the conduct of the experiment, crop treatments and phytosanitary control were carried out in accordance with the requirements of the crop. Fertilizer application and liming were done according to the analysis of soil and micronutrients provided by foliar fertilization.

The design used was in randomized blocks, with four replications. In each block, six irrigation regimes were drawn; however, in this study, only two regimes were analyzed: non-irrigated (NI) and irrigated at 60 kPa (most commonly used in this region). The irrigation system implanted in the field was the drip system. Irrigations at 60 kPa occurred when the tensiometer, at a depth of 25 cm, recorded this tension value on the tensimeter, which is measured in kilopascal (kPa).

Each experimental unit consisted of three planting rows with ten plants in each row, the eight central plants of the central row were considered useful. The plant height was represented by the average of the eight useful plants measured in centimeters (cm), measured quarterly, in the period between May 2001 and August 2006, totaling 22 measurements.

To analyze this growth, the nonlinear regression models in table 1 were used (PEREIRA et al., 2014; PEREIRA et al., 2016). Considering that the residuals (white noise) follow a Normal distribution, that is, ui ~ N(0, σ2 u), we have the following expression for the likelihood function: LθY=i=12212πσu2exp-1(yi-μi)22σu2..

Table 1
Expected value and autocorrelation structure for the residuals of the Logistic, Brody, von Bertalanffy and Richards models considering the data referring to the non-irrigated (NI) and irrigated regimes.

Y represents the vector formed by the observed values yi and, the expected value μi and θ vary according to the model and the presence and order of residual autocorrelation, which were verified through the graphs of the autocorrelation (acf) and partial autocorrelation (pacf) functions of the residuals of the ordinary model (considering all assumptions met). In cases where acf suggested residual dependency, pacf was used to identify the p order of the AR (if AR(1), AR(2), etc) (MORETTIN & TOLOI, 2006).

It should be noted that the equations presented in table 1 represent the expression for the mean µ that will be substituted in the likelihood presented in the previous paragraph. In this parameterizations, ti indicated the evaluation times, in days after planting: ti = 90, 180, ..., 1980 days; α is the upper horizontal asymptote, that is, it represents the maximum expected height; β indicates the abscissa of the inflection point in the Logistical and Richards models, in Brody and Von Bertalanffy it is a scale parameter without direct biological interpretation; δ is the parameter that defines the shape of the curve and determines in what proportion of α the inflection point occurs; k indicates the growth rate and determines the efficiency of growth: the higher this value, the less time it will take for α to be reached; ei is the residual which is assumed to be independently and identically distributed according to a Normal distribution with zero mean and homogeneous variance, that is, ei ~ N(0, σ2).

Thus, the adjustments that presented residual autocorrelation of order 1 were redone by the generalized least squares method, incorporating in the modeling the autoregressive of order 1, φ1 parameter, given by: ei= φ1ei-1+ui.

Similarly, for those who presented autocorrelation of order 2, the autoregressive parameter of order 2, φ2 was incorporated:

ei=φ1ei-1+φ2ei-2+ui,

Where ei, ei-1 e ei-2 correspond to the residuals generated in the times t i , t i - 1 e t i - 2 respectively and uiis the white noise, obviously in a structure without autocorrelation ei=ui (MORETTIN & TOLOI, 2006).

The prior distributions were elicited based on the technique known as “prior of specialist”, which determined a plausible range of occurrence for the parameters and some measure of position, such as the mean, according to an expert’s knowledge of the phenomenon studied and about parametric space of parameters. According to MOALA & PENHA (2016), the elicitation of a prior distributions for the parameters constitutes an important phase in carrying out the Bayesian analysis. Thus, after exploratory analysis of the behavior of the growth curve parameters and about Gamma and Beta distributions, the following priorities were elicited.

For parameter α, which indicated an estimate of the maximum height to be reached by coffee trees; therefore α > 0 (SARI et al., 2018), a Gamma distribution with hyperparameters 10 and 0.04 was considered as a prior distribution.

For parameter β, which in the Logistic and Richards models, indicated the moment when the growth rate is maximum (inflection point) and in the other models does not have a direct practical interpretation (SARI et al., 2018), a Gamma distribution with hyperparameters 1.5 and 0.003 was also considered.

Parameter k, which is interpreted as a growth index, generally assumes low values, concentrated near the lower limit of the interval (0.1) (SARI et al., 2018). Thus, a Beta distribution (2.25; 1,500) was considered as a prior distribution.

When using the Richards model, specifying an a prior distribution for the δ parameter is also necessary. For this parameter, which can assume positive or negative values, a Normal was considered with a mean of 0.11 and standard deviation 10.

A generalized Beta distribution was defined as a prior for φ1 whose occurrence interval is [-1, 1], because according to MORETTIN & TOLOI (2006), for the process to be stationary, the following condition must be satisfied: |φ1|<1.

The generalized Beta distribution is obtained as follows: whether V ~ Beta (h1, h2) defined in the interval (0, 1), then the random variable defined by W= (n-m)V+m in the finite interval [m, n] follows a generalized Beta distribution (h1, h2) (MCDONALD & XU, 1995). Thus, when considering V ~ Beta (3, 2), the random variable φ1 defined as φ1 = (1 + 1).V-1 has a generalized Beta distribution (3, 2) defined in the interval [-1, 1]. Therefore, the prior distribution of φ1 is given by [2.Beta (3, 2) -1].

The prior distribution established for parameter φ2 was a Uniform (-1, 1), with the following stationary conditions being observed: φ1+ φ2<1, φ2 - φ1<1 and | φ2|<1.

Hyperparameters determination of the prior distributions (which was done in order to obtain large variance) and the attribution of an non-informative distribution for parameter φ2 is based on the fact that the literature is scarce in research in the plant area related to Bayesian growth curve modeling considering the autocorrelation between the residuals.

For each model analyzed, independence between the parameters was considered. Thus, the joint prior distribution for the vector of parameters of interest θ, P(θ), is given by the prior product of each parameter. The number of parameters in θ varies depending on the model and the autocorrelation structure indicated by the graphic of pacf.

Using Bayes’ theorem and replacing the expressions obtained for the likelihood function and the joint a prior distribution, and discarding the integration constant, we have that the a posterior distribution P (θ | Y), for each case, is given by:

PθYexp-12σu2i=122yi-μi2P(θ).

To obtain an informative summary on the a posterior marginal distribution of each parameter, it is necessary to solve integrals of the form: EhθY=hθYP(θ|Y)dθ, where h(θ) is a conveniently chosen function. However, as it is not possible to solve these integrals analytically, the sampling-importance resampling (SIR) based on SMITH & GELFAND (1992) theory, was used to generate samples. Subsequently, estimates of the mean, mode and highest a posterior density (HPD) were obtained for the parameters (PAULINO et al., 2018).

To indicate the model that presented the best fit, considering the Bayesian approach, the Deviance information criterion - DIC (SPIEGELHALTER et al., 2002) and the criterion of density predictive ordered - CPO (GELFAND & DEY, 1994) were used. As mentioned in ANDRADE FILHO et al. (2010), lowest value for DIC and the highest for CPO indicated better fitted.

All the procedures necessary for this research, such as the generation of samples, summary of distributions, obtaining graphics, CPO and DIC, among others, were performed using the R software (R DEVELOPMENT CORE TEAM, 2020), mainly with packages nlme (PINHEIRO et al., 2019) and coda (PLUMMER et al., 2006).

RESULTS AND DISCUSSION:

Table 2 presents the estimates obtained for the mean and mode a posterior, as well as the HPD interval, for all parameters of the adjusted models, considering the non-irrigated regime (NI).

Table 2
Posterior estimates for the mean, mode and HPD interval of the parameters of the Logistic (L), Brody (B), von Bertalanffy (VB) and Richards (R) models, considering the data referring to the NI regime.

Analyzing table 2, it can be seen that, in general, the estimates for the mean and the mode a posterior of the parameters β and k for the Logistic, Brody and von Bertalanffy models are close, indicating that the a posterior marginal distribution of these parameters can be considered symmetrical. When analyzing the growth of the tomato plant fruit, MENDES et al. (2021) also reported that the marginals of the parameters β and k, for the Logistic model are symmetrical.

However, when comparing the upper quantile of the HPD interval of the marginal distribution of the k parameter referring to the Brody model (qhpd), whose area below it is 97.5%, with the quantile of the Normal distribution (qn), considering k̅=0.00061 and sk = 0.00006 (not shown in the table) have: qhpd = 0.00071 and qn = 0.000727. Thus, as qn > qhpd it is concluded that the Normal has tails heavier than the marginal of parameter k.

Similarly, for parameter β of the Brody model, we have that qhpd = 0.97142 and qn = 0.974174 (considering β=0.9563 and sβ=0.00912 ). Therefore, as qn > qhpd it follows that Normal has tails heavier than the marginal of parameter β.

Regarding the estimates obtained for the parameters of the Richards model, there is a marked difference between the values of the mean and mode a posterior (Table 2). This indicate that the marginals of the parameters are asymmetric, and according to SAVIAN et al. (2009) and SILVA et al. (2020), in future studies, this information can be considered in specification of prior distributions.

Still in relation to table 2, it can be seen that, in general, the HPD intervals obtained for the parameters were significant, affirming the efficiency of the Bayesian methodology in the estimation of parameters, reported in several studies, such as ANDRADE FILHO et al. (2010), MARTINS FILHO et al. (2008) and SAVIAN et al. (2009), among others.

Table 3 lists the estimates obtained for the mean and mode a posterior as well as the HPD interval, for all parameters of the adjusted models, considering the irrigated regime.

Table 3
Posterior estimates for the mean, mode and HPD interval of the parameters of the Logistic (L), Brody (B), von Bertalanffy (VB) and Richards (R) models, considering the data referring to the irrigated regime.

Based on table 3, it appears that, also in the irrigated regime, the HPD intervals were significant for all parameters. The interval obtained for parameter α, considering the adjustment of the Brody model, for example, indicates that the most likely maximum heights, for coffee trees at the adult phase, are between 210.6 cm and 234.8 cm.

Still in relation to Table 3, there was a marked difference between the estimates for the mean and mode a posterior of the parameter φ1 for the von Bertalanffy and Logistic models, suggesting that the marginals are asymmetric to the left. Furthermore, the HPD intervals for the φ parameter do not include zero (Table 2 and Table 3), emphasizing that the autocorrelation is significant in all studied scenarios. As highlighted by Silva et al. (2021), incorporating residual dependence may not necessarily improve the quality of the fit, but it is more consistent, since the assumption of independence is not satisfied in longitudinal data modeling.

The differences between the values obtained for the mean and mode a posterior for all parameters of the Richards model were also accentuated, indicating that the marginal distributions for its parameters are asymmetric. Such behavior was also verified for the marginal distributions of parameters of this model considering the NI regime (Table 2).

To compare the quality of fit provided by the four models analyzed, the Deviance information criterion (DIC) and the criterion of predictive ordered (CPO) density were calculated, the results are listed in table 4. As mentioned lowest value for DIC and the highest for CPO indicated better fitted.

Table 4
Estimates for the DIC and CPO selection criteria, considering data referring to the non-irrigated (NI) and irrigated regimes and the Logistic (L), Brody (B), von Bertalanffy (VB) and Richards (R) models.

Considering the NI regime, the estimates obtained for both adopted criteria indicate that, among the models evaluated, the Logistic model best describes the growth in height of coffee trees, as this model presented the lowest value for DIC and the highest for CPO (Table 4). The marginal distributions of each parameter of the Logistic model for the NI regime are illustrated in figure 1.

Figure 1
Sample histograms of the marginal distributions of parameters of the Logistic model considering first order residual autocorrelation and the NI regime.

Regarding the irrigated regime, the CPO and DIC estimates indicated that the Brody model, among the evaluated models, is the one that best represents the growth curve in height of the irrigated coffee tree (Table 4). The marginal distributions of each parameter of the Brody model for the irrigated regime are illustrated in figure 2.

Figure 2
Sample histograms of the marginal distributions of parameters of the Brody model considering data referring to the irrigated regime.

The growth of the NI and irrigated coffee trees followed different growth patterns. As the Logistic model was more suitable to describe the NI coffee tree height, sigmoidal growth was observed with maximum growth rate at 726 days after planting (estimate of parameter β of the Logistic model in Table 2). For the irrigated coffee tree, the Brody model was more suitable, indicating that the growth started at the maximum growth rate decreasing over time (SANTOS et al., 2019). Moreover, the HPD for the α parameter of the Logistic model for NI coffee was 171cm to 186 cm (Table 2) and for the Brody model for irrigated coffee tree it was 210 cm to 234 cm (Table 3), thus indicating that the irrigated coffee tree had a maximum height greater than the non-irrigated plant, as there was no intersection in the intervals. Importantly, one of the great advantages of working with the Bayesian methodology is to obtain the credibility interval (HPD), which is the interval with (1 - α)% probability that contains the most plausible values for the parameter, which allows direct comparison between intervals (GUEDES et al., 2014; SILVA et al., 2020).

The growth pattern, as well as the values observed for the mean height of non-irrigated coffee trees and the graphical representation of the Bayesian adjustment of the Logistic model with AR(1), as well as the values observed for the irrigated coffee trees and the Bayesian adjustment of the Brody model are shown in figure 3.

Figure 3
Mean height of coffee trees and Bayesian fitting of the Logistic with AR(1) and Brody models considering the NI and irrigated regimes, respectively.

CONCLUSION:

For non-irrigated plants, the Bayesian information criterion and the criterion of predictive ordered density indicated that, among the evaluated models, the Logistic model is the one that best described the coffee plant height growth over time. For irrigated plants, these same criteria indicated the Brody model. Thus, the growth of the non-irrigated and irrigated coffee trees followed different growth patterns, the NI coffee height showed sigmoidal growth with maximum growth rate at 726 days after planting and the irrigated coffee tree starts its development with high growth rates that gradually decrease over time.

ACKNOWLEDGEMENTS

“And was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).”

REFERENCES

  • CR-2021-0275.R1

Edited by

Publication Dates

  • Publication in this collection
    14 Mar 2022
  • Date of issue
    2022

History

  • Received
    08 Apr 2021
  • Accepted
    10 Oct 2021
  • Reviewed
    10 Dec 2021
location_on
Universidade Federal de Santa Maria Universidade Federal de Santa Maria, Centro de Ciências Rurais , 97105-900 Santa Maria RS Brazil , Tel.: +55 55 3220-8698 , Fax: +55 55 3220-8695 - Santa Maria - RS - Brazil
E-mail: cienciarural@mail.ufsm.br
rss_feed Stay informed of issues for this journal through your RSS reader
Accessibility / Report Error