Bayesian inference for the fitting of dry matter accumulation curves in garlic plants

The objective of this work was to identify nonlinear regression models that best describe dry matter accumulation curves over time, in garlic (Allium sativum) accessions, using Bayesian and frequentist approaches. Multivariate cluster analyses were made to group similar accessions according to the estimates of the parameters with biological interpretation (β1 and β3). In order to verify if the obtained groups were equal, statistical tests were applied to assess the parameter equality of the representative curves of each group. Thirty garlic accessions were used, which are kept by the vegetable germplasm bank of Universidade Federal de Viçosa, Brazil. The logistic model was the one that fit best to data in both approaches. Parameter estimates of this model were subjected to the cluster analysis using Ward’s algorithm, and the generalized Mahalanobis distance was used as a measure of dissimilarity. The optimal number of groups, according to the Mojena method, was three and four, for the frequentist and Bayesian approaches, respectively. Hypothesis tests for the parameter equality from estimated curves, for each identified group, indicated that both approaches highlight the differences between the accessions identified in the cluster analysis. Therefore, both approaches are recommended for this kind of study.


Introduction
Garlic (Allium sativum L.), the fourth most economically important vegetable in Brazil, is cultivated in most regions of the country (Mota et al., 2006;Lucini, 2008).In addition to its culinary use, garlic also stands out for its medicinal qualities, such as analgesic, anti-inflammatory, antiseptic, antibacterial, antifungal, antiviral, diuretic, and antioxidant properties, besides being an immune system stimulant (Trani, 2009).
The study on the curves of plant growth and dry matter accumulation of garlic allows to detect problems on the development of the culture (Reis et al., 2014), which contributes to a proper management.According to Pôrto et al. (2007), the curves of dry matter and nutrient accumulation are useful as an indication for care demands in each stage of plant development.
Nonlinear regression models have been shown as adequate to describe these curves, both by the frequentist and by the Bayesian approaches, which have parameters with biological interpretation, such as asymptotic weight and growth velocity (Martins Filho et al., 2008).Puiatti et al. (2013) and Reis et al. (2014) indicated high-fitting quality for the logistic, Gompertz, and Von Bertalanffy models, in garlic accessions, using the frequentist approach.However, there are no reports about these models in garlic accessions using the Bayesian approach.
The Bayesian inference considers the vectors of unknown parameters as random quantities, and any initial information on them can be represented by probabilistic models.Thus, by attributing probability distributions, the Bayesian approach allows to incorporate some knowledge on these parameters before data have been collected.Therefore, to perform a Bayesian inference, it is necessary to specify an a priori probability density function P(θ) that combined with the likelihood L(y 1 , ..., y n |θ), by means of the Bayes' theorem, generates the a posteriori probability density function P(θ|Y n ).Any conclusion on the θ parameter is performed from the a posteriori density distribution, which is represented as , in which Y n = {y 1 , ..., y n }.
According to Rosa (1998), to infer on any element of θ, it is necessary to incorporate the a posteriori integrated distribution of the parameters -P(θ 1 |Y) -over all other parameters.Thus, to make inference on θ 1 , it is necessary to obtain the P(θ 1 |Y) distribution, which is called the posterior marginal distribution, represented by The integrals to obtain the marginal distributions usually does not have analytical solutions, which makes it necessary to use specialized iterative algorithms such as the Gibbs sampler and the Metropolis-Hastings, which are called Markov chain Monte Carlo (MCMC) algorithms.
Usually, in growth curve studies, the researcher is interested in comparing the estimates of the growth curve parameters among the different populations, in order to identify in which one the growth process was most efficient (Silveira et al., 2011).The cluster analysis allow to group individuals with great homogeneity, but with high heterogeneity among them (Johnson & Wichern, 1992;Azevedo et al., 2012;Faria et al., 2012).Therefore, it is possible to group the garlic varieties according to the studied variables, such as the growth rate and dry matter accumulation, derived from the growth curve.
The objective of this work was to identify nonlinear regression models that best describe dry matter accumulation curves over time, in garlic accessions, using the Bayesian and frequentist approaches, and to infer on the estimated curve equality for each group identified by both approaches.

Materials and Methods
The field experiment was carried out from March to November in 2010, in an area belonging to Plant Science Department of the Universidade Federal de Viçosa (20º45'S, 42º51'W, at 650 m altitude), in the Zona da Mata region, state of Minas Gerais, Brazil.
Thirty garlic accessions, registered in the vegetable germplasm bank of Universidade Federal de Viçosa (VGB/UFV), were evaluated (Table 1).A randomized complete block design was used, considering four replicates, and plots were constituted by four longitudinal rows of 1.0 m length.The planting spacing was 0.25×0.10m, which resulted in 40 plants per plot.The total dry matter evaluation was performed in four periods: 60, 90, 120, and 150 days after planting (DAP).
The models used to describe the longitudinal trajectory of the total dry matter accumulation are shown in Table 2, where: β 1 is the parameter that represents the asymptotic weight; β 2 represents the parameter for location (scale), which does not have a biological interpretation; and β 3 represents the maturity rate or growth rate; y i is the observation of the dependent variable; X i is the independent variable; and ε i is the random error, with ε i ~ N(0, σ 2 ).
For the Bayesian approach, estimates of the growth curve parameters were obtained by a hierarchical method, which was adjusted separately for each of the seven models tested.As an example for the logistic model, the Bayesian methodology was applied regarding the following specifications: a) Sample data distribution, , , and c) a priori distribution, ; and τ e is the error variance.
In the last stage, the values for the parameters considered for the previous distributions were specified: ) in which k = 1, 2, and 3.These values were respectively considered as equal to the mean and the variance of the parameter estimates reported in previous studies (Puiatti et al., 2013;Reis et al., 2014) with the same accessions.
The presented Bayesian inference was implemented in the R software (R Core Team, 2015), by the R2OpenBugs pack, in which the Gibbs sampler and the Metropolis-Hastings algorithms were implemented.For all models, 20,000 iterations were used, from which the first 1,000 were discarded to avoid errors associated with the initial values.To ensure sample independence, the sampling interval of four iterations was used.In this way, a final chain of 4,000 observations for each parameter was obtained.To assess the convergence, the tests of Geweke and of Raftery-Lewis (1992) were used by means of the Bayesian output analysis (BOA) package of the R software (R Core Team, 2015).
For the frequentist approach, the parameters were estimated using the method of ordinary least squares, with solutions obtained by the Gauss-Newton iterative process through the nls of the R package.To evaluate the fitting quality of the models, the convergence percentage (C%), the mean square error (MSE), the coefficient of determination (R 2 ), the Akaike´s information criterion (AIC), the Bayesian information criterion (BIC), and the mean absolute deviation (MAD) were used for the frequentist approach; whereas the deviance information criterion (DIC) was used for the Bayesian approach.
After choosing the best model, in each approach, the parameter estimates were grouped using the Ward algorithm, with the aim of grouping the most similar accessions.The method proposed by Ward Jr. ( 1963) is based on the variation changes within and among groups undergoing formation at each step of the grouping process.The measure utilized was the generalized distance of Mahalanobis, which considers the existence of correlations among the analyzed characteristics, and shows differences of scales among the variables.To determine the number of groups, the procedure suggested by Mojena (1977) was used, and the adopted constant value for k was 1.25.
The accession classifications in groups by the cluster analysis was made by estimates obtained from equations that represented the curves related to each group.The next step was to test the hypotheses for the parameter equality of these models in relation to the formed groups.For the frequentist approach, the model identity method for nonlinear regression, presented by Regazzi & Silva (2010), was used.Therefore, to check the equality of the parameter estimates in the accession groups, the F-test was used to evaluate the hypothesis which states that the reduced model, fitted for both groups, is identical to the fitted complete model, according to F(H in which SQRR represents the residual sum of squares of the regression for the given model; Ω is the parametric space for the complete model; ω is the parametric space for the reduced model under H 0 ; t is the number of parameters to be tested; and N is the total number of observations.The considered hypotheses were as follows: 3) : not H 0 (3) , in which k is the number of formed groups.
For the Bayesian approach, the equality tests of the parameters of the models, in relation to the formed groups, were performed using samples obtained from the a posteriori marginal distributions related to differences between the estimates  β 1 and  β 3 for all groups.Thus, the differences showed themselves as an additional parameter in the model, and they allowed to test the hypothesis of the equality of parameters through the highest posterior density (HPD).If the value zero is contained in the interval, it is quite conclusive that the parameters for the two populations involved in the contrast are statistically identical.This methodology was proposed by Silva et al. (2005), to compare curve parameters for the lactation of goats from two populations.Afterwards, it was used to compare the growth curves of Nelore cattle from different genetic groups (Silva et al., 2007).

Results and Discussion
In the frequentist approach, the means and the standard deviations of the evaluated parameters for the fitting quality of the models did not allow the differentiation of the models classification (Table 3); that is, the models with the lowest values for MSE, AIC, BIC, and MAD were also those that showed the highest values for R 2 .The model that best fitted to the data was the logistic (L) model, followed by the Gompertz (G), Von Bertalanffy (vB), Brody (B), Meloun I (M1), Mitscherlich (M), and Meloun II (M2).Reis et al. (2014), studying nonlinear regression models to describe the dry matter accumulation in different parts of the garlic plant, found that the L model was the one that best fitted the data for all plant parts evaluated.A similar result was reported by Puiatti et al. (2013), who identified and grouped the nonlinear regression models that best fitted the description of the total dry matter accumulation of garlic plant over time, where L showed better performance than that of the B, G, L, M, M1, M2, vB, and Meloun III models.The L model fitted the data well in several experiments with nonlinear regression models, for the description of growth curves or nutrient accumulation, as in Pôrto et al. (2007) for onion cultivation, Maia et al. (2009) for banana trees, and Martins Filho et al. ( 2008) who also reported great adjustments for the L model using the Bayesian methodology for the growth data of two bean cultivars.
Among the analyzed models, the only ones that showed convergence for all the accessions were the L, G, and vB models.For the B model, there was convergence for half the accessions.This outcome possibly occurred because this model has no inflection point, which hinders its performance in this sort of study, since these data have a sigmoid format.
Means, standard deviations, and variation coefficients of the parameter estimates for each model, in both approaches, are shown in Table 4.As for the  β 1 estimates, for the frequentist approach, the B, M1, M2, and M models were not good representatives for curves of total dry matter accumulation, since the values of these estimates were very high in relation to the final garlic plant weight.According to Reis et al. (2014),  β 1 and  β 3 are practically the same in the B, M1, M2, and M models.This characteristic may result from the fact that these models are only different reparameterizations of the same model, in which the changes of values for the estimates occur only for the β 2 parameter.However, the L, G, and vB models are good representatives of the obtained results in this kind of study, and showed  β 1 estimate values quite close to the reality of the data.The  β 2 estimate was quite different in all models, whose lowest value (-4.85) was observed for the M2 model, and the highest value (320,524.5)for the L model.This result, however, does not constitute a problem, since such parameter does not have a biological interpretation, and it is only a location parameter.The G, vB, B, M1, M, and M2 models had the lowest  β 3 estimate values that were all very close to one another.The L model had the highest value for the estimate and showed a very defined sigmoid shape.
After the group classification of the accessions, a curve was adjusted for each group in the model that Table 3. Means of the determination coefficient (R 2 ), of the mean squared residue (MSR), of the Akaike information criterion (AIC), of the Bayesian information criterion (BIC), and of the mean absolute deviation of residues (MAD) of the adjusted models for the mean of total dry matter of plant (TDMP) of the 30 garlic (Allium sativum) accessions analyzed, with the respective convergence for each model.As the H 0 (3) was significant (Table 5), we conclude that at least one of the estimates differs from the others; therefore, a single equation cannot be adopted for the three groups.Nonetheless, it is pointed out that the hypothesis that the β 3 parameter value is the same for the three groups was not rejected.Thus, a new twoagainst-two comparison was made to estimate the β 1 parameter, to check which equations were statistically equal.Based on the evaluated hypotheses, only the asymptotic weight estimates for groups I and II were equal (for H 0 (5) , p<0.01, and for H 0 (4) , p>0.01).Thus, it can be concluded that the equations of the groups I and II do not significantly differ, and that they can be represented by one single model.Therefore, only two equations are sufficient to represent the accessions, one for groups I and II, and another for group III, as follows: ŷ = 20.10[1+ 35386.9exp(-0.10x)] - and ŷ = 27.31[1+ 39820.17exp(-0.10x) - , respectively.Using these two adjusted equations, it is possible to conclude that the mean asymptotic weight of the accessions (27.31 g) of group III was superior to those of the other accessions (20.10 g), which is an evidence that these accessions show higher dry matter accumulation at 150 DAP.As for the  β 3 estimate, there were no significant differences, which indicates that, theoretically, the analyzed variations have the same growth rate.The total dry matter accumulation curves were adjusted by the L model for each of the two equations through the frequentist approach (Figure 2).The curve for group III was superior to that for groups I and II, irrespective of time.The two curves reached the asymptotic weight around 130 DAP; that is, after this time, the weight tends to stabilize, and the plants are ready to be harvested without any significant yield loss.
For the Bayesian approach, the convergence was reached for all models, according to the convergence tests of Raftery & Lewis, Heidelberger & Welch, and Geweke (Raftery & Lewis, 1992), since the heating period ("burn in"), the total number of iterations, and the distance among samples ("thin") were superior to the minimum recommended by Raftery & Lewis (1992).Furthermore, all values for the dependency factor (DF) of the criterion were close to 1.According to the authors, it can be said that the chain did not reach convergence when DF was superior to 5.0.The Heidelberger & Welch criterion showed acceptance of the null hypothesis of chain stationarity, making a higher number of iterations unnecessary.For the Geweke criterion, all the obtained p-values were superior to the established significance level of 0.5, and, therefore, they did not provide evidence against the convergence.
To make the adjustments obtained with the two methods comparable, determination coefficients were calculated for the Bayesian approach from the square correlation between the observed and the predicted values.The results were similar to those obtained with the frequentist approach; thus, the L model showed the highest R 2 value (0.99).
All the estimated values by the Bayesian approach showed a lower coefficient of variation value than those estimated by the frequentist approach (Table 4).Moreover, all the estimates of asymptotic weight for the same model, in the frequentist approach, were superior to those found with the Bayesian approach.As for the  β 1 estimate, the M1, M2, and M models, similarly to the frequentist approach, were not good representatives for the data standard behavior in this kind of study, since the value for these estimates was very high in relation to the total final weight of plants.Nevertheless, the L, G, and vB models were good representatives.
The grouping of the  β 1 and  β 1 estimates of the parameters for the L model in the Bayesian approach, by the Mojena criterion with k= 1.25, indicated the optimum number of four groups.The cutoff point in the dendrogram was 66.4, which corresponds to 47.1% of the maximum distance observed at different levels of fusion (Figure 1).The accessions were classified as follows: group A, 1, 2, 4, 5, 6, 11, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 28, and 30; group B, 3, 9, 14, and 25; group C, 7, 8, 10, 12, 13, and 15; and group D, 27 and 29.After the accession classifications into groups, a curve for each group was adjusted in the model that best fitted to data (L model) in the Bayesian analysis, and the following equations were obtained: ŷ = 20.Groups A and C did not differ from one another statistically (Table 6), since the HPD interval had a value of zero both for the asymptotic weight and for the growth rate.The remaining differences were all significant.Thus, the accessions can be classified into three groups, in which groups A and C are considered one single group, the A-C group.Therefore, the adjusted equations to represent groups A-C, B, and D were, respectively: ŷ = 20.46[1+ 54919.6exp(-0.1035x)] - , ŷ = 26.81[1+ 151335.6exp(-0.1021x)] - , and ŷ = 15.47[1+ 17553.0exp(-0.092x)] - .We conclude, therefore, that the mean asymptotic weight of the accessions in group B is superior to those of the other groups, which is an evidence that the accessions in this group show a higher dry matter accumulation per asymptotic weight.Except for the access 9, the other ones from the group of higher asymptotic weight, by the Bayesian approach, coincided with those observed in the frequentist approach.As for the group with the lowest weight, the only accession that was classified in both approaches was the number 27.The A-C group had the higher number of accessions ( 24), with a 20.46 g mean asymptotic weight, which is very close to that in group I-II in the frequentist approach (20.10 g).
The estimated asymptotic weight for the group with the highest weight in the frequentist approach was superior to that estimated in the Bayesian one.The asymptotic weight estimated for the group with the lowest weight showed also the same behavior.The Bayesian approach was less rigorous at classifying the groups.
Approximately from 125 DAP, the curve for group B was superior to those of the other groups (Figure 2), whereas the curve for group D was the lowest one from 110 DAP on.This is an evidence that the group B is formed by latter varieties.Thus, if it is desirable to harvest the plants before 125 DAP, the variables in group A-C are recommended.However, if the harvest is to occur after this date, the variables in group B will produce a higher weight.This interpretation from a practical perspective is useful to justify research in the phytotechnology field that involves the estimation of curves by nonlinear models with posterior grouping, and the application of hypothesis tests to compare the generated curves for each group.

Conclusions
1.The logistic model is that which fits best to the yield data of total dry matter of garlic (Allium sativum) plants, in the evaluated 30 accessions, by both the frequentist and Bayesian approaches.
2. The hypothesis tests for the parameter equality of the estimated curves, for each group of accessions, indicate that both methods show the same differences reported by the clustering analysis, which makes them both indicated for experiments of this nature.Table 6.Mean, standard deviation, and a posteriori maximum density interval (HPD) of 95% for the estimate differences of β 1 parameters, and the estimate differences of β 3 parameters, in all groups, by the Bayesian approach.

Figure 1 .
Figure 1.Dendrogram obtained from the grouping of the estimates for β 1 and β 3 parameters of the logistic model, with the 30 garlic (Allium sativum) accessions analyzed by the frequentist (A) and Bayesian (B) approaches.

Figure 2 .
Figure 2. Total dry matter accumulation curves of garlic (Allium sativum) plant adjusted to the logistic model, for each of the accession groups formed by the frequentist (A) and Bayesian (B) approaches.

Table 1 .
The 30 garlic (Allium sativum) accessions used in this study, which are registered in the Vegetable Germplasm Bank of Universidade de Viçosa.

Table 2 .
Nonlinear regression models used to describe the dry matter accumulation in garlic (Allium sativum) plants.

Table 4 .
Mean, standard deviation, and coefficient of variation of the estimates of β 1 , β 2 , and β 3 parameters, for the adjusted models with the frequentist and Bayesian approaches.

Table 5 .
Analyzed hypotheses, statistical values for the F test, number of degrees of freedom, and descriptive level (p-value) of the parameter equality tests by the frequentist approach.