INTRODUCTION:
Brazilian coffee production in 2017 was 45 million bags, and a 29% increase in production is estimated for 2018, with a likely record of 58 million bags (CONAB, 2018). Brazil is the largest coffee producer and exporter worldwide. In 2017, Brazil exported approximately 31 million bags, generating 5.2 billion USD in revenue (CECAFÉ, 2018), which is a considerable amount for the country’s agribusiness.
Given the importance of the coffee crop, farmers should maximize their knowledge about the causes and factors that contribute to improved productivity. In this context, understanding the factors that affect plant metabolism and alter coffee productivity and quality has been the focus of researchers and producers. BACHIÃO et al. (2018) assessed the number of leaves, leaf area, shoot and root dry matter, plant height, and stem diameter of four coffee cultivars using linear and polynomial regression models as a function of different fertilizer doses and observed that these models fitted the data adequately. COLODETTI et al. (2015) compared the effects of a control treatment and a dose of fertilizer on coffee plant growth and fitted a simple linear regression model to the number of leaves and plant height data as a function of age. MENEGHELLI et al. (2016) assessed the effect of different substrate doses on coffee seedling development and measured seedling height and root, stem, and leaf dry matter. MARANA et al. (2008) compared the effect of different fertilizer doses on coffee seedling growth, fitted polynomial regression models to seedling height and root, stem, and leaf dry matter data as a function of the doses, and obtained satisfactory fits.
According to BACHIÃO et al. (2018), the number of leaves and leaf area are relevant to plant productivity because leaves intercept solar radiation and transform it into the chemical energy needed for plant growth. DUBBERSTEIN et al. (2017) highlighted that adequate management enables a coffee plant to attain its full potential. Thus, the success of coffee farming is directly associated with the treatment applied to the crop and with knowledge about the phenological phases of the plant, which is essential for management because plant development depends on physiological and environmental factors. The authors also reported that the leaf dry matter content decreases during the reproductive period of coffee plants. Macronutrient mobilization from leaves to fruits in coffee cultivars was assessed by VALARINI et al. (2005). In the fruit growth phase, the authors observed that the leaf macronutrient content of productive branches of the cultivars decreased and that the most productive coffee plants had slightly higher macronutrient concentrations than plants with intermediate productivity.
Given the influence of leaves on coffee plant development, understanding the variation in the number of leaves in coffee plants as a function of seedling age is necessary. However, most studies employ the observed mean number of leaves as a response variable, which fails to satisfy the normality assumption and prevents the use of classical regression models, which require continuous responses. An alternative is to use generalized linear models, which are useful in studies that involve count data, represented by discrete random variables (DRVs). In some cases, count data can be approximated by continuous distributions; for example, count data that follow a Poisson distribution with a high mean can be modelled using a normal distribution. As the leaf counts of coffee seedlings may contain excess zeros, which yield low means per plot, dispersion was modelled using DRVs. Initially, a Poisson distribution was assumed, followed by a negative binomial distribution, with a parameterization that involves a Poisson distribution, as an alternative for modelling overdispersion. In this context, Y_{i} (i = 1, 2, ..., n) is defined as the number of leaves observed in coffee plant seedlings. Assuming a Poisson distribution, E[Y_{i}] = Var[Y_{i}] = µ_{i}. However, in practice, the variance Var[Y_{i}] may be higher than the mean µ_{i}, which is a typical situation of overdispersion. Thus, a plausible alternative is to fit the negative binomial model with a log link function, which is also employed in the Poisson model (HINDE & DEMÉTRIO, 1998). HESS et al. (2015) fitted generalized linear models when evaluating tree growth, assessed the fit of normal, Poisson and gamma models, and observed that the gamma distribution had the best fit. ROCHA et al. (2014) fitted the Poisson model to the number of stomata on the abaxial and adaxial surfaces of coriander leaves with satisfactory results.
The Poisson and negative binomial models were compared by SILVA et al. (2014) when evaluating the number of mites on rubber tree leaves, and the negative binomial model showed the best fit due to overdispersion of the data.
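The equality E[Y_{i}] = Var[Y_{i}] suggests an informal first check for overdispersion: compare the sample variance of the counts with their sample mean. The Python sketch below does this for a hypothetical set of leaf counts (the values are illustrative, not data from the study). A ratio well above 1 points to overdispersion, although a formal assessment requires fitting the model, because the mean also changes with seedling age.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance-to-mean ratio of a list of counts."""
    return variance(counts) / mean(counts)


# Hypothetical leaf counts at a single evaluation date (not data from the study).
leaf_counts = [4, 6, 7, 9, 12, 15, 8, 22]

ratio = dispersion_index(leaf_counts)
print(f"mean = {mean(leaf_counts):.2f}, "
      f"variance = {variance(leaf_counts):.2f}, ratio = {ratio:.2f}")
# A Poisson sample is expected to give a ratio close to 1.
```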
In the analysis of the experimental results, when the coffee seedling is considered the sampling unit, seedling leaf counts are expected to differ. Therefore, a model that addresses this heterogeneity should be proposed because, to the best of our knowledge, no studies of leaf count overdispersion, specifically in seedlings, exist; researchers typically transform the experimental data and then apply simple linear regression. This procedure is not invalid; however, transformation changes the original scale of the data and may hinder the interpretation of the results related to the predictions.
For this reason, the objective of this study was to fit models that address sampling data overdispersion. The Poisson and negative binomial models were compared, and the viability of their use in leaf count data of coffee seedlings was analysed.
MATERIALS AND METHODS:
The data analysed were extracted from LUZ (2017). The experiment was conducted at the Universidade Federal de Lavras. Seedlings of the “Mundo Novo 379-19” cultivar were planted in January 2016. The inter-row spacing was 3.6 m, and the inter-plant spacing was 0.75 m. The plot consisted of a row with six plants, in which the four central plants were considered useful, and the two plants at the ends were considered border plants. The rows between plots were also considered to be border rows.
The experimental design consisted of randomized blocks to control for possible soil heterogeneity. The analysis of variance was performed by LUZ (2017), who observed that the block effect was not significant. Thirty treatments, consisting of combinations of three soil covers, two fertilization levels, and five soil conditioners, with three replicates and four plants per plot, were employed in the experiment. One block with the soil covered with plastic film, treated with Produquímica® controlled-release fertilizer and the soil conditioner coffee hull, was evaluated in this study.
The double-sided, polyethylene-based plastic film was white on the upper side and black on the underside and was installed on the row shortly after coffee planting. The fertilizer was applied according to the manufacturer’s instructions, four days after planting, in a 5-cm-deep side pit 10 cm from the plant. A dose of 10 L/plant of coffee hull was applied under the coffee plant crown projection after planting. The number of leaves was counted starting in April, when the seedlings were properly established and showed typical growth. The first evaluation was performed on 8 April 2016, and the other evaluations were performed 18, 32, 47, 62, 76, 95, 116, 133, and 153 days after the first evaluation, for a total of ten measurements over time.
This study employed count data to describe the model, and the counts were represented by the random variables Y_{1}, Y_{2}, ..., Y_{n}. For comparison purposes, the Poisson and negative binomial models with a log link function were fitted; their linear predictor was given by η_{i} = β_{0} + β_{1}x_{i}, where x_{i} is the covariate age.
Incorporating the log link function, the model was described by (1):
log(µ_{i}) = η_{i} = β_{0} + β_{1}x_{i} (1)
Thus, the mean number of leaves predicted for each seedling was estimated as follows (2):
µ_{i} = exp(β_{0} + β_{1}x_{i}) (2)
When fitting the negative binomial model, the same specifications regarding the systematic component and the log link function were maintained, although the variance increased as shown in equation (3):
Var[Y_{i}] = µ_{i} + µ_{i}^{2}/ϕ (3)
where ϕ is the dispersion parameter, which is estimated using the least squares method as a function of Pearson residuals.
Note that the generalized linear model may be fitted using different parameterizations, assuming that the distribution of the response variable is negative binomial. Another important issue is that the parameter ϕ is unknown, and therefore the distribution of Y_{i} does not belong to the exponential family of distributions. Thus, a way to overcome this problem is to fit the negative binomial model considering the following assumptions:
Therefore,
and
With these specifications, the resulting joint distribution is described by
The following marginal distribution was obtained:
resulting in a negative binomial distribution, which was obtained by mixing the distributions cited in (4) and (5):
where y_{i} = 0, 1, 2, .... Thus, Y_{i} ~ negative binomial (µ_{i}, ϕ), with Var[Y_{i}] specified in (3).
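A standard Poisson-gamma mixture that leads to the variance in (3) is sketched below for reference. The latent rate θ_{i} and the gamma parameterization (shape ϕ, rate ϕ/µ_{i}, hence mean µ_{i}) are assumptions chosen for illustration and may differ from the parameterization used by the authors.

```latex
% Assumed conditional and mixing distributions
\begin{align*}
Y_i \mid \theta_i \sim \text{Poisson}(\theta_i):\quad
  & P(Y_i = y_i \mid \theta_i) = \frac{e^{-\theta_i}\,\theta_i^{\,y_i}}{y_i!},\\
\theta_i \sim \text{Gamma}\!\left(\phi, \tfrac{\phi}{\mu_i}\right):\quad
  & g(\theta_i) = \frac{(\phi/\mu_i)^{\phi}}{\Gamma(\phi)}\,
    \theta_i^{\,\phi-1}\, e^{-\phi\theta_i/\mu_i}.
\end{align*}

% Joint distribution, then marginal obtained by integrating out theta_i
\begin{align*}
f(y_i, \theta_i) &= \frac{(\phi/\mu_i)^{\phi}}{\Gamma(\phi)\, y_i!}\,
  \theta_i^{\,y_i+\phi-1}\, e^{-\theta_i (1 + \phi/\mu_i)},\\
P(Y_i = y_i) &= \int_0^{\infty} f(y_i, \theta_i)\, d\theta_i
  = \frac{\Gamma(y_i + \phi)}{\Gamma(\phi)\, y_i!}
    \left(\frac{\phi}{\mu_i + \phi}\right)^{\phi}
    \left(\frac{\mu_i}{\mu_i + \phi}\right)^{y_i},
\end{align*}

% Moments of the resulting negative binomial distribution
\[
E[Y_i] = \mu_i, \qquad \text{Var}[Y_i] = \mu_i + \frac{\mu_i^{2}}{\phi}.
\]
```

Under this parameterization, larger values of ϕ bring the variance closer to the Poisson variance µ_{i}.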
After defining the models, the parameters were estimated using the iteratively reweighted least squares method, in which the parametric vector β = (β_{0}, β_{1})^{T} is estimated using an iterative process and expressed as
in which z = (z_{1}, z_{2}, ..., z_{n})^{T} with
is a modified dependent variable, which incorporates the variance function and the weights attributed to each observation, and w_{i} = µ_{i} considering the canonical link function. The design matrix X and the weight matrix W are given by (10):
(HARDIN & HILBE, 2018).
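The iterative process can be illustrated with a minimal Python sketch for the Poisson model with log link. It assumes the textbook update β = (X^{T}WX)^{-1}X^{T}Wz with working response z_{i} = η_{i} + (y_{i} − µ_{i})/µ_{i} and weights w_{i} = µ_{i}; the function name `irls_poisson`, the starting values, and the leaf counts are hypothetical, while the evaluation days are those listed in the experimental description.

```python
import math


def irls_poisson(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    # Start from a flat fit at the log of the sample mean.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response z_i = eta_i + (y_i - mu_i) / mu_i; weights w_i = mu_i.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted normal equations (X'WX) beta = X'Wz for X = [1, x].
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Days after the first evaluation, as listed in the experimental description.
days = [0, 18, 32, 47, 62, 76, 95, 116, 133, 153]
# Hypothetical leaf counts, one per evaluation (not data from the study).
counts = [7, 9, 10, 12, 14, 15, 20, 26, 30, 33]

b0_hat, b1_hat = irls_poisson(days, counts)
print(f"b0 = {b0_hat:.4f}, b1 = {b1_hat:.5f}")
```

Because the model has only two parameters, the 2 × 2 system is solved in closed form, which keeps the sketch free of external libraries; a general implementation would solve the weighted normal equations with a linear algebra routine.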
The fit of the models was assessed using the deviance and the simulated envelopes method for residuals (HINDE & DEMÉTRIO, 1998; LISKA et al., 2015). Analyses were performed using the statistical software R (R DEVELOPMENT CORE TEAM, 2017). The negative binomial model was fitted using the MASS package, and the simulated envelopes for residuals were obtained using the hnp package. A significance level of α = 1% was adopted in all statistical tests.
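As an illustration of the deviance criterion, the Python sketch below computes the Poisson deviance, D = 2 Σ [y_{i} log(y_{i}/µ_{i}) − (y_{i} − µ_{i})], and the Pearson statistic for a set of fitted means. The counts are hypothetical, the fitted means use the Poisson estimates reported in Table 1, and the function names are illustrative. The chi-square P-value is not computed here; a ratio of deviance to residual degrees of freedom far above 1 is only an informal sign of overdispersion.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance; the term y*log(y/mu) is taken as 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def pearson_chi2(y, mu):
    """Pearson statistic with the Poisson variance function V(mu) = mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


days = [0, 18, 32, 47, 62, 76, 95, 116, 133, 153]
# Hypothetical, deliberately scattered counts (not data from the study).
counts = [3, 14, 6, 18, 9, 25, 11, 40, 16, 55]
# Fitted means from the Poisson estimates reported in Table 1.
mu_hat = [math.exp(1.894850 + 0.010968 * d) for d in days]

resid_df = len(counts) - 2  # n observations minus two estimated parameters
print(f"deviance/df = {poisson_deviance(counts, mu_hat) / resid_df:.2f}")
print(f"Pearson/df  = {pearson_chi2(counts, mu_hat) / resid_df:.2f}")
```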
In the diagnostic analysis, the elements h_{i} of the main diagonal of the hat matrix H were assessed to detect the presence of leverage points. This matrix is expressed as
Influential observations were assessed using Cook’s distance and expressed as
where β_{(i)} are the estimates of the parameters without the i^{th} observation. Large values of h_{i} or D_{i} indicate that the i^{th} observation is a leverage point or an influential observation, respectively. The adequacy of the link function was assessed using the plot of z_{i}, as specified in (9), versus η_{i}, where a linear trend indicates that the link function is adequate (FARAWAY, 2016).
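For a generalized linear model, the hat matrix is commonly written as H = W^{1/2}X(X^{T}WX)^{-1}X^{T}W^{1/2}. The Python sketch below computes its diagonal for the two-parameter predictor used here, assuming the Poisson weights w_{i} = µ_{i} and, for brevity, one row per evaluation day rather than one row per seedling count. The function name `leverages` and the 2p/n cut-off (a common rule of thumb) are illustrative choices, not details taken from the study.

```python
import math


def leverages(x, mu):
    """Diagonal of H = W^(1/2) X (X'WX)^(-1) X' W^(1/2), X = [1, x], W = diag(mu)."""
    s_w = sum(mu)
    s_wx = sum(m * xi for m, xi in zip(mu, x))
    s_wxx = sum(m * xi * xi for m, xi in zip(mu, x))
    det = s_w * s_wxx - s_wx ** 2
    # h_i = w_i * [1, x_i] (X'WX)^(-1) [1, x_i]'
    return [m * (s_wxx - 2.0 * s_wx * xi + s_w * xi * xi) / det
            for m, xi in zip(mu, x)]


days = [0, 18, 32, 47, 62, 76, 95, 116, 133, 153]
# Fitted means from the Poisson estimates reported in Table 1.
mu_hat = [math.exp(1.894850 + 0.010968 * d) for d in days]

h = leverages(days, mu_hat)
p, n = 2, len(days)
cutoff = 2.0 * p / n  # rule-of-thumb threshold (an assumption, not from the study)
for day, hi in zip(days, h):
    flag = " <- high leverage" if hi > cutoff else ""
    print(f"day {day:3d}: h = {hi:.3f}{flag}")
```

The leverages sum to the number of parameters (here 2), which provides a quick check on the computation.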
RESULTS AND DISCUSSION:
The estimates of the parameters of the Poisson and negative binomial models are outlined in Table 1. The parameter associated with seedling age (β_{1}) was significant in both fitted models (Table 1), which suggests that age contributes strongly to the predictive power of both models.
Model | Parameter | Estimate | Standard error | z value | P-value |
Poisson | β_{0} | 1.894850 | 0.092165 | 20.560 | <0.0001 |
Poisson | β_{1} | 0.010968 | 0.000795 | 13.770 | <0.0001 |
Negative binomial | β_{0} | 2.012354 | 0.125238 | 16.068 | <0.0001 |
Negative binomial | β_{1} | 0.009721 | 0.001218 | 7.981 | <0.0001 |
Negative binomial | ϕ | 13.53880 | 5.060000 | - | - |
Table 2 shows the deviance estimates for both models. The fit of the Poisson model to leaf counts of coffee seedlings is not acceptable (P-value < 0.0001), which is evidence of data overdispersion. Given this characteristic, the adequacy of the negative binomial model is confirmed by the non-significant deviance (P-value = 0.4940) and corroborated by the simulated envelopes obtained for each model (Figures 1 and 2).
The results showed that the assumption of a Poisson response (Figure 1) for the number of leaves in coffee seedlings over time is not confirmed, and the model shows an unsatisfactory fit. The simulated envelopes indicated that the residuals show a systematic trend as they are above the mean, and 65% of the points lie outside the confidence limits, which confirms the data overdispersion.
The simulated envelopes of the negative binomial distribution (Figure 2) showed that the residuals are distributed around the mean and inside the confidence limits; therefore, the results confirm the good fit of the model. As the data showed overdispersion, the negative binomial distribution adequately described the number of leaves in coffee seedlings, whose estimates are presented in Table 1. A practical interpretation of the estimates of the parameters of the negative binomial model (Table 1) is as follows: the expected mean number of leaves is µ_{i} = exp(2.012354 + 0.009721x_{i}) and exp(β_{1}) = exp(0.009721) = 1.009768; therefore, 0.9768% is the expected relative increase in the number of leaves per day. Similar results were obtained by SILVA et al. (2014), who assessed the number of mites on rubber tree leaves, observed the occurrence of overdispersion, and concluded that the negative binomial model adequately described the data.
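The arithmetic behind this interpretation can be reproduced with a short Python sketch that uses the negative binomial estimates from Table 1. The helper names `expected_leaves` and `nb_variance` are illustrative, and the variance uses the form Var[Y_{i}] = µ_{i} + µ_{i}^{2}/ϕ, which is assumed here to match the parameterization of equation (3).

```python
import math

# Negative binomial estimates reported in Table 1.
b0, b1, phi = 2.012354, 0.009721, 13.53880


def expected_leaves(day):
    """Expected number of leaves at a given number of days after the first evaluation."""
    return math.exp(b0 + b1 * day)


def nb_variance(mu):
    """Negative binomial variance, assuming Var[Y] = mu + mu^2 / phi."""
    return mu + mu ** 2 / phi


daily_ratio = math.exp(b1)                 # multiplicative change per day
pct_per_day = 100.0 * (daily_ratio - 1.0)  # relative increase per day, in percent
print(f"{daily_ratio:.6f} {pct_per_day:.4f}")  # prints 1.009768 0.9768

for day in (0, 153):
    mu = expected_leaves(day)
    print(f"day {day}: mean = {mu:.2f}, NB variance = {nb_variance(mu):.2f}")
```

Because the effect is multiplicative, the expected count after d days is the initial expected count multiplied by exp(β_{1})^{d}, so the relative increase compounds over time rather than adding up linearly.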
Based on the negative binomial model, Figure 3 shows the relationship between the observed and predicted leaf counts in coffee seedlings as a function of age: the number of leaves increases slowly until 100 days after the first evaluation and more sharply thereafter. Figure 3 also shows that the model overestimated the number of leaves at 153 days after the first evaluation; nevertheless, the negative binomial model adequately describes the number of leaves in coffee seedlings.
The results illustrated in Figure 4(a) showed that only one observation, labelled (37), is considered to be a leverage point, whose effect increases the uncertainty of the parameter estimates. This observation is considered to be an outlier as it is distinct from the other observations. As only one observation was identified as an outlier and the standard errors of the parameter estimates are lower than the estimates themselves (Table 1), this observation was retained in the model. However, in a study characterized by the use of tubes to produce coffee seedlings (Coffea arabica L.), POZZA et al. (2007) stated that the level of fertilization affected plant growth, specifically plant height, and favoured diseases such as coffee brown eye spot, which is the main defoliation disease.
The results shown in Figure 4(b) identified two influential observations, and it is important to note that the cause of this effect is not explained by this measure. External causes, such as those cited by MALAVOLTA et al. (1997) and SANTINATO (2014), who stated that variations in the supply of a specific nutrient present in the soil or fertilizer affect a plant’s mineral reserves and metabolic activity, may render these observations influential. In this context, predictions for these observations should be considered with caution.
The linear relationship between the linear predictor and the modified dependent variable (z) shown in Figure 4(c) indicates that the specification of the systematic component is correct; that is, quadratic or interaction terms do not need to be incorporated. In adverse situations, in which the functional relationship between the response variable and the independent variables requires nonlinear functions, the model should be implemented using generalized additive models (GAMs) or a generalized additive model for location, scale, and shape (GAMLSS). This result, which was obtained with the negative binomial model, can be adapted to calculate the coffee seedling quality and growth indices proposed by MARANA et al. (2008) as an alternative to normal linear models specified with quadratic terms, which renders the model more parsimonious.
CONCLUSION:
The fit of the Poisson model to leaf counts of coffee seedlings was inadequate due to data overdispersion. Due to this characteristic, the negative binomial model adequately described the data. Considering the negative binomial model with the log link function, the expected relative increase in the number of leaves per day is 0.9768%. The residual analysis provided by the negative binomial model can be a complementary analysis in the study and assessment of leaf counts in coffee seedlings.