Nonlinear quantile regression to describe the dry matter accumulation of garlic plants

The objective of this study was to fit nonlinear quantile regression models to describe dry matter accumulation in garlic plants over time, and to compare them to models fitted by the ordinary least squares method. The total dry matter of nine garlic accessions belonging to the Vegetable Germplasm Bank of Universidade Federal de Viçosa (BGH/UFV) was measured at four stages (60, 90, 120 and 150 days after planting), and those values were used to fit the nonlinear regression models. For each accession, one quantile regression model (τ = 0.5) and one least squares model were fitted. The nonlinear regression model fitted was the Logistic. The Akaike Information Criterion was used to evaluate the goodness of fit of the models. Accessions were grouped using the UPGMA algorithm, with the estimates of the parameters with biological interpretation as variables. Nonlinear quantile regression is efficient for fitting models of dry matter accumulation in garlic plants over time. The estimated parameters are more uniform and robust in the presence of asymmetry in the distribution of the data, heterogeneous variances, and outliers.


INTRODUCTION
Studies of plant growth trajectories are important for appropriate crop management. Curves of dry matter accumulation can inform management practices such as fertilization, because the growth rate of the plant varies along its development and different amounts of nutrients are required at each stage (SOUZA & MACÊDO, 2009).
Nonlinear regression models are based on theoretical considerations inherent in the phenomenon to be modeled, and are appropriate to describe growth curves (MAZUCHELI & ACHCAR, 2002). Therefore, it is possible to describe the relationship between the dry matter accumulation of plants and the number of days after planting with parameters of interest, such as the asymptotic weight, which represents the final weight of the plant at maturity, and the growth rate, which indicates how quickly the plant reaches maturity. Nonlinear regression models have been used in several growth curve studies of crops such as banana (MAIA et al., 2009), cassava (SILVA et al., 2014) and coffee (SOUSA et al., 2014). According to REIS et al. (2014) and MACEDO et al. (2017), the Logistic model is the most suitable for describing dry matter accumulation in garlic plants.
Curves of plant growth usually present heteroscedasticity between observations in different periods, asymmetry in the data set, and even outliers. Nonlinear quantile regression is an alternative for such data that also applies to functions that are not linear in the parameters. Quantile regression estimates conditional quantiles (Q[Y|X]), relating time to percentiles of the dependent variable. This differs from the ordinary least squares method, which uses conditional means (E[Y|X]) to obtain the functional relationship between the variables. As a result, it is possible to obtain more robust models even in the presence of outliers, asymmetry and heteroscedasticity (KOENKER, 2005; HAO & NAIMAN, 2007).
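The contrast between conditional means and conditional quantiles can be illustrated with the check (pinball) loss that quantile regression minimizes. The sketch below is only an illustration under stated assumptions: the five readings are hypothetical values, not data from this study, and the function names are introduced here for the example. It shows that the constant minimizing the check loss at τ = 0.5 is the sample median, which a single outlier barely moves, whereas the mean is pulled toward the outlier.

```python
def check_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0]) for a single residual."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, constant, tau):
    """Sum of check losses when every value is predicted by one constant."""
    return sum(check_loss(v - constant, tau) for v in values)


def best_constant(values, tau):
    """Constant minimizing the total check loss.

    A minimizer is always attained at one of the observed values,
    so it is enough to search over the data points themselves.
    """
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))


# Hypothetical dry matter readings (g per plant); the last one is an outlier.
sample = [10.0, 11.0, 12.0, 13.0, 40.0]

median_fit = best_constant(sample, tau=0.5)  # quantile (tau = 0.5) estimate
mean_fit = sum(sample) / len(sample)         # least squares estimate

print("tau = 0.5 estimate:", median_fit)
print("least squares estimate:", mean_fit)
```

In a regression setting the constant is replaced by a model function of time, but the loss being minimized is the same.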
Nonlinear quantile regression models have been used to study plant growth curves, for example by SORRELL et al. (2012) and MUGGEO et al. (2013). PUIATTI et al. (2018) used different quantiles of a nonlinear quantile regression model to classify garlic accessions according to their growth rate and asymptotic weight.
In studies related to genetic diversity, it is also useful to compare the parameters of the growth curves and identify accessions with similar performances. Cluster analysis separates individuals into groups, maximizing the homogeneity within each group and the heterogeneity among different groups (SILVEIRA et al., 2011; AZEVEDO et al., 2012; FARIA et al., 2012). Therefore, it is possible to identify accessions with closer parameter estimates, and these are expected to have similar dry matter accumulation curves.
The objective of this research was to fit nonlinear quantile regression models to describe dry matter accumulation in different genotypes of garlic (Allium sativum L.) over the planting period, and to compare them to models fitted by the ordinary least squares method.

MATERIALS AND METHODS
Nine garlic (Allium sativum L.) accessions were evaluated (Table 1). The experiment was carried out from March to November in an experimental area belonging to the Plant Science Department of the Universidade Federal de Viçosa (UFV), located in the Zona da Mata region of Minas Gerais, Brazil (20°45'S, 42°51'W, altitude of 650 m). A completely randomized experimental design with four replicates was used.
The experimental units were composed of four longitudinal rows of 1 m length, with a planting spacing of 0.25 x 0.10 m, and the plants of the two central rows were considered useful. The total dry matter of the plant (TDMP), expressed in grams per plant, was evaluated in four periods: 60, 90, 120, and 150 days after planting (DAP). Descriptive statistics of the data are presented in Table 1. The TDMP was obtained as the sum of the dry matter content of leaves, pseudostems, bulbs and roots.
The Logistic regression model was used to describe the total dry matter accumulation of the plant over the planting period. Models were fitted for each one of the accessions, using all of the observations of the accession. The parameters of the models were estimated by ordinary least squares (O. L. S.), using the Gauss-Newton iterative method. The Logistic model was defined as

$$y_i = \frac{\beta_1}{1 + e^{\beta_2 - \beta_3 x_i}} + \varepsilon_i,$$

where: y_i = observation of the response variable, the total dry matter of the plant (TDMP) expressed in grams; x_i = predictor variable, represented by the periods of evaluation of the dry matter of the plant (days after planting, DAP); β1 = parameter that represents the asymptotic weight of the accession; β2 = location parameter without biological interpretation; β3 = growth rate of the accession; and ε_i is the random error, with $\varepsilon_i \sim N(0, \sigma^2)$.
A nonlinear quantile regression model (Q. R.) was also fitted for each accession, using the quantile τ = 0.5. This model was fitted by the Interior Point Algorithm proposed by KOENKER & PARK (1996). The model is specified as

$$Q_\tau(y_i \mid x_i) = \frac{\beta_1(\tau)}{1 + e^{\beta_2(\tau) - \beta_3(\tau) x_i}},$$

where τ refers to the assumed quantile, with τ ∈ [0,1].
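As an illustration of the least squares step, the following minimal sketch fits a three-parameter logistic curve by Gauss-Newton iterations. It assumes the parameterization β1 / (1 + exp(β2 − β3 x)), which matches the parameter descriptions above (asymptote, location, growth rate). The data are synthetic and noise-free, the starting values are hypothetical, and the step-halving safeguard and all function names are additions made for the example; it is not the code used in the study, which was run in R.

```python
import math


def _expo(x, b2, b3):
    # exp(b2 - b3 * x), with the exponent clamped to avoid float overflow.
    return math.exp(max(-300.0, min(300.0, b2 - b3 * x)))


def logistic(x, b1, b2, b3):
    """Assumed logistic growth curve: b1 / (1 + exp(b2 - b3 * x))."""
    return b1 / (1.0 + _expo(x, b2, b3))


def jacobian_row(x, b1, b2, b3):
    """Partial derivatives of the curve with respect to b1, b2 and b3."""
    e = _expo(x, b2, b3)
    d = 1.0 + e
    return [1.0 / d, -b1 * e / (d * d), b1 * x * e / (d * d)]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def rss(xs, ys, beta):
    """Residual sum of squares of the curve for parameter vector beta."""
    return sum((y - logistic(x, *beta)) ** 2 for x, y in zip(xs, ys))


def gauss_newton(xs, ys, start, max_iter=100, tol=1e-12):
    """Gauss-Newton iterations with step halving (the halving is an added safeguard)."""
    beta = list(start)
    for _ in range(max_iter):
        res = [y - logistic(x, *beta) for x, y in zip(xs, ys)]
        jac = [jacobian_row(x, *beta) for x in xs]
        jtj = [[sum(r[i] * r[j] for r in jac) for j in range(3)] for i in range(3)]
        jtr = [sum(r[i] * e for r, e in zip(jac, res)) for i in range(3)]
        step = solve(jtj, jtr)  # normal equations: (J'J) step = J'r
        current, scale = rss(xs, ys, beta), 1.0
        while scale > 1e-10 and rss(xs, ys, [b + scale * s for b, s in zip(beta, step)]) > current:
            scale /= 2.0  # shrink the step until the fit does not get worse
        if scale <= 1e-10:
            break
        beta = [b + scale * s for b, s in zip(beta, step)]
        if max(abs(scale * s) for s in step) < tol:
            break
    return beta


# Synthetic, noise-free example at the four evaluation periods (DAP).
days = [60.0, 90.0, 120.0, 150.0]
true_beta = (30.0, 9.0, 0.09)  # hypothetical values, not estimates from the study
weights = [logistic(x, *true_beta) for x in days]
fit = gauss_newton(days, weights, start=(28.0, 8.5, 0.085))
print("estimated parameters:", fit)
```

Under this parameterization the curve reaches half of the asymptote β1 at x = β2 / β3, which is why β2 alone has no direct biological reading. With real, noisy data the result depends on the starting values, so they should be chosen with care.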
The Akaike Information Criterion (AKAIKE, 1974) was calculated to compare the goodness of fit of the models of both approaches. Smaller values of the Akaike Information Criterion indicate a better fit of the model to the data.
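The criterion has the general form AIC = 2k − 2 log L, where k is the number of estimated parameters and L is the maximized likelihood. The text does not state which likelihood was used for each approach, so the sketch below rests on two assumptions: a Gaussian likelihood for the O. L. S. residuals, and an asymmetric Laplace likelihood for the Q. R. residuals, which is one common convention for quantile regression. The residuals are hypothetical and the function names are introduced only for this example.

```python
import math


def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood, with variance estimated as RSS / n."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def asymmetric_laplace_loglik(residuals, tau):
    """Maximized asymmetric Laplace log-likelihood for quantile tau.

    The scale is estimated by the mean check loss of the residuals.
    """
    n = len(residuals)
    mean_check = sum(r * (tau - (1.0 if r < 0 else 0.0)) for r in residuals) / n
    return n * (math.log(tau * (1.0 - tau)) - 1.0 - math.log(mean_check))


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * loglik


# Hypothetical residuals from a fitted three-parameter curve.
residuals = [1.0, -1.0, 2.0, -2.0]

aic_ols = aic(gaussian_loglik(residuals), n_params=3)
aic_qr = aic(asymmetric_laplace_loglik(residuals, tau=0.5), n_params=3)
print("AIC (Gaussian):", aic_ols)
print("AIC (asymmetric Laplace):", aic_qr)
```

Whether the scale parameter is counted in k is a convention that varies between software packages; it shifts all AIC values of one approach by the same constant.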
For each one of the approaches, the accessions were grouped using the parameters with biological interpretation (β1 and β3) as variables. This analysis identifies the most similar accessions according to the fitted models. The dissimilarity measure adopted was the standardized squared Euclidean distance. The accessions were grouped using the Unweighted Pair Group Method with Arithmetic Mean, UPGMA (CRUZ et al., 2011). The MOJENA (1977) procedure was used to determine the optimum number of groups, with the stopping rule k = 1.25. After the clustering of the accessions, one O. L. S. curve and one Q. R. curve were fitted for each group of accessions.
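The grouping step can be sketched as follows. This is a minimal illustration under several assumptions: the five (β1, β3) pairs are hypothetical, the variables are scaled to unit sample variance before the squared Euclidean distance is computed, and the Mojena cut point is taken as the mean of the fusion heights plus k times their sample standard deviation. The function names are introduced here for the example and are not part of any library.

```python
import math
import statistics


def standardize(rows):
    """Scale each column to zero mean and unit sample standard deviation."""
    cols = []
    for col in zip(*rows):
        m, sd = statistics.mean(col), statistics.stdev(col)
        cols.append([(v - m) / sd for v in col])
    return [list(r) for r in zip(*cols)]


def sq_euclid(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def upgma(points):
    """Average-linkage clustering; returns merges as (left, right, height)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # UPGMA distance: mean of all pairwise distances between clusters.
                d = sum(sq_euclid(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def mojena_cut(heights, k=1.25):
    """Cut point: mean of the fusion heights plus k sample standard deviations."""
    return statistics.mean(heights) + k * statistics.stdev(heights)


def cut_tree(n_points, merges, cut):
    """Apply only the merges whose height does not exceed the cut point."""
    label = list(range(n_points))
    for left, right, height in merges:
        if height <= cut:
            for idx in right:
                label[idx] = label[left[0]]
    groups = {}
    for idx, lab in enumerate(label):
        groups.setdefault(lab, []).append(idx)
    return sorted(groups.values())


# Hypothetical (asymptotic weight, growth rate) estimates for five accessions.
estimates = [(15.0, 0.12), (16.0, 0.11), (14.0, 0.12), (36.0, 0.06), (38.0, 0.055)]

merges = upgma(standardize(estimates))
cut = mojena_cut([h for _, _, h in merges], k=1.25)
groups = cut_tree(len(estimates), merges, cut)
print("cut point:", cut)
print("groups (indices of accessions):", groups)
```

In this toy example the last fusion is far higher than the others, so the cut point falls below it and two groups are kept: one with small asymptotic weights and high growth rates, the other with the opposite pattern.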
The computational analysis was performed using the R software, version 3.2.1 (R DEVELOPMENT CORE TEAM, 2018).

RESULTS AND DISCUSSION
An ordinary least squares (O. L. S.) model and a quantile regression (Q. R.) model were fitted for each one of the accessions in the study. The differences among the accessions are shown in Table 2 and can be seen in the parameters β1 and β3, which represent the asymptotic weight and the growth rate, respectively. The estimates of the asymptotic weight vary from 13.171 to 36.823 g in the models fitted by O. L. S., and from 15.243 to 39.203 g in the Q. R. (τ = 0.5) models. The estimates of β3 showed a smaller coefficient of variation than the estimates of β1 and β2, with values varying from 0.081 to 0.165 (O. L. S.) and from 0.054 to 0.123 (Q. R.). The estimates of the parameter β2 presented the largest variation, and were larger in the O. L. S. models.
Among the accessions analyzed by the O. L. S. models, Patos de Minas (BGH 4505) presented the largest asymptotic weight and a growth rate above the average. The accession unidentified (2) (BGH 4899) reached maturity fastest, despite having a smaller asymptotic weight. The accessions 2, 3, 6, 7 and 9 have some outliers in their observations of total dry matter of the plant (TDMP), which is reflected in the parameters estimated for their models. The curves of these models deviate from the remaining points because the mean is easily affected by extreme values (CECON et al., 2012). Many authors opt to remove extreme values to avoid the problems they cause, but that sacrifices information in the data, because the outlier is part of the phenomenon under study (LY et al., 2013; BARROSO et al., 2015).
In addition to possible outliers, there is the issue of heteroscedasticity: the variances are not the same for all of the observations. As the plant grows, the values of TDMP visibly spread out, resulting in a greater variance in the last periods than in the initial periods. This is not a problem for nonlinear regression when the sample is large enough, because inference in this case is based on asymptotic theory (GUJARATI, 2011). Some accessions also present asymmetry in their observations, which makes the curve generated by O. L. S. move away from the central position of the distribution due to the displacement of the mean (ARAÚJO JÚNIOR et al., 2016).
The mean of the estimated parameter β1 was 25.118 for the Q. R. (τ = 0.5) models. The estimates of the growth rate β3 presented the smallest variation, with an average of 0.090. For these models, the accession Patos de Minas (BGH 4505) was also the one that presented the largest asymptotic weight, but it had the smallest growth rate when compared to the other estimates. It is followed by the accessions Cateto Roxo (BGH 4567) and Branco de Dourados (BGH 4491). Branco de Dourados showed a growth rate close to that of Patos de Minas, while Cateto Roxo showed one of the highest growth rates. The accession Amarante Novo Cruzeiro (BGH 5940) has the highest growth rate and the second lowest asymptotic weight. The accession unidentified (2) (BGH 4899) presented the lowest asymptotic weight and a medium growth rate.
Figure 1 shows the curves of total dry matter accumulation of the garlic plants fitted by the two methodologies for the nine accessions. When the models obtained by the two methods are compared, the estimates of the asymptotic weight presented lower variation. Some estimates were very close, and the Q. R. (τ = 0.5) model presented larger values of the parameter β1 for most of the accessions. The O. L. S. estimates were smaller, possibly underestimating the final weight of the accession. The estimates of the parameter β2 varied considerably, but this is just a location parameter and it did not interfere with the interpretation of the other parameters. For the growth rate, the O. L. S. models presented larger values for most of the accessions, suggesting that maturity is reached earlier than in the Q. R. (τ = 0.5) models. In general, the Q. R. curves agreed more closely with the observed values; the same result was reported by ARAÚJO JÚNIOR et al. (2016).
The Akaike Information Criterion (AIC) varied from 180.122 to 253.846 in the evaluations of the models adjusted by the O. L. S. method. The model fitted for the accession unidentified (2) (BGH 4899) received the best evaluation, while the model fitted for Patos de Minas (BGH 4505) received the worst classification.
Among the fitted Q. R. (τ = 0.5) models, the AIC varied from 169.177 to 238.179. All of the Q. R. (τ = 0.5) models had smaller values of AIC than the respective models fitted by O. L. S. The Q. R. (τ = 0.5) model fitted for the accession unidentified (2) (BGH 4899) was the best rated by the AIC. This accession has a smaller total dry matter accumulation and a smaller variance in each analyzed period, which contributed to a better evaluation of both the O. L. S. and the Q. R. (τ = 0.5) models. The model fitted to Patos de Minas (BGH 4505) received the worst value of AIC. This may be because this accession presented the largest values of total dry matter accumulation and a large dispersion among the observations in each period, which penalized it in this evaluation.
The models were grouped using the UPGMA method. The Mojena procedure set the cut points at the distance 4.235 for the dendrogram of the Q. R. (τ = 0.5) models, and 6.772 for the dendrogram of the O. L. S. models (Figure 2). After the clustering of the accessions, a curve was fitted for each one of the groups of garlic accessions (Figure 3). The estimates of the parameters β1, β2 and β3, and the AIC values of the fitted models, are shown in Table 3.
The clustering of the Q. R. (τ = 0.5) models formed two groups, one with six accessions and the other with three. Group I includes accessions with larger growth rates and smaller asymptotic weights. Among them, the accession Cateto Roxo (BGH 4567) presents the second largest maturity rate and the second largest asymptotic weight. Group II presents accessions with larger estimates of final weight, but smaller growth rates. The clustering of the O. L. S. models also formed two groups, one with eight accessions and the other with just one accession. The accession unidentified (2), which was isolated in a group, stood out for its smaller asymptotic weight and larger growth rate according to the estimate of the O. L. S. model. Accessions with larger growth rates present smaller final weights. Also, those with larger asymptotic weights take longer to reach that final weight, which is in accordance with the results of REIS et al. (2014), MACEDO et al. (2017) and PUIATTI et al. (2018). This group formation showed that the Q. R. models have more balanced estimates, and that they were less affected by heterogeneous variances and outliers.
In practical terms, the accessions of group II are suitable when a larger final weight is preferred. The total dry matter accumulated reflects the productive potential of the plant, and accessions with larger weight at the end of the crop cycle are more promising economically (DIRIBA-SHIFERAW, 2016). Accessions of group I are recommended when the harvest happens before 120 DAP.

CONCLUSION
Nonlinear quantile regression is efficient for fitting models of dry matter accumulation in garlic plants over time when compared to ordinary least squares regression. The quantile regression models were rated better by the Akaike Information Criterion, and the estimated parameters were more uniform and robust in the presence of asymmetry in the distribution of the data, heterogeneous variances and outliers.