Regression models for productivity prediction in cactus pear cv. Gigante

Understanding plant behavior and its effects on yield is essential for rural planning; thus, biomathematical models are promising for predicting the yield of cactus pear cv. Gigante. This study aimed to fit, through simple and multiple regression analysis, models for predicting the yield of cactus pear cv. Gigante. The study, using homogeneous treatments, was carried out at the Instituto Federal Baiano, Campus of Guanambi, Bahia, Brazil. Data were collected in an area consisting of 384 basic units (plants), in which the yield, defined as the dependent variable, and the predictor variables plant height (PH), cladode length (CL), cladode width (CW), cladode thickness (CT), number of cladodes (NC), cladode area (CA), and total cladode area (TCA) were evaluated. Simple linear regression models, multiple regression models with only simple effects for the explanatory variables, and multiple regression models considering the simple and quadratic effects and all their possible interactions were fitted. From this last model, a reduced model was obtained by discarding the less relevant effects using the Stepwise methodology. The use of the vegetative traits TCA, NC, CA, CL, CT, and CW through multiple linear regression or quadratic interaction, or of the variable TCA alone through simple linear regression, allows the yield prediction of cactus pear, with adjusted R² of 0.82, 0.76, and 0.74, respectively.


Introduction
The cactus pear cv. Gigante (Opuntia ficus-indica Mill.) is well adapted to the semiarid ecosystem, mainly due to crassulacean acid metabolism (CAM), a photosynthetic process characterized by stomatal opening and CO₂ capture at night (Taiz et al., 2017), and to its efficient water-use mechanisms.
Due to the high nutritional, energy, and water value, this forage stands out as a strategic food source in the nutrition of ruminants. Likewise, besides the potential to meet the needs of the herd, in balanced diets, the species assumes singular importance in the period of food scarcity and water restriction (Marques et al., 2017).
However, the success of agricultural activity goes beyond production. In this context, proper planning is essential, and estimating production from non-destructive morphometric measures gives the producer a tool for it (Guimarães et al., 2013; 2019). The producer can thus organize a technical reserve to ensure a continuous and safe supply of raw material to the animals, especially in advance of the dry season.
In the search for the vegetative descriptors most associated with production, and for the possibility of using them to predict yield (Guimarães et al., 2014) in order to define the number of animals to be fed or the biomass volume to be commercialized, simple linear regression (SLR) (Bertolin et al., 2017), multiple linear regression (MLR) (Soares et al., 2014; Mantai et al., 2015), and polynomial and quadratic regression models (Amaral et al., 2017) have been used as reliable tools.
Given the above, the analysis of plant behavior and its effects on productivity is essential for rural planning. Thus, biomathematical models are promising in the prediction of crop yield. Therefore, this study aimed to fit, through simple and multiple regression analysis, models for predicting the yield of cactus pear cv. Gigante.

Material and Methods
The study was carried out at the Instituto Federal Baiano, Campus of Guanambi, Bahia, Brazil, between 2009 and 2011, at the geographical coordinates 14°13'30" S, 42°46'53" W and an altitude of 525 m. The soil was classified as Entisols Lithic. The average annual precipitation and temperature are 670.2 mm and 25.9 °C, respectively (CODEVASF, 2018).
The study followed the format of a treatment homogeneity, or uniformity, trial, in which the entire area planted with the cactus pear cv. Gigante was submitted to the same agronomic conditions and evaluated at 930 days after planting (DAP), in the third production cycle.
The useful planting area was composed of eight central rows with 48 plants per row, totaling 384 basic units (plants). The fresh mass yield of the cladodes (Prod, t ha⁻¹), considered the response variable, was determined in the third production cycle. The following predictor variables were also evaluated: plant height (PH, cm), cladode length (CL, cm), and cladode width (CW, cm), measured using a graduated measuring tape; cladode thickness (CT, mm), measured with a caliper in the central part of the cladode; number of cladodes (NC), by direct count; and cladode area (CA, cm²) and total cladode area (TCA, m²), which were estimated by Eqs. 1 and 2, respectively, according to the models adopted by Donato et al. (2014) and Padilha Junior et al. (2016).
Associations between the morphological variables analyzed were evaluated through Pearson's correlation. Subsequently, the simple linear regression models (Eqs. 3 and 4), the multiple regression models with only main effects for the explanatory variables (Eqs. 5 and 6), and the multiple regression model considering simple and quadratic effects and all their possible interactions (Eq. 7) were fitted by the method of least squares. In these equations: Prodᵢ - yield of green mass of cladodes associated with the i-th observation, t ha⁻¹; PH - plant height, cm; TCA - total cladode area, m²; NC - number of cladodes; CA - cladode area, cm²; CL - cladode length, cm; CT - cladode thickness, cm; CW - cladode width, cm; β₀ - intercept; β₁…ₙ - regression coefficients of the models; and eᵢ - the error associated with the i-th observation.
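As an illustration of the least-squares fitting of the simple linear models (Eqs. 3 and 4), the following minimal Python sketch estimates the intercept and the slope for a single predictor. The function name and the data values are hypothetical and are not taken from the study, in which the analyses were run in R.

```python
def fit_simple_linear(x, y):
    """Return least-squares estimates (b0, b1) for y = b0 + b1 * x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Corrected sum of squares of x and sum of cross-products of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical values (not study data): total cladode area (m2) and yield (t ha-1)
tca = [0.2, 0.4, 0.6, 0.8, 1.0]
prod = [15.0, 25.0, 35.0, 45.0, 55.0]

b0, b1 = fit_simple_linear(tca, prod)
print(f"Prod = {b0:.2f} + {b1:.2f} * TCA")
```

The same estimates are obtained for any single predictor, such as the number of cladodes, by replacing the list passed as `x`.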
The coefficient of determination (R²), the adjusted coefficient of determination (R²adj), the Akaike Information Criterion (AIC) (Akaike, 1974), the Bayesian Information Criterion (BIC), and the log-likelihood, which is the logarithm of the likelihood function evaluated at the parameter estimates, were considered for the selection of regression models. Based on the model represented by Eq. 7, the Stepwise methodology was used to discard the less relevant variables. Regression analyses were performed using the R software with the aid of the lm and step functions.
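The selection criteria listed above can be computed from the observed and predicted values, as in the minimal Python sketch below. It assumes normally distributed errors, the maximum-likelihood estimate of the error variance, and a parameter count that includes the error variance; these conventions, the function name, and the data are assumptions of the sketch and are not stated in the study.

```python
import math


def fit_statistics(observed, predicted, n_coef):
    """Fit statistics of a least-squares model.

    n_coef is the number of regression coefficients, intercept included.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - n_coef)
    # Gaussian log-likelihood evaluated at the ML variance estimate sse / n
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(sse / n) + 1)
    k = n_coef + 1  # assumption: the error variance counts as one parameter
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + math.log(n) * k
    return {"r2": r2, "r2_adj": r2_adj, "loglik": loglik, "aic": aic, "bic": bic}


# Hypothetical observed and predicted yields (not study data)
stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_coef=2)
print(stats)
```

For models fitted to the same data, lower AIC and BIC indicate the preferred model; under this sign convention a higher log-likelihood indicates a better fit.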
The regression analysis of the estimated productivity was performed with the observed values to test the predictive ability of the regression models. Subsequently, the point of intersection at the origin of the Cartesian plane was fixed, and the significance of the slope of the line was tested by the t-test, assuming as a null and alternative hypothesis the possibility of this coefficient being equal to or different from 1, respectively. Thus, if the coefficient of determination is high and the slope of the line does not differ from 1, the efficiency of prediction is assumed. The data were analyzed using the R software (R Development Core Team, 2016).
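The validation step described above can be sketched in Python as follows. The sketch assumes that the observed yield is regressed on the predicted yield with the intercept fixed at zero; the direction of this regression, the function name, and the data are assumptions made for illustration only.

```python
import math


def slope_through_origin(observed, predicted):
    """Fit observed = b * predicted (no intercept) and test H0: b = 1.

    Returns the slope, its standard error, and the t statistic,
    which has n - 1 degrees of freedom.
    """
    n = len(observed)
    sxx = sum(p * p for p in predicted)
    sxy = sum(p * o for p, o in zip(predicted, observed))
    b = sxy / sxx
    sse = sum((o - b * p) ** 2 for o, p in zip(observed, predicted))
    se = math.sqrt(sse / (n - 1) / sxx)
    t = (b - 1) / se
    return b, se, t


# Hypothetical observed and predicted yields, t ha-1 (not study data)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 31.0, 39.0]

b, se, t = slope_through_origin(observed, predicted)
print(f"slope = {b:.4f}, SE = {se:.4f}, t = {t:.3f}")
```

The absolute value of t is then compared with the critical value of Student's t distribution with n - 1 degrees of freedom; a smaller value means that the slope does not differ from 1.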

Results and Discussion
The coefficients of variation and the correlation values of vegetative traits with the yield of cactus pear cv. Gigante, as well as their significance, are shown in Figure 1. Regarding the variability of traits in general, Gomes (2000) proposed stratifying the coefficient of variation (CV) into four categorical levels. Thus, when the CV falls in the classes <10, 10.01-20, 20.01-30, and >30%, the variability is considered low, medium, high, and very high, respectively.
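The classification above can be written as a short Python sketch. The function names are hypothetical, and the treatment of values falling exactly on a class limit (for example, 10%) is an assumption, since the classes as stated leave these limits open.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) computed from the sample standard deviation and the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def classify_cv(cv_percent):
    """Variability class according to the four levels cited from Gomes (2000)."""
    if cv_percent < 10:
        return "low"
    if cv_percent <= 20:  # assumption: values from 10 to 20 are "medium"
        return "medium"
    if cv_percent <= 30:
        return "high"
    return "very high"


# Extreme CV values reported in this study
print(classify_cv(6.91))
print(classify_cv(60.19))
```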
The CVs of the evaluated descriptors ranged between 6.91 and 60.19%, with the lowest values in the traits associated with the cladode, such as area, length, and width, except for cladode thickness, which showed very high variability (Gomes, 2000). Donato et al. (2014) report that the dimensions of the cladodes, especially length and width, are determined by genotypic factors, with little influence of the environment. However, proper crop management favors cladode thickness and, consequently, increases yield.
On the other hand, several studies report the wide variability of cladode thickness, as this descriptor varies over its length, although usually the thickest or central region of the cladode is measured. Also, the cladode thickness, as it is linked to photosynthetic capacity and water storage (Scalisi et al., 2016), is greatly influenced by the growth and vegetative development stage of the crop (Silva et al., 2010;Pinheiro et al., 2014;Silva et al., 2015).
The evaluated descriptors have a positive linear association with each other, which denotes, besides the high degree of relationship of vegetative variables with the yield, considerable potential of these variables to compose the prediction model.
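The linear association mentioned above is measured by Pearson's correlation coefficient, which can be computed as in the following Python sketch. The function name and the data are hypothetical and only illustrate the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values (not study data): number of cladodes and yield (t ha-1)
nc = [8, 12, 15, 20, 26]
prod = [60.0, 85.0, 110.0, 150.0, 180.0]

print(f"r = {pearson_r(nc, prod):.3f}")
```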
Similar results were found by Pinheiro et al. (2014) for all evaluated cactus pear clones; in that study, the number of cladodes showed a high correlation with the structural traits. The highest correlation coefficients were associated with the traits total cladode area, number of cladodes, and plant height, followed by variables directly related to the cladode, such as area, length, width, and thickness (Figure 1). These results are similar to those of other studies on phenotypic correlation, in which the variables total cladode area and number of cladodes usually express a strong relationship with the variability of cladode yield (Silva et al., 2010; Pinheiro et al., 2014; Padilha Junior et al., 2016).
By the simple linear regression procedure, compact functions were adjusted to estimate yield in cactus pear cv. Gigante, with the significance of the regression coefficients and similarity between R² and adjusted R² (Figures 2A and B).
Such predictive models allow yield to be estimated practically and objectively in the field, since they include only one explanatory variable that is easy to determine. Besides the better R², the equation Prod 1 has a higher predictive quality by the AIC when compared to the model Prod 2.
Models composed of variables that are easy to measure in the field are studied because they ensure practical applicability: the predictive tool requires only the values of a variable measured directly in the field, as observed for the simple linear regression model using the number of cladodes (Figure 2). In this context, Guimarães et al. (2013) fitted models with easily determined components to estimate banana yield only by directly counting the number of hands in the bunch.
By regression analysis with the multiple linear function, models were tested to determine the yield of cactus pear cv. Gigante, according to the results presented in Table 1. The t-test for the regression models was highly significant (p ≤ 0.001).
The yield of cactus pear cv. Gigante showed a significant correlation with all the traits analyzed (Figure 1), thus justifying the use of these variables as yield predictors. Besides the predictive capacity, the variables that make up the models presented have the advantage of direct measurement in the field in a non-destructive way (Guimarães et al., 2013). The quality indicators AIC, BIC, and log-likelihood demonstrated that the model Prod 8 has the greatest potential for the prediction of cactus pear yield, with R² of 0.7626 and R²adj of 0.7613 (Table 1).
However, for the tested models, the coefficient of determination remained at approximately the same level (Table 1), despite the exclusion of vegetative traits with moderate and high correlation with yield (Figure 1) but with no significant effect in the model, such as plant height and the number of cladodes, respectively (Table 1). Similarly, Soares et al. (2014) and Leal et al. (2015) showed the stability of R² when only significant variables were kept in the prediction model.
Based on the adjustment indexes of the models, presented in Table 1, and on the behavior of the equations that estimate yield, the multiple linear regression model allows predicting, in an acceptable way, the yield of the cactus pear through the vegetative traits total cladode area and cladode thickness, which are simple to determine in the field (Padilha Junior et al., 2016), which favors the practical use of the model.
Table 1. Parameters of the multiple linear regression analysis of the yield (Prod) according to the traits: PH: plant height; TCA: total cladode area; NC: number of cladodes; CA: cladode area; CL: cladode length; CT: cladode thickness; CW: cladode width
*, *** - Significant at 0.01 < p ≤ 0.05 and at p ≤ 0.001 by t test, respectively
The predicted and the observed values are plotted against each other in Figure 3A, with the slope of the line taken as the indicator of the quality of this multiple linear regression model. This procedure is justified both by the statistical search for highly significant parameters and by the need to obtain a more compact and robust model. Figure 3B represents the relationship between cactus pear yield and the predictor variables total cladode area and cladode thickness. The coefficients of variation shown in Figures 3A and 3B are associated with the observed yield data. In this context, Soares et al. (2015) add that a model of easy practical application must be composed of the smallest possible number of variables, with objective determination in the field and a precise answer about the inference carried out.
Regarding the fit of the regression models expressed by the coefficient of determination (R²), there was no difference between the equations (Table 1) in the predictive quality to explain the behavior of the data. However, the model Prod 8 = -93.7883** + 247.7903***TCA + 5.6736***CT contains only two descriptors directly related to the cladode (TCA and CT) and is therefore simpler, adequate, and more practical.
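As a worked illustration of how the model Prod 8 would be applied, the Python sketch below evaluates the equation for one plant. The coefficients are those reported in the text; the input values are hypothetical, and the cladode thickness is assumed to be entered in mm, as in the description of the measurements, which should be checked against the unit used to fit the model.

```python
def predict_yield_prod8(tca_m2, ct):
    """Predicted fresh mass yield (t ha-1) from the reported model Prod 8.

    tca_m2: total cladode area, m2
    ct: cladode thickness (assumed here to be in mm)
    """
    return -93.7883 + 247.7903 * tca_m2 + 5.6736 * ct


# Hypothetical field measurements (not study data)
estimate = predict_yield_prod8(tca_m2=0.5, ct=20.0)
print(f"Estimated yield: {estimate:.2f} t ha-1")
```

Predictions are only meaningful for inputs within the range of values observed when the model was fitted.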
As for the indexes that define the quality of the fit, AIC, BIC, and log-likelihood, the lowest estimated values were associated with the model Prod 8, which was therefore defined as the most appropriate (Table 1), as it presents the greatest proximity between the observed and the estimated values (Mello et al., 2018). Leal et al. (2015) argue for the importance of tools that measure the accuracy of the model to substantiate selection in practice.
For a given number of parameters, the AIC and BIC increase with the sum of squares of errors, as does the negative log-likelihood from which they are computed. Therefore, the lower these values, the better the quality of the fit, defined by the smaller relative distance between the predicted and the real values (Leal et al., 2015).
Thus, the variables, being easy to determine in practice in a direct and non-destructive way, enable the researcher or producer to estimate, with high efficiency, the yield of the cactus pear cv. Gigante. The model is therefore an essential tool for the success of rural planning, above all regarding the size of the herd to be fed in the dry season, when, usually due to lack of planning, the highest mortality rate of animals occurs in the Brazilian semiarid region, compromising the economic viability of the activity and, consequently, the permanence of farmers in rural areas (Marques et al., 2017).
Also, it is worth considering that water is a limiting factor in animal production in regions of arid and semiarid climates, and the use of cactus pear in the diet of ruminants in drought periods helps animals to meet most of their water requirements (Borland et al., 2014). Therefore, the estimate of the productivity of the cactus pear cv. Gigante is of great importance, since predicting the food volume for ruminants serves a dual purpose: the supply of dry matter and of water.
The regression models obtained to estimate the yield of cactus pear cv. Gigante considering interactions and quadratic effects are shown in Table 2. Thus, the most appropriate model was selected according to its highest precision, which is determined by the lowest AIC value, 3290.37.
Besides the better suitability indicated by the AIC, the R² and R²adj were superior to those of the other models, at 0.8187 and 0.7988, respectively, which denotes greater reliability and predictive safety (Figure 4).
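To give an idea of the size of the full model from which the reduced model was obtained, the Python sketch below enumerates its candidate terms. It assumes that the full model contains the seven measured predictors, their squares, and all two-way interactions; whether higher-order interactions or all seven predictors entered the model is not stated here, so the resulting list is illustrative only.

```python
from itertools import combinations

# Assumption: all seven measured predictors enter the full model
predictors = ["PH", "TCA", "NC", "CA", "CL", "CT", "CW"]


def full_quadratic_terms(names):
    """Main effects, squared terms, and two-way interactions of the predictors."""
    linear = list(names)
    squared = [f"{name}^2" for name in names]
    interactions = [f"{a}:{b}" for a, b in combinations(names, 2)]
    return linear + squared + interactions


terms = full_quadratic_terms(predictors)
print(len(terms), "candidate terms")
print(terms)
```

A stepwise procedure then adds or removes terms from such a list one at a time, keeping the change that most reduces the selection criterion, until no change improves it.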
Similarly to the present study, Amaral et al. (2017) made inferences about the yield of white oats in different succession systems with other forages. Among the models fitted to estimate the yield of vegetable biomass and grains, the linear polynomial equations and quadratic regression reached the highest values of R²adj, above 0.87, with the highest values for quadratic models.
CV - Coefficients of variation associated with observed yield data; TCA - Total cladode area; CT - cladode thickness; ns - Not significant (p > 0.05); **, *** - Significant at p ≤ 0.01 and at p ≤ 0.001 by t test, respectively
Although the model with quadratic interactions is composed of a higher number of predictive variables, they are easy to determine in the field to predict the yield of cactus pear cv. Gigante, which ensures the practical viability of the model, as has been valued in studies on agricultural modeling (Guimarães et al., 2013, 2014; Soares et al., 2014; Mello et al., 2018).

Conclusion
The use of the vegetative traits total cladode area, number of cladodes, and area, length, thickness, and width of cladodes through multiple linear regression or quadratic interaction, or of the variable total cladode area alone through simple linear regression, allows the yield prediction of cactus pear cv. Gigante, with R²adj of 0.82, 0.76, and 0.74, respectively.