Logistic model to selection of energy cane clones

Logistic regression analysis is a technique that may aid genetic breeding programs in the selection of clones, especially in the early stages where experimental accuracy is low. This research aimed to identify the most important agronomic traits for energy cane clonal selection, and to verify the efficiency of the logistic model in predicting the genotypes to be selected. Evaluations were carried out on 220 clones in the first ratoon. The data were subjected to binary logistic regression analysis. Stalk number per meter was the most important trait in the selection of energy cane clones. In addition, plants with a lower grade for smut incidence had a greater chance of being selected. The predictive capacities of the qualitative and quantitative models were 94% and 88%, respectively. The use of a qualitative model proved to be effective at predicting the number of energy cane genotypes to be selected and could be used as a selection strategy.


INTRODUCTION
The constant search for alternative energy sources to replace fossil fuels has prompted sugarcane breeding programs to develop varieties with higher fiber content, called energy cane. The new energy cane varieties aim to meet the market's need either to produce second-generation ethanol or to cogenerate electricity.
Improving the efficiency of clonal selection in the T1 and T2 phases is still a challenge in obtaining sugarcane varieties (ZHOU et al., 2012). The early stages of the genetic improvement of sugarcane have low experimental precision due to the small amount of propagating material, which limits the number of repetitions. In the later stages, increasing the amount of propagation material allows a larger number of replications to be performed and, consequently, the experimental precision increases.
In visual selection, the rejection of a genotype involves a combination of different traits, such as floral induction, presence of diseases, and sugar and fiber contents, which influence the decision (BRASILEIRO et al., 2016). Knowledge of the characters that are most important in the selection of sugarcane clones could help breeding programs by reducing the number of characters to be evaluated.
Logistic regression analysis was developed for investigations in which the response variable is categorical (binary or multinomial). The choice of whether to select or reject a clone is binary, so logistic regression analysis can be applied to evaluate the effect or discriminating power of each agronomic character used as a selection criterion, as well as to predict the genotypes to be selected (AGRESTI, 2007).
The use of the logistic model as a statistical tool to support selection based on production components has been shown to be efficient in sugarcane (ZHOU et al., 2014; BRASILEIRO et al., 2016); however, to date, no studies have used this technique for the selection of energy cane clones. Therefore, the objective of this research was to identify the most important agronomic traits during the energy cane clonal selection process and to verify the efficiency of the logistic model in predicting the genotypes to be selected.

MATERIALS AND METHODS
The research was conducted at the Paranavaí Experimental Station of the Federal University of Paraná (latitude 22°58'41.21" S, longitude 52°28'4.836" W, altitude 470 m). The climate of the region is classified as Cfa according to Köppen, with annual precipitation of 1200 to 1400 mm and mean annual temperature between 22 and 23 °C (APARECIDO et al., 2016). The soil is a dystrophic Red Latosol (SANTOS et al., 2013).
A total of 220 clones, selected within full-sib families in the first selection phase, were evaluated during the second selection phase. An incomplete block design was used, with each block composed of 22 clones. The plots consisted of two 5 m rows, spaced 1.4 m apart, with 18 buds planted per meter. The evaluations were carried out at the end of the first ratoon cycle in July 2016.
The evaluated traits were: stalk diameter (SD), the mean of 10 stalks per plot, measured in centimeters in the middle third of the stalk; stalk number (SN), the number of stalks present in one meter within the plot; stalk height (SH), measured in meters from ground level to the first visible auricle for 10 stalks per plot; stalk weight (SW), the mean weight of 10 stalks per plot in kilograms; smut incidence (SI), the number of affected stalks in the plot; sucrose content (SC); and fiber content (FIB). The last two characters were obtained by technological analysis, following the technical standards for determining the quality of raw material, as recommended by FERNANDES (2011).
Visual evaluations were made of the following traits: straw removal (SRn), which consists of releasing the leaf blade; stalk diameter (SDn); stalk height (SHn); stalk number (SNn); lateral sprouting (LSn); bud prominence (BPn); growth habit (GHn); flowering (FLn); pith (Pn), for which the stalk was cut in the middle third and the proportion of the internal diameter affected was measured; plant vigor (PVn), for which the tillering of the clone was taken into account; brown rust incidence (BRIn); orange rust incidence (ORIn); and smut incidence (SIn), according to the grades presented in Table 1 and Table 2. Brown rust and orange rust were evaluated using the diagrammatic scales developed by AMORIM et al. (1987) and KLOSOWSKI et al. (2013), respectively. For smut incidence, all plants in the plot were considered: the number of stalks showing the disease was counted, and plots were later assigned grades from 1 to 6. The diseases were evaluated at 9 months in the first ratoon, in March 2016.
Both the assignment of grades for the evaluated traits and the mass selection were performed in the field by a technician from the sugarcane genetic improvement program with experience in visual selection. The response variable was the decision to select or reject a clone: selected clones were coded as one (1) and clones rejected in the selection process as zero (0).
The data were subjected to binary logistic regression analysis in R (R Development Core Team, 2016). First, the full models with all variables were fitted. The least significant variables were then excluded one at a time until only significant variables remained, which formed the reduced models. For prediction, a cutoff point of 0.5 was used, so that individuals with a predicted probability of selection above 50% were classified as selected.
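The coefficient estimates of the predictor variables obtained by logistic regression were used to construct cumulative logistic distribution functions for two models, one using qualitative traits (Eq. 1) and the other using quantitative traits (Eq. 2). Equations 1 and 2 were then used to calculate the probability of selecting a clone. In general form:

$$P_{\text{qualitative}} = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{i} \beta_i X_i\right)}} \qquad \text{Eq. (1)}$$

$$P_{\text{quantitative}} = \frac{1}{1 + e^{-\left(\gamma_0 + \sum_{j} \gamma_j Z_j\right)}} \qquad \text{Eq. (2)}$$

where $P$ is the probability of selecting a clone, $\beta_0$ and $\gamma_0$ are the intercepts, $\beta_i$ and $\gamma_j$ are the estimated coefficients, and $X_i$ and $Z_j$ are the qualitative and quantitative traits in the respective models.

The parameters were interpreted using the odds ratio (OR) (HOSMER et al., 2013). The odds of selection are the ratio between the probability of selecting and the probability of not selecting a clone. The OR of an explanatory variable is the factor by which these odds change per unit increase in that variable, and is calculated as the exponential of its estimated regression coefficient.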
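As a minimal sketch of how a fitted logistic model of this kind turns trait grades into a selection probability and then into a select/reject decision at the 0.5 cutoff, the Python snippet below may help. The coefficient values, the example clone, and the helper names `selection_probability` and `is_selected` are hypothetical illustrations; they are not the estimates obtained in this study.

```python
import math


def selection_probability(intercept, coefs, traits):
    """Cumulative logistic function: P = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefs[name] * traits[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


def is_selected(prob, cutoff=0.5):
    """Classify a clone as selected when its probability exceeds the cutoff."""
    return prob > cutoff


# Hypothetical coefficients, chosen only for illustration (NOT from the study).
HYPOTHETICAL_INTERCEPT = -6.0
HYPOTHETICAL_COEFS = {"PVn": 0.8, "SIn": -1.0, "SNn": 1.5}

# Hypothetical grades for one clone.
clone = {"PVn": 3, "SIn": 1, "SNn": 4}

p = selection_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, clone)
print(f"probability of selection: {p:.3f}, selected: {is_selected(p)}")
```

With a strict "above 50%" rule, a clone whose probability is exactly 0.5 is not selected, which matches the cutoff as described in the text.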
The models were compared using the Akaike Information Criterion (AIC). The predictive capacity, or accuracy (PC: the proportion of correct classifications of a model), was also calculated for each model. That is, the proportion of true positives and true negatives (defined based on the clones selected in the mass selection) in relation to the number of observations was calculated as follows:

$$PC = \frac{TP + TN}{n}$$

where TP = true positives; TN = true negatives; n = number of observations.
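The predictive capacity described above can be sketched in a few lines of Python. The observed and predicted labels below are made-up values for illustration only (1 = selected, 0 = rejected), and `predictive_capacity` is a hypothetical helper name.

```python
def predictive_capacity(observed, predicted):
    """Return (TP + TN) / n for binary labels coded 1 (selected) and 0 (rejected)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    true_pos = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    true_neg = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    return (true_pos + true_neg) / n


# Illustrative data only: mass selection (observed) versus model prediction.
observed = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

pc = predictive_capacity(observed, predicted)
print(f"predictive capacity: {pc:.2f}")
```

Here the mass selection is treated as the reference, so a clone counts as a true positive only when both the breeder and the model selected it.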

RESULTS
The qualitative traits that were significant according to the logistic regression analysis for mass selection, applied in phase T2, were PVn, SIn, and SNn (Table 3). For the quantitative model, only SH, SN, SI, and SW were significant.
BRIn, ORIn, Pn, FLn, SRn, SDn, SHn, LSn, BPn, and GHn in the qualitative model and SC, FIB, and SD in the quantitative model did not have significant coefficients. New models (reduced models) were fitted using only the significant traits, after removing the non-significant variables one at a time, starting with the least significant. According to the odds ratios, SNn, SIn, and PVn, in that order, were the most important traits in the qualitative model, and SH, SW, SN, and SI were important in the quantitative model (Table 4). The Akaike Information Criterion (AIC) indicated a better fit for the reduced model for qualitative traits (AIC = 68.277) than for the full models, which had AIC values of 78.445 and 149.320, respectively (Table 3). Smaller AIC values reflect a better overall fit (AKAIKE, 1974) (Table 3 and Table 4). Removing traits that do not contribute to decision making allows a better predictive model to be fitted.

Table 1 - Description of the traits evaluated in the second selection phase of energy cane and their respective grades. Source: RIDESA.

Table 2 - Grades for brown rust incidence (BRIn), orange rust incidence (ORIn), and smut incidence (SIn) in energy cane.
The odds of selection for energy cane clones with a higher grade for number of stalks were 4.433 times those of clones with a lower grade for SNn. For smut incidence, the odds ratio was 0.349: a higher smut grade multiplied the odds of selection by 0.349, so clones with lower smut grades had a greater chance of being selected (Table 4).
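To make the two reported odds ratios concrete, the following Python sketch recovers the implied regression coefficients and shows how an odds ratio shifts a selection probability. The odds ratios 4.433 and 0.349 are the values reported in the text. The starting probabilities, the helper names, and the reading of each odds ratio as the effect of a one-grade increase are assumptions made for illustration.

```python
import math

# Odds ratios reported in the text for the qualitative reduced model.
REPORTED_OR_SNN = 4.433  # stalk number grade
REPORTED_OR_SIN = 0.349  # smut incidence grade


def coefficient_from_odds_ratio(odds_ratio):
    """A logistic coefficient is the natural log of its odds ratio."""
    return math.log(odds_ratio)


def updated_probability(prob, odds_ratio, grade_change=1):
    """Shift a probability by multiplying its odds by odds_ratio ** grade_change."""
    odds = prob / (1.0 - prob)
    new_odds = odds * odds_ratio ** grade_change
    return new_odds / (1.0 + new_odds)


# OR > 1 implies a positive coefficient; OR < 1 implies a negative one.
print(coefficient_from_odds_ratio(REPORTED_OR_SNN))
print(coefficient_from_odds_ratio(REPORTED_OR_SIN))

# Assumed starting probability of 0.20 (illustrative): one grade higher for SNn.
print(updated_probability(0.20, REPORTED_OR_SNN))

# Assumed starting probability of 0.50 (illustrative): one grade LOWER for smut,
# which multiplies the odds by 1 / 0.349, roughly 2.87.
print(updated_probability(0.50, REPORTED_OR_SIN, grade_change=-1))
```

An odds ratio below 1 therefore does not mean a trait is unimportant; it means the odds of selection fall as the grade rises.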
The mass selection performed in the field selected 38 clones, while the qualitative and quantitative reduced models selected 42 and 20 clones, respectively. In terms of accuracy, the qualitative reduced model had a predictive capacity of 94% and the quantitative reduced model had a predictive capacity of 88% (Table 5). The clones selected by the mass method, the reduced qualitative model, and the quantitative model presented a maximum of 15 stalks affected by smut. Of the 38 clones selected in the field by the mass method, only two clones showed low incidence of smut (Figure 1A). The qualitative model selected clones with low incidence of smut and high productivity as well as clones with low incidence of smut and low productivity. The quantitative model selected a clone with low incidence of smut that showed high productivity (Figure 1C).
The model using qualitative traits selected four more genotypes than mass selection (Table 5). The average fiber content and sucrose content remained practically the same across all the models in relation to mass selection (Table 6).

Table 4 - Parameters estimated by the logistic regression models for qualitative traits (qualitative reduced model) and quantitative traits (quantitative reduced model), fitted to the mass selection performed in the second test phase, in energy cane.

Table 5 - Classification of the number of selected and discarded genotypes, using qualitative and quantitative logistic regression models with mass selection applied in the second selection phase as the response.

DISCUSSION
Using logistic regression analysis, it was possible to identify the traits that significantly influenced the breeder's decision of whether to select a particular clone. Significant coefficients in the regression analysis indicated that the predictor variable influences the decision to select or reject a clone. Similar to multiple linear regression, the selection of variables is used to eliminate those that are not significant from the model (ZHOU et al., 2014).
Traits that did not contribute significantly to the selection decision (Table 3 and Table 4) may be less important both in deciding whether to select a clone and in optimizing the selection process. In addition, traits with no significant contribution may indicate to breeders that these characters have less variability in the populations. According to ZHOU et al. (2013), production components identified as not significant in the fitted model may indicate traits with lower variability in populations, and this information can be used to guide strategies to diversify specific traits and thus improve variability and achieve greater genetic gains.
The number of stalks is an important trait that influences decision making in stage T2, especially for genotypes intended for biomass production. The number of stalks is related to the yield in tons of cane per hectare (TCH), due to its strong direct effect on TCH (SILVEIRA et al., 2015).
During the first phase of selection (T1) in sugarcane breeding programs in southern Africa, the number of stalks is also of great importance in the individual selection of sugarcane (ZHOU et al., 2014). PEDROZO et al. (2008) showed that plants selected with fewer than six stalks in the T1 phase obtain low yields in the T2 phase, and thus recommended the selection of plants with more than five stalks. However, other traits must also be considered to ensure that the selected clones have high productivity.
The presence of smut was also an important trait in the decision of whether to select a clone. Although energy cane is considered resistant to diseases, the present study found that the presence of smut was one of the most influential traits in decision making, due to the high incidence that occurred in the study population.
Sugarcane cultivars are generally obtained from crosses between the species Saccharum officinarum and Saccharum spontaneum (WANG et al., 2018). Saccharum spontaneum contributes characteristics such as high fiber content (WACLAWOVSKY et al., 2010), low sucrose content, and resistance to pests, diseases, and abiotic stresses (MING et al., 2006). Saccharum officinarum, in contrast, has few stalks, but these have large diameter and high sugar content (MING et al., 2006).

Smut, caused by the fungus Sporisorium scitamineum, is an extremely important disease in sugarcane cultivation because it can cause significant yield losses, which makes it an important factor in selection decisions. In the study by BRASILEIRO et al. (2016), smut incidence had little importance in the selection of clones, because that study was conducted on another population in which disease incidence was low due to the environmental conditions. Environmental factors influence the occurrence of diseases. In the southern region of Brazil, for example, temperature and humidity conditions are more favorable for smut than in the northeast region, which makes the presence of smut an important variable in the selection process in the state of Paraná, where this research was carried out.

The two logistic regression models fitted for phase T2 in energy cane were efficient in predicting the individuals to be selected, as both were highly accurate. The present research did not aim to replace mass selection, but showed that statistical techniques such as logistic regression models can be a tool to assist in the clone selection process, especially in the first stages, where there are many genotypes and, because propagation material is limited, replications are not possible, which makes visual selection subjective, especially for highly heritable characters.
Although the qualitative and quantitative reduced models both had high predictive capacities, an analysis of the traits considered in each model showed that the qualitative model was more practical, since it relied on assigning grades to the traits. In genetic improvement, qualitative variables make the process simpler and faster than continuous variables, which require measurements, time, and equipment. The explanatory variables used to establish the logistic model in the present research are easily evaluated.
The results of this study demonstrated the potential for logistic regression to assist in the clonal selection process, since the technique is able to contribute to the best decision using the same criteria used by the professionals responsible for the selection during the second selection phase of energy cane genetic improvement programs.

CONCLUSION
Stalk number and smut incidence were the most important characters in decision making during the selection of energy cane clones in the second selection phase.
The qualitative and quantitative models were efficient in predicting the clones to be selected and could be used in decision making during energy cane clone selection.