Selection of maize hybrids: an approach with multi-trait, multi-environment, and ideotype-design

The present study aimed to evaluate the applicability and efficiency of the FAI-BLUP index for the genetic selection of maize hybrids, using 84 hybrids evaluated for cycle, morphology, and yield traits in four environments. Models with homogeneous and heterogeneous residual variances were tested, and variance components were estimated by residual maximum likelihood. Genotypic values were predicted by best linear unbiased prediction, and factor analysis was applied to group the traits. The FAI-BLUP index was used to select maize hybrids based on ideotype design. Three factors explained more than 70% of the genotypic variability, and selective accuracies ranged from low (0.46) to high (0.99). Predicted genetic gains were positive for yield-related traits and negative for cycle- and morphology-related traits, as desired in maize.


INTRODUCTION
The growing demand for superior genotypes has led maize breeders to seek auxiliary techniques for the selection process. In plant breeding, multi-environment trials (MET) are used to evaluate genotypes, test their performance across a range of environments, and select the superior ones. In MET, the genotype-by-environment (G×E) interaction influences the performance of genotypes under environmental variation (Resende 2015), changing the genotype ranking across environments and making genetic selection difficult (Sripathi et al. 2018). In maize breeding, several traits are usually evaluated to support the selection and recommendation of the ideotype (i.e., genotypes with simultaneously superior performance for many traits).
Multivariate analysis, which allows genetic selection based on a set of traits, is an important procedure when dealing with MET data. Multi-trait selection is relevant because superior varieties combine optimal attributes for several traits simultaneously. One possibility in multivariate analysis is to explore and reduce data dimensionality by grouping traits (Yan and Frégeau-Reid 2018). However, combining several traits into an efficient selection index is often a complex task (Paiva et al. 2020), because selection gains decrease as the number of traits assessed increases (Cruz et al. 2014).
A powerful method for dealing with MET data is factor analysis (FA) (Nuvunga et al. 2015, Peixouto et al. 2016, Barbosa et al. 2019). This approach allows genotypes to be selected using both multi-trait and multi-environment information and avoids the loss of biological meaning by identifying a smaller group of latent variables (Rocha et al. 2018). Its basic aim is to group highly correlated traits into common factors (latent variables) that retain a large amount of information about the interrelations among traits (Barbosa et al. 2019, Woyann et al. 2020). In FA, the factors that best explain the set of traits analyzed are identified and then used to summarize the information and simplify subsequent analyses.
The factor analysis and genotype-ideotype design (FAI-BLUP) index (Rocha et al. 2018) has been identified as an efficient method for genetic selection that considers multi-trait, multi-environment, and ideotype design information. It incorporates the correlation structure between traits and the direction of selection defined by the breeder, so that genotypes closer to the ideotype are selected (Rocha et al. 2018). This methodology combines FA and mixed models. One of its main advantages is the ability to incorporate the predicted genetic values (BLUPs) in the analysis (…, Rocha et al. 2019, Woyann et al. 2019). The REML/BLUP methodology is considered the standard procedure for genetic evaluation in crop breeding (Resende 2016), even though in some specific cases least-squares methods return the same results. Although the application of this method in maize is not well documented, it has been implemented in other crops, such as common bean, soybean, and sorghum (Silva et al. 2018, Rocha et al. 2019, Woyann et al. 2020). In this study, several traits of maize hybrids were evaluated in four environments to: i) explore residual variance structures for the prediction of genotypic values in MET; ii) investigate the relationships between traits through factor analysis; and iii) use the factors (latent variables) to select hybrids through the FAI-BLUP index.

MATERIAL AND METHODS
The experiment was carried out between January and July 2018 at four sites (considered as environments: ENV1, ENV2, ENV3, and ENV4) in southwestern Goiás, Brazil (Supplementary material, Appendix A). The experimental network consisted of 78 interpopulation hybrids and the six commercial hybrids most used in the region (AS 1633, P3646, 30F53, BM 709, P4285, and 30K75), totaling 84 hybrids. In each environment, trials were conducted in a randomized complete block design with three replications and 44 plants per plot. Plots consisted of four 4-m rows, with 0.40 m between plants and 0.45 m between rows. To avoid competition effects between neighboring plots, only the two central rows were evaluated.
The following cycle-related traits were assessed: time to tasseling (TA) and time to female flowering (FL). TA is the number of days from sowing to 50% tasseling (when more than half of the tassel is releasing pollen), and FL is the number of days from sowing to 50% female flowering (when the ear starts silking and the silks are visible outside the husk). The morphology-related traits assessed were plant height (PH), measured in meters from the ground to the flag leaf, and ear height (EH), measured in meters from the ground to the ear insertion. Ear length (EL) and ear diameter (ED) were assessed as yield-related traits: EL was the mean length, in centimeters, of five unhusked ears placed in a row, and ED was the mean diameter, in centimeters, measured at the center of the same five ears placed side by side. Ear yield (EY) and grain yield (GY) were also assessed as yield-related traits, from the plot weight adjusted to 13% moisture content and converted to kg ha⁻¹.
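The conversion of plot weight to kg ha⁻¹ at 13% moisture is not spelled out in the text; a common approach is sketched below. It assumes that dry matter is conserved during the moisture adjustment and, hypothetically, that the harvested area is the two central 4-m rows at 0.45 m spacing (2 × 4 × 0.45 = 3.6 m²). The function name is also hypothetical.

```python
def grain_yield_kg_ha(plot_weight_kg, field_moisture_pct,
                      plot_area_m2=3.6, target_moisture_pct=13.0):
    """Convert a plot weight to kg ha^-1 at a standard moisture content.

    plot_area_m2 defaults to a HYPOTHETICAL harvested area of two central
    4-m rows spaced 0.45 m apart (2 * 4 * 0.45 = 3.6 m^2).
    """
    # Dry matter is conserved: W_target * (100 - M_target) = W_field * (100 - M_field)
    adjusted_kg = plot_weight_kg * (100.0 - field_moisture_pct) / (100.0 - target_moisture_pct)
    return adjusted_kg * 10_000.0 / plot_area_m2  # 1 ha = 10,000 m^2


# 1.8 kg harvested at 20% moisture from the assumed 3.6 m^2 area
print(round(grain_yield_kg_ha(1.8, 20.0), 1))
```

A plot weighed at exactly 13% moisture needs no adjustment, so only the area scaling applies.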
Variance components were estimated and genotypic values predicted for the traits assessed by the residual maximum likelihood/best linear unbiased prediction (REML/BLUP) procedure (Patterson and Thompson 1971, Henderson 1975). The statistical model for the evaluation of hybrids in a randomized complete block design with one observation per plot is given by:
$$\mathbf{y} = \mathbf{X}\mathbf{r} + \mathbf{Z}\mathbf{g} + \mathbf{e}$$
where y is the vector of phenotypes; r is the vector of replication effects (assumed fixed), added to the overall mean; g is the vector of genotypic effects (assumed random), $\mathbf{g} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_g)$, where $\sigma^2_g$ is the genotypic variance; and e is the vector of residuals (random), $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$, where $\sigma^2_e$ is the residual variance. Uppercase letters (X and Z) represent the incidence matrices for r and g, respectively.
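To make the BLUP step concrete, the sketch below solves Henderson's mixed model equations for this single-environment model, assuming the variance components are already known (in the study they were estimated by REML, which the sketch does not implement). The function names `incidence` and `blup_rcbd` and the toy data are hypothetical.

```python
import numpy as np


def incidence(levels):
    """0/1 incidence matrix (n_obs x n_levels) for integer-coded levels 0..k-1."""
    levels = np.asarray(levels)
    matrix = np.zeros((levels.size, levels.max() + 1))
    matrix[np.arange(levels.size), levels] = 1.0
    return matrix


def blup_rcbd(y, rep, geno, var_g, var_e):
    """Solve Henderson's mixed model equations for y = Xr + Zg + e.

    var_g and var_e are treated as known (assumption; the study used REML).
    Returns (r_hat, g_hat, pev).
    """
    y = np.asarray(y, dtype=float)
    X = incidence(rep)    # fixed replication effects (absorb the overall mean)
    Z = incidence(geno)   # random genotypic effects
    lam = var_e / var_g   # shrinkage ratio sigma2_e / sigma2_g
    p, q = X.shape[1], Z.shape[1]
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + lam * np.eye(q)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    c_inv = np.linalg.inv(C)
    sol = c_inv @ rhs
    # Equations are scaled by sigma2_e, so PEV = diag(C^-1) * sigma2_e
    pev = np.diag(c_inv)[p:] * var_e
    return sol[:p], sol[p:], pev


# Toy example: 3 hybrids x 2 replications (illustrative numbers only)
rep = [0, 0, 0, 1, 1, 1]
geno = [0, 1, 2, 0, 1, 2]
y = [10.0, 8.0, 12.0, 12.0, 8.0, 14.0]
r_hat, g_hat, pev = blup_rcbd(y, rep, geno, var_g=1.0, var_e=2.0)
print(g_hat)  # approximately [0.167, -1.333, 1.167]
```

With balanced data, each BLUP equals the hybrid mean minus the trial mean, shrunk by r/(r + σ²e/σ²g); here r = 2 replications, so the factor is 2/(2 + 2) = 0.5.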
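The statistical model for the evaluation of hybrids in a randomized complete block design with one observation per plot in several environments is given by:
$$\mathbf{y} = \mathbf{X}\mathbf{r} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{i} + \mathbf{e}$$
where y is the vector of phenotypes; r is the vector of replication-environment combination effects (assumed fixed), which comprises the effects of environment and of replication within environment, added to the overall mean; g is the vector of genotypic effects (assumed random), $\mathbf{g} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_g)$; i is the vector of G×E interaction effects (assumed random), $\mathbf{i} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{ge})$, where $\sigma^2_{ge}$ is the G×E interaction variance; and e is the vector of residuals (random), $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$ under the homogeneous residual structure, or with environment-specific residual variances under the heterogeneous structure. Uppercase letters (X, Z, and W) represent the incidence matrices for the r, g, and i effects, respectively.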
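The multi-environment model can be sketched by adding the G×E design matrix W and a diagonal residual matrix R whose entries are the residual variance of each plot's environment (equal entries give the homogeneous model, distinct entries the heterogeneous one). Variances are again assumed known, and the function names and toy data are hypothetical.

```python
import numpy as np


def one_hot(codes):
    """0/1 design matrix for integer codes 0..k-1."""
    codes = np.asarray(codes)
    matrix = np.zeros((codes.size, codes.max() + 1))
    matrix[np.arange(codes.size), codes] = 1.0
    return matrix


def blup_met(y, env, rep, geno, var_g, var_ge, var_e_by_env):
    """BLUPs for y = Xr + Zg + Wi + e with R = diag(residual variance per environment).

    Assumes replications are coded 0..n_rep-1 within every environment.
    Passing the same residual variance for every environment gives the homogeneous model.
    """
    y = np.asarray(y, dtype=float)
    env, rep, geno = (np.asarray(a) for a in (env, rep, geno))
    X = one_hot(env * (rep.max() + 1) + rep)   # fixed replication-within-environment effects
    Z = one_hot(geno)                          # random genotypic effects
    W = one_hot(geno * (env.max() + 1) + env)  # random G x E interaction effects
    r_inv = 1.0 / np.asarray(var_e_by_env, dtype=float)[env]  # diagonal of R^-1
    T = np.hstack([X, Z, W])
    C = T.T @ (T * r_inv[:, None])                      # T' R^-1 T
    p, q = X.shape[1], Z.shape[1]
    C[p:p + q, p:p + q] += np.eye(q) / var_g            # add G^-1 for genotypes
    C[p + q:, p + q:] += np.eye(W.shape[1]) / var_ge    # add G^-1 for G x E
    sol = np.linalg.solve(C, T.T @ (r_inv * y))
    return sol[:p], sol[p:p + q], sol[p + q:]


# Toy data: 2 environments x 2 replications x 3 hybrids (illustrative only)
env = [0] * 6 + [1] * 6
rep = [0, 0, 0, 1, 1, 1] * 2
geno = [0, 1, 2] * 4
y = [10, 8, 12, 12, 8, 14, 20, 15, 22, 21, 16, 25]
r_hat, g_hat, ge_hat = blup_met(y, env, rep, geno, var_g=1.0, var_ge=0.5,
                                var_e_by_env=[2.0, 4.0])
print(g_hat, ge_hat.reshape(3, 2))  # rows: hybrids, columns: environments
```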
The significance of the random effects was tested by the likelihood ratio test (LRT), using the chi-square statistic with 1 degree of freedom at a 5% type I error rate. Models with homogeneous and heterogeneous residual variance structures were compared using the Bayesian information criterion (BIC) (Schwarz 1978).
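A minimal sketch of both model-comparison tools, given log-likelihoods already produced by a REML fit (not shown). The p-value uses the identity that the survival function of a χ² variable with 1 df equals erfc(√(x/2)). The function names and log-likelihoods are hypothetical; note that REML likelihoods are only comparable between models with the same fixed effects, and software differs in which n it uses in the BIC penalty.

```python
import math


def lrt(loglik_full, loglik_reduced):
    """LRT for one variance component against a chi-square with 1 df."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    return stat, p_value


def bic(loglik, n_params, n_obs):
    """Schwarz's Bayesian information criterion; lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical REML log-likelihoods with and without the genotypic effect
stat, p = lrt(loglik_full=-100.0, loglik_reduced=-104.0)
print(stat, p)  # stat = 8.0, p < 0.05: the variance component is significant

# Hypothetical joint-analysis fits on 300 plots
bic_hom = bic(-250.0, n_params=3, n_obs=300)  # sigma2_g, sigma2_ge, one sigma2_e
bic_het = bic(-240.0, n_params=6, n_obs=300)  # sigma2_g, sigma2_ge, four sigma2_e
print("heterogeneous" if bic_het < bic_hom else "homogeneous")
```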
The selective accuracies ($r_{\hat{g}g}$) were calculated using the following equation (Resende et al. 2014):
$$r_{\hat{g}g} = \sqrt{1 - \frac{PEV}{\sigma^2_g}}$$
where PEV is the prediction error variance, extracted from the diagonal of the generalized inverse of the coefficient matrix of the mixed model equations, and $\sigma^2_g$ is the genotypic variance.
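Given PEVs, the accuracy formula is a one-liner. The sketch assumes PEV and σ²g are on the same scale; when the mixed model equations are written with the ratio σ²e/σ²g, the diagonal of the inverted coefficient matrix must first be multiplied by σ²e to obtain PEV. `selective_accuracy` is a hypothetical name and the PEVs are illustrative.

```python
import numpy as np


def selective_accuracy(pev, var_g):
    """r = sqrt(1 - PEV / sigma2_g); clipped at 0 to guard against rounding."""
    pev = np.asarray(pev, dtype=float)
    return np.sqrt(np.clip(1.0 - pev / var_g, 0.0, None))


# Illustrative PEVs for three hybrids with sigma2_g = 1.0
acc = selective_accuracy([0.19, 0.64, 0.02], var_g=1.0)
print(acc)  # approximately [0.9, 0.6, 0.99]
```

Smaller PEV relative to σ²g means the BLUP is closer to the true genotypic value, which is why more replications or environments raise accuracy.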
The eight traits analyzed were summarized into factors by FA. The factor analysis model is expressed as:
$$X_j = l_{j1}F_1 + l_{j2}F_2 + \cdots + l_{jm}F_m + \varepsilon_j$$
where $X_j$ is the jth trait, with j = 1, 2, ..., p; $l_{jk}$ is the factor loading of the jth trait on the kth factor, with k = 1, 2, ..., m; $F_k$ is the kth common factor; and $\varepsilon_j$ is the specific factor. The number of factors was set so that the average proportion of the variance of each trait explained by the common factors (average communality) reached at least 70%. The main goal of factor analysis is to describe the original variability of the genotypic observations (BLUPs) in terms of a smaller number of random variables, called factors (Cruz et al. 2014). The analysis therefore starts with many variables that are reduced to a smaller number of latent variables (factors) representing the original variability. Varimax rotation of the factor loadings (Mardia et al. 1979) was used to facilitate the interpretation of these latent variables.
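The sketch below illustrates the FA step on a hybrids × traits matrix of BLUPs. It assumes principal-component extraction from the trait correlation matrix, retention of factors with eigenvalues greater than 1 when no number is given, varimax rotation, and regression factor scores; the text does not state which extraction or scoring method was used, and `factor_analysis`, `varimax`, and the simulated data are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation using the classic SVD iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_objective = np.sum(s)
        if objective != 0.0 and new_objective < objective * (1.0 + tol):
            break
        objective = new_objective
    return loadings @ rotation


def factor_analysis(blups, n_factors=None):
    """Principal-component factoring of the trait correlation matrix of BLUPs."""
    blups = np.asarray(blups, dtype=float)
    corr = np.corrcoef(blups, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    if n_factors is None:
        n_factors = int(np.sum(eigval > 1.0))  # eigenvalues greater than 1
    loadings = eigvec[:, :n_factors] * np.sqrt(eigval[:n_factors])
    rotated = varimax(loadings)
    communality = np.sum(rotated ** 2, axis=1)  # unchanged by orthogonal rotation
    # Regression (Thomson) factor scores from standardized BLUPs
    z = (blups - blups.mean(axis=0)) / blups.std(axis=0, ddof=1)
    scores = z @ np.linalg.solve(corr, rotated)
    return eigval, rotated, communality, scores


# Simulated BLUPs: 84 hybrids x 8 traits driven by 3 hypothetical latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(84, 3))
blups = latent @ rng.normal(size=(3, 8)) + 0.3 * rng.normal(size=(84, 8))
eigval, loadings, communality, scores = factor_analysis(blups, n_factors=3)
print(np.round(communality, 2), communality.mean())
```

Each trait is then assigned to the factor on which its rotated loading is largest, and traits with low communality, like EL and ED here, may not associate clearly with any factor.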
To select superior hybrids whose set of traits approached the ideotype, the FAI-BLUP index was applied to the predicted genotypic values (BLUPs) (Rocha et al. 2018). The number of ideotypes was determined from the desirable and undesirable factors for the traits under selection, and the distance between each genotype and the proposed ideotypes was obtained. These distances were converted into spatial probabilities, which were used to rank the hybrids. The ideotype design used for the FAI-BLUP analysis was set to "maximum" for productivity-related traits (EL, ED, EY, and GY) and "minimum" for cycle- and morphology-related traits (TA, FL, PH, and EH). To compare the efficiency of the FAI-BLUP index, direct selection based on the main maize trait (GY) was performed in each environment, and the harmonic mean of the relative performance of genotypic values (HMRPGV) (Resende et al. 2014) was computed over the four environments for GY (selection based on adaptability, stability, and productivity). In addition, indirect selection through HMRPGV (based on GY) was conducted, and selection gains were calculated for all traits.
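A sketch of the ranking step follows, taking factor scores (for instance, from an FA of the BLUPs) as input. It assumes that ideotypes are formed by every combination of the minimum and maximum score of each factor, that distances are Euclidean, and that the spatial probability of genotype i for ideotype j is (1/d_ij) divided by the sum of 1/d over all ideotypes. These details, the function name, and the toy scores are assumptions, not the authors' code.

```python
import itertools

import numpy as np


def fai_blup_rank(scores, directions):
    """Rank genotypes by spatial probability of closeness to the desired ideotype.

    scores: (n_genotypes, n_factors) factor scores.
    directions: +1 where a high factor score is desired, -1 where a low one is.
    """
    scores = np.asarray(scores, dtype=float)
    n_factors = scores.shape[1]
    low, high = scores.min(axis=0), scores.max(axis=0)
    # Assumption: each min/max combination of the factors is one ideotype (2**f)
    combos = list(itertools.product((0, 1), repeat=n_factors))
    ideotypes = np.array([np.where(np.array(c) == 1, high, low) for c in combos])
    desired = combos.index(tuple(1 if d > 0 else 0 for d in directions))
    # Euclidean distance from every genotype to every ideotype
    dist = np.linalg.norm(scores[:, None, :] - ideotypes[None, :, :], axis=2)
    inv = 1.0 / np.maximum(dist, 1e-12)          # guard against zero distance
    prob = inv / inv.sum(axis=1, keepdims=True)  # spatial probability per ideotype
    p_desired = prob[:, desired]
    ranking = np.argsort(-p_desired, kind="stable")
    return p_desired, ranking


# Toy scores: factor 1 should be maximized, factor 2 minimized
scores = [[2.0, -1.0], [0.0, 0.0], [-2.0, 1.0]]
p_desired, ranking = fai_blup_rank(scores, directions=[+1, -1])
print(np.round(p_desired, 3), ranking)
```

Under this study's design, factors dominated by cycle or morphology traits would take direction −1 and yield factors +1; if a factor's key loadings are negative, its direction flips.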
The selection gain (SG), considering a 20% selection intensity (17 hybrids), was obtained as follows (Resende et al. 2014):
$$SG = \frac{1}{p}\sum_{i=1}^{p} GV_i$$
where $GV_i$ is the genotypic value of the ith selected hybrid and p is the number of selected genotypes. The Kappa coefficient (K) (Cohen 1960) was applied to calculate the agreement among the hybrids selected by all strategies used, as follows:
$$K = \frac{A - C}{D - C}$$
where A is the number of hybrids selected in common between a pair of environments; C is the number of hybrids expected to be selected in common by chance (C = bD, where b = 0.2 is the selection intensity); and D is the number of selected hybrids (17).
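Both quantities can be sketched as follows. `selection_gain` takes SG as the mean predicted genotypic value of the selected hybrids, and `coincidence_kappa` implements K = (A − C)/(D − C) with C = bD, which equals Cohen's (p_o − p_e)/(1 − p_e) with p_o = A/D and p_e = b. Both names and the toy inputs are hypothetical. With 84 hybrids and b = 0.2, D = round(0.2 × 84) = 17 and C = 3.4.

```python
import numpy as np


def selection_gain(gv, selected):
    """Mean predicted genotypic value of the selected genotypes."""
    gv = np.asarray(gv, dtype=float)
    return gv[np.asarray(selected)].mean()


def coincidence_kappa(selected_a, selected_b, intensity=0.2):
    """K = (A - C) / (D - C), with C = intensity * D expected by chance."""
    set_a, set_b = set(selected_a), set(selected_b)
    D = len(set_a)              # number of selected hybrids
    A = len(set_a & set_b)      # hybrids selected in common
    C = intensity * D           # coincidence expected by chance
    return (A - C) / (D - C)


n_hybrids, intensity = 84, 0.2
n_sel = round(intensity * n_hybrids)  # 17 hybrids
sg = selection_gain([1.0, 2.0, 3.0, 4.0], selected=[2, 3])  # hypothetical GVs -> 3.5
k = coincidence_kappa(range(17), range(7, 24), intensity)    # 10 hybrids in common
print(n_sel, sg, round(k, 3))
```

K reaches 1 when both selections are identical and 0 when they coincide only as much as expected by chance.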

RESULTS AND DISCUSSION
The LRT indicated genotypic variability among hybrids for most traits in both individual and joint analyses (Supplementary material, Appendix B). In the joint analysis, only EY and GY showed significant G×E interaction effects. For both traits, the G×E interaction did not absorb all the genotypic variability, so the genotypic effects remained significant. Regarding the residual variance structure, the model with heterogeneous residual variance presented the best goodness-of-fit for EL and FL according to BIC, whereas the remaining traits (TA, PH, EH, ED, EY, and GY) were best represented by the model with homogeneous residual variance (Table 1). Therefore, for each trait, the model with the lowest BIC value was used to estimate variance components and predict genotypic values.
The mean selective accuracy ($\bar{r}_{\hat{g}g}$) of the hybrids ranged from 0.46 (FL in ENV2) to 0.99 (FL in the joint analysis) (Figure 1). The joint analysis presented higher selective accuracy for all traits analyzed. Similar patterns have been reported for maize hybrids. Selective accuracy measures how close the predicted genetic values are to the true genetic values, expressed as the correlation between them (Resende 2015). It depends on the number of replications, the residual variance, and the ratio between residual and genetic variation (Resende and Duarte 2007). In this context, increasing the number of environments through the joint analysis increases selective accuracy.
In particular, mean selective accuracy increased from the individual to the joint analysis for FL and EL (Figure 1), the traits for which the best-fitting model had heterogeneous residual variance. In this case, selective accuracy was affected by the residual variance structure, as described by Resende and Duarte (2007). In MET analysis, selecting the best-fitting model for the residual variance structure allows selective accuracy to be maximized. MET data from many annual crops, such as maize, cotton, and common bean, are generally associated with heterogeneous residual variance structures (So and Edwards 2011, Rocha et al. 2019, Melo et al. 2020). Therefore, modeling the residual variance structure is a straightforward and reliable step in MET analysis.
FA showed that the average communality exceeded 75% when three factors (latent variables with eigenvalues greater than 1) were retained in the individual analyses (Table 2). According to Cruz et al. (2014), each latent variable consists of a group of traits strongly associated with each other but weakly correlated with the other traits. Accordingly, the first latent variable in the individual analyses was mostly influenced by TA and FL and was interpreted as cycle-related. Similarly, the second latent variable was most associated with PH and EH and was interpreted as morphology-related. The third latent variable was influenced by EY and GY, traits from the yield group, and was therefore termed yield-related.
EL and ED presented lower communalities (< 0.60), which prevented their association with the latent variables formed. Four latent variables were formed in the joint analysis (Table 2). The first and second comprised the same traits as in the individual analyses, whereas the third comprised EY, GY, and EL (all yield-related traits). The fourth latent variable was formed by ED and is considered a yield-related factor.
According to Murakami and Cruz (2004) and Oliveira et al. (2005), factor analysis groups traits with high genetic correlation into the same factor and traits with low genetic correlation into different factors. Therefore, each latent variable has a biological meaning based on the genetic correlations between pairs of traits. The lack of association of EL and ED with other traits in the individual analyses, and their association with other traits in the joint analysis, resulted from their lower communality and indicates the presence of G×E interaction. Even though these two traits are correlated with the other yield-group traits, the lower proportion of their variation explained by the common factors (communality) and the disturbance caused by the G×E interaction resulted in these unexpected patterns.
The FAI-BLUP index ranked the hybrids according to their proximity to the maize genotype-ideotype (Supplementary material, Appendix C). The hybrid ranking changed across environments, indicating the existence of G×E interaction. Nevertheless, hybrids 51, 2, and 26 performed best in all environments when all traits were considered. The FAI-BLUP index draws on the strengths of factor analysis to assess genotype performance from the BLUPs of each environment while accounting for all traits simultaneously (Rocha et al. 2018).
The Kappa index values (Figure 2) highlight the dissimilarities between environments in hybrid performance under the FAI-BLUP index, except for ENV3-ENV4, which showed greater similarity among the best genotypes (agreement of 0.85). These results indicate that the G×E interaction for the yield-related traits (EY and GY) caused some disturbance in the analysis. A similar effect would be expected for the secondary traits, for which fluctuations caused by the G×E interaction also contribute to lower values of the coincidence index. Barbosa et al. (2019) found similar patterns in coffee genotypes, and Nardino et al. (2020) showed that the G×E interaction is a disturbance factor in FA of maize diallel hybrids.
The results indicate that the G×E interaction was significant for only two traits and that the latent variables presented the same pattern in each environment. However, the low coincidence values show that the G×E interaction changes the ranking and reduces the success of indirect selection of hybrids. This is a direct consequence of FA, in which traits are considered together in the selection process, resulting in low Kappa agreement values. According to van Eeuwijk et al. (2016), the G×E interaction affects genotype performance across environments because gene expression differs among sites.
The three selection strategies explored here are conceptually different. The FAI-BLUP index ranks genotypes based on the genotypic aggregate of several traits, direct selection ranks genotypes based on a single trait (μ + g; GY in our case), and HMRPGV ranks genotypes in each location considering the effects of the G×E interaction (μ + g + ge). These differences resulted in low Cohen's Kappa values between strategies (Figure 3), indicating that selection conducted by breeding companies based only on GY does not favor the selection of the maize genotype-ideotype.
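For reference, HMRPGV can be sketched from a hybrids × environments matrix of genotypic values (μ + g + ge). The sketch assumes that relative performance is each value divided by the environment mean of the genotypic values, and that the harmonic mean is taken over environments, in the spirit of Resende et al. (2014); `hmrpgv` and the toy matrix are hypothetical.

```python
import numpy as np


def hmrpgv(gv):
    """Harmonic mean of relative performance of genotypic values.

    gv[i, j] is the predicted genotypic value (mu_j + g_i + ge_ij) of hybrid i
    in environment j; values are assumed positive (e.g., grain yield).
    """
    gv = np.asarray(gv, dtype=float)
    rpgv = gv / gv.mean(axis=0)                # relative performance per environment
    n_env = gv.shape[1]
    return n_env / np.sum(1.0 / rpgv, axis=1)  # harmonic mean across environments


# Toy values: hybrid 0 is stable, hybrid 1 alternates between low and high
gv = [[10.0, 10.0],
      [5.0, 15.0]]
values = hmrpgv(gv)
print(values, np.argsort(-values))
```

Because the harmonic mean is pulled down by low values, HMRPGV penalizes hybrids that perform poorly in any environment, which is why it reflects stability and adaptability as well as yield.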
Indeed, when selection is based on a single trait, indirect selection does not produce convincing results for secondary traits, because the correlated response depends on the magnitude of the genetic correlation between pairs of traits (Ertiro et al. 2020). For example, compared with FAI-BLUP, indirect selection through HMRPGV yielded small gains for all traits except GY and EY (Table 3), and some gains were in the undesired direction (e.g., for PH, negative gains were desired, but only positive gains were obtained). For GY and EY, however, the gains exceeded those from FAI-BLUP, as expected, because HMRPGV was computed from GY and EY is highly correlated with GY. These results also highlight the performance of the FAI-BLUP index in selecting hybrids toward the maize ideotype.
Figure 2. Cohen's Kappa agreement index for the 17 selected hybrids (selection intensity of 20%) by FAI-BLUP, direct selection based on grain yield, and harmonic mean of relative performance of genotypic values (HMRPGV), between pairs of environments. FB1, FB2, FB3, and FB4 = FAI-BLUP index in environments 1, 2, 3, and 4, respectively; GY1, GY2, GY3, and GY4 = grain yield direct selection in environments 1, 2, 3, and 4, respectively; HM1, HM2, HM3, and HM4 = indirect selection (μ + g + ge) based on HMRPGV for environments 1, 2, 3, and 4, respectively.
Selecting several traits simultaneously often results in smaller gains for each trait (Cruz et al. 2014) than direct selection for GY, indirect selection through HMRPGV, or selection for adaptability, stability, and productivity through HMRPGV (Table 3). For example, the hybrids ranked highest in all environments by the FAI-BLUP index (51, 2, and 26) were not identified as superior by direct selection (based only on GY) or by HMRPGV selection. The aims of the strategies differ, and although direct and HMRPGV selection presented higher gains than the FAI-BLUP index, the latter should be preferred for selecting superior hybrids based on a group of traits. Furthermore, the FAI-BLUP index produced reasonable gains for GY (Table 3), as well as for the other traits. Thus, multi-trait selection should be considered an alternative for reducing costs and selecting genotypes close to the maize ideotype.
Among the top hybrids indicated by the FAI-BLUP index were some interpopulation hybrids that outperformed the commercial hybrids. In crops such as maize, selection is generally based on GY, which can direct selection toward yield while neglecting other traits (…Borém 2018, Coelho et al. 2020). However, the results indicated that selection toward the ideotype was feasible. For example, reducing the time to flowering is desirable in commercial fields (fitting a shorter-cycle maize between other crop seasons) and in breeding programs (shortening the selection process and thus advancing more generations in less time), as long as productivity traits are not harmed. In practice, the development of superior hybrids seldom involves simultaneous selection for several traits, yet a breeding program must identify hybrids that combine desirable traits from all groups (yield-, morphology-, and cycle-related traits).

CONCLUSION
Different residual variance structures were best suited to different traits, showing that model selection is an important step in the analysis of MET data, particularly for annual crops. The results of this study demonstrated that FA can reduce the number of traits in the genetic assessment of maize, as the factors formed corresponded to the trait groups analyzed (yield-, morphology-, and cycle-related traits). The FAI-BLUP index is suitable for multi-trait genetic selection and can be used to achieve selection gains in all traits analyzed simultaneously; for this purpose, it is superior to direct selection for GY, indirect selection through HMRPGV, and selection for adaptability, stability, and productivity (HMRPGV).
Table 3. Predicted genetic gain (%) for the 17 maize hybrids selected by the FAI-BLUP index in each environment, by direct selection (DS) in each environment, and by indirect selection based on the harmonic mean of relative performance of genotypic values (HMRPGV) for all environments combined and for each environment (μ + g + ge). TA: tasseling; FL: female flowering; PH: plant height; EH: ear height; EL: ear length; ED: ear diameter; EY: ear yield; GY: grain yield.