Models for optimizing selection based on adaptability and stability of cotton genotypes

In multi-environment trials (MET), large trial networks are assessed to improve results. However, genotype by environment interaction plays an important role in the selection of the most adaptable and stable genotypes in the MET framework. In this study, we tested different residual variance structures and measured the selection gain of cotton genotypes accounting for adaptability and stability simultaneously. Twelve cotton genotypes were evaluated in 10 environments, and fiber length (FL), fiber strength (FS), micronaire (MIC), and fiber yield (FY) were determined. Different residual variance structures (homogeneous and heterogeneous) were compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The variance components were estimated through restricted maximum likelihood and genotypic values were predicted through best linear unbiased prediction. The harmonic mean of relative performance of genetic values (HMRPGV) was applied for simultaneous selection for adaptability, stability, and yield. According to BIC, heterogeneous residual variance gave the best model fit for FY, whereas homogeneous residual variance gave the best model fit for the FL, FS, and MIC traits. The selective accuracy was high, indicating reliability of the prediction. The HMRPGV was able to select for stability, adaptability, and yield simultaneously, with remarkable selection gain for each trait.


INTRODUCTION
Upland cotton (Gossypium hirsutum L.) is an herbaceous crop and the most cultivated species worldwide for fiber production. It provides over 90% of the world's cotton. Its cultivation as an annual crop is widespread from south to north, from subtropical regions to temperate latitudes well over 30° (D'EECKENBRUGGE AND LACAPE, 2014). Across such a wide range of environments, the genotype × environment (G × E) interaction plays an essential role in genotypic expression and must be considered in the evaluation and selection of superior genotypes for cotton cultivation (MALOSETTI et al., 2013; VAN EEUWIJK et al., 2016; LI et al., 2017). In plant breeding, the G × E interaction refers to the differential performance of genotypes across environments (RESENDE, 2015). Differential genetic expression across environments causes this variation in genotype performance (VAN EEUWIJK et al., 2016; LI et al., 2017). Therefore, to obtain elite cultivars that are adapted to cultivation regions, it is essential to evaluate genotypes in multi-environment trials (MET) (SMITH et al., 2005), from which the data obtained can be analyzed for yield, adaptability, and stability. Accordingly, statistical methods have been proposed over the last few decades to deal with the G × E interaction (VAN EEUWIJK et al., 2016).
Currently, the estimation of variance components through linear mixed models by the restricted maximum likelihood (REML) method and the prediction of genotypic values by the best linear unbiased prediction (BLUP) of random effects are the standard procedures employed for genetic evaluation under G × E interaction in plant breeding (SMITH et al., 2005; RESENDE et al., 2014). In the context of mixed models, non-genetic effects, such as the residual effect, can be modelled through the R matrix of residual (co)variances (DE FAVERI et al., 2015; PÁDUA et al., 2016; MELO et al., 2020). For annual crops evaluated in a MET network, different residual variance structures may fit the data.
To deal with the G × E interaction, the harmonic mean of the relative performance of genetic values (HMRPGV) has been proposed in the context of linear mixed models. With this method, the genetic gain is computed simultaneously for yield, stability, and adaptability. In addition to the simultaneous selection using these three criteria, the HMRPGV method also deals with unbalanced data, heterogeneity of variances, elimination of G × E interaction variation, consideration of the heritability of these effects, and correlated errors within locations. It generates genetic values already discounted (penalized) for instability and expressed on the same magnitude or scale as the traits evaluated (RESENDE et al., 2014). All these factors are highly relevant to breeding programs that deal with G × E interaction.
Obtaining reliable estimates of genetic parameters and simultaneous selection for stability, adaptability, and genotypic traits for recommendation of genotypes can provide innovative and useful results to a breeding program. In cotton breeding, the main objective is to select cultivars that provide high fiber yield and longer fiber length (FARIAS et al., 2016). Consequently, the focus of studies of upland cotton breeding programs is to search for cultivars with high adaptability and stability in these traits. However, only a small portion of these programs have used genetic parameters and simultaneous selection of adaptability and stability to recommend genotypes in MET. Given these conditions, the goals of this study were to (i) test different residual variance structures during the assessment of cotton genotypes, (ii) estimate genetic parameters and predict genotypic values of cotton using REML/BLUP methodology, and (iii) measure the selection gain of genotypes based on stability, adaptability, and yield.

Experimental data
Ten trials were performed during the 2013/2014 and 2014/2015 cropping seasons in the Midwest region of Brazil (Table 1). The trials followed a randomized complete block design with 12 cotton genotypes and four replicates (G1 = TMG41 WS; G2 = TMG43 WS; G3 = IMA CV690; G4 = IMA5675 B2RF; G5 = IMA08 WS; G6 = NUOPAL; G7 = DP555 BGRR; G8 = DELTA OPAL; G9 = BRS 286; G10 = BRS 335; G11 = BRS368 RF; G12 = BRS369 RF). Each experimental unit (plot) consisted of four 5.0 m rows, with 0.90 m between rows and 45 plants per row. All plots were harvested, so the dataset was balanced for the subsequent analyses. In each plot, 20 bolls were collected at maturity to determine fiber length (FL, mm), fiber strength (FS, gf tex⁻¹), and micronaire (MIC, µg inch⁻¹) using a high-volume instrument. Cotton seed yield was evaluated in the two central rows by mechanically harvesting 4 m of each row, discarding 0.5 m at each end of the plot (border), correcting to 13% moisture, and extrapolating to kg ha⁻¹. A sample from each plot was used to determine the fiber percentage. The fiber yield (FY) was then estimated by multiplying cotton seed yield by fiber percentage.
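The extrapolation to kg ha⁻¹ and the FY calculation described above can be illustrated with a short Python sketch. It assumes that the harvested area is two rows of 4 m at 0.90 m spacing (7.2 m²), as implied by the plot description; the plot weight and fiber percentage are hypothetical values, and the moisture correction is left out because its formula is not given in the text.

```python
def plot_to_kg_per_ha(plot_kg, rows=2, row_length_m=4.0, row_spacing_m=0.90):
    """Extrapolate a plot harvest (kg) to kg/ha using the harvested area."""
    harvested_area_m2 = rows * row_length_m * row_spacing_m  # 2 x 4 m x 0.90 m
    return plot_kg * 10_000.0 / harvested_area_m2


def fiber_yield(cotton_seed_yield_kg_ha, fiber_percent):
    """FY = cotton seed yield multiplied by the fiber fraction."""
    return cotton_seed_yield_kg_ha * fiber_percent / 100.0


# Hypothetical plot values, for illustration only (not data from the trials).
plot_kg = 2.88
fiber_pct = 40.0

yield_kg_ha = plot_to_kg_per_ha(plot_kg)
fy_kg_ha = fiber_yield(yield_kg_ha, fiber_pct)
print(f"cotton seed yield: {yield_kg_ha:.1f} kg/ha, FY: {fy_kg_ha:.1f} kg/ha")
```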

Statistical analyses
Variance components and genetic parameters were estimated through REML (PATTERSON AND THOMPSON, 1971) and genotypic values were predicted using BLUP (HENDERSON, 1975). The statistical model was given by the following equation:

$$y = Xb + Zg + Wi + e$$

where $y$ is the vector of phenotypic data; $b$ is the vector of replication-environment combinations (assumed to be fixed), which comprises the effects of environment and replication within the environment and is added to the overall mean; $g$ is the vector of genotype effects (assumed to be random), $g \sim N(0, I\sigma_g^2)$, where $\sigma_g^2$ is the genotypic variance; $i$ is the vector of G × E interaction effects (random), $i \sim N(0, I\sigma_i^2)$, where $\sigma_i^2$ is the G × E interaction variance; and $e$ is the vector of residuals (random), $e \sim N(0, R)$, where $R$ represents a matrix of residual variances. Capital letters ($X$, $Z$, $W$) represent the incidence matrices for $b$, $g$, and $i$, respectively.
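To make the structure of this model concrete, the following Python sketch builds the incidence matrices for a balanced design of the size reported here and obtains BLUPs of the genotype effects by solving Henderson's mixed model equations. It is not the ASReml-R analysis used in the study: the data are simulated, the variance components are hypothetical and treated as known (no REML step), and residuals are assumed homogeneous.

```python
import numpy as np

rng = np.random.default_rng(0)
n_gen, n_env, n_rep = 12, 10, 4  # design reported in the text

# One record per genotype x environment x replicate (balanced data).
gen, env, rep = np.meshgrid(np.arange(n_gen), np.arange(n_env),
                            np.arange(n_rep), indexing="ij")
gen, env, rep = gen.ravel(), env.ravel(), rep.ravel()
n_obs = gen.size


def incidence(levels, n_levels):
    """0/1 incidence matrix linking each record to one level of a factor."""
    m = np.zeros((levels.size, n_levels))
    m[np.arange(levels.size), levels] = 1.0
    return m


X = incidence(env * n_rep + rep, n_env * n_rep)  # replication-environment (fixed)
Z = incidence(gen, n_gen)                        # genotypes (random)
W = incidence(gen * n_env + env, n_gen * n_env)  # G x E interaction (random)

# Hypothetical variance components, used to simulate data and to solve the MME.
var_g, var_i, var_e = 4.0, 2.0, 6.0
y = (100.0
     + Z @ rng.normal(0.0, np.sqrt(var_g), n_gen)
     + W @ rng.normal(0.0, np.sqrt(var_i), n_gen * n_env)
     + rng.normal(0.0, np.sqrt(var_e), n_obs))

# Mixed model equations for R = I * var_e: ridge terms on the random blocks.
T = np.hstack([X, Z, W])
C = T.T @ T
p = X.shape[1]
C[p:p + n_gen, p:p + n_gen] += np.eye(n_gen) * (var_e / var_g)
C[p + n_gen:, p + n_gen:] += np.eye(n_gen * n_env) * (var_e / var_i)
sol = np.linalg.solve(C, T.T @ y)

b_hat = sol[:p]              # fixed replication-environment effects
g_hat = sol[p:p + n_gen]     # BLUPs of genotype effects
i_hat = sol[p + n_gen:]      # BLUPs of G x E interaction effects
print(np.round(g_hat, 2))
```

Because the fixed replication-environment effects absorb the overall mean, the genotype BLUPs are deviations that sum to zero; a heterogeneous residual structure would replace the single `var_e` with one residual variance per environment.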
The predicted genotypic value (GV) was obtained by the following equation:

$$GV = \mu + g$$

where $\mu$ is the overall mean and $g$ is the genotypic effect. The harmonic mean of the genotypic values (HMGV), used to evaluate genotype stability and yield, was obtained by the following equation:

$$HMGV_i = \frac{n}{\sum_{j=1}^{n} \frac{1}{GV_{ij}}}$$

where $n$ is the number of environments where genotype $i$ was evaluated and $GV_{ij}$ is the genotypic value of genotype $i$ in environment $j$. The relative performance of the genotypic value (RPGV), used to evaluate genotype adaptability and yield, was obtained by the following equation:

$$RPGV_i = \frac{1}{n} \sum_{j=1}^{n} \frac{GV_{ij}}{M_j}$$

where $M_j$ is the mean yield in environment $j$. The HMRPGV, used to evaluate genotype adaptability, stability, and yield, was obtained by the following equation:

$$HMRPGV_i = \frac{n}{\sum_{j=1}^{n} \frac{1}{RPGV_{ij}}}$$

where $RPGV_{ij} = GV_{ij}/M_j$ is the genotypic value of genotype $i$ in environment $j$ expressed as a proportion of the environmental mean. The recommendation of genotypes was based on the predicted genotypic values of each trait. Selection gains (SG), in percentage, were obtained by the following equation:

$$SG(\%) = \frac{X_s - X_o}{X_o} \times 100$$

where $X_s$ is the overall mean of the predicted genotypic values for the selected genotypes and $X_o$ is the overall mean for all genotypes.
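The indices described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the usual definitions (harmonic mean across environments for HMGV and HMRPGV, arithmetic mean of GV_ij / M_j for RPGV), takes M_j as the mean of the genotypic values in environment j, and uses a small hypothetical matrix of genotypic values.

```python
import numpy as np


def stability_indices(gv):
    """gv: genotypes x environments array of positive genotypic values."""
    gv = np.asarray(gv, dtype=float)
    n_env = gv.shape[1]
    env_mean = gv.mean(axis=0)                  # M_j (assumed: mean GV per environment)
    rel = gv / env_mean                         # GV_ij / M_j
    hmgv = n_env / (1.0 / gv).sum(axis=1)       # stability and yield
    rpgv = rel.mean(axis=1)                     # adaptability and yield
    hmrpgv = n_env / (1.0 / rel).sum(axis=1)    # stability, adaptability, and yield
    return hmgv, rpgv, hmrpgv


def selection_gain(values, n_selected):
    """SG (%) of the n_selected highest values relative to the overall mean."""
    values = np.asarray(values, dtype=float)
    selected = np.sort(values)[::-1][:n_selected]
    return 100.0 * (selected.mean() - values.mean()) / values.mean()


# Hypothetical genotypic values: 3 genotypes (rows) in 2 environments (columns).
gv = [[10.0, 10.0],   # stable genotype
      [5.0, 15.0],    # same arithmetic mean, but unstable
      [9.0, 11.0]]
hmgv, rpgv, hmrpgv = stability_indices(gv)
print("HMGV:  ", np.round(hmgv, 3))
print("RPGV:  ", np.round(rpgv, 3))
print("HMRPGV:", np.round(hmrpgv, 3))
print("SG (%) selecting the best genotype by HMGV:", round(selection_gain(hmgv, 1), 2))
```

The first two genotypes share the same arithmetic mean, yet the unstable one receives a lower HMGV, which is how the harmonic mean penalizes instability. For a trait such as MIC, where lower values are desirable, the selection step would need to pick the lowest values instead.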
All genetic and statistical analyses were performed with the ASReml-R software (BUTLER et al., 2009) integrated into the R software (R DEVELOPMENT CORE TEAM, 2020). [...] (TURNER et al., 2016). The AIC tends to be asymptotically efficient, whereas the BIC is a consistent criterion (NEATH AND CAVANAUGH, 2012; CAVANAUGH AND NEATH, 2019). The BIC is also more conservative than the AIC, and it is preferred when there is a strong preference for models of lower dimensionality (KASS et al., 2014). In addition, its consistency means that the probability of selecting the true model among all candidate models approaches one (YANG, 2005), which reinforces its superiority in this case. Thus, the BIC was preferred for the determination of the best-fitting model for the cotton genotypes.
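The difference between the two criteria can be seen in a small Python sketch. It uses the textbook forms AIC = -2 logL + 2p and BIC = -2 logL + p ln(n); the log-likelihoods and parameter counts are hypothetical, and n is taken here as the number of plots, although REML software may use the residual degrees of freedom instead.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: -2 logL + 2p."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: -2 logL + p ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


n_obs = 12 * 10 * 4  # genotypes x environments x replicates

# Hypothetical fits for one trait (values are illustrative, not from the study).
models = {
    "homogeneous": {"log_lik": -1510.0, "n_params": 3},     # var_g, var_i, one var_e
    "heterogeneous": {"log_lik": -1495.0, "n_params": 12},  # var_g, var_i, ten var_e
}

scores = {
    name: {"AIC": aic(m["log_lik"], m["n_params"]),
           "BIC": bic(m["log_lik"], m["n_params"], n_obs)}
    for name, m in models.items()
}
best_aic = min(scores, key=lambda name: scores[name]["AIC"])
best_bic = min(scores, key=lambda name: scores[name]["BIC"])
print(scores)
print("best by AIC:", best_aic, "| best by BIC:", best_bic)
```

With these illustrative numbers the AIC favors the heterogeneous structure while the BIC favors the homogeneous one, because ln(480) is larger than 2 and each extra variance parameter is penalized more heavily under the BIC.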

RESULTS AND DISCUSSION
The selection of the residual structure is an important step in the statistical analysis of MET, and it is often neglected in analyses for plant breeding (SMITH et al., 2005; ZHANG AND HU, 2019). In a study conducted with maize, accounting for heterogeneous residual variance improved the selection gains by over 60% (SO AND EDWARDS, 2011), which indicates that testing the residual structure is useful in MET for annual crops, such as cotton. According to BIC, the best-fitting model had heterogeneous residual variance for FY and homogeneous residual variance for FL, FS, and MIC. Thus, for the variance components, genetic parameter estimation, and genotypic value prediction, the model selected by BIC for each trait was adopted.
The likelihood ratio test (LRT) detected significant genotype and G × E interaction effects for all analyzed traits (P < 0.01) (Figure 1). The heritability values were 0.38, 0.39, and 0.28 for the FL, FS, and MIC traits, respectively (Table 2), consistent with variance components significantly different from zero. For FY, fitted with heterogeneous residual variance, the heritability values varied from 0.07 to 0.15. The heterogeneous residual structure enabled the estimation of heritability in each environment for FY (RESENDE et al., 2014). The values estimated for each environment were more representative, which made this procedure the appropriate choice.
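A short Python sketch shows why a heterogeneous residual structure yields one heritability per environment. It assumes the plot-level form h² = σ²g / (σ²g + σ²i + σ²e), which is a common definition but is not stated explicitly in the text, and all variance components below are hypothetical.

```python
def heritability(var_g, var_i, var_e):
    """Plot-level heritability (assumed form): var_g / (var_g + var_i + var_e)."""
    return var_g / (var_g + var_i + var_e)


# Hypothetical variance components (not the estimates in Table 2).
var_g, var_i = 1.0, 1.0

# Homogeneous residual structure: one residual variance, one heritability.
h2_homogeneous = heritability(var_g, var_i, var_e=6.0)

# Heterogeneous residual structure: one residual variance per environment.
var_e_by_env = [2.0, 6.0, 8.0]
h2_by_env = [heritability(var_g, var_i, ve) for ve in var_e_by_env]

print(round(h2_homogeneous, 3), [round(h, 3) for h in h2_by_env])
```

Environments with larger residual variance give lower heritability, so a single pooled value can hide differences in experimental precision among trials.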
Additionally, the genetic correlation between genotypes across environments was high for FL, FS, and MIC, indicating high similarity in the ordering of the genotypes across the environments. In contrast, FY presented a lower correlation across environments, indicating strong changes in genotype rank among environments (RESENDE, 2015), which implies the need for a more sophisticated approach, such as mixed models, to analyze the data for a more accurate prediction. Nevertheless, selective accuracy had moderate to high values for all traits analyzed, denoting high correlations between true and predicted genetic values (RESENDE AND DUARTE, 2007) and demonstrating the reliability of the predicted values.
The SG (Table 3) varied among the traits, with higher values for FY and lower values for FL. The gains from selection via GV, HMGV, RPGV, and HMRPGV were equal for the FL, FS, and MIC traits. For FY, RPGV and HMRPGV gave equal gains, similar to that of GV, which had the highest gain from selection. The selected genotypes and their ranking were similar among all methods for the FL, FS, and MIC traits. In contrast, FY exhibited differences in the selection and ranking of genotypes among the four methods (Table 4). Selection by GV, HMGV, RPGV, and HMRPGV assumes that the estimated genetic values are free of environmental interaction; the selected genotypes can therefore be recommended for all environments evaluated. The HMGV selection strategy considers stability and penalizes instability (RESENDE et al., 2014). For a given mean performance, the lower the standard deviation of genotypic values across locations, the greater the HMGV. Consequently, a genotype that presents stability is necessarily associated with the highest yield and the lowest sensitivity to environmental variation (LI et al., 2017). In other words, HMGV is the most appropriate selection strategy for unfavorable environments, because genotypes with high stability are desirable in this type of environment. The RPGV selection strategy is the most suitable for favorable environments because the selected genotypes are more responsive to environmental improvement. Because [...] (FARIAS et al., 2016), which makes it a difficult trait for which to obtain large SG across harvests. A negative selection gain was observed for MIC; however, lower values are desirable for this trait.

Table 2 - Estimates of variance components and genetic parameters for the traits: FY = fiber yield (kg ha⁻¹); FL = fiber length (mm); FS = fiber strength (gf tex⁻¹); and MIC = micronaire (µg inch⁻¹), evaluated in 12 cotton genotypes in ten environments (trials).
For FY, selection of the top five genotypes via GV produced the greatest SG; three of these genotypes were also among the top five under the HMGV method. However, selection via GV is not appropriate because this method does not select based on adaptability and genotypic stability. The RPGV and HMRPGV methodologies ranked the same top five genotypes for FY, four of which coincided with the top five under the GV strategy. Selection via HMGV showed smaller gains in FY, as expected because the genotypes studied here are considered stable, i.e., their performance did not change across environments. In the end, only genotypes 3, 11, and 12 were selected by all four strategies.

Table 4 - Genotype ranking (R) based on GV, HMGV, RPGV, and HMRPGV methods for the traits: fiber yield (FY), fiber length (FL), fiber strength (FS), and micronaire (MIC), evaluated in 12 cotton genotypes in ten environments (trials).

R     GV    HMGV  RPGV  HMRPGV    GV    HMGV  RPGV  HMRPGV
1     1     1     1     1         3     3     3     3
2     8     8     8     8         12    12    12    12
3     2     2     2     2         1     1     1     1
4     3     3     3     3         8     8     8     8
5     6     6     6     6         2     2     2     2
6     12    12    12    12        5     9     5     5
7     5     5     5     5         7     5     9     9
8     9     9     9     9         9     7     7     7
9     10    10    10    10        11    11    11    11
10    11    11    11    11        6     6     6     6
11    4     4     4     4         10    10    10    10
12    7     7     7     7         4     4     4     4

GV: predicted genotypic value; HMGV: harmonic mean of the genotypic values; RPGV: relative performance of the genotypic value; and HMRPGV: harmonic mean of relative performance of the genotypic value.
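The agreement between selection strategies discussed above can be quantified with a top-k overlap and a Spearman rank correlation, as in the following Python sketch. The two genotype orderings are the GV and HMGV columns of one trait in Table 4 (the group in which the columns differ); the helper functions are illustrative and are not part of the original analysis.

```python
def top_k_overlap(order_a, order_b, k):
    """Number of genotypes shared by the top-k positions of two rankings."""
    return len(set(order_a[:k]) & set(order_b[:k]))


def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same genotypes."""
    n = len(order_a)
    rank_a = {g: r for r, g in enumerate(order_a, start=1)}
    rank_b = {g: r for r, g in enumerate(order_b, start=1)}
    d2 = sum((rank_a[g] - rank_b[g]) ** 2 for g in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


# Genotype numbers listed from rank 1 to rank 12 (GV and HMGV columns, Table 4).
gv_order = [3, 12, 1, 8, 2, 5, 7, 9, 11, 6, 10, 4]
hmgv_order = [3, 12, 1, 8, 2, 9, 5, 7, 11, 6, 10, 4]

shared_top5 = top_k_overlap(gv_order, hmgv_order, k=5)
rho = spearman_rho(gv_order, hmgv_order)
print(f"shared genotypes in the top five: {shared_top5}; Spearman rho: {rho:.3f}")
```

In this column group the two methods select the same top five genotypes and differ only in the order of genotypes 5, 7, and 9 in the middle of the ranking.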

CONCLUSION
The use of BIC as the information criterion for model selection among different residual structures resulted in a more accurate estimation of genetic parameters in cotton breeding; this step is sometimes neglected in statistical analyses of genetic data, especially in MET. The variance components and genetic parameters were efficiently estimated through REML/BLUP. The HMRPGV method has great potential for the selection of cotton cultivars and should be used in future studies in cotton breeding, as well as with other crops. The use of HMRPGV allows for optimal strategies in the simultaneous selection of genotypes for stability, adaptability, and yield in breeding programs.

ACKNOWLEDGMENTS
We appreciate the financial support from the Brazilian Government offered by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), and the CMPC Company for the partnership. This study was financed in part by the CAPES -Finance Code 001.

DECLARATION OF CONFLICTS OF INTERESTS
We have no conflict of interest to declare.