Soybean productivity, stability, and adaptability through mixed model methodology

The genotype × environment (G×E) interaction plays an essential role in phenotypic expression and can lead to difficulties in genetic selection. Thus, the present study aimed to estimate genetic parameters and to compare different selection strategies in the context of mixed models for soybean breeding. For this, data referring to the evaluation of 30 genotypes in 10 environments, regarding the grain yield trait, were used. The variance components were estimated through restricted maximum likelihood (REML) and genotypic values were predicted through best linear unbiased prediction (BLUP). Significant effects of genotypes and G×E interaction were detected by the likelihood ratio test (LRT). Low genotypic correlation was obtained across environments, indicating complex G×E interaction. The selective accuracy was very high, indicating high reliability. Our results showed that the most productive soybean genotypes have high adaptability and stability.


INTRODUCTION
Soybean [Glycine max (L.) Merrill] is the fourth most widely grown crop in the world and it is a source of oil, protein, and raw material for biodiesel production (SILVA et al., 2017). Currently, Brazil consolidates as the mainly world soybean producer and exporter (USDA, 2020), even though other countries, such as EUA, China, and India, are representative in soybean production. Actually, the soybean is spread cultivated from low to high latitudes (LIU et al., 2017). In this scenario, the genotype × environment (G×E) interaction plays an essential role in phenotypic expression and can lead to difficulties in genetic selection.
The G×E interaction is characterized by the differential response of genotypes to a change in environment (ALVES et al., 2020). The environment influences on the phenotypic expression, in order to provide the G×E interaction, is one of the biggest challenges that breeders deal with in plant breeding. As a result, a specific environment can affect the genetic the selection; however, in practice, the genetic material generally is recommended to several environments.
The mixed models is the main method for evaluating G×E interaction (LI et al., 2017)the Evangelista et al. environment, and the differential sensitivity of certain genotypes to different environments, also known as genotype by environment (G × E. This method has features to model the heterogeneity of genetic variances and correlations between environments, as well as to model spatial trends in individual trials (BERNARDO, 2020)summarized the alternatives available at that time and noted that all of these approaches could be classified as multiplicative models. Recently, mixed model approaches have become popular for the analysis of series of variety trials. There are numerous reasons for their use, including the ease with which incomplete data (not all varieties in all trials. The ability of genotypes to respond differently over a wide range of environmental conditions may be an essential factor in a breeding program. In this context, the study of stability and adaptability becomes relevant. A genotype is considered stable when it presents slight variations in the general deviation when evaluated under various environmental conditions, and is considered adapted when it has responsiveness to environmental improvement (RESENDE, 2004).
In the context of mixed models, simultaneous selection for productivity, stability and adaptability can be performed by harmonic mean of relative performance of genetic values method, HMRPGV (RESENDE, 2004). Currently, the HMRPGV method has been applied in several crops for breeding purpose (GONÇALVES et al., 2014;ALVES et al., 2018genetic selection is carried out based on several traits, which can be genetically correlated. In this case, selection bias may occur if these traits are analyzed individually. Thus, the present work aimed to evaluate the applicability and efficiency of multiple-trait best linear unbiased prediction (BLUP;SOUZA et al., 2020). This procedure allows the simultaneous selection via the three parameters mentioned and presents the following advantages: (i) it considers the genotypic effects as random and therefore provides genotypic and non-phenotypic adaptability and stability; (ii) it allows analyzes with unbalance data; (iii) it enables to use a non-orthogonal designs; (iv) it allows to deal with variance heterogeneity; (v) it allows to consider correlated errors within environments; (vi) it provides already discounted (penalized) genotypic values of the instability; (vii) it does not depend on the estimation and interpretation of other parameters; (viii) it eliminates the noise of the G×E interaction as it considers the heritability of these effects; (ix) it generates results on the magnitude or scale of the evaluated trait itself; and, (x) it allows to compute the genetic gain with the selection by the three attributes simultaneously (RESENDE, 2007).
Obtaining reliable estimates of genetic parameters based on their evaluation of phenotypic adaptability and stability are paramount in conducting soybean breeding programs. These estimates assist the selection process and serve as a theoretical reference in cultivars recommendation. Thus, the present study aimed to estimate genetic parameters and to compare different selection strategies in the context of mixed models for soybean breeding.

MATERIALS AND METHODS
Thirty improved lines soybean, belonging to the 6-7 relative maturity group, were evaluated at 10 municipalities, allocated in the soybean macroregion 2 (micro-regions 201, 202 and 204), in the 2013/2014 harvest. The experiments were conducted in a randomized block design with three replications. Each plot consisted of four lines of 5 m, spaced 0.5 m between lines and between plots. At maturity, the two central lines were harvested, totalizing an area of 5 m 2 . The trait evaluated was the grain yield (kg ha -1 ), correcting to 13% of moisture. Seed control and pest control management followed the technical recommendations for soybean cultivation at each site.
The variance components were estimated through restricted maximum likelihood, REML (PATTERSON & THOMPSON, 1971) and genotypic values were predicted through best linear unbiased prediction, BLUP (HENDERSON, 1975). The statistical model associated with the evaluation of genotypes in randomized complete block design in several sites, with one observation per plot, was given by the following equation: , where y is the vector of phenotypic data, f is the vector of replication-environment combinations (assumed to be fixed), which contemplates the effects of environment and replication within environment, added to the overall mean, g is the vector of the genetic effects (assumed to be random) ( ), where is the genotypic variance), i is the vector of G×E interaction effects (random) ( ), where is the G×E interaction variance), and e is the vector of residuals (random) ( ), where R represents a matrix of residual variances). Capital letters X, Z, and T represent the incidence matrices for f, g, and i, respectively.
Models with Identity variance (IDV) and diagonal (DIAG) residual error variance structures (RESENDE et al. 2014), as presented below, , and were tested by the Bayesian information criterion, BIC (SCHWARZ, 1978), given by: , where LogL is the logarithm of the restricted maximum likelihood function, p is the number of estimated parameters, n is the number of observations, and r(x) is the rank of the incidence matrix of fixed effects. The significance of the random effects of the statistical model was tested by the LRT (RAO, 1973) using chi-square statistics with one degree of freedom and a probability level of 1%.
Phenotypic variance ( ), heritability of the total genetic effects ( ), coefficient of determination of G×E interaction effect ( ), the genotypic correlation across environments ( ), and selective accuracy ( ) , were obtained by the following expressions (RESENDE et al., 2014): and where is the residual variance and is the prediction error variance.
For the adaptability and stability analyses the following statistics were used: harmonic mean of the genetic values (HMGV), associated with the concept of stability; relative performance of the genetic values (RPGV), associated with the concept of adaptability; and HMRPGV, associated with the concepts of adaptability, stability, and productivity, simultaneously.
The predicted genetic value (GV ij ) which is the genetic value of genotype i at environment j, HMGV, RPGV, and HMRPGV, of each genotype, were obtained by the following expressions: and where g i is the genetic effect of the genotype i, n is the number of environments (n = 10), and m j is the mean productivity in environment j.
The direct selection were obtained by: m j + g i + ge ij , where m j is the mean productivity in environment j, is the effect of genotype i, and ge ij is the effect of genotype i × environment j interaction). Further, the indirect selection for all environments were measured by: (i) µ . + g i. , (where: µ . is the general productivity mean and g i. is the effect of genotype i); (ii) considering stability and productivity (HMGV), (iii) adaptability and productivity (RPGV); and, (iv) stability, adaptability and productivity simultaneously (HMRPGV). The selection gain (SG) and SG in percentage, from direct and indirect selection, were obtained by the following expressions: (RESENDE, 2015): where is the genotypic value, p is the number of selected genotypes, M s is the productivity mean of selected genotypes, and M 0 is the mean productivity of the evaluated population. The SG was calculated considering a selection intensity of 20% (six genotypes). The concordances between the selection strategies were calculated using the Kappa coefficient (K) (COHEN, 1960), given by: where: A is the number of matching selected genotypes, C is the number of selected genotypes due to chance (C = bD, where b is the selection intensity) and D is the number of selected genotypes (six).
The variance components, genetic and non-genetic parameters, and genotypic values obtained using the Selegen-REML/BLUP software (RESENDE, 2016). The model accounting for different residual variance structures were tested using ASReml software (GILMOUR et al., 2015).

RESULTS AND DISCUSSION
Traditional analyses of multi-environment trials, such as additive main effects and multiplicative interaction (AMMI) (GAUCH, 1992) and genotype + G×E biplot (GGE biplot) (YAN et al., 2000) environment (E, are based on the analysis of variance Evangelista et al. (ANOVA), thus assuming fixed genotype effect and IDV residual variance (homoscedasticity) (LI et al., 2017). In the context of mixed model methodology (REML/BLUP), besides the effect of genotypes being advantageously treated as random, the needed of homoscedasticity can be circumvent, that is, REML/ BLUP deal with DIAG residues (SMITH et al., 2005). This methodology allows; therefore, to model the G×E interaction, using for this purpose, different (co) variances structures. Thus, it is important to consider the structure of residual variance (for example, identity variance and diagonal) to select the model that provides the best fit (SMITH et al., 2005).
In this sense, several criteria for model selection were proposed, especially BIC (SCHWARZ, 1978). With this criterion, is implicit that there is a model that describes the relationship between the variables involved and the criterion tries to maximize the probability of choosing the true model. Thus, the BIC indicates the selection of parsimonious models, that is, the model that involves the fewest possible parameters to be estimated and that explains the behavior of the response variable well (RESENDE et al., 2014)the Akaike (AIC. In this study, the BIC values for grain yield trait were 11606.40 and 11620.32 for models with IDV and DIAG residual variance, respectively. Therefore, the best model was the one with IDV residual variance (lowest BIC). Thus, this model was adopted to estimate variance components and to genotypic values. Simply put, this result means that the residuals having the same scatter.
The variance components have a great importance in plant breeding, since the population and breeding strategy to be used depend on information that can be obtained from these components (RESENDE, 2015). According to the LRT, the random effects of the model (genetic and G×E interaction effects) were significant (p<0.01) for the grain yield trait (Table 1). The significance of the estimation of genetic and G×E interaction effects via LRT demonstrates the existence of genetic variability and presence of G×E interaction. Similar results are reported for the grain yield trait in soybean crop (SHAW et al., 2016, WHALEY & ESKANDARI, 2019. The determination coefficients indicate the amount of each effects where explained in relation to phenotypic variance. For instance, genotypic and G×E interaction effects explained, respectively, 18% (heritability) and 22% of phenotypic variance (Table 1). According to the classification presented by RESENDE (2015), the heritability of individual plots in the broad sense, that is, the total genetic effect (0.18) was of moderate magnitude. The weak genotypic correlation across environments (0.45) indicates the presence of complex type G×E interaction, which is problematic for the breeder. Thus, efficient selection methods are needed to capitalize on the favorable effects or circumvent the adverse effects of the G×E interaction (RESENDE et al., 2014). The higher the accuracy in evaluating a genotype, the greater the reliability of its predicted genotypic value. According to RESENDE & DUARTE (2007), the accuracy obtained in this study (0.90) is classified as very high, therefore, this result indicated a scenario very favorable to selection and recommendation.
As observed and expected, direct selection (u j + g i + ge ij ) ( Table 2) has always led to greater gains with selection. However, when selecting via u j + g i + ge ij (where G×E interaction is captured) recommendations are restricted to each environment (GONÇALVES et al., 2014). The selection via HMGV, RPGV, and HMRPGV led to equal gains among them and similar those obtained via u . + g i. (Table 2). In this way, the results showed that most productive soybean genotypes have high adaptability and stability. The HMGV classifies genotypes by genotypic values and stability, the lower the standard deviation of genotypic performance between environments, the higher the HMGV value. Thus, selection based on this statistic implies simultaneous selection for productivity and stability (RESENDE & DUARTE 2007). Therefore, indirect selection via HMGV is the most suitable strategy for known unfavorable environments, since for this type of environment genotypes with high stability are desirable. The RPGV classifies genotypes by genetic values expressed as a ratio of the overall mean of each environment (RESENDE, 2007). Thus, selection based on this RPGV implies simultaneous selection for productivity and adaptability (RESENDE, 2007). RPGV selection is the most appropriate strategy for known favorable environments because the selected genotypes have greater responsiveness with the improvement of the environment. Finally, HMRPGV combines HMGV and RPGV statistics and implies simultaneous selection for productivity, stability and adaptability; and therefore encompasses favorable and unfavorable environments. According to the Kappa coefficient, the concordances between the selection strategies ranged from -0.04 to 1.00 ( Figure  1). The HMGV, RPGV, and HMRPGV methods showed perfect agreement with each other. Moderate to near perfect agreement in ranking genotype was found among the environments 1, 2, 3, 5, 7, and 9 with the HMGV, RPGV, and HMRPGV methods. For environment 10, insignificant concordances, except with the environment 8, were observed.
From the six genotypes selected via HMGV, RPGV, and HMRPGV five were among the six genotypes selected via u . +g i. , so these methods showed high agreement. This result demonstrated the efficiency of the HMGV, RPGV, HMRPGV, and u . + g i. methods in the selection of soybean genotypes. GONÇALVES et al. (2014), evaluating sugarcane clones, found that four of the five clones selected via u . + g i. were the same selected by the HMGV, RPGV, and HMRPGV methods and, by the three methods, the same clones were selected. COLOMBARI-FILHO et al. (2013), evaluating elite rice genotypes, reported 84% of coincidence in the selection by the u . + g i. and HMRPGV methods, and verified that the elite genotypes selected via HMRPGV were the same ones selected via RPGV. These authors concluded that the HMRPGV method is an appropriate tool for the selection of stable, adapted and high yield potential rice genotypes. COLOMBARI-FILHO et al. (2013), evaluating genetic progress, showed a significant increase in stability and adaptability of rice genotypes developed during the period from 1996 to 2010. According to those authors, this fact is due to the introduction of new genotypes in the germplasm bank of the Embrapa breeding program, provided by the USA and France, in 1996. This allowed new crosses; and consequently, with the selection over the years it provided more adapted and stable genotypes. The company GDM seeds, holder of the soybean genotypes under evaluation in this study, originated in 1982, in Argentina. Today, the company also manages soybean breeding programs in several countries in South America, US and Canada. With its growth, there were introductions of new genotypes to the germplasm bank and high selection intensities, making it possible to obtain genotypes with high adaptability, stability and productivity, simultaneously.

CONCLUSION
The genetic parameters are easily estimated by mixed model methodology, being the very high accuracy values an indicative of model reliability in the genotypes ranking. The most prominent strategy for selection accounting for soybean productivity, stability, and adaptability is the HMRPGV.