GGE biplot-based genetic selection to guide interspecific crossing in Corymbia spp.

: Tree breeders are prioritizing the increase of the interspecific diversity in the breeding programs due to the pronounced environmental changes, and expansion of the forest frontier to less productive areas. In this scenario, there is a growing interest for species of the genus Corymbia due to their good growth, resistance to biotic and abiotic factors, and industrial properties. Also, Corymbia spp. can cross with each other, generating interspecific hybrids. However, there are great difficulties on defining the best crosses. Thus, the objectives of this study were to evaluate the genetic variability of 64 progenies of Corymbia maculata and 64 progenies of Corymbia torelliana and to use the GGE biplot method to point the most appropriated crosses to broad and specific environments in order to generate stable and productive interspecific hybrids of C. maculata and C. torelliana . The GGE biplot was an efficient method for evaluating the genetic variability between progenies, for visualizing the genotype by environment interaction pattern and for identifying the best crosses in order to generate interspecific hybrids of the species. By adopting a selection intensity of 10%, the recommended progenies to generate interspecific hybrids for the Aracruz municipality (ARA) were 10, 40, 49, 64, 20, 9 and 47 of C. maculata and 24, 56, 41, 64, 43, 31 and 12 of C. torelliana . For Três Lagoas municipality (TLA), the recommended progenies were 11, 17, 1, 48, 7, 63 and 21 of C. maculata and 42, 46, 61, 44, 10, 63 and 28 of C. torelliana .


INTRODUCTION
Forest plantations are a strong ally to preserve natural resources and provide social and economic benefits (Carle et al. 2020). Also, they are fundamental to meet the growing demand for forest products. By 2050, it is expected that the manufacture of forest products will have to triple (Payn et al. 2015) due to the growing on human population. In this context, there is an increase interest for species more resistant to adverse conditions, such as those belonging to the genus Corymbia (Lee 2007;Lee et al. 2009). Corymbia maculata and Corymbia torelliana present good industrial and silvicultural characteristics (Shepherd et al. 2007). However, there is little information about the selection strategies for these species and its hybrids

Experimental design and genetic material
In December 2014, two half-sibling progeny trials were implemented in the Aracruz municipality, Espírito Santo state (19.8188°S, 40.2736°W, hereafter named ARA), the first one of C. torelliana and the second of C. maculata. Both trials were replicated in the Três Lagoas municipality, Mato Grosso do Sul state (20.7882°S, 51.7030°W, hereafter named TLA), totaling four trials. The ARA altitude is 60 m a.s.l. The soil is classified as medium-textured yellow latosol, flat relief and strongly drained (Embrapa 2000). According to Köppen and Geiger classification, updated by Kottek et al. (2006), the ARA climate is classified as tropical Aw, with an average temperature of 24.4 °C and an average annual rainfall of 1157 mm. The TLA altitude is 319 m a.s.l. The soil is classified as very porous dark-red latosol, strongly drained. According to Köppen and Geiger classification, updated by Kottek et al. (2006), TLA climate is classified as tropical Aw, with an average temperature of 24.7 °C and 1340 mm of average annual rainfall (Climate-data.org 2021).
The experimental design was an augmented block design, with 64 progenies of Corymbia and six commercial checks of Eucalyptus, totaling 70 treatments. It was used 40 repetitions. Into each repetition, the treatments were randomized into seven blocks, being each block with 10 treatments. In this design, each treatment (tree) represents a plot, being named single tree plot (STP) (Sandon 1958). The progenies for both species were codified from 1 to 70, being the treatments 65, 66, 67, 68, 69 and 70, the checks. At 29 months old, the circumference at breast height (cm), at 1.30 m from ground level, were measured on all trees using a flexible and graduated tape. The diameter at breast height (DBH) was obtained from the π division of the circumference at breast height.

Individual analyses
The restricted maximum likelihood (REML) (Henderson 1975) and best linear unbiased prediction (BLUP) (Patterson and Thompson 1971) procedures were used to estimate the components of variance and to predict the genetic values, respectively. The best progenies in each experiment were selected based on their predicted genetic values by the model (Eq. 1): GGE biplot-based selection in Corymbia spp.

y = Xr + Za + Wb + e
(1) where: y is the phenotypic data vector; r is the vector of the repetition effects added to the general mean and the checks effects (fixed); a is the vector of the additive genetic effects (assumed to be random, a ~ NID (0, σ a 2 ), where σ a 2 is the additive genetic variance; b is the vector of the blocks effects (assumed to be random, b ~ NID (0, σ b 2 ), where σ b 2 is the block variance; and e is the vector of residuals effects (random), (e ~ NID (0, σ e 2 ), where σ e 2 is the residual variance). The capital letters (X, Z and W) represent the incidence matrices for r, a, and b effects, respectively.

Joint analysis
The best progenies of each species were selected based on their predicted genetic values by the model (Eq. 2) where: y is the phenotypic data vector; r is the vector of replication-environment combinations (assumed to be fixed), which comprises the effects of environment and replication within environment, added to the overall mean and the checks effects (fixed); a is the vector of the individual additive genetic effects (assumed to be random, a ~ NID (0, σ a 2 ); b is the vector of block effects (assumed to be random, b ~NID (0, σ b 2 ); i is the vector of GEI effects (assumed to be random, i ~NID (0, σ i 2 ), where σ i 2 is the GEI variance; and e is the vector of residuals effects (random) -(e ~ NID (0, σ e 2 ). The capital letters (X, Z and W) represent the incidence matrices for r, a, and b effects, respectively.

Likelihood ratio test
The significance of the additive genetic effects and GEI effects were tested using the likelihood ratio test (LRT) (Wilks 1938) (Eq. 3): where: LogL F is the logarithm of the restricted likelihood function of the full model, and LogL R is the logarithm of the restricted likelihood function of the reduced model, which is the one without the individual additive genetic effects or without the GEI effect. The models were compared at the level of 1% probability and one degree of freedom by the chi-square test. ) were obtained, respectively, by Eqs. 4-8. The coefficient of genetic variation (CV g ) and the coefficient of experimental variation (CV e ) were obtained, respectively, by Eqs. 9 and 10:

Genetic parameters
where μ is the overall mean.
The standard errors of the estimates of variance components and of the genetic parameters were obtained through mixed model output and post-processing "pin" function of the ASReml software (Butler et al. 2017).
Accuracy was estimated by Eq. 11 (Resende et al. 2014): where PEV is the prediction error variance, extracted from the diagonal of the generalized inverse of the coefficient matrix of the mixed model equation.

GGE biplot
The biplots were created by plotting the first principal component scores of the genotypes and the environments against their respective scores for the second principal component. These scores were obtained by singular value decomposition (SVD) of environment-centered data using Eq. 12, proposed by Yan et al. (2000).
where: y ij is the additive genetic values for genotype i at environment j; is the overall additive genetic mean of all families in the environment j; λ 1 and λ 2 are the singular values of the first and second principal components (PC1 and PC2, respectively); ξ 1i and ξ 2i are the values of eigenvectors for genotype i , for PC1 and PC2, respectively; η 1j and η 2j are the values of eigenvectors for environment j, for PC1 and PC2, respectively; and e ij is the error associated with the y ij observations. The biplot was based on singular value decomposition of the standardized table. Scaled by standard deviation = 1, centering = 2 and row metric preserving (SVP = 1) was used to generate the biplots (Frutos et al. 2014).

Progenies selection
The joint selection for both species was made based on the performance and stability of the families on both environments, which could be accessed on the "mean versus stability" biplot. Two likely approaches were considered, the first one would be cloning the more stable and high mean genotypes for planting as pure species. The second would be the generation of interspecific hybrids where two strategies could be adopted: choosing both stable and good average genotypes to maximize the expression of favorable genes to both environments, or crossing a stable genotype with high performance with a high mean genotype (without considering stability). Both approaches can favor the hybrid vigor expression due to allelic interaction.
The gain with selection (GS( %)) for the progenies, on the joint analysis were estimated by Eq. 13: where: X si is the mean of additive values for the selected progenies for DBH; X oi is the mean of additive values for the original population for DBH.

Likelihood ratio test
The LRT tested the null hypothesis, which assumes that there is no difference between the complete and reduced models by removing the additive genetic effect (individual and joint analyses) and the GEI effect (joint analysis). All effects were significant (p < 0.01) on both models and for both species, except the block effects (Table 1).

Genetic parameters
The additive genetic and phenotypic variance were greater for C. maculata ( Table 2). The narrow-sense heritability between progenies was 0.37 and 0.34 for C. maculata and C. torelliana, respectively. The coefficient of genetic variation (CV g ) and the coefficient of residual variation (CV e ) and the mean were greater for C. maculata. The genotypic correlation among environments was greater for C. torelliana. The selective accuracy ( " #" ) was 0.77 and 0.79 for C. maculata and C. torelliana, respectively (Table 2). Table 2. Variance components and genetic and non-genetic parameters for the joint analysis of C. maculata and C. torelliana progenies for diameter at breast height on two environments.

Components/Parameters
C. maculata C. torelliana μ is the genotypes by environments interaction variance; μ is the narrow-sense heritability between progenies; μ is the coefficient of determination of genotypes by environments interactions; μ is the genotypic correlation between performance in different environments; CV g (%) is the coefficient of genetic variation in percentage, CV e (%) is the coefficient of residual variation in percentage, μ is the selective accuracy, and μ is the overall mean.

GGE biplot and genetic selection
For C. maculata and C. torelliana, the PC1 and the PC2 explained a total of 100% of genotype and GEI variation (Fig. 1). For both, C. maculata and C. torelliana, the "which-won-where" biplot divided the graph into ten and six distinct environments, respectively (Fig. 1). For both species, the studied environments (ARA and TLA) were in different sectors, meaning that they cannot be clustered into a single environment. For each polygon, the genotypes in each vertex represents those with best performance in each sector. For C. maculata, the progeny 10 was the one that had better performance in ARA and the progeny 42 was the one that had best performance on TLA (Fig. 1). For C. torelliana, no families have stood out on ARA; however, family 18 was the one that had best performance on TLA (Fig. 1).
The "mean versus stability" graph allows the visualization of the mean and stability of genotypes in all environments (Fig. 2). An average environment is calculated from the mean of PC1 and PC2 scores for all environments and is represented by a red point. The average environment abscissa (AEA) is represented by a single head arrow that passes through the biplot origin and the average environment and is pointing towards higher yield. The approximated mean of each evaluated genotype can be estimated by projecting the genotypes onto the average environment abscissa (Yan 2002). The narrow is pointed to the higher mean performance; therefore, it is possible to rank the best genotypes. For C. maculata, the five higher mean progenies were 42, 14, 38, 16, and 21 (Fig. 2). For C. torelliana, the progenies 18, 55, 13, 16, and 2 presented the higher means (Fig. 2). The average environment ordinate (AEO) is represented as a doubleheaded arrow perpendicular to AEC abscissa. It is pointed towards lower stability in both directions and associates the GEI with each genotype. Thus, greater projection onto the AEC ordinate, means greater instability of the genotypes, regardless of the direction. Therefore, for C. maculata, the five more stable genotypes were 38, 2, 61, 34, 54 and 25 (Fig.  2). For C. torelliana, the five more stable genotypes were 16, 4, 26, 48 and 14, once they are on the top of the average environment abscissa (Fig. 2). By adopting a selection intensity of 10%, it is possible to indicate the potential genotypes to be selected. In case the objective is cloning the best individuals for planting as pure species in both environments, it is recommended to select genotypes that simultaneously have high means and good stability, once the desire is to maximize genetic gains (Borém et al. 2013). In this scenario, for C. maculata, the progenies 42, 38, 16, 14, 37, 15 and 12 and, for C. torelliana, the progenies 18, 55, 13, 16, 23, 17 and 59 could be selected (Fig. 2). However, if the objective is to generate interspecific hybrids it is recommended to cross more genetically distant progenies in order to expand the possibility of high-performance individuals in the population due to heterosis (Borém et al. 2013). Therefore, in the ARA environment, the progenies 10, 40, 49, 64, 20, 9 and 47 of C. maculata can be crossed with the progenies 24, 56, 41, 64, 43, 31 and 12 of C. torelliana (Fig. 2). On the other hand, in the TLA environment, the progenies 11, 17, 1, 48, 7, 63 and 21 of C. maculata can be crossed with the progenies 42, 46, 61, 44, 10, 63 and 28 of C. torelliana (Fig. 2). The biplot indicated 32 and 33 progenies with genetic values above average for C. maculata and C. torelliana, respectively (Fig. 2). The gain with selection (%) was 14.33% for C. maculata and 6.63% for C. torelliana when the best 20 families were selected, which is recommended by Resende (2002) to start a breeding program. The representativeness of a given environment was measured by the angle between the vector of a given environment and the AEC axis (the line that passes through the biplot origin and the average environment). The average environment is represented by a star (Fig. 3). Hence, both ARA and TLA were not representative. The "ideal" environment is represented by a point on the AEC axis, with an arrow point to it. In theory, it is the most discriminating and representative environment. It can be seen that ARA and TLA are not far from the "ideal" environment concentric line, meaning that they are discriminative (Fig. 3). Note. The star and the red point indicate the average and "ideal" environment, respectively.

Likelihood ratio test
The significance of the additive genetic effects (p < 0.01) indicates that there is genetic variability between the progenies for both species, allowing the genetic selection (Zimback et al. 2011). Moreover, there is significant GEI, which can difficult the selection of the same genotypes for both environments. One strategy to deal with the GEI effect is to choose genotypes with high adaptability and stability for the environments analyzed.

Genetic parameters
The narrow-sense heritability between progenies ( ℎ "# $ ) values for DBH in half-sibling progenies of Corymbia was lower than the ones reported by Araujo et al. (2020), which varied from 0.41 to 0.69. According to Lima et al. (2019), the traditional mixed model methodology commonly overestimate genetic parameters estimates and consequently narrowsense heritability for growth traits in comparison to methodologies that consider the genomic relationship matrix. Despite this fact, the heritability values obtained by the present study was similar to those observed by genomic studies for halfsibling families, which varied from 0.29 to 0.48 (Tambarussi et al. 2018).
The coefficient of genetic variation allowed to make inferences about the magnitude of the variability present in the populations, once it is a measure of the fraction of phenotypic variation that is genetic, normalized by the mean (Downie et al., 2020). The lowest coefficient of genetic variation was obtained for the C. torelliana (9.82 %). According to Resende (2002) coefficients of genetic variation of around 10% or more are sufficient for practicing effective selection. The CVe (%) was classified as high magnitude for C. maculata and very high for C. torelliana (Mora et al. 2016), which is common in progeny trials of forest species (Bertii et al. 2011;Costa et al. 2015). The selective accuracy is a measure of the proximity between the predicted and true genetic values. The greater the accuracy, the greater is the reliability on the selection, because the deviations between the parametric and predicted genetic values are smaller. The selective accuracy for DBH is greater than 0.7, being classified as high magnitude (Resende and Alves 2020). Thus, there is a favorable scenario for the genetic selection for both species.

GGE biplot and selection
Commonly, high productive genotypes are preferred by breeders (Bornhofen et al. 2017). In the selection of parents for hybridization it is not different (Assis 2000). The GGE biplot methodology selects based on the progeny pattern that considers mean and stability (Santos et al. 2016). It extracts information about GEI and crossover interaction in multi environment trials and display it on a simple, elegant and visual way. Thus, GGE biplot would be considered a suitable methodology to study GEI.
Understanding GEI is crucial for the progress of any breeding program (van Eeuwijk et al., 2016). The "which-wonwhere" patterns of an MET (Fig. 1) allow the identification of more responsive families for each environment, which are the ones located farthest from the origin (Yan et al. 2007). The genotypes close to the origin are less responsive in their respective directions (Yan and Kang 2002). The progenies that are not placed into any environment are considered unfavorable to the tested environments (Karimizadeh et al. 2013). The biplot is divided into sectors by perpendicular lines. Each sector has a vertex genotype, which mean that it presents the highest mean for that particular environment (Yan and Kang 2002). Additionally, the large number of mega environments formed indicates the genetic potential of these families to be used in other environments not yet tested.
The first two principal components account for 100% of the total variation for both species, indicating that the GGE biplot methodology was efficient on explaining the variation of genotypes and the GEI effects (Yan and Holland 2010). In addition, the right angle formed between the two environments on the GGE biplot suggests moderate correlation between the environments (Yan and Holland 2010), indicating the impossibility of forming mega-environments containing ARA and TLA. Therefore, they belong to different agronomical areas, which can be explained due to the different geographic coordinates, altitudes and edaphoclimatic conditions among the two environments. Additionally, they have significant GEI, which explain the changes in the ranking of genotypes from one environment to another (Alwala et al. 2010).
Environment ARA and TLA were not the best environments for selecting superior cultivars, once they are not representative, what explains why most of the genotypes were not allocated in these environments (Fig. 1). However, according to Yan and Kang (2002), they are discriminating environments, being useful for selection stable cultivars, which was an important task for this work. For future works, it is recommended to use a larger number of environments to increase the representativeness.
This work aimed to select progenies to be planted on both environments, therefore it seeks to avoid the GEI by selecting based on mean and stability (Yan and Kang 2002). Stability is often overlooked by plant breeders due to the difficulty to weigh between mean performance and stability. However, some investigation has been made considering it on different cultures and using different methods, such as HMRPGV (Ferreira et al. 2021), AMMI (Malosetti et al. 2013) and GGE biplot (Xu et al. 2014). The advantage of the GGE biplot is the graphical representation of the genotype and GEI patterns on multi environmental trials, considering stability. Despite that, no work has been found linking GGE biplot analysis and Corymbia progenies in MET.
Therefore, in possession of the selected progenies, an interesting alternative for the future would be starting an interpopulation recurrent selection (Comstock et al. 1949) aiming at the improvement of both species and the increase of heterosis between them. In this way, it will be possible to obtain, assess and recombine the best individuals, improving F.M. Ferreira et al. crosses between C. torelliana and C. maculata populations (Chen et al. 2013). In these crosses, C. torelliana is a better fit to be the female genitor, since this specie has a 30% higher rooting capacity compared to many other species of the genus (Reis et al. 2014). Therefore, exploiting this advantage can increase the chance of crossbreeding and hybrid propagation.

CONCLUSION
There is genetic variability for diameter at breast height for C. maculata and C. torelliana suggesting a positive scenario for breeding.
The GGE biplot analysis discriminate the more stable and high productive progenies of each specie in both environments. The patterns provided by the biplot allowed to point the best progenies to clone selection and the best progenies to be crossed in order to generate interspecific hybrids of C. maculata and C. torelliana.