Multi-trait genomic selection indexes applied to identification of superior genotypes

Most studies on genomic selection in plant breeding compare different statistical methods of univariate approach. However, multi-trait methodologies should be considered since they allow the simultaneous selection of superior genotypes in several economic traits. Here, the aims were to compare the selection accuracy and efficiency of the multivariate partial least square (MPLS) method compared with random regression best linear unbiased predictor (rrBLUP), Bayesian Lasso (Blasso) and univariate partial least square (UPLS) and to develop genomic selection indexes efficient for superior genotypes identification in plant breeding. Ten F2 populations with 800 individuals were simulated, considering four traits with different heritabilities. Genomic selection analyses using rrBLUP, Blasso, UPLS, and MPLS were conducted. Four genomic selection indexes were elaborated by the sum of the marker effects obtained for each trait, weighted by the respective residual variance. Multi-trait indexes were developed based on the assumptions of each methodology mentioned (rrBLUP, Blasso, UPLS, and MPLS), and were denominated I-rrBLUP, I-Blasso, I-UPLS, and I-MPLS. Processing time, selective accuracy, selection gains, and selection coincidence were used to compare the methods and the selection indexes proposed. The MPLS method had similar results compared to UPLS method for the low heritability traits and was less efficient than the rrBLUP and Blasso. The genome selection indexes provided the highest total genetic gains. The I-rrBLUP and I-MPLS indexes stood out for high efficiency in selecting superior genotypes in the shortest processing time. Results suggest that the genomic selection indexes proposed in this study may be promising for plant breeding programs.


INTRODUCTION
Genomic selection (Meuwissen et al. 2001) is based on the prediction of the genomic estimated breeding value (GEBV) from the relationship between the individuals' phenotype and thousands of molecular markers widely distributed in the genome. This high density of markers increases the probability of each quantitative trait loci (QTL) of the target trait to be in linkage disequilibrium with at least one marker. It allows for better selection accuracy and early direct selection without limiting the number of QTL that control the expression of a quantitative trait (Bernardo and Yu 2007;Meuwissen et al. 2001;Xu et al. 2017).
Genomic selection has been currently described by several authors and has provided promising results in several areas of genetics and plant breeding Persa et al. 2020;Silva et al. 2017). Many statistical methods can be applied to genomic selection (Crossa et al. 2017), and the choice and efficiency of application of these methods depend mainly on the genetic architecture of the quantitative trait under study (Daetwyler et al. 2013;de los Campos et al. 2013).
The multi-trait selection is relevant, hence superior varieties combine optimal attributes for several traits simultaneously. In addition, it can impact positively decreasing the cycle time in plant breeding programs, especially using genomic selection https://doi.org/10.1590/1678-4499.20200381 PLANT BREEDING Article (Covarrubias-Pazaran et al. 2018;Watson et al. 2019). However, combining multi-traits in an efficient structure of selection indexes is often a complex and difficult task, once there is reduction in selection gains and correlation between traits as the number of traits assessed increases (Cruz et al. 2012). Moreover, univariate approaches are most used in genomic selection (Sant' Anna et al. 2019). Thus, studies proposing and testing the efficiency of multi-trait genomic selection in plant breeding programs should be developed. The regression method via partial least square (PLS) can be an efficient alternative for allowing the multi-trait selection (Azevedo et al. 2013). The general purpose of PLS is the formation of components that capture the highest amount of information possible from the explanatory variables to predict the principal dependent variable (Boulesteix and Strimmer 2007). Moreover, this method is efficient to solve the problems of multicollinearity and high dimensionality caused by the high number of markers in relation to the number of individuals genotyped (Azevedo et al. 2013).
Breeders and biometricians have advocated the use of selection indexes. This strategy consists in establishing a new trait from a linear combination of all the target traits, considering that the weight coefficients are estimated to maximize the correlation between the index and the true genetic values of the genotypes to be selected. Several studies have addressed the application of selection indexes to plant breeding, leading to gains in a set of traits simultaneously (França et al. 2016;Junqueira et al. 2016;Kumar et al. 2016;Silva et al. 2016). Thus, the interest for studying the methodology efficiency in genomic selection has increased (Ceron-Rojas et al. 2015;Covarrubias-Pazaran et al. 2018;Fernandes et al. 2018).
Therefore, this study aimed to: (i) compare the selection accuracy and efficiency of the multivariate partial least square (MPLS) method with univariate genomic selection methods as random regression best linear unbiased predictor (rrBLUP), Bayesian Lasso (Blasso), and univariate partial least square (UPLS); and, (ii) develop genomic selection indexes efficient in the identification of superior genotypes in different traits, simultaneously, to be applied in plant breeding programs.

Simulation of phenotypic and genotypic data
The phenotypic and genotypic data used in the application of genomic selection statistical methods were simulated using the Genes software (Cruz 2013). First, the genome was simulated considering ten linkage groups, similar to a diploid species 2n = 2x = 20. Each linkage group was simulated with 100 cM, comprising 100 codominant molecular markers, spaced equidistantly (1 cM), totaling 1000 markers. The number of markers used in the simulation is considered adequate for the study purpose and is similar with other simulations studies (Oliveira et al. 2021;Peixoto et al. 2016, Sant' Anna et al. 2019. Afterward, contrasting homozygous parents were simulated. For each locus, one parent was designated as A 1 A 1 homozygote, and the other was designated as A 2 A 2 contrasting homozygote. Thus, the cross between parent 1 and parent 2 generated the F 1 population, with all the markers being heterozygous (A 1 A 2 ).
Using the genome and the parents obtained from the selfing of individuals of the F 1 population, ten F 2 mapping populations were simulated, each one containing 800 individuals. Each individual from the F 1 population produced 5,000 gametes, and the random combination of two of these gametes generated the F 2 population. This procedure was repeated until forming all the individuals in each population. The simulated F 2 populations were coded with 0, 1, and 2, where 0 and 2 corresponded to homozygous individuals (A 1 A 1 or A 2 A 2 ) and 1 referred to heterozygote individuals (A 1 A 2 ), for a given locus.
Then, the phenotypes, i.e., the quantitative traits, were simulated considering the binomial distribution, the additive gene action, and the absence of dominance among the alleles (additive model). Four quantitative traits, controlled by 200 loci each, were simulated. Traits were simulated with heritability values of 20, 40, 60, and 80 (hereafter C1, C2, C3, and C4, respectively). Two alleles per locus without the presence of QTL with major effects were considered. Thus, the effect of each QTL was defined by A 1 A 1 = μ + a; A 1 A 2 = μ; A 2 A 2 = μ -a, in which a expresses the additive effect of each gene in the F 2 population. Therefore, the phenotypes of the individuals (Y i ) were generated according to the model in Eq. 1: where μ is the overall mean of the trait; α j is the genetic effect in each locus; and ε i is the environmental effect.

Cross-validation
For all genomic selection methodologies evaluated, cross-validation was performed considering five folds with 50 replications. Thus, each F 2 population with 800 individuals was divided into five equal groups containing 160 individuals each and used as a validation population. The training populations were composed of 640 individuals and used to estimate the marker effects. The agreement between the genetic values predicted through estimates from the training population was validated in each group considered as a validation population.

Genomic selection methods used
The rrBLUP (Meuwissen et al. 2001) uses best linear unbiased prediction (BLUP), considering that all markers have the same variance (absence of major effect genes). The rrBLUP was analyzed using the mixed.solve function of the rrBLUP package (Endelman 2011).
The Blasso method (Park and Casella 2008) considers a variance for each marker. This method was analyzed using the Bayesian generalized linear regression (BGLR) function of the BGLR package (Pérez-Rodriguez and de los Campos 2011). A total of 20,000 burn-ins, 10 thin, and 100,000 saved iterations, as obtained from the Markov chain Monte Carlo (MCMC) method, was used.
The UPLS and MPLS methods were analyzed in the R software (R Development Core Team 2020), using the PLS package and the partial least square regression (plsr) function. For these methods, the optimal number of components that explained 80% of the total variation of the markers was used (variable X). Thus, approximately 30 components were considered in the analysis for the different simulated populations.

Elaboration of genomic selection indexes
The principal of the genomic selection index proposed in this study is the estimation of the final effect for each marker, using the sum of the effects of this marker for each trait, which are weighted by their respective residual variance, as shown in Eq. 2: where Ef m is the final effect of marker m weighted by the four traits evaluated, considering m = 1, 2, ..., 1000 markers; M mi is the estimated effect for marker m for trait i ( i= 1, 2, 3, 4); and σ i 2 is the residual variance obtained for trait i. Following this principle, four genomic selection indexes were developed, one for each evaluated method: rrBLUP selection index (I-rrBLUP); Blasso selection index (I-Blasso); UPLS selection index (I-UPLS); and MPLS selection index (I-MPLS). Thus, the final effects of the markers (Ef 1 , Ef 2 ,..., Ef 1000 ) estimated by the genomic selection index proposed for each method were considered in the estimation of the GEBVs.

Comparison between methods and indexes of genomic selection
The phenotypic (ac f ) and genotypic (ac g ) selective accuracy were calculated, as in Eqs. 3 and 4: where h 2 is the heritability value of the trait; PV is the phenotypic value; GV is the true genetic value; and GEBV is the genomic estimated breeding value. The processing time of each method was computed in seconds. An AxB factor analysis was performed in a completely randomized design (CRD). The selection accuracy (A factor) and time of each genomic selection method (B factor) were compared by Tukey's test at the 5% probability, considering each evaluated trait.
Afterward, 80 individuals (10%) with the highest phenotypic values and true genetic values were selected, and the selection gains (SG) were estimated by the direct (selection of individuals with superior performance in a trait and estimation of the gain with selection is calculated in the same trait) and indirect selection (selection of individuals with superior performance in a trait and gain with selection is calculated in the other traits) methods, using Eq. 5: where SD is the selection differential, which was estimated by: is the mean of the selected individuals, and Similarly, 80 individuals with the highest GEBVs, estimated from each genomic selection method and genomic selection index, were selected. By the individuals ranked from the highest GEBVs, the respective SG was calculated, using the phenotypic values and true genetic values.
The calculation of the SG using the true genetic values simulated for each trait considered a heritability value of 100%, i.e., h 2 = 1. Conversely, the calculation of the SG using the phenotypic values considered simulated heritability values of 20, 40, 60 and 80% for traits C1, C2, C3 and C4, respectively. The selection efficiency of the methodologies evaluated was estimated through the selection coincidence (SC), for 10% of superior individuals, identified for the true genetic value, phenotypic value and each genomic selection method or index. Thus, the SC was estimated as in Eq. 6: where NS is the number of individuals selected on both methods in contrast and NT is the total number of individuals selected. All measures used to compare the different methodologies evaluated in this study were estimated considering the mean of the ten populations simulated for the four traits with different heritability values. Table 1 shows that the analysis of processing time for the Blasso method was significantly greater than that of the other genomic selection methods, evaluated by Tukey's test at 5% probability level, regardless of the heritability value of the trait. Thus, considering the lower computational demand (shorter processing time), these results suggest that the rrBLUP, UPLS, and MPLS methods were more advantageous. Tukey's test revealed significant differences for selection accuracy for the genomic selection methods, considering the four traits with different heritability values from the phenotypic value and true genetic value (Table 2). No significant differences were observed between rrBLUP and Blasso methods, in any of the selection accuracy evaluations. The selection accuracy of the UPLS method was inferior compared with other genomic selection methods, considering the low heritability traits. However, for the trait of higher heritability value, results obtained by the UPLS method were statistically equal to those of the rrBLUP and Blasso methods (Tables 2). Table 2. Comparison between the genomic selection methods random regression best linear unbiased predictor (rrBLUP), Bayesian Lasso (Blasso), univariate partial least square (UPLS) and multivariate partial least square (MPLS) regarding the genotypic and phenotypic selection accuracy for the four traits evaluated. Unlike the other methods, the selection accuracy results for the MPLS method did not increase with the increase in the heritability value of the trait ( Table 2). The selection accuracy results obtained for the MPLS method were statistically equal to those detected for the rrBLUP and Blasso methods for the low heritability traits. However, for the high heritability traits, the results of the MPLS were lower than those from rrBLUP and Blasso methods. Table 3 shows the estimates of direct and indirect selection gains, considering the 80 individuals selected from the highest phenotypic values and true genetic values for the four traits evaluated. The greatest gains were obtained by direct selection. As expected, the direct gains in each trait obtained from the true genetic values were greater than those based on the phenotypic values.

Methods
For phenotypic values, high heritability traits (C3 and C4) had the greatest total selection gains, 5.52 and 5.92% (Table 3). These values correspond to only 50.41 and 54.31%, respectively, of the maximum total gains that can be achieved when considering the true genetic values for these traits (10.95 and 10.59%). Total gains obtained for the low heritability traits, considering the phenotypic values (3.96 and 4.55%), were far from the maximum total gain that can be achieved with the true genetic values (10.75 and 10.34%) ( Table 3). This fact confirms that phenotypic selection is less efficient for low heritability traits.  Table 4 shows the direct and indirect selection gains estimated in the true genetic values, based on the 80 individuals ranked by the GEBVs, obtained in each genomic selection method. Direct and indirect gains, as well as total gains provided by the rrBLUP and Blasso methods, were remarkably like each other (Table 4). These results were as expected when considering the maximum gains that could be achieved via true genetic values (Table 3). These methods stood out due to higher total gains obtained in the selection for the low heritability traits (C1 and C2), showing superiority when compared with the selection based only on the phenotypes of the individuals (Table 3). The UPLS and MPLS methods exhibited the lowest direct gains for the four traits evaluated when compared with the rrBLUP and Blasso methodologies (Table 4). Furthermore, these methods were efficient only for the high heritability traits since the total selection gains tend to be close to the gains obtained with the phenotypic selection for the low heritability traits (Table 3). The total gains obtained from the true genetic values when applying any genomic selection index proposed in this study (Table 5) are greater than those obtained by direct and indirect selection in any methodology presented previously (Tables 3 and 4). These results suggested the superiority of the selection indexes in relation to both the multi-trait method MPLS and the univariate methods rrBLUP, Blasso and UPLS. The total gain for the phenotypic selection did not exceed 6% in the best case (Table 3), while with the use of the indexes the total gain was above 6.5% (Table 5), with a better distribution of gains among the four traits. As expected, the selection indexes can achieve higher total gains, but the gains obtained for each trait are lesser than the maximum gains that could be achieved in the direct selection of each trait based on the true genetic values (Table 3). Among the selection indexes, the I-UPLS had the least total gain when selecting the individuals based on the true genetic values. Conversely, it presented the highest total selection gain when considering the phenotypic values (Table 5).
The selection coincidence of 80 individuals, between the methods genomic selection, genome selection indexes, phenotypic values, and true genetic values for the low (C1 and C2) and high (C3 and C4) heritability traits are shown in Tables 6 and 7, respectively.
The selection coincidence estimated among the individuals selected from the phenotypic values and true genetic values ranged from 30.75 to 65.13%, from the lowest to the highest heritability value (Tables 6 and 7). The rrBLUP and Blasso methods were more efficient in the selection of superior genotypes based on the phenotypic values, mainly for the low heritability traits. This fact is evidenced in Table 6, which shows that the selection coincidences of the individuals selected by the GEBVs obtained by the rrBLUP and Blasso methods, considering that the true genetic value for C1 was twice (70% for both methods) when compared with the selection coincidence between the phenotypic value and the true genetic value (30.75%).
Also, the selection coincidence obtained between the rrBLUP and Blasso methods in the selection of 80 individuals were higher than 98%, regardless of the heritability of the trait (Tables 6 and 7). These results confirm once again that the rrBLUP and Blasso methods, despite the different approaches, showed similar behavior and results are equally efficient in the identification and selection of genetically superior individuals.
Conversely, for C1 trait (Table 6), the selection coincidences obtained by the individuals selected based on the GEBVs, estimated by the UPLS and MPLS methods, and the true genetic values were 32.25 and 33.75%, respectively. These values are remarkably similar to that obtained by the individuals selected via phenotypic value and true genetic value (30.75%). Therefore, the MPLS and UPLS methods had the same efficiency as the phenotypic selection for the low heritability traits, not justifying the application of these methodologies in these cases. It was seen that the indexes produce a ranking that generates a greater gain in all the traits. Comparing the selection through phenotypic value and selection through I-Blasso index, both for C3 trait (Table 7), the I-Blasso index has a 57.37% higher efficiency than the ranking from the phenotypic value for the trait C3. In most scenarios, the selection coincidences between the individuals selected by the genomic selection indexes and the true genetic values, regardless of the heritability of the trait, were higher when compared with those obtained between the individuals selected by phenotypic values and true genetic values. Table 7. Comparison between genomic selection methods and indexes by the selection coincidence analysis for the simulated traits: C3 (h 2 = 60%), above the main diagonal, and C4 (h 2 = 80%), below the main diagonal.  Among the proposed indexes, the highest selection coincidence of the superior individuals was observed between the indexes I-Blasso and I-rrBLUP (99.25%) for C4 trait, which selected practically the same individuals. Despite the low to moderate efficiency revealed by the MPLS method in the selection of superior genotypes for the low heritability traits (C1 and C2), the I-MPLS was indicated as more efficient in some scenarios (Table 6). Conversely, the individuals selected by the I-UPLS resulted in the lowest selection coincidence with the individuals selected via true genetic values when compared with the other indexes. Moreover, for some traits, this selection coincidence was lower than or equal to the selection coincidence between the phenotypic value and the true genetic value (Tables 6 and 7).

DISCUSSION
Phenotypic selection in plant breeding is difficult since most of the traits of interest are of quantitative nature, i.e., they are controlled by several genes, have low heritability value, and are consequently strongly influenced by the environment (Falconer and Mackay 1996). This fact is evidenced by the present results since the phenotypic selection showed less efficiency in the selection coincidence with the superior genotypes, especially for the low heritability traits (Tables 6 and 7). In addition, the direct and indirect selection gains through phenotypic values were lower compared with the maximum genetic gains that can be achieved with the selection (Table 3), resulting in lower gains in relation to what could be achieved with the use of more accurate selection methods, such as genomic selection (Spindel et al. 2015;Zhang et al. 2016).
The UPLS and MPLS methods allowed a high dimensionality reduction (97%), which resulted in a shorter analysis processing time, being more advantageous than Blasso (Table 1). Methodologies based on partial least squares regression allow large amounts of data to be quickly analyzed; they have high statistical efficiency and computational speed to obtain the GEBVs estimates of the individuals (Boulesteix and Strimmer 2007;Solberg et al. 2009). However, these methods did not show greater computational advantages over the rrBLUP method, considering the evaluation of 800 individuals and 1000 markers ( Table 1).
The number of markers used in this study was low, however was considered adequate for the purpose and agree in other similar studies (Oliveira et al. 2021;Sant' Anna et al. 2019. Bhering et al. (2015) and Spindel et al. (2015) showed that a small number of markers can be used for high precision genomic selection for many traits. Peixoto et al. (2016), in a study with simulated data, concluded that a genomic selection model that uses 300-800 markers is sufficient to capture all the genetic variance, and to decrease the residual variance, to obtain the maximum prediction precision of an F 2 population.
In general, the UPLS method provided low selection accuracy for the low heritability traits (Table 2). Solberg et al. (2009), based on simulated data, compared different genomic selection methodologies to be applied in animal breeding and verified the superiority of the selection accuracy of a Bayesian method (Bayes B) in relation to the UPLS method. These authors demonstrated that the selection accuracy of the UPLS method decreased for the low heritability traits. Azevedo et al. (2014) reported low predictive capacity for the UPLS method, indicating the inferior performance of this method in relation to rrBLUP.
The UPLS method was very similar to the phenotypic selection, mainly for the low heritability traits, resulting in low total gains (Tables 3 and 4). This fact reveals the high selection coincidence of the superior individuals by the UPLS method and the phenotypic selection. This selection coincidence was higher than that obtained when considering the true genetic values of the individuals (Tables 6 and 7). Thus, these results suggest that the UPLS method was inefficient in the identification and accurate selection of genetically superior individuals. The prediction inefficiency of this method was also reported by Azevedo et al. (2014) in a comparative analysis of six genomic selection methodologies applied to animal breeding.
The selection accuracy obtained by the MPLS method was greater than or equal to UPLS method (Table 2). Azevedo et al. (2013) reported higher accuracy for the MPLS method and inferred that this methodology is more appropriate for genomic selection than the UPLS method, since the MPLS realistically captures the nature of the traits, considering the correlations between them. However, the present study revealed that, for traits of lower heritability values, considering the selection gains (Table 4) and the selection coincidences in the selection of the genetically superior individuals (Tables 6 and 7), the MPLS had similar behavior compared to UPLS, resulting in low total gains and low efficiency in the identification and selection of genetically superior individuals. These results indicate that, for the low heritability traits, the MPLS approach is similar to the phenotypic selection and is recommended for genomic selection only for traits with 60% heritability.
In this study, 200 QTL were simulated for each quantitative trait, distributed randomly in the genome, without the presence of QTLs of major effects, which resulted in similarity between the Blasso and rrBLUP methods. Considering the normal distribution of QTL throughout the genome, methods based on mixed models are equally efficient to Bayesian methods. Bhering et al. (2015), comparing rrBLUP and Blasso, verified no significant difference between the methods of the selection accuracy. Therefore, results of the present study indicate that the rrBLUP method was the most efficient, since it presented high values of selection accuracy, high gains, and selection coincidence of superior genotypes in the shortest processing time.
The selection indexes proposed in this study were more efficient and provided the highest total selection gains when compared with the other univariates (Blasso, rrBLUP, and UPLS) and multi-trait (MPLS) methods (Tables 4  and 5). According to Dekkers (2007), the application of the index theory in genomic selection is promising and can significantly increase selection gains, especially for the low heritability traits, which was evidenced in this study by the results.
However, the direct gains estimated through selection indexes for each trait were lesser than those obtained by the other genomic selection methods (Tables 4 and 5). These results are in agreement with the theory of selection indexes, which provide relatively higher total gains, causing reduced gains in each individual trait, compensating for this reduction by the better distribution of favorable gains in the other traits . In this sense, in relation to the GEBVs, the selection coincidences obtained for the selection indexes were always lesser than those obtained via the Blasso and rrBLUP methods (Tables 6 and 7). However, the individuals selected by the indexes have higher performance in all the traits, simultaneously, unlike those selected by the genomic selection methods, which consider the improvement in one trait at a time.
The I-MPLS led to high total genetic gains and high efficiency in the selection and identification of the superior individuals, showing results similar to those observed for I-Blasso and I-rrBLUP (Table 5). Covarrubias-Pazaran et al. (2018) demonstrated that the use of multi-trait models could increase selection accuracy and consequently result in more powerful and reliable selection indexes. To obtain the proposed genomic selection indexes, first, data should be analyzed by genomic selection methods, aiming to estimate markers effects. Thus, considering the low efficiency demonstrated by the UPLS, the lowest selection coincidences between the indexes in the selection of superior individuals are in relation to the I-UPLS (Tables 6 and 7). This index had the lowest efficiency in the identification and selection of superior genotypes, resulting in the lowest selection gains and coincidences when considering the true genetic value, being the least indicated for application in breeding programs.
On the other hand, the I-rrBLUP, I-Blasso and I-MPLS indexes properly identified the best genotypes, allowing gains in all traits simultaneously. Genomic selection indexes are a promising tool to be applied in plant breeding programs, since they allow the direct early selection by the information of molecular markers, as well as the selection of superior individuals in a set of traits of economic interest, in a shorter processing time (Ceron-Rojas et al. 2015). The results showed that the I-rrBLUP and I-MPLS indexes are the most indicated since they were highly efficient in the identification and selection of superior genotypes, in a shorter time, and with less computational demand. The genomic selection indexes developed in this study were efficient in the identification of superior genotypes in different traits, simultaneously, and are suitable for plant breeding programs.

CONCLUSION
The rrBLUP is the most recommended method for selecting superior genotypes based on selection accuracy, selection coincidence, selection gain and processing time. However, from the selection indexes proposed, I-rrBLUP and I-MPLS indexes were the most advantageous due to the higher efficiency in the selection of superior genotypes in shorter processing time. The indexes proposed here overcome the results from the univariate models, regardless the heritability, being its use largely indicated genomic selection in any breeding program aiming the improvement of several traits, simultaneously.