Updating knowledge in estimating the genetics parameters: Multi-trait and Multi-Environment Bayesian analysis in rice

ABSTRACT Among the multi-trait models selected to study several traits and environments jointly, the Bayesian framework has been a preferred tool when constructing a more complex and biologically realistic model. In most cases, non-informative prior distributions are adopted in studies using the Bayesian approach. However, the Bayesian approach presents more accurate estimates when informative prior distributions are used. The present study was developed to evaluate the efficiency and applicability of multi-trait multi-environment (MTME) models within a Bayesian framework utilizing a strategy for eliciting informative prior distribution using previous data on rice. The study involved data pertaining to rice (Oryza sativa L.) genotypes in three environments and five crop seasons (2010/2011 until 2014/2015) for the following traits: grain yield (GY), flowering in days (FLOR) and plant height (PH). Variance components, genetic and non-genetic parameters were estimated using the Bayesian method. In general, the informative prior distribution in Bayesian MTME models provided higher estimates of individual narrow-sense heritability and variance components, as well as minor lengths for the highest probability density interval (HPD), compared to their respective non-informative prior distribution analyses. More informative prior distributions make it possible to detect genetic correlations between traits, which cannot be achieved with non-informative prior distributions. Therefore, this mechanism presented to update knowledge for an elicitation of an informative prior distribution can be efficiently applied in rice breeding programs.


Introduction
Rice (Oryza sativa L.) is one of the most important sources of the global population's daily caloric and nutritional requirement (FAO, 2020).Across the world, the population is increasing while, at the same time the available area of suitable wetlands is decreasing (Ray et al., 2013).It is estimated that by 2050 the agricultural production of rice should increase by between 60 and 110 % (Hunter et al., 2017;Juliana et al., 2019).Thus, the evaluation of multiple traits in rice cultivation aims to maximize grain yield potential (Liang et al., 2021).In general, in a plant breeding program aimed at identifying the most genetically superior genotypes, the selection is based on one trait only (Suela et al., 2019;Sabri et al., 2020).However, this approach can cause problems if its performance in another desirable trait is not evaluated (Cruz et al., 2014).Genetic evaluation of multiple traits is relevant since superior varieties combine optimal attributes for several traits simultaneously in plant breeding (Torres et al., 2018).In these cases, selection can be made indirectly, based on easy to measure secondary traits of low environmental influence genetically correlated with the target trait, which is an exciting alternative for maximizing accuracy (Santos et al., 2018).
Among the multi-trait models used for modeling several traits and environment jointly, the Bayesian framework has been the preferred tool when using a more complex and biologically realistic model (Dunson, 2001).Several studies have demonstrated the potential of the Bayesian approach to genetic evaluation in plant breeding by considering multi-trait evaluation (Torres et al., 2018;Peixoto et al., 2021).However, in the majority of these studies, non-informative prior distributions are used.The Bayesian approach tends to be less biased and presents more accurate estimates than classical analysis when it uses informative prior distributions (van de Schoot et al., 2021) that should be preferable for breeding purposes aimed at improving selection accuracy.
Systems for updating knowledge of the hyperparameters and building informative prior distributions were proposed by Silva et al. (2013) and Azevedo et al. (2022).The first study used the scaled inverse chi-square prior distributions for the parameters in univariate analysis in maize breeding, drawing on the previous phenotypic data in two selection cycles.The second study used inverse gamma prior distributions considering ten crop seasons in white oat (Avena sativa L.).However, these procedures for eliciting informative prior distributions have not yet been presented with multi-trait analysis.Thus, the present study aimed to evaluate a strategy for eliciting informative prior distribution using previous data from rice.For such, phenotypic data of three traits associated with eighteen genotypes of rice evaluated in five crop seasons were used.
The useful area consisted of four meters of three internal rows (4 m × 0.9 m, 3.60 m 2 ).The experiments were conducted on floodplain soils with continuous flood irrigation.The cultural treatments were carried out following the recommendations for irrigated rice cultivation in the evaluated regions (Soares et al., 2005).

Model and Bayesian inference
The traits (GY, FLOR and PH) were analyzed using multi-trait models featured in the Markov Chain Monte Carlo (MCMC) Bayesian approach.The first idea was to compare the full model (considering the interaction between the genotypes and environments) with the null model (not considering the interaction).The full fitted multi-trait statistical model was given by: y = Xb + Z 1 r + Z 2 u + e which can be rewritten as: where y ij is the vector of phenotypic values of the i-th trait (i = 1,2,3) in the j-th environment (j = 1,2,3); b ij the vector of effect of the j-th environment in the i-th trait; r ij i the vector of the block effects of the i-th trait in the j-th environment; u ij the genetic values vector of the i-th trait in the j-th environment, and e ij the residual vector of the i-th trait in the j-th environment.X is the incidence matrix of systematic effects, Z 1 the incidence matrix of block effects and Z 2 the incidence matrix of genetic effects.
The prior distributions for the parameters of the model were given by: b~N(0,I⊗∑ b ) where I is the identity matrix, ∑ b , ∑ r , ∑ u and ∑ e are the (co)variance matrix with prior distributions given by: where IW is the inverted Wishart distribution; V b , V r , V u and V e the matrices with known values, and h b , h r , h u and h e the known constants known as hyperparameters.The (co)variance matrix estimates are given by:

…
e e e e 2 1 2 22 1 3 where are, respectively, the block, genetic and residual variance of the i-th trait in the j-th environment; s rii'(j) , s uii'(j) , s eii'(j) the block, genetic and residual covariance between the i-th trait and the i'-trait in the j-th environment, and s rii'(j,j') , s uii'(j,j'), s eii'(j,j') the block, genetic and residual covariance between the i-th trait of j-th environment and i'-trait of j'-th environment respectively.∑ b is a diagonal matrix with values equal to 10 8 .The Bayesian estimation of the parameters (b,r,u,∑ r ,∑ u ,∑ e ) is based on their posterior marginal distributions, which are indirectly generated through the MCMC algorithms and create a chain of values for each parameter.The w-th value of the chain of individual narrow-sense heritability associated with the i-th trait and the j-th environment h i w 2( ) , is given by: where and s ei j w ( ) are, respectively, block, genetic and residual variances of w-th iteration and the i-th trait in the j-th environment.The relative variation coefficient is the ratio of the coefficient of genotypic variation to the coefficient of residual variation, i.e.
CV CV g e .The w-th value of the chain of genetic correlation between traits i and i' in the j-th enviroment, r ii j w ′( ) ( ) , is given by: where is the genetic covariance between traits i and i'; is the genetic variance of the i-th trait, and V ui j c( ) (w) 2 i the genetic variance of the i'-th trait in the j-th environment and the w-th iteration of the MCMC algorithm.The w-th value of the chain of genetic correlation of the i-th trait between environments j and j', r i j j ′ ′ is given by: where is the genetic covariance between environments; j and j's ui j w ( ) ( ) 2 the genetic variance of the j-th environment, and i the genetic variance of the j'-th environment of the i-th trait in the w-th iteration of the MCMC algorithm.The efficiency of indirect selection of the i-th trait in the j'-th environment relative to direct selection in the targeted the j-th environment EISj(j') proposed by Windhausen et al. (2012) is given by: where r i(j,j') is the posterior mean of genetic correlation of the i-th trait between environments j and j', h i(j ) ′ 2 is the posterior mean of individual narrow-sense heritability associated with the i-th trait and the j'-th environment, and h i(j) 2 is the posterior mean of individual narrow-sense heritability associated with the i-th trait and the j-th environment.
The informative worth of prior distribution is associated with the values of the hyperparameters and, consequently, in this study, with the (co)variance matrices of the normal distribution (van de Schoot et al., 2021).Based on phenotypic databases containing several years of collection, it is possible to update the hyperparameters and increase our knowledge of the (co)variance matrices and thus create informative prior distributions.In univariate analyses with ten seasons of data, the updating knowledge for the p-th season should be carried out using information from the (p -1)-th season only and not the previous seasons (Azevedo et al., 2022).Therefore, in this study, we used two prior distributions, one noninformative and one informative prior distribution created according to Azevedo et al. ( 2022) though in our case the multivariate approach was adopted.Thus, in this study, two analyses were performed: i) five multi-trait and multienvironment models which considered each crop season separately and used non-informative prior distributions in all seasons; ii) five multi-trait and multi-environment models which considered informative prior distributions, where the (p -1)-th crop season contributed to the p-th crop season.In the first season, non-informative prior distribution was used.
In non-informative prior distribution, we consider the hyperparameter to be equal to V r = V u = V e = I and h r = h u = h e = 0.02 (Hadfield, 2010).For the construction of the informative prior distributions, we know that if the (co)variance matrix is Σ~IW(V, h) (in this study, the dimension of ∑ is 9 × 9), then the expected value of ∑ is given by V K K 10 (h ≥ 10) and the mode of ∑ is given by V K K 10 .Thus, the posterior mean of (co)variance components (Σ) obtained in the analysis of (p -1)-th season and its respective posterior mode (M o ) were equalized to the expected value and mode of the ∑~IW(V, h) distribution.Through these expressions, it was possible to find the following equality The following parameters were calculated in order to assess the impact of prior knowledge insertion : i) the posterior coefficient of variation (CV) of the estimates of the components of variance, individual narrow-sense heritability, genetic correlation and additive genetic values; ii) length of the Highest Posterior Density intervals (HPD) of the parameter estimates; iii) the deviance information criterion (DIC), when possible, since the quality of the fit can only be compared using the DIC when the model uses the same data; iv) agreement between genetic estimates by both non-informative and informative prior distribution, considering 30 % of the selection differential (a total of six genotypes).
All computational implementations of the analysis were performed using MCMCglmm (Hadfield, 2010) in R (R software, version 4.1.0).A total of 3,000,000 samples were generated, assuming a burn-in period and sampling interval of 100,000 and ten iterations, respectively, which resulted in 290,000 samples.The convergence of MCMC chains was assessed by Geweke's diagnostic (Geweke, 1992), which was performed using the CODA R package (Plummer et al., 2006).The computational routine is available at https://github.com/licaeufv/Multitrait-Multi-Environment-Rice.

Model selection and convergence of parameters
Overall, except for the 2012-2013 crop season, the entire model (model with the interaction effect) presented lower DIC values compared with those obtained from the null model (model without the interaction effect) (Table 1).The lower values of DIC indicate better goodness-offit of the full model (Spiegelhalter et al., 2014), except for the 2012-2013 crop season analysis using noninformative prior distributions.For all parameters, the p of Geweke's Z statistics were more significant than 1 % (Tables 2 and 3), indicating that convergence was achieved, and the inferences could then be made.

Comparison between informative and noninformative prior distributions
The lower posterior coefficient of variation (CV) values of the genetic variance and individual narrow-sense heritability (Table 2) and genetic correlation (Table 3) were observed by considering the informative prior to the estimation process.Using this approach, the hyperparameters from the prior distributions were obtained by analyzing the previous year.Therefore, the length of the HPD interval is also shorter due to the higher precision provided by this informative prior (Tables 2 and 3).The same results were found by Silva et al. (2013) when considering univariate analyses in maize.However, the same was not observed in the genetic values.In most analyses, the CV of genetic value presented increased amplitude in the informative priors.Despite these amplitude values, considering a selection differential of 30 %, the agreement between the selected genotypes in both prior distributions is more than 50 % (Table 4).

Individual narrow-sense heritability of traits
Considering the results obtained by the informative prior distributions, the estimates of individual narrowsense heritability for GY, FLOR and PH were low to high, respectively, with 0.  2).It is worth emphasizing that the low heritability values observed did not depend on the number of evaluated genotypes, since the Bayesian approach is recommended essentially for small sample sizes (Torres et al., 2018).In addition, GY is quantitative and highly affected by the environment (Rao et al., 2017;Li et al., 2018;Kumar et al., 2019;Zhang et al., 2020).
Studies using the Bayesian approach for estimating genetic parameters are rare in rice.Using multi-trait and multi-environment Bayesian analysis but with noninformative prior distributions and only the one crop season, Silva Junior et al. ( 2022) found heritabilities of 0.28 and 3.32e-6 for GY also in the localities of Lambari and Janaúba, respectively.As for FLOR, Silva Junior et al. (2022), in this same study, found heritabilities of 0.31 and 0.27 in Lambari and Janaúba, respectively.Using rice genotypes in assays performed at the Dale Bumpers National Rice Research Center (DBNRRC), Sharma et al. (2021) found heritabilities of 0.60 and 0.93 for GY and PH, respectively.The heritabilities observed by Bhandari et al. (2019) in three managed environments in the Philippines ranged from 0.75 to 0.84, 0.54 to 0.91, and 0.71 to 0.96 for the FLOR, GY, and PH traits, respectively.
We observed increased additive genetic variance and heritability in the use of informative prior on the results of the non-informative prior distribution for all traits, except for PH, in the locality of Lambari, and FLOR, in the locality of Janaúba (Table 2).Using informative prior distribution based on the data can be conducted on biased variance component estimates (over and underestimated) (Silva et al., 2013).Furthermore, the posterior mean of heritability can increase using an informative prior.However, the Bayesian approach is more often recommended using interval estimates, such as HPD intervals than point estimates.Among the 18 rice genotypes evaluated, the GY trait in the Janaúba locality showed the highest additive genetic variance, while the lowest value was found for PH in the Lambari is the additive genetic variance, h 2 is the individual narrow-sense heritability and u is the additive genetic value.ϯ Minimum and maximum of the additive value and CV.Grain yield (GY), in kg ha -1 ; Flowering (FLOR) in days and Plant Height (PH), in cm.Sci.Agric.v.80, e20220056, 2023 locality.We also observed the highest heritability value of 0.79 in the Janaúba locality for PH and the lowest heritability for GY, with a value of 0.14 in the Leopoldina locality.
The GY, FLOR and PH traits presented coefficients of variation (CV g ) from 0.95 % to 2.58 %, 3.25 % to 3.84 % and 1.06 % to 2.48 %, respectively, for informative prior distributions for each place studied.These can be considered adequate when compared to the method for the classification of coefficients of variation for rice cultivation, proposed by Costa et al. (2002), which determined that the coefficients of variation should be below 51.36 %, 7.62 %, and 17.27 % for grain yield, flowering in days and plant height, respectively.The relative variation coefficients (CV g / CV e ) that are greater than the unit suggest that genetic variation is more influential than residual variation (Torres et al., 2018).This was observed in this study for FLOR, in Lambari and Leopoldina, and for PH, in Janaúba and Leopoldina.

Genetic correlation between environments
The genetic correlations between environments for all traits ranged from 0.14 to 0.47 for GY; 0.14 to 0.36 for FLOR; and 0.34 to 0.47 for PH, which indicates the existence of interaction between environments (Table 5).The genetic correlations between environments were positive for all traits.Leopoldina was the environment that presented the highest correlation with other locations.Considering the genetic correlations below 0.30 as low and above 0.60 as high, Oliveira et al. (2020) suggest the occurrence of high (0.14-0.22) and moderate (0.34-0.47)G × E, i.e., the performance of genotypes varied between environments.
The efficiency of indirect selection compared to direct selection is presented in Table 6.For all evaluated traits, direct selection proved to be more efficient.However, as expected, indirect selection was more efficient in more correlated environments where the heritability of the indirect selection environment was greater than in the direct selection environment, which varied according to the traits under study.For GY, the highest efficiency of indirect selection was observed in  The percentage of agreement considering a selection differential of 30 % was calculated to compare the ranking of genotypes between the three environments for each trait, as described above (Table 5).For the GY trait, 83.33 %, 0.00 % and 16.67 % of coincidence between the environments were observed.For the FLOR trait, 83.33 %, 50 % and 66.67 % of coincidence between the environments were observed.For the PH trait, all coincidences observed were 16.67 %.This result suggests that the PH and GY traits are more influenced by the environment than FLOR.
The difference between the genotype rankings in this study indicates that if breeders selected genotypes using only individual trial results, their selection would change between trials.In addition, there is still the possibility that high-yield genotypes are discarded and low-yield genotypes are chosen for other environments.This low correlation between the rankings for GY and PH also indicates that the rice genotypes carry many alleles that are differentially adapted to the evaluated environments, highlighting the importance of multienvironmental trials for this data set to address and deal with the genotype by environment interaction.

Genetic correlation between the traits
We verified that the HPD lengths of genetic correlation, using the informative prior distribution, decreased over the years (Figures 1, 2 and 3).In addition, four pairs of traits and environment (GY × FLOR in the Lambari and Leopoldina locality and GY × PH) were not significant in the first years.With the accumulation of information over the years, these correlations were significant.In contrast, all the correlations obtained using the non-informative model were not significant.
The correlations obtained, using the informative model, for the GY and PH traits were significant for all locations.For the localities of Lambari and Leopoldina, the correlations were 0.14 [0.09, 0.20] and 0.15 [0.10, 0.21], respectively, while for Janaúba, the correlation was -0.50 [-0.54, -0.46].Similar results were observed by Lakshmi et al. (2014) and Oladosu et al. (2018) in their study on rice genotypes under tropical conditions, which found correlations of 0.18 and -0.34, respectively.This divergence can be explained by the effect of the environment on the expression of these traits, as observed in the results of Table 5.The estimated correlation values for GY and FLOR were 0.11 [0.05, 0.16] for Lambari and 0.13 [0.07,0.18]for Leopoldina, which corroborates the correlation of 0.11 estimated by Lakshmi et al. (2014).
The Bayesian estimation of parameters such as genetic correlation is advantageous compared to the classical estimation using the maximum likelihood method (Nustad et al., 2018).In classical statistics, confidence intervals are only possible through Bootstrap and delta method procedures (Manichaikul et al., 2006).These intervals generally have great     amplitudes (Beyene and Moineddin, 2005).The Bayesian approach makes it possible to estimate credibility intervals (in general, they are shorter than the confidence intervals).Thus, shorter intervals make it easier to detect correlations between traits and even between environments.

Conclusions
We demonstrated the feasibility of the proposed multi-trait multi-environment Bayesian model for plant breeding involving a low number of genotypes that are evaluated for multiple traits across a range of environments.In addition, we presented a knowledgeupdating mechanism for eliciting an informative prior distribution.More informative prior distributions make it possible to detect genetic correlations between traits.This was not feasible with the use of non-informative prior distributions.
were used as hyperparameters of the prior distribution of the p-th crop season.

Figure 1
Figure 1 -(A) Mean posterior (in bars) and highest posterior density (in arrows) of genetic correlation (GC) between GY and FLOR traits, (B) absolute value of coefficient of variation (CV) of genetic correlation (GY × FLOR), using the non-informative and informative prior distribution the five years.Grain yield (GY), in kg ha -1 and Flowering (FLOR) in days.

Figure 2
Figure 2 -(A) Mean posterior (in bars) and highest posterior density (in arrows) of genetic correlation (GC) between GY and PH traits, (B) absolute value of coefficient of variation (CV) of genetic correlation (GY × PH) using the non-informative and informative prior distribution over the five years.Grain yield (GY), in kg ha -1 and Plant Height (PH), in cm.

Figure 3
Figure 3 -(A) Mean posterior (in bars) and highest posterior density (in arrows) of genetic correlation (GC) between FLOR and PH, (B) absolute value of coefficient of variation (CV) of genetic correlation (FLOR × PH) using the non-informative and informative prior distribution over the five years.Flowering (FLOR) in days and Plant Height (PH), in cm.

Table 1 -
Deviance information criteria (DIC) for the full (considering G × E interaction) and null (not considering the interaction).

Table 2 -
Mean, 95 % highest probability density interval (HPD) and coefficient of variation (CV) of the posterior densities of the genetic parameters for traits relative to 2014-2015, considering non-informative and informative prior distributions, the statistics of convergence and DIC (Deviance information criterion).

Table 3 -
Mean, 95 % highest probability density interval (HPD) and coefficient of variation (CV) of the posterior densities of the genetic correlation (r g ) for traits relative to 2014-2015, considering non-informative and informative prior distributions and the statistics of convergence.

Table 4 -
Agreement between genetic breeding values estimated via Bayesian approach with non-informative and informative prior distribution, considering each crop season, environment and trait.

Table 5 -
Mean and 95 % highest probability density (HPD) interval of the genetic correlation (r g ) between environment (upper diagonal), relative variation coefficient (diagonal) and agreement between genetic breeding values estimated for each pair of environments relative to 2014-2015 (under diagonal).Grain yield (GY), in kg ha -1 ; Flowering (FLOR) in days and Plant Height (PH), in cm.

Table 6 -
The efficiency of indirect selection of the i-th trait in the j'-th environment (columns) relative to direct selection in the targeted j-th environment.