Quantifying individual variation in reaction norms using random regression models fitted through Legendre polynomials: application in eucalyptus breeding

: An accurate, efficient and informative statistical method for analyses of genotype × environment (G × E) interactions is a key requirement for progress in any breeding program. Thus, the objective of this study was to quantify individual variation in reaction norms using random regression models fitted through Legendre polynomials in eucalyptus ( Eucalyptus spp. ) breeding. To this end, a data set with 215 eucalyptus clones of different species and hybrids evaluated in four environments for diameter at breast height (DBH) and Pilodyn penetration (PP) was used. Variance components were estimated by restricted maximum likelihood, and genetic values were predicted by best linear unbiased prediction. The best-fitted model for DBH and PP was indicated by the Akaike information criterion, and the significance of the genotype effects was tested using the likelihood ratio test. Genetic variability between eucalyptus clones and very high accuracies ( r g^g > 0.90) were detected for both traits. Reaction norms and eigenfunctions generated genetic insights into G × E interactions. This is the first study that quantified individual variation in reaction norms using random regression models fitted through Legendre polynomials in eucalyptus breeding and demonstrated the great potential of this technique.


INTRODUCTION
Eucalyptus (Eucalyptus spp.) is widely cultivated in tropical and subtropical regions. Its cultivation is mainly intended to produce pulp, bio-oil, firewood, and charcoal. The international pulp trade and the intense search for alternative energy sources have increasingly motivated the establishment of eucalyptus plantations in several countries worldwide (Fonseca et al. 2010). In this sense, eucalyptus breeding programs have sought to identify more efficient selection techniques to increase yield and quality of traits of industrial interest.
The genotype × environment (G × E) interactions are characterized by the differential behavior of genotypes in relation to environmental variations (Resende 2015). These interactions can be expressed in various ways and with different intensities and can generate significant obstacles for genetic selection (Li et al. 2017;van Eeuwijk et al. 2016), including eucalyptus (Nunes et al. 2017;Resende et al. 2017;2018). Thus, the use of accurate, efficient and informative statistical methods that capture the information present in this source of variation and advantageously exploit its effects is fundamental in any breeding program.

Experimental data
The data used in this work come from evaluation of a clonal field test of different eucalyptus species (E. grandis, E. urophylla, E. saligna, and E. pellita) Table 1. In each environment, a field trial in a randomized block design was established, with 215 clones in single tree plots and 30 replications. Trees were planted at a spacing of 3.5 m between rows and 2.6 m between plants. This work used data from the assessment of all surviving trees at three-years of age in the field tests (22,295 trees in total) for diameter at breast height (DBH) (cm) and Pilodyn penetration (PP) (mm). The DBH was measured using a diameter tape and the PP using a Pilodyn. According to Greaves et al. (1996), PP, which is an indirect method to determine the basic density of wood, has been effective to evaluate eucalyptus.

Statistical analyses
In order to use Legendre polynomials, phenotypic mean of each environment (μ i ) must be scaled to range from -1 to +1. The environmental gradient values (E i ) were obtained as follows (Eq. 1) (Schaeffer 2016): (1) Variance components were estimated by restricted maximum likelihood (REML) (Patterson and Thompson 1971) and genetic values were predicted by best linear unbiased prediction (BLUP) (Henderson 1975). Random regression models were fitted through Legendre polynomials for DBH and PP as follows (Eq. 2): (2) where Y ijk is the i th genotype (i = 1, 2, …, 215) in the j th environment (j = 1, 2, 3, 4) in the k th replication (k = 1, 2, …, 30); μ is the overall mean; S j is the fixed effect of environment j; R/S jk is the fixed effect of replication k nested in environment j; d is the polynomial degree, ranging from 0 to D (D = number of environments -1); α id is the random regression coefficient for the Legendre polynomial for the genotype effect; Φ ijd is the d th Legendre polynomial for the j th environment for the i th genotype; and e ijk is the residual random effect associated with Y ijk .
In the matrix notation, the above model is described as follows (Eq. 3): (3) where y is the vector of phenotypic data; β is the vector of the replication-environment combination that comprises the fixed effects of environment and replication within the environment, added to the overall mean; α is the vector of genotype effects (random); and e is the vector of residuals (random). Uppercase letters represent the incidence matrices for these effects. In addition, α~N(0,K g ⊗ I 215 ) and e~N(0,R), where I 215 is an identity matrix of order 215, ⊗ denotes the Kronecker product, K g is the covariance matrix for the coefficients of genetic effects, and R represents the matrix of residual variances. The polynomial order in random regression models were selected using the Akaike Information Criterion (AIC) (Akaike 1974) as follows (Eq. 4): where LogL is the logarithm of the maximum (L) of the restricted likelihood function, and p is the number of estimated parameters. Besides that, different residual variance structures (homogeneous and heterogeneous) were tested.
The significance of the genotype effects was tested using the likelihood ratio test (LRT) (Wilks 1938) as follows (Eq. 5): where LogL R is the logarithm of the maximum (L R ) of the restricted likelihood function of the reduced model (without genotype effects). Estimates of genetic variance (σ g 2 ) and predicted genetic values (g ij ), in the original scale, were obtained by Eqs. 6 and 7 (Kirkpatrick et al. 1990): Phenotypic variance (σ p 2 ), broad-sense individual heritability (h 2 g ), and accuracy (r ĝg ) were estimated by Eqs. 8-10 ( Resende et al. 2014): (8) and (9) (10) where σ e 2 is the residual variance and PEV is the prediction error variance, obtained by the diagonal elements of the inverse of the coefficient matrix (information matrix) of the mixed model equations.
The eigenfunctions (ψ f ) of the genetic coefficient covariance matrix, aiming to study the G × E interactions, were obtained by Eq. 11 (Kirkpatrick et al. 1990): where (Cψ f ) d is the d th element of the f th eigenvector of K g , and Φ d is the normalized value of the d th Legendre polynomial.
The areas under the reaction norms (A), aiming to rank the clones, were obtained by Eq. 12: where x d is the environmental gradient. Statistical analyses were performed using the ASReml 4.1 (Gilmour et al. 2015) and R (R Core Team 2018) software. The ASReml program files are available in supplementary material.

Selection of models
According to the AIC (Akaike 1974), the best model for DBH and PP is denoted by Leg.4.Rhe (Legendre polynomial of the three degree for genotype effects, with a heterogeneous residual variance structure) ( Table 2), since the lower AIC value reflect a better overall fit. Thus, this model was adopted to estimate the variance components and to predict the genetic values along the environmental gradients. According to the LRT, genetic variability (p < 0.01) was detected for DBH and PP (Table 2).

Variance components and genetic parameters
For DBH, genetic variance was not stable over the environmental gradient, ranging from 0.9073 (E2) to 2.5426 (E4); and residual and phenotypic variances increased over the environmental gradient (Table 3). Heritability estimates were not stable over the environmental gradient, ranging from 0.22 (E1) to 0.39 (E3); and mean accuracies were higher than 0.90 in all environments (Table 3).
For PP, genetic, residual, and phenotypic variances were not stable over the environmental gradient (Table 3). Genetic variance ranged from 3.6123 (E1) to 4.6763 (E2); and residual and phenotypic variances reached a peak in E2 (Table 3). Heritability estimates were not stable over the environmental gradient, ranging from 0.45 (E2) to 0.63 (E3); and mean accuracies were higher than 0.95 in all environments (Table 3).

Random regression and model selection
The covariance functions developed by Kirkpatrick and Heckman (1989) that uses orthogonal base functions, such as Legendre polynomials, allows the fit of virtually any shape of growth curves or reaction norms (Calus et al. 2004). Reaction norms model the trajectory of genetic values along the environmental gradient, i.e., as a deviation from other fixed and random effects included in the model (Resende et al. 2014). Kirkpatrick et al. (1990) demonstrated the equivalence between random regression and covariance functions.
Among the various criteria for selection of models, the AIC is prominent (Cavanaugh and Neath 2019). The selected model fits heterogeneous residuals (i.e., one residual variance for each environment), and the genetic effects was modeled by Legendre polynomials of degree three for DBH and PP. This implies the estimation of 14 parameters of covariance. Heterogeneous residuals were also reported by Resende et al. (2017) evaluating clonal trials of eucalyptus.
The random regression model can be considered a reduced and simplified multiple-trait model, which allows the same parameters of interest (heritability and genetic correlation among all pairs of environments) to be estimated, but with lower parameterization and with less computational effort (Resende et al. 2001). These models directly define covariance functions, and since there are reliable estimates of variance components, they allow the prediction of genetic values of a genotype in different environments, based on evaluation in only one environment (Alves et al. 2020).

Variance components and genetic parameters
The estimation of variance components and prediction of genetic values are essential procedures in any breeding program. Currently, REML/BLUP is the standard procedure for estimation of variance components and optimal selection in plant breeding (Resende 2016). Knowledge of genetic parameters is of great importance in plant breeding, since the breeding strategy to be used depends on the information obtained from these parameters (Resende 2002).
One of the most relevant parameters for evaluation of the effectiveness of the inference about the predicted genetic value of a genotype is selective accuracy (Resende and Duarte 2007). This parameter indicates the correct arrangement of the genotypes for selection and recommendation purposes. This parameter does not only depend on the magnitude of the residual variation and the number of replications, but also on the ratio between the genetic and residual variations associated with the traits under evaluation (Resende and Duarte 2007). In this study, very high accuracies (r ĝg ≥ 0.90) were estimated for DBH and PP in all environments, indicating high reliability and a favorable scenario for recommendation of superior clones since high accuracy allows correct ranking of the genotypes.

Reaction norms
The presence of G × E interactions is very clear since the reaction norms are non-constant, the genotypes show genotypic plasticity; and when the reaction norms intersect, a complex G × E interaction occurs (van Eeuwijk et al. 2016). This type of G × E interactions has more severe consequences for breeders as it changes the rank of genotypes in accordance with environmental conditions, i.e., it indicates that the superior genotype in one environment will not normally perform as well in another environment (Resende 2015).
Genotypic plasticity is essential for genotype performance in changing environments . Reaction norms shows that the evaluated clones present various forms of genotypic plasticity. In this context, genotypic plasticity can be considered as favorable or unfavorable changes for genotype adaptedness (van Eeuwijk et al. 2016). Resende et al. (2018) investigated the environmental uniformity, site quality and tree competition interact to determine stand productivity of clonal eucalyptus and showed the importance of adopting environmental gradient-based approaches in tree genetic testing and clone recommendation as a way to more accurately match genotypes to specific sites. Marchal et al. (2019) investigated the role of genotypic plasticity on construction of hybrid larch (Larix decidua × Larix kaempferi) heterosis and on expression of its quantitative genetic parameters. They used random regression models fitted through Legendre polynomials to model reaction norms of ring width and wood density with respect to water availability and concluded that hybrid larch appeared to be the most plastic taxon as its superiority over its parental species increased with increasing water availability.

Eigenfunctions
The estimation of covariances between the random regression coefficients produces estimates of covariance functions (Kirkpatrick et al. 1990), which refer to a continuous description of the covariance structure of the trait along the environmental gradient. The analyses of the eigenfunctions are given by the total variance decomposition considering the principal components analyses (Arnal et al. 2019). This approach is similar to genetic correlations among the environments (Van der Werf et al. 1998).
According to Kirkpatrick et al. (1990), the first eigenfunction clustered general adaptability genes that was equally expressed in all environments. This can be interpreted as the genetic correlation that exists among the environments. The second eigenfunction clustered specific adaptability genes that expressed themselves depending on environmental differences. This can be interpreted as a lack of genetic correlation among the environments. The third and fourth eigenfunctions showed small eigenvalues and represent deformations for which there is little (or no) genetic variation.

Area under the reaction norms
In plant breeding, G × E interactions can reduce heritability and genetic gain with the selection. Li et al. (2017) comment that breeders have been adopting two selection strategies in the presence of significant G × E interactions: selecting stable genotypes that are not sensitive to environmental changes or selecting genotypes for specific environments in order to maximize genetic gain with the selection at that environment. In context of the present work, both selection strategies can be applied.
The genotype ranking was performed based on the areas under the reaction norms. The advantage of this strategy is that selection response can be predicted not only in genotypic expression in any environment but also in quantifying the environmental sensitivity of the trait through the reaction norm (robustness or responsiveness to changes in the environment), and it can be used for any number of environments. It is important to highlight that in this study, random regression models were used to fit realistic reaction norms, allowing investigation of changes in genetic covariances along the environmental gradient, as suggested recently by several authors (Marcatti et al. 2017;Li et al. 2017;Marchal et al. 2019).
Spatial genotypic plasticity is generally what interests' breeders most because of the operational implications for deployment of varieties (Marchal et al. 2019). For DBH, clones with larger area under the reaction norms are desirable as they relate directly to the volume of the tree (Alves et al. 2018). However, for PP, which is related to the basic density of wood, the direction of the selection depends of the purpose of raw material (pulp, bio-oil, firewood, charcoal, among others) (Fonseca et al. 2010).