Multiple-trait model by Bayesian inference applied to environment efficient Coffea arabica with low-nitrogen nutrient

ABSTRACT Identifying Coffea arabica cultivars that are more efficient in the use of nitrogen is an important strategy and a necessity in the context of environmental and economic impacts attributed to excessive nitrogen fertilization. Although Coffea arabica breeding data have a multi-trait structure, they are often analyzed under a single trait structure. Thus, the objectives of this study were to use a Bayesian multitrait model, to estimate heritability in the broad sense, and to select arabica coffee cultivars with better genetic potential (desirable agronomic traits) in nitrogen-restricted cultivation. The experiment was carried out in a greenhouse with 20 arabica coffee cultivars grown in a nutrient solution with low-nitrogen content (1.5 mM). The experimental design used was in randomized blocks with three replications. Six agromorphological traits of the arabica coffee breeding program and five nutritional efficiency indices were used. The Markov Chain Monte Carlo algorithm was used to estimate genetic parameters and genetic values. The agromorphological traits were considered highly heritable, with a credibility interval (95% probability): H2 = 0.9538 – 5.89E-01. The Bayesian multitrait model presents an adequate strategy for the genetic improvement of arabica coffee grown in low-nitrogen concentrations. Coffee arabica cultivars Icatu Precoce 3282, Icatu Vermelho IAC 4045, Acaiá Cerrado MG 1474, Tupi IAC 1669-33, Catucaí 785/15, Caturra Vermelho and Obatã IAC 1669/20 demonstrated greater potential for cultivation in low-nitrogen concentration.


INTRODUCTION
Brazil is the world's largest producer and exporter of arabica coffee (Coffea arabica L.), responsible for the production of 47.7 million bags of processed coffee (Conab 2022). Coffee growing regions, in general, have naturally acidic soils with low fertility. Therefore, soil pH correction and the use of large amounts of chemical fertilizers are necessary to ensure the maximum productive potential of coffee trees. Among the important macronutrients for plant growth and development, nitrogen is the most required by coffee plants, as it performs fundamental biochemical functions, such as amino acid and protein synthesis, impacting photosynthesis, formation of flower buds, in addition to the effects on the chemical composition of the fruit (Clemente et al. 2015). Brazil is the fourth largest consumer of nitrogen fertilizers in the world, which makes it dependent on the import of these inputs (GloboFert 2022). Faced with the current fertilizer supply crisis, research focusing on mineral nutrition and the identification of more efficient genetic materials in the use of nutrients have been identified as an alternative to reducing the use of these inputs. https://doi.org/10.1590/1678-4499.20220157

SOIL AND PLANT NUTRITION Article
The selection of cultivars adapted to soils with low fertility is a challenge for genetic improvement programs. The evaluation of several morphological and physiological characters is necessary, which makes the evaluation and selection process difficult since superior cultivars combine ideal attributes for several characteristics simultaneously. Thus, statistical methodologies must be used to evaluate information with a multitrait structure, which correctly represents the genetic and phenotypic variation in the data (Malosetti et al. 2008).
Bayesian multitrait models are suitable for genetic evaluations in plants (Junqueira et al. 2016, Volpato et al. 2019, Silva Junior et al. 2022). In addition, these models allow the estimation of variance components and genetic values for each trait (Peixoto et al. 2021, Silva Junior et al. 2022, jointly modeling multiple traits. Mora and Serra (2014), Junqueira et al. (2016), Torres et al. (2018), Volpato et al. (2019) andSilva Junior et al. (2022) demonstrated the potential of the Bayesian approach for genetic evaluation in plant breeding, considering multi-environment and multi-trait models. However, there is still a lack of information from multi-trait models under a Bayesian approach for the cultivation of arabica coffee in environments with low fertility.
The objectives of this study were to estimate genetic parameters of arabica coffee grown under low-nitrogen conditions using a Bayesian multi-trait model and to select arabica coffee cultivars with better genetic potential.

Field experiments
The experiment was carried out in the state of Minas Gerais, Brazil, by the Agricultural Research Company of Minas Gerais/Southeast (EPAMIG/Southeast), in a greenhouse located at the experimental field Diogo Alves de Melo, in the Universidade Federal de Viçosa (20° 45' S, 42° 52' W, 648 m).
Twenty Coffea arabica cultivars (Table 1) were evaluated for different agromorphological traits in an aerated static nutrient solution containing low nitrogen (1 mmol.L -1 ). The experiment was carried out in a randomized block design with three replicates. The plots consisted of two plants grown in pots with nutrient solution. The cultivars were sown in a sand bed sterilized with hydrochloric acid (HCl) (0.1 mol.dm -3 ). At 120 days, two seedlings at the cotyledonary leaf stage were transplanted into pots containing 8 L of nutrient solution (Hoagland and Arnon 1950). The nutrient solution was completed weekly with deionized water, and the pH was adjusted between 5.5 and 6.5 using HCl (0.1 mol.dm -3 ) and sodium hydroxide (NaOH) (0.1 mol.dm -3 ). The electrical conductivity (EC) was monitored by the change of the nutrient solution when its depletion reached 30% of the initial EC.
At 168 days after transplanting, the following morpho-agronomic traits were evaluated: stem diameter (SD, mm), measured with a caliper at 5 cm from the stem base; plant height (PH, cm), measured from the base of the orthotropic branch to the plant apex; internode length (IL, cm), calculated as a ration between plant height and node number; number of leaf pairs (NLP), obtained by counting the whole plant; number of nodes (NNO), obtained by counting the nodes in the main branch (orthotropic); leaf area (LA, dm 2 ), quantified after harvesting using the leaf area meter model AT Delta-TDevices.
The plants were sectioned into roots, stem, and leaves, dried in an oven with forced air circulation at 70 °C for 72 h and weighted to get: root dry matter (MSR), stem dry matter (MSC), and leaf dry matter (MSF). Shoot mass (MSPA) consisted of the sum of MSF and MSC, while the total dry mass (MST) was obtained by the sum of MSPA and MSR. The dried plant material was ground in a "Willey" mill to determine the nitrogen content according to the protocol of Empresa Brasileira de Pesquisa Agropecuária (Embrapa 2009). Nitrogen content was estimated as the product between nutrient content in different parts of the plants and the total dry mass.

Biometric analysis
The data was analyzed using the univariate and multi-trait models through Markov Chain Monte Carlo (MCMC) Bayesian approach. The multi-trait model was given by Eq. 1: in which: y = the vector of phenotypic data. The conditional distribution was given by Eq. 2: G = the matrix of genotypic covariance; R = the matrix of residual covariance; I = an identity matrix; β = vector of systematic effects (genotypes mean and replication effects), assumed as β ~ N (β, Σβ⊗I); g = the vector of genotype effects, assumed as g|G, ~ N (0, G⊗I); e = the vector of residuals, assumed as e |R, ~ N (0, R⊗I).
The uppercase bold letters X and Z refer to the incidence matrices for the effects β and g, respectively. The R package MCMCglmm (Hadfield 2010) was used to fit the model.
We assumed that G and R follow an inverted Wishart distribution WI (v, V), with hyperparameters v and V (Sorensen and Gianola 2002). Hyperparameters for all prior distributions have been selected to provide non-informative or flat prior distributions. For the systematic effect (β), a uniform distribution was assigned. In addition, the parameters β, g, G, and R were estimated following the set posterior distribution: P(β, g, G, R |y) α P(y | β, g, G, R ) × P(β, g, G, R).
In total, 1,800,000 samples were generated. A burn-in of 10,000 and thin of 10 iterations were assumed, resulting in a total of 1,790,000 samples. The convergence of the MCMC was verified by the criterion of Geweke (1992), using the R packages boa (Smith 2007) and convergence diagnosis and output analysis (CODA) (Plummer et al. 2006).
The model was compared using the deviation information criterion (DIC) proposed by Spiegelhalter et al. (2002) in which: D(θ) = a point estimate of the deviance obtained by replacing the parameters with their posterior means estimates in the likelihood function; p D = the effective number of model parameters.
Models with a lower DIC should be preferred over models with a higher DIC. Variance components, broad-sense heritability (H 2 ), genotypic correlation coefficients between traits, and breeding values were calculated from the posterior distribution. Intervals of higher posterior density (HPD) were estimated for all traits using the R package boa (Smith 2007). A posteriori estimates of H 2 for each trait and each iteration were calculated from the later samples of variance components, using Eq. 4: in which:

Selection based on selection index
The multi-trait index based on factor analysis and genotype-ideotype distance (FAI-BLUP) (Rocha et al. 2018) was used to identify superior coffee genotypes under low-nitrogen. The formula is as Eq. 5: in which: P ij = probability of the i th genotype (i = 1, 2, ..., n) to be similar to the j th ideotype (j = 1, 2, ..., m); d ij = genotypeideotype distance from i th genotype to j th ideotype, based on standardized mean distance. Selection gains (SG) were estimated from the FIA-BLUP considering three different selection intensities: 35, 50, and 60%, which refers to the selection of seven, ten, and 12 genotypes, respectively, as follows (Eq. 6): in which: X s = the mean of the selected genotypes; X 0 = the overall population mean.

RESULTS AND DISCUSSION
Geweke's criterion indicated convergence for all dispersion parameters generating 1,800,000 MCMC iterations, 10,000 samples for burn-in, and a sampling interval of 10, totaling 1,790,000 effective samples used to estimate the variance components (Fig. 1). All chains [components of (co)variance] reached convergence by this criterion. Similar posterior averages were obtained for the variance components, suggesting normal-appearing density. The DIC suggests that the full model for multi-trait is the one that best fits the data, which reveals the significance of genotypic effects (DIC = 1123.39 for the full model and 1,414.31 for the restricted one). This is justified by the lowest DIC value of the full model. Spiegelhalter et al. (2002) suggest that the use of the complete model can lead to a greater prediction in the estimation of parameters. Subsequent mean estimates for the variance components suggested χ 2 density and normal distributions (Fig. 1). Thus, it is possible to observe that all the characteristics presented a χ 2 distribution (of which the Wishart distribution is a generalization).  The H 2 estimates were different for the mean and posterior density range (HPD) ( Table 2). The highest H 2 values were observed for the PH and IL traits (greater than 80%). On the other hand, the lowest estimates were for the EA and LA traits. The low heritability observed estimate depends on the number of genotypes evaluated. The Bayesian approach used can be recommended for situations involving small sample sizes space (Torres et al. 2018, Silva Junior et al. 2022. The PH, NLP, NNO, IL, and EE traits were considered highly heritable, with a credibility interval (95% probability) ranging from 0.6800 to 0.9538; 0.5890 to 0.9326; 0.5920 to 0.9313; 0.6490 to 0.9494; and 0.6010 to 0.9224, respectively (Table 2). H 2 estimates greater than 70% for these same traits were also found in coffee in a potassium-restricted crop, using analysis of variance (ANOVA) (Moura et al. 2016). The multi-trait Bayesian model has been successfully used in several crops, such as flood-irrigated rice, where H 2 estimates were higher than 80% (Silva Junior et al. 2022), and in maize lines, where the heritability for nitrogen use efficiency was 50%, considered highly heritable (Torres et al. 2018). These authors reported that the Bayesian multi-trait model makes estimates more accurate than in individual models due to taking into account the correlation between the traits. Mora et al. (2019) evaluated E. globulus clones and found moderate to high heritability values for tree heights ranging from 12 to 41% (mode value of the posterior distribution of heritability).
In addition to the statistical model, the heritability of a trait is crucial to improving the prediction (Lorenz et al. 2011, Gill et al. 2021. Low heritability estimates result in lower accuracy in predicting individual trait (Heffner et al. 2009). The application of multi-trait models, in turn, can improve the prediction of poorly heritable characters using information from correlated characters that have high heritability (Jia and Jannink 2012, Jiang et al. 2015, Lado et al. 2018, Gill et al. 2021Bhatta et al. 2020). Jia and Jannink (2012) also indicated that a multi-trait model is more effective when the genetic correlation between traits is moderate. Guo et al. (2020) reported that characters with lower heritability performed better than those with high heritability through the multi-trait model, as it contemplates the interaction between traits × genotypes, and provides a better estimate of the correlation between characters. Schulthess et al. (2018) and Montesinos-López et al. (2018) showed that the performance of multi-trait analysis depends considerably on the number of missing characters in only some individuals or all individuals. Precise estimates of genetic parameters bring new perspectives on the application of Bayesian methods to solve modeling problems in the genetic improvement of arabica coffee for cultivation under low-nitrogen concentration. This is justified by the parameters that lie within the posterior density range (HPD 95%).
The estimates of genetic variance and repetition for each iteration were discrepant for the mean, median, and mode in each character (Table 3). The lowest values were observed for traits of efficiency indexes, NNO and NLP. On the other hand, PH and LA presented the highest estimates. Similar results were reported by Moura et al. (2016) for arabica coffee under potassium limiting conditions, using ANOVA. The multi-trait model used in the present work showed great performance in estimating genetic and residual variances since the estimates are within the posterior density range (HPD 95%). This model presents credibility intervals that are more accurate when compared to the confidence intervals obtained in frequentist inference (Gazola et al. 2016). Table 3. Genetic and residual variance of 11 traits of arabica coffee cultivars in low-nitrogen cultivation, using multi-trait models. The selection gains obtained by the FAI-BLUP index considering three different selection intensities: 35, 50, and 60%, which refers to the selection of seven, 10, and 12 genotypes, for 11 traits of arabica coffee cultivars in an efficient low-nitrogen environment, is represented in Table 4. The FAI-BLUP index indicated discrepant selection gains between different selection intensities for the same character (Table 4). Selection gains increased with increasing selection intensity. The highest GS was estimated for PH and IL regardless of selection intensity. On the other hand, the N absorption efficiency showed gains for all selection intensities. Table 4. Percentage of selection gains, factor number, and commonalities obtained by the FIA-BLUP index considering three different selection intensities: 35, 50, and 60%, which refers to the selection of seven, 10, and 12 genotypes, for eleven traits of arabica coffee cultivars in an efficient low-nitrogen environment.