
Combined index of genomic prediction methods applied to productivity traits in rice

ABSTRACT:

Rice cultivation has great national and global importance: rice is one of the most produced and consumed cereals in the world and the primary food for more than half of the world's population. Because of this importance, developing efficient methods to predict and select genetically superior individuals with respect to plant traits is extremely important for breeding programs. The objective of this research was to evaluate and compare the efficiency of Delta-p, G-BLUP (Genomic Best Linear Unbiased Predictor), BayesCpi, BLASSO (Bayesian Least Absolute Shrinkage and Selection Operator), the Delta-p/G-BLUP index, the Delta-p/BayesCpi index, and the Delta-p/BLASSO index in estimating genomic values and the effects of single nucleotide polymorphism (SNP) markers from phenotypic data on rice traits. The use of molecular markers allows high selective efficiency and increased genetic gain per unit time. The Delta-p method uses the concept of change in allelic frequency caused by selection and the theoretical concept of genetic gain. The index is based on the principle of combined selection and jointly uses the additive genomic values predicted via G-BLUP, BayesCpi, or BLASSO and those predicted via Delta-p. These methods were applied and compared for genomic prediction using nine rice traits: flag leaf length, flag leaf width, panicle number per plant, primary panicle branch number, seed length, seed width, amylose content, protein content, and blast resistance. The Delta-p/G-BLUP index had the highest predictive abilities for the traits studied, except for amylose content, for which BayesCpi had the highest predictive ability, approximately 3% greater than that of the Delta-p/G-BLUP index.

Key words:
genomic prediction; selection index; genetic gain


INTRODUCTION:

Rice (Oryza sativa) is one of the most important crops in the world. Increased rice production has played a key role in food security, especially in developing countries in Asia and Africa (CHEN et al., 2017). Currently, the production of this crop in Brazil is approximately 12,327.8 thousand tons (CONAB, 2018). Although current production supplies the world's population, it is estimated that global rice production must increase by 60 to 110% by 2050 to meet demand (GODFRAY et al., 2010; TILMAN et al., 2011; RAY et al., 2013). Thus, new lines with higher yield than existing varieties need to be developed. According to SPINDEL et al. (2015), conventional breeding and selection methods are extremely time-consuming, taking ten years on average to develop and identify elite varieties.

The continued evolution of sequencing and genotyping technologies has led to a breakthrough in molecular genetics. These advances have enabled the direct use of DNA information to identify genetically superior individuals, thereby shortening the selection cycle to the benefit of plant breeding programs. For this purpose, MEUWISSEN et al. (2001) proposed genome-wide selection (GWS), which consists of analyzing a large number of single nucleotide polymorphisms (SNPs) distributed across the genome in order to capture the genes affecting the quantitative trait of interest. According to MEUWISSEN et al. (2001), it can be assumed that some of these molecular markers are in linkage disequilibrium (LD) with quantitative trait loci (QTL), allowing their direct use in the prediction of the genomic estimated breeding values (GEBVs) of the individuals subject to selection.

Several methods, including the Bayesian methods Bayesian Least Absolute Shrinkage and Selection Operator (BLASSO) and BayesCpi and the mixed-model method Genomic Best Linear Unbiased Predictor (G-BLUP), have been extensively applied to GWS and are recommended for genomic prediction (DE LOS CAMPOS et al., 2012; AZEVEDO et al., 2015). However, new methodologies have been proposed, such as the Delta-p method (RESENDE, 2015; LIMA et al., 2019), which does not require an iterative computational procedure and, consequently, does not require convergence checks. This methodology divides the estimation population into two subpopulations, one associated with higher phenotypic values and the other with lower phenotypic values. Marker effects are estimated non-parametrically from the difference between the allelic frequencies of these two subpopulations and the genetic gain associated with them. To combine the good properties of different methodologies, LIMA et al. (2019) proposed a genomic index, called the Delta-p/G-BLUP index, which combines the genomic values estimated by Delta-p and G-BLUP. The Delta-p/G-BLUP index was more accurate than G-BLUP in genomic prediction. However, the genomic index can also combine Delta-p with predictions from other statistical methodologies described in the literature, such as BLASSO and BayesCpi. Using other methodologies to compose the index is of interest because it exploits the specific properties of each genomic selection method with respect to the genetic architecture of the evaluated traits. In addition, the Bayesian approach has been used successfully in other areas (MACEDO et al., 2014; GARNERO et al., 2014).

Consequently, the goals of the present study were to evaluate the Delta-p/BLASSO and Delta-p/BayesCpi genomic indices and to compare them with the Delta-p/G-BLUP index in terms of their efficiency in predicting the additive genomic values of individuals for traits related to photosynthetic yield, grain quality, yield, and blast resistance in rice.

MATERIALS AND METHODS:

Description of the database

The database used in this study comprised nine traits measured on 352 rice (Oryza sativa) accessions, which were genotyped for 44,100 SNP markers. The dataset is publicly available as part of two projects, the OryzaSNP Project and the OMAP Project (AMMIRAJU et al., 2006), at https://ricediversity.org/data/.

The plantings were monitored throughout the growing season, from May to October of 2006 and 2007. A complete block design with two replications was used, with planting rows 5 m long. Plants were spaced 25 cm apart within rows, with 50 cm between rows. Further details can be found in ZHAO et al. (2011). Quality control was applied to the markers using a call rate threshold of 70% and excluding markers with a minor allele frequency (MAF) below 1%. After quality control, 36,901 SNP markers remained.
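The marker filter described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes genotypes are coded as counts (0, 1, 2) of one allele with `None` for missing calls, and that markers are kept when the call rate is at least 70% and the MAF is at least 1%. The function names are hypothetical.

```python
def call_rate(calls):
    """Fraction of individuals with a non-missing genotype at one marker."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF at one marker; genotypes are counts (0, 1, 2) of one allele."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))
    return min(p, 1.0 - p)


def filter_markers(genotypes, min_call_rate=0.70, min_maf=0.01):
    """Return the column indices of markers that pass both thresholds.

    The thresholds follow the text; treating them as inclusive lower
    bounds is an assumption.
    """
    kept = []
    for i in range(len(genotypes[0])):
        calls = [row[i] for row in genotypes]
        if (call_rate(calls) >= min_call_rate
                and minor_allele_frequency(calls) >= min_maf):
            kept.append(i)
    return kept


# Toy data: 4 individuals (rows) x 3 markers (columns).
genotypes = [
    [0, 0, None],
    [1, 0, None],
    [2, 0, 1],
    [1, 0, 2],
]
print(filter_markers(genotypes))  # marker 1 is monomorphic, marker 2 has call rate 0.5
```

In this toy example only the first marker survives, because the second has MAF equal to zero and the third is missing in half of the individuals.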

The nine traits analyzed are relevant to rice improvement: flag leaf length (FLL), flag leaf width (FLW), amylose content (AC), protein content (PC), panicle number per plant (PNPP), seed length (SL), seed width (SW), primary panicle branch number (PPBN), and blast resistance (BR). FLL and FLW are associated with the photosynthetic yield of the plant, and AC and PC with grain quality. PNPP, SL, SW, and PPBN are associated with plant production, whereas BR is associated with blast, the main rice disease.

Delta-p method

The Delta-p method proposed by RESENDE (2015) and LIMA et al. (2019) is based on the concept of changes in allele frequency due to selection and on genetic gain theory (the contrast between the means of two subpopulations). The method consists of the following steps:

i) The training population is divided into two subpopulations (subpopulation one and subpopulation two) according to the phenotype of the trait corrected for systematic effects;

ii) Calculation of the difference between the allelic frequencies of the subpopulations, $\Delta p_i = p_{i1} - p_{i2}$, where $p_{i1}$ is the frequency of allele A of the ith marker (i = 1, 2, ..., n, with n being the total number of markers) in subpopulation one and $p_{i2}$ is the frequency of allele A of the ith marker in subpopulation two;

iii) Calculation of the average difference between the allelic frequencies of the subpopulations, $\Delta p_m = \frac{\sum_{i=1}^{n} \Delta p_i}{n}$;

iv) Calculation of the average allelic substitution effect, $\hat{\alpha}_m = \frac{0.5\, h_a^2\, (u_1 - u_2)}{2 n\, \Delta p_m}$, where $h_a^2$ is the narrow-sense heritability of the trait, and $u_1$ and $u_2$ are the means of the phenotypic values in subpopulations one and two, respectively;

v) Calculation of the allelic substitution effect of the ith marker, $\hat{\alpha}_i = \frac{\Delta p_i}{\Delta p_m}\, \hat{\alpha}_m$;

vi) Calculation of the additive genomic value of the jth individual (j = 1, 2, ..., N, with N being the total number of individuals in the validation population), $\hat{a}_j = \sum_{i=1}^{n} w_{ji}\, \hat{\alpha}_i$, where $w_{ji}$ are the elements of the jth row of the centered marker incidence matrix $W_v$ of the validation population. The incidence matrix for the vector of additive marker effects (α) is parameterized according to VAN RADEN (2008), VITEZICA et al. (2013), and RESENDE et al. (2014), as follows:

$$W = \begin{cases} 2 - 2p, & \text{if } MM \\ 1 - 2p, & \text{if } Mm \\ 0 - 2p, & \text{if } mm \end{cases}$$
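Steps i) to vi) can be sketched in a few lines of code. This is an illustrative sketch, not the implementation of LIMA et al. (2019). It assumes that the two subpopulations are obtained by splitting the training individuals at the middle of the phenotype ranking (the text does not state the split rule), that genotypes are coded as 0, 1, or 2 copies of allele A, and that the validation genotypes are centered with the allele frequencies of the training population. All function names are hypothetical.

```python
def allele_freqs(genos):
    """Frequency of allele A per marker (genotypes coded 0, 1, 2)."""
    n_ind = len(genos)
    return [sum(row[i] for row in genos) / (2 * n_ind)
            for i in range(len(genos[0]))]


def delta_p_effects(genos, phenos, h2):
    """Marker effects following steps i) to v) of the Delta-p method."""
    # i) split by phenotype rank (assumed rule: upper half vs lower half)
    order = sorted(range(len(phenos)), key=lambda j: phenos[j], reverse=True)
    half = len(order) // 2
    top, bottom = order[:half], order[half:]
    u1 = sum(phenos[j] for j in top) / len(top)
    u2 = sum(phenos[j] for j in bottom) / len(bottom)
    # ii) difference between allele frequencies of the two subpopulations
    p1 = allele_freqs([genos[j] for j in top])
    p2 = allele_freqs([genos[j] for j in bottom])
    dp = [a - b for a, b in zip(p1, p2)]
    # iii) average difference
    n = len(dp)
    dp_m = sum(dp) / n
    if dp_m == 0:
        raise ValueError("average allele frequency difference is zero")
    # iv) average allelic substitution effect
    alpha_m = 0.5 * h2 * (u1 - u2) / (2 * n * dp_m)
    # v) effect of each marker
    return [d / dp_m * alpha_m for d in dp]


def predict(genos_val, effects, p_train):
    """vi) additive genomic values, with w_ji = genotype - 2 * p_i."""
    return [sum((g - 2 * p) * a for g, p, a in zip(row, p_train, effects))
            for row in genos_val]


# Toy example: 4 training individuals, 2 markers, heritability 0.5.
train = [[2, 1], [2, 0], [0, 1], [0, 2]]
y = [10.0, 8.0, 4.0, 2.0]
effects = delta_p_effects(train, y, h2=0.5)
gebv = predict([[2, 0]], effects, allele_freqs(train))
print(effects, gebv)
```

No iteration is involved: the effects follow directly from two sets of allele frequencies and two phenotypic means, which is why the method needs no convergence check.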

G-BLUP method

The Genomic Best Linear Unbiased Predictor (G-BLUP) method is based on the following linear mixed model:

$$y = 1\mu + Za + e,$$

where y is the vector of phenotypes (N x 1, N being the number of genotyped and phenotyped individuals); µ is the general mean and 1 is a vector of ones (N x 1); a is the vector of additive genomic values (N x 1) with incidence matrix Z (N x N), assumed to follow $a \sim N(0, G_a\sigma_a^2)$, where $\sigma_a^2$ is the additive genetic variance and $G_a$ (N x N) is the additive genomic relationship matrix; and e is the random residual vector, assumed to follow $e \sim N(0, I\sigma_e^2)$, where $\sigma_e^2$ is the residual variance and I is an identity matrix.

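The mixed model equations for prediction via the G-BLUP method (with X = 1 and $\hat{b} = \hat{\mu}$ for the model above) are:

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + G_a^{-1}\dfrac{\sigma_e^2}{\sigma_a^2} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$$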

with the variance components, $\sigma_a^2$ and $\sigma_e^2$, estimated by the restricted maximum likelihood (REML) method. According to VITEZICA et al. (2013), the genomic relationship matrix for additive effects, $G_a$, is given by:

$$G_a = \frac{WW'}{\sum_{i=1}^{n} 2 p_i q_i}$$

where $p_i$ and $q_i = 1 - p_i$ are the allelic frequencies of the ith marker and W is the incidence matrix of the markers in the training population.
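The construction of the additive genomic relationship matrix can be illustrated with a short sketch. It assumes genotypes coded as 0, 1, or 2 copies of the reference allele, centers each marker by twice its allele frequency as in the parameterization of W given earlier, and uses plain lists so that it runs without external libraries. The function name is hypothetical, and the sketch is not the routine used by the sommer package.

```python
def genomic_relationship(genos):
    """Additive genomic relationship matrix G_a = WW' / sum(2 * p_i * q_i)."""
    n_ind = len(genos)
    n_mark = len(genos[0])
    # allele frequency of each marker
    p = [sum(row[i] for row in genos) / (2 * n_ind) for i in range(n_mark)]
    # centered incidence matrix: w_ji = genotype - 2 * p_i
    w = [[row[i] - 2 * p[i] for i in range(n_mark)] for row in genos]
    denom = sum(2 * pi * (1 - pi) for pi in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    return [[sum(w[j][i] * w[k][i] for i in range(n_mark)) / denom
             for k in range(n_ind)]
            for j in range(n_ind)]


# Toy example: 3 individuals, 2 markers.
G = genomic_relationship([[2, 0], [0, 2], [1, 1]])
for row in G:
    print(row)
```

In the toy data the first two individuals carry opposite homozygous genotypes at both markers, so their genomic relationship is negative, while the double heterozygote sits at the population mean and has zero relationship with everyone.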

BLASSO method

The Bayesian version of LASSO regression (BLASSO) for genomic selection was proposed by DE LOS CAMPOS et al. (2009). BLASSO (Bayesian Least Absolute Shrinkage and Selection Operator) includes a variance term that is common to the marker effects and the residuals. The basic linear model for predicting marker effects is:

$$y = 1\mu + W\alpha + e, \qquad (1)$$

where y is the vector of phenotypes of the training population, µ is the general mean, 1 is a vector of ones with the same dimension as y, α is the vector of allelic substitution effects of the markers with incidence matrix W, and e is the residual vector. The prior distributions of the parameters, expressed as an augmented hierarchical model, are:

$$e \mid \sigma^2 \sim MVN(0, I\sigma^2)$$

$$\alpha \mid \lambda_a, \sigma^2 \sim N(0, D_a\sigma^2)$$

$$p(\tau_a^2 \mid \lambda_a^2) = \prod_i \frac{\lambda_a^2}{2}\, e^{-\lambda_a^2 \tau_{ai}^2 / 2}$$

in which MVN represents the multivariate normal distribution, $\lambda_a$ is the regularization ("sharpness") parameter, which can be estimated from the data by Markov chain Monte Carlo (MCMC) using a non-informative prior, $\sigma^2$ has a scaled inverse chi-square prior, and $D_a = \mathrm{diag}(\tau_{a1}^2, \tau_{a2}^2, \ldots, \tau_{an}^2)$. This leads to a double exponential distribution for the allelic substitution effects (PARK & CASELLA, 2008), as follows:

$$\alpha_i \mid \lambda_a^2 \sim \text{Double Exponential}\left(0, \frac{\sigma}{\lambda_a}\right).$$

The additive genetic variance of each marker is given by $\sigma_{\alpha_i}^2 = \tau_{ai}^2\sigma^2$, with i = 1, 2, ..., n. The total additive genetic variance can therefore be estimated as $\sigma_a^2 = \sum_{i=1}^{n} 2 p_i q_i \sigma_{\alpha_i}^2 = \sum_{i=1}^{n} 2 p_i q_i \tau_{ai}^2 \sigma^2$. The additive genomic values are estimated as $\hat{a} = W\hat{\alpha}$. The full conditional distributions of the BLASSO parameters are presented in detail by DE LOS CAMPOS et al. (2009).
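The double exponential form of the marginal prior can be checked by integrating the marker-specific variance out of the hierarchy. The worked step below is a sketch under the parameterization used in this section (a normal effect with variance $\tau_{ai}^2\sigma^2$ and an exponential prior with rate $\lambda_a^2/2$ on $\tau_{ai}^2$). It relies on the standard scale-mixture representation of the Laplace density, and the notation is chosen here for illustration.

```latex
\begin{align*}
p(\alpha_i \mid \lambda_a, \sigma^2)
  &= \int_0^\infty
     \frac{1}{\sqrt{2\pi\,\tau_{ai}^2\sigma^2}}
     \exp\!\left(-\frac{\alpha_i^2}{2\,\tau_{ai}^2\sigma^2}\right)
     \frac{\lambda_a^2}{2}
     \exp\!\left(-\frac{\lambda_a^2\,\tau_{ai}^2}{2}\right)
     d\tau_{ai}^2 \\
  &= \frac{\lambda_a}{2\sigma}
     \exp\!\left(-\frac{\lambda_a\,|\alpha_i|}{\sigma}\right)
\end{align*}
```

The result is a double exponential density with location 0 and scale $\sigma/\lambda_a$, whose variance is $2\sigma^2/\lambda_a^2$. Larger values of $\lambda_a$ therefore concentrate the marker effects more tightly around zero.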

BayesCpi method

The BayesCpi method was proposed by HABIER et al. (2011) to allow variable selection and Bayesian learning from the data. The prior distributions assumed for the parameters of model (1) under this method are:

$$\alpha_i \mid \pi, \sigma_\alpha^2 \sim (1 - I_{ai}) \cdot 0 + I_{ai}\, N(0, \sigma_\alpha^2)$$

$$\sigma_\alpha^2 \sim \chi^{-2}(\nu_a, S_a^2)$$

in which the indicator variables $I_a = (I_{a1}, \ldots, I_{an})$ follow a binomial distribution with probability π, so that $\alpha_i$ is exactly zero when $I_{ai} = 0$ and normally distributed when $I_{ai} = 1$. The mixing probability π is assigned a beta prior distribution. The additive genetic variance is given by $\sigma_a^2 = \sum_{i=1}^{n} 2 p_i q_i \sigma_\alpha^2 I_{ai}$.

In this study, the MCMC algorithms of the Bayesian methods were run for 300,000 iterations, of which the first 20,000 were discarded as burn-in, and one in every 10 iterations was retained (thinning). Convergence was assessed using the criterion proposed by GEWEKE (1992).
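The chain settings above determine how many posterior samples are kept, and the Geweke criterion compares the mean of an early segment of the chain with the mean of a late segment. The sketch below illustrates both ideas under simplifying assumptions: the z-score uses plain sample variances, whereas the original criterion (and implementations such as `geweke.diag` in the R package coda) uses spectral density estimates that account for autocorrelation. The segment fractions of 10% and 50% are the conventional defaults, not values stated in the text, and the function names are hypothetical.

```python
import math


def retained_samples(n_iter, burn_in, thin):
    """Number of MCMC samples kept after burn-in and thinning."""
    return (n_iter - burn_in) // thin


def simple_geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score (naive variances, autocorrelation ignored)."""
    n = len(chain)
    a = chain[: int(first * n)]          # early segment
    b = chain[n - int(last * n):]        # late segment

    def mean(x):
        return sum(x) / len(x)

    def var(x):
        m = mean(x)
        return sum((v - m) ** 2 for v in x) / (len(x) - 1)

    return (mean(a) - mean(b)) / math.sqrt(var(a) / len(a) + var(b) / len(b))


kept = retained_samples(300_000, 20_000, 10)
print(kept)

# A toy chain that alternates between 0 and 1 has equal means in both segments.
z = simple_geweke_z([0.0, 1.0] * 50)
print(z)
```

With the settings reported in the text, 28,000 samples remain for posterior inference. An absolute z-score well above about 2 would suggest that the early and late parts of the chain differ, which is usually read as a sign of non-convergence.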

Genomic index

The genomic index is defined as (RESENDE et al., 2015; LIMA et al., 2019):

$$I = b_1\hat{a}_1 + b_2\hat{a}_2,$$

which combines the genomic values predicted through G-BLUP, BLASSO, or BayesCpi ($\hat{a}_1$) with those predicted through the Delta-p method ($\hat{a}_2$), weighted by the coefficients $b_1$ and $b_2$, respectively. The weights are given by:

$$b_1 = \frac{1 - r_{\hat{a}_2 y}^2\, \theta^2}{1 - r_{\hat{a}_1 y}^2\, r_{\hat{a}_2 y}^2\, \theta^2}; \qquad b_2 = \frac{1 - r_{\hat{a}_1 y}^2}{1 - r_{\hat{a}_1 y}^2\, r_{\hat{a}_2 y}^2\, \theta^2}$$

in which $r_{\hat{a}_1 y}^2$ is the square of the correlation between the phenotype and the additive genomic values predicted via G-BLUP, BLASSO, or BayesCpi ($\hat{a}_1$), $r_{\hat{a}_2 y}^2$ is the square of the correlation between the phenotype and the additive genomic values predicted through Delta-p ($\hat{a}_2$), and $\theta^2 = \sigma_{a_2}^2 / \sigma_{a_1}^2$, that is, the ratio between the additive genetic variances.
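The index calculation can be sketched as below. This is an illustration and not the published implementation. It reads the weights as b1 = (1 - r2² k) / (1 - r1² r2² k) and b2 = (1 - r1²) / (1 - r1² r2² k), where r1² and r2² are the squared correlations of the two predictors with the phenotype and k is the ratio between the additive genetic variances. The grouping of terms in the numerator of b1 is an assumption of this sketch, and the names `index_weights`, `genomic_index`, and `var_ratio` are hypothetical.

```python
def index_weights(r2_1, r2_2, var_ratio=1.0):
    """Weights b1 and b2 of the genomic index.

    r2_1: squared correlation between phenotype and the first predictor
          (G-BLUP, BLASSO, or BayesCpi).
    r2_2: squared correlation between phenotype and the Delta-p predictor.
    var_ratio: ratio between the additive genetic variances (assumed
          placement in the formula, see the lead-in).
    """
    denom = 1.0 - r2_1 * r2_2 * var_ratio
    b1 = (1.0 - r2_2 * var_ratio) / denom
    b2 = (1.0 - r2_1) / denom
    return b1, b2


def genomic_index(a1, a2, b1, b2):
    """Index value I = b1 * a1 + b2 * a2 for each individual."""
    return [b1 * x + b2 * y for x, y in zip(a1, a2)]


# Toy example with equal additive variances (var_ratio = 1).
b1, b2 = index_weights(r2_1=0.25, r2_2=0.16)
index = genomic_index([1.0, 2.0], [2.0, 0.0], b1=0.5, b2=0.25)
print(b1, b2, index)
```

When the variance ratio equals one, the weights reduce to the classical combined-selection form, in which the predictor with the higher squared correlation with the phenotype receives the larger weight.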

Cross-validation and comparison between methods

The validation procedure chosen was k-fold cross-validation with k = 4. The phenotypic dataset of 352 individuals was divided into four groups of 88 individuals each. In each replicate of the analysis, three groups formed the training population and were used to estimate the SNP marker effects. The remaining group formed the validation population, whose additive genomic values were predicted from the marker effects estimated in the training population, and the efficiency measures described below were then calculated. The process was repeated so that each of the four groups served once as the validation population. At the end of the validation process, the arithmetic means and standard deviations of the efficiency measures were computed across the four folds and reported as the general results.

The efficiency measures were: (i) the molecular heritability, $h^2 = \sigma_a^2 / \sigma_y^2$, where $\sigma_y^2$ is the phenotypic variance and $\sigma_a^2$ is the additive genetic variance, estimated by REML in G-BLUP and through $\sigma_a^2 = \sum_{i=1}^{n} 2 p_i q_i \sigma_{\alpha_i}^2$ in the Bayesian methods, where $p_i$ and $q_i$ are the allelic frequencies and $\sigma_{\alpha_i}^2$ is the additive genetic variance of the ith marker; (ii) the predictive ability of the method, $r_{\hat{a},y}$, that is, the correlation between the estimated genomic value and the phenotypic value; (iii) the regression coefficient between the estimated genomic value and the phenotypic value, $\hat{\beta}_{\hat{a},y}$; (iv) the predictive ability of the index, $r_{I,y}$, that is, the correlation between the genomic value estimated by the index and the phenotypic value; and (v) the regression coefficient between the genomic value estimated via the index and the phenotypic value, $\hat{\beta}_{I,y}$.
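The validation scheme and two of the efficiency measures can be sketched as follows. The sketch assumes a random assignment of individuals to folds (the text does not say how the groups were formed) and takes the regression coefficient as the slope of the regression of the phenotype on the predicted genomic value, so that a slope below one indicates overestimated genomic values. The function names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def predictive_ability(pred, pheno):
    """Pearson correlation between predicted genomic and phenotypic values."""
    mx, my = sum(pred) / len(pred), sum(pheno) / len(pheno)
    sxy = sum((a - mx) * (b - my) for a, b in zip(pred, pheno))
    sxx = sum((a - mx) ** 2 for a in pred)
    syy = sum((b - my) ** 2 for b in pheno)
    return sxy / math.sqrt(sxx * syy)


def regression_coefficient(pred, pheno):
    """Slope of the regression of the phenotype on the predicted value."""
    mx, my = sum(pred) / len(pred), sum(pheno) / len(pheno)
    sxy = sum((a - mx) * (b - my) for a, b in zip(pred, pheno))
    sxx = sum((a - mx) ** 2 for a in pred)
    return sxy / sxx


folds = k_fold_indices(352, 4)
for f, validation in enumerate(folds):
    # the other three folds form the training population
    training = [j for g, fold in enumerate(folds) if g != f for j in fold]
    print(f, len(training), len(validation))

# Toy check: predictions twice as spread out as the phenotypes.
r = predictive_ability([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
slope = regression_coefficient([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])
print(r, slope)
```

In the toy check the predictions are perfectly correlated with the phenotypes, yet the slope is 0.5, which shows that predictive ability and bias are separate properties of a prediction method.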

Computational resource

All computational routines of the proposed methods were implemented in R (R Development Core Team, 2018). For G-BLUP, the mmer function of the sommer package was used; for BLASSO and BayesCpi, the BGLR function of the BGLR package was used. The algorithms for the Delta-p method and the Delta-p/G-BLUP index were implemented by LIMA et al. (2019) and are available at https://licaeufv.wordpress.com/pesquisas-research/.

RESULTS AND DISCUSSION:

The means and respective standard deviations of the molecular heritability, predictive ability, and regression coefficient between genomic value and phenotypic value for the Delta-p and G-BLUP methods, as well as the predictive ability and regression coefficient between the genomic value estimated through the Delta-p/G-BLUP index and the phenotypic value, are shown in Table 1.

Table 1
Mean and standard deviation of the molecular heritability ($h^2$), predictive ability ($r_{\hat{a},y}$), and regression coefficient between the genomic value and the phenotypic value ($\hat{\beta}_{\hat{a},y}$) for each method (Delta-p and G-BLUP), and predictive ability ($r_{I,y}$) and regression coefficient between the genomic value estimated via the index and the phenotypic value ($\hat{\beta}_{I,y}$) for the Delta-p/G-BLUP index.

The means and respective standard deviations of the molecular heritability, predictive ability, and regression coefficient between genomic value and phenotypic value for the Bayesian methods (BLASSO and BayesCpi), as well as the predictive ability and regression coefficient between the genomic value estimated through the indices (Delta-p/BLASSO and Delta-p/BayesCpi) and the phenotypic value, are shown in Table 2.

Table 2
Mean and standard deviation of the molecular heritability ($h^2$), predictive ability ($r_{\hat{a},y}$), and regression coefficient between the genomic value and the phenotypic value ($\hat{\beta}_{\hat{a},y}$) for each method (BLASSO and BayesCpi), and predictive ability ($r_{I,y}$) and regression coefficient between the genomic value estimated via the index and the phenotypic value ($\hat{\beta}_{I,y}$) for each index (Delta-p/BLASSO and Delta-p/BayesCpi).

Predictive ability

The Delta-p/G-BLUP, Delta-p/BayesCpi, and Delta-p/BLASSO indices presented higher predictive abilities than the G-BLUP, BayesCpi, and BLASSO methods, respectively, for all traits. The ratios between predictive abilities show that the Delta-p/G-BLUP, Delta-p/BLASSO, and Delta-p/BayesCpi indices were, on average, 9.7%, 3.6%, and 3.3% more efficient in genomic prediction than the traditionally applied G-BLUP, BLASSO, and BayesCpi methods, respectively. A substantial advantage is that these gains in predictive ability come at no additional computational cost. Moreover, according to RESENDE (2015), gains of 5% in predictive ability and accuracy are already significant in plant breeding, often being equivalent to the gain obtained in a complete cycle of genetic improvement. Thus, because genomic selection is performed over short cycles, these gains are cumulative and grow rapidly. The indices therefore improved the prediction of the GEBVs, providing higher predictive abilities than the individual methods.

For all traits, the Delta-p method presented lower predictive abilities than the G-BLUP, BLASSO, and BayesCpi methods, because each method uses different genetic information. According to LIMA et al. (2019), the Delta-p method uses only linkage disequilibrium information, whereas the other methods, such as G-BLUP and the Bayesian methods, also use the relationship information between individuals. In addition, AZEVEDO et al. (2016) reported that when genomic prediction considers only linkage disequilibrium, the predictive ability should be less than or equal to that obtained when the relationship between individuals is also considered, which corroborates the results of our study. However, combining Delta-p with G-BLUP in the index appears to capture additional genetic information that benefits genomic prediction.

In addition, the G-BLUP, BLASSO, and BayesCpi methods had similar predictive abilities, in agreement with results reported in the literature (AZEVEDO et al., 2015; GIANOLA, 2013; DE LOS CAMPOS et al., 2012), which point out the similarity of several methods in terms of predictive ability for genomic values. GUO et al. (2014), using G-BLUP, also reported similar predictive abilities for the same traits.

Regression Coefficient

In GWS, the regression coefficient of the phenotype on the estimated genomic value should ideally be close to one, indicating that the estimated values are unbiased. Regression coefficients below one indicate that the genomic values are overestimated, and coefficients above one indicate that they are underestimated. According to the results, the genomic values estimated by the three indices were overestimated, except for amylose content, for which the regression coefficient was equal to one with the Delta-p/BLASSO index and the values were underestimated with the Delta-p/BayesCpi index. In addition, G-BLUP had regression coefficients closer to one than the Delta-p method and the Delta-p/G-BLUP index, and BayesCpi had values closer to one than the Delta-p/BayesCpi index. The lower regression coefficients of these indices may be due to the Delta-p method because, as previously reported, this method generates regression coefficients greater than one. In turn, for flag leaf width, seed width, and amylose content, the Delta-p/BLASSO index had regression coefficients closer to one than the BLASSO method.
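The interpretation of the regression coefficient can be illustrated with simulated data. The sketch below is an assumption-laden example, not the study's analysis: the simulated values, the scaling factors, and the function name are hypothetical. It shows that rescaling the predictions changes the slope while leaving the correlation, and therefore the ranking of individuals, unchanged.

```python
import numpy as np


def regression_slope(phenotype, gebv):
    """Slope of the regression of phenotype on predicted genomic value."""
    phenotype = np.asarray(phenotype, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    return float(np.cov(phenotype, gebv)[0, 1] / np.var(gebv, ddof=1))


rng = np.random.default_rng(0)
true_value = rng.normal(size=200)
phenotype = true_value + rng.normal(scale=0.5, size=200)

unbiased = true_value          # slope expected near 1
inflated = 2.0 * true_value    # predictions too spread out: slope below 1
shrunken = 0.5 * true_value    # predictions too compressed: slope above 1

for name, g in [("unbiased", unbiased), ("inflated", inflated), ("shrunken", shrunken)]:
    r = np.corrcoef(phenotype, g)[0, 1]
    print(name, round(regression_slope(phenotype, g), 2), round(float(r), 2))
```

Because the three sets of predictions differ only in scale, they order the individuals identically, so a slope far from one affects the magnitude of the predicted values rather than which individuals would be selected.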

Heritabilities

The BLASSO and BayesCpi methods gave similar heritability estimates, which corroborates the results of AZEVEDO et al. (2015), who also found similarities among Bayesian methods in estimating genomic heritability. The heritabilities estimated by G-BLUP were similar to those reported by GUO et al. (2014) for the same dataset. The Delta-p and G-BLUP methods resulted in lower heritability values than the Bayesian methods. According to XING & ZHANG (2010) and VALLURU et al. (2014), quantitative traits, such as those used in this study, are generally known to have low heritability and to be difficult to investigate.

Pedigree-based heritability estimates for some of the traits, such as panicles number per plant, flag leaf width, flag leaf length, and seed length, are reported in the literature (XU et al., 2018; SUMANTH et al., 2017; AKINWALE et al., 2011; SEYOUM et al., 2012; SINGH et al., 2011; OLADOSU et al., 2014). However, according to DE LOS CAMPOS & SORENSEN (2013) and DE LOS CAMPOS et al. (2015), these heritability values are always higher than the genomic or molecular heritability, because molecular heritability is the fraction of the pedigree-based heritability that is captured by the markers.
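One simple way to read this statement is sketched below, under a toy model in which the markers capture a proportion `captured` of the additive variance and the phenotypic variance is the sum of additive and residual variance. The function name, the parameter, and the variance values are hypothetical and only illustrate the arithmetic.

```python
def genomic_heritability(var_additive, var_residual, captured=1.0):
    """Heritability when markers capture a share `captured` of the additive variance.

    With captured=1.0 this returns the pedigree-based value under the toy model.
    """
    if not 0.0 <= captured <= 1.0:
        raise ValueError("captured must lie in [0, 1]")
    var_phenotypic = var_additive + var_residual
    return captured * var_additive / var_phenotypic


# Hypothetical variance components.
h2_pedigree = genomic_heritability(4.0, 6.0)
h2_genomic = genomic_heritability(4.0, 6.0, captured=0.75)
print(h2_pedigree, h2_genomic)
```

Since `captured` cannot exceed one in this model, the genomic value never exceeds the pedigree-based value, which matches the ordering described in the text.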

CONCLUSION:

In general, the Delta-p/G-BLUP index has higher predictive ability for genomic values than the traditional methods (G-BLUP, BLASSO, and BayesCpi) and the Bayesian indices, besides being easy to implement and requiring no additional computational cost. Conversely, the genomic indices presented greater bias in the predictions of the individual genomic values. The results indicate that the indices have greater potential for ranking and selecting genetically superior individuals than for exact inference about how much they will produce when commercially planted.

ACKNOWLEDGMENTS

The authors thank the following Brazilian funding organizations: the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for their financial support.

REFERENCES

AKINWALE, M. G. et al. Heritability and correlation coefficient analysis for yield and its components in rice (Oryza sativa L.). African Journal of Plant Science, v.5, n.3, p.207-212, 2011. Available from: <http://www.academicjournals.org/app/webroot/article/article1379945851_Akinwale%20et%20al.pdf>. Accessed: Sep. 13, 2018. doi: 10.5897/AJPS.

AZEVEDO, C.F. et al. Ridge, LASSO and Bayesian Additive-dominance genomic models. BMC Genetics, v.16, p.105, 2015. Available from: <https://bmcgenet.biomedcentral.com/articles/10.1186/s12863-015-0264-2>. Accessed: Sep. 11, 2018. doi: 10.1186/s12863-015-0264-2.

AZEVEDO, C.F. et al. New accuracy estimators for genomic selection with application in a cassava (Manihot esculenta) breeding program. Genet Mol Res, v.15, n.4, 2016. Available from: <http://dx.doi.org/10.4238/gmr.15048838>. Accessed: Nov. 7, 2018. doi: 10.4238/gmr.15048838.

DE LOS CAMPOS, G. et al. Whole genome regression and prediction methods applied to plant and animal breeding. Genetics, v.193, p.327-45, 2012. Available from: <https://doi.org/10.1534/genetics.112.143313>. Accessed: Nov. 10, 2018.

DE LOS CAMPOS, G.; SORENSEN, D. A. A commentary on Pitfalls of predicting complex traits from SNPs. Nature Reviews Genetics, 14: 894-894, 2013. Available from: <http://dx.doi.org/10.1038/nrg3457-c1>. Accessed: Sep. 20, 2018.

DE LOS CAMPOS, G. et al. Genomic heritability: what is it? PLoS Genetics, v.11, n.5, p.e1005048, 2015. Available from: <https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005048>. Accessed: Sep. 20, 2018. doi: 10.1371/journal.pgen.1005048.

GIANOLA, D. Priors in whole-genome regression: the bayesian alphabet returns. Genetics, v.194, n.3, p.573-96, 2013. Available from: <https://doi.org/10.1534/genetics.113.151753>. Accessed: Nov. 7, 2018. doi: 10.1534/genetics.113.151753.

GUO, Z. et al. The impact of population structure on genomic prediction in stratified populations. Theoretical and Applied Genetics, v.127, p.749-762, 2014. Available from: <https://www.ncbi.nlm.nih.gov/pubmed/24452438>. Accessed: Nov. 08, 2018. doi: 10.1007/s00122-013-2255-x.

LIMA, L.P. et al. New insights into genomic selection through population-based non-parametric prediction methods. Sci. Agric., v.76, n.4, p.290-298, July/August, 2019. doi: http://dx.doi.org/10.1590/1678-992X-2017-0351.

OLADOSU, Y. et al. Genetic Variability and Selection Criteria in Rice Mutant Lines as Revealed by Quantitative Traits. The Scientific World Journal, v.2014, p.190531, 2014. Available from: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4241329/>. Accessed: Sep. 23, 2018. doi: 10.1155/2014/190531.

RESENDE, M.D.V. Genética Quantitativa e de Populações. Visconde do Rio Branco: Suprema, 2015. 463p.

SEYOUM, M. et al. Genetic variability, heritability, correlation coefficient and path analysis for yield and yield related traits in upland rice (Oryza sativa L.). Journal of Plant Sciences, v.7, n.1, p.13-22, 2012. Available from: <http://docsdrive.com/pdfs/academicjournals/jps/2012/13-22.pdf>. Accessed: Sep. 25, 2018. ISSN 1816-4951 / doi: 10.3923/jps.2012.13.22.

SINGH, S. K. et al. Assessment of genetic variability for yield and its component characters in rice (Oryza sativa L.). Research in Plant Biology, v.1, n.4, p.73-76, 2011. Available from: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.3827&rep=rep1&type=pdf>. Accessed: Sep. 13, 2018. doi: 10.23910/ijbsm/2018.9.1.3c0818.

SUMANTH, V. et al. Estimation of genetic variability, heritability and genetic advance for grain yield components in rice (Oryza sativa L.). Journal of Pharmacognosy and Phytochemistry, v.6, n.4, p.1437-1439, 2017. Available from: <http://www.phytojournal.com/archives/2017/vol6issue4/PartU/6-4-59-319.pdf>. Accessed: Sep. 20, 2018. E-ISSN: 2278-4136.

VALLURU, R. et al. Genetic and molecular bases of yield associated traits: A translational biology approach between rice and wheat. Theoretical and Applied Genetics, v.127, n.7, p.1463-1489, 2014. Available from: <https://www.researchgate.net/profile/Matthew_Reynolds5/publication/262978594_Genetic_and_molecular_bases_of_yield-associated_traits_A_translational_biology_approach_between_rice_and_wheat/links/55b7c1c308ae092e965742e5/Genetic-and-molecular-bases-of-yield-associated-traits-A-translational-biology-approach-between-rice-and-wheat.pdf?origin=publication_detail>. Accessed: Sep. 11, 2018. doi: 10.1007/s00122-014-2332-9.

XING, Y.; ZHANG, Q. Genetic and Molecular Bases of Rice Yield. Annual Review of Plant Biology, v.61, p.421-442, 2010. Available from: <https://www.researchgate.net/profile/Yongzhong_Xing/publication/41654874_Genetic_and_Molecular_Bases_of_Rice_Yield/links/02e7e51de056f2702c000000/Genetic-and-Molecular-Bases-of-Rice-Yield.pdf?origin=publication_detail>. Accessed: Sep. 16, 2018. doi: 10.1146/annurev-arplant-042809-112209.

XU, Y. et al. Genomic selection of agronomic traits in hybrid rice using an NCII population. Rice, v.11, p.32, 2018. Available from: <https://www.ncbi.nlm.nih.gov/pubmed/29748895>. Accessed: Sep. 03, 2018. doi: 10.1186/s12284-018-0223-4.

  • 0
    CR-2018-1008.R1

Publication Dates

  • Publication in this collection
    19 June 2019
  • Date of issue
    2019

History

  • Received
    09 Dec 2018
  • Accepted
    18 Apr 2019
  • Reviewed
    15 May 2019
Universidade Federal de Santa Maria, Centro de Ciências Rurais, 97105-900 Santa Maria, RS, Brazil. Tel.: +55 55 3220-8698, Fax: +55 55 3220-8695
E-mail: cienciarural@mail.ufsm.br