ABSTRACT:
Genomewide selection (GWS) is based on a large number of markers widely distributed throughout the genome. Genomewide selection provides for the estimation of the effect of each molecular marker on the phenotype, thereby allowing for the capture of all genes affecting the quantitative traits of interest. The main statistical tools applied to GWS are based on random regression or dimensionality reduction methods. In this study a new nonparametric method, called Deltap was proposed, which was then compared to the Genomic Best Linear Unbiased Predictor (GBLUP) method. Furthermore, a new selection index combining the genetic values obtained by the GBLUP and Deltap, named Deltap/GBLUP methods, was proposed. The efficiency of the proposed methods was evaluated through both simulation and real studies. The simulated data consisted of eight scenarios comprising a combination of two levels of heritability, two genetic architectures and two dominance status (absence and complete dominance). Each scenario was simulated ten times. All methods were applied to a real dataset of Asian rice (Oryza sativa) aiming to increase the efficiency of a current breeding program. The methods were compared as regards accuracy of prediction (simulation data) or predictive ability (real dataset), bias and recovery of the true genomic heritability. The results indicated that the proposed Deltap/GBLUP index outperformed the other methods in both prediction accuracy and predictive ability.
Keywords:
genomic prediction; selection index; genetic gain; asian rice
Introduction
The evolution of sequencing and genotyping techniques has prompted a great advance in molecular genetics becoming a possibility using DNA information directly for the selection of genetically superior individuals. This framework is called genomewide selection (GWS), which enables direct early markerbased selection that generally increases the genetic gain per unit of time (Meuwissen et al., 2001Meuwissen, T.H.E.; Hayes, B.J.; Goddard, M.E. 2001. Prediction of total genetic value using genomewide dense marker maps. Genetics 157: 18191829.). Genomewide selection is based on the estimation of molecular marker effects on the phenotypes, thus providing accurate additive and dominant genomic value predictions for individuals subject to selection, even for those without observed phenotypic values.
The Genomic Best Linear Unbiased Predictor (GBLUP) (Goddard, 2009Goddard, M.E. 2009. Genomic selection: prediction of accuracy and maximization of long term response. Genetics 136: 345357.; Habier et al., 2007Habier, D.; Fernando, R.L.; Dekkers, J.C.M. 2007. The impact of genetic relationship on genomeassisted breeding values. Genetics 117: 23892397.; Van Raden, 2008Van Raden, P.M. 2008. Efficient methods to compute genomic predictions. Journal of Dairy Science 91: 44144423.) has been extensively applied to GWS and recommended for additive and dominant genomic value predictions (Azevedo et al., 2015Azevedo, C.F.; Resende, M.D.V.; Silva, F.F.; Viana, J.M.S.; Valente, M.S.F.; Resende Junior, M.F.R.; Muñoz, P. 2015. Ridge, Lasso and Bayesian additivedominance genomic models. BMC Genetics 16: 105.; De los Campos et al., 2013De los Campos, G.; Hickey, J.M.; PongWong, R.; Daetwyler, H.D.; Callus, M.P.L. 2013. Whole genome regression and prediction methods applied to plant and animal breeding. Genetics 193: 327345.; Gianola, 2013Gianola, D. 2013. Priors in wholegenome regression: the Bayesian alphabet returns. Genetics 194: 573596.; Gianola et al., 2009Gianola, D.; De los Campos, G.; Hill, W.G.; Manfredi, E.; Fernando, R. 2009. Additive genetic variability and the Bayesian alphabet. Genetics 183: 347363.). However, nonparametric and purely conceptual methods can also be used for this purpose. The Deltap method was theoretically proposed by Resende (2015)Resende, M.D.V. 2015. Quantitative and Population Genetics = Genética Quantitativa e de Populações. Editora Suprema, Visconde do Rio Branco, MG, Brazil (in Portuguese). for application to genomic selection, but has not been implemented and evaluated so far. The method is based on the genetic distance between two subpopulations, using the concepts of changes in allele frequency due to selection and the genetic gain theory. It is also possible to combine the genomic values from the GBLUP with the genomic values provided by Deltap under a selection index framework called Deltap/GBLUP.
The aim of this study was to propose the Deltap and Deltap/GBLUP index, and to compare it with the traditional GBLUP method. The efficiency of these methods was evaluated through simulation and real studies. The simulated dataset consisted of eight scenarios comprising the combination of two levels of heritability, two genetic architectures and two dominance status (absence and complete dominance). The empirical dataset consisted of 370 rice accessions phenotyped for seven yield traits, which were genotyped for 44,100 SNP markers.
Materials and Methods
Simulated datasets
The data set was simulated using the Real Breeding software program (Viana, 2011Viana, J.M.S. 2011. Program for molecular and quantitative data analysis Real Breeding = Programa para análises de dados moleculares e quantitativos Real Breeding. Universidade Federal de Viçosa, Viçosa, MG, Brazil (in Portuguese).), and the generation of data has been described by Azevedo et al. (2015)Azevedo, C.F.; Resende, M.D.V.; Silva, F.F.; Viana, J.M.S.; Valente, M.S.F.; Resende Junior, M.F.R.; Muñoz, P. 2015. Ridge, Lasso and Bayesian additivedominance genomic models. BMC Genetics 16: 105.. A population of 5,000 individuals from 100 families, generated from the random mating of two linkage equilibrium populations, was subjected to five generations of random mating without selection, mutation nor migration. The final population is an advanced generation composite in HardyWeinberg equilibrium and linkage disequilibrium (LD).
The prediction of genomic breeding value based on thousands of single nucleotide polymorphisms (SNPs) depends on LD between markers and QTL. Thus, the LD value (Δ) in a composite population can be calculated from ${\mathrm{\Delta}}_{ab}=\left(\frac{12{\text{\theta}}_{ab}}{4}\right)\left({p}_{a}^{1}{p}_{a}^{2}\right)\left({p}_{b}^{1}{p}_{b}^{2}\right)$ (Kempthorne, 1973Kempthorne, O. 1973. An Introduction to Genetic Statistics. Iowa State University Press, Ames, IA, USA.; Viana, 2004Viana, J.M.S. 2004. Quantitative genetics theory for noninbred populations in linkage disequilibrium. Genetics and Molecular Biology 27: 594601.; Viana et al., 2016Viana, J.M.S.; Piepho, H.P.; Silva, F.F. 2016. Quantitative genetics theory for genomic selection and efficiency of breeding value prediction in openpollinated populations. Scientia Agricola 73: 243251.), where a and b are two SNPs, or two QTLs, or one SNP and one QTL, θ the frequency of recombinant gametes, and p^{1} and p^{2} the allele frequencies in the parental populations 1 and 2, respectively.
Azevedo et al. (2015)Azevedo, C.F.; Resende, M.D.V.; Silva, F.F.; Viana, J.M.S.; Valente, M.S.F.; Resende Junior, M.F.R.; Muñoz, P. 2015. Ridge, Lasso and Bayesian additivedominance genomic models. BMC Genetics 16: 105. simulated a total of 2,000 equidistant SNP markers separated by 0.10 centimorgans (cM) across the ten equally sized chromosomes. One hundred of the simulated markers were QTLs randomly distributed in the genome and 95 % of the genetic variation was explained by the markers, according to the expression proposed by Goddard et al. (2011)Goddard, M.E.; Hayes, B.J.; Meuwissen, T.H.E. 2011. Using the genomic relationship matrix to predict the accuracy of genomic selection. Journal of Animal Breeding and Genetics 128: 409421.. An amount of 1,000 individuals from 20 fullsib families were genotyped and phenotyped. This simulation study provides a small effective population size (Ne = 39.22), which, according to Resende (2015)Resende, M.D.V. 2015. Quantitative and Population Genetics = Genética Quantitativa e de Populações. Editora Suprema, Visconde do Rio Branco, MG, Brazil (in Portuguese)., is the typical value in elite breeding populations of plant species.
The simulated phenotypes considered two types of genetic architecture, one with polygenic inheritance, wherein the 100 QTLs were assumed with limited additive effects on the phenotype, and the other with mixed inheritance, the 5 QTLs with effects accounting for 50 % of the genetic variability and the small additive effects were assigned to the remaining 95 QTLs. The genotypic values for homozygotes were calculated considering Gmax = 100(m + a) as the upper limit and Gmin = 100(m – a), as the lower limit where m was the mean of genotypic values and a the homozygote genotypic value. The phenotypic values were computed by adding genotypic values to the true population mean, and error effects sampled from a normal distribution $N\left(0,{\text{\sigma}}_{e}^{2}\right)$, where the error variance ${\text{\sigma}}_{e}^{2}$ was computed according to the heritability levels. The additive and dominance effects were assumed to be independents and were normally distributed with zero mean and genetic variance, depending on the assumed genetic architecture and the desired heritability level. In the simulation study, it was also observed that SNPs had minor allele frequency (MAF) greater than 5 %.
Scenarios
Eight different scenarios were used in the analyses: two genetic architectures (polygenic and mixed inheritance); dominance status (absence and complete), two broadsense heritability levels (around 0.20 and 0.30) for absence of dominance and two broadsense heritability levels (around 0.30 and 0.50) for complete dominance. These values were chosen because, according to Azevedo et al. (2015)Azevedo, C.F.; Resende, M.D.V.; Silva, F.F.; Viana, J.M.S.; Valente, M.S.F.; Resende Junior, M.F.R.; Muñoz, P. 2015. Ridge, Lasso and Bayesian additivedominance genomic models. BMC Genetics 16: 105., it was expected in these cases, that the genomic selection would be superior to phenotypic selection. The description of the scenarios is presented in Table 1.
Scenarios with the respective true additive heritabilities (${h}_{a}^{2}$), due to dominance (${h}_{d}^{2}$) and total (${h}_{g}^{2}$), genetic architectures (traits controlled by genes of small effect  polygenic inheritance, and traits controlled by genes of small and greater effect  mixed inheritance) and dominance status (absence and complete).
Real datasets
Asian rice (Oryza sativa) is one of the most widely consumed foods in the world (Subudhi et al., 2006Subudhi, P.K.; Sasaki, T.; Khush, G.S. 2006. Rice. v.1, p. 19. In: Kole, C., ed. Genome mapping and molecular breeding in plants. Springer, Berlin, Germany.). Thus, the high population growth rate has justified the interest of researchers in developing highyielding rice varieties. The real/empirical dataset consisted of 370 rice accessions phenotyped for seven yield traits, which were genotyped for 44,100 SNPs markers. The dataset is publicly available and is part of two projects, the OryzaSNP Project and the OMAP Project (Ammiraju et al., 2006Ammiraju, J.S.S.; Luo, M.; Goicoechea, J.L.; Wang, W.; Kudrna, D.; Mueller, C.; Talag, J.; Kim, H.; Sisneros N.B.; Blackmon, B.; Fang, E.; Tomkins, J.B.; Brar, D.; Mackill, D.; MacCouch, S.; Kurata, N.; Lambert, G.; Galbraith, D.W.; Arumuganathan, K.; Rao, K.; Walling, J.G.; Gill, N.; Yu, Y.; Sanmiguel, P.; Soderlund, C.; Jackson, S.; Wing, R.A. 2006. The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deepcoverage largeinsert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Research 16: 140147.; Zhao et al., 2011Zhao, K.; Tung, C.W.; Eizenga, G.C.; Wright, M.H.; Ali, M.L.; Price, A.H., Norton, J. G.; Islam, A.R.; Reynolds, A.; Mezey, J.; McClung, A.M.; Bustamante, C.D.; McClung, A.M 2011. Genomewide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature Communications 2: 467477.), and is available at http://ricediversity.org/.
Plantations were supervised throughout the access phase, from May to Oct of the years 2006 and 2007. A complete block design with two replications was used, in which the planting lines had a length equal to 5 m. The plants were spaced 25 cm apart and 0.50 m between rows. Rice accessions were evaluated in the field at Stuttgart, Arkansas coordinates, Lat: 034°49.5805’ N; Long −091°54.8357’ W, 100 m. Further details can be found in Zhao et al. (2011)Zhao, K.; Tung, C.W.; Eizenga, G.C.; Wright, M.H.; Ali, M.L.; Price, A.H., Norton, J. G.; Islam, A.R.; Reynolds, A.; Mezey, J.; McClung, A.M.; Bustamante, C.D.; McClung, A.M 2011. Genomewide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature Communications 2: 467477.. The quality control procedures were made considering a call rate of 70 % and minor allele frequency (MAF) less than 1 %. After the quality control of the genomic database the total was 36,901 SNPs markers.
The seven traits evaluated were: (i) Panicle number per plant, (ii) Plant height, (iii) Panicle length, (iv) Primary panicle branch number, (v) Seed number per panicle, (vi) Florets per panicle, (vii) Panicle fertility.
GBLUP method
The Genomic Best Linear Unbiased Predictor (GBLUP) method is based on the following linear mixed model:
where, y is a vector of phenotypes (N × 1, being N the number of genotyped and phenotyped individuals); µ the general mean and 1 the vector with dimension (N × 1); a the vector of additive genomic values (N × 1) with incidence matrix Z (N × N), whose assumed distribution is $a~N\left(0,{G}_{a}{\sigma}_{a}^{2}\right)$, where ${\text{\sigma}}_{a}^{2}$ is the genetic additive variance and G_{a}(N × N) the additive genomic relationship matrix; d the vector of dominance genomic values (N × 1) with incidence matrix Z (N × N), whose assumed distribution is $d~N\left(0,\text{}{G}_{d}{\text{\sigma}}_{d}^{2}\right)$, where ${\text{\sigma}}_{d}^{2}$ is the dominance variance and G_{d} (N × N) the dominance genomic relationship matrix; and e the random residual vector, assumed as $e~N(0,I{\sigma}_{e}^{2})$, where ${\text{\sigma}}_{e}^{2}$ is the residual variance and I an identity matrix.
According to Vitezica et al. (2013)Vitezica, Z.G.; Varona, L.; Legarra, A. 2013. On the additive and dominance variance and covariance of individuals within the genomic selection scope. Genetics 195: 12231230., the genomic relationship matrices for the additive and dominance effects, G_{a} and G_{d}, are given, respectively, by:
where p_{i} and q_{i} are the allele frequencies of the locus i. The elements of W and S are given by:
Deltap method
The Deltap method uses the concepts of changes in allele frequency due to selection and genetic gain theory (contrast between the averages of two subpopulations). The training population is initially divided into two subpopulations: one with individuals above the general mean (subpopulation one, with a higher average genetic value u_{1}) and the other with individuals below the general mean (subpopulation two, with lower average genetic value u_{2}). The difference between the average genetic values of the two subpopulations (u_{1} – u_{2}) is due to the frequency of favorable alleles (p) among the two populations. Thus, (u_{1} – u_{2}) is explained by Δp = p_{1} – p_{2}, where Δp is the difference of allelic frequencies between the two subpopulations. The Δp values are calculated for each marker i, so that the positive signals are allocated as favorable. Thus, the allelic substitution effects (α_{i}) of these markers are taken as positive, whereas for those with negative signs of Δp their α_{i}'s are assigned as negative.
In this context, the incidence matrix W for the vector of additive effects of markers is reparametrized, making the markers genotypes as 0 (bb), 1 (Bb) and 2 (BB). Here, the B allele is the favorable allele, and the BB or bb allocation is dictated by the α_{i} signal. Thus, the correct allocation in BB or bb is probabilistic. In general, it is expected that the highest number of allocation errors will occur in loci with very minor effects.
The Deltap method requires the estimation of the α_{i} effects, and later the prediction of the genomic values (a). Taking Δp_{i} as an indicator of the relative magnitude of α_{i} (the higher the Δp_{i}, the higher the chance of the α_{i} being higher), it is possible to define the proportion Δp_{i}/Δp_{m} (with their respective positive or negative signals); where Δp_{m} is the average deltap computed using the absolute value of Δp_{i} multiplied by the average α (α_{m}), which results in the α_{i} effects. Thus, ${\alpha}_{i}=(\mathrm{\Delta}{p}_{i}\text{/}\mathrm{\Delta}{p}_{m})\text{}{\alpha}_{m}$.
The α_{m} value can be obtained using the theoretical expression of genetic gain. For a given locus i in subpopulation one, ${u}_{1i}=({p}_{{1}_{i}}{q}_{1i})\text{}{\alpha}_{i}=(2{p}_{1i}1)\text{}{\alpha}_{i}$, where $i=1,2,\dots ,n$; and in subpopulation two, ${u}_{2i}=({p}_{2i}{q}_{2i})\text{}{\alpha}_{i}=(2{p}_{2i}1)\text{}{\alpha}_{i}$. Thus, $({u}_{1i}{u}_{2i})=({2}_{p1i}{2}_{p2i})\text{}{\alpha}_{i}=2\mathrm{\Delta}{p}_{i}{\alpha}_{i}$. Summing all loci, $({u}_{1}{u}_{2})=2\Sigma (\mathrm{\Delta}{p}_{i}{\alpha}_{i})$. The quantity ∆p_{m} is defined by Falconer (1989)Falconer, D.S. 1989. Introduction to Quantitative Genetics. Longman Scientific and Technical, New York, NY, USA. as: $\mathrm{\Delta}{p}_{m}=sq(1q)$, where $s=\alpha k/{\sigma}_{p}$, where s is the selective coefficient of the locus, assuming an additivedominance model. Therefore, $\mathrm{\Delta}{p}_{m}=\alpha kpq/\sqrt{{\text{\sigma}}_{P}^{2}}$ where k is the standardized selection intensity and ${\text{\sigma}}_{P}^{2}$ the phenotypic variance. The genetic gain is defined as ${G}_{s}=k{\text{\sigma}}_{a}^{2}/\sqrt{{\text{\sigma}}_{P}^{2}}$. Thus, $k/\sqrt{{\text{\sigma}}_{P}^{2}}=\sqrt{{G}_{s}}/{\text{\sigma}}_{a}^{2}$. Replacing Δp_{m} in the expression it is possible to obtain $\mathrm{\Delta}{p}_{m}=\text{\alpha}pq{G}_{s}/{\text{\sigma}}_{a}^{2}$.
The genetic gain associated with the first differential selection is given by ${G}_{s1}={h}_{a}^{2}\left({u}_{1}{u}_{0}\right)$, whereas the second one is ${G}_{s2}={h}_{a}^{2}\left({u}_{0}{u}_{2}\right)$. These gains are symmetrical and must be summed to obtain the total gain 2G_{s}, given by: $2{G}_{s}={G}_{s1}+{G}_{s2}={h}_{a}^{2}\left({u}_{1}{u}_{2}\right)$. Thus, ${G}_{s}=\left(\frac{1}{2}\right){h}_{a}^{2}\left({u}_{1}{u}_{2}\right)$. Replacing G_{s} into ∆p_{m}, $\mathrm{\Delta}{p}_{m}=\text{\alpha}pq{G}_{s}/{\text{\sigma}}_{a}^{2}$ and $\mathrm{\Delta}{p}_{m}=\text{\alpha}pq\left(1/2\right){h}_{a}^{2}\left({u}_{1}{u}_{2}\right)/{\text{\sigma}}_{a}^{2}$; where ${\text{\sigma}}_{a}^{2}=2pq{\alpha}^{2}$. Replacing and dividing by ∆p_{m}, we have:
thus, $1=\left(1/2\right){h}_{a}^{2}\left({u}_{1}{u}_{2}\right)/\left(2\text{\alpha}\mathrm{\Delta}{p}_{m}\right)$. Isolating α from the last expression, $\text{\alpha}=\left(1/2\right){h}_{a}^{2}\left({u}_{1}{u}_{2}\right)/\left(2\mathrm{\Delta}{p}_{m}\right)$) for n locus and the average a per locus is given by:
Therefore, the additive markers effects (α) are nonparametrically estimated in the training population (individuals with known phenotypes and genotypes). Based on these effects, the genetic values (a) are estimated by the following expression: $\widehat{a}=W\widehat{a}$, being W the incidence matrix of the additive genetic effects of the markers. This approach does not demand an iterative computational method and uses only genetic distance concepts (magnitude of ∆p_{i}) and the genetic gains of two subpopulations. Additionally, it also does not use differential shrinkage based on allele frequencies, which benefits the loci with higher MAFs. In summary, the Deltap method can be implemented by taking the followings steps: i) subdivision of the training population into two, according to the phenotype adjusted for systematic effects; ii) calculation of ∆p_{i} and ∆p_{m}; iii) calculation of α_{m}; and iv) calculation of ${\alpha}_{i}=(\mathrm{\Delta}{p}_{i}\text{/}\mathrm{\Delta}{p}_{m})\text{}{\alpha}_{m}$.
For the dominance effects, the quantity Δ(2pq)_{i} (difference between frequencies of heterozygous genotypes of subpopulations one and two) is used instead of ∆p_{i}. The better the phenotype and the higher the frequency of heterozygous genotypes in the population, the more favorable the value of the genotype corrected for the additive genetic effects (i.e., higher the of dominance deviations). In order to infer the effects of dominance, the interest is to find this association (frequency of heterozygous and values of phenotypes). For this, the phenotype must be returned as the frequency of heterozygous, corrected for additive effects. The incidence matrix S is inherent to the frequency of heterozygous among the three genotype classes. Then, g_{m} is defined as the genotype value of the heterozygous, which is corrected for the additive effect (α_{i}). It gives the dominance deviation of the locus i (δ_{i}), and the genomic dominance value (vector d) is defined as $\widehat{d}=S\widehat{\text{\delta}}$, where S is the previously mentioned incidence matrix.
Deltap/GBLUP Index method
Based on the principle of combined selection (Lush, 1947Lush, J.L. 1947. Family merit and individual merit as basis for selection. America Naturalist 81: 241261.; Falconer, 1989Falconer, D.S. 1989. Introduction to Quantitative Genetics. Longman Scientific and Technical, New York, NY, USA.), the GBLUP can be jointly used with the Deltap method. A selection index of the form $I={b}_{1}{\widehat{a}}_{1}+{b}_{2}{\widehat{a}}_{2}$ is defined based on information from predicted additive genomic values from GBLUP (${\widehat{a}}_{1}$) and Deltap (${\widehat{a}}_{2}$), weighted by b_{1} and b_{2} coefficients, respectively. The index can be also represented by $I={b}_{1}f(y,W)+{b}_{2}f(({u}_{1}{u}_{2}),\mathrm{\Delta}p)$; where y is the vector of individual phenotypic data, W the centralized allelic dosage matrix, (u_{1} – u_{2}) the contrast of averages from subpopulations and ∆p the differential frequency between subpopulations. The weights of the index are given by b = P^{−1}C as follows:
where C is a matrix of variance and reliability (r^{2}) of estimated genomic values through Deltap and GBLUP, P a covariance matrix between the estimated genomic values through Deltap and GBLUP and a the true additive genetic value of individuals.
Dividing P and C by ${\text{\sigma}}_{{a}_{1}}^{2}$, and considering ${\mathrm{\Delta}}^{2}={\text{\sigma}}_{{a}_{2}}^{2}/{\text{\sigma}}_{{a}_{1}}^{2}$ it is possible to obtain:
The weights (b_{1} and b_{2}) and accuracy (r_{Ia}) between the index and the true additive genetic value a are given respectively by:
Under this approach, the reliabilities of ${\widehat{a}}_{1}$(${r}_{{\widehat{a}}_{1}a}^{2}$, obtained by GBLUP) and ${\widehat{a}}_{2}$ (${r}_{{\widehat{a}}_{2}a}^{2}$ obtained by Deltap) and the proportion of variances of ${\widehat{a}}_{2}$ and ${\widehat{a}}_{1}$ are required. If ${\mathrm{\Delta}}^{2}={\text{\sigma}}_{{a}_{2}}^{2}/{\text{\sigma}}_{{a}_{1}}^{2}$ provides a value higher than one, then ∆^{2} must be at least 1. Another option is to calculate cov(a_{1}, a_{2}) directly through $\text{cov}\left({\widehat{a}}_{1},{\widehat{a}}_{2}\right)$). Considering that the dominance effects of the methods are analogous to that described above only the genomic value due to the dominance should be used ($\widehat{d}$) instead of the additive genomic value ($\widehat{a}$) in derived equations.
Computer resources
All computational implementations of the proposed methods were performed using the R software program (version 3.3.1). The GBLUP was performed in the rrBLUP package through the mixed.solve function. The algorithm used for the development of the Deltap method is available in http://www.novoscursos.ufv.br/departamentos/ufv/det/www/?page_id=1374.
Comparisons of the proposed methods Simulated dataset
Each type of population was simulated 10 times under the same parameter settings, which preserved the same features and provided samples that were effectively from the same conceptual population. Nine replicates were used as training populations, and one replicate was used as a validation population. The estimations based on each of the nine replicates were validated by obtaining estimates of the efficiency measures. The efficiency measures of genomic prediction were given by the accuracies (${r}_{\widehat{a}a}$ and ${r}_{\widehat{d}d}$), bias (${b}_{y\widehat{a}}$e${b}_{y\widehat{d}}$), recovered additive and dominance genomic heritabilities (${h}_{a}^{2}$e${h}_{d}^{2}$) and relative efficiency (RE).
The accuracy is defined as the correlation between the predicted values and the parametric genetic values; and the prediction bias is defined as the regression coefficient associated with this correlation. For regression coefficients below one (< 1) it is understood that predicted values had been overestimated, whereas for coefficients above one (> 1), it was concluded that predicted values had been underestimated. The recovered additive genomic heritability is given by
where ${\text{\sigma}}_{a}^{2}={\displaystyle {\sum}_{i=1}^{n}2{p}_{i}{q}_{i}{m}_{i}^{2}}$ is the additive genomic variance, ${m}_{i}^{2}$ the square of the ith marker effect with allelic frequencies equal to p_{i} and q_{i}. The genomic heritability due to dominance is given by
where ${\text{\sigma}}_{d}^{2}={\displaystyle {\sum}_{i=1}^{n}{\left(2{p}_{i}{q}_{i}{d}_{i}\right)}^{2}}$ and d_{i} is the genotypic value of the heterozygous. The relative efficiency is given by the ratio between the accuracies from the compared methods. All these measures were obtained for each replicate in each scenario and the general results were reported as average values.
Real dataset
We obtained the measures of efficiency using fivefold cross validation (k = 5), in which in each fold the training population consisted of 296 individuals (80 % of total) and the validation population consisted of 74 individuals (20 % of total). The efficiency measures used were the predictive ability (${r}_{\widehat{a}y}$), which consisted of the correlation between the estimated additive genomic values and the phenotypic values of the validation population, and prediction bias.
Results and Discussion
The average and the respective estimated standard deviations for the accuracy, bias, heritabilities and relative efficiency were obtained by both Deltap and GBLUP and are shown in Table 2. We note an increase in accuracy with the use of the GBLUP. The correlation between the genomic values predicted by the Deltap and the GBLUP method was high (${r}_{{\widehat{a}}_{p}{\widehat{a}}_{g}}$), ranging from 0.80 to 0.88. Although the GBLUP is widely applied to GWS and recommended for the prediction of additive and dominant genomic values (Azevedo et al., 2015Azevedo, C.F.; Resende, M.D.V.; Silva, F.F.; Viana, J.M.S.; Valente, M.S.F.; Resende Junior, M.F.R.; Muñoz, P. 2015. Ridge, Lasso and Bayesian additivedominance genomic models. BMC Genetics 16: 105.; De los Campos et al., 2013De los Campos, G.; Hickey, J.M.; PongWong, R.; Daetwyler, H.D.; Callus, M.P.L. 2013. Whole genome regression and prediction methods applied to plant and animal breeding. Genetics 193: 327345.; Gianola, 2013Gianola, D. 2013. Priors in wholegenome regression: the Bayesian alphabet returns. Genetics 194: 573596.; Gianola et al., 2009Gianola, D.; De los Campos, G.; Hill, W.G.; Manfredi, E.; Fernando, R. 2009. Additive genetic variability and the Bayesian alphabet. Genetics 183: 347363.), this study proposed combining the values derived from the Deltap and GBLUP methods to construct an index giving different weights to each of them which can be seen as an improvement in the GBLUP additive and dominant genomic predictions. For the additive model, the increase in accuracy using the Deltap/GBLUP index (0.80 to 0.87) was 0.03 units on average compared to the accuracy of the GBLUP method (0.77 to 0.85). For the additivedominant model, this increase was more evident, being 0.08 units. The Deltap/GBLUP index, considering the additive model, provided a relative average efficiency of 103 % to 105 %, while in the additivedominant model this efficiency increased to 109 % to 116 %. It is noted that the Deltap/GBLUP index improved the prediction of additive and dominant genomic values through GBLUP, since the index provided greater accuracy. It is important to report that this advantage in terms of extra percentage points has no additional computational costs.
Average and standard deviation of accuracy (${r}_{\widehat{a}a}$), bias (${b}_{y\widehat{a}}$), additive heritability (${h}_{a}^{2}$), correlation between additive genomic values estimated by Deltap and Genomic Best Linear Unbiased Predictor  GBLUP (${r}_{{\widehat{a}}_{1}{\widehat{a}}_{2}}$), proportion between the additive variances estimated (Δ^{2}), weights (b_{1} and b_{2}), accuracy (r_{Ia}) and bias (b_{yIa}) of the Deltap/GBLUP index method, relative efficiency (RE) between the index and GBLUP considering simulated dataset.
As already mentioned, the Deltap/GBLUP index method is based on the combined selection index. According to Lush (1947)Lush, J.L. 1947. Family merit and individual merit as basis for selection. America Naturalist 81: 241261. and Falconer (1989)Falconer, D.S. 1989. Introduction to Quantitative Genetics. Longman Scientific and Technical, New York, NY, USA., this process outperformed other selection methods, for example, individual or among or within family selection. Similarly, in the present study, the Deltap/GBLUP index method, which is based on the combined selection index, outperformed the GBLUP and Deltap methods. Gains of 5 % in accuracy are already significant in genetic improvement, often equivalent to the gain that is obtained in a complete cycle of improvement (Resende et al., 2015Resende, M.D.V.; Ramalho, M.A.P.; Guilherme, S.R.; Abreu, A.F.B. 2015. Multi generation index in the within progenies bulk method for breeding of selfpollinated plants. Crop Science 55: 12021211.).
The differences between allele frequencies were effective in estimating the marker effects. This result corroborates the principle of the Deltap method. Weller et al. (2014)Weller, J.I.; Glick, G.; Ezra, E.; Seroussi, E.; Shemesh, M.; Zeron, Y.; Ron, M. 2014. Predictive ability of selected subsets of single nucleotide polymorphisms (SNPs) in a moderately sized dairy cattle population. Animal 8: 208216. and Weller (2016)Weller, J.I. 2016. Genomic Selection in Animals. John Wiley, Hoboken, NJ, USA. reported that marker selection based on differences between allelic frequencies between groups of young and older bulls, i.e. between two contrasting subpopulations, can be effective in genomic selection in dairy cattle.
Considering the additive model, it was observed that for all scenarios, the GBLUP method was the one that presented an estimated additive heritability close to the parametric value (Table 2). Considering the additivedominant model, it was observed that for the scenarios whose traits are controlled by genes of small effects (scenarios 5 and 6), the Deltap method presented the additive heritability estimates closest to the parametric values. On the other hand, for the scenarios whose traits are controlled by genes of both small and sizeable effects (scenarios 7 and 8), the GBLUP method presented better performance.
Considering the additive model, the GBLUP and Deltap methods had nonbiased predicted additive genomic values (Table 2), except for the Deltap method in scenarios 1 and 3, where the method underestimated these values (${b}_{y\widehat{a}}>1$). Considering the additivedominant model, the GBLUP method provided nonbiased additive genomic values, except for scenario 8, where the method overestimated these values (${b}_{y\widehat{a}}<1$). In contrast, the Deltap method presented predicted additive genomic values which were overestimated for scenarios 5, 6 and 8; and underestimated for scenario 7. Already, the Deltap/GBLUP index overestimated the predicted additive genomic values for all scenarios considered (b_{yIa} < 1). However, the index had biases considerably closer to one in the additive model than in the additivedominant model (Table 2).
Table 3 shows the average results obtained through a simulated dataset, the methods for the genomic values due to dominance considering the additivedominant model and also the weights, accuracy and efficiency obtained through the Deltap/GBLUP index. In this case, the improvement in the prediction of genomic values with the use of the index (Table 3) was less evident. Only 0.01 units improved in prediction accuracy compared to the accuracy of GBLUP. Bennewitz and Meuwissen (2010)Bennewitz, J.; Meuwissen, T.H.E. 2010. The distribution of QTL additive and dominance effects in porcine F2 crosses. Journal of Animal Breeding and Genetics 127: 171179., Hill et al. (2008)Hill, W.G.; Goddard, M.E.; Visscher, P.M. 2008. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genetics 4: e1000008., and Wellmann and Bennewitz (2012)Wellmann, R.; Bennewitz, J. 2012. Bayesian models with dominance effects for genomic evaluation of quantitative traits. Genetics Research 94: 2137. discuss the relevance of the inclusion of dominance in Quantitative Genomics. However, according to the results reported by Azevedo et al. (2015)Azevedo, C.F.; Resende, M.D.V.; Silva, F.F.; Viana, J.M.S.; Valente, M.S.F.; Resende Junior, M.F.R.; Muñoz, P. 2015. Ridge, Lasso and Bayesian additivedominance genomic models. BMC Genetics 16: 105., genomic values due to dominance are difficult to predict and are consequently associated with low accuracy values and higher bias. This corroborates the low accuracy level recorded for the Deltap/GBLUP index and GBLUP when considering only dominance genomic values.
Average and standard deviation of accuracy (${r}_{\widehat{d}d}$), bias (${b}_{y\widehat{d}}$), additive heritability (${h}_{\mathrm{d}}^{2}$), correlation between dominant genomic values estimated by Deltap and Genomic Best Linear Unbiased Predictor  GBLUP (${r}_{{\widehat{d}}_{1}{\widehat{d}}_{2}}$), proportion between the dominant variances estimated (∆^{2}), weights (b_{1} and b_{2}), accuracy (r_{Id}) and bias (b_{yId}) of the Deltap/GBLUP index method, relative efficiency (RE) between the index and GBLUP considering simulated dataset.
Although the dominance genomic values, considering the additivedominant model, were inaccurately estimated, it can be seen that the index and the GBLUP accuracy were similar. This result agrees with Azevedo et al. (2015)Azevedo, C.F.; Resende, M.D.V.; Silva, F.F.; Viana, J.M.S.; Valente, M.S.F.; Resende Junior, M.F.R.; Muñoz, P. 2015. Ridge, Lasso and Bayesian additivedominance genomic models. BMC Genetics 16: 105., which used several methods applied to GWS and obtained results of similar dominance accuracy. Thus, it should be noted that the main criteria for the comparison of methods in these scenarios is the recovered heritabilities, due to dominance and prediction bias.
When analyzing the heritabilities due to dominance, it was observed that for the four scenarios, the closest estimates were provided by the GBLUP method. However, in scenarios 5 and 8, the heritabilities, due to dominance, were slightly underestimated and in scenario 7 the heritabilities, due to dominance, were slightly overestimated. The Deltap method underestimated the genomic values due to dominance in the scenarios whose traits were controlled by genes with small effects (scenarios 5 and 6) and overestimated the scenarios whose traits were controlled by genes with both small and sizeable effects. The GBLUP overestimated the values in all scenarios, with bias being considerably lower in relation to the Deltap method. In addition, the Deltap/GBLUP index also overestimated the genomic values due to dominance for all scenarios considered (b_{yIa} < 1) (Table 3).
Although we have previously analyzed the additive genomic values and because of the dominance, considering the additivedominant model, it is also important to present the results referring to the total genomic value (a + d), since this is related to individual performance (Table 4). Analogous to Tables 2 and 3, the GBLUP presented higher accuracy in the prediction of total genomic values, when compared to the Deltap method. The increase in accuracy using the Deltap/GBLUP index (0.65 to 0.71) was 0.07 units, on average, compared to the accuracy of the GBLUP method (0.56 to 0.65), showing again the efficiency of the index. The Deltap/GBLUP index provided a relevant relative average efficiency that was 108 % to 114 % (Table 4). It was observed that, under the four scenarios considered, the GBLUP had a better performance in recovering the total genomic heritability than the Deltap. However, these were still underestimated (Table 4). In addition, the three methods produced predicted total genomic values that were overestimated (${b}_{y\widehat{g}}$ < 1). Although the Deltap/GBLUP index presented biased total genomic values, this method has the great advantage of obtaining greater accuracy when compared to the GBLUP method.
Average and standard deviation of accuracy (${r}_{\widehat{g}g}$), bias (${b}_{y\widehat{g}}$), additive heritability (${h}_{g}^{2}$), correlation between total genomic values estimated by Deltap and Genomic Best Linear Unbiased Predictor  GBLUP (${r}_{\widehat{g}1\widehat{g}2}$), proportion between the total variances estimated (∆^{2}), weights (b_{1} and b_{2}), accuracy (r_{Ig}) and bias (b_{yIg}) of the Deltap/GBLUP index method, relative efficiency (RE) between the index and GBLUP considering real dataset.
To elucidate the importance of the Deltap/GBLUP index for predicting genomic selection, the results for an applied breeding program of Asian rice are presented in Table 5. Analogous to previous results considering additive model (Table 2), GBLUP showed more accurate genomic values than Deltap. The Deltap method presented values showing overestimated additive genomic values (${b}_{y\widehat{a}}<1$), except for the Panicle number per plant, while the GBLUP presented values that were less biased. The Deltap/GBLUP index, considering the real dataset, provided a relative average efficiency of 101 % to 116 %, being on average 8 % more efficient than the GBLUP. Eight % is highly expressive and under genomic selection performed annually, such as in rice cultivation, these gains are cumulative and grow rapidly (Resende et al., 2015Resende, M.D.V.; Ramalho, M.A.P.; Guilherme, S.R.; Abreu, A.F.B. 2015. Multi generation index in the within progenies bulk method for breeding of selfpollinated plants. Crop Science 55: 12021211.). Thus, the index proposed in this study can be satisfactorily implemented in rice breeding programs.
Predictive ability (${r}_{\widehat{a}y}$), bias (${b}_{y\widehat{a}}$), additive heritability (${h}_{a}^{2}$), correlation between additive genomic values estimated by Deltap and Genomic Best Linear Unbiased Predictor  GBLUP (${r}_{\widehat{a}1\widehat{a}2}$), proportion between the additive variances estimated (∆^{2}), weights (b_{1} and b_{2}), predictive ability (r_{Ia}) and bias (b_{yI}) of the Deltap/GBLUP index method, relative efficiency (RE) between the index and GBLUP considering real dataset.
Thus, the Deltap/GBLUP index is defined based on information from predicted genomic values from GBLUP and Deltap. This approach is rarely applied in the context of GWS. This method, as much for the simulated data as for the rice data, showed itself to be superior to the GBLUP method. In addition, it led to genomic, additive and dominance values that are more accurate than the traditional GBLUP and required reduced computational time. It can also be easily implemented in the context of GWS.
Conclusions
The proposed method, the Deltap/GBLUP index, is easy to implement in the context of genomic selection. It demands reduced computational cost and leads to genomic, additive, dominance and total values that are more accurate than the GBLUP method. On the other hand, the Deltap method provided lower values of accuracy and dominance bias estimates than the GBLUP method. However, for additive effects, under almost all scenarios evaluated, the estimates were more biased for the Deltap model than for the GBLUP. For the traits evaluated in rice it was concluded that the Deltap/GBLUP index was superior to the GBLUP method, and increased predictive ability.
Acknowledgments
The authors thank the following Brazilian funding organizations: the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Fundação Arthur Bernardes (FUNARBE).
References
 Ammiraju, J.S.S.; Luo, M.; Goicoechea, J.L.; Wang, W.; Kudrna, D.; Mueller, C.; Talag, J.; Kim, H.; Sisneros N.B.; Blackmon, B.; Fang, E.; Tomkins, J.B.; Brar, D.; Mackill, D.; MacCouch, S.; Kurata, N.; Lambert, G.; Galbraith, D.W.; Arumuganathan, K.; Rao, K.; Walling, J.G.; Gill, N.; Yu, Y.; Sanmiguel, P.; Soderlund, C.; Jackson, S.; Wing, R.A. 2006. The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deepcoverage largeinsert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Research 16: 140147.
 Azevedo, C.F.; Resende, M.D.V.; Silva, F.F.; Viana, J.M.S.; Valente, M.S.F.; Resende Junior, M.F.R.; Muñoz, P. 2015. Ridge, Lasso and Bayesian additivedominance genomic models. BMC Genetics 16: 105.
 Bennewitz, J.; Meuwissen, T.H.E. 2010. The distribution of QTL additive and dominance effects in porcine F2 crosses. Journal of Animal Breeding and Genetics 127: 171179.
 De los Campos, G.; Hickey, J.M.; PongWong, R.; Daetwyler, H.D.; Callus, M.P.L. 2013. Whole genome regression and prediction methods applied to plant and animal breeding. Genetics 193: 327345.
 Falconer, D.S. 1989. Introduction to Quantitative Genetics. Longman Scientific and Technical, New York, NY, USA.
 Gianola, D. 2013. Priors in wholegenome regression: the Bayesian alphabet returns. Genetics 194: 573596.
 Gianola, D.; De los Campos, G.; Hill, W.G.; Manfredi, E.; Fernando, R. 2009. Additive genetic variability and the Bayesian alphabet. Genetics 183: 347363.
 Goddard, M.E. 2009. Genomic selection: prediction of accuracy and maximization of long term response. Genetics 136: 345357.
 Goddard, M.E.; Hayes, B.J.; Meuwissen, T.H.E. 2011. Using the genomic relationship matrix to predict the accuracy of genomic selection. Journal of Animal Breeding and Genetics 128: 409421.
 Habier, D.; Fernando, R.L.; Dekkers, J.C.M. 2007. The impact of genetic relationship on genomeassisted breeding values. Genetics 117: 23892397.
 Hill, W.G.; Goddard, M.E.; Visscher, P.M. 2008. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genetics 4: e1000008.
 Kempthorne, O. 1973. An Introduction to Genetic Statistics. Iowa State University Press, Ames, IA, USA.
 Lush, J.L. 1947. Family merit and individual merit as basis for selection. America Naturalist 81: 241261.
 Meuwissen, T.H.E.; Hayes, B.J.; Goddard, M.E. 2001. Prediction of total genetic value using genomewide dense marker maps. Genetics 157: 18191829.
 Resende, M.D.V. 2015. Quantitative and Population Genetics = Genética Quantitativa e de Populações. Editora Suprema, Visconde do Rio Branco, MG, Brazil (in Portuguese).
 Resende, M.D.V.; Ramalho, M.A.P.; Guilherme, S.R.; Abreu, A.F.B. 2015. Multi generation index in the within progenies bulk method for breeding of selfpollinated plants. Crop Science 55: 12021211.
 Subudhi, P.K.; Sasaki, T.; Khush, G.S. 2006. Rice. v.1, p. 19. In: Kole, C., ed. Genome mapping and molecular breeding in plants. Springer, Berlin, Germany.
 Van Raden, P.M. 2008. Efficient methods to compute genomic predictions. Journal of Dairy Science 91: 44144423.
 Viana, J.M.S. 2004. Quantitative genetics theory for noninbred populations in linkage disequilibrium. Genetics and Molecular Biology 27: 594601.
 Viana, J.M.S. 2011. Program for molecular and quantitative data analysis Real Breeding = Programa para análises de dados moleculares e quantitativos Real Breeding. Universidade Federal de Viçosa, Viçosa, MG, Brazil (in Portuguese).
 Viana, J.M.S.; Piepho, H.P.; Silva, F.F. 2016. Quantitative genetics theory for genomic selection and efficiency of breeding value prediction in openpollinated populations. Scientia Agricola 73: 243251.
 Vitezica, Z.G.; Varona, L.; Legarra, A. 2013. On the additive and dominance variance and covariance of individuals within the genomic selection scope. Genetics 195: 12231230.
 Weller, J.I. 2016. Genomic Selection in Animals. John Wiley, Hoboken, NJ, USA.
 Weller, J.I.; Glick, G.; Ezra, E.; Seroussi, E.; Shemesh, M.; Zeron, Y.; Ron, M. 2014. Predictive ability of selected subsets of single nucleotide polymorphisms (SNPs) in a moderately sized dairy cattle population. Animal 8: 208216.
 Wellmann, R.; Bennewitz, J. 2012. Bayesian models with dominance effects for genomic evaluation of quantitative traits. Genetics Research 94: 2137.
 Zhao, K.; Tung, C.W.; Eizenga, G.C.; Wright, M.H.; Ali, M.L.; Price, A.H., Norton, J. G.; Islam, A.R.; Reynolds, A.; Mezey, J.; McClung, A.M.; Bustamante, C.D.; McClung, A.M 2011. Genomewide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature Communications 2: 467477.
Edited by
Publication Dates

Publication in this collection
JulAug 2019
History

Received
14 Oct 2017 
Accepted
07 Apr 2018