Bias in the prediction of genetic gain due to mass and half-sib selection in random mating populations

The prediction of gains from selection allows the comparison of breeding methods and selection strategies, although these estimates may be biased. The objective of this study was to investigate the extent of such bias in predicting genetic gain. For this, we simulated 10 cycles of a hypothetical breeding program that involved seven traits, three population classes, three experimental conditions and two breeding methods (mass and half-sib selection). Each combination of trait, population, heritability, method and cycle was repeated 10 times. The predicted gains were biased, even when the genetic parameters were estimated without error. Gain from selection in both genders is twice the gain from selection in a single gender only in the absence of dominance. The use of genotypic variance or broad sense heritability in the predictions represented an additional source of bias. Predictions based on additive variance and narrow sense heritability were equivalent, as were predictions based on genotypic variance and broad sense heritability. The predictions based on mass and family selection were suitable for comparing selection strategies, whereas those based on selection within progenies showed the largest bias and lower association with the realized gain.


Introduction
More than two decades ago, Wricke and Weber (1986) stated that the formula for predicting gain from selection "is certainly one of the central points in plant breeding research". However, it is unlikely that either of these authors would now defend this position. Various relevant methods, such as selection indices, diallel analysis, stability and adaptability analysis, Best Linear Unbiased Prediction (BLUP) and QTL analysis, were developed by quantitative geneticists before and after the proposition of a general function for gain prediction by Eberhart (1970). The prediction function developed by Eberhart (1970) based on work by Falconer (1960) has proven useful for assessing the efficiency of breeding methods and selection strategies. Although regularly used in breeding studies, this function, popularly known as 'the breeder's equation', is not the only one available to quantitative geneticists (Loywyck et al., 2005). Gonçalves et al. (2007) assessed several selection processes in families of yellow passion fruit obtained by Design I. The best process was combined selection. The predicted gain from combined selection based on the number of fruits per plant was 18.55%, whereas the best results for index-based selection were 15.92% for Pesek and Baker and 15.85% for Mulamba and Mock. In a study with BR 5011 corn cultivar in which three mass selection cycles and 17 cycles of half-sib selection were used, Carvalho and Souza (2007) predicted an average gain in yield of 2.56% in the last 14 cycles. Rose et al. (2007) assessed the efficiency of half-sib selection in switchgrass (Panicum virgatum L.) in high and low yield environments. The predicted gains for dry matter were generally lower in the unfavorable environment. The predicted gain for family selection was superior to that for mass selection. Baltunis et al. (2007) showed that in loblolly pine (Pinus taeda L.) the predicted direct gain from half-sib selection for rooting ability was 36% while the predicted indirect gain for height at two years was 5.4%. Selecting the best families for height resulted in direct and indirect predicted gains of 8.1% and 14.8%, respectively. The predicted direct gain from full-sib selection for rooting ability was 43% while the predicted indirect gain for height at two years was 9%. Selecting the best families for height yielded direct and indirect predicted gains of 10.1% and 8.6%, respectively. The selection of clones based on rooting ability resulted in a predicted direct gain of 96% associated with a decrease in height. Selecting the best clones for height resulted in direct and indirect predicted gains of 27% and 43%, respectively. Thus, overall, the selection indices assessed resulted in gains for both traits.
Despite its usefulness in helping to choose the best breeding method or selection process, the Eberhart prediction formula is widely known to provide a biased estimate (usually an overestimate) of changes in the population mean. Bordes et al. (2006) compared the efficiency of two methods for developing corn inbred lines. For yield, the use of the dihaploid method resulted in a predicted gain of 2%/year, which was lower than the predicted gains of 2.4%/year and 2.9%/year for two cycles of S1 families, respectively, in four years. The real gains were 1.65%/year and 1.75%/year, respectively, indicating overestimation of the predicted gains. A study with popcorn showed that although there was agreement between the predicted and true mean gains in expansion volume and yield, the predictions per cycle were generally overestimated (Viana, 2007). Similar results were reported by Hallauer and Miranda Filho (1988).
In view of the lack of information on the relative importance of possible sources of bias, the aim of this study was to investigate biases in the prediction of genetic gains from selection.

Sources of bias in the prediction of genetic gains
Although generally applicable to genetic breeding, the Eberhart function is based on mass selection in a single gender. The genetic gain ($\Delta M$) is calculated as $M_1 - M$, where $M_1$ is the genotypic mean of the bred population and $M$ is the genotypic mean of the population in which the selection was made. The gain is proportional to the difference between the phenotypic mean of the selected population ($P_s$) and the phenotypic mean of the base population ($P$), referred to as the selection differential (SD), i.e., $\Delta M = b\,(P_s - P)$. Thus, $M_1 = M + b\,\mathrm{SD}$. Since the bred population consists of half-sib families whose common parents are the selected individuals, the parameter $b$ should be the same as the coefficient of the regression of the mean phenotypic value of a progeny ($\bar{P}_o$) on the difference between the phenotypic value of the selected individual and the phenotypic mean of the base population ($P_{s_i} - P$), assuming identity of the models for each selected individual.
Based on this assumption, $b = \mathrm{Cov}(P_s, \bar{P}_o)/\sigma_P^2$. In addition, assuming that genotypic value and environmental effect are independent, $\mathrm{Cov}(P_s, \bar{P}_o) = \mathrm{Cov}(G_s, G_o)$, where $G_s$ and $G_o$ are the genotypic values of a selected individual and its progeny in the bred population, the covariance of which is unknown. Assuming that alterations in the gene frequencies are negligible, then in the case of the additive-dominant model (Wricke and Weber, 1986) $\mathrm{Cov}(G_s, G_o) = \tfrac{1}{2}\sigma_A^2$, so that $b = \tfrac{1}{2}\sigma_A^2/\sigma_P^2$, where $\sigma_A^2$ and $\sigma_P^2$ are the additive genetic variance and the phenotypic variance in the base population, respectively.
Hence, the predicted gain from mass selection on a single gender is $\Delta M = \tfrac{1}{2}h^2\,\mathrm{SD}$, where $h^2 = \sigma_A^2/\sigma_P^2$ is the heritability.
Assuming that the numerator of the coefficient of proportionality $b$ is the covariance between the additive genetic value of an individual in the selection unit ($X$) and the additive genetic value of its relative in the bred population ($Y$) ($\mathrm{COV}_A(X, Y) = 2r_{XY}\sigma_A^2$, where $r_{XY}$ is the coefficient of relationship between $X$ and $Y$), then the predicted gain in a year is (Eberhart, 1970) $\Delta M = p\,h^2\,\mathrm{SD}/y = p\,(\sigma_g^2/\sigma_{ph}^2)\,\mathrm{SD}/y$, where $p$ is the parental control (1/2, 1 or 2), $h^2$ is the heritability of the selection unit, $\sigma_g^2$ is the genotypic variance of the selection units attributable to the average effects of the genes ($\sigma_g^2 = 4r_{XY}\sigma_A^2$), $\sigma_{ph}^2$ is the phenotypic variance of the selection units and $y$ is the number of years per cycle. This is a generalization of the function presented by Falconer (1960) for mass selection on both genders.
Since the additive covariance between an individual in the selection unit and its relative in the bred population is only equal to $2r_{XY}\sigma_A^2$ in the absence of selection, the genetic gain prediction function is biased: even when selection is not effective, the predicted gain will not necessarily be nil. Additional biases will result from errors in estimating $h^2$ and $\sigma_g^2$, attributable to sampling, experimental error and unmet assumptions such as Hardy-Weinberg equilibrium, linkage equilibrium and the absence of epistasis.
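As a minimal sketch of how such a prediction is evaluated, the snippet below computes the gain per year in two algebraically equivalent ways: from heritability and selection differential, and from selection intensity and the variance among selection units. It assumes the commonly cited form of the prediction function, gain = p·h²·SD/y, with h² = σ_g²/σ_ph² and SD = i·σ_ph. The function names and all numerical values are hypothetical and are not taken from the study.

```python
import math

def gain_from_heritability(p, h2, sd, years=1.0):
    """Predicted gain per year as p * h2 * SD / y."""
    return p * h2 * sd / years

def gain_from_variance(p, i, var_g, var_ph, years=1.0):
    """Predicted gain per year as p * i * var_g / (y * sqrt(var_ph))."""
    return p * i * var_g / (years * math.sqrt(var_ph))

# Hypothetical inputs: mass selection on one gender (parental control p = 1/2)
p = 0.5
var_g, var_ph = 4.0, 16.0      # variance among selection units, phenotypic variance
i = 1.755                      # approximate standardized intensity for the best 10%
h2 = var_g / var_ph            # heritability of the selection unit
sd = i * math.sqrt(var_ph)     # selection differential implied by i

g1 = gain_from_heritability(p, h2, sd)
g2 = gain_from_variance(p, i, var_g, var_ph)
print(g1, g2)
```

Both calls return the same value because SD = i·σ_ph under truncation selection on a normally distributed trait. With real data the two forms can differ slightly, since the observed selection differential need not match the tabulated intensity.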

Theoretical genetic gains
For a single gene and mass selection on only one gender in a population under Hardy-Weinberg equilibrium, the probabilities of the genotypes in the group of selected individuals are [...] where $m$ is the mean of the genotypic values of the homozygotes, $a$ is the deviation between the genotypic value of the homozygote of greater expression and $m$, $d$ is the deviation between the genotypic value of the heterozygote and $m$, and $M = m + (p - q)a + 2pqd$ is the mean of the base population (Wricke and Weber, 1986). The genetic gain due to selection is [...] where $\alpha$ is the effect of substituting the $A_2$ gene with the $A_1$ gene (Wricke and Weber, 1986). Since the selection intensity is the ratio between the height of the ordinate of the standard normal distribution corresponding to the truncating point ($a_t$) and the proportion of selected individuals ($P_S$) (Wricke and Weber, 1986) [...]. With mass selection on both genders the change in the frequency of the favorable gene is $\Delta p_2 = 2\Delta p_1$. The mean of the bred population is [...]. The bias in the prediction of genetic gain is [...]. The $\Delta M_2/\Delta M_1$ ratio is only equal to two if there is no dominance and, accordingly, only in this situation will the gain from mass selection on the two genders be twice the gain from mass selection on a single gender. Therefore, if dominance is present, then the assumption that selection on both genders results in a predicted gain that is two-fold greater than for selection on only one of the genders is an approximation and a further source of bias.
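The role of dominance in this ratio can be checked with a short derivation. The sketch below assumes only the expression for the population mean, $M = m + (p - q)a + 2pqd$ with $q = 1 - p$, the usual definition $\alpha = a + (q - p)d$, and Hardy-Weinberg proportions in the bred population after one generation of random mating. It expresses the gain in terms of the change in gene frequency, not the selection intensities.

```latex
\begin{align*}
M(p) &= m + (2p - 1)\,a + 2p(1 - p)\,d \\
\Delta M &= M(p + \Delta p) - M(p)
          = 2\,\Delta p\,[a + (q - p)d] - 2d\,(\Delta p)^2
          = 2\alpha\,\Delta p - 2d\,(\Delta p)^2 \\
\Delta M_1 &= 2\alpha\,\Delta p_1 - 2d\,(\Delta p_1)^2 \\
\Delta M_2 &= 2\alpha\,(2\Delta p_1) - 2d\,(2\Delta p_1)^2
            = 2\,\Delta M_1 - 4d\,(\Delta p_1)^2
\end{align*}
```

Under these assumptions $\Delta M_2 = 2\Delta M_1$ only when $d = 0$. With $d > 0$, doubling the single-gender gain overstates $\Delta M_2$ by $4d(\Delta p_1)^2$; with $d < 0$, it understates it by the same amount.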
The impossibility of using the bias functions to investigate the magnitude of bias must be emphasized since the selection intensities $s_H$ and $s_R$ on each gene, together with the selection intensity $i$, are not known a priori. The same is true for family selection. In the case of half-sib selection with recombination only among individuals of the selected progenies, the alteration in the favorable gene frequency is [...] where $s_D$, $s_H$ and $s_R$ are the selection intensities on the families of common parents $A_1A_1$, $A_1A_2$ and $A_2A_2$. The mean of the bred population, based on a recombination generation after the selection cycle, is [...].

Characterization of the gene systems, populations and environmental conditions
The simulation done here considered seven generic traits, three classes of populations, three environmental conditions and two breeding methods, both conducted for 10 cycles. The traits were characterized by different degrees of dominance. The values 2 and -2, 1 and -1, 0.5 and -0.5, and 0 were used to define overdominance, complete dominance, partial dominance and no dominance, respectively. A positive value indicated dominance of a favorable gene (one that increased trait expression) whereas a negative value indicated dominance of the unfavorable gene (one that decreased trait expression). Each trait was assumed to be determined by 10 genes with independent assortment. Additional assumptions included absence of epistasis, Hardy-Weinberg equilibrium and linkage equilibrium.
Since the frequencies of favorable genes in a population can range from 0 to 1, we attempted to represent all possible populations by using three categories, namely, an unimproved population, a population with intermediate frequencies of favorable genes and an improved population. The frequencies of the favorable genes for these classes were assumed to be 0.1, 0.5 and 0.9, respectively. The experimental conditions or degree of error control also varied, which resulted in changes in the parametric values for heritability based on the magnitude of the environmental effects that were introduced. This approach accounted for situations of high (90%), intermediate (50%) and low (10%) heritability. Because of the difficulty in precisely establishing the desired heritability value, a variation of ±4% in the desired value was allowed. The breeding methods used were mass selection in one gender and half-sib selection with recombination of selected progenies. In the case of mass selection, the population size was 1000. With half-sib selection, the simulation assumed 200 progenies (200 females and an infinite male gamete pool), with a randomized complete block design, two replications and 25 individuals per plot. The best 10% were selected based on the phenotypic values of the individuals and the average phenotypic values of the families. For the recombination plot, the simulation assumed 100 individuals in each selected progeny and an infinite male gamete pool.
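As an illustration of how parametric values could be obtained for these gene systems and population classes, the sketch below computes the population mean, the additive and dominance variances, and the environmental variance needed to reach a target heritability. It uses the standard single-locus expressions summed over independent loci in Hardy-Weinberg and linkage equilibrium. The per-locus values m = 0 and a = 1, the choice of broad sense heritability as the target, and all function names are assumptions, since the text does not state them.

```python
def parametric_values(p, a, d, n_genes=10, m=0.0):
    """Mean, additive variance and dominance variance for n_genes identical,
    independent loci with favorable-gene frequency p."""
    q = 1.0 - p
    alpha = a + (q - p) * d                    # average effect of a gene substitution
    mean = n_genes * (m + (p - q) * a + 2 * p * q * d)
    var_a = n_genes * 2 * p * q * alpha ** 2   # additive variance
    var_d = n_genes * (2 * p * q * d) ** 2     # dominance variance
    return mean, var_a, var_d

def environmental_variance(var_g, h2_broad):
    """Environmental variance that yields the target broad sense heritability."""
    return var_g * (1.0 - h2_broad) / h2_broad

# Three population classes, complete dominance of the favorable gene (d/a = 1)
for p in (0.1, 0.5, 0.9):
    mean, var_a, var_d = parametric_values(p, a=1.0, d=1.0)
    var_e = environmental_variance(var_a + var_d, h2_broad=0.5)
    h2_narrow = var_a / (var_a + var_d + var_e)
    print(p, round(mean, 3), round(var_a, 3), round(var_d, 3), round(h2_narrow, 3))
```

The target is taken as broad sense heritability because, with dominance, the narrow sense value is capped at σ_A²/(σ_A² + σ_D²). In the p = 0.5, d = a case above this cap is 2/3, so a narrow sense target of 90% could not be reached.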
The genetic gain due to mass selection was calculated as the difference between the parametric mean of the improved population (cycle n + 1) and the mean of the previous population (cycle n). The genetic gain due to family selection was calculated as the difference between the mean of the improved population obtained with family selection and the mean of the previous population, whereas the gain for the selection of superior individuals in the best progenies was calculated as the difference between the mean of the improved population obtained by among and within selection and the mean of the improved population obtained with family selection. A generation of random mating was assumed to occur after each selection cycle. The predicted gains were calculated based on the parametric values of additive and genotypic variances and of narrow and broad sense heritabilities, as well as the estimated values of these parameters. With mass selection, there was no defined constant bias in the estimate of the additive variance. The estimates were also obtained by simulating parent-offspring and mid-parent-offspring regressions (average of 10 estimates for each regression). In the case of half-sib selection, the estimates of additive variance came from analyses of variance. The function of the predicted gain due to within family selection was [...]. The simulated data were obtained by using many of the built-in functions of Microsoft Excel® software (Microsoft Inc.).
The sequence of events used was: (1) specification of the trait and effects of the favorable genes, with insertion of the degree of dominance (the same for each gene), (2) characterization of the population, with insertion of the frequencies of the favorable genes, (3) specification of the environmental conditions, with definition of the desired heritability, (4) calculation of the population parametric mean (cycle 0), (5) simulation of the individual genotypes in the case of mass selection, or of the parent genotypes (females) and 150 individual genotypes for each progeny in the case of half-sib selection, (6) simulation of the genotypic values, environmental effects and phenotypic values of the individuals, (7) in the case of half-sib selection, analysis of variance of the plot phenotypic values (mean phenotypic value of 25 individuals), (8) estimation of genetic parameters (genotypic and additive variances, and heritabilities) and prediction of gains, (9) identification of superior individuals in the case of mass selection, or of the best families in the case of half-sib selection, (10) computation of the gene frequencies in the improved population, and (11) computation of the improved population mean and of realized gains (first cycle). For the other cycles, the same order of events was used, except for events (1) and (3). Note the correspondence between events (10) and (11) for cycle n and events (2) and (4) for cycle n + 1.
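The sequence of events for mass selection can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not the spreadsheet procedure used in the study. It assumes m = 0 and a = 1 for every gene, a broad sense heritability target, normally distributed environmental effects, selected individuals used as females with an unselected male gamete pool, and a prediction of the form (1/2)·h²·SD with the parametric narrow sense heritability. All names and default values are hypothetical.

```python
import random
import statistics

def population_mean(freqs, a, d):
    """Parametric mean (m = 0) under Hardy-Weinberg equilibrium."""
    return sum((2 * p - 1) * a + 2 * p * (1 - p) * d for p in freqs)

def mass_selection_cycle(freqs, a=1.0, d=0.5, h2_broad=0.5,
                         n=1000, prop_sel=0.10, rng=random):
    """One cycle of mass selection on one gender.
    Returns (new_freqs, predicted_gain, realized_gain)."""
    value = {2: a, 1: d, 0: -a}   # genotypic value by number of favorable genes
    # Parametric variances of the base population
    var_a = sum(2 * p * (1 - p) * (a + (1 - 2 * p) * d) ** 2 for p in freqs)
    var_d = sum((2 * p * (1 - p) * d) ** 2 for p in freqs)
    var_e = (var_a + var_d) * (1 - h2_broad) / h2_broad
    var_p = var_a + var_d + var_e

    # Simulate genotypes, genotypic values and phenotypic values
    pop = []
    for _ in range(n):
        geno = [(rng.random() < p) + (rng.random() < p) for p in freqs]
        pheno = sum(value[g] for g in geno) + rng.gauss(0.0, var_e ** 0.5)
        pop.append((pheno, geno))

    # Truncation selection of the best individuals on phenotype
    pop.sort(key=lambda ind: ind[0], reverse=True)
    selected = pop[:round(n * prop_sel)]
    sel_diff = (statistics.mean(ph for ph, _ in selected)
                - statistics.mean(ph for ph, _ in pop))

    # Prediction with parametric narrow sense heritability, one gender
    predicted = 0.5 * (var_a / var_p) * sel_diff

    # Gene frequencies after crossing selected females with an unselected
    # male gamete pool, followed by a generation of random mating
    new_freqs = []
    for j, p in enumerate(freqs):
        p_sel = sum(g[j] for _, g in selected) / (2 * len(selected))
        new_freqs.append((p_sel + p) / 2)

    realized = population_mean(new_freqs, a, d) - population_mean(freqs, a, d)
    return new_freqs, predicted, realized

rng = random.Random(1)
freqs = [0.5] * 10               # population with intermediate gene frequencies
for cycle in range(1, 4):
    freqs, pred, real = mass_selection_cycle(freqs, rng=rng)
    print(cycle, round(pred, 3), round(real, 3))
```

Feeding the returned frequencies into the next call mirrors the correspondence between events (10) and (11) of one cycle and events (2) and (4) of the next.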
Each combination of trait (7), population (3), heritability (3), breeding method (2) and cycles (10) was repeated ten times and corresponded to 12,600 simulations. In the case of mass selection, when the predicted gain was calculated with estimates of the parameters (biased estimates) only one replication was done (total of 1,260 simulations).

Mass selection
Few experimental studies have compared predicted and realized gains, especially using mass selection. This lack of data makes it difficult to compare the results for biases in gain predictions with mass selection (Table 1). Another limiting factor, even when experimental data are available, is the lack of knowledge about gene frequencies in the population under selection, i.e., the level of breeding in the population and the degree of dominance of the genes controlling the traits being studied. As shown here, the prediction of gain from selection is biased, even when the true values of the genetic parameters (unbiased estimates) are used in the calculation. Ignoring biases > 300% that essentially reflected only a small predicted gain and no actual gain, the mean biases in this simulation ranged from 39.2% to 59.3%, depending on the prediction function used. Extreme values generally represented < 10% of the cases and occurred mainly in bred populations with average heritability.
Overestimation of gain was not a general rule in our analysis. When additive variance or narrow sense heritability was used there was a tendency to underestimate the gain, particularly with low heritability (Table 1). However, when genotypic variance or broad sense heritability was used, the overestimation of gain for traits with a mean dominance = 1.0 was more frequent. Consequently, the use of genotypic variance or broad sense heritability (rather than additive variance and narrow sense heritability) was a further source of bias in gain prediction. In several cases, the bias went from negative (underestimation) to positive (overestimation) values, with an increase in magnitude. The mean absolute value of the bias increased from 39.2% to 59.3% (an increase of 51.3%) with the use of genotypic variance, and from 41.3% to 49.9% (an increase of 20.8%) with the use of broad sense heritability. The magnitude and sign of the biases further showed that prediction based on selection intensity and additive variance was equivalent to prediction based on narrow sense heritability and selection differential (mean absolute values of the biases were 39.2% and 41.3%, respectively). The same was true for the use of genotypic variance and broad sense heritability (means of 59.3% and 49.9%, respectively).
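The arithmetic behind these comparisons can be sketched as follows. The text does not give an explicit expression for the bias, so the definition used here, the difference between predicted and realized gain as a percentage of the realized gain, is an assumption. It is consistent with negative values indicating underestimation and with very large values when the realized gain is close to zero. The function names and the example gains are hypothetical.

```python
def percent_bias(predicted, realized):
    """Bias of a predicted gain as a percentage of the realized gain.
    Positive values indicate overestimation, negative values underestimation."""
    return 100.0 * (predicted - realized) / realized

def relative_increase(old, new):
    """Percentage increase from old to new."""
    return 100.0 * (new - old) / old

# Hypothetical gains: prediction 20% above and 25% below the realized gain
print(round(percent_bias(1.2, 1.0), 1))
print(round(percent_bias(0.75, 1.0), 1))

# Relative increases in mean absolute bias quoted in the text
print(round(relative_increase(39.2, 59.3), 1))   # 51.3
print(round(relative_increase(41.3, 49.9), 1))   # 20.8
```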
The results for the different traits showed that the magnitude of the bias was proportional to the degree of dominance, regardless of whether the favorable genes were dominant or recessive (Table 1). With prediction based on additive variance, the mean magnitude of the bias with complete dominance/overdominance and partial dominance/absence of dominance was 47.7% and 27.5%, respectively. Finally, biases of smaller magnitude were observed in populations with intermediate frequencies and under high heritability conditions. The means of the absolute values were 42.2%, 42.9% and 32.9% for cases of low, medium and high heritability, respectively, and 51.9%, 27.5% and 37.9% in non-bred populations, populations with intermediate frequencies, and bred populations, respectively, with prediction based on additive variance.
Although the Eberhart function yielded biased estimates of genetic gain, our simulation indicated that this function was adequate for assessing the efficiency of recurrent population breeding methods and selection strategies. The correlation between realized and predicted genetic gains during 10 cycles was generally positive and of high magnitude (average of 0.84) (Table 2). The exceptions (values < 0.70), which represented 6%-12% of the cases analyzed, did not show any tendency and can be attributed to chance. Again, there was full correspondence between the results obtained with prediction using additive variance or narrow sense heritability and those obtained based on genotypic variance or broad sense heritability.
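A small sketch may help to show why a biased predictor can still rank cycles or strategies correctly. It computes the Pearson correlation between realized and predicted gains over 10 cycles, for a hypothetical series in which every prediction overestimates the realized gain by the same proportion. The gains and the 40% overestimate are invented for illustration and are not results from the simulation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical realized gains over 10 cycles, declining as favorable genes
# approach fixation, and predictions that overestimate each one by 40%
realized = [1.8, 1.7, 1.5, 1.4, 1.2, 1.0, 0.9, 0.7, 0.5, 0.4]
predicted = [1.4 * g for g in realized]

r = pearson(realized, predicted)
print(round(r, 3))
```

A constant proportional bias leaves the correlation at 1. The correlation falls only when the size of the bias changes from cycle to cycle.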

When gain is predicted based on biased estimates of the genetic parameters, the additional bias can increase or decrease the difference between the realized and predicted gains. Using estimates of additive variance (obtained by parent/offspring and mid-parent/offspring regressions) and genotypic variance (obtained from the difference between the phenotypic and environmental variances), the simulation study confirmed almost all of the previous inferences. The exception was a small bias in a bred population, for which the mean magnitude ranged from 27.4% to 47.9%, depending on the prediction function. The bias in the estimates of additive variance ranged from -30.1% to 24.6%, with a predominance of underestimation (71.4% of the cases), which explained the smaller magnitude of the bias observed here. With few exceptions, the realized and predicted gains during 10 cycles were also in full agreement (average correlation of 0.80).

Half-sib selection
The results for bias in predictions of gain from family selection showed similarities and differences compared to those obtained with mass selection (Table 3). Although the amplitude of the absolute value of bias was not smaller (minimum of 0.25% and maximum of 158.4%, with prediction based on the parametric value of the additive variance), there were no extreme values (> 300%) and the mean value was 17.7%. The corresponding values in the case of mass selection were 0.31%, 149.1% and 39.2% (Table 1). Although the frequencies of cases involving overestimation and underestimation were equivalent (54% and 46%), there was a tendency for overestimation in traits controlled by dominant favorable genes. These results were similar to the findings of Carvalho et al. (2000) for corn yield, in which the bias between the predicted and realized gains was 287.3%. Bonomo et al. (2000) reported yield biases of 53.5%, 119.0%, 129.8% and 88.3% when the selection intensity varied from the lowest to the highest value. More recently, Viana (2007) calculated the realized gain by using the means of the progeny tests and observed full correspondence between the realized and predicted gains. The respective means of the predicted and realized gains for the three selection cycles were 5.6% and 5.6% for expansion volume, and 8.1% and 7.8% for yield.
The mean absolute values of biases by trait, population and heritability were larger with complete dominance and overdominance (21.4%), in bred populations (28.9%) and with low heritability (23.8%) (Table 3). The mean values in cases of partial/absence of dominance, in non-bred populations and in populations with intermediate gene frequencies, average heritability and high heritability were, respectively, 12.7%, 12.3%, 11.9%, 14.7% and 14.6%. Once again, equivalence was observed between prediction based on selection intensity and additive variance and prediction based on narrow sense heritability and selection differential. For bias in the predictions based on estimates of additive variance, all of the previous inferences were confirmed, with no exceptions (Table 3). Although the use of biased estimates of genetic parameters can either increase or decrease the bias calculated based on parametric values, only increases were observed here. The absolute minimum, mean and maximum values were 0.52%, 25.9% and 180.4%, respectively.
The gain prediction from family selection was a poorer indicator of the efficiency of recurrent population breeding methods and selection strategies compared to similar prediction from mass selection (Table 4). The linear association between predicted and realized gains during 10 cycles was only adequate for heritability > 50%, regardless of the traits and the bias in the additive variance estimates. The mean correlation was 0.71 for prediction based on unbiased estimates of the additive variance, and 0.59 in the case of prediction based on biased estimates. When low heritability cases were excluded, the mean correlations were 0.85 and 0.81.
Table 2 - Correlation between realized and predicted gains in 10 mass selection cycles based on unbiased estimates of additive and genotypic variances.
Comparison of the predicted and realized gains based on the selection of individuals in the best families yielded poor results (Tables 3 and 4). In approximately 54% of the cases, the realized gain was practically nil, implying very high bias values in relation to the predicted gain (Table 3). This situation occurred in predictions of traits controlled by dominant favorable genes (degree of dominance > 0, regardless of the bias in the estimates of additive variance). When these values were ignored, the smallest magnitude of bias was 4.6% and the absolute maximum value was 297.9%, with prediction using unbiased estimates of additive variance. The mean magnitude of the bias was 94.5%. These values were greater than those observed with mass selection, indicating that prediction of gain from within half-sib selection is more biased than prediction of gain from mass selection using unbiased estimates of additive variance. There was a tendency for underestimation in traits controlled by favorable genes with a degree of dominance > 1.0, as also seen with corn grain yield. However, overestimation was detected in the other situations. These observations agreed with findings for popcorn yield (Matta and Viana, 2003), for which the biases in gain predictions from among and within selection were 218.1% and -116.3%, respectively, in line with the theoretical results. For expansion volume, considered by Scapim et al. (2002) to be determined by favorable dominant and recessive genes (bi-directional dominance), the bias was towards underestimation, i.e., -18.4% with progeny selection and -78.4% with selection of individuals in the selected families. As expected, bias in gain prediction from within selection was much larger than bias in gain prediction from among family selection.
Greater biases were observed for traits controlled by favorable dominant genes (average magnitudes of 133.3%, 297.9% and 115.0% for degrees of dominance of 0.5, 1 and 2, respectively) and traits not controlled by allelic interaction effects (average magnitude of 143.8%) (Table 3). The average absolute values of the biases for traits controlled by favorable recessive genes were 90.1%, 54.9% and 41.0% for degrees of dominance of -0.5, -1 and -2, respectively. Greater absolute biases were observed in populations with intermediate gene frequencies (113.1% versus 81.6% and 86.2%, in non-bred and bred populations) and low heritability (108.8% versus 82.4% and 92.4%, with medium and high heritability). The predicted gains calculated based on biased estimates of additive variance were more biased, but generally confirmed the results obtained by using the parametric value. The minimum, mean and maximum magnitudes were 4.1%, 102.3% and 296.1%, respectively.
An additional negative aspect of gain prediction from individual selection within the selected families was shown by the correlation between predicted and realized gain during 10 cycles. Regardless of the magnitude of the bias in additive variance, the correlation was negative in approximately 40% of the situations assessed (Table 4) and was > 0.7 in only 30%-40% of the cases. Only in cases of traits controlled by favorable recessive genes with average to high heritability was there sufficient agreement between predicted and realized gains to allow assessment of the efficiency of the recurrent breeding method and selection strategies (average correlation of 0.75, regardless of the bias in the additive variance estimate). The average correlations for unbiased and biased estimates of the additive variance were 0.15 and 0.24, respectively.
In conclusion, the use of unbiased and biased estimates of the genotypic variance within progeny rather than the within family additive variance, i.e., broad versus narrow sense heritability, increased the magnitude of bias without worsening the correlation between predicted and realized gains. These findings indicate that Eberhart's formula, which is a function of additive variance or narrow sense heritability, is a less biased estimator of genetic gain than the estimator based on a function of genotypic variance or broad sense heritability. As shown for mass and family selection, there was full correspondence between the gains calculated with additive or genotypic variance and the predictions based on broad or narrow sense heritability.
Table 4 - Correlation between realized and predicted gains during 10 half-sib selection cycles based on unbiased and biased estimates of additive variance.