## Genetics and Molecular Biology

*Print version* ISSN 1415-4757

### Genet. Mol. Biol. vol.31 no.1 São Paulo 2008

#### http://dx.doi.org/10.1590/S1415-47572008000100015

**PLANT GENETICS RESEARCH ARTICLE**

**Inclusion of genetic relationship information in the pedigree selection method using mixed models**

**José Airton Rodrigues Nunes ^{I}; Magno Antonio Patto Ramalho^{II}; Daniel Furtado Ferreira^{III}**

^{I}Departamento de Planejamento e Política Agrícola, Centro de Ciências Agrárias, Universidade Federal do Piauí, Teresina, PI, Brazil

^{II}Departamento de Biologia, Universidade Federal de Lavras, Lavras, MG, Brazil

^{III}Departamento de Ciências Exatas, Universidade Federal de Lavras, Lavras, MG, Brazil

**ABSTRACT**

We used a mixed model approach and computer simulation to evaluate the inclusion of parentage information as determined by the genealogy established in the pedigree method. The simulations were based on a purely additive genetic model for one quantitative trait controlled by 20 unlinked segregating loci with equal effects and an allelic frequency of 0.5, for heritability values of 10%, 25%, 50% and 75% for selection based on an F_{4:5} progeny mean. We simulated 1000 experiments for each heritability value, each corresponding to the evaluation of 256 F_{4:5} progenies. The phenotypic values of the progenies were analyzed according to two models, one ignoring and one considering the additive genetic parentage among the progenies. The additive relationship coefficients among F_{4:5} progenies ranged from 0.0 to 1.75. The evaluated selection procedures were the phenotypic progeny mean (*M*) and the best linear unbiased predictor including parentage (*BLUP_{A}*). The inclusion of parentage among progenies using the *BLUP_{A}* procedure resulted in higher selection gains than when the relationship information was ignored, which possibly compensates for the additional work invested to obtain these records, above all in the case of low-heritability traits.

**Key words:** autogamous crops, *BLUP*, computer simulation, plant breeding.

**Introduction**

The pedigree method, proposed towards the end of the 19^{th} century, is widely applied in improvement programs of self-fertilized plants and is mainly based on recording the genealogies among progenies over the selfing generations (Ramalho *et al.*, 2001). However, not only does this procedure require time and dedication from the breeder but the usefulness of these records for the selection process is somewhat restricted. One possibility of using this parentage information in support of the selection process would be in progeny evaluations in experiments with replications. Such an approach could be useful since breeders of autogamous species are primarily interested in selecting progenies that, as homozygosity increases, accumulate more favorable alleles and therefore carry the best additive genetic values (*AGV*), bearing in mind that the ultimate aim is the establishment of lines (Fehr, 1987). For quantitative traits, however, the phenotype does not always reflect the associated *AGV*. In this case, it would be important to use methodologies that optimize the use of the available information, in order to classify the progenies as closely as possible to the ranking given by the true *AGV* (White and Hodge, 1989). Several fixed model and mixed model procedures have been proposed to predict the *AGV* of progenies, including the best linear unbiased estimator (*BLUE*) method, the best linear predictor (*BLP*) technique and the best linear unbiased predictor (*BLUP*) approach (White and Hodge, 1989; Mrode, 1996; Lynch and Walsh, 1998; Resende, 2002).

The *BLUP* procedure has been the most widely used in the prediction of the genetic merit in animals (Mrode, 1996) and, more recently, it has been widely applied in plant improvement (Bernardo, 2002; Resende, 2002). Under unbalanced conditions this procedure not only has the advantage of making predictions more reliable compared to those obtained by the ordinary least square method but also incorporates information on related plants and thus optimizes the use of the available data in progeny comparisons (Bernardo, 2002).

Since we found no reports on the use of the genealogy established by the pedigree method in progeny selection in self-pollinated crops, and since field experiments produce unreliable information (Wang *et al.*, 2003), we evaluated the efficiency of selection incorporating this genetic relationship using mixed models and computer simulation.

**Methodology**

The program was implemented in the Delphi 6.0 environment (Cantú, 2002). A simplified genetic model was assumed for a quantitative trait controlled by 20 independently segregating loci, with equal and additive effects, an allelic frequency of 0.5 and no dominance. The simulations considered heritability values of 10%, 25%, 50% and 75% for selection based on an F_{4:5} progeny mean (*h*^{2}_{p}). For each heritability we simulated 1000 F_{2} populations with 20 segregating loci consisting of 64 plants each. The plant multiplication rates were assumed to be equal, with each plant generating 40 offspring.

Initially, the generations were advanced by the pedigree method with no visual selection. A segregating F_{2} population of 64 simulated plants gave rise to the 64 F_{2:3} progenies with 40 plants each. Two plants were randomly selected from each F_{2:3} progeny, resulting in 128 F_{3:4} progenies and the process repeated in the following generation to finally obtain 256 F_{4:5} progenies with 40 plants each (Figure 1).
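The generation advance described above can be sketched in a few lines of Python. This is only an illustration, not the authors' Delphi program: the genotype encoding (number of copies of one allele per locus), the function names and the ordering of the resulting progenies are assumptions made here.

```python
import random
from typing import List

N_LOCI = 20      # unlinked loci (from the text)
N_F2 = 64        # F2 plants per population (from the text)
OFFSPRING = 40   # plants per progeny (from the text)
SELECTED = 2     # plants taken at random from each progeny (from the text)

# Hypothetical encoding: number of copies (0, 1 or 2) of one allele at each locus.
Plant = List[int]


def self_plant(parent: Plant, rng: random.Random) -> Plant:
    """One offspring by selfing: heterozygous loci segregate 1:2:1,
    homozygous loci are transmitted unchanged."""
    return [rng.choice((0, 1, 1, 2)) if locus == 1 else locus for locus in parent]


def progeny_of(parent: Plant, rng: random.Random) -> List[Plant]:
    """All offspring (one progeny) of a selfed plant."""
    return [self_plant(parent, rng) for _ in range(OFFSPRING)]


def advance_pedigree(rng: random.Random) -> List[List[Plant]]:
    """Return 256 F4:5 progenies of 40 plants each, ordered so that
    progenies 4k..4k+3 trace back to F2 plant k (no selection applied)."""
    f1 = [1] * N_LOCI  # heterozygous at every locus, so allele frequency is 0.5
    f2_plants = [self_plant(f1, rng) for _ in range(N_F2)]
    f45 = []
    for f2 in f2_plants:
        f23 = progeny_of(f2, rng)                  # F2:3 progeny (F3 plants)
        for f3 in rng.sample(f23, SELECTED):
            f34 = progeny_of(f3, rng)              # F3:4 progeny (F4 plants)
            for f4 in rng.sample(f34, SELECTED):
                f45.append(progeny_of(f4, rng))    # F4:5 progeny (F5 plants)
    return f45


progenies = advance_pedigree(random.Random(1))
print(len(progenies), len(progenies[0]), len(progenies[0][0]))  # prints: 256 40 20
```

Keeping the progenies in pedigree order (four consecutive progenies per F_{2} plant) makes it straightforward to build the relationship matrix later from the same bookkeeping.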

The phenotypic values for the plants of each F_{4:5} progeny (*y_{i}*) were simulated by adding normally distributed random errors to the genotypic values (*GV*), by the following model:

$$y_i = \mu + g_i + w_i$$

where µ is a constant (100 in the present case), *g_{i}* is the genotypic effect of plant *i* (*i* = 1, 2, ..., 40) and *w_{i}* is the environmental deviation associated with *y_{i}*.

The *g_{i}* effect results from the cumulative effect of the 20 loci as already described in the genetic model. The additive effect (*a_{l}*) of the *l*^{th} locus was assumed equal to 1.0, where *l* = 1, 2, ..., 20. The value of *g_{i}* taking locus *B* with two alleles (*B*^{1} and *B*^{2}) as reference is given by:

The *w_{i}* effects were randomly attributed based on a normal distribution with constant variance, *i.e.*, *N*(0, σ^{2}_{w}). The variance component σ^{2}_{w} is the environmental variance among plants, which can be obtained by:

$$\sigma^2_w = \frac{1 - h^2}{h^2}\,\sigma^2_G$$

where σ^{2}_{G} is the genetic variance among the F_{2} plants (*i.e.*, σ^{2}_{G} = σ^{2}_{A} + σ^{2}_{D}), σ^{2}_{A} is the F_{2} additive variance with, in this case, an allelic frequency of 0.5, and since *a_{l}* was assumed equal to 1.0 for all loci, then σ^{2}_{A} = 10, σ^{2}_{D} is the F_{2} dominance variance and since dominance was assumed to be absent, then σ^{2}_{D} = 0, and *h*^{2} is the F_{2} generation individual heritability.
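As an illustration of how the plant-level phenotypes could be generated, the sketch below assumes the common additive coding in which the two homozygotes at a locus are worth +*a* and −*a* and the heterozygote is worth 0. That coding, the genotype encoding and all function names are assumptions for this example, not taken from the article. The environmental variance follows from solving *h*^{2} = σ^{2}_{G} / (σ^{2}_{G} + σ^{2}_{w}) for σ^{2}_{w}.

```python
import random
from typing import List

MU = 100.0       # constant of the model (from the text)
A_EFFECT = 1.0   # additive effect per locus (from the text)


def genotypic_effect(plant: List[int]) -> float:
    """Assumed coding: 2 copies = +a, 1 copy = 0, 0 copies = -a, summed over loci."""
    return sum((copies - 1) * A_EFFECT for copies in plant)


def f2_additive_variance(n_loci: int = 20, p: float = 0.5, a: float = A_EFFECT) -> float:
    """Sum over loci of 2pq * a^2 for a purely additive trait in the F2."""
    return n_loci * 2.0 * p * (1.0 - p) * a * a


def environmental_variance(genetic_var: float, h2: float) -> float:
    """Solve h2 = Vg / (Vg + Vw) for the among-plant environmental variance Vw."""
    return genetic_var * (1.0 - h2) / h2


def phenotype(plant: List[int], h2: float, rng: random.Random) -> float:
    """y = mu + g + w, with w drawn from N(0, Vw)."""
    sd_w = environmental_variance(f2_additive_variance(len(plant)), h2) ** 0.5
    return MU + genotypic_effect(plant) + rng.gauss(0.0, sd_w)


print(f2_additive_variance())              # 10.0
print(environmental_variance(10.0, 0.25))  # 30.0
```

Note that `h2` here is the individual F_{2} heritability, which the article derives from the pre-fixed progeny-mean heritability; that conversion is not reproduced in this sketch.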

The 40 simulated genotypes or plants per F_{4:5} progeny were divided into two virtual plots of 20 plants (*n* = 20) to produce two replications (*r* = 2) for each progeny. In the following equations, random errors were considered to be normally distributed among plots, with *e* ~ *N*(0, σ^{2}_{e}) in relation to the mean phenotypic values of the plots. The variance component σ^{2}_{e} is the environmental variance among plots.

In the simulation the ratio between the two environmental variance components (among plants and among plots) was considered fixed at eight (*c* = 8). The error terms varied according to the values assumed for heritability:

where the genetic variance among F_{4:5} progenies equals 7/4 σ^{2}_{A} and the phenotypic variance within a plot is the sum of the environmental variance among plants and the genetic variance within F_{4:5} progenies, the latter given by 1/8 σ^{2}_{A}. Thus the individual F_{2} heritability (*h*^{2}) was determined as a function of the pre-fixed heritability values as:

We analyzed 1000 experiments corresponding to the evaluation of 256 simulated F_{4:5} progenies, derived from the pedigree method. The analysis was based on the mean phenotypic data of the plots, using a completely randomized experimental design with two replications.

Following the pedigree scheme described above, each F_{2} plant generated four F_{4:5} progenies (Figure 1). Based on this detailed pedigree, the matrix of the additive genetic relationships among the related progenies was determined, considering the F_{2} population as non-inbred. The phenotypic progeny data were then analyzed according to two models:

**Model G_{I}**

The genetic relationship among progenies was ignored. The mean phenotypic data of the plots of the 256 F_{4:5} progenies were analyzed using a linear mixed model (Henderson *et al.*, 1959), **y** = **X**b + **Za** + **e**, where **y** is a 512 x 1 vector of the mean phenotypic data of the plots, **X** is a 512 x 1 fixed effect design matrix, b is a scalar fixed effect of the constant, **Z** is a 512 x 256 design matrix for the random effects of progenies, **a** is a 256 x 1 progeny random effects vector with **a** ~ *N*(0, **G**) and **G** = **A**σ^{2}_{a}, while **e** is a 512 x 1 vector of errors with **e** ~ *N*(0, **R**) and **R** = **I**σ^{2}_{e}. The **G** matrix was designated by **I**σ^{2}_{a} (*i.e.*, **A** = **I**), indicating that the progenies were assumed to be unrelated. In this case, the component σ^{2}_{a} is equal to the genetic variance among F_{4:5} progenies.

**Model G_{A}**

In this model the genetic relationship among progenies was considered by the inclusion of parentage among progenies. The mixed model for analysis was identical to model *G_{I}*, except that the **G** matrix was designated by **A**σ^{2}_{a}, with **A** containing the additive relationship coefficients among F_{4:5} progenies, corresponding to twice Malécot's coancestry coefficient (Bernardo, 2002: section 2.3.5.2), and σ^{2}_{a} refers directly to the F_{2} additive variance among plants (σ^{2}_{A}). In animal breeding the **A** matrix is referred to as the *numerator relationship matrix* (Mrode, 1996) and, in this case, it was given by:

where ⊗ is the Kronecker product.
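One way to arrive at a block-diagonal relationship matrix of this kind is sketched below. The 4 x 4 family block is worked out here from the standard rule that two lines descending by selfing from different offspring of a common ancestor plant have an additive relationship of 1 + *F* of that ancestor (with *F* = 0, 1/2 and 3/4 for F_{2}, F_{3} and F_{4} plants). It is not copied from the article, and the progeny ordering (four consecutive progenies per F_{2} plant, two per F_{3} plant) is an assumption. The resulting coefficients span 0.0 to 1.75, in agreement with the range reported in the abstract.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def ones(n: int) -> Matrix:
    return [[1.0] * n for _ in range(n)]


def family_block() -> Matrix:
    """Relationships among the four F4:5 progenies tracing back to one F2 plant.

    1.00 for sharing the F2 plant (F = 0),
    +0.50 for also sharing the F3 plant (F = 1/2),
    +0.25 on the diagonal, the progeny with itself (F = 3/4).
    """
    same_f2 = ones(4)
    same_f3 = kron(identity(2), ones(2))
    same_f4 = identity(4)
    return [[same_f2[i][j] + 0.5 * same_f3[i][j] + 0.25 * same_f4[i][j]
             for j in range(4)] for i in range(4)]


# Progenies from different F2 plants are unrelated, hence the identity factor.
A = kron(identity(64), family_block())

for row in family_block():
    print(row)
```

The diagonal value of 1.75 matches the factor 7/4 that links the genetic variance among F_{4:5} progenies to the F_{2} additive variance.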

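The solutions for the random (**â**) and fixed effects (b̂) for both models were obtained by solving the following equation (Henderson *et al.*, 1959):

$$\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{\mathbf{a}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix}$$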

To obtain the previous solutions, the components of genetic and non-genetic variances were assumed to be unknown. These variance components were estimated using the restricted maximum likelihood (REML) method (Patterson and Thompson, 1971). Since the REML method employs an iterative process, the expectation-maximization (EM) numeric algorithm was applied (Dempster *et al.*, 1977).
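For a model with a single fixed constant and errors that are independent with equal variance, Henderson's mixed model equations can be multiplied through by the error variance, so that only the ratio λ = σ^{2}_{e} / σ^{2}_{a} is needed. The sketch below solves that system for a small example. It treats λ as known, whereas the article estimated the variance components by EM-REML, and the function names and the plain Gauss-Jordan solver are choices made here for a self-contained illustration (a numerical library would normally be used for 256 progenies).

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(m: Matrix, rhs: Matrix) -> Matrix:
    """Solve m @ x = rhs (rhs may have several columns) by Gauss-Jordan
    elimination with partial pivoting. Intended for small systems only."""
    n = len(m)
    aug = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def blup(plot_means: Sequence[float], progeny_of_plot: Sequence[int],
         a_matrix: Matrix, lam: float) -> Tuple[float, List[float]]:
    """Solve the mixed model equations for y = 1*b + Z*a + e,
    with var(a) = A * s2a, var(e) = I * s2e and lam = s2e / s2a."""
    q = len(a_matrix)
    eye = [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]
    a_inv = solve(a_matrix, eye)
    counts = [0] * q      # diagonal of Z'Z: plots per progeny
    sums = [0.0] * q      # Z'y: plot totals per progeny
    for y, j in zip(plot_means, progeny_of_plot):
        counts[j] += 1
        sums[j] += y
    # Unknowns are ordered as (b, a_1, ..., a_q).
    lhs = [[float(len(plot_means))] + [float(c) for c in counts]]
    for j in range(q):
        row = [float(counts[j])] + [lam * a_inv[j][k] for k in range(q)]
        row[1 + j] += counts[j]
        lhs.append(row)
    rhs = [[float(sum(plot_means))]] + [[s] for s in sums]
    sol = solve(lhs, rhs)
    return sol[0][0], [row[0] for row in sol[1:]]


# Three unrelated progenies (A = I), two plots each, lam assumed equal to 2.
unrelated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b_hat, a_hat = blup([10, 12, 14, 16, 18, 20], [0, 0, 1, 1, 2, 2], unrelated, lam=2.0)
print(b_hat, a_hat)
```

With unrelated progenies and balanced data the predictions are simply the progeny mean deviations shrunk by *r* / (*r* + λ), so they rank progenies exactly as the phenotypic means do. Passing a pedigree-based relationship matrix instead of the identity lets related progenies borrow information from each other.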

The predictions of the progeny random effects (**â**) based on the overall adjusted mixed model are *BLUP* predictions (Henderson, 1975). After fitting the *G_{I}* model the predictions were denoted as *BLUP_{I}*, while for the *G_{A}* model the predictions were designated *BLUP_{A}*. Additionally, the phenotypic progeny means (*M*) for each simulated experiment were also obtained.

It should be noted that, due to the balanced conditions under which the simulations were conducted and the use of an orthogonal experimental design with no missing data, the *BLUP_{I}* predictions have no selective advantage in relation to the phenotypic means of the progenies (*M*) (Kennedy and Sorensen, 1988). Thus, only the results using the mean *M* will be shown, and these should be understood as being equal to *BLUP_{I}*.

For each pre-fixed heritability, corresponding to 1000 simulated experiments, we obtained the mean estimates of the genetic variance among the F_{4:5} progenies and of the heritability on an F_{4:5} progeny mean basis (*h*^{2}_{p}) for both models (*G_{I}* and *G_{A}*). The selection procedures of the F_{4:5} progenies (mean *M* and *BLUP_{A}*) were evaluated and compared based on the true genotypic values (*GV*), so for both procedures we estimated the Spearman correlations (*r_{S}*), the proportions of coincidence in the 5%, 10% and 25% selection fractions for the lower and upper extremes of the ranking of progenies, and the mean *GV* for different percentages (0.4% (best progeny), 5%, 10% and 25%) of the superior selected progenies.

The relative efficiency (*RE*) of *BLUP_{A}* in relation to the mean *M* was determined by *RE* = {[*r_{S(BLUPA, GV)}* - *r_{S(M, GV)}*] / *r_{S(M, GV)}*} x 100, where *r_{S(BLUPA, GV)}* is the Spearman correlation between *BLUP_{A}* and *GV* of the selected progenies, and *r_{S(M, GV)}* is the Spearman correlation between the mean *M* and *GV* of the selected progenies. The relative efficiency was also obtained using proportions of coincidence. We also calculated the relative gain (*RG*) of *BLUP_{A}* in relation to the mean *M* using *RG* = {[*MGV_{BLUPA}* - *MGV_{M}*] / *MGV_{M}*} x 100, where *MGV_{BLUPA}* is the mean genotypic value of the progenies selected by *BLUP_{A}* while *MGV_{M}* is the mean genotypic value of the progenies selected by the mean *M* method.
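The comparison criteria described above can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the authors' code: it computes a Spearman correlation as the Pearson correlation of average ranks, the proportion of coincidence in the upper selected fraction, and the percentage change used for both *RE* and *RG*.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Average ranks (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def coincidence(predicted: Sequence[float], true: Sequence[float], fraction: float) -> float:
    """Share of the top `fraction` by `predicted` that is also in the top by `true`."""
    k = max(1, round(fraction * len(predicted)))

    def top(values: Sequence[float]) -> set:
        return set(sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k])

    return len(top(predicted) & top(true)) / k


def relative_change(new: float, reference: float) -> float:
    """RE or RG: percentage change of `new` relative to `reference`."""
    return (new - reference) / reference * 100.0


# Coincidence values quoted in the Results for 10% heritability and s = 5%.
print(round(relative_change(0.21, 0.15), 2))  # 40.0
```

For the lower extreme of the ranking the same `coincidence` function can be applied to the negated values.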

**Results**

For both models, the mean estimates of the genetic parameters associated with the F_{4:5} progenies were close to the pre-fixed parametric values for all the heritabilities studied (Table 1). Nevertheless, in all the evaluations the genetic parameter estimates by the *G_{A}* model, which includes parentage among progenies, were more accurate than those produced by the *G_{I}* model. For instance, for 25% heritability the standard error associated with the estimate in the *G_{A}* model was 33.5% but was 44.4% for the *G_{I}* model. However, when 50% heritability was considered the same percentages were very similar at 21.3% for the *G_{A}* model and 22.4% for the *G_{I}* model (Table 1). This demonstrates that it is advantageous to take into account genealogy (as normally occurs when using the pedigree method), although this advantage decreases as the character heritability increases (≥ 50%).

The selection units (mean *M* and *BLUP_{A}*) were evaluated regarding the correct ranking of F_{4:5} progenies using the true associated genotypic values (*GV*) as reference. As expected, the mean correlation estimates *r_{S}* of the evaluated procedures increased with the heritability values (Table 2). The heritability represents a determination coefficient between the *M* and *GV* means, so that the mean values of the correlation estimates (*r_{S(M, GV)}*) can be used to verify the quality of the simulations, since they are approximate estimators of the square root of the heritability (Falconer and Mackay, 1996). The *r_{S(M, GV)}* correlation values were near the expected values for all the heritabilities studied (Table 2), *e.g.*, for 25% heritability the mean *r_{S(M, GV)}* correlation estimate was 0.48 and therefore close to the population value of 0.5.

The *r_{S(BLUPA, GV)}* mean correlations between *BLUP_{A}* and *GV* were superior to the *r_{S(M, GV)}* mean correlation values for all the heritability values studied (Table 2), demonstrating that the incorporation of genetic relationships results in greater efficiency regarding the correct classification of progenies, particularly in situations where heritability was less than 50%. For example, for 10% heritability the relative efficiency (*RE*) of *BLUP_{A}* to mean *M* was 43.33% while for 50% heritability the *RE* dropped to only 14.5%, this being confirmed by the high *r_{S(M, BLUPA)}* correlation (0.87) between *BLUP_{A}* and mean *M* (Table 2).

The identification of the progenies at the extremes of the ranking is of greater relevance for breeders than the classification of all the progenies evaluated. For this we estimated the coincidence proportions (*C_{(BLUPA, GV)}*) of selected progenies using the *BLUP_{A}* and mean *M* methods and compared the results with selected progenies based on the real *GV* (Table 3), and found that for a fixed selection fraction (*s*) value the corresponding proportions of estimated coincidences in the lower and upper selected extremes were identical.

For all heritability and selection fraction (*s*) values the *C_{(BLUPA, GV)}* between *BLUP_{A}* and *GV* were higher than the *C_{(M, GV)}* between the mean *M* and *GV* (Table 3), supporting our *r_{S}* estimates (Table 2). As mentioned above, the *RE* of the *BLUP_{A}* in relation to mean *M* in the coincidences with *GV* was proportionally greater for lower heritability values and selected fractions (*s*). For example, for 10% heritability and *s* = 5%, *C_{(BLUPA, GV)}* was 0.21 and *C_{(M, GV)}* 0.15 (an *RE* of 40%), while at the same heritability but with *s* = 25% the *RE* was only 15.4%. When heritability was 50% *RE* dropped to 26.3% for *s* = 5% and 13.3% for *s* = 25% (Table 3). This indicates that the efficiency of *BLUP_{A}* could possibly be higher when breeders work with a trait of low heritability and apply high selection intensity.

Breeders want the selected progenies to have the highest possible genetic values, which ultimately reflect the gain achieved with selection, disregarding the progeny by environment interaction. Comparing, in the selected fractions (*s*), the *GV* means of the *BLUP_{A}*-selected progenies with those of the progenies selected by the mean *M*, it can be seen that the *BLUP_{A}* procedure offers an advantage at all the heritabilities studied, although with lower relative gains (*RG*). The *RG* increased continuously as heritability and *s* decreased (Table 4), *e.g.*, for 10% heritability and *s* = 0.4% the *RG* for *BLUP_{A}* was 0.77%, while for *s* = 25% it was 0.59%. With higher heritabilities the *RG* was lower: at *h*^{2}_{p} = 50%, *RG* = 0.65% for *s* = 0.4% and 0.48% for *s* = 25%.

**Discussion**

The fact that the dominance effect is not included in our genetic model does not constitute a severe restriction because the simulation involved F_{4:5} progenies that represent only 7/64 of the dominance variance (Ramalho *et al.*, 2001). Furthermore, most of the characters of self-fertilized plants, including grain yield, usually show little dominance effect (Souza and Ramalho, 1995; Novoselovic *et al.*, 2004). Van Oeveren and Stam (1992) also found that dominance has little importance in computer simulations of autogamous crops.

A restriction of the simulation was the lack of visual selection, which normally occurs in the pedigree method while the generations are advanced (Fehr, 1987). However, there are many literature reports on the inefficiency of visual selection for characters with low (< 50%) heritability, which is the case for most characters of economic importance (Silva *et al.*, 1994; Cutrim *et al.*, 1997). Thus, taking two random plants to generate subsequent progenies probably has no appreciable effect on the results, especially for heritabilities lower than 50%.

It is worth mentioning that the *BLUP_{A}* and mean *M* estimators are functions of the phenotypic data that both predict the additive genetic values (*AGV*) associated with progenies. The best estimator is therefore the one that results in the *AGV* ranked closest to the ranking by the true *AGV* (White and Hodge, 1989). It should be noted that, with the adoption of the *G_{A}* model, the predictions of the random effect of progenies (**â**) or *BLUP_{A}* correspond to the predictions of the additive genetic value (*AGV*) of the progenies (Lynch and Walsh, 1998), indicating the theoretical superiority of the *BLUP_{A}* procedure in relation to mean *M*.

An important aspect must be mentioned concerning the meaning of unbiasedness for *BLUP*, more specifically for *BLUP_{A}*. As mentioned above, in the present context *BLUP_{A}* is a predictor of the *AGV* of progenies (**a**) derived from the same breeding population, whose expectation, by definition, is zero [*E*(**a**) = 0] (Falconer and Mackay, 1996). In this context, *BLUP_{A}* is unbiased in the sense that *E*(**â**) = *E*(**a**) (Robinson, 1991), where **â** denotes the *AGV* predictors. The conclusion that can be drawn is that, unlike the concept of unbiasedness for estimators of fixed effects, the unbiasedness property for *BLUP* does not refer to predictions of individual random effects [*E*(**â**) = **a**] but to the expected value of these effects. Summing up, as heritability tends to zero we have **â** = *E*(**a**|**y**) → 0, demonstrating that the shrinkage effect in *BLUP_{A}* predictions is more marked when the heritability values are low, resulting in lower *r_{S(M, BLUPA)}* correlation estimates. Thus the simulation results consistently showed that when heritability diminishes, information on parentage becomes more important, whereas with higher heritability (> 50%) the genotypic values are already well determined by the mean phenotypic values (*M*) (Duarte and Vencovsky, 2001).

In general, our simulation showed that the inclusion of parentage among the progenies of the pedigree method using the *BLUP_{A}* procedure resulted in slightly higher selection gains and more accurate estimates of genetic parameters than when this relationship information was ignored. This possibly compensates for the additional work invested in obtaining these records, especially when investigating low-heritability traits. Our results are supported by other published research showing that higher selection gains can be reached when using the *G_{A}* model or *BLUP_{A}* procedure (Durel *et al.*, 1998; Bromley *et al.*, 2000). A study by Panter and Allen (1995) comparing two *BLUP* models (with and without the inclusion of information about genetic parentage between lines) for prediction of soybean crosses showed no marked differences between the *BLUP* models, yet the model which takes parentage into consideration performed better.

**Acknowledgments**

This research was financially supported by the Brazilian Agencies CAPES and CNPq. The authors gratefully acknowledge Dr. Eduardo Bearzoti for his excellent comments and suggestions.

**References**

Bernardo R (2002) Breeding for Quantitative Traits in Plants. Stemma Press, Woodbury, 359 pp.

Bromley CM, Van Vleck LD, Johnson BE and Smith OS (2000) Estimation of genetic variance in corn from F_{1} performance with and without pedigree relationship among inbred lines. Crop Sci 40:651-655.

Cantú M (2002) Dominando o Delphi 6: A Bíblia. MAKRON Books, São Paulo, 1104 pp.

Cutrim VA, Ramalho MAP and Carvalho AM (1997) Eficiência da seleção visual na produtividade de grãos de arroz (*Oryza sativa* L.) irrigado. Pesq Agropec Bras 32:601-606.

Dempster A, Laird N and Rubin D (1977) Maximum likelihood from incomplete data via the EM Algorithm. J R Stat Soc Ser B 39:1-38.

Duarte JB and Vencovsky R (2001) Estimação e predição por modelo linear misto com ênfase na ordenação de médias de tratamentos genéticos. Sci Agric 58:109-117.

Durel CE, Laurens F, Fouillet A and Lespinasse Y (1998) Utilization of pedigree information to estimate genetic parameters from large unbalanced data sets in apple. Theor Appl Genet 96:1077-1085.

Falconer DS and Mackay TFC (1996) Introduction to Quantitative Genetics. 4th ed. Longman, London, 464 pp.

Fehr WR (1987) Principles of Cultivar Development: Theory and Technique. MacMillan Publishing Company, New York, 527 pp.

Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423-447.

Henderson CR, Kempthorne O, Searle SR and Von Krosigk CM (1959) The estimation of environmental and genetic trends from records subject to culling. Biometrics 15:192-218.

Kennedy BW and Sorensen DA (1988) Properties of mixed-model methods for prediction of genetic merit under different genetic models in selected and unselected populations. In: Weir B, Goodman MM and Namkoong G (eds) Second International Conference Quantitative Genetics. North Carolina State University, Raleigh, pp 91-103.

Lynch M and Walsh B (1998) Genetics and Analysis of Quantitative Traits. Sinauer Associates, Inc., Sunderland, 948 pp.

Mrode RA (1996) Linear Models for the Prediction of Animal Breeding Values. Biddles, Guildford, 184 pp.

Novoselovic D, Baric M, Drezner G, Gunjaca J and Lalic A (2004) Quantitative inheritance of some wheat plant traits. Genet Mol Biol 27:92-98.

Panter DM and Allen FL (1995) Using best linear unbiased predictions to enhance breeding for yield in soybean: II Selection of superior crosses from a limited number of yield trials. Crop Sci 35:405-410.

Patterson HD and Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58:545-554.

Ramalho MAP, Abreu AFB and Santos JB (2001) Melhoramento de espécies autógamas. In: Nass LL, Valois ACC, Melo IS and Inglis MCV (eds) Recursos Genéticos e Melhoramento de Plantas. Fundação MT, Rondonópolis, pp 201-230.

Resende MDV (2002) Genética Biométrica e Estatística no Melhoramento de Plantas Perenes. Embrapa Informação Tecnológica, Brasília, 975 pp.

Robinson GK (1991) That *BLUP* is a good thing: The estimation of random effects. Stat Sci 6:15-51.

Silva HD, Ramalho MAP, Abreu AFB and Martins LA (1994) Efeito da seleção visual para produtividade de grãos em populações segregantes do feijoeiro. II. Seleção entre famílias. Cienc Prat 18:181-185.

Souza GA and Ramalho MAP (1995) Estimates of genetic and phenotypic variance of some traits of dry bean using a segregating population from the cross Jalo x Small White. Rev Bras Genet 18:87-91.

Van Oeveren AJ and Stam P (1992) Comparative simulation studies on the effects of selection for quantitative traits in autogamous crop: Early selection versus single seed descent. Heredity 69:342-351.

Wang J, van Ginkel M, Podlich D, Ye G, Trethowan R, Pfeiffer W, Delacy IH, Cooper M and Rajaram S (2003) Comparison of two breeding strategies by computer simulation. Crop Sci 43:1764-1773.

White TL and Hodge GR (1989) Predicting Breeding Values with Applications in Forest Tree Improvement. Kluwer Academic Publishers, Dordrecht, 363 pp.

**Send correspondence to:** José Airton Rodrigues Nunes

Departamento de Planejamento e Política Agrícola

Centro de Ciências Agrárias

Universidade Federal do Piauí, Campus Socopo

Bairro Ininga, 64049-550 Teresina, PI, Brazil

E-mail: jarnunes@ufpi.br

Received: February 2, 2007; Accepted: June 11, 2007.

*Senior Editor: Ernesto Paterniani*