Selection in several environments by BLP as an alternative to pooled anova in crop breeding

Seleção em diversos ambientes pelo BLP como alternativa à anava conjunta

Abstracts

Plant breeders often carry out genetic trials in balanced designs, which is not always the case with animal genetic trials. In plant breeding it is usual to select progenies tested in several environments by pooled analysis of variance (ANOVA). This procedure is based on the global averages for each family, although genetic values of progenies are better viewed as random effects. Thus, the appropriate form of analysis is more likely to follow the mixed model approach to progeny tests, which has become common practice in animal breeding. Best Linear Unbiased Prediction (BLUP) is not a "method" but a feature of mixed model estimators (predictors) of random effects, and it may be derived in so many ways that it has the potential of unifying the statistical theory of linear models (Robinson, 1991). When good estimates of fixed effects are available, it is possible to combine information from several different tests by simplifying BLUP; in these situations Best Linear Prediction (BLP) also has unbiased properties, and this leads to BLUP from straightforward heuristics. In this paper some advantages of BLP applied to plant breeding are discussed. Our focus is on how to deal with estimates of progeny means and variances from many environments to work out predictions that have "best" properties (minimum variance linear combinations of progenies' averages). A practical rule for relative weighting is worked out.

BLP; plant breeding; statistical genetics


Os melhoristas de plantas em geral conduzem testes genéticos em delineamentos balanceados, ao contrário do que ocorre com o melhoramento animal. É possível selecionar progênies pela ANAVA conjunta, com base nas médias gerais de cada família. Sabe-se, no entanto, que os valores genéticos de progênies são melhor representados por efeitos aleatórios. As formas de análise dos testes de progênie que parecem mais apropriadas são as que seguem a metodologia de modelos mistos, como no melhoramento animal. Segundo Robinson (1991) o Melhor Preditor Linear Não-Viesado (do inglês, BLUP) não é um método, mas uma propriedade dos estimadores (preditores) dos efeitos aleatórios e pode ser derivada de tantas maneiras diferentes que tem o potencial de unificar as teorias estatísticas de modelos lineares. A presença de bons estimadores para os efeitos fixos e componentes da variância torna possível combinar informações de diferentes testes por algumas simplificações do BLUP. Este trabalho exemplifica as vantagens do Melhor Preditor Linear (BLP) aplicado ao melhoramento de plantas. Procurou-se ilustrar como proceder com estimativas de médias e de variâncias de progênies obtidas em diferentes ambientes para produzir preditores que tenham propriedades "melhor" (no sentido de variância mínima entre todas as combinações lineares entre as médias de progênies). Derivou-se uma regra prática para a produção dos pesos relativos de cada ambiente. O BLP, em alguns casos, é também não-viesado produzindo BLUPs a partir de lógica mais direta.

BLP; genética estatística; melhoramento de plantas


CIÊNCIAS AGRÁRIAS

Júlio Sílvio de Sousa Bueno Filho (I); Roland Vencovsky (II)

(I) Doctor of Exact Sciences, Associate Professor - Departamento de Ciências Exatas/DEX - Universidade Federal de Lavras/UFLA - Cx. P. 3037 - 37200-000 Lavras, MG - juliobuenof@gmail.com

(II) Doctor, Full Professor - Departamento de Genética - Escola Superior de Agricultura "Luiz de Queiroz"/ESALQ - Universidade de São Paulo/USP - Avenida Pádua Dias, 11 - Cx. P. 83 - 13400-970 Piracicaba, SP - rvencovs@esalq.usp.br

INTRODUCTION

One of the most typical features of plant genetic trials is their high level of balance and the better precision of variance component estimates compared with their animal counterparts. Plant breeders do not always keep accurate records of genetic relatedness; in some allogamous species, for instance, open-pollinated families are taken as half-sib progenies for selection purposes. However, this is compensated for by the large number of progenies and by the replication of trials in the same locations and years, conditions often impossible to attain with animal trials.

The usual analysis that guides plant breeders in selecting progenies tested in several environments is the pooled ANOVA, based on the marginal averages of each family tested. The underlying assumption of this approach is that each genetic value corresponds to a constant effect. This is the heuristic of fixed-effects statistical modelling.

On the other hand, breeders know that the genetic values of individuals measured by the performance of their progenies are better viewed as random effects, representing small samples of the possible genotypes being tested. This way, the genetic value of a family in each environment can be viewed as one of the possible realizations of an unobservable random variable (the "true" breeding value). The intraclass correlation of those different realizations reflects the heritability for the selection based on progeny means. Thus, the appropriate form of analysis is more likely to follow the mixed model approach to progeny tests, which is a common practice in animal breeding after Henderson's lifework.

In particular, Best Linear Unbiased Prediction (BLUP) has been considered the most appropriate form of analysis of genetic data in animal breeding trials. Following Robinson (1991), BLUP is not a "method" but a feature of estimators (predictors) of random effects, and it can be derived in so many ways that it has the potential of unifying the statistical theory of linear models.

However, the availability of fair estimates of fixed effects, coupled with a large amount of historical data on variance components and heritability estimates, makes it possible to combine information from several different tests by relaxing some BLUP assumptions. The purpose of this work is to introduce and exemplify the advantages of Best Linear Prediction (BLP) applied to plant breeding. Our focus is on how to deal with estimates of progeny means and variances from many environments to work out predictions that have "best" properties (in the sense of being minimum variance linear combinations).

In analogy with Robinson (1991), we think that if breeders can obtain good estimates of fixed effects, the BLP "method" will also have unbiased properties, so that BLUPs are produced from straightforward heuristics.

Statistical steps to establish BLUP as a reasonable classical predictor of genetic values may be found in some seminal papers since Henderson et al. (1959). In particular, for BLP derivation and features like maximization of correct ranks (under normality assumptions), see Henderson (1963). For forestry breeding purposes, White & Hodge (1988) was the pioneer BLP work, and White & Hodge (1989) comprehensively covers both BLUP and BLP subjects. In forestry breeding, Resende et al. (1993) were the first Brazilian researchers to introduce BLP (and soon other mixed model techniques). Although these techniques are straightforward we have found no works in which BLP was applied to crop species.

METHODOLOGY

A progeny trial is a way to predict the breeding value of parents from realizations of their progeny phenotypes. To make comprehensive selection decisions, plant breeders usually run the same trials in multiple environments (locations or years); Table 1 is a schematic representation of such trials.

The statistical model for each realization is:

$$y_{ij} = m + A_j + p_i + (pA)_{ij} + \bar{e}_{ij}$$

In this model, $y_{ij}$ is the mean of progeny i in environment j, m is the general mean, $A_j$ is the fixed effect of environment j, $p_i$ is the random effect of the breeding value of progeny i, $(pA)_{ij}$ is the random effect of the interaction of the ith breeding value with the jth environment and $\bar{e}_{ij}$ is the mean experimental error.

The underlying assumptions for BLP purposes are that the fixed effects and the variance components of the random effects at each level of the model are known. Then the expectation of each progeny mean involves only the fixed effects:

$$E(y_{ij}) = m + A_j$$

And the following variance component estimates are taken as true variance parameters:

$\sigma^2_p$: progeny variance; $\sigma^2_{\bar{F}}$: phenotypic variance among overall means of progenies, a function of progeny and (average) error variances; $\sigma^2_{F_j}$: phenotypic variance among local means of progenies for environment j, which is a function of all the random term variances.

Note that this is a usual set of assumptions if the objective is selection. In this situation, all nuisance parameters are taken as fixed, as in animal breeding literature - for a comprehensive justification see White & Hodge (1988, 1989).

Under these assumptions, the usual selection procedure can be regarded as a special case of BLP in which only one piece of information (the global mean in the right column of Table 1) is available to guide the selection process.

At the next hierarchical level, with one progeny mean per environment (as displayed in Table 1) another set of breeding values with more general BLP properties may be calculated as weighted averages of environmental means. In this case, the weights are in some way inversely proportional to environmental (and non-additive genetical) variances.

When there are no population differences (fixed effects) among the genetic values of progenies, the breeding values may be predicted by the following vector, which has BLP properties (Searle et al., 1992):

$$\hat{g} = C^t V^{-1}\,[\,y - E(y)\,]$$

in which C is the covariance matrix between the genetic values and their phenotypic realizations (t stands for the transposition operation); $V^{-1}$ is the inverse of the covariance matrix of the phenotypic data; y is the phenotypic data vector and E(.) stands for mathematical expectation.
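The predictor above can be sketched numerically for a single progeny. This is a minimal illustration, not the authors' implementation: the helper names `solve` and `blp`, the two-environment layout and all numeric values are hypothetical, and V is assumed to carry the progeny variance off the diagonal and the phenotypic variances of progeny means on the diagonal.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def blp(c, v, y, expected_y):
    """Best linear prediction of one genetic value: c' V^{-1} (y - E(y))."""
    weights = solve(v, c)  # V is symmetric, so V^{-1} c gives the weight vector
    return sum(w * (obs - mu) for w, obs, mu in zip(weights, y, expected_y))


# Hypothetical inputs: progeny variance 2.0 and two environments whose
# phenotypic variances among progeny means are 6.0 and 10.0.
sigma2_p = 2.0
v = [[6.0, sigma2_p], [sigma2_p, 10.0]]
c = [sigma2_p, sigma2_p]
y = [31.0, 27.5]           # means of one progeny in each environment (hypothetical)
expected_y = [30.0, 28.0]  # environment means, taken as known fixed effects

g_hat = blp(c, v, y, expected_y)
print(round(g_hat, 4))
```

With these inputs the more precise environment (smaller phenotypic variance) receives twice the weight of the other one.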

Following White & Hodge (1989), both the variance among the predictions and the covariance between the predicted and true breeding values may be calculated by:

$$Var(\hat{g}) = Cov(\hat{g}, g) = C^t V^{-1} C$$

A "goodness of fit" measure for the prediction process could be calculated by the correlation between the true and predicted genetic values (for a single breeding value, with C taken as a column vector):

$$r_{g\hat{g}} = \frac{Cov(\hat{g}, g)}{\sqrt{Var(\hat{g})\,\sigma^2_g}} = \sqrt{\frac{C^t V^{-1} C}{\sigma^2_g}}$$

in this expression $\sigma^2_g$ is the true genetic variance (assumed as known) of breeding values. This square root of the coefficient of genotypic determination is the so-called "accuracy" and in our context will be used for comparison purposes.
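The accuracy can be compared between BLP and the unweighted global mean with a short sketch. It rests on assumptions not stated in this form in the text: the genetic variance is taken equal to the progeny variance `sigma2_p`, deviations in different environments are taken as independent, and `d[j]` denotes the (hypothetical) phenotypic variance of progeny means in environment j minus `sigma2_p`. Under those assumptions the quadratic form c'V⁻¹c has a closed form, so no matrix inversion is needed.

```python
import math


def accuracy_blp(sigma2_p, d):
    """Accuracy of BLP, sqrt(c' V^{-1} c / sigma2_p), for V = sigma2_p * J + diag(d)."""
    s = sum(1.0 / dj for dj in d)
    return math.sqrt(sigma2_p * s / (1.0 + sigma2_p * s))


def accuracy_global_mean(sigma2_p, d):
    """Accuracy of the unweighted mean over environments under the same V."""
    n = len(d)
    var_mean = sigma2_p + sum(d) / n ** 2  # variance of the unweighted mean
    return math.sqrt(sigma2_p / var_mean)


# Hypothetical values: progeny variance 2.0, two environments.
sigma2_p = 2.0
d = [4.0, 8.0]

acc_blp = accuracy_blp(sigma2_p, d)
acc_mean = accuracy_global_mean(sigma2_p, d)
print(round(acc_blp, 4), round(acc_mean, 4))
```

The two accuracies coincide when all d values are equal; otherwise the weighted predictor is at least as accurate as the equally weighted mean.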

RESULTS AND DISCUSSION

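For the global means we can derive the result:

$$\hat{g}_i = \frac{\sigma^2_p}{\sigma^2_{\bar{F}}}\,(\bar{y}_{i.} - \bar{y}_{..})$$

in which $\sigma^2_p$ is the progeny variance and $\sigma^2_{\bar{F}}$ is the phenotypic variance among overall means of progenies.

Using all the elements in Table 1, for more than 2 environments, we get:

$$\hat{g}_i = \sum_{j} b_j\,(y_{ij} - \bar{y}_{.j})$$

in which b is given by:

$$b_j = \frac{\sigma^2_p \prod_{k \neq j} (\sigma^2_{F_k} - \sigma^2_p)}{|V|}$$

in the above expression |V| is the determinant of the V matrix:

$$|V| = \prod_{k} (\sigma^2_{F_k} - \sigma^2_p) + \sigma^2_p \sum_{j} \prod_{k \neq j} (\sigma^2_{F_k} - \sigma^2_p)$$

Note that $\sigma^2_{F_j}$, the phenotypic variance among local means of progenies for environment j, includes progeny variance plus progeny-environment interaction variance as well as the environmental error variance estimate, although for only two environments it is not possible to separate these variance component estimates into distinct fractions.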
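A quick numerical check of the closed-form weights is sketched below. It assumes that V has the phenotypic variances of local means on its diagonal and the progeny variance everywhere else, and that the covariance vector c has the progeny variance in every position; the function name `blp_weights` and the three variance values are hypothetical.

```python
from math import prod


def blp_weights(sigma2_p, sigma2_f):
    """Closed-form BLP weights for V with sigma2_f on the diagonal and sigma2_p elsewhere."""
    d = [s - sigma2_p for s in sigma2_f]  # phenotypic minus progeny variance
    others = [prod(d[:j] + d[j + 1:]) for j in range(len(d))]  # products over k != j
    det_v = prod(d) + sigma2_p * sum(others)
    return [sigma2_p * o / det_v for o in others], det_v


# Hypothetical variance estimates for three environments.
sigma2_p = 2.0
sigma2_f = [6.0, 10.0, 7.0]
b, det_v = blp_weights(sigma2_p, sigma2_f)

# Check that V b reproduces the covariance vector c = sigma2_p * (1, ..., 1).
n = len(sigma2_f)
v = [[sigma2_f[j] if j == k else sigma2_p for k in range(n)] for j in range(n)]
v_times_b = [sum(v[j][k] * b[k] for k in range(n)) for j in range(n)]
print(det_v, [round(x, 6) for x in v_times_b])
```

If the product rule is right, every entry of `v_times_b` equals the progeny variance, which is what solving V b = c requires.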

Example

Let us take four environments with variance component estimates given by:

The breeding value of progeny i can be calculated more realistically from:

The relative weights are 1.000, 0.833, 0.625 and 0.625.

The global mean approach in this case corresponds to ranking genetic values with the same weight for each environment, regardless of the differences in precision of the estimated means. For the estimation of breeding progress by selecting the ith progeny, the deviation of the progeny mean must then be multiplied by the heritability of the selection based on global progeny means. The relative weights in this case are all 0.25.

As a practical rule (derived in Bueno Filho (1997), with proofs shown in the Appendix), the relative weight of each environmental mean is in fact the product of the differences between the phenotypic variances and the common progeny variance for the other environments.
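The practical rule can be tried out in a few lines. The variance estimates of the example are not available in the text, so the differences `d` below are hypothetical values chosen only because they reproduce the relative weights reported above (1.000, 0.833, 0.625 and 0.625); any values in the ratio 5 : 6 : 8 : 8 would give the same result. The function name `relative_weights` is also an illustration.

```python
from math import prod


def relative_weights(d):
    """Relative BLP weights: product of the differences d for all the other environments."""
    products = [prod(d[:j] + d[j + 1:]) for j in range(len(d))]
    top = max(products)
    return [p / top for p in products]  # scaled so that the largest weight is 1


# Hypothetical differences between phenotypic and progeny variances,
# one per environment (not the estimates used in the paper).
d = [5.0, 6.0, 8.0, 8.0]

print([round(w, 3) for w in relative_weights(d)])  # [1.0, 0.833, 0.625, 0.625]
```

Because each product leaves out only its own environment, the relative weights are simply proportional to 1/d, so the environment with the smallest non-genetic variance always gets the largest weight.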

In the example described in Table 1 we get the following weights (Table 2):

Plant breeders are often interested in selecting progenies tested in different locations and years, but in general each progeny is not tested in all environments.

Selection based on average environmental conditions is the most common objective, although in some special cases "target environments" are chosen either for being the most (or least) productive locations, or for some desired (or undesired) climate, soil features, typical years with particular experimental properties, etc.

Although pooled ANOVA is a well-known technique, it is not designed to handle unbalanced data of this type, which show dependencies between factor levels. However, the application of the more general BLP to such situations is straightforward, as highlighted in the example of Table 3, which shows experimental means from the experiments described in Table 2, and Table 4, which shows the corresponding predictions of breeding values.

The most important factor in determining relative weights for selection is the similarity of progeny performance between the tested and target environments. For example, in Table 4 selection for A1 uses the specific means of progenies in environment A1 weighted by the progeny variance in environment A1 (which includes the progeny by environment interaction variance), whereas when selecting for the average environment this interaction is not included in the weighting process.

It is remarkable in Table 4 that progeny 3 has a positive predicted genetic value for environment A1 that may lead it to be selected, although it was not tested there! In conventional pooled ANOVA we do not even use the means of untested progenies, and the target environment approach is unthinkable.
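How an untested progeny can still receive a prediction may be sketched as follows. The values of Tables 3 and 4 are not available in the text, so every number here is hypothetical, as is the function name `predict_from_tested`. The sketch assumes that the covariance between a progeny's genetic value (in the average environment, or in an environment where it was not tested) and each of its observed means is the progeny variance `sigma2_p`, and that deviations in different environments are otherwise independent.

```python
def predict_from_tested(sigma2_p, d, deviations):
    """BLP of a progeny's genetic value using only the environments where it was tested.

    d[env] is the phenotypic variance of progeny means in env minus sigma2_p;
    deviations[env] is the progeny mean minus the environment mean.
    """
    s = sum(1.0 / d[env] for env in deviations)
    denom = 1.0 + sigma2_p * s
    return sum((sigma2_p / d[env]) / denom * dev for env, dev in deviations.items())


# Hypothetical values: three environments; progeny 3 was not tested in A1.
sigma2_p = 2.0
d = {"A1": 4.0, "A2": 8.0, "A3": 5.0}
progeny_3 = {"A2": 1.6, "A3": 0.5}

g_hat_3 = predict_from_tested(sigma2_p, d, progeny_3)
print(round(g_hat_3, 4))
```

The weights are recomputed from the subset of environments in which each progeny appears, so unbalanced tables need no special treatment; the prediction for progeny 3 is positive even though it has no record in A1.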

One of the most interesting features of best linear unbiased predictors (BLUP) is their ability to maximize the probability of correct ranking in normal populations. This is a theoretically proven fact, well established by some 40 years of mixed model studies, including Monte Carlo simulations. This means that, where the rankings from global means and from BLP differ, the greater errors must lie with ANOVA, because BLP, being more similar to BLUP, makes weaker assumptions about the covariance structure. The examples sometimes given, in which wrong rankings result from poor variance component estimates, are insignificant for practical plant breeding purposes.

Records of pedigree and molecular data on genetic relatedness greatly increase the superiority of BLUP and BLP over pooled ANOVA.

Another very important fact about BLP (White & Hodge, 1989) is that linear combinations of BLPs have BLP properties. This suggests that working with BLUP values rather than local means could lead to breeding values with BLUP properties, and that these can be calculated in a BLP-like approach. This is as valid for multivariate analysis as for any other technique involving BLUP.

In the example given in this paper, it is possible to calculate BLUP values for all progenies and any target environment by using BLUPs of progeny breeding values calculated independently in each environment. This potentially simplifies the computational task of calculating BLUP in a single model and allows simpler estimates that combine different years, locations, experimental designs, generations, types of progeny, relatedness of genetic material, etc.

Restricted Maximum Likelihood (REML) is becoming a standard in likelihood-based analysis of linear mixed models (Searle et al., 1992). Although much of the previous BLP work demonstrates its readiness for use at the operational level, the statistical analysis of genetic trials is mainly concerned with producing complex models to obtain REML-like estimates and predictions directly.

However, a robust handling of progeny breeding values as random variables can be easily managed using BLP techniques, starting from the usual tables that contain information from several environments in pooled analysis. Ignoring these facts results in greater estimation (prediction) errors and could increase progeny misclassification in selection trials.

Acknowledgments

The authors wish to thank Prof. Dr. Lucas Monteiro Chaves for insights on how to prove the matrix identities.

(Received on August 23, 2006 and approved on June 16, 2008)

APPENDIX

Derivation of b'

We will consider the covariance matrices in a more amenable form without loss of generality. This is achieved by taking a single vector of constants from C' and the following form for V:

$c' = J_{1 \times n}$, in which $J_{n \times p}$ is the matrix that has only 1 as its elements;

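$V_n = J_n + D_n$, in which $D_{ij} = d_i$ if $i = j$ and $D_{ij} = 0$ if $i \neq j$. Thus $d_i$ represents:

$$d_i = \frac{\sigma^2_{pA_i} - \sigma^2_p}{\sigma^2_p}$$

in which $\sigma^2_p$ is the progeny variance and $\sigma^2_{pA_i}$ can be any linear combination of variance components in which other terms plus the additive genetic variance occur (Bueno Filho, 1997).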

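For any number (n) of environments, the determinant of V may be calculated as:

$$|V_n| = d_n\,|V_{n-1}| + \prod_{i=1}^{n-1} d_i = \prod_{i=1}^{n} d_i + \sum_{j=1}^{n} \prod_{i \neq j} d_i \qquad (1)$$

This is a recurrence relationship that may be proved as follows. This relation is used to prove the analogous relation for the b vector, which has the following elements:

$$b_j = \frac{\prod_{i \neq j} d_i}{|V_n|} \qquad (2)$$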

Equations (1) and (2) can easily be proved for special cases, e.g.:

Let us call V*j the matrix that has element dj = 0.

Taking equation (1) as true, it then follows that:

and the same operations for including one more environment in set k must be:

Taking the last column for calculating the determinants:

in which V*j is the minor associated with the cofactor for row j and column n of the Vn matrix. These matrices may be obtained by letting dj = 0, as follows for V2 and V3:

It may be easily shown that this expression will be negative for odd j values and positive for even j values. So for any j this will be:

The negative signs result from the combination of odd exponents of cofactors or from changing the position of column Jn-1,1 when taking determinants of minor order. So, this expression can be simplified to:

and the determinant of Vk+1 results in:

which is the same as equation (1), q.e.d.

In an analogous way, b' in equation (2) is calculated by sums of the cofactors for the j columns as follows:

and by analogy, we get for the following j element:

The target environment approach could be easily adapted by taking 1 + di instead of 1 in the i element of the c' covariance vector. This leads to an additional factor of di b'j in the weighting for environment j.
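As a cross-check on equations (1) and (2), the same identities follow in a few lines from two standard results, the matrix determinant lemma and the Sherman-Morrison formula, applied to $V_n = D_n + \mathbf{1}\mathbf{1}^t$. This is offered as an alternative sketch under the assumptions of this appendix (with all $d_i > 0$), not as the cofactor argument used by the authors; the symbol $\mathbf{1}$ for the n x 1 vector of ones is introduced here only for illustration.

```latex
\begin{align*}
|V_n| &= |D_n + \mathbf{1}\mathbf{1}^t|
       = |D_n|\,\bigl(1 + \mathbf{1}^t D_n^{-1}\mathbf{1}\bigr)
       = \prod_{i=1}^{n} d_i \Bigl(1 + \sum_{j=1}^{n}\frac{1}{d_j}\Bigr)
       = \prod_{i=1}^{n} d_i + \sum_{j=1}^{n}\prod_{i\neq j} d_i, \\
b &= V_n^{-1}\mathbf{1}
   = D_n^{-1}\mathbf{1}
     - \frac{D_n^{-1}\mathbf{1}\,\mathbf{1}^t D_n^{-1}\mathbf{1}}
            {1 + \mathbf{1}^t D_n^{-1}\mathbf{1}}
   = \frac{D_n^{-1}\mathbf{1}}{1 + \sum_{k=1}^{n} 1/d_k}, \\
b_j &= \frac{1/d_j}{1 + \sum_{k=1}^{n} 1/d_k}
     = \frac{\prod_{i\neq j} d_i}{|V_n|}.
\end{align*}
```

The last line shows that the weights are proportional to $1/d_j$, which is the product rule stated in the text once numerator and denominator are multiplied by $\prod_i d_i$.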

  • BUENO FILHO, J. S. de S. Modelos mistos na predição de valores genéticos aditivos em testes de progênies florestais. 1997. 118 f. Tese (Doutorado) - Escola Superior de Agricultura "Luiz de Queiroz", Piracicaba, 1997.
  • HENDERSON, C. R. Selection index and expected genetic advance. Washington, DC: NRC, 1963. 982 p.
  • HENDERSON, C. R.; KEMPTHORNE, O.; SEARLE, S. R.; KROSIGK, C. M. von. Estimation of environmental and genetic trends from records subject to culling. Biometrics, Washington, v. 15, p. 192-218, 1959.
  • RESENDE, M. D. V. de; HIGA, A. R.; LAVORANTI, O. J. Predição de valores genéticos no melhoramento de Eucalyptus: melhor preditor linear. In: Congresso Florestal Pan-Americano, 1., 1993, Curitiba, PR. Anais... Curitiba: IUFRO, 1993. p. 144-147.
  • ROBINSON, G. K. That BLUP is a good thing: the estimation of random effects. Statistical Science, v. 6, p. 15-51, 1991.
  • SEARLE, S. R.; CASELLA, G.; MCCULLOCH, C. E. Variance components. New York: J. Wiley, 1992.
  • WHITE, T. L.; HODGE, G. R. Best linear prediction of breeding values in a forest tree improvement program. Theoretical and Applied Genetics, v. 76, p. 719-727, 1988.
  • WHITE, T. L.; HODGE, G. R. Predicting breeding values with applications in forest tree improvement. Dordrecht: Kluwer Academic, 1989.

Publication Dates

  • Publication in this collection
    13 Nov 2009
  • Date of issue
    Oct 2009

History

  • Accepted
    16 June 2008
  • Received
    23 Aug 2006
Editora da Universidade Federal de Lavras Editora da UFLA, Caixa Postal 3037 - 37200-900 - Lavras - MG - Brasil, Telefone: 35 3829-1115 - Lavras - MG - Brazil
E-mail: revista.ca.editora@ufla.br