
A new method to estimate genetic gain in annual crops

Abstracts

The genetic gain obtained by breeding programs to improve quantitative traits may be estimated by using data from regional trials. A new statistical method for this estimate is proposed and includes four steps: a) joint analysis of regional trial data using a general linear model to obtain adjusted genotype means and the covariance matrix of these means for the whole studied period; b) calculation of the arithmetic mean of the adjusted genotype means, exclusively for the group of genotypes evaluated each year; c) direct year comparison of the arithmetic means calculated, and d) estimation of mean genetic gain by regression. Using the generalized least squares method, a weighted estimate of mean genetic gain during the period is calculated. This method permits a better cancellation of genotype x year and genotype x trial/year interactions, thus resulting in more precise estimates. This method can be applied to unbalanced data, allowing the estimation of genetic gain in series of multilocational trials.




METHODOLOGY

A new method to estimate genetic gain in annual crops*

* Part of a thesis presented by F.B. to the Universidade Federal de Goiás, Goiânia, GO, in partial fulfillment of the requirements for the Master's degree.

Flávio Breseghello, Orlando Peixoto de Morais and Paulo Hideo Nakano Rangel

Embrapa Arroz e Feijão, Caixa Postal 179, 75375-000 Santo Antônio de Goiás, GO, Brasil. Send correspondence to F.B.


INTRODUCTION

Genetic gain estimates in breeding programs are important to critically analyze efficiency and to plan new actions and strategies. Institutions working with annual crop breeding routinely conduct a series of trials to compare elite lines and to release new cultivars. Each year lines are replaced in the trials with the expectation that the new ones will be superior. The change in yield mean as a consequence of these substitutions may be considered an estimate of genetic gain. This approach provides information based on data already available from different years and locations without additional costs.

To evaluate maize breeding efficiency in Brazil, Vencovsky et al. (1988) analyzed 20 years of trials. Gain was estimated for each pair of consecutive years as the change in the annual means minus the change in the means of the lines common to the two years. Later, Toledo et al. (1990) used the method of Vencovsky et al. (1988) to calculate genetic gain obtained by soybean (Glycine max (L.) Merr.) breeding in Paraná State. They calculated mean genetic gain by the weighted least squares method to avoid cancellation of information obtained in intermediate years. Rodrigues (1990) estimated the variances of annual gains and the covariances of consecutive gains from the number of treatments shared by each pair of years, to obtain a covariance matrix that was used in the generalized least squares method to obtain mean and variance estimates.
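The pairwise estimate attributed above to Vencovsky et al. (1988) can be illustrated with a minimal sketch. The function name `pairwise_gain`, the dictionary layout and the yields are hypothetical and serve only to show the arithmetic; they are not taken from the cited studies.

```python
def pairwise_gain(year1, year2):
    """Gain between two consecutive years.

    year1, year2: dicts mapping line name -> mean yield in that year
    (hypothetical data layout).
    """
    common = sorted(set(year1) & set(year2))
    if not common:
        raise ValueError("the two years share no lines")

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Change in the annual mean: genetic plus environmental effects.
    total_change = mean(year2.values()) - mean(year1.values())
    # Change in the common lines: attributed to the environment only.
    env_change = mean(year2[g] for g in common) - mean(year1[g] for g in common)
    return total_change - env_change


# Hypothetical yields (kg/ha)
y1 = {"A": 5000.0, "B": 5200.0, "C": 4800.0}
y2 = {"B": 5400.0, "C": 5000.0, "D": 5800.0}
gain = pairwise_gain(y1, y2)
```

In this made-up case the annual mean rises by 400 kg/ha while the common lines B and C rise by 200 kg/ha, so the part credited to line replacement is 200 kg/ha.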

Soares (1992) estimated genetic gain for rice (Oryza sativa L.) breeding in the State of Minas Gerais using, in addition to the method of Vencovsky et al. (1988), a method based on the behavior of standard checks, i.e., those present every year of the series. In this method, the deviation between the mean for the lines and the mean for the standard checks is calculated for each year. The estimate of genetic progress is the linear regression coefficient of these deviations in relation to years.
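As a rough illustration of the check-based method described above, the sketch below regresses the yearly deviations (mean of lines minus mean of standard checks) on year. All numbers are hypothetical, and ordinary least squares via `numpy.polyfit` is assumed here as the fitting step.

```python
import numpy as np

years = np.array([1, 2, 3, 4, 5])
# Hypothetical yearly means (kg/ha)
line_means = np.array([5100.0, 5250.0, 5150.0, 5400.0, 5500.0])
check_means = np.array([5000.0, 5100.0, 4950.0, 5150.0, 5200.0])

# Deviation of the tested lines from the standard checks in each year
deviations = line_means - check_means

# The slope of the deviations on year is the estimate of genetic progress
slope, intercept = np.polyfit(years, deviations, 1)
```

Because the checks are grown every year, subtracting their mean removes most of the year effect before the regression is fitted.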

All of these studies were developed to calculate genetic gain per location by analysis of series of balanced data. However, when a series of many years of trials conducted over a large geographic region is analyzed, the data may be unbalanced due to differences between the sets of treatments, number of replications, missing plots, changes in test locations, etc.

The objective of the present study was to propose a new, generalized methodology to estimate genetic gain of quantitative traits, grain yield in particular, using series of balanced or unbalanced data originating from networks of multiple location trials.

MATERIAL AND METHODS

General case

Let us consider a series of data from a certain number of genotypes, each one tested during one or several years of a given period of time, with variable number of locations/year and replications/genotype/location/year. These data can be described by the following simplified mathematical model:

$$Y_{ijkr} = \mu + A_k + L/A_{jk} + R/L/A_{jkr} + G_i + e_{ijkr},$$

where $Y_{ijkr}$: $r$th observation ($r = 1, ..., s_{ijk}$) of genotype $i$ at location $j$ during year $k$; $\mu$: constant associated with observation $Y_{ijkr}$; $A_k$: effect of year $k$ ($k = 1, ..., a$); $L/A_{jk}$: effect of location $j$ within year $k$ ($j = 1, ..., m_k$); $R/L/A_{jkr}$: effect of replication $r$ within location $j$ for year $k$ ($r = 1, ..., s_{jk}$); $G_i$: effect of genotype $i$ ($i = 1, ..., n$), and $e_{ijkr}$: error associated with observation $Y_{ijkr}$, considered to be independent and normally distributed, with a null mean and a common variance $\sigma^2$.

The interaction effects related to genotype x year and genotype x location/year should be excluded from the model, i.e., they should be considered as components of experimental error. Thus, the marginal means for each genotype, adjusted for effect of year, location/year and replication/location/year, will represent estimable functions (Searle, 1971). The analysis of variance scheme is presented in Table I.

Analysis of variance should be performed using a procedure compatible with the structure of unbalanced data, such as GLM of the SAS statistical package (SAS Institute Inc., 1985), which provides the least squares solution of the vector of parameters, $\theta^{\circ}$, and the generalized inverse $(X'X)^{G}$ of $X'X$, where $X$ is the matrix of the coefficients of parameters in the model.

The vector $\hat{Y}$ of the adjusted genotype means ($\bar{Y}_{i...}$) is obtained as follows:

$$\hat{Y} = C\theta^{\circ},$$

where $C$: coefficient matrix, $n \times (1 + a + \sum_k m_k + \sum_k \sum_j s_{jk} + n)$, in which each row holds the coefficients that define the adjusted mean of one genotype.

In the previous expression, and in others to follow, the sign "^" indicates an estimator. The covariance matrix of the adjusted means, $V(\hat{Y})$, is estimated by:

$$\hat{V}(\hat{Y}) = C\,(X'X)^{G}\,C' \cdot RMS,$$

where RMS = residual mean square estimator.
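The two expressions above can be sketched numerically on a toy data set. This is an illustration under simplifying assumptions, not the SAS GLM computation: the model is reduced to year and genotype effects only, the five records are hypothetical, and the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) stands in for the generalized inverse, which gives the same values for estimable functions.

```python
import numpy as np

# Hypothetical records: (year index, genotype index, yield)
records = [(0, 0, 10.0), (0, 1, 12.0),
           (1, 0, 13.0), (1, 1, 16.0), (1, 2, 20.0)]
a, n = 2, 3  # number of years and of genotypes

# Design matrix X with columns [mu | years | genotypes]
X = np.zeros((len(records), 1 + a + n))
y = np.zeros(len(records))
for row, (k, i, value) in enumerate(records):
    X[row, 0] = 1.0
    X[row, 1 + k] = 1.0
    X[row, 1 + a + i] = 1.0
    y[row] = value

# One generalized inverse of X'X and the least squares solution
XtX_g = np.linalg.pinv(X.T @ X, rcond=1e-10)
theta = XtX_g @ X.T @ y

# Residual mean square: residual SS over (N - rank of X)
resid = y - X @ theta
rms = resid @ resid / (len(y) - np.linalg.matrix_rank(X))

# C: one row per genotype; the year coefficients sum to one
C = np.zeros((n, 1 + a + n))
C[:, 0] = 1.0
C[:, 1:1 + a] = 1.0 / a
C[:, 1 + a:] = np.eye(n)

adj_means = C @ theta               # adjusted genotype means
V_adj = (C @ XtX_g @ C.T) * rms     # their estimated covariance matrix
```

Genotype 2 was tested only in the second (higher yielding) year, so its adjusted mean falls below its single observed value, which is the purpose of the adjustment.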

Special cases

In some cases the data for each available trial are balanced, i.e., all treatments have the same number of replications for each experiment, with no missing plots. Two different situations are possible:

Experiments with the same number of replications

If $r_{jk} = r_{j'k'}$ for any $j$ and $k$ values, it is possible to use the means for the lines at the trial level, i.e., $\bar{Y}_{ijk.}$, for analysis. Thus, trials for which only mean information is available can also be used. A further advantage is a substantial reduction in computational requirements, because eliminating the source of variation due to replication/location/year reduces the size of the $X'X$ matrix.

However, a disadvantage of this alternative is a reduction in estimate precision. The residual sum of squares loses its component due to the interaction between genotypes and replication/location/year, usually the one of lowest magnitude and of highest weight (larger number of degrees of freedom). Nevertheless, when the residual mean squares of each trial ($MSR_{jk}$) and their respective degrees of freedom are available, the residual of the joint analysis can be easily corrected, and the precision of the original analysis can be recovered.

Experiments with different numbers of replications ($r_{jk} \neq r_{j'k'}$ for at least one $j$ or $j'$ value and any $k$ or $k'$ values)

In this case, it is also convenient to use the $\bar{Y}_{ijk.}$ means, especially when the advantages mentioned in the previous case are relevant. Considering that in this case the means are not formed by an equal number of replications, it is necessary to repeat each mean as many times as the number of replications that produced it, i.e., $s_{ijk}$ times, in order to recover the real adjusted treatment means. In this case the replications should not be included in the model because the design of each trial is treated as completely randomized.

By repeating each $\bar{Y}_{ijk.}$ mean $s_{ijk}$ times, the same number of degrees of freedom is recovered for the residual source of variation that would be obtained if the original $Y_{ijkr}$ information was used, but replication of the means does not contribute to the residual sum of squares since $Y_{ijkr} = Y_{ijkr'} = \bar{Y}_{ijk.}$. If the sums of squares of residuals for each trial are available, the residual sum of squares of the joint analysis can be corrected. If this information is not available, one should use only the number of degrees of freedom corresponding to genotype x year and genotype x location/year interactions. This, however, reduces the power of the method in identifying significant estimates.
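The effect of repeating each trial mean according to its number of replications can be seen in a small sketch. The two trial means and replication counts below are hypothetical, and `numpy.repeat` is used only as a convenient way to build the expanded data.

```python
import numpy as np

# Hypothetical means of one genotype in two trials (kg/ha)
trial_means = np.array([6000.0, 6700.0])
# Number of replications behind each mean (s_ijk)
n_reps = np.array([4, 3])

# Each mean is entered as many times as the replications that produced it
expanded = np.repeat(trial_means, n_reps)

# The repeated values weight each trial by its replications ...
weighted_mean = expanded.mean()
# ... but add nothing to the within-trial sum of squares
within_ss = sum(((expanded[expanded == m] - m) ** 2).sum() for m in trial_means)
```

Here the expanded vector has seven entries, its mean equals the replication-weighted mean of the two trials, and the within-trial sum of squares is zero, which is why the residual has to be corrected from the individual trial analyses when they are available.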

Genetic gain estimate

Let $\bar{Y}_{i...}$ be the mean for genotype $i$, adjusted for year and trials/year, obtained from the combined analysis of the entire period studied. Let $\hat{Y}^*_k$ be the arithmetic mean of the adjusted $\bar{Y}_{i...}$ means exclusively for the genotypes tested during year $k$. The mean $\hat{Y}^*_k$ is an estimate of the mean grain yield of the set of lines tested during year $k$, which recovers the information obtained in other years $k'$, with $k' \neq k$.

To calculate the $\hat{Y}^*_k$ means and their variances and covariances, it is helpful to construct an auxiliary matrix $S$ with $a \times n$ dimensions, in which each row refers to one year $k$ and each column to one genotype $i$. If genotype $i$ was evaluated during year $k$, the $(k, i)$ cell is filled with $1/n_k$, where $n_k$ is the number of genotypes evaluated during year $k$; if the genotype was not in the trials conducted during year $k$, the value of the cell is zero.

The $\hat{Y}^*$ column vector of the $\hat{Y}^*_k$ means is obtained by the following equation:

$$\hat{Y}^* = S\,\hat{Y}.$$

Genetic gain over two years, consecutive or not, can be estimated by the difference in the respective $\hat{Y}^*_k$ means. To estimate the weighted mean genetic gain it is necessary to obtain a matrix of covariances of the $\hat{Y}^*_k$ means, with $a \times a$ dimensions, given by:

$$\hat{V}(\hat{Y}^*) = S\,\hat{V}(\hat{Y})\,S'.$$
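A minimal sketch of the auxiliary matrix S and of the two products it is used in follows. The list `tested`, the adjusted means and the diagonal covariance matrix are hypothetical placeholders for the output of the combined analysis.

```python
import numpy as np

# tested[k] = indices of the genotypes evaluated in year k (hypothetical)
tested = [[0, 1], [0, 1, 2], [1, 2, 3]]
n = 4  # total number of genotypes

# Hypothetical adjusted means (kg/ha) and their covariance matrix
adj_means = np.array([6000.0, 6200.0, 6500.0, 6800.0])
V_adj = np.diag([100.0, 80.0, 120.0, 150.0])

# S: one row per year, 1/n_k for each genotype tested in year k, else zero
S = np.zeros((len(tested), n))
for k, genotypes in enumerate(tested):
    S[k, genotypes] = 1.0 / len(genotypes)

year_means = S @ adj_means      # vector of yearly means of adjusted means
V_year = S @ V_adj @ S.T        # covariance matrix of the yearly means
```

Years that share genotypes get non-zero covariances in `V_year` even when `V_adj` is diagonal, which is why a generalized rather than an ordinary least squares fit is used in the next step.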

Mean annual genetic gain is estimated by the linear regression coefficient $b_1$ of $\hat{Y}^*_k$ as a function of year $k$, which is obtained by the generalized least squares method (Hoffmann and Vieira, 1987):

$$\hat{\beta} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \left[ x'\,\hat{V}(\hat{Y}^*)^{-1}\,x \right]^{-1} x'\,\hat{V}(\hat{Y}^*)^{-1}\,\hat{Y}^*,$$

where $b_0$: intercept; $b_1$: linear regression coefficient for the $\hat{Y}^*_k$ means as a function of year, the weighted estimate of mean annual genetic gain; $x$: known constant matrix with $a \times 2$ dimensions consisting of a column vector of 1's, relative to $b_0$, and a column vector of 1, 2, ..., $a$, relative to $b_1$, and $\hat{V}(\hat{Y}^*)$: estimator of the covariance matrix of the $\hat{Y}^*_k$ means.

The $b_1$ variance is the value of the cell in the second row, second column of the $V(\hat{\beta})$ matrix, which is estimated by:

$$\hat{V}(\hat{\beta}) = \left[ x'\,\hat{V}(\hat{Y}^*)^{-1}\,x \right]^{-1}.$$

If the estimate of genetic gain is significant, it is useful to calculate the percent mean annual gain, using as a base the mean referring to the genotypes tested during the first year of the considered period, $\hat{Y}^*_1$.
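The regression step can be sketched with the standard generalized least squares formulas. The function name `gls_gain` and the input values are hypothetical; the covariance matrix is assumed to be invertible.

```python
import numpy as np

def gls_gain(year_means, V_year):
    """Generalized least squares fit of yearly means on year (1, 2, ..., a)."""
    a = len(year_means)
    # x: column of ones (for b0) and column 1..a (for b1)
    x = np.column_stack([np.ones(a), np.arange(1, a + 1)])
    V_inv = np.linalg.inv(V_year)
    V_beta = np.linalg.inv(x.T @ V_inv @ x)      # covariance of (b0, b1)
    beta = V_beta @ x.T @ V_inv @ year_means     # (b0, b1)
    return beta, V_beta

# Hypothetical yearly means (kg/ha) and their covariance matrix
year_means = np.array([6100.0, 6150.0, 6260.0, 6310.0])
V_year = np.array([[900.0, 300.0,   0.0,   0.0],
                   [300.0, 900.0, 300.0,   0.0],
                   [  0.0, 300.0, 900.0, 300.0],
                   [  0.0,   0.0, 300.0, 900.0]])

beta, V_beta = gls_gain(year_means, V_year)
b0, b1 = beta
se_b1 = np.sqrt(V_beta[1, 1])             # standard error of the mean gain
percent_gain = 100.0 * b1 / year_means[0]  # relative to the first-year mean
```

The ratio `b1 / se_b1` can then be compared with a t distribution to judge whether the gain differs from zero.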

To illustrate the application of the proposed method, genetic gain was calculated from regional trial data of irrigated rice. The data were the results of nine yield trials of lines and varieties of irrigated rice in the states of Piauí and Maranhão from 1986 to 1990. Table II presents the means for the genotypes in each trial. Only part of the trial results was used, selected to provide a simple example; thus, the results obtained here should be considered as hypothetical. This example follows the method proposed for the general case. The example has no missing plots, but there is a trial with a smaller number of replications. Therefore, it could also be treated as the second special case simply by using the data presented in Table II.

RESULTS AND DISCUSSION

The example's combined analysis of variance results are presented in Table III. This combined analysis, which was conducted according to the GLM procedure of the SAS program, provided the generalized inverse $(X'X)^{G}$, the least squares solution vector ($\theta^{\circ}$), and the residual mean square (RMS).

In this example, the coefficient matrix $C$ had dimensions 29 x 79. The adjusted mean vector, $\hat{Y}$, corresponds to the last column of Table II. The auxiliary matrix $S$, used to obtain the mean vector $\hat{Y}^*$ and the $\hat{V}(\hat{Y}^*)$ matrix, had dimensions 5 x 29.

From these, the $\hat{Y}^*$ vector of arithmetic means of the adjusted means for genotypes and its covariance matrix were obtained. By applying the generalized least squares method, the $\hat{\beta}$ vector and its covariance matrix were then estimated. [The matrix displays for $C$, $S$, $\hat{Y}^*$, $\hat{V}(\hat{Y}^*)$, $\hat{\beta}$ and $\hat{V}(\hat{\beta})$ are not available in this text.]

The variance of $b_1$ is 777.0; therefore, the mean genetic gain for the period represented by the data was 71.4 ± 27.9 kg/ha/year, significant at the 1% level of probability by the t-test. Considering that the mean for the first year in the period was 6245 kg/ha, the relative genetic gain was 1.14% per year.
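The standard error and the relative gain quoted above follow directly from the reported figures, as this short check shows. Only the three input numbers come from the text; the variable names are illustrative.

```python
import math

var_b1 = 777.0             # reported variance of b1
b1 = 71.4                  # reported mean genetic gain (kg/ha/year)
first_year_mean = 6245.0   # reported mean for the first year (kg/ha)

se_b1 = math.sqrt(var_b1)                    # standard error of b1
relative_gain = 100.0 * b1 / first_year_mean  # percent per year

print(round(se_b1, 1), round(relative_gain, 2))  # prints 27.9 1.14
```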

The proposed method allows the estimation of genetic gain obtained in breeding programs using regional trial data from any geographic and temporal amplitude. It is possible, for example, to calculate grain yield growth rate for a region, a state, or the entire area of interest in a breeding program. The gain for subregions within the same program may show whether the new lines are adapted to specific environmental conditions.

The versatility and strength of this method permit its use even when the data available are unbalanced. This allows maximum use of existing files, and opens the possibility of studying more remote periods. The use of $\bar{Y}_{ijk.}$ means permits analysis of experimental series for which the original data are not available, but only the genotype means, replication number, and experimental precision parameters. The use of all available information for each line, resulting in adjusted means that are used in place of point observations, reduces the importance of genotype x year and genotype x trial/year interactions, which are the major sources of error in genetic gain studies. The reliability of the estimated gain depends only on the quality of the experimental results available.

The most cumbersome phase in the application of the proposed method is the organization of the files for combined analysis. It is necessary to standardize the identification of each genotype in all the files within the period. However, a by-product of this procedure is a well-organized set of files and a retrospective view of the experiments performed. Construction of the coefficient matrix $C$ is also a laborious phase, especially when the number of trials is high and the analysis is done with replications. It is necessary to observe nesting so that the sum of the cells referring to each effect in the model will be equal to one. Automation of this phase via a specific program would greatly simplify application of the method.

Genetic gain estimation by the method proposed in the present study is efficient, precise, and provides highly useful results to critically evaluate genetic plant breeding programs.

ACKNOWLEDGMENTS

We are indebted to Embrapa Meio Norte and to Empresa Maranhense de Pesquisa Agropecuária (Emapa) for providing the experimental data. F.B. is the recipient of a CNPq fellowship.


REFERENCES

Hoffmann, R. and Vieira, S. (1987). Análise de Regressão: Uma Introdução à Econometria. 2nd edn. Hucitec, São Paulo.

Rodrigues, J.A.S. (1990). Progresso genético e potencial de risco da cultura do sorgo granífero (Sorghum bicolor (L.) Moench) no Brasil. Doctoral thesis, ESALQ, Piracicaba.

SAS Institute Inc. (1985). SAS User's Guide: Statistics. Version 5. SAS Institute Inc., Cary, NC.

Searle, S.R. (1971). Linear Models. John Wiley & Sons, New York.

Soares, A.A. (1992). Desempenho do melhoramento genético do arroz de sequeiro e irrigado da década de oitenta em Minas Gerais. Doctoral thesis, ESAL, Lavras.

Toledo, J.F.F. de, Almeida, L.A. de, Kiihl, R.A. de S. and Menosso, O.G. (1990). Ganho genético em soja no Estado do Paraná, via melhoramento. Pesq. Agropec. Bras. 25: 89-94.

Vencovsky, R., Morais, A.R., Garcia, J.C. and Teixeira, N.M. (1988). Progresso genético em vinte anos de melhoramento de milho no Brasil. In: Congresso Nacional de Milho e Sorgo, 1986, Sete Lagoas. Anais. Embrapa-CNPMS, Sete Lagoas, pp. 300-306.

(Received June 17, 1997)

Publication Dates

Publication in this collection: 01 Mar 1999
Date of issue: Dec 1998

History

Received: 17 June 1997
    Sociedade Brasileira de Genética Rua Cap. Adelmio Norberto da Silva, 736, 14025-670 Ribeirão Preto SP Brazil, Tel.: (55 16) 3911-4130 / Fax.: (55 16) 3621-3552 - Ribeirão Preto - SP - Brazil
    E-mail: editor@gmb.org.br