Use of canonical variates in genetic divergence studies

Ferreira, Daniel Furtado; Rezende, Gabriel Dehon Sampaio Peçanha

doi:10.1590/S0100-84551997000300022

Daniel Furtado Ferreira¹ and Gabriel Dehon Sampaio Peçanha Rezende²

¹Departamento de Ciências Exatas, Universidade Federal de Lavras, UFLA, Caixa Postal 37, 37200-000 Lavras, MG, Brasil. Send correspondence to D.F.F.

²Aracruz Florestal, Aracruz, Espírito Santo, ES, Brasil.

ABSTRACT

Exact correlation coefficients between a canonical variate and measured traits were derived to evaluate genetic divergence among varieties. This method allows the plant breeder to determine which traits contribute significantly to genetic divergence and, also, to identify the most important among them. An example is presented, related to a trial where 28 varieties of maize were evaluated for two traits.

INTRODUCTION

Genetic divergence is one of the most important parameters evaluated by plant breeders when starting a breeding program. Divergence between parents is a necessary, but not sufficient, condition for the occurrence of heterosis and for generating a population with broad genetic variability. Specifically, heterosis is directly proportional to dominance and to the square of the difference in gene frequencies between the parents, that is, to their genetic divergence (Falconer, 1981; Cruz, 1990; Ferreira, 1993), and it is also associated with adaptation.

The usual approach to making inferences about genetic divergence relies on crosses, of which diallel crosses are the most important. In this approach, crosses are made among the materials and a large number of hybrids is obtained. These hybrids must be evaluated over several years and environments, which increases the initial cost of the breeding program.

A second approach is to use multivariate methods to estimate genetic divergence and then predict hybrid performance. In this case, it is not necessary to make crosses. Furthermore, a large number of materials may be successfully evaluated (Hallauer and Miranda Filho, 1981).

In the latter approach, a large number of traits must be measured. The canonical variate technique is often used to reduce the number of variables by forming linear combinations of the traits, without a significant loss of the total variation. Additionally, this technique takes into account the structure of the residual covariances. It therefore allows plant breeders to identify the traits that are important for genetic divergence among varieties, information that can be obtained from the correlations between canonical variates and traits. These correlations can be computed from two sources of variation: variation among varieties (phenotypic) and residual (within-group) variation.

Ferreira (1993) presents an approach to estimate the correlation coefficient due to variation among varieties when the residual covariance matrix does not differ from the identity matrix. However, when this assumption does not hold, the approach can lead breeders to discard important traits. The aim of this study was to derive the exact correlation coefficients between a canonical variate and the traits used in genetic divergence studies.

METHODOLOGY

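Let $X_1, X_2, \ldots, X_p$ be the $p$ traits measured in a replicated variety trial to evaluate genetic divergence, and let $Y_1, Y_2, \ldots, Y_p$ be the canonical variates (Fisher's discriminant functions), which are linear combinations of the traits given by:

$$Y_i = \mathbf{a}_i'\mathbf{X} = a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{ip}X_p, \qquad i = 1, 2, \ldots, p \qquad (1)$$

where $\mathbf{X} = (X_1, \ldots, X_p)'$ and $\mathbf{a}_i = (a_{i1}, \ldots, a_{ip})'$ is the vector of coefficients of the ith canonical variate.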

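Let $T$ and $E$ be the $(p \times p)$ matrices of sums of squares and products due to varieties and to residual, respectively, and let $l_i$ and $\mathbf{a}_i$ be the eigenvalue and the $(p \times 1)$ eigenvector associated with the ith canonical variate. The residual covariance matrix is assumed to be the same for every variety, so that a pooled matrix is used. Then, the variance among varieties and the residual variance of the ith canonical variate are:

$$V_T(Y_i) = \mathbf{a}_i' T \mathbf{a}_i \qquad (2)$$

$$V_E(Y_i) = \mathbf{a}_i' E \mathbf{a}_i \qquad (3)$$

The covariance among varieties and the residual covariance between two different canonical variates ($i \neq i'$) are:

$$\mathrm{Cov}_T(Y_i, Y_{i'}) = \mathbf{a}_i' T \mathbf{a}_{i'} \qquad (4)$$

$$\mathrm{Cov}_E(Y_i, Y_{i'}) = \mathbf{a}_i' E \mathbf{a}_{i'} \qquad (5)$$

The eigenvalues ($l_i$) and eigenvectors ($\mathbf{a}_i$) associated with the ith canonical variate are obtained from the solution of the homogeneous system, whose solutions are determined only up to a scale factor:

$$(T - l_i E)\,\mathbf{a}_i = \mathbf{0} \qquad (6)$$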

A convenient way to solve equation (6) is to transform it into a principal-components problem (Johnson and Wichern, 1988).

First, the transformation matrix $S^{-1}$ (Bock, 1975) is obtained from the Cholesky decomposition of $E$, such that $E = SS'$, where $S$ is a lower triangular matrix. Then:

$$S^{-1} E (S^{-1})' = I \qquad (7)$$

The same transformation may be applied to $T$:

$$L = S^{-1} T (S^{-1})' \qquad (8)$$

The new system to be solved is:

$$(L - l_i I)\,\mathbf{e}_i = \mathbf{0} \qquad (9)$$

where $\mathbf{e}_i$ is the ith eigenvector of the transformed system (9).

The solution to equation (9) is obtained by extracting the eigenvalues and eigenvectors of matrix $L$. The eigenvalues are invariant under nonsingular transformation (Bock, 1975), but the eigenvectors ($\mathbf{e}_i$) are modified, and the eigenvectors of equation (6) must be recovered by:

$$\mathbf{a}_i = (S^{-1})'\,\mathbf{e}_i \qquad (10)$$
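The reduction in equations (7) to (10) can be sketched in Python with NumPy. The matrices `T` and `E` below are hypothetical 2 × 2 stand-ins, not the estimates from the maize trial, and the helper name `canonical_eigen` is ours; the sketch assumes that `E` is positive definite, so that its Cholesky factor exists.

```python
import numpy as np

# Hypothetical SSP matrices for p = 2 traits (illustrative only):
# T = among varieties, E = residual (must be positive definite).
T = np.array([[4.0, 1.2],
              [1.2, 2.0]])
E = np.array([[1.0, 0.3],
              [0.3, 0.5]])


def canonical_eigen(T, E):
    """Solve (T - l E) a = 0 through the Cholesky route, equations (7)-(10)."""
    S = np.linalg.cholesky(E)        # lower triangular S with E = S S'
    S_inv = np.linalg.inv(S)
    L = S_inv @ T @ S_inv.T          # equation (8)
    l, e = np.linalg.eigh(L)         # symmetric eigenproblem (9), ascending order
    l, e = l[::-1], e[:, ::-1]       # reorder so that l_1 >= l_2 >= ...
    a = S_inv.T @ e                  # equation (10): a_i = (S^{-1})' e_i
    return l, e, a, L


l, e, a, L = canonical_eigen(T, E)
print("eigenvalues l_i:", np.round(l, 4))
print("coefficients a_i (one column per canonical variate):")
print(np.round(a, 4))
```

With this construction the columns of `a` satisfy a'Ea = I and a'Ta = diag(l), so the covariances in equations (4) and (5) vanish for i ≠ i'. Eigenvector signs are arbitrary, so a column may come out negated.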

The approximate correlation coefficient due to varieties between the traits and the canonical variates is given by the correlation coefficient, $\tilde r_T(X_k, Y_i)$, between the ith principal component of equation (9) and the kth trait (Johnson and Wichern, 1988):

$$\tilde r_T(X_k, Y_i) = \frac{e_{ik}\sqrt{l_i}}{\sqrt{L_{kk}}} \qquad (11)$$

where $e_{ik}$ is the kth element of the ith eigenvector ($\mathbf{e}_i$), and $L_{kk}$ is the sum of squares among varieties of the kth trait under the nonsingular transformation, that is, the kth diagonal element of matrix $L$ given in equation (8).
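Equation (11) can be sketched with NumPy as follows. As before, `T` and `E` are hypothetical matrices and `approximate_corr` is a helper name of our own; in the array `e`, entry `e[k, i]` holds the kth element of the ith eigenvector.

```python
import numpy as np

# Hypothetical SSP matrices (illustrative only, not the trial's estimates).
T = np.array([[4.0, 1.2], [1.2, 2.0]])
E = np.array([[1.0, 0.3], [0.3, 0.5]])

S_inv = np.linalg.inv(np.linalg.cholesky(E))   # S^{-1}, with E = S S'
L = S_inv @ T @ S_inv.T                         # equation (8)
l, e = np.linalg.eigh(L)
l, e = l[::-1], e[:, ::-1]                      # largest eigenvalue first


def approximate_corr(l, e, L):
    """Equation (11): r~[k, i] = e_ik * sqrt(l_i) / sqrt(L_kk).

    Rows index traits X_k, columns index canonical variates Y_i.
    """
    return e * np.sqrt(l)[np.newaxis, :] / np.sqrt(np.diag(L))[:, np.newaxis]


r_approx = approximate_corr(l, e, L)
print(np.round(r_approx, 4))
```

Because all p principal components of L are retained, the squared coefficients in each row add to one; their signs follow the arbitrary sign of each eigenvector.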

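To determine the exact correlation due to variation among varieties, $r_T(X_k, Y_i)$, between the ith canonical variate ($Y_i$) and the kth trait ($X_k$), let $X_k = \mathbf{u}_k'\mathbf{X}$, where $\mathbf{u}_k$ is the kth column of the $(p \times p)$ identity matrix, and $Y_i = \mathbf{a}_i'\mathbf{X}$, as presented in equation (1). Thus:

$$\mathrm{Cov}_T(X_k, Y_i) = \mathbf{u}_k' T \mathbf{a}_i \qquad (12)$$

From equation (6), it is clear that:

$$T\mathbf{a}_i = l_i E \mathbf{a}_i \qquad (13)$$

And using (13) in (12):

$$\mathrm{Cov}_T(X_k, Y_i) = l_i\,\mathbf{u}_k' E \mathbf{a}_i = l_i \sum_{j=1}^{p} s_e(k,j)\, a_{ij} \qquad (14)$$

where $s_e(k,j)$ is the element in the kth row and jth column of $E$ (symmetric). The variances of $X_k$ and $Y_i$ among varieties are:

$$V_T(X_k) = \mathbf{u}_k' T \mathbf{u}_k = s_t(k,k) \qquad (15)$$

$$V_T(Y_i) = \mathbf{a}_i' T \mathbf{a}_i = l_i\,\mathbf{a}_i' E \mathbf{a}_i \qquad (16)$$

where $s_t(k,k)$ is the kth diagonal element of $T$. Then:

$$r_T(X_k, Y_i) = \frac{l_i \sum_{j=1}^{p} s_e(k,j)\, a_{ij}}{\sqrt{s_t(k,k)\; l_i\,\mathbf{a}_i' E \mathbf{a}_i}} = \frac{\sqrt{l_i}\sum_{j=1}^{p} s_e(k,j)\, a_{ij}}{\sqrt{s_t(k,k)\;\mathbf{a}_i' E \mathbf{a}_i}} \qquad (17)$$

For eigenvectors recovered through equation (10), $\mathbf{a}_i' E \mathbf{a}_i = \mathbf{e}_i'\mathbf{e}_i = 1$, so the denominator reduces to $\sqrt{s_t(k,k)}$.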
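Equations (12) to (17) can be checked numerically. The sketch below, again with hypothetical `T` and `E` and a helper name of our own (`exact_corr_among`), computes (17) in its general form and compares it with the approximate coefficient (11).

```python
import numpy as np

# Hypothetical SSP matrices (illustrative only).
T = np.array([[4.0, 1.2], [1.2, 2.0]])
E = np.array([[1.0, 0.3], [0.3, 0.5]])

S_inv = np.linalg.inv(np.linalg.cholesky(E))
L = S_inv @ T @ S_inv.T                          # equation (8)
l, e = np.linalg.eigh(L)
l, e = l[::-1], e[:, ::-1]                       # largest eigenvalue first
a = S_inv.T @ e                                  # equation (10)


def exact_corr_among(T, E, l, a):
    """Equation (17): among-variety correlation, X_k in rows, Y_i in columns."""
    cov = (E @ a) * l[np.newaxis, :]             # (14): l_i * sum_j s_e(k,j) a_ij
    var_x = np.diag(T)[:, np.newaxis]            # (15): s_t(k,k)
    var_y = np.einsum("ji,jk,ki->i", a, T, a)    # (16): a_i' T a_i
    return cov / np.sqrt(var_x * var_y[np.newaxis, :])


r_exact = exact_corr_among(T, E, l, a)
r_approx = e * np.sqrt(l) / np.sqrt(np.diag(L))[:, np.newaxis]   # equation (11)
print("exact (17):\n", np.round(r_exact, 4))
print("approximate (11):\n", np.round(r_approx, 4))
```

In such checks the two coefficients agree for the first trait, because the first row of the lower-triangular S^{-1} involves X₁ only, but they can differ for the other traits when E has nonzero off-diagonal elements. Keeping a_i'Ea_i in the denominator also makes (17) insensitive to how a_i is scaled.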

Generally, a normalized solution ($\mathbf{a}_i^*$) is taken for the eigenvector $\mathbf{a}_i$:

In this case:

where c_i is a scale factor. Then:

In this specific case, the correlation coefficient given in equation (17) must be divided by . Since equation (20) belongs to the numerator of the expression, it is biased by a factor of , where c_i is obtained from equation (19). Thus:

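To determine the correlation coefficient due to residual variation between the ith canonical variate and the kth trait, $r_E(X_k, Y_i)$, the following result is needed:

$$\mathrm{Cov}_E(X_k, Y_i) = \mathbf{u}_k' E \mathbf{a}_i = \sum_{j=1}^{p} s_e(k,j)\, a_{ij} \qquad (22)$$

The residual variances of $X_k$ and $Y_i$ are:

$$V_E(X_k) = s_e(k,k) \qquad (23)$$

$$V_E(Y_i) = \mathbf{a}_i' E \mathbf{a}_i \qquad (24)$$

Then, the exact residual correlation coefficient is:

$$r_E(X_k, Y_i) = \frac{\sum_{j=1}^{p} s_e(k,j)\, a_{ij}}{\sqrt{s_e(k,k)\;\mathbf{a}_i' E \mathbf{a}_i}} \qquad (25)$$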

With the normalized solution (18), equation (25) results in:
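The residual correlation in equations (22) to (25) translates into a few lines of NumPy. The sketch reuses the hypothetical `T` and `E` from the earlier illustrations; `residual_corr` is a helper name of our own.

```python
import numpy as np

# Hypothetical SSP matrices (illustrative only).
T = np.array([[4.0, 1.2], [1.2, 2.0]])
E = np.array([[1.0, 0.3], [0.3, 0.5]])

S_inv = np.linalg.inv(np.linalg.cholesky(E))
l, e = np.linalg.eigh(S_inv @ T @ S_inv.T)       # eigenpairs of L, equation (9)
l, e = l[::-1], e[:, ::-1]
a = S_inv.T @ e                                  # equation (10)


def residual_corr(E, a):
    """Equation (25): residual correlation, X_k in rows, Y_i in columns."""
    cov = E @ a                                  # (22): sum_j s_e(k,j) a_ij
    var_x = np.diag(E)[:, np.newaxis]            # (23): s_e(k,k)
    var_y = np.einsum("ji,jk,ki->i", a, E, a)    # (24): a_i' E a_i
    return cov / np.sqrt(var_x * var_y[np.newaxis, :])


r_resid = residual_corr(E, a)
print(np.round(r_resid, 4))
```

For eigenvectors taken from equation (10), a_i'Ea_i = 1, so the denominator reduces to the square root of s_e(k,k); rescaling a_i by a negative constant only flips the sign of the corresponding column.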

EXAMPLE

Part of the data presented by Ferreira (1993) was used in this example, and only two traits were used to illustrate the estimation of the correlation coefficients: X₁ (stalk diameter, SD) and X₂ (number of leaves, NL), measured on 28 maize varieties evaluated in a trial with two replications. The estimated T and E matrices were:

The estimated S^-1 matrix was:

And the estimated L matrix, obtained from equation (8), was:

The estimated eigenvalues and eigenvectors were:

l₁ = 2.3487 and l₂ = 0.9300;

The estimates of the approximate correlation coefficient due to variation among varieties, between the canonical variates (Y₁ and Y₂) and the traits (X₁ and X₂), were obtained from equation (11) and are presented in Table I.

Traits	Canonical variates
	Y₁	Y₂
X₁ (SD)	0.9728	0.2315
X₂ (NL)	-0.5151	0.8571

Table I - Estimates of approximate correlation coefficients due to variation among varieties, between the canonical variates (Y₁ and Y₂) and the traits (X₁ and X₂), obtained from equation (11).

SD, Stalk diameter; NL, number of leaves.

The eigenvalues are invariant under nonsingular transformation of variables, but the eigenvectors must be recovered through equation (10). The results are:

It can be verified that:

Using this result and equation (17), the exact correlation coefficients among varieties, between the traits and the canonical variates, were obtained and are presented in Table II.

Traits	Canonical variates
	Y₁	Y₂
X₁ (SD)	0.9728	0.2315
X₂ (NL)	-0.1957	0.9807

Table II - Estimates of exact correlation coefficients among varieties, between the traits (X₁ and X₂) and the canonical variates (Y₁ and Y₂) obtained from equation (17).

SD, Stalk diameter; NL, number of leaves.

The approximate and exact correlation coefficients among varieties were the same for X₁ and both canonical variates, but differed for X₂ (Tables I and II). The agreement for the first trait is expected: because S is lower triangular, the first row of S^-1 involves only X₁, so equations (11) and (17) coincide for k = 1. Canonical variate Y₁ accounted for 71.64% of the total variation. Since X₁ showed a high correlation (0.9728) with it, X₁ was considered the trait contributing most to divergence; it therefore discriminates among most of the divergent varieties.
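The 71.64% quoted above equals the share of l₁ in the sum of the two eigenvalues reported in the example; the arithmetic is:

```python
# Eigenvalues reported in the example (l1 and l2).
eigenvalues = [2.3487, 0.9300]

total = sum(eigenvalues)
shares = [round(100 * l / total, 2) for l in eigenvalues]
print(shares)  # [71.64, 28.36]
```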

Estimates of residual correlation coefficients were obtained from equation (25) and are presented in Table III.

Traits	Canonical variates
	Y₁	Y₂
X₁ (SD)	0.9354	0.3539
X₂ (NL)	-0.1243	0.9927

Table III - Residual correlation coefficients between the traits (X₁ and X₂) and the canonical variates (Y₁ and Y₂), obtained from equation (25).

SD, Stalk diameter; NL, number of leaves.

X₁ showed a high residual correlation (0.9354) with the main canonical variate (Y₁), whereas X₂ showed a high residual correlation (0.9927) with Y₂, the canonical variate that was less important to genetic divergence among varieties (Table III).
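As a quick consistency check on Tables I to III: when all p = 2 canonical variates are retained, the squared correlations of each trait with Y₁ and Y₂ should add to one, a consequence of the eigenvectors e_i forming an orthonormal set (with the exact and residual coefficients computed from eigenvectors recovered through equation (10)). The sketch below applies this check to the published four-decimal values; the small departures it finds (below 0.001 here) presumably reflect rounding in the intermediate computations.

```python
# Correlations transcribed from Tables I-III (rows: X1, X2; columns: Y1, Y2).
tables = {
    "Table I": [[0.9728, 0.2315], [-0.5151, 0.8571]],
    "Table II": [[0.9728, 0.2315], [-0.1957, 0.9807]],
    "Table III": [[0.9354, 0.3539], [-0.1243, 0.9927]],
}

# With all p = 2 canonical variates retained, each row of squares should sum to 1.
row_sums = {
    name: [sum(r * r for r in row) for row in rows]
    for name, rows in tables.items()
}
for name, sums in row_sums.items():
    print(name, [round(s, 4) for s in sums])
```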

The purpose of this example was to show the utility of the correlation coefficients in equations (17), (21), (25) and (26) derived in this paper. The method allows the plant breeder to determine which traits are the most important for genetic divergence among the materials under study. This knowledge is useful in choosing parental lines during the first stages of breeding programs and also in maintaining germplasm banks.

RESUMO

The exact correlation coefficient between a canonical variate and a measured trait was derived in order to evaluate genetic divergence among varieties. The present method allows the breeder to determine which traits contribute significantly to genetic divergence and, also, to identify the most important among them. A real example, from a trial with 28 maize varieties measured for two traits, was presented.

(Received October 2, 1996)

REFERENCES

Bock, R.D. (1975). Multivariate Statistical Methods in Behavioral Research. McGraw-Hill, New York, pp. 623.
Cruz, C.D. (1990). Aplicações de algumas técnicas multivariadas no melhoramento de plantas. Doctoral thesis, ESALQ, USP, Piracicaba, SP.
Falconer, D.S. (1981). Introduction to Quantitative Genetics. 2nd edn. Longman, London, pp. 340.
Ferreira, D.F. (1993). Métodos de avaliação da divergência genética em milho e suas relações com os cruzamentos dialélicos. Master's thesis, UFLA, Lavras, MG.
Hallauer, A.R. and Miranda Filho, J.B. (1981). Quantitative Genetics in Maize Breeding. Iowa State University Press, Ames, pp. 468.
Johnson, R.A. and Wichern, D.W. (1988). Applied Multivariate Statistical Analysis. 2nd edn. Prentice Hall, New York, pp. 607.

Publication dates: date of issue Sept 1997; publication in this collection 02 Sept 2004.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

