## Brazilian Journal of Genetics

*On-line version* ISSN 1678-4502

### Braz. J. Genet. v.20 n.3 Ribeirão Preto sep. 1997

#### http://dx.doi.org/10.1590/S0100-84551997000300022

Use of canonical variates in genetic divergence studies

*Daniel Furtado Ferreira*^{1} and Gabriel Dehon Sampaio Peçanha Rezende^{2}

^{1}Departamento de Ciências Exatas, Universidade Federal de Lavras, UFLA, Caixa Postal 37, 37200-000 Lavras, MG, Brasil. Send correspondence to D.F.F.

^{2}Aracruz Florestal, Aracruz, Espírito Santo, ES, Brasil.

**ABSTRACT**

Exact correlation coefficients between a canonical variate and measured traits were derived to evaluate genetic divergence among varieties. This method allows the plant breeder to determine which traits contribute significantly to genetic divergence and, also, to identify the most important among them. An example is presented, related to a trial where 28 varieties of maize were evaluated for two traits.

**INTRODUCTION**

Genetic divergence is one of the most important parameters evaluated by plant breeders when starting a breeding program. It is a necessary, but not sufficient, condition for the occurrence of heterosis and for the generation of a population with broad genetic variability. Heterosis is directly proportional to genetic divergence and to the square of dominance (Falconer, 1981; Cruz, 1990; Ferreira, 1993) and is also associated with adaptation.

The usual approach to make inferences about genetic divergence is to adopt predictive methodologies. Among them, diallel crosses are the most important. In this case, crosses are made among materials and a great number of hybrids is obtained. These must be evaluated over several years and environmental conditions, which increases the initial cost of the breeding program.

A second approach is to use multivariate methods to estimate genetic divergence and then predict hybrid performance. In this case, it is not necessary to make crosses. Furthermore, a large number of materials may be successfully evaluated (Hallauer and Miranda Filho, 1981).

In the latter approach, a large number of traits must be measured. A canonical variate technique is often used to reduce the number of these traits, through linear combinations of them, without a significant loss of the total variation. Additionally, this technique takes into account the structure of the residual covariances. Thus, it allows plant breeders to obtain information about the traits that are important for genetic divergence among varieties. This information can be obtained from the correlation between canonical variates and traits. This correlation coefficient may be split into two parts: the first is due to variation among varieties (phenotypic) and the second to residual variation (within-group).

Ferreira (1993) presented an approach to estimate the correlation coefficient due to variation among varieties when the residual covariance matrix does not differ from the identity matrix. However, when this assumption does not hold, the approach can lead breeders to discard important traits. This study was undertaken to derive the exact correlation coefficients between a canonical variate and the traits used in genetic divergence studies.

**METHODOLOGY**

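Let X_{1}, X_{2}, ..., X_{p} be the p traits measured in a replicated variety trial to evaluate genetic divergence. Let Y_{1}, Y_{2}, ..., Y_{p} be the canonical variates (Fisher’s discriminant functions), which are linear combinations of the traits, given by:

$$
Y_i = \mathbf{a}_i'\mathbf{X} = a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{ip}X_p, \qquad i = 1, 2, \ldots, p \tag{1}
$$

where **X** = (X_{1}, X_{2}, ..., X_{p})’ and **a**_{i} is the (p x 1) vector of coefficients of the **i**th canonical variate.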

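Let **T** and **E** be the (p x p) sums of squares and products matrices due to varieties and to residual, respectively, and let λ_{i} and **a**_{i} be the eigenvalue and the (p x 1) eigenvector related to the **i**th canonical variate. The residual covariance matrix is assumed to be the same for every variety, so that a pooled matrix is used. Then, the variance among varieties and the residual variance of the **i**th canonical variate are:

$$
V_t(Y_i) = \mathbf{a}_i'\mathbf{T}\mathbf{a}_i \tag{2}
$$

$$
V_e(Y_i) = \mathbf{a}_i'\mathbf{E}\mathbf{a}_i \tag{3}
$$

The covariance among varieties and the residual covariance between two different canonical variates are:

$$
\mathrm{Cov}_t(Y_i, Y_j) = \mathbf{a}_i'\mathbf{T}\mathbf{a}_j = 0, \qquad i \neq j \tag{4}
$$

$$
\mathrm{Cov}_e(Y_i, Y_j) = \mathbf{a}_i'\mathbf{E}\mathbf{a}_j = 0, \qquad i \neq j \tag{5}
$$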

The eigenvalues (λ_{i}) and eigenvectors (**a**_{i}) related to the **i**th canonical variate are obtained from the solution of the homogeneous indeterminate system:

$$
(\mathbf{T} - \lambda_i\mathbf{E})\,\mathbf{a}_i = \mathbf{0} \tag{6}
$$

A convenient way to solve equation (6) is by transforming it into a problem of determination of principal components (Johnson and Wichern, 1988).

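First, the transformation matrix **S**^{-1} (Bock, 1975) is obtained by carrying out a Cholesky decomposition of **E**, such that **E** = **SS**’, where **S** is a lower triangular matrix. Then:

$$
\mathbf{S}^{-1}\mathbf{E}(\mathbf{S}')^{-1} = \mathbf{I} \tag{7}
$$

The same transformation may be applied to **T**:

$$
\mathbf{L} = \mathbf{S}^{-1}\mathbf{T}(\mathbf{S}')^{-1} \tag{8}
$$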

The new system to be solved is:

$$
(\mathbf{L} - \lambda_i\mathbf{I})\,\mathbf{b}_i = \mathbf{0} \tag{9}
$$

where **b**_{i} is the **i**th eigenvector of the transformed system (9).

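The solution to equation (9) is obtained by extracting the eigenvalues and eigenvectors of matrix **L**. The eigenvalues are invariant under the nonsingular transformation (Bock, 1975), but the eigenvectors (**b**_{i}) are modified, and the eigenvectors of the original system (**a**_{i}) must be recovered by:

$$
\mathbf{a}_i = (\mathbf{S}')^{-1}\mathbf{b}_i \tag{10}
$$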
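The Cholesky route described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the matrices `T` and `E` below are hypothetical (they are not the maize data of the example), and the function name `canonical_variates` is chosen here for convenience.

```python
import numpy as np

def canonical_variates(T, E):
    """Solve (T - lambda * E) a = 0 through the Cholesky transformation."""
    S = np.linalg.cholesky(E)            # lower triangular, E = S S'
    S_inv = np.linalg.inv(S)
    L = S_inv @ T @ S_inv.T              # transformed among-varieties matrix
    eigvals, B = np.linalg.eigh(L)       # symmetric eigenproblem, ascending order
    order = np.argsort(eigvals)[::-1]    # largest eigenvalue first
    eigvals, B = eigvals[order], B[:, order]
    A = np.linalg.solve(S.T, B)          # recover a_i = (S')^{-1} b_i, one per column
    return eigvals, A, B, L

# Hypothetical 2 x 2 matrices, used only to exercise the function.
T = np.array([[10.0, 3.0], [3.0, 6.0]])
E = np.array([[4.0, 1.0], [1.0, 2.0]])
eigvals, A, B, L = canonical_variates(T, E)
print(eigvals)
print(A)
```

Because the columns of `B` have unit length, the recovered eigenvectors satisfy `A.T @ E @ A = I`, so each canonical variate has unit residual variance.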

The approximate correlation coefficient due to varieties between the traits and the canonical variates is given by the correlation coefficient (r_{ik}) between the **i**th principal component of equation (9) and the **k**th trait (Johnson and Wichern, 1988):

$$
r_{ik} = \frac{b_{ik}\sqrt{\lambda_i}}{\sqrt{S_{k,k}}} \tag{11}
$$

where b_{ik} is the **k**th element of the **i**th eigenvector (**b**_{i}), and S_{k,k} is the sum of squares among varieties of the **k**th trait under the nonsingular transformation, found on the diagonal of matrix **L**, given in equation (8).

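To determine the exact correlation due to variation among varieties (r_{t(k,i)}) between the **i**th canonical variate (Y_{i}) and the **k**th trait (X_{k}), let Y_{i} = a_{i1}X_{1} + a_{i2}X_{2} + ... + a_{ip}X_{p}, as presented in equation (1). Thus:

$$
\mathrm{Cov}_t(X_k, Y_i) = \sum_{j=1}^{p} a_{ij}\, s_{t(k,j)} \tag{12}
$$

From equation (6), it is clear that:

$$
\sum_{j=1}^{p} s_{t(k,j)}\, a_{ij} = \lambda_i \sum_{j=1}^{p} s_{e(k,j)}\, a_{ij} \tag{13}
$$

And using (13) in (12):

$$
\mathrm{Cov}_t(X_k, Y_i) = \lambda_i \sum_{j=1}^{p} s_{e(k,j)}\, a_{ij} \tag{14}
$$

where s_{t(k,j)} and s_{e(k,j)} are the **k**th row and **j**th column elements of **T** and **E** (both symmetric), respectively. The variances of X_{k} and Y_{i} are:

$$
V_t(X_k) = s_{t(k,k)} \tag{15}
$$

$$
V_t(Y_i) = \mathbf{a}_i'\mathbf{T}\mathbf{a}_i = \lambda_i\, \mathbf{a}_i'\mathbf{E}\mathbf{a}_i = \lambda_i \tag{16}
$$

where s_{t(k,k)} is the **k**th row and column element of **T**, and the last equality holds because the eigenvectors recovered through equation (10) satisfy **a**_{i}’**E****a**_{i} = 1.

Then:

$$
r_{t(k,i)} = \frac{\lambda_i \sum_{j=1}^{p} s_{e(k,j)}\, a_{ij}}{\sqrt{s_{t(k,k)}\,\lambda_i}} = \frac{\sqrt{\lambda_i}\,\sum_{j=1}^{p} s_{e(k,j)}\, a_{ij}}{\sqrt{s_{t(k,k)}}} \tag{17}
$$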

Generally, a normalized solution () is taken to the eigenvector :

In this case:

where c_{i} is a scale factor. Then:

In this specific case, the correlation coefficient given in equation (17) must be divided by . Since equation (20) belongs to the numerator of the expression, it is biased by a factor of , where c_{i} is obtained from equation (19). Thus:
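The need for a correction can be seen from a short derivation. It is only a sketch under two assumptions that are labeled here rather than taken from the equations of the paper: the normalized eigenvector $\mathbf{a}_i^{*}$ has unit length ($\mathbf{a}_i^{*\prime}\mathbf{a}_i^{*} = 1$), and the scale factor is defined as $c_i = \mathbf{a}_i^{*\prime}\mathbf{E}\mathbf{a}_i^{*}$. With $Y_i^{*} = \mathbf{a}_i^{*\prime}\mathbf{X}$, and using $\mathbf{T}\mathbf{a}_i^{*} = \lambda_i\mathbf{E}\mathbf{a}_i^{*}$ from equation (6):

$$
V_t(Y_i^{*}) = \mathbf{a}_i^{*\prime}\mathbf{T}\mathbf{a}_i^{*} = \lambda_i\,\mathbf{a}_i^{*\prime}\mathbf{E}\mathbf{a}_i^{*} = \lambda_i c_i
$$

$$
r_t(X_k, Y_i^{*}) = \frac{\lambda_i \sum_{j=1}^{p} s_{e(k,j)}\, a_{ij}^{*}}{\sqrt{s_{t(k,k)}\,\lambda_i c_i}} = \frac{\sqrt{\lambda_i}\,\sum_{j=1}^{p} s_{e(k,j)}\, a_{ij}^{*}}{\sqrt{s_{t(k,k)}\, c_i}}
$$

Under these assumptions, the expression that is valid when the residual variance of the canonical variate equals one has to be divided by $\sqrt{c_i}$ when unit-length eigenvectors are used, which leaves the correlation unaffected by the scale chosen for the eigenvector.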

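To determine the correlation coefficient due to residual variation between the **i**th canonical variate and the **k**th trait (r_{e(k,i)}), the following result is needed:

$$
\mathrm{Cov}_e(X_k, Y_i) = \sum_{j=1}^{p} a_{ij}\, s_{e(k,j)} \tag{22}
$$

The variances due to X_{k} and Y_{i} residual variation are:

$$
V_e(X_k) = s_{e(k,k)} \tag{23}
$$

$$
V_e(Y_i) = \mathbf{a}_i'\mathbf{E}\mathbf{a}_i = 1 \tag{24}
$$

Then, the exact residual correlation coefficient is:

$$
r_{e(k,i)} = \frac{\sum_{j=1}^{p} s_{e(k,j)}\, a_{ij}}{\sqrt{s_{e(k,k)}}} \tag{25}
$$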

With the normalized solution (18), equation (25) results in:
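The among-varieties and residual correlations can be computed together in a way that does not depend on how the eigenvectors are scaled. The sketch below is an illustration, not the authors' program: the function name `trait_correlations` and the matrices `T` and `E` are hypothetical, and the scale factor is assumed to be c_i = a_i' E a_i. `np.linalg.eig` returns unit-length eigenvectors, which corresponds to the normalized solution.

```python
import numpy as np

def trait_correlations(T, E, lam, A):
    """Correlations between traits (rows) and canonical variates (columns).

    A holds one eigenvector per column, at any scale; dividing by
    sqrt(c_i), with c_i = a_i' E a_i, removes the effect of that scale.
    """
    EA = E @ A                                   # element (k, i): sum_j s_e(k,j) a_ij
    c = np.einsum("ji,ji->i", A, EA)             # c_i = a_i' E a_i
    among = np.sqrt(lam) * EA / np.sqrt(np.outer(np.diag(T), c))
    resid = EA / np.sqrt(np.outer(np.diag(E), c))
    return among, resid

# Hypothetical matrices, not the maize data of the example.
T = np.array([[10.0, 3.0], [3.0, 6.0]])
E = np.array([[4.0, 1.0], [1.0, 2.0]])

# Eigenvalues and unit-length eigenvectors of E^{-1} T.
lam, A = np.linalg.eig(np.linalg.solve(E, T))

among, resid = trait_correlations(T, E, lam, A)
# Rescaling each eigenvector arbitrarily must not change the correlations.
among_rescaled, _ = trait_correlations(T, E, lam, A * np.array([5.0, 0.2]))
print(among)
print(resid)
```

For each trait, the squared correlations with the p canonical variates sum to one, which gives a quick check on any implementation.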

**EXAMPLE**

Part of the data presented by Ferreira (1993) was used in this example. Only two traits were used to illustrate the estimation of the correlation coefficients. The traits were X_{1} (stalk diameter - SD) and X_{2} (number of leaves - NL) measured in 28 varieties of maize, evaluated in a trial with two replications. The estimated **T** and **E** matrices were:

The estimated S^{-1} matrix was:

And the estimated **L** matrix, obtained from equation (8), was:

The estimated eigenvalues and eigenvectors were:

λ_{1} = 2.3487 and λ_{2} = 0.9300;

The estimates of the approximate correlation coefficient due to variation among varieties, between the canonical variates (Y_{1} and Y_{2}) and the traits (X_{1} and X_{2}), were obtained from equation (11) and are presented in Table I.

**Table I** - Estimates of approximate correlation coefficients due to variation among varieties, between the canonical variates (Y_{1} and Y_{2}) and the traits (X_{1} and X_{2}), obtained from equation (11).

| Traits | Canonical variate Y_{1} | Canonical variate Y_{2} |
| --- | --- | --- |
| X_{1} (SD) | 0.9728 | 0.2315 |
| X_{2} (NL) | -0.5151 | 0.8571 |

SD, Stalk diameter; NL, number of leaves.

The eigenvalues are invariant under nonsingular transformation of variables, but the eigenvectors must be recovered through equation (10). The results are:

It can be verified that:

Using this result and equation (17), the exact correlation coefficients among varieties, between the traits and the canonical variates, were obtained and are presented in Table II.

**Table II** - Estimates of exact correlation coefficients among varieties, between the traits (X_{1} and X_{2}) and the canonical variates (Y_{1} and Y_{2}) obtained from equation (17).

| Traits | Canonical variate Y_{1} | Canonical variate Y_{2} |
| --- | --- | --- |
| X_{1} (SD) | 0.9728 | 0.2315 |
| X_{2} (NL) | -0.1957 | 0.9807 |

SD, Stalk diameter; NL, number of leaves.

The approximate and exact estimates of the correlation coefficients among varieties were the same for X_{1}, but differed for X_{2} (Tables I and II). The agreement for X_{1} is expected: because **S** is lower triangular, the transformation merely rescales the first trait. Canonical variate Y_{1} accounted for 71.64% of the total variation. Since X_{1} presented a high correlation (0.9728) with Y_{1}, it was considered the most important trait contributing to variability; therefore, it discriminates among most of the divergent varieties.
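The share of variation attributed to Y_{1} can be recovered from the eigenvalues reported above, assuming the usual convention that the proportion explained by a canonical variate is its eigenvalue divided by the sum of all eigenvalues:

$$
\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{2.3487}{2.3487 + 0.9300} = \frac{2.3487}{3.2787} \approx 0.7164
$$

Under the same convention, Y_{2} accounts for the remaining 28.36%.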

Estimates of residual correlation coefficients were obtained from equation (25) and are presented in Table III.

**Table III** - Residual correlation coefficients between the traits (X_{1} and X_{2}) and the canonical variates (Y_{1} and Y_{2}), obtained from equation (25).

| Traits | Canonical variate Y_{1} | Canonical variate Y_{2} |
| --- | --- | --- |
| X_{1} (SD) | 0.9354 | 0.3539 |
| X_{2} (NL) | -0.1243 | 0.9927 |

SD, Stalk diameter; NL, number of leaves.
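A simple consistency check can be run on the three tables. With two traits and two mutually uncorrelated canonical variates, the squared correlations of each trait with Y_{1} and Y_{2} should sum to approximately one, up to rounding of the published values. The short script below is an illustrative check, not part of the original analysis; the variable names are chosen here.

```python
# Correlations copied from Tables I, II and III: one row per trait (X1, X2),
# one column per canonical variate (Y1, Y2).
tables = {
    "Table I (approximate, among varieties)": [(0.9728, 0.2315), (-0.5151, 0.8571)],
    "Table II (exact, among varieties)": [(0.9728, 0.2315), (-0.1957, 0.9807)],
    "Table III (residual)": [(0.9354, 0.3539), (-0.1243, 0.9927)],
}

# Sum of squared correlations for each trait, per table.
row_sums = {
    name: [sum(r * r for r in row) for row in rows]
    for name, rows in tables.items()
}

for name, sums in row_sums.items():
    print(name, [round(s, 4) for s in sums])
```

All six sums fall within 0.002 of one, so the published coefficients are internally consistent even where the approximate and exact values for X_{2} disagree.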

X_{1} presented a high residual correlation coefficient with the main canonical variate (Y_{1}). On the other hand, X_{2} presented a high estimated residual correlation coefficient with the canonical variate Y_{2}, which was less important to genetic divergence among varieties (Table III).

The purpose of this example was to show the utility of the correlation coefficients presented in equations 17, 21, 25 and 26, derived in this paper. This method allows the plant breeder to determine which traits are the most important for genetic divergence among materials under study. This knowledge is useful in choosing parental lines during the first stages of breeding programs and also in the maintenance of germplasm banks.

**RESUMO**

O coeficiente de correlação exato entre uma variável canônica e a característica mensurada foi derivado com a finalidade de avaliar a divergência genética entre variedades. O presente método permite ao melhorista determinar qual característica tem contribuição significativa para a divergência genética e, também, identificar as mais importantes entre elas. Um exemplo real, relativo a um ensaio com 28 variedades de milho mensuradas para duas características, foi apresentado.

**REFERENCES**

**Bock, R.D.** (1975). *Multivariate Statistical Methods in Behavioral Research*. McGraw-Hill, New York, pp. 623.

**Cruz, C.D.** (1990). Aplicações de algumas técnicas multivariadas no melhoramento de plantas. Doctoral thesis, ESALQ, USP, Piracicaba, SP.

**Falconer, D.S.** (1981). *Introduction to Quantitative Genetics*. 2nd edn. Longman, London, pp. 340.

**Ferreira, D.F.** (1993). Métodos de avaliação da divergência genética em milho e suas relações com os cruzamentos dialélicos. Master’s thesis, UFLA, Lavras, MG.

**Hallauer, A.R.** and **Miranda Filho, J.B.** (1981). *Quantitative Genetics in Maize Breeding*. Iowa State University Press, Ames, pp. 468.

**Johnson, R.A.** and **Wichern, D.W.** (1988). *Applied Multivariate Statistical Analysis*. 2nd edn. Prentice Hall, New York, pp. 607.

**(Received October 2, 1996)**