Print version ISSN 0100-8455
Braz. J. Genet. v.20 n.3 Ribeirão Preto Sept. 1997
Use of canonical variates in genetic divergence studies
Daniel Furtado Ferreira¹ and Gabriel Dehon Sampaio Peçanha Rezende²
¹Departamento de Ciências Exatas, Universidade Federal de Lavras, UFLA, Caixa Postal 37,
37200-000 Lavras, MG, Brasil. Send correspondence to D.F.F.
²Aracruz Florestal, Aracruz, Espírito Santo, ES, Brasil.
Exact correlation coefficients between a canonical variate and measured traits were derived to evaluate genetic divergence among varieties. This method allows the plant breeder to determine which traits contribute significantly to genetic divergence and, also, to identify the most important among them. An example is presented, related to a trial where 28 varieties of maize were evaluated for two traits.
Genetic divergence is one of the most important parameters evaluated by plant breeders when starting a breeding program. It is a necessary, but not sufficient, condition for the occurrence of heterosis and for the generation of a population with broad genetic variability. Moreover, heterosis is directly proportional to genetic divergence and to dominance squared (Falconer, 1981; Cruz, 1990; Ferreira, 1993) and is also associated with adaptation.
The usual approach to make inferences about genetic divergence is to adopt predictive methodologies. Among them, diallel crosses are the most important. In this case, crosses are made among materials and a great number of hybrids is obtained. These must be evaluated over several years and environmental conditions, which increases the initial cost of the breeding program.
A second approach is to use multivariate methods to estimate genetic divergence and then predict hybrid performance. In this case, it is not necessary to make crosses. Furthermore, a large number of materials may be successfully evaluated (Hallauer and Miranda Filho, 1981).
In the latter approach, a large number of traits must be measured. A canonical variate technique is often used to reduce the number of these traits, through a linear combination of them, without a significant loss of the total variation. Additionally, this technique takes into account the structure of residual covariances. Thus, it allows plant breeders to obtain information about traits that are important for genetic divergence among varieties. This information can be obtained from the correlation between canonical variates and traits. This correlation coefficient may be subdivided into two parts: the first is due to variation among varieties (phenotypic) and the second is due to residual variation (within-group).
Ferreira (1993) presented an approach to estimate the correlation coefficient due to variation among varieties when the residual covariance matrix does not differ from the identity matrix. However, when this assumption does not hold, the approach can lead breeders to discard important traits. This study was undertaken to derive the exact correlation coefficients between a canonical variate and the traits used in genetic divergence studies.
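Let $X_1, X_2, \ldots, X_p$ be the $p$ traits measured in a replicated variety trial to evaluate genetic divergence, and let $\mathbf{X} = [X_1, X_2, \ldots, X_p]'$. Let $Y_1, Y_2, \ldots, Y_p$ be the canonical variates (Fisher's discriminant functions), which are linear combinations of the traits with coefficient vectors $\mathbf{a}_i = [a_{i1}, a_{i2}, \ldots, a_{ip}]'$, given by:
$$Y_i = \mathbf{a}_i'\mathbf{X} = a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{ip}X_p, \qquad i = 1, 2, \ldots, p \tag{1}$$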
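Let $T$ and $E$ be the sum of squares and products matrices ($p \times p$) due to varieties and residual, respectively, and let $\lambda_i$ and $\mathbf{a}_i$ be the eigenvalue and the eigenvector ($p \times 1$) related to the ith canonical variate. The residual covariance matrix is considered to be the same for every variety, so that a pooled matrix is used. Then, the variance among varieties and the residual variance of the ith canonical variate are:
$$\mathrm{Var}_T(Y_i) = \mathbf{a}_i' T \mathbf{a}_i = \lambda_i \tag{2}$$
$$\mathrm{Var}_E(Y_i) = \mathbf{a}_i' E \mathbf{a}_i = 1 \tag{3}$$
The covariance among varieties and the residual covariance between two different canonical variates ($i \neq j$) are:
$$\mathrm{Cov}_T(Y_i, Y_j) = \mathbf{a}_i' T \mathbf{a}_j = 0 \tag{4}$$
$$\mathrm{Cov}_E(Y_i, Y_j) = \mathbf{a}_i' E \mathbf{a}_j = 0 \tag{5}$$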
The eigenvalues ($\lambda_i$) and eigenvectors ($\mathbf{a}_i$) related to the ith canonical variate are obtained from the solution of the homogeneous indeterminate system:
$$(T - \lambda_i E)\,\mathbf{a}_i = \mathbf{0} \tag{6}$$
A convenient way to solve equation (6) is by transforming it into a problem of determination of principal components (Johnson and Wichern, 1988).
First, the transformation matrix $S^{-1}$ (Bock, 1975) is obtained by carrying out a Cholesky decomposition of $E$, such that $E = SS'$, where $S$ is a lower triangular matrix. Then:
$$S^{-1} E\,(S^{-1})' = I \tag{7}$$
The same transformation may be applied to $T$:
$$L = S^{-1} T\,(S^{-1})' \tag{8}$$
The new system to be solved is given as follows:
$$(L - \lambda_i I)\,\mathbf{b}_i = \mathbf{0} \tag{9}$$
where $\mathbf{b}_i$ is the ith eigenvector of the transformed system (9).
The solution to equation (9) is obtained with the extraction of eigenvalues and eigenvectors from matrix $L$. The eigenvalues are invariant under nonsingular transformation (Bock, 1975), but the eigenvectors ($\mathbf{a}_i$) are modified and must be recovered by:
$$\mathbf{a}_i = (S^{-1})'\,\mathbf{b}_i \tag{10}$$
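As a minimal sketch of the procedure in equations (6) to (10), assuming NumPy is available, the system can be solved through the Cholesky factor of E. The function name `canonical_variates` and the matrices `T` and `E` below are hypothetical and are not the maize data of the example.

```python
import numpy as np

def canonical_variates(T, E):
    """Solve (T - lambda * E) a = 0 through a Cholesky transformation of E."""
    S = np.linalg.cholesky(E)        # lower triangular factor, E = S S'
    S_inv = np.linalg.inv(S)         # transformation matrix S^-1
    L = S_inv @ T @ S_inv.T          # transformed among-varieties matrix
    lam, B = np.linalg.eigh(L)       # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]    # put the largest eigenvalue first
    lam, B = lam[order], B[:, order]
    A = S_inv.T @ B                  # column i is the recovered eigenvector a_i
    return lam, A, B, L

# Hypothetical sum of squares and products matrices (illustration only).
T = np.array([[10.0, 4.0], [4.0, 6.0]])
E = np.array([[4.0, 1.0], [1.0, 3.0]])
lam, A, B, L = canonical_variates(T, E)
```

Because the eigenvectors of L returned by `eigh` are orthonormal, the recovered vectors come out scaled so that each a_i' E a_i equals 1, which is the scaling the exact correlation formulas rely on.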
The approximate correlation coefficient due to varieties between the traits and the canonical variates is given by the correlation coefficient ($\tilde{r}_{X_k Y_i}$) between the ith principal component of equation (9) and the kth trait (Johnson and Wichern, 1988):
$$\tilde{r}_{X_k Y_i} = \frac{b_{ik}\sqrt{\lambda_i}}{\sqrt{S_{k,k}}} \tag{11}$$
where $b_{ik}$ is the kth element of the ith eigenvector ($\mathbf{b}_i$) of the transformed system, and $S_{k,k}$ is the sum of squares among varieties of the kth trait under the nonsingular transformation, obtained in the diagonal of matrix $L$, given in equation (8).
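The approximate coefficient of equation (11) can be sketched as follows, assuming NumPy. The function name `approximate_correlations` and the input matrices are hypothetical, chosen only for illustration.

```python
import numpy as np

def approximate_correlations(T, E):
    """Correlations between principal components of L and the transformed traits."""
    S_inv = np.linalg.inv(np.linalg.cholesky(E))
    L = S_inv @ T @ S_inv.T
    lam, B = np.linalg.eigh(L)       # column i of B is the ith eigenvector of L
    # entry [k, i] = b_ik * sqrt(lambda_i) / sqrt(L_kk)
    return B * np.sqrt(lam) / np.sqrt(np.diag(L))[:, None]

# Hypothetical matrices (not the maize data of the example).
T = np.array([[10.0, 4.0], [4.0, 6.0]])
E = np.array([[4.0, 1.0], [1.0, 3.0]])
R_approx = approximate_correlations(T, E)
```

Rows index traits and columns index canonical variates, in the ascending eigenvalue order returned by `eigh`. The squared coefficients in each row add up to one, since the principal components jointly account for all the variation of each transformed trait.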
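To determine the exact correlation due to variation among varieties ($r_T(X_k, Y_i)$) between the ith canonical variate ($Y_i$) and the kth trait ($X_k$), let $\mathbf{X} = [X_1, X_2, \ldots, X_p]'$ and $\mathbf{a}_i = [a_{i1}, a_{i2}, \ldots, a_{ip}]'$, as presented in equation (1). Thus:
$$\mathrm{Cov}_T(X_k, Y_i) = \sum_{j=1}^{p} s_{t(k,j)}\,a_{ij} \tag{12}$$
From equation (6), it is clear that:
$$\sum_{j=1}^{p} s_{t(k,j)}\,a_{ij} = \lambda_i \sum_{j=1}^{p} s_{e(k,j)}\,a_{ij} \tag{13}$$
And using (13) in (12):
$$\mathrm{Cov}_T(X_k, Y_i) = \lambda_i \sum_{j=1}^{p} s_{e(k,j)}\,a_{ij} \tag{14}$$
where $s_{t(k,j)}$ and $s_{e(k,j)}$ are the kth row and jth column elements of $T$ and $E$ (both symmetric), respectively. The variances of $X_k$ and $Y_i$ among varieties are:
$$\mathrm{Var}_T(X_k) = s_{t(k,k)} \tag{15}$$
$$\mathrm{Var}_T(Y_i) = \mathbf{a}_i' T \mathbf{a}_i = \lambda_i\,\mathbf{a}_i' E \mathbf{a}_i = \lambda_i \tag{16}$$
where $s_{t(k,k)}$ is the kth row and column element of $T$ and the eigenvector is scaled so that $\mathbf{a}_i' E \mathbf{a}_i = 1$. The exact correlation coefficient is then:
$$r_T(X_k, Y_i) = \frac{\lambda_i \sum_{j=1}^{p} s_{e(k,j)}\,a_{ij}}{\sqrt{s_{t(k,k)}\,\lambda_i}} = \frac{\sqrt{\lambda_i}\,\sum_{j=1}^{p} s_{e(k,j)}\,a_{ij}}{\sqrt{s_{t(k,k)}}} \tag{17}$$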
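A short numerical check of the exact coefficient among varieties may help here. The sketch below assumes NumPy, uses hypothetical matrices `T` and `E`, and compares the closed form (square root of the eigenvalue times the kth element of E a_i, divided by the square root of the kth diagonal element of T) with the correlation computed directly from covariances and variances among varieties.

```python
import numpy as np

def exact_among_correlations(T, E):
    """Exact correlations among varieties between traits (rows) and variates (columns)."""
    S_inv = np.linalg.inv(np.linalg.cholesky(E))
    lam, B = np.linalg.eigh(S_inv @ T @ S_inv.T)
    A = S_inv.T @ B                  # eigenvectors scaled so that a_i' E a_i = 1
    # entry [k, i] = sqrt(lambda_i) * sum_j se(k,j) * a_ij / sqrt(st(k,k))
    R = (E @ A) * np.sqrt(lam) / np.sqrt(np.diag(T))[:, None]
    return R, lam, A

# Hypothetical matrices (not the maize data of the example).
T = np.array([[10.0, 4.0], [4.0, 6.0]])
E = np.array([[4.0, 1.0], [1.0, 3.0]])
R_exact, lam, A = exact_among_correlations(T, E)

# Direct definition: Cov_T(X_k, Y_i) / sqrt(Var_T(X_k) * Var_T(Y_i)).
var_y = np.diag(A.T @ T @ A)
R_direct = (T @ A) / np.sqrt(np.outer(np.diag(T), var_y))
```

Under these assumptions the two matrices agree, which is the substitution of the eigen-equation into the covariance carried out in the text.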
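Generally, a normalized solution ($\mathbf{a}_i^{*}$) is taken for the eigenvector $\mathbf{a}_i$:
$$\mathbf{a}_i^{*\prime}\,\mathbf{a}_i^{*} = 1 \tag{18}$$
In this case:
$$\mathbf{a}_i^{*\prime} E\,\mathbf{a}_i^{*} = c_i \tag{19}$$
where $c_i$ is a scale factor. Then:
$$\mathbf{a}_i^{*} = \sqrt{c_i}\,\mathbf{a}_i \tag{20}$$
where $\mathbf{a}_i$ is the eigenvector scaled so that $\mathbf{a}_i' E \mathbf{a}_i = 1$. In this specific case, the correlation coefficient given in equation (17), when computed with $\mathbf{a}_i^{*}$, must be divided by $\sqrt{c_i}$. Since equation (20) belongs to the numerator of the expression, it is biased by a factor of $\sqrt{c_i}$, where $c_i$ is obtained from equation (19). Thus:
$$r_T(X_k, Y_i) = \frac{\sqrt{\lambda_i}\,\sum_{j=1}^{p} s_{e(k,j)}\,a_{ij}^{*}}{\sqrt{c_i\,s_{t(k,k)}}} \tag{21}$$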
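The effect of the scale factor can be illustrated numerically. This sketch assumes NumPy, hypothetical matrices `T` and `E`, and that "normalized" means unit Euclidean length, which is an assumption about the convention rather than something the text states.

```python
import numpy as np

# Hypothetical matrices (not the maize data of the example).
T = np.array([[10.0, 4.0], [4.0, 6.0]])
E = np.array([[4.0, 1.0], [1.0, 3.0]])

S_inv = np.linalg.inv(np.linalg.cholesky(E))
lam, B = np.linalg.eigh(S_inv @ T @ S_inv.T)
A = S_inv.T @ B                             # scaled so that a_i' E a_i = 1
A_star = A / np.linalg.norm(A, axis=0)      # assumed normalization: unit length
c = np.diag(A_star.T @ E @ A_star)          # scale factors c_i

scale = np.sqrt(lam) / np.sqrt(np.diag(T))[:, None]
reference = (E @ A) * scale                 # coefficient with a_i' E a_i = 1
uncorrected = (E @ A_star) * scale          # same formula applied to a_i*
corrected = uncorrected / np.sqrt(c)        # divided by sqrt(c_i)
```

With these inputs `uncorrected` differs from `reference`, while `corrected` reproduces it, so the division by the square root of c_i removes the dependence on how the eigenvector was scaled.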
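To determine the correlation coefficient due to residual variation between the ith canonical variate and the kth trait ($r_E(X_k, Y_i)$) the following results are needed:
$$\mathrm{Cov}_E(X_k, Y_i) = \sum_{j=1}^{p} s_{e(k,j)}\,a_{ij} \tag{22}$$
The variances due to $X_k$ and $Y_i$ residual variation are:
$$\mathrm{Var}_E(X_k) = s_{e(k,k)} \tag{23}$$
$$\mathrm{Var}_E(Y_i) = \mathbf{a}_i' E \mathbf{a}_i = 1 \tag{24}$$
Then, the exact residual correlation coefficient is:
$$r_E(X_k, Y_i) = \frac{\sum_{j=1}^{p} s_{e(k,j)}\,a_{ij}}{\sqrt{s_{e(k,k)}}} \tag{25}$$
With the normalized solution (18), for which $\mathbf{a}_i^{*\prime} E\,\mathbf{a}_i^{*} = c_i$, equation (25) results in:
$$r_E(X_k, Y_i) = \frac{\sum_{j=1}^{p} s_{e(k,j)}\,a_{ij}^{*}}{\sqrt{c_i\,s_{e(k,k)}}} \tag{26}$$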
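The residual coefficient can be checked in the same way. The sketch assumes NumPy, hypothetical matrices `T` and `E`, and a hypothetical function name `residual_correlations`. It compares the closed form with the correlation computed directly from residual covariances and variances.

```python
import numpy as np

def residual_correlations(T, E):
    """Exact residual correlations between traits (rows) and variates (columns)."""
    S_inv = np.linalg.inv(np.linalg.cholesky(E))
    lam, B = np.linalg.eigh(S_inv @ T @ S_inv.T)
    A = S_inv.T @ B                  # eigenvectors scaled so that a_i' E a_i = 1
    # entry [k, i] = sum_j se(k,j) * a_ij / sqrt(se(k,k))
    return (E @ A) / np.sqrt(np.diag(E))[:, None], A

# Hypothetical matrices (not the maize data of the example).
T = np.array([[10.0, 4.0], [4.0, 6.0]])
E = np.array([[4.0, 1.0], [1.0, 3.0]])
R_resid, A = residual_correlations(T, E)

# Direct definition: Cov_E(X_k, Y_i) / sqrt(Var_E(X_k) * Var_E(Y_i)).
R_direct = (E @ A) / np.sqrt(np.outer(np.diag(E), np.diag(A.T @ E @ A)))
```

Only E enters the numerator and denominator here. T matters only through the eigenvectors it determines.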
Part of the data presented by Ferreira (1993) was used in this example. Only two traits were used to illustrate the estimation of the correlation coefficients. The traits were X1 (stalk diameter - SD) and X2 (number of leaves - NL) measured in 28 varieties of maize, evaluated in a trial with two replications. The estimated T and E matrices were:
The estimated $S^{-1}$ matrix was:
And the estimated L matrix, obtained from equation (8), was:
The estimated eigenvalues and eigenvectors were:
$\lambda_1 = 2.3487$ and $\lambda_2 = 0.9300$;
The estimates of the approximate correlation coefficient due to variation among varieties, between the canonical variates (Y1 and Y2) and the traits (X1 and X2), were obtained from equation (11) and are presented in Table I.
SD, Stalk diameter; NL, number of leaves.
The eigenvalues are invariant under nonsingular transformation of variables, but the eigenvectors must be recovered through equation (10). The results are:
It can be verified that:
Using this result and equation (17), the exact correlation coefficients among varieties, between the traits and the canonical variates, were obtained and are presented in Table II.
SD, Stalk diameter; NL, number of leaves.
The estimates of approximate and exact correlation coefficients among varieties were the same for X1 and the respective canonical variates, but were different for X2 and the canonical variate (Tables I and II). Canonical variate Y1 accounted for 71.64% of total variation. Since X1 presented a high correlation (0.9728) with it, X1 was considered the most important trait contributing to variability; therefore, it discriminates among most of the divergent varieties.
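The share of total variation attributed to each canonical variate follows from the reported eigenvalues as each eigenvalue divided by their sum. A small sketch, using only the two eigenvalues given in the example (the variable names are illustrative):

```python
# Eigenvalues reported in the example.
eigenvalues = [2.3487, 0.9300]

total = sum(eigenvalues)
shares = [100 * value / total for value in eigenvalues]   # percent of total variation

for index, share in enumerate(shares, start=1):
    print(f"Y{index}: {share:.2f}% of total variation")
```

The first share rounds to the 71.64% quoted for Y1.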
Estimates of residual correlation coefficients were obtained from equation (25) and are presented in Table III.
SD, Stalk diameter; NL, number of leaves.
X1 presented a high residual correlation coefficient with the main canonical variate (Y1). On the other hand, X2 presented a high estimated residual correlation coefficient with the canonical variate Y2, which was less important to genetic divergence among varieties (Table III).
The purpose of this example was to show the utility of the correlation coefficients presented in equations 17, 21, 25 and 26, derived in this paper. This method allows the plant breeder to determine which traits are the most important for genetic divergence among materials under study. This knowledge is useful in choosing parental lines during the first stages of breeding programs and also in the maintenance of germplasm banks.
The exact correlation coefficient between a canonical variate and a measured trait was derived in order to evaluate genetic divergence among varieties. The method allows the breeder to determine which traits contribute significantly to genetic divergence and also to identify the most important among them. A real example, from a trial in which 28 maize varieties were measured for two traits, was presented.
Bock, R.D. (1975). Multivariate Statistical Methods in Behavioral Research. McGraw-Hill, New York, pp. 623.
Cruz, C.D. (1990). Aplicações de algumas técnicas multivariadas no melhoramento de plantas. Doctoral thesis, ESALQ, USP, Piracicaba, SP.
Falconer, D.S. (1981). Introduction to Quantitative Genetics. 2nd edn. Longman, London, pp. 340.
Ferreira, D.F. (1993). Métodos de avaliação da divergência genética em milho e suas relações com os cruzamentos dialélicos. Master's thesis, UFLA, Lavras, MG.
Hallauer, A.R. and Miranda Filho, J.B. (1981). Quantitative Genetics in Maize Breeding. Iowa State University Press, Ames, pp. 468.
Johnson, R.A. and Wichern, D.W. (1988). Applied Multivariate Statistical Analysis. 2nd edn. Prentice Hall, New York, pp. 607.
(Received October 2, 1996)