COMPARISON OF SIMILARITY COEFFICIENTS BASED ON RAPD MARKERS IN THE COMMON BEAN *

The alterations caused by eight different similarity coefficients were evaluated in the clustering and ordination of 27 common bean (Phaseolus vulgaris L.) cultivars analyzed by RAPD markers. The Anderberg, simple matching, Rogers and Tanimoto, Russel and Rao, Ochiai, Jaccard, Sorensen-Dice, and Ochiai II’s coefficients were tested. Comparisons among the coefficients were made through correlation analysis of genetic distances obtained by the complement of these coefficients, dendrogram evaluation (visual inspection and consensus fork index, CI_C), projection efficiency in a two-dimensional space, and groups formed by Tocher’s optimization procedure. The use of different similarity coefficients caused few alterations in cultivar classification, since correlations among genetic distances were larger than 0.86. Nevertheless, the different similarity coefficients altered the projection efficiency in a two-dimensional space and formed different numbers of groups by Tocher’s optimization procedure. Among these coefficients, Russel and Rao’s was the most discordant and the Sorensen-Dice was considered the most adequate due to a higher projection efficiency in a two-dimensional space. Even though few structural changes were suggested in the most different groups, these coefficients altered some relationships between cultivars with high genetic similarity.

*Part of a thesis presented by J.M.D. to the Universidade Federal de Lavras, Lavras, MG, in partial fulfillment of the requirements for the Master’s degree.


INTRODUCTION
Studies of divergence and phylogenetic relationships between and within plant species of agricultural interest have been one of the most concrete contributions of molecular markers to germplasm organization, plant genetics and breeding. Multivariate techniques such as clustering and ordination analyses are frequently employed in these studies to give a simplified representation of the results. The first step in these analyses is the construction of a similarity (or distance) matrix between the cultivars being evaluated. Jackson et al. (1989) commented that the use of these techniques has revealed some problems: the objective nature of the analyses is compromised by the subjective choice of the clustering method and/or the similarity-dissimilarity coefficient.
Several coefficients have been proposed (Sokal and Sneath, 1963; Sneath and Sokal, 1973; Johnson and Wichern, 1988). Similarity coefficients specific for dichotomous variables, especially co-occurrence measures, are suggested for use with RAPD type molecular markers. These coefficients are ratios of similarities or differences to the total number of comparisons, and their values vary from 0 to 1 (Skroch et al., 1992). Though many coefficients are available, published studies usually do not justify their preference for any one in particular. Considering that clustering and ordination results can be influenced by this choice (Gower and Legendre, 1986; Jackson et al., 1989), these coefficients need to be better understood, so that the most efficient ones can be employed.
In this study, the alterations caused by eight different similarity coefficients on the subsequent clustering and ordination analyses of 27 common bean (Phaseolus vulgaris L.) cultivars analyzed by RAPD markers were evaluated. The most adequate coefficient was identified for the study of genetic divergence in these cultivars.

MATERIAL AND METHODS
Similarity coefficients were compared among 27 common bean cultivars (Table I) analyzed by RAPD markers. Procedures for DNA extraction, RAPD reaction and electrophoresis were essentially as described by Nienhuis et al. (1995).
From a zero and one matrix constructed from 137 RAPD bands of medium or strong intensity, where zero represented the absence of a band and one its presence, genetic similarity estimates (sg_ij) between each pair of cultivars i and j were computed for eight similarity coefficients (Table II). Similarities derived from these coefficients were transformed into genetic distance measures by the following equation: dg_ij = 1 - sg_ij. All the genetic similarity matrices met the assumptions for transformation into genetic distances described by Johnson and Wichern (1988), that is, all of them were non-negative definite. Similarity analyses were done with the NTSYS-PC program (Rohlf, 1992).
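As an illustration of the step above, the following minimal sketch computes the four co-occurrence counts for two band profiles and derives the eight similarities and their complements. The formulas are the usual textbook forms of these coefficients and are an assumption here, since the body of Table II is not reproduced in this text; the function names and the example profiles are hypothetical.

```python
from math import sqrt

def counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 band profiles."""
    pairs = list(zip(x, y))
    a = sum(1 for p, q in pairs if p == 1 and q == 1)  # band present in both
    b = sum(1 for p, q in pairs if p == 1 and q == 0)  # present only in x
    c = sum(1 for p, q in pairs if p == 0 and q == 1)  # present only in y
    d = sum(1 for p, q in pairs if p == 0 and q == 0)  # absent in both
    return a, b, c, d

# Textbook forms (assumed, not copied from Table II).
# Degenerate profiles (e.g. no shared bands at all) can give a zero
# denominator; this sketch does not guard against that.
COEFFICIENTS = {
    "simple_matching": lambda a, b, c, d: (a + d) / (a + b + c + d),
    "rogers_tanimoto": lambda a, b, c, d: (a + d) / (a + 2 * (b + c) + d),
    "russel_rao": lambda a, b, c, d: a / (a + b + c + d),
    "jaccard": lambda a, b, c, d: a / (a + b + c),
    "sorensen_dice": lambda a, b, c, d: 2 * a / (2 * a + b + c),
    "anderberg": lambda a, b, c, d: a / (a + 2 * (b + c)),
    "ochiai": lambda a, b, c, d: a / sqrt((a + b) * (a + c)),
    "ochiai_ii": lambda a, b, c, d: (a * d)
    / sqrt((a + b) * (a + c) * (d + b) * (d + c)),
}

def similarity(x, y, name):
    """Similarity sg_ij between two profiles for the named coefficient."""
    return COEFFICIENTS[name](*counts(x, y))

def distance(x, y, name):
    """Genetic distance as the complement: dg_ij = 1 - sg_ij."""
    return 1.0 - similarity(x, y, name)

if __name__ == "__main__":
    cultivar_i = [1, 1, 1, 0, 0, 1, 0, 0]  # hypothetical band profiles
    cultivar_j = [1, 1, 0, 1, 0, 0, 1, 0]
    for name in COEFFICIENTS:
        print(name, round(distance(cultivar_i, cultivar_j, name), 3))
```

Applying `distance` to every pair of rows of the 27 x 137 matrix would give one distance matrix per coefficient.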
Coefficients were compared by Spearman's correlation between the genetic distances generated by the complement of these coefficients, and also by the evaluation of alterations caused by these different coefficients in the subsequent clustering analyses (construction of dendrograms and groups formed by Tocher's optimization procedure, cited by Rao, 1952) and ordination analyses (two-dimensional projection (Cruz and Viana, 1994)).
The unweighted pair-group method with arithmetic mean (UPGMA) was employed to construct the dendrograms. Each cultivar was denominated an operational taxonomic unit (OTU). The different dendrograms were subjectively compared using visual inspection, and then contrasted with consensus trees using the CI_C index or consensus fork index, obtained from comparisons of all pairs of dendrograms (Rohlf, 1982).
The CI_C index gives a relative estimate of dendrogram similarity. It is obtained by dividing the number of common ramifications between the dendrograms by the maximum possible number of ramifications, which is n - 2 for fully resolved dendrograms (n corresponds to the number of OTUs) (Rohlf, 1982). Dendrograms were obtained from the 'SAHN-Clustering' option and the CI_C index by the 'CONSENSUS-Consensus tree' option, both in the NTSYS-PC program (Rohlf, 1992).
The methodology of Cruz and Viana (1994) was employed, from the GENES program (Cruz, 1997), for the projection of distances in a two-dimensional space. Similarity coefficients were compared by the efficiency of the projection considering: a) correlation between the original distances and the distances obtained by the graphic representation of two-dimensional dispersion; b) distortion degree (1 - α), where α is computed from d_gij and d_oij, the graph distances (two-dimensional space) and original distances (n-dimensional space), respectively, of every pair of i and j cultivars (Cruz and Viana, 1994); c) stress (s) value.
This statistical representation of stress (standardized residual sum of squares) was proposed by Kruskal (1964). It is a parameter that determines the goodness-of-fit of the graphic projection. Stress was classified according to the suggestions of Kruskal (1964).
The establishment of groups by Tocher's optimization procedure was obtained using the GENES program (Cruz, 1997). The largest value of the set of smallest distances involving each cultivar studied was considered as the inter-group distance limit.
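The grouping rule described above can be sketched as follows. This is a simplified illustration, not the GENES implementation: it assumes that each group is seeded with the closest remaining pair and that a candidate joins a group while its mean distance to the current members does not exceed the limit. The function name `tocher` and the example distances are hypothetical.

```python
def tocher(D):
    """Simplified Tocher-style grouping on a symmetric distance matrix D.

    Returns (limit, groups), where limit is the largest of the
    nearest-neighbour distances, as described in the text.
    """
    n = len(D)
    limit = max(min(D[i][j] for j in range(n) if j != i) for i in range(n))

    def mean_to_group(k, group):
        return sum(D[k][g] for g in group) / len(group)

    remaining = set(range(n))
    groups = []
    while remaining:
        if len(remaining) == 1:
            groups.append([remaining.pop()])
            break
        rem = sorted(remaining)
        # Seed a new group with the closest pair still unassigned.
        i, j = min(
            ((p, q) for p in rem for q in rem if p < q),
            key=lambda pq: D[pq[0]][pq[1]],
        )
        if D[i][j] > limit:
            # No remaining pair is close enough: each forms its own group.
            groups.extend([k] for k in rem)
            break
        group = [i, j]
        remaining -= {i, j}
        while remaining:
            # Candidate with the smallest mean distance to the group.
            k = min(sorted(remaining), key=lambda r: mean_to_group(r, group))
            if mean_to_group(k, group) > limit:
                break
            group.append(k)
            remaining.remove(k)
        groups.append(group)
    return limit, groups

if __name__ == "__main__":
    positions = [0, 1, 2, 10, 11, 17]  # hypothetical one-dimensional example
    D = [[abs(p - q) for q in positions] for p in positions]
    print(tocher(D))
```

Because the limit depends on the distance matrix, a coefficient that stretches or compresses the distances can change both the limit and the number of groups.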
Levels of statistical significance are not given because the analyses are derived from a single initial data matrix and therefore lack independence.
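Two of the projection-efficiency criteria used in this section, the correlation between original and graph distances and the stress value, can be sketched as below. The stress shown is a common form of Kruskal's standardized residual sum of squares, normalised here by the original distances; that normalisation, the function names and the example coordinates are assumptions, and the distortion degree is not included.

```python
from math import dist, sqrt  # math.dist requires Python 3.8+

def upper_triangle(D):
    """Flatten the i < j entries of a symmetric distance matrix."""
    n = len(D)
    return [D[i][j] for i in range(n) for j in range(i + 1, n)]

def projected_distances(coords):
    """Euclidean distances between 2-D points, in the same i < j order."""
    n = len(coords)
    return [dist(coords[i], coords[j]) for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)

def stress(original, projected):
    """Residual sum of squares standardized by the original distances (assumed form)."""
    num = sum((o - g) ** 2 for o, g in zip(original, projected))
    den = sum(o ** 2 for o in original)
    return sqrt(num / den)

if __name__ == "__main__":
    D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]   # hypothetical original distances
    coords = [(0, 0), (3, 0), (0, 3)]       # hypothetical 2-D projection
    d_o = upper_triangle(D)
    d_g = projected_distances(coords)
    print("correlation:", pearson(d_o, d_g))
    print("stress:", stress(d_o, d_g))
```

A perfect projection gives a correlation of 1 and a stress of 0; larger stress means a poorer fit of the two-dimensional plot to the original distances.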

RESULTS AND DISCUSSION
Correlations between the different genetic distances were all close to 1 (Table III), making it evident that they are highly related. Even though all these correlations were high, those involving Russel and Rao's coefficient were slightly lower than those among the other coefficients. Such high distance correlations seem to be usual among coefficients applied to dichotomous variables. Johns et al. (1997), in a study with RAPD markers in the common bean, found correlations on the order of 0.989, 0.972 and 0.979 between the genetic distances obtained by the complement of the simple matching coefficient, Jaccard and Nei-Li's coefficients and Rogers' modified distance, respectively.
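The comparison summarised in Table III can be illustrated with a small rank-correlation sketch. It assumes that the distances from two coefficients are supplied as flat lists over the same cultivar pairs and that ties receive average ranks; the function names and the example values are hypothetical, and this is not the software used in the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    # Hypothetical distances for the same four cultivar pairs
    # under two different coefficients.
    d_first = [0.20, 0.50, 0.80, 0.35]
    d_second = [0.11, 0.33, 0.67, 0.21]
    print(spearman(d_first, d_second))
```

Because only ranks enter the calculation, two coefficients that order the cultivar pairs identically give a correlation of 1 even when their numerical values differ.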
The dendrograms constructed from the coefficients studied all presented the same general structure (Figure 1), making it evident that the different coefficients caused few alterations. Considering that the 27 common bean cultivars belonged to two distinct domestication centers and different races, all the dendrograms were capable of dividing the cultivars into their respective domestication centers. However, some modifications in the clustering of races could be found. These results are in agreement with those obtained by Johns et al. (1997), who verified that different similarity coefficients basically did not influence the clustering of common bean landraces from Chile in groups corresponding to the Mesoamerican and Andean domestication centers.
Although all dendrograms were similar, small differences among them became evident when they were contrasted by the CI_C index (Table IV). By this index, which ranges from 0 to 1, two dendrograms are considered identical when the calculated value equals one. Therefore, the dendrogram in Figure 1 obtained by Jaccard's similarity coefficient is identical to that of Sorensen-Dice, as were Rogers and Tanimoto's and Ochiai II's. Comparing dendrograms by this index, one can also perceive their division into two groups, based on their similarity: the first corresponded to those constructed by simple matching, Rogers and Tanimoto, Ochiai and Ochiai II's coefficients. The other group involved Anderberg, Jaccard and Sorensen-Dice's coefficients. It was also observed that the dendrogram constructed by Russel and Rao's coefficient presented very low CI_C index values compared to the other coefficients, making it evident that this coefficient is the most discriminating, as a visual evaluation of this dendrogram (Figure 1) shows. These results are highly coherent with those presented by Jackson et al. (1989), who, studying relationships between different fish species based on different similarity coefficients, verified that cluster analysis shows a strong similarity between dendrograms obtained with Jaccard and Sorensen-Dice's coefficients, and between those obtained with simple matching and Rogers and Tanimoto's coefficients.
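The index can be illustrated with a minimal sketch. It assumes that each dendrogram is written as nested tuples of cultivar labels and that a "ramification" is a cluster of leaves below an internal node, excluding the root; the function names and the example trees are hypothetical, and this is not the NTSYS-PC routine.

```python
def clusters(tree):
    """Return (all leaves, set of non-root clusters) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])          # a single cultivar (leaf)
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)                     # cluster formed at this node
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)                 # the root is common to every tree
    return all_leaves, found

def consensus_fork_index(tree_a, tree_b):
    """Shared clusters divided by n - 2, the maximum for resolved trees."""
    leaves_a, clusters_a = clusters(tree_a)
    leaves_b, clusters_b = clusters(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both dendrograms must contain the same OTUs")
    n = len(leaves_a)
    return len(clusters_a & clusters_b) / (n - 2)

if __name__ == "__main__":
    # Two hypothetical dendrograms over five cultivars.
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(consensus_fork_index(t1, t1))
    print(consensus_fork_index(t1, t2))
```

In this example the two trees share the clusters {A, B, C} and {D, E} but disagree on which pair joins first, so two of the three possible ramifications are common.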
The similar appearance of some dendrograms is not surprising, since generalizations about the properties of several coefficients are possible. They are differentiated by the manner in which the matrix of original data (1 = presence of the RAPD marker and 0 = absence) is employed in the similarity estimate. When two genotypes are compared, the following situations occur: a = (1,1); b = (1,0); c = (0,1); d = (0,0). Thus, Jaccard and Sorensen-Dice's coefficients are equivalent, except that double weight is given to positive co-occurrences (a) in the Sorensen-Dice's coefficient. Simple matching and Rogers and Tanimoto's coefficients include negative co-occurrences (d), but differ by the double weight given to the disagreements (that is, b and c) in the latter coefficient. As shown by the results presented, different weights of values of a, b, c and d seem to have limited impact on the subsequent analyses.
The different similarity coefficients altered the efficiency of distance projection in a two-dimensional space (Table V). Considering the three evaluation parameters of efficiency separately (distortion, correlation between original and estimated distances and stress), one can perceive the same general tendency of coefficient classification. The distortion values are coherent with the correlation values, and both are coherent with the level of stress. Stress values are the most widely used parameter to evaluate projection efficiency. Ochiai's coefficient showed the smallest stress value and Russel and Rao's the largest. According to the classification of Kruskal (1964), simple matching, Sorensen-Dice and Ochiai's coefficients had good levels of stress; Rogers and Tanimoto, Anderberg, Jaccard and Ochiai II's coefficients had regular levels; and only Russel and Rao's coefficient had a stress level considered unsatisfactory.
One cultivar clustering method that has also been employed with RAPD data is Tocher's optimization procedure, cited by Rao (1952). In this method, individuals (cultivars) are partitioned into non-empty and mutually exclusive sub-groups by means of maximization or minimization of a pre-established measurement (Cruz and Regazzi, 1994), requiring a similarity or distance matrix, which can be obtained by several coefficients. Different coefficients altered the number of groups formed, which varied from six to 10 (Table VI). They also altered the classification of some cultivars in these groups. These results (Table VI) followed the same tendency as the previous ones: Russel and Rao's similarity coefficient was once again the most discriminatory. Sokal and Sneath (1963) reported that this coefficient is, in essence, a 'hybrid' coefficient, excluding negative co-occurrences (d) from the numerator, but not from the denominator. This seems to be of questionable usefulness.
*Enumeration of the cultivars is according to Table I. Abbreviations defined in Table II.
All results obtained illustrate the redundancy of the different coefficients. Anderberg, Jaccard and Sorensen-Dice's coefficients had approximately identical results, as did the simple matching and Rogers and Tanimoto's coefficients. Nevertheless, similarity coefficient choice should be based on some criteria, because even a few structural changes of more differentiated groups can alter the relationship between cultivars with high genetic similarity.
In relation to these criteria, an important aspect to be considered is the inclusion or exclusion of negative co-occurrences in the coefficient. This inclusion is highly related to the type of trait with which one is working. In some cases, an absence of the trait in both individuals would indicate similarity, but in other cases, this is not necessarily true. Taking into consideration the genetic basis of RAPD markers (Williams et al., 1990), the absence of amplification of a given band in two genotypes does not necessarily represent genetic similarity between them, which makes those coefficients that exclude these negative co-occurrences from their expression of similarity (Jaccard, Sorensen-Dice, Ochiai, etc.) more adequate for use with this type of marker. Sokal and Sneath (1963) also stated that the simpler the coefficient the easier its interpretation; therefore, simpler coefficients should preferentially be employed. Jaccard's similarity coefficient is the simplest of its category (exclusion of d), and it has been widely employed with RAPD markers. In this study, it was verified that cultivar cluster results with Jaccard and Sorensen-Dice's coefficients were identical, but for the latter, a higher projection efficiency in a two-dimensional space (smaller distortion and stress, higher correlation) was obtained, so that the Sorensen-Dice's coefficient can be considered as the most adequate for a genetic divergence study in this group of cultivars, employing RAPD markers.
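The close agreement between Jaccard and Sorensen-Dice's coefficients can be made explicit with a short derivation. It assumes the usual textbook forms of the two coefficients, with a, b and c the co-occurrence counts defined earlier; the symbols S_J and S_D are introduced here only for illustration.

```latex
S_J = \frac{a}{a+b+c}, \qquad S_D = \frac{2a}{2a+b+c}
\quad\Longrightarrow\quad
\frac{2S_J}{1+S_J}
  = \frac{\dfrac{2a}{a+b+c}}{\dfrac{2a+b+c}{a+b+c}}
  = \frac{2a}{2a+b+c}
  = S_D .
```

Since 2S_J/(1+S_J) increases strictly with S_J on [0, 1], both coefficients rank every pair of cultivars in the same order. This is consistent with the identical clustering reported here, although it does not by itself guarantee identical UPGMA dendrograms, and it leaves room for the two coefficients to differ in projection efficiency, because the distance values themselves are not the same.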

ACKNOWLEDGMENTS
Research supported by CAPES and FAPEMIG.

Table I - Common bean cultivars employed for comparison of similarity coefficients and respective races and domestication centers.

Table II - Similarity coefficients studied.

Table III - Spearman's correlation between the genetic distances generated from the complement of the similarity coefficients*.

Table IV - Comparison of the dendrograms generated by the similarity coefficients employing the values of the consensus fork index (CI_C index)*.

Table V - Distortion degree (%), correlation (r) between the original and estimated distances, and value of the stress (%), obtained by the projection of the genetic distances in a two-dimensional space*.

Table VI - Clustering of common bean cultivars by means of Tocher's optimization method considering different similarity coefficients*.