Genetic similarity among soybean (Glycine max (L.) Merrill) cultivars released in Brazil using AFLP markers

Ana Lídia V. Bonato; Eberson S. Calvo; Isaias O. Geraldi; Carlos Alberto A. Arias


Genetic similarity among soybean genotypes was studied by applying the amplified fragment length polymorphism (AFLP) technique to 317 soybean cultivars released in Brazil from 1962 to 1998. Genetic similarity (GS) coefficients were estimated using the coefficient of Nei and Li (Nei and Li 1979), and the cultivars were clustered using the unweighted pair-group method with averages (UPGMA). The parentage coefficients of 100 cultivars released between 1984 and 1998 were calculated and correlated with the genetic similarity obtained by the markers. The genetic similarity coefficients varied from 0.17 to 0.97 (mean = 0.61), with 56.8% of the coefficients being above 0.60 and only 9.7% equal to or less than 0.50. The similarity coefficients have remained constant during the last three decades. Dendrogram interpretation was hindered by the large number of cultivars used, but it was possible to detect groups of cultivars formed as expected from their genealogy. Another dendrogram, composed of 63 cultivars, allowed a better interpretation of the groups. Parentage coefficients among the 100 cultivars varied from zero to one (mean = 0.21). However, no significant correlation (r = 0.12) was detected between the parentage coefficients and the AFLP genetic similarity. The results show the efficiency of AFLP markers in large-scale studies of genetic similarity and are discussed in relation to soybean breeding in Brazil.

AFLP; genetic similarity; molecular markers; parentage coefficient; soybean




Ana Lídia V. Bonato (I); Eberson S. Calvo (II); Isaias O. Geraldi (III); Carlos Alberto A. Arias (IV)

(I) Centro Nacional de Pesquisa de Trigo, Empresa Brasileira de Pesquisa Agropecuária, Passo Fundo, RS, Brazil

(II) Tropical Melhoramento & Genética Ltda., Cambé, PR, Brazil

(III) Departamento de Genética, Escola Superior de Agricultura "Luiz de Queiroz", Universidade de São Paulo, Piracicaba, SP, Brazil

(IV) Centro Nacional de Pesquisa de Soja, Empresa Brasileira de Pesquisa Agropecuária, Londrina, PR, Brazil






Knowing the degree of genetic similarity among different genotypes is of fundamental importance for efficient plant breeding programs. Such information is useful for organizing a working collection, identifying heterotic groups, and selecting parents for crosses.

The parentage coefficient (Malécot, 1947) based on information regarding genotype genealogy has been used to estimate genetic similarity and to study the genetic structure of cultivated soybean germplasm. Pedigree analyses of American germplasm showed a high level of genetic relationship (Delannay et al., 1983), but more recent studies have revealed that cultivars from the north and south of the United States have contrasting genetic bases (Gizlice et al., 1996). These studies also showed that the genetic diversity of North American soybean germplasm as a whole has been reduced over the last 50 years (Gizlice et al., 1993). Pedigree analyses have also shown a narrow genetic base in Brazilian soybean germplasm (Vello et al., 1988), although the use of Malécot's coefficients depends on the availability and precision of genealogical information.

Genetic diversity between individuals may be directly estimated by using biochemical and molecular markers, although the use of biochemical markers, such as isoenzymes, has been hindered in soybean by the low degree of polymorphism in this species (Cox et al., 1985). This problem has been overcome by using molecular markers. Sneller et al. (1997) clearly separated elite American lines from the north and south using restriction fragment length polymorphism (RFLP) markers. This technique has also been used to study exotic soybean germplasm and it has allowed the identification of different gene pools (Kisha et al., 1998). Similar studies have been carried out using other types of molecular markers, such as RAPD markers (Abdelnoor et al., 1995; Brown-Guedira et al., 2000), simple sequence repeat (SSR) markers (microsatellites) (Diwan and Cregan, 1997) and amplified fragment length polymorphism (AFLP) markers (Zhu et al., 1999; Ude et al., 2003).

A comparative study on the performance of different types of markers in soybean genetic analysis showed that microsatellite markers have a greater degree of polymorphism and, thus, better discrimination between genotypes. However, AFLP markers have greater multiplex efficiency (i.e. a large number of loci can be simultaneously analyzed in a gel) and are considered an efficient tool for distinguishing highly related genotypes (Powell et al., 1996). As a result of such characteristics, this technique continues to be used in current genetic diversity studies (Oliveira et al., 2004).

Genetic diversity studies using molecular markers have been carried out with various types of markers and diverse genotypes and, as expected, have confirmed the presence of a larger amount of genetic diversity in exotic germplasm (Zhu et al., 1999). However, genetic diversity estimates between soybean cultivars obtained using both the parentage coefficient and molecular markers have shown variable results. The correlation between these two estimates ranged from 0.54 to 0.91 in RFLP studies (Manjarrez-Sandoval et al., 1997; Kisha et al., 1998), but Helms et al. (1997) found no apparent relationship between the two types of estimates when using RAPD markers. Abdelnoor et al. (1995) reported some cases of discrepancy in the genetic distance between Brazilian cultivars analyzed by RAPD and by pedigree, in spite of overall agreement in the data. Since molecular markers provide a direct measure of genetic distance, it is possible that these discrepancies reflect errors in the pedigree assessments.

Analysis of Brazilian soybean germplasm by molecular markers has been reported. Abdelnoor et al. (1995), assessing the molecular marker approach using the RAPD technique to measure the genetic diversity of 30 Brazilian cultivars, found five different subgroups. However, the application of these results in a breeding program was hindered by the small number of genotypes used in the study. Recently, alleles of 12 microsatellite loci of 186 Brazilian soybean cultivars were used to distinguish morphologically similar groups, and their use allowed the determination of 184 profiles for all cultivars (Priolli et al., 2002). Our present study was carried out to investigate the use of AFLP markers in the genetic similarity analysis of 317 soybean cultivars released in Brazil from 1962 to 1998.

Material and Methods

Genetic material

We investigated 317 soybean cultivars released in Brazil between 1962 and 1998 (Table 1). The genetic material was obtained from the Active Germplasm Bank of the National Soybean Research Center of the Brazilian Corporation for Agriculture Research (Embrapa Soja), Londrina, PR, Brazil.

DNA extraction and quantification

For each cultivar, 30 leaves were collected from different greenhouse-grown plants, immediately frozen in liquid nitrogen, and stored at -80 °C for subsequent genomic DNA extraction according to Saghai-Maroof et al. (1984). The DNA concentration was estimated in 0.8% agarose gels by visual comparison of DNA band intensity with undigested lambda DNA standards. The DNA samples were then diluted to 30 ng µL⁻¹ and stored at -20 °C until needed.

AFLP genotyping

All AFLP analyses were made with the AFLP Analyses Kit I (Gibco-LifeTechnologies, Rockville, MD, USA) essentially as described in the kit manual. All the amplifications were conducted in a Perkin-Elmer Gene Amp 9600 thermocycler (Perkin-Elmer Corp., Norwalk, CT, USA). The AFLP products were fractionated on 5% (w/v) polyacrylamide sequencing gels, which were dried and autoradiographed by exposure to Kodak Bio Max MR-2 film (Eastman Kodak Co., Rochester, NY, USA). Six EcoRI/MseI primer combinations (E-AAC/M-CAT, E-AAC/M-CTA, E-AAC/M-CTC, E-AAC/M-CTG, E-AAG/M-CTT, and E-ACT/M-CAT) were selected based on the previously reported polymorphism rate (Bonato et al., 2006).

Data analysis

The DNA bands were scored as 1 (presence) or 0 (absence) based on visual observation and the results were entered into an Excel® spreadsheet. Genetic similarity (GS) was estimated for all genotype pairs using the equation $GS_{ij} = 2N_{ij}/(2N_{ij} + N_i + N_j)$ (Nei and Li, 1979), where $GS_{ij}$ is the similarity estimate between genotypes i and j based on the AFLP data, $N_{ij}$ is the number of bands common to i and j, and $N_i$ and $N_j$ are the numbers of bands found in genotype i only and in genotype j only, so that the denominator equals the total number of bands in the two profiles. The matrix of GS estimates was used to cluster the genotypes in a dendrogram by the unweighted pair group method using arithmetic averages (UPGMA) (Rohlf, 1997). The cophenetic correlation between the GS matrix and the dendrogram cophenetic values was estimated to validate the dendrogram against the original similarity estimates; the binary data matrix was analyzed using the NTSYS 2.0 software (Rohlf, 1997). Bootstrap analysis (Tivang et al., 1994) was used to verify whether the number of markers was sufficient to characterize the cultivars for genetic similarity. The procedures for this re-sampling have been described by Barroso et al. (2003). Cophenetic correlation was also obtained with BioNumerics (Rohlf, 1997) to express the consistency of a cluster.
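The similarity calculation described above can be illustrated with a minimal Python sketch. It assumes that N_i and N_j count the bands present in only one of the two genotypes, which is the reading under which the stated denominator yields values between 0 and 1. The cultivar names and band scores are hypothetical and are not taken from Table 1 or from the scored gels.

```python
from itertools import combinations


def nei_li(a, b):
    """Nei and Li (1979) similarity between two 0/1 band profiles."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # N_ij
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # N_i
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # N_j
    denom = 2 * shared + only_a + only_b
    # Two profiles without any band carry no information; report 0.0.
    return 2 * shared / denom if denom else 0.0


def similarity_matrix(profiles):
    """Return a dict mapping (name, name) pairs to GS values."""
    names = list(profiles)
    gs = {(n, n): 1.0 for n in names}
    for p, q in combinations(names, 2):
        gs[(p, q)] = gs[(q, p)] = nei_li(profiles[p], profiles[q])
    return gs


# Hypothetical band scores for three cultivars at six polymorphic loci.
profiles = {
    "cv_A": [1, 1, 0, 1, 0, 1],
    "cv_B": [1, 1, 0, 1, 1, 1],
    "cv_C": [0, 1, 1, 0, 1, 0],
}
gs = similarity_matrix(profiles)
for p, q in combinations(profiles, 2):
    print(p, q, round(gs[(p, q)], 2))
```

In this toy data set cv_A and cv_B share four bands and differ by one, so their coefficient is 8/9, whereas cv_A and cv_C share a single band and reach only 2/7.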

Parentage coefficient

The parentage coefficients (f; Malécot, 1947) among 100 soybean cultivars released from 1984 to 1998 were estimated from their respective genealogies (see Vello et al., 1988 for earlier results) using the PARENT software program (CIAGRI - ESALQ/USP). These cultivars are indicated with an asterisk (*) in Table 1. One hundred cultivars were used because this is the maximum number of genotypes that can be assessed by the PARENT software. The estimates were later used to calculate the correlation between the parentage coefficient and the genetic similarity between the respective pairs of cultivars as measured by AFLP.
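As an illustration of how a parentage coefficient can be derived from genealogies, the following Python sketch applies the usual coancestry recursion. It is not the algorithm of the PARENT program, whose internals are not described here. The sketch rests on assumptions commonly made in pedigree analyses of self-pollinated crops: ancestors without recorded parents are unrelated, every line is fully inbred (so f of a line with itself is 1), and each parent contributes half of the genome. The pedigree is hypothetical.

```python
from functools import lru_cache

# Hypothetical pedigree: cultivar -> (parent 1, parent 2).
# Names absent from the dict are treated as unrelated founders.
PEDIGREE = {
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "F"),
}


def depth(x):
    """Number of generations separating x from its most distant founder."""
    parents = PEDIGREE.get(x)
    return 0 if parents is None else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def parentage(x, y):
    """Coefficient of parentage f between two fully inbred lines."""
    if x == y:
        return 1.0  # assumption: complete homozygosity
    # Expand the deeper line, which cannot be an ancestor of the other.
    if depth(x) < depth(y):
        x, y = y, x
    parents = PEDIGREE.get(x)
    if parents is None:
        return 0.0  # two distinct founders are assumed unrelated
    p1, p2 = parents
    return 0.5 * (parentage(p1, y) + parentage(p2, y))


print(parentage("C", "D"))  # sister lines from the same cross
print(parentage("E", "D"))  # progeny of C compared with C's sister
```

Under these assumptions two sister lines from a cross between unrelated inbred parents have f = 0.5, and a line compared with one of its parents also has f = 0.5.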

Results and Discussion

AFLP analysis

The six primer combinations used to analyze the 317 cultivars generated 394 bands, of which 78 (19.8%) were polymorphic among genotypes (Table 2). The average number of polymorphic markers per primer combination was 13, varying from six to 25. The EcoRI-AAC/MseI-CTC and EcoRI-AAC/MseI-CAT primer combinations generated the highest levels of polymorphism, 18 (34.6%) and 25 (30.1%) respectively.

Maughan et al. (1996) detected 274 (36%) polymorphic AFLP bands among Glycine max and Glycine soja accessions and an average of 18 polymorphic bands per primer combination, considering accessions of both species, with most (31%) of the polymorphism occurring in Glycine soja accessions and only 17% in Glycine max.

In our study, although we considered only adapted Glycine max cultivars, the level of polymorphism found (19.8%) clearly indicated the genetic similarity among the genotypes. Our results confirm previous findings demonstrating that AFLP is a molecular technique that detects polymorphism in multiple loci, generating a vast number of reproducible markers in a short period of time (Maheswaran et al., 1997) and is a powerful tool for screening highly related genotypes (Powell et al., 1996).

According to Zhu et al. (1999), the most polymorphic primer combinations were EcoRI-AAC/MseI-CTC (53%) and EcoRI-AAC/MseI-CAT (50%). These same authors reported a greater polymorphism frequency compared to that found by us, probably because Zhu et al. (1999) were dealing with adapted and non-adapted Glycine max and Glycine soja accessions. Their results, however, are in line with those obtained in our study and confirm that these two primer combinations are highly informative for analysis of Brazilian soybean germplasm.

AFLP estimates of genetic similarity

We constructed a similarity coefficient matrix from the genetic similarity calculations for the 317 genotypes. Figure 1 shows the frequency distribution of the coefficients. The average coefficient among all genotypes was 0.61, ranging from 0.17 for the Nobre and Bossier cultivars to 0.97 for the FT-Cristal and FT-Cristalina cultivars. We also found that 56.8% of the estimated coefficients had values greater than 0.60, reflecting the high degree of genetic similarity among the cultivars used in this study. However, 9.7% of the coefficients were equal to or less than 0.50, and the corresponding cultivar pairs can be exploited for divergent parent selection.

High genetic similarity among Brazilian soybean cultivars was also detected by Abdelnoor et al. (1995), who used RAPD analysis and obtained a mean GS coefficient of 0.82 with a range of 0.69 to 1.00. The most divergent cultivars were Tropical and UFV-6, whereas the most similar cultivars were Ocepar-9 and Paranagoiana. Other studies using AFLP have also shown high similarity among adapted Glycine max genotypes. For example, Maughan et al. (1996) found similarity values ranging from 0.74 to 1.00. Zhu et al. (1999), although observing high similarity coefficients between Glycine max and Glycine soja accessions (0.60 to 0.94), emphasized the greater similarity of Glycine max cultivars. However, Priolli et al. (2002) used SSR markers and found GS values ranging from 0.18 to 0.59 in a group of 186 Brazilian soybean cultivars, this level of genetic similarity being lower than that found in our study.

One of the concerns about GS estimates is the number of markers required for sampling the genome. We obtained an average coefficient of variation (CV) of 7.7% by bootstrap analysis with the 78 markers used in our AFLP analysis (Figure 2) and the CV decreased as the sample size increased, indicating that the accuracy of genetic similarity estimates increases if the number of polymorphic loci is increased. Logarithmic transformation of mean CV and sample size established a linear relationship between the two variables and, consequently, the regression equation showed that with 100 polymorphic loci a mean CV of 6.7% would be obtained (Figure 3), an insignificant increase in the precision of the estimates.
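The marker-sampling check described above can be sketched in Python as follows. This is a generic bootstrap over loci, written under the assumption that markers are resampled with replacement and that the coefficient of variation is averaged over all cultivar pairs; the exact procedure of Tivang et al. (1994) and Barroso et al. (2003) may differ in detail. The band profiles are randomly generated placeholders, not the AFLP data of this study, and the function and variable names are illustrative.

```python
import random
import statistics
from itertools import combinations


def nei_li(a, b):
    """Nei and Li similarity between two 0/1 band profiles."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    differ = sum(1 for x, y in zip(a, b) if x != y)
    denom = 2 * shared + differ
    return 2 * shared / denom if denom else 0.0


def bootstrap_cv(profiles, n_markers, n_boot=200, seed=0):
    """Mean CV (%) of pairwise GS when n_markers loci are resampled."""
    rng = random.Random(seed)
    names = list(profiles)
    total = len(profiles[names[0]])
    draws = {pair: [] for pair in combinations(names, 2)}
    for _ in range(n_boot):
        # Draw loci with replacement, then recompute every pairwise GS.
        idx = [rng.randrange(total) for _ in range(n_markers)]
        for p, q in draws:
            a = [profiles[p][i] for i in idx]
            b = [profiles[q][i] for i in idx]
            draws[(p, q)].append(nei_li(a, b))
    cvs = []
    for values in draws.values():
        mean = statistics.fmean(values)
        if mean > 0:
            cvs.append(100 * statistics.pstdev(values) / mean)
    return statistics.fmean(cvs) if cvs else 0.0


# Placeholder data: four hypothetical cultivars scored at 78 loci.
data_rng = random.Random(1)
demo = {f"cv_{k}": [data_rng.randint(0, 1) for _ in range(78)] for k in range(4)}
for m in (20, 40, 78):
    print(m, round(bootstrap_cv(demo, m), 1))
```

Plotting the mean CV against the number of sampled loci on logarithmic axes would then allow a linear extrapolation of the kind used to predict the CV expected with 100 polymorphic loci.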

Pejic et al. (1998) reported a comparative analysis of genetic similarity in maize, using RFLP, RAPD, AFLP, and SSR markers, and found coefficients of variation of 5% and 10% for the four types of markers with 150 polymorphic bands. These authors consider this to be a sufficient number of markers to estimate similarity with high accuracy. Moser and Lee (1994) suggested that species with a polymorphism index lower than that found in maize have a lower standard error in these estimates, which means that fewer loci need to be assessed for the same level of precision. In the present study, the results obtained by bootstrap analysis show that the number of AFLP markers was sufficient to characterize soybean cultivars for their genetic similarity.

Genetic diversity of 317 Brazilian soybean cultivars released between 1962 and 1998

The impact of Brazilian breeding programs on soybean genetic diversity over 36 years was investigated using the GS coefficients after grouping the cultivars into three periods according to their release to the market. In the first period we considered 48 cultivars released between 1962 and 1980, in the second period 122 cultivars released between 1981 and 1990, and in the third period 121 cultivars released between 1991 and 1998. We disregarded 26 cultivars because their release date was not available. Table 3 shows that the mean values remained practically constant for the three periods, indicating that the cultivars developed in the Brazilian breeding programs maintained a similar level of genetic similarity throughout the years. This contrasts with the findings of Kisha et al. (1998), who assessed genetic diversity among different USA soybean gene pools using RFLP and found that the diversity among elite cultivars, as compared to ancestral genotypes, was declining over time as a consequence of breeding effects.

In spite of the narrow genetic base found in soybean cultivated in Brazil and the relatively high similarity among cultivars, substantial genetic gains have been obtained for grain yield and other traits. Similar facts have occurred in the USA breeding programs (Hiromoto and Vello, 1986). In a soybean breeding program in the Brazilian state of Paraná, Toledo et al. (1990) estimated the mean annual increase in productivity due to genetic gains from 1981 to 1986 at 1.8% for an early maturity group and 1.3% for an intermediate maturity group of cultivars. In a study of soybean cultivars widely grown in the southern Brazilian state of Rio Grande do Sul during different periods, Rubin and Santos (1996) concluded that there has been a mean genetic gain of 19 kg ha⁻¹ y⁻¹ over the last 40 years, equivalent to 1.1% per year. Rubin and Santos also noted that these gains have been decreasing over the years as a result of using the same basic germplasm during hybridization. However, this is not necessarily the case, as shown by the recent releases of new cultivars: the performance of agronomic traits has improved considerably through the correction of defects controlled by qualitative traits, and improvements in grain yield have been reported in several Brazilian states. For example, in Rio Grande do Sul the BRS 153 cultivar outperformed the control cultivar BR-16 in grain yield by 14%, and in the state of Paraná the cultivar BRS-133 outperformed the same control by 8.5%. In the Brazilian state of Mato Grosso do Sul the cultivar MS/BRS-171 (Campo Grande) outperformed the control cultivar FT-Cristalina by 20%, while the cultivar MT/BR-50 (Parecis) outperformed cultivar MT/BR-45 (Paiaguás) by 5% (Congresso Brasileiro de Soja, 1999).

UPGMA grouping of the 317 soybean cultivars based on GS estimates

The GS estimates for the 317 cultivars were used to generate a UPGMA dendrogram (Figure 4). In spite of the large numbers hindering the assessment of similarities between each pair of cultivars, the dendrogram allowed the detection of groups of cultivars with expected genetic similarity corresponding to their genealogy. These results support the previously noted efficiency of AFLP markers for estimating genetic similarity among soybean genotypes. Knowing the genealogy of cultivars was essential for the interpretation of the dendrogram, a fact in line with other studies (Abdelnoor et al., 1995; Diwan and Cregan, 1997; Priolli et al., 2002).

As expected, the genetic similarity (GS) coefficients and the dendrogram showed that cultivars derived from natural mutation had high similarity coefficients. The Paranagoiana and Ocepar 9-SS1 cultivars, mutants of the Paraná cultivar, had similarity coefficients of 0.95. The São Carlos mutant cultivar had a genetic similarity coefficient of 0.83 with respect to the original Davis cultivar, while the UFV-1 mutant cultivar shared a coefficient of 0.67 with the original Viçoja cultivar. The difference in magnitude between the genetic similarity coefficients observed for the different mutants seems to indicate either that not all mutations giving rise to the mutant genotypes were single point mutations or that these cultivars may not actually be mutants. However, the Paranagoiana and Ocepar 9-SS1 cultivars are known to be mutants of the Paraná cultivar, this having been demonstrated by the electrophoresis and isoenzyme studies of Derbyshire et al. (1990).

The GS estimates for cultivars derived from other cultivars also showed many cases of similarity. Examples were found in the FT-Cristal and FT-Bahia cultivars, which had GS coefficients of 0.97 and 0.87, respectively, compared with the FT-Cristalina cultivar from which they were selected. Similarly, the Ocepar-8 cultivar showed high similarity (0.90) when compared to the Paraná cultivar from which it was selected. However, in some instances, low similarity was found between putatively related cultivars, as in the case of the IAS 5 cultivar, which had been separated into two types according to pod color (dark pod IAS 5 and pale pod IAS 5) by researchers at the Embrapa National Soybean Research Center (Personal Communication). Our AFLP analysis indicated that these two cultivars must diverge in other genes besides those defining pod color because the AFLP GS coefficient between them was only 0.63. We also found that the FT-2 cultivar, derived from a selection within the IAS 5 cultivar, was closer (GS = 0.78) to the pale pod IAS 5 cultivar than to the dark pod IAS 5 cultivar (GS = 0.65). These data indicate that the AFLP technique is highly discriminating for cultivar differentiation even among closely related genotypes. By comparison, Diwan and Cregan (1997) analyzed soybean genotypes using 20 microsatellite markers but were unable to separate the Ilini genotype from its ancestral AK Harrow genotype.

Among cultivars derived from the same cross (sister cultivars), there were several cases of agreement between GS coefficients and their allocation in the same group. Cultivars MT/BR-50 (Parecis), MT/BR-51 (Xingu), MT/BR-52 (Curió) and MT/BR-53 (Tucano), derived from the BR 83-9520-1 (2) x FT-Estrela cross, had genetic similarity coefficients greater than 0.80. There was a similar situation for the UFV-2, UFV-3, UFV-4 and UFV-Araguaia cultivars, derived from the Hardee x IAC-2 cross, whose coefficients were greater than 0.83. Among the FT-5, FT-10, FT-14 and FT-15 cultivars, derived from the FT-9510 x SantAna cross, the GS coefficients varied from 0.71 to 0.93. Cultivars Embrapa 59, Embrapa 60, Embrapa 61, and BRS 66, all derived from the FT-Abyara x BR 83-147 cross, had GS coefficients greater than 0.75. In contrast, the BRS 133, BRS 135, BRS 158, MS/BR-57 (Lambari) and MS/BRS-171 (Campo Grande) cultivars, also selected from the FT-Abyara x BR 83-147 cross, had GS coefficients lower than 0.65 as compared to their Embrapa 59, Embrapa 60, Embrapa 61, and BRS 66 sister cultivars. We attributed these differences in similarity among the sister cultivars to selection effects. Abdelnoor et al. (1995) also found similarities at several levels between cultivars derived from the same cross.

Cultivars developed from backcrosses had variable GS coefficients, as compared to their recurrent parents. For example, the BR-6 (Nova Bragg) and BR-13 (Maravilha) cultivars were obtained from backcrosses (three to Nova Bragg and four to Maravilha) with the Bragg cultivar and had genetic similarity coefficients greater than 0.75 in relation to the Bragg cultivar. The Embrapa 1 cultivar, obtained from six backcrosses to the IAS 5 cultivar, had a genetic similarity coefficient of 0.68 when compared with the dark pod IAS 5 cultivar and 0.54 when compared with the pale pod IAS 5. The Embrapa 4 cultivar, derived from six backcrosses to the BR-4 cultivar, had a similarity coefficient of only 0.61 with the BR-4 cultivar. The lower than expected genetic similarity between backcross progeny and their respective recurrent parents found in our study may be explained by the work of Muehlbauer et al. (1988), who suggested that these types of effects are caused by the introgression of other markers in the same linkage groups as the transferred gene. Another possibility may be the lower selection pressure applied to recover the genetic characteristics of the recurrent parent.

Dendrogram analysis did not allow the separation of cultivars into groups based on the geographic distribution of their release sites or recommended planting sites, although the RFLP analyses of Kisha et al. (1998) showed a clear separation between soybean cultivars from the north and south of the USA and greater similarity among the genotypes from the south. The results of Kisha et al. (1998) were probably due to the fact that the cultivars from each region in the USA were derived from distinct ancestral groups, whereas in Brazil there are no such expected differences because the cultivars developed at different locations were derived from the same ancestral group (Romeu Kiihl - Personal Communication).

In most of the cases discussed above not only was the similarity indicated by the GS coefficients greater than that displayed in the dendrogram but some cultivars with high GS coefficients were placed in different groups in the dendrogram. For example, this occurred with the Pirapó 78 and Nova IAC-7 cultivars, which, even with coefficients of 0.82, were allocated to distinct groups of the respective parental cultivars Paraná and IAC-7. Possible causes for such discrepancies could be the large number of very closely genetically related genotypes analyzed and the low cophenetic correlation obtained for the original coefficients compared to that estimated by grouping (r = 0.60). Powell et al. (1996) obtained a cophenetic correlation of 0.78 among Glycine max accessions using AFLP markers, but the value rose to 0.96 when accessions of this species were considered together with Glycine soja accessions. In Powell's study, the lower cophenetic correlation observed among Glycine max accessions may have been due to the greater genetic similarity of the genotypes of this species. This explanation may be extended to the results of our study in which the cultivars had very similar GS coefficients which may have interfered when the estimates were grouped in the dendrogram.

Grouping of the 62 soybean cultivars based on GS estimates

To simplify the interpretation of the dendrogram, a new dendrogram was constructed using only 62 of the 317 cultivars (Figure 5). The cultivars used included those with ambiguous results regarding their grouping in the previous 317-cultivar dendrogram, as well as those with similarity coefficients of different magnitudes. Cophenetic correlation is also a parameter that expresses the consistency of a cluster by calculating the correlation between the dendrogram-derived similarities and the matrix similarities. In BioNumerics, the value is calculated for each cluster (branch) thus estimating the faithfulness of each subcluster of the dendrogram. The cophenetic correlation was obtained for the whole dendrogram from the cophenetic correlation at the roots (Figure 5). In the 63-cultivar dendrogram it was easier to visualize groups and there was a small increase in the general cophenetic correlation values from 0.60 to 0.70. In practical terms, these results suggest that parental selection based on genetic diversity may be more effective when soybean breeders use smaller groups of genotypes to calculate coefficients and if genotypes are previously selected based on their qualitative and quantitative agronomic traits.
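The relation between UPGMA clustering and the cophenetic correlation can be made concrete with a small pure-Python sketch. It works on distances, taken here as 1 - GS (an assumption made for illustration; the correlation is the same whether similarities or distances are used), and computes one correlation for the whole tree rather than the per-branch values reported by BioNumerics. The four genotypes and their distances are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upgma_cophenetic(names, dist):
    """Cluster by UPGMA and return the cophenetic distance of each pair."""
    members = {i: [n] for i, n in enumerate(names)}
    d = {frozenset((i, j)): dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    coph = {}
    next_id = len(names)
    while len(members) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        i, j = sorted(pair)
        height = d[pair]
        for a in members[i]:              # all cross pairs merge at this height
            for b in members[j]:
                coph[frozenset((a, b))] = height
        ni, nj = len(members[i]), len(members[j])
        updated = {}
        for k in members:
            if k not in pair:             # size-weighted average linkage
                updated[frozenset((next_id, k))] = (
                    ni * d[frozenset((i, k))] + nj * d[frozenset((j, k))]
                ) / (ni + nj)
        d = {key: v for key, v in d.items() if not (key & pair)}
        d.update(updated)
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return coph


# Hypothetical distances (1 - GS) among four genotypes.
names = ["A", "B", "C", "D"]
raw = {("A", "B"): 0.1, ("A", "C"): 0.5, ("A", "D"): 0.7,
       ("B", "C"): 0.6, ("B", "D"): 0.8, ("C", "D"): 0.2}
dist = {frozenset(p): v for p, v in raw.items()}
coph = upgma_cophenetic(names, dist)
pairs = list(dist)
r = pearson([dist[p] for p in pairs], [coph[p] for p in pairs])
print(round(r, 2))
```

Because the dendrogram assigns a single merge height to every pair that spans two clusters, many closely spaced original coefficients are replaced by the same value, which lowers the correlation when the genotypes are all similar to one another.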

Among cultivars considered discrepant in the previous dendrogram (Figure 4) there were pairs of cultivars that were reallocated within the same group, indicating good agreement with their original coefficients. An example of this was the group gathering the Embrapa 32 (Itaqui), Embrapa 33 and Cariri RCH cultivars, all derived from backcrosses to the BR-27 (Cariri) cultivar. A similar situation was observed for cultivar MS/BRS-172 (Tuiuiú), derived from four backcrosses, and its recurrent parent, the FT-Cristalina cultivar (GS = 0.77). The MT/BR-45 (Paiaguás) cultivar, from the Doko x IAC-7 cross, had a GS of 0.79 with cultivar IAC-7 and formed a group with cultivars IAC-7 and Nova IAC-7. Another example of improved grouping occurred between the sister cultivars MT/BR-55 (Uirapuru) and MT/BRS-159 (Crixás) in relation to cultivar MT/BR-50, this new group having GS coefficients higher than 0.70.

However, some cultivars remained in different groups in spite of sharing high similarity coefficients, probably reflecting deficiencies in the grouping approach, as shown by the moderate cophenetic correlation value. This could be seen in several cases, such as the groups containing cultivar BR/IAC-21, derived from five backcrosses to the IAC-8 cultivar, with GS coefficients of 0.40; the Coodetec 201 cultivar, derived from five backcrosses to the Ocepar 4 (Iguaçú) (0.75); and the two pairs of sister cultivars, BR-4 and BR-5 (0.75) and BRS 153 and BRS 154 (0.71).

In spite of their relationship, some cultivars were allocated to different groups because of the low similarity obtained from the AFLP markers. Examples are cultivar BRS 137 and its recurrent parent, the Dourados cultivar, with a similarity coefficient of 0.65, as well as the Embrapa-20 (Doko RC) cultivar, derived from four backcrosses to the Doko cultivar, with a similarity coefficient of 0.63. A similar finding was observed between cultivar UFV-6 and its sister cultivars UFV-5, UFV-9, and UFV-10.

Factors that may affect genetic similarity estimates

Results contradictory to theoretical expectations based on the genealogy of each cultivar are difficult to explain with certainty, but several sources of error may be found in such studies, e.g. the use of seeds containing genetic material not originating from the stated parent plants (extraneous DNA), the quality of chemicals and reagents used in the analysis, imprecision in the AFLP analysis, and mistakes in reading and interpreting the polymorphic fragment data. In the present study extraneous DNA was probably the most important source of error because leaf samples were taken from several plants rather than from a single plant. Several plants were sampled because genetic material taken from a single plant could result in atypical genetic data for that cultivar, this being especially true in the case of sister lines.

Several other factors may influence genetic similarity estimates and should be taken into account in studies of this nature. Firstly, the number of markers used can affect the variance of the similarity estimates because each marker represents an independent genomic sample (Powell et al., 1996), although in our study the 78 polymorphic markers used were found to be adequate for the analysis of the 317 cultivars.

Secondly, the distribution of markers in the genome is also an important factor to consider in diversity studies because a good coverage of the genome improves its representation efficiency as well as the comparison between individuals. It is normally assumed that markers are randomly distributed in the genome (Williams et al., 1990) and there is evidence that AFLP provides a wide coverage of the plant genome (Maheswaran et al., 1997).

Thirdly, the genetic similarity coefficient used may influence how similarity results are interpreted and grouped. For example, the coefficient of Nei and Li (Nei and Li, 1979) does not consider the shared absence of bands as evidence of similarity between individuals, whereas the simple matching or common distance coefficient (SSM) of Sokal and Michener (1958) does. The SSM coefficient may therefore overestimate genetic similarity, because the absence of amplification of a dominant marker band in both of two genotypes does not necessarily represent genetic similarity between them (Duarte et al., 1999). The Nei and Li coefficient differs from Jaccard's coefficient (SJ) only in the double weight it assigns to bands present in both of any two analyzed genotypes (Duarte et al., 1999; Mohammadi and Prasanna, 2003). The Nei and Li coefficient thus seems best suited to the type of analysis discussed in this paper.
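The difference between the three coefficients can be seen on a small example. The sketch below uses the standard definitions in terms of a (bands present in both genotypes), b and c (bands present in only one) and d (bands absent in both); the two band profiles and the function names are invented for illustration.

```python
def band_counts(x, y):
    """Return (a, b, c, d) for two presence (1) / absence (0) band profiles."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # present in both
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # only in x
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # only in y
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # absent in both
    return a, b, c, d


def nei_li(x, y):
    a, b, c, _ = band_counts(x, y)
    return 2 * a / (2 * a + b + c) if (a + b + c) else 0.0


def jaccard(x, y):
    a, b, c, _ = band_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def simple_matching(x, y):
    a, b, c, d = band_counts(x, y)
    return (a + d) / (a + b + c + d)


# Hypothetical profiles: 2 shared bands, 2 unshared bands, 4 shared absences.
g1 = [1, 1, 1, 0, 0, 0, 0, 0]
g2 = [1, 1, 0, 1, 0, 0, 0, 0]

print(round(nei_li(g1, g2), 3))           # 0.667
print(round(jaccard(g1, g2), 3))          # 0.5
print(round(simple_matching(g1, g2), 3))  # 0.75
```

In this example the four shared absences raise the simple matching value above the other two, while the Nei and Li value exceeds Jaccard's only because shared bands are counted twice.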

De novo genetic variability may also affect this kind of study. Such variability can arise from intragenic recombination, unequal crossing over, transposon activity and DNA methylation, although these factors are not thought to have been an important source of variability in our study.

Genetic similarity estimates based on the parentage coefficient (f) and correlation between the GS and f coefficients of genetic similarity

The 4,950 parentage coefficients (f) among the 100 soybean cultivars released between 1984 and 1998 varied from f = 0 to f = 1, with a mean of 0.21. When f = 0 there is no parentage between the cultivars of a pair, and this occurred in 294 (5.94%) of the f estimates. The maximum coefficient (f = 1) was found in 14 (0.28%) of the cultivar pairs. Vello et al. (1988) estimated the parentage coefficient of each of the 69 cultivars recommended for the 1983/84 growing season and observed a mean coefficient of 0.16. In the USA, cultivars released between 1947 and 1988 showed mean f coefficients of 0.18 for the cultivars from the north and 0.23 for the cultivars from the south (Gizlice et al., 1993), values considered high. Sneller (1994) reported a similar finding when comparing elite cultivars from the north and south of the United States, as well as a surprisingly low f coefficient (0.10) between the northern and southern regions, suggesting that soybean breeders have kept distinct gene pools. Analysis of Chinese soybean cultivars showed a very low f coefficient (0.02), indicating a high degree of genetic diversity in the germplasm of that country (Cui et al., 2000).
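The number of coefficients and the percentages quoted above follow from counting the unordered pairs among 100 cultivars. A short check of the arithmetic, with variable names chosen only for illustration:

```python
from math import comb

n_cultivars = 100
n_pairs = comb(n_cultivars, 2)     # unordered pairs: 100 * 99 / 2

pct_unrelated = 100 * 294 / n_pairs  # pairs with f = 0
pct_identical = 100 * 14 / n_pairs   # pairs with f = 1

print(n_pairs, round(pct_unrelated, 2), round(pct_identical, 2))
# prints: 4950 5.94 0.28
```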

In our study the correlation coefficient between GS and the parentage coefficients (f) was low (r = 0.12), and several reasons may explain this. One is that the two types of coefficient are not based on the same type of genetic similarity: the f coefficient is a mathematical quantity based on probabilities, while AFLP molecular markers detect similarities directly at the DNA level. Another is that the assumptions underlying the f coefficient are not always fulfilled by the germplasm used in plant breeding. Violation of these assumptions seems to be critical when the coefficient is used in a plant breeding context and may lead to low correlation between f coefficients and genetic marker data. The f calculation assumes that each parent transfers 50% of its genetic material to its offspring, without considering the effects of selection and genetic drift (Barrett et al., 1998). Only when cultivars are derived by selection within cultivars or by mutation must the f coefficient be equal to 1 (i.e. full similarity), because there is no possibility of a derived cultivar possessing different genes. However, the assumption that f = 0 (i.e. no similarity) when the ancestors of a cultivar do not have parentage in common is not always true and may lead to an underestimation of the relationship between two genotypes. For example, Gizlice et al. (1993) used multivariate analysis to calculate similarity coefficients and found values varying from 0 to 0.88 among North American soybean ancestors. Thus, a lack of precise knowledge of the genealogy of a cultivar may distort f coefficient estimates, a problem that does not arise with molecular markers because they do not require previous knowledge of genealogy for the calculation of genetic similarity.
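The assumptions discussed above can be made explicit in a small recursive calculation of f. The sketch below is a minimal illustration, not the procedure used in this study: it assumes fully homozygous lines, unrelated founders (f = 0), an equal contribution of 50% from each parent, and f = 1 between a cultivar and a selection made within it. The pedigree and all names are hypothetical.

```python
from functools import lru_cache

# Hypothetical pedigree: None marks a founder with unknown parents.
PEDIGREE = {
    "P1": None, "P2": None, "P3": None,
    "X": ("P1", "P2"),        # cross P1 x P2
    "Y": ("P1", "P3"),        # cross P1 x P3
    "Z": ("X", "Y"),          # cross X x Y
    "P1-sel": ("P1", "P1"),   # selection within P1, so f(P1-sel, P1) = 1
}


def depth(name):
    """Number of generations separating a line from its most distant founder."""
    parents = PEDIGREE[name]
    return 0 if parents is None else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def parentage(x, y):
    """Coefficient of parentage f between two homozygous lines."""
    if x == y:
        return 1.0
    # Expand the line with the deeper pedigree; it cannot be an ancestor of the other.
    if depth(x) < depth(y):
        x, y = y, x
    parents = PEDIGREE[x]
    if parents is None:
        return 0.0  # two different founders are assumed to be unrelated
    # Each parent is assumed to contribute exactly half of the genome.
    return 0.5 * (parentage(parents[0], y) + parentage(parents[1], y))


print(parentage("X", "Y"))       # 0.25
print(parentage("Z", "X"))       # 0.625
print(parentage("P1-sel", "P1")) # 1.0
```

Every value produced this way depends on the founder and 50% assumptions, so any departure caused by selection, drift or unrecorded relationships among ancestors is invisible to f but not to marker data.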

The correlation between marker-based genetic similarity values and parentage (f) coefficients has been investigated in many studies, with variable results. A low correlation (r = 0.33) was obtained for wheat using RFLP (Barbosa-Neto et al., 1996), while a high correlation (r = 0.97) was obtained in maize with the same type of marker (Smith and Smith, 1991). Using AFLP markers, Barrett and Kidwell (1998) found a correlation coefficient of 0.42 for wheat, which they explained in a similar manner to that outlined in the previous paragraph.
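A correlation of this kind is computed over cultivar pairs, taking each pair once from the two similarity matrices. The sketch below shows one way to do so with a plain Pearson coefficient; the matrices are invented, the function names are assumptions, and no claim is made about the software or significance test used in the studies cited.

```python
from math import sqrt


def upper_triangle(matrix):
    """Values above the diagonal of a square matrix, one per pair."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical similarity matrices for four cultivars.
gs = [[1.00, 0.70, 0.55, 0.62],
      [0.70, 1.00, 0.60, 0.58],
      [0.55, 0.60, 1.00, 0.66],
      [0.62, 0.58, 0.66, 1.00]]
f = [[1.00, 0.50, 0.00, 0.25],
     [0.50, 1.00, 0.13, 0.00],
     [0.00, 0.13, 1.00, 0.25],
     [0.25, 0.00, 0.25, 1.00]]

r = pearson_r(upper_triangle(gs), upper_triangle(f))
print(round(r, 2))
```

Because the pairs share cultivars they are not independent observations, so the coefficient is best read as a descriptive measure of agreement between the two matrices.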

In this paper, we have presented the first global analysis of genetic similarity in Brazilian soybean germplasm and have shown that AFLP markers are a rapid, effective and reliable tool for this type of analysis. These findings not only highlight the capacity of the AFLP technique but should also help Brazilian soybean breeders in the selection of parent plants for their crossing programs.


The authors acknowledge the financial support received from the Brazilian agency Fundação Mato Grosso (FMT) and are grateful to P.A.V. Barroso for the bootstrap and E. Binneck for the cophenetic analysis. A.L.V. Bonato received a doctorate scholarship from the Brazilian agency CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico).

  • Send correspondence to
    A.L.V. Bonato
    Centro Nacional de Pesquisa de Trigo
    Caixa Postal 451
    99001-970 Passo Fundo, RS, Brazil
  • Received: June 1, 2005; Accepted: May 18, 2006.

    Associate Editor: Everaldo Gonçalves de Barros

    • Abdelnoor RV, Barros EG and Moreira MA (1995) Determination of genetic diversity within Brazilian soybean germplasm using random amplified polymorphic DNA techniques and comparative analysis with pedigree data. Braz J Genet 18:265-273.
    • Barbosa-Neto JF, Sorrells ME and Cisar G (1996) Prediction of heterosis in wheat using coefficient of parentage and RFLP-based estimates of genetic relationship. Genome 39:1142-1149.
    • Barrett BA and Kidwell KK (1998) AFLP-based genetic diversity among wheat cultivars from the Pacific Northwest. Crop Sci 38:1261-1271.
    • Barrett BA, Kidwell KK and Fox PN (1998) Comparison of AFLP and pedigree-based genetic diversity assessment methods using wheat cultivars from the Pacific Northwest. Crop Sci 38:1271-1278.
    • Barroso PAV, Geraldi IO, Vieira MLC, Pulcinelli CE, Vencovsky R and Dias CTS (2003) Predicting performance of soybean populations using genetic distances estimated with RAPD markers. Genet Molec Biol 26:343-348.
    • Bonato ALV, Calvo ES, Arias CAA, Toledo JFF and Geraldi IO (2006) Prediction of genetic variability through AFLP-based measure of genetic distance in soybean (Glycine max L. Merrill). Crop Breed Appl Biotechnol 6:30-39.
    • Brown-Guedira GL, Thompson JA, Nelson RL and Warburton ML (2000) Evaluation of genetic diversity of soybean introductions and North American ancestors using RAPD and SSR markers. Crop Sci 40:815-823.
    • Congresso Brasileiro de Soja (1999) Anais... Embrapa Soja, Londrina, 533 pp (Embrapa Soja Documentos 124).
    • Cox TS, Kiang YT, Gorman MB and Rodgers DM (1985) Relationship between coefficient of parentage and genetic similarity indices in the soybean. Crop Sci 25:529-532.
    • Cui Z, Carter Jr TE and Burton JW (2000) Genetic diversity patterns in Chinese soybean cultivars based on coefficient of parentage. Crop Sci 40:1780-1793.
    • Delannay X, Rodgers DM and Palmer RG (1983) Relative genetic contribution among ancestral lines to North American soybean cultivars. Crop Sci 23:944-949.
    • Derbyshire E, Carvalho MTV and Bonato ER (1990) Comparison of natural variants of the soybean cultivar Paraná by isoenzyme analysis. Braz J Genet 13:83-87.
    • Diwan N and Cregan PB (1997) Automated sizing of fluorescent-labeled simple sequence repeat (SSR) markers to assay genetic variation in soybean. Theor Appl Genet 95:723-733.
    • Duarte JM, Santos JB dos and Melo LC (1999) Comparison of similarity coefficients based on RAPD markers in the common bean. Genet Molec Biol 22:427-432.
    • Gizlice Z, Carter Jr TE, Gerig TM and Burton JW (1996) Genetic diversity patterns in North American public soybean cultivars based on coefficient of parentage. Crop Sci 33:753-765.
    • Gizlice Z, Carter TE and Burton JW (1993) Genetic diversity in North American soybean: I. Multivariate analysis of founding stock and relation to coefficient of parentage. Crop Sci 33:614-620.
    • Helms T, Orf J, Vallad G and McClean P (1997) Genetic variance, coefficient of parentage, and genetic distance of six soybean populations. Theor Appl Genet 94:20-26.
    • Hiromoto DM and Vello NA (1986) The genetic base of Brazilian soybean (Glycine max (L.) Merrill) cultivars. Braz J Genet 9:295-306.
    • Kisha TJ, Diers BW, Hoyt JM and Sneller CH (1998) Genetic diversity among soybean plant introductions and North American germplasm. Crop Sci 38:1669-1680.
    • Maheswaran M, Subudhi PK, Nandi S, Xu JC, Parco A, Yang DC and Huang N (1997) Polymorphism, distribution, and segregation of AFLP markers in a doubled haploid rice population. Theor Appl Genet 94:39-45.
    • Malécot G (1947) Les Mathématiques de l'Hérédité. Masson et Cie, Paris, 80 pp.
    • Manjarrez-Sandoval P, Carter Jr TE, Webb DM and Burton JW (1997) RFLP genetic similarity estimates and coefficient of parentage as genetic variance predictors for soybean yield. Crop Sci 37:698-703.
    • Maughan PJ, Saghai-Maroof MA and Buss GR (1996) Amplified fragment length polymorphism (AFLP) in soybean: Species diversity, inheritance, and near-isogenic line analysis. Theor Appl Genet 93:392-401.
    • Mohammadi SA and Prasanna BM (2003) Analysis of genetic diversity in crop plants - Salient statistical tools and considerations. Crop Sci 43:1235-1248.
    • Moser H and Lee M (1994) RFLP variation and genealogical distance, multivariate distance, heterosis, and genetic variance in oats. Theor Appl Genet 87:947-956.
    • Muehlbauer GJ, Specht JE, Thomas-Compton MA, Staswick PE and Bernard RL (1988) Near-isogenic lines - A potential resource in the integration of conventional and molecular marker linkage maps. Crop Sci 28:729-735.
    • Nei M and Li WH (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci USA 76:5269-5273.
    • Oliveira KM, Laborda PR, Garcia AAF, Paterniani MEZ and Souza AP (2004) Evaluating genetic relationships between tropical maize inbred lines by means of AFLP profiling. Hereditas 140:24-33.
    • Pejic I, Ajmone-Marsan P, Morgante M, Kozumplick V, Castiglioni P, Taramino G and Motto M (1998) Comparative analysis of genetic similarity among inbred lines detected by RFLPs, RAPDs, SSRs and AFLPs. Theor Appl Genet 97:1248-1255.
    • Powell W, Morgante M, André C, Hanafey M, Vogel J, Tingey S and Rafalski A (1996) The comparison of RFLP, RAPD, AFLP, and SSR (microsatellite) markers for germplasm analysis. Molec Breed 2:225-238.
    • Priolli RHG, Mendes Junior CT, Arantes NE and Contel EPB (2002) Characterization of Brazilian soybean cultivars using microsatellite markers. Genet Molec Biol 25:185-193.
    • Rohlf FJ (1997) NTSYS-pc: Numerical Taxonomy and Multivariate Analysis System. EXETER Software, New York.
    • Rubin SAL and Santos OS (1996) Progresso do melhoramento genético da soja no Estado do Rio Grande do Sul: I. Rendimento de grãos. Pesq Agropec Gaúcha 2:139-147.
    • Saghai-Maroof MA, Sloman KM, Jorgensen RA and Allard RW (1984) Ribosomal DNA spacer-length polymorphisms in barley: Mendelian inheritance, chromosomal location, and population dynamics. Proc Natl Acad Sci USA 81:8014-8018.
    • Smith JSC and Smith OS (1991) Restriction fragment length polymorphisms can differentiate among U.S. maize hybrids. Crop Sci 31:893-899.
    • Sneller CH (1994) Pedigree analysis of elite soybean lines. Crop Sci 34:1515-1522.
    • Sneller CH, Miles J and Hoyt JM (1997) Agronomic performance of soybean plant introductions and their genetic similarity to elite lines. Crop Sci 37:1595-1600.
    • Sokal RR and Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kansas Sci Bull 38:1409-1438.
    • Thompson JA and Nelson RL (1998) Utilization of diverse germplasm for soybean yield improvement. Crop Sci 38:1362-1368.
    • Tivang JG, Nienhuis J and Smith OS (1994) Estimation of sampling variance of molecular marker data using bootstrap procedure. Theor Appl Genet 89:259-264.
    • Toledo JFF, Almeida LA, Kiihl RAS and Menosso OG (1990) Ganho genético em soja no Estado do Paraná, via melhoramento. Pesq Agropec Bras 25:89-94.
    • Ude GN, Kenworthy WJ, Costa JM, Cregan PB and Alvernaz J (2003) Genetic diversity of soybean cultivars from China, Japan, North America, and North American ancestral lines determined by amplified fragment length polymorphism. Crop Sci 43:1858-1867.
    • Vello NA, Hiromoto DM and Azevedo-Filho AJBV (1988) Coefficient of parentage and breeding of Brazilian soybean germplasm. Braz J Genet 11:679-697.
    • Williams JGK, Kubelik AR, Livak KJ, Rafalski JA and Tingey SV (1990) DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucl Acids Res 18:6531-6535.
    • Zhu SL, Monti LM, Avitabile A and Rao R (1999) Evaluation of genetic diversity in Chinese soyabean germplasm by AFLP. Plant Genet Res Newsletter 119:10-14.


    Publication Dates

    • Publication in this collection
      21 Nov 2006
    Sociedade Brasileira de Genética Rua Cap. Adelmio Norberto da Silva, 736, 14025-670 Ribeirão Preto SP Brazil, Tel.: (55 16) 3911-4130 / Fax.: (55 16) 3621-3552 - Ribeirão Preto - SP - Brazil