Characterization of Brazilian soybean cultivars using microsatellite markers

Microsatellite markers, or simple sequence repeats (SSR), have proved to be an excellent tool for cultivar identification, pedigree analysis and the evaluation of genetic distance among organisms. Soybean cultivars have been characterized mainly by morphological and biochemical traits. However, these traits have not been sufficient to characterize the large number of cultivars eligible to receive protection under the Brazilian Cultivar Protection Act. In order to define new soybean cultivar markers, the alleles of twelve SSR loci of 186 Brazilian soybean cultivars were studied by estimating the variation in their size range and their respective frequencies. On average, 5.3 alleles per locus were detected, with a mean genetic diversity of 0.64 ± 0.12. These loci were used to distinguish cultivars within morphologically similar groups, which presented a mean similarity coefficient of 0.46; their use made it possible to determine 184 profiles for the 186 cultivars. A dendrogram based on the SSR loci profiles showed good agreement with the cultivar pedigree information.


Introduction
Soybean, Glycine max (L.) Merrill, a legume native to China, is currently one of the most important crops worldwide. Brazil is the second largest producer, with a cultivated area of 13.68 million hectares and 37.8 million tons harvested in 2000/2001 (http://www.conab.gov.br). The importance of soybean in Brazilian agriculture is due partly to suitable climate and soil management, but particularly to the great number of improved cultivars. For the 2000/2001 harvest alone, the Brazilian Agricultural Research Corporation (Embrapa) listed about 259 soybean cultivars, adapted to the most diverse producing regions in Central Brazil (Embrapa, 2000). This figure has increased year after year, with new, more productive, pathogen-resistant cultivars in both consolidated and expanding cropping areas.
Along with the development of new cultivars, there has been growing interest in their genetic characterization for the commercial protection provided by the Brazilian Cultivar Protection Act (1997). Among the requirements for the protection of a cultivar, the Act states that the cultivar has to be reliably distinct, homogeneous and stable.
Plant breeders have traditionally used morphological and biochemical traits to register and protect their varieties. Although these traits remain predominant and important, they have limitations, particularly in closely related cultivars. In plants with a narrow genetic base, such as soybean, they may not be sufficient, given the large number of cultivars eligible for protection. In such cases, molecular descriptors can provide additional information on the characterization, degree of diversity and genetic constitution of the existing germplasm.
Microsatellites, or SSR, are sequences of a few adjacent, repeated base pairs, well distributed over the eukaryotic genome (Powell et al., 1996). Variations in the number of repeats can be detected by polymerase chain reaction (PCR), using primers (20 to 30 base pairs) designed specifically for amplification and complementary to the unique sequences flanking the microsatellite. These markers have been used for genotypic identification of many plant species, such as soybean (Cregan et al., 1994; Diwan and Cregan, 1997; Rongwen et al., 1995; Maughan et al., 1995; Song et al., 1999), grape (Vitis vinifera L.) (Thomas and Scott, 1993), rapeseed (Brassica napus L.) (Kresovich et al., 1995), apple (Malus x domestica Borkh.) (Hokanson et al., 1998), and many others.
A high level of polymorphism in SSR loci has been reported for soybean. Akkaya et al. (1992) detected an average of seven alleles at each of three microsatellite loci studied in a group of 43 soybean genotypes. Morgante and Olivieri (1994) detected similar levels of allelic diversity in seven SSR loci in a group of 61 genotypes. Rongwen et al. (1995) reported 11 to 26 alleles at seven loci in a group of 96 soybean cultivars and plant introductions (PIs). Maughan et al. (1995) detected 79 alleles across five SSR loci in a sample of 94 accessions of G. max and G. soja. Using 12 microsatellite primers, Doldi et al. (1997) found two to six alleles per locus in a group of 18 soybean cultivars. Narvel et al. (2000), using 72 microsatellite loci, detected a total of 397 alleles in 79 elite soybean cultivars and PIs.
Using 20 SSR markers, Diwan and Cregan (1997) were able to distinguish the 35 soybean genotypes that accounted for about 95% of the alleles present in North-American soybean. They detected an average of 10.1 alleles per locus, and concluded that the stuttering associated with dinucleotide loci made it harder to define the main peak of the allele used to establish its size; they therefore suggested the use of trinucleotide loci for cultivar identification. Song et al. (1999) selected a group of 13 trinucleotide SSR loci to characterize morphologically similar cultivars, and standardized the identification of North-American soybean cultivars with this group of loci.
The objective of the present study was to determine the number of alleles and the gene diversity of trinucleotide loci in a group of soybean cultivars suitable for cultivation in Brazil, and to select a set of loci that yields a distinct profile for each cultivar.

Soybean plant material and DNA isolation
A group of 186 elite soybean cultivars, developed and released by Brazilian public and private institutions, was selected to represent the complete range of cultivars grown in Brazil. Seeds of each of the 186 cultivars were obtained from the Embrapa-Soybean Germplasm Collection. The cultivars are listed in Table I.
Thirty to fifty plants of each soybean cultivar were grown in a greenhouse for DNA isolation. The equivalent of 30 leaf tissue samples was collected from each cultivar, frozen in liquid nitrogen and lyophilized for 1-2 days. DNA was isolated from the bulked lyophilized leaf tissue of the plants of each cultivar by a mini-prep procedure based on Doyle and Doyle (1990). DNA quality and concentration were evaluated by electrophoresis in 0.8% agarose gel stained with ethidium bromide (EtBr).

Morphological and genealogical traits of the cultivars
The pedigrees and some morphological traits of the soybean cultivars were recorded from a literature search and from information received from Embrapa-Soybean and private breeders. The 186 cultivars were divided into 15 groups, according to similarities in hypocotyl color (green or purple), flower color (white or purple), pubescence color (gray or brown) and hilum color (buff, brown, yellow, black and imperfect black). The groups are designated with Roman numerals, as shown in Table I.

SSR loci
Twelve pairs of soybean primers flanking the microsatellite regions, previously developed and published by Cregan et al. (1999), were used.

PCR amplification of SSR loci
PCR amplification was performed on each of the 186 soybean genotypes, using primers for each SSR locus. Reaction mixtures contained 30 ng of soybean genomic DNA, 0.2 µM 3' and 5' end primers, 200 µM of each nucleotide, 1× PCR buffer containing 50 mM KCl, 10 mM Tris-HCl pH 8.9, 2.0 mM MgCl2, and 1 unit of Taq DNA polymerase, in a total volume of 25 µL. For primers Satt002, Satt005 and Satt009, the MgCl2 concentration was changed to 2.5 mM for better amplification. A thermal cycler (PCR Machine Robocycler, Stratagene) was programmed for 2 min at 94 °C, followed by 32 cycles of 1 min at 94 °C, 1 min at 47 °C and 1 min at 72 °C, and a final cycle of 10 min at 72 °C.
Amplification products were separated in denaturing gels containing 10% polyacrylamide, 8 M urea and 1× TBE, for approximately 4 h at 15 mA. The size of each band was estimated by comparison with a 25-bp DNA Ladder (Life Technologies-Gibco BRL). Amplified SSR fragments of different sizes were considered different alleles. The fragments were detected by silver staining, following the protocol of Sanguineti et al. (1994).

Statistical analysis
The gene diversity (Weir, 1990) was calculated as $1 - \sum_{j} P_{ij}^{2}$, where $P_{ij}$ is the frequency of the $j$th allele at the $i$th locus, summed across all alleles at the locus.
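As an illustration of this calculation, the following minimal Python sketch computes the gene diversity of a single locus from a list of scored allele sizes. The function name `gene_diversity` and the example allele sizes are hypothetical and are not taken from the study data; the sketch also assumes that each list entry is one allele observation.

```python
from collections import Counter


def gene_diversity(alleles):
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical allele sizes (bp) scored at one locus; not data from this study.
example_locus = [141, 141, 141, 141, 150, 150, 156, 156]

# Frequencies are 0.5, 0.25 and 0.25, so the result is 1 - 0.375 = 0.625.
print(round(gene_diversity(example_locus), 3))
```

A locus with a single allele gives a value of zero, and the value approaches 1 as the number of equally frequent alleles increases.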
A genetic dissimilarity coefficient was calculated for each pair of cultivars, according to Diwan and Cregan (1997), to determine the effectiveness of the group of twelve SSR loci in distinguishing each of the 186 cultivars. These authors state that in elite soybean cultivars (that are often derived from identical plants), 12.5% and 6.25% of loci remain heterozygous in the F4 and F5 generations, respectively, and that such heterozygosity might be expressed as a mixture of two different homozygotes in later generations. Therefore, they suggest that segregating bulks should be taken into account in the identification of soybean cultivars. They describe a computer program that compares each pair of loci and assigns them either similarity or dissimilarity values. In order to obtain a dendrogram with significance values, the bootstrap procedure was applied to the original data set, generating 100 resampled data sets by sampling the 12 loci with replacement, as suggested by Felsenstein (1985). For each data set, Microsoft Excel, Version 5.0, was used to build a spreadsheet in which each locus of two cultivars scored 1.0 if they shared the same alleles, that is, if both alleles had the same size; 0.5 if only one of the alleles was the same; and 0 if they had no alleles in common. These values were used to calculate a simple genetic dissimilarity coefficient (1 - Score/12) between each pair of cultivars. The 100 matrices of genetic dissimilarity coefficients were used to construct a consensus UPGMA (Unweighted Pair-Group Method using Arithmetic Average) dendrogram, using the NEIGHBOR and CONSENSE programs contained in the PHYLIP package (Phylogeny Inference Package), Version 3.57c (Felsenstein, 1989). The capacity of the markers to distinguish between morphologically similar groups was also determined by calculating the genetic similarity coefficients (1 - genetic dissimilarity coefficient) of each pair of cultivars in 14 out of the 15 groups shown in Table I, since one of the identified groups consisted of only one cultivar.
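The scoring rule and the resampling of loci described above can be sketched in Python as follows. This is only an illustrative reconstruction, not the spreadsheet or the program used in the study: the function names, the representation of each genotype as a pair of allele sizes, and the three-locus example profiles are all assumptions made for the example.

```python
import random


def locus_score(a, b):
    """Score one locus for two cultivars: 1.0, 0.5 or 0.0.

    Each genotype is a pair of allele sizes; a homozygous cultivar is
    written as (x, x) and a segregating bulk as (x, y).
    """
    if sorted(a) == sorted(b):
        return 1.0  # both alleles have the same size
    if set(a) & set(b):
        return 0.5  # only one allele is shared
    return 0.0      # no allele in common


def dissimilarity(profile_a, profile_b, loci):
    """Simple dissimilarity: 1 - (sum of locus scores) / (number of loci)."""
    score = sum(locus_score(profile_a[i], profile_b[i]) for i in loci)
    return 1.0 - score / len(loci)


def bootstrap_loci(n_loci, rng):
    """Draw n_loci locus indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_loci) for _ in range(n_loci)]


# Hypothetical three-locus profiles (the study used twelve loci).
cv_a = [(141, 141), (200, 200), (310, 310)]
cv_b = [(141, 141), (200, 206), (316, 316)]

all_loci = range(len(cv_a))
d = dissimilarity(cv_a, cv_b, all_loci)  # scores 1.0 + 0.5 + 0.0, so 1 - 1.5/3

# 100 bootstrap replicates, each resampling the loci with replacement.
rng = random.Random(0)
replicates = [bootstrap_loci(len(cv_a), rng) for _ in range(100)]
boot_d = [dissimilarity(cv_a, cv_b, rep) for rep in replicates]
```

In a full analysis, each replicate would yield one complete matrix of pairwise coefficients, and the 100 matrices would then be passed to a clustering program to obtain the consensus dendrogram.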

SSR polymorphism in 186 soybean cultivars
All 12 SSR loci were polymorphic, as shown in Table II. The number of alleles per locus varied from four to eight, with an average of 5.3 alleles per locus, distributed among the 186 cultivars. Seventy-five percent of the 64 detected alleles had frequencies lower than 0.25, and the remaining 25% had frequencies equal to or higher than 0.25. Only one allele, in Satt102, showed a frequency higher than 0.75, and two alleles had frequencies lower than 0.01, one in locus Satt005 and the other in locus Satt002. These values confirm the good distribution and representativeness of the alleles in the studied sample. The genetic diversity (GD), which indicates how informative the SSR loci are, was also relatively high, ranging from 0.41 to 0.82, with a mean value of 0.64 ± 0.12.
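The summary figures in this paragraph follow directly from the totals reported, as the short Python check below shows. The variable names are illustrative only; the inputs are the 64 alleles and 12 loci given in the text.

```python
n_loci = 12
n_alleles = 64

mean_alleles = n_alleles / n_loci    # 64 / 12 = 5.33...
n_rare = round(0.75 * n_alleles)     # alleles with frequency below 0.25
n_common = n_alleles - n_rare        # alleles with frequency of 0.25 or more

print(f"{mean_alleles:.1f} alleles per locus; {n_rare} below 0.25, {n_common} at or above")
```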
The 12 SSR loci provided 184 profiles for the 186 studied cultivars. The four non-distinguished cultivars were Embrapa 1 (IAS 5 RC) with regard to RS 9 (Itaúba), and FT 103 with regard to FT 104. Embrapa 1 (IAS 5 RC) and RS 9 (Itaúba) derive from IAS 5. The first one resulted from backcrossing to IAS 5 for five generations, and the second one, from a cross between FT 2 and IAS 5. Although their origin is unknown, the two other cultivars, FT 103 and FT 104, were developed by the same institution by crossing many progenitors (bulk), which does not exclude the possibility that they have similar origins. Another point to be noted concerning these similar cultivars is that the alleles of the 12 loci which constituted their profiles were precisely the most frequent ones, although the probability of finding identical individuals at random in this sample was practically null.
The genetic dissimilarity coefficients found in the cultivar comparison matrix were relatively high. The distribution analysis of the 17,205 pairwise comparisons (Figure 1) revealed extreme values: zero indicated identical cultivars, and 1 indicated completely different cultivars. However, most of the values lay between 0.4 and 0.9, indicating dissimilarity rather than similarity among the cultivars.
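The number of pairwise comparisons is the number of unordered pairs that can be formed from the 186 cultivars, which can be verified with a one-line Python check (the variable names are illustrative):

```python
from math import comb

n_cultivars = 186
n_pairs = comb(n_cultivars, 2)  # 186 * 185 / 2

print(n_pairs)  # 17205
```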
The 12 SSR loci were also successful in distinguishing cultivars with identical morphological traits (Table III). The mean similarity value among cultivars belonging to the same group was 0.46. There were totally different cultivars in the same group (coefficient 0.0), as in groups seven and nine, but the average for the minimum similarity values of all groups was 0.25. Although completely similar cultivars (coefficient 1.0) were present in groups 1 and 3, as mentioned above, the mean of the maximum similarity values was 0.81. Therefore, out of the 24 possible comparisons between two morphologically similar cultivars (12 loci × 2 possible alleles), 11 were observed to be identical, on average.

Germplasm
The consensus tree relating the 186 cultivars based on the twelve SSR loci (Figure 2) expresses the distinction of groups with maximum and minimum similarities. The results were also highly consistent with the ancestry of the groups, and identified groups with some degree of relatedness. For instance, cultivars FT Eureka, Ocepar 8, BRSMG Virtuosa, and almost all the cultivars in the group named Paraná are in the same cluster as Paraná. Furthermore, all of them descended either from Paraná or from a selection of it. For the same reason, IAC 8, IAC 8-2, BR IAC 21, IAC 17, IAC 18, CAC 1, and CS 303 are in the same group as their ancestor Bragg. Similarly, other groups contain small sets of cultivars, all of them related to the same common ancestor, as shown in Figure 2.
Many of the ancestral genotypes mentioned are themselves related to some degree. Dourados, for instance, is a selection of Andrews, which, in turn, is a selection of Santa Rosa. Bragg and FT Cristalina share a common parent, D492491; Paraná and IAS 5 also share a common parent, Hill. All of them are ancestors of other groups.
There was only about 10% discrepancy between the dendrogram and the recorded pedigrees, such as the inclusion of MG/BR 46 Conquista in the group containing Santa Rosa, of BRSMS Piracanjuba, FT Estrela and RB 603 in the IAS 5 group, as well as all the X-marked cultivars in Figure 2. This incongruity, along with the lack of a common parent in some clusters, may mean either that there is no relationship with the indicated ancestral genotype or that precise pedigree data are lacking.
Except for the genetic relationships of a variety being selected from another or pedigree relationships, the analysis did not show any correlation with growing habits, similar morphology or geographical origin among the groups.
The dendrogram also revealed which North-American varieties contributed most effectively to the formation of this group of Brazilian cultivars. Santa Rosa, D492491 (sister line of Lee), Hill, Davis and Hood were most frequently used as parents, since they were directly or indirectly identified in most of the clusters.

Discussion
The polymorphism of SSR loci detected in this study was consistent with previous studies by Akkaya et al. (1992), Morgante and Olivieri (1994), Maughan et al. (1995), Doldi et al. (1997) and Narvel et al. (2000), but lower than that obtained by Rongwen et al. (1995) and Diwan and Cregan (1997). One possible reason for this difference is that the materials used in the present study were all from breeding programs, thus having a relatively narrow genetic base. In a study on genetic diversity in soybean, 11 to 26 alleles per microsatellite primer pair were amplified from 96 soybean genotypes, but this number was reduced by five to 10 alleles per primer pair in 26 cultivars from North-American breeding programs (Rongwen et al., 1995). The obtained gene diversity (GD) was in agreement with the data of Rongwen et al. (1995), who found a mean value of 0.74 in a group of 96 soybean genotypes. It is in line with the results of Diwan and Cregan (1997), who found mean GD values close to 0.69 in a group of 36 commercial soybean lines, and in agreement with the data of Narvel et al. (2000), who detected a mean value of 0.50 ± 0.02 in a group of 39 elite cultivars.
The presence of low-frequency alleles in some SSR loci, as observed in Satt002 and Satt005, may reflect the soybean microsatellite mutation rate, estimated at 10⁻⁵ to 10⁻⁴ per generation (Diwan and Cregan, 1997). These authors argued that such a rate is similar to the human rate, and that it should not be a hindrance to the use of SSR for cultivar identification. They also stated that soybean cultivars should be described for identification based on a bulk of 30 to 50 plants, since possible mutant alleles would not be detected and, therefore, mutations in isolated plants would not alter the allelic constitution of the cultivar. However, Song et al. (1999), using this procedure, detected 10 new alleles in 66 soybean cultivars that were not present in the 35 ancestral lines; and Narvel et al. (2000) recorded 32 alleles specific to elite cultivars, within a total of 397 alleles that had been detected in 40 lines and in 39 soybean cultivars.
The genetic dissimilarity coefficient derived from the 12 studied loci presented a mean value of 0.63, which means that, on average, two genotypes differed in 15 alleles. Table III shows that, even in groups which are similar for certain morphological traits, the mean similarity value obtained was 0.46, or 11 common alleles. These results support the usefulness of the loci for distinguishing the assayed cultivars.
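The conversion from a coefficient to a number of alleles used here can be made explicit with a short Python sketch. It assumes, as in the results, that each pairwise comparison involves 12 loci with two alleles each (24 allele comparisons); the function name is illustrative.

```python
N_LOCI = 12
ALLELES_PER_LOCUS = 2


def alleles_from_coefficient(coefficient, n_loci=N_LOCI):
    """Scale a coefficient (0 to 1) to a number of alleles out of 2 * n_loci."""
    return coefficient * n_loci * ALLELES_PER_LOCUS


differing = alleles_from_coefficient(0.63)  # mean dissimilarity, about 15.12
shared = alleles_from_coefficient(0.46)     # mean within-group similarity, about 11.04

print(round(differing), round(shared))  # 15 11
```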
The existence of non-distinguished cultivars in the sample may reflect the narrow genetic base of Brazilian soybean germplasm. Hiromoto and Vello (1986) had already reported that, in that year, all recommended cultivars derived from only 26 ancestral genotypes, nine of which were responsible for more than 80% of that gene pool, with only four of them being responsible for 50% of it. This picture had not changed much in the following years, since Abdelnoor et al. (1995) did not find much variation (14.2 to 20.5%) in the genetic distances among 38 Brazilian soybean cultivars, as estimated by RAPD molecular markers.
The obtained data suggest that this group of 12 microsatellite loci can be used to distinguish Brazilian soybean cultivars from each other, inasmuch as 98.9% of the assayed cultivars could be identified. Furthermore, cultivars identical for some morphological traits could be distinguished by the same SSR loci in 12 out of the 14 established groups. Despite the existence of four non-distinguished cultivars, which, as mentioned above, were closely related in their formation, the use of these 12 SSR loci may be a feasible alternative for identifying and evaluating soybean cultivars to be protected.

Figure 2 - UPGMA consensus dendrogram relating 186 soybean cultivars. Genetic distances were based on information for 12 microsatellite loci and calculated for each pairwise comparison according to Diwan and Cregan (1997). The brackets on the right indicate the common parent identified in a cluster. X marks refer to cultivars present in a cluster that do not correspond to its common parent. Question marks indicate cultivars with no information on their pedigree. All bootstrap values out of 100 replicates are shown at the corresponding forks.

Table I - The soybean cultivars used in the present study.

Table II - Linkage group, allele size range, number, frequency, and gene diversity of 12 SSR loci in 186 soybean cultivars.
Figure 1 - Distribution of genetic distances calculated for 17,205 pairs of genotypes.

Table III - Mean, maximum and minimum similarity coefficients calculated between cultivars within morphologically identical groups based on 12 SSR loci.