Genetic diversity in soybean germplasm identified by SSR and EST-SSR markers

The objectives of this work were to investigate the genetic variation in 79 soybean (Glycine max) accessions from different regions of the world, to cluster the accessions based on their similarity, and to test the correlation between the two types of markers used. Simple sequence repeat markers present in genomic (SSR) and in expressed regions (EST-SSR) were used. Thirty SSR primer-pairs were selected (20 genomic and 10 EST-SSR) based on their distribution on the 20 genetic linkage groups of soybean, on their trinucleotide repetition unit and on their polymorphism information content. All analyzed loci were polymorphic, and 259 alleles were found. The number of alleles per locus ranged from 2 to 21, with an average of 8.63. The accessions exhibited a large number of rare alleles, with genotypes 19, 35, 63 and 65 carrying the greatest number of exclusive alleles. Accessions 75 and 79 were the most similar, and accessions 31 and 35, and 40 and 78, were the most divergent. A low correlation between SSR and EST-SSR data was observed; thus, both genomic and expressed microsatellite markers are required for an appropriate analysis of genetic diversity in soybean. The genetic diversity observed was high and allowed the formation of five groups and several subgroups. A moderate relationship between genetic divergence and geographic origin of accessions was observed.


Introduction
The perception that current soybean cultivars are extremely uniform is corroborated by various studies based on inbreeding coefficient analysis and by studies assessing genetic variability with molecular markers (Hiromoto & Vello, 1986; Priolli et al., 2002; Bonato et al., 2006). These studies showed that a few accessions have contributed the majority of the genes in current cultivars, and that the genetic diversity in soybean elite germplasm is limited.
The narrow genetic base is a major constraint in breeding programs, because it is associated with a lack of genetic variability, cultivar susceptibility to pathogens and herbivores, and yield plateaus (Martin, 2000; Fu, 2006).
Introducing novel germplasm sources in breeding programs, such as plant introductions (PIs), may provide the necessary genetic variability for the continuous development and adaptation of cultivars to biotic and abiotic factors. Therefore, plant germplasm is a natural source to broaden the current soybean genetic base (Chung & Singh, 2008). The potential of soybean breeding is enormous, since only a small fraction of the existing accessions in germplasm collections currently contributes to the genetic base of the present cultivars.
The expansion of the soybean genetic base may lead to the introduction of new favorable alleles for polygenic traits (Brown-Guedira et al., 2000; Guzman et al., 2007). Considering the large number of genes hypothesized to be involved in the control of agronomic characteristics, it is unlikely that modern cultivars have concentrated the best alleles at all loci of economic interest. Undoubtedly, several favorable alleles were lost through genetic bottlenecks during soybean domestication and introduction into producing regions. The choice of accessions to be incorporated in a breeding program must include those carrying and transmitting favorable rare alleles that are absent from elite germplasm. Consequently, knowledge of the sources of such alleles is invaluable. Accessions highly dissimilar to elite genotypes are likely to provide novel alleles for the traits of interest.
The challenge is to select which accessions to use in breeding programs from the available germplasm. Therefore, knowledge of the genetic variation within accessions from germplasm collections is essential to choose strategies to incorporate useful diversity into the program, to facilitate the introgression of genes of interest into commercial cultivars, to understand the evolutionary relations among accessions, to better sample germplasm diversity, and to increase conservation efficiency (Fu, 2003).
Previous studies have used molecular markers to help identify genetically divergent accessions (Lee et al., 2008; Li et al., 2009). Microsatellite or simple sequence repeat (SSR) markers are considered useful for these approaches, due to their effectiveness in genealogy analysis and in the assessment of genetic diversity among organisms (Narvel et al., 2000; Kuroda et al., 2009). The use of functional molecular markers, such as those developed from expressed sequence tags (EST), allows direct access to the population diversity in genes of agronomic interest, facilitating the association between genotype and phenotype.
The objectives of this work were to analyze the genetic diversity of 79 soybean accessions from distinct geographic regions of the world, to cluster them into groups according to genetic similarity, and to test the correlation between the two types of markers used.

Materials and Methods
Seventy-nine soybean accessions, obtained from the Embrapa Soja germplasm bank, were selected according to their geographical origin to represent distinct geographical regions of the world. The selection was also based on variability groups, as defined in previous studies based on geographic distances or molecular analysis (Hymowitz & Kaizuma, 1981; Perry & McIntosh, 1991; Abe et al., 2003). The accessions were numbered from 1 to 79 (Table 1), and these numbers identify the accessions throughout the work. Accession seeds were germinated and the seedlings were cultivated for DNA extraction in a greenhouse, at the Departamento de Genética da Escola Superior de Agricultura Luiz de Queiroz (ESALQ/USP), in Piracicaba, São Paulo State, Brazil. Pots were fertilized following the technical recommendations for soybean. Overhead irrigation was used to ensure the establishment of the seedlings, and the temperature was kept under 30°C.
Twenty days after germination, plant DNA was extracted in bulk from young fresh leaves of a group of five seedlings, using the CTAB method, as described by Doyle & Doyle (1990). Quality and concentration of DNA were determined by comparison to DNA standard markers using SYBR Safe staining (Invitrogen, Carlsbad, USA) on 1% (w/v) agarose gels. After quantification, DNA concentrations were adjusted to 10 ng µL⁻¹.
Thirty SSR primer-pairs were selected: 20 corresponding to genomic SSR and 10 to EST-SSR (Table 2). The primer-pairs were selected based on their distribution on the 20 soybean genetic linkage groups, on their trinucleotide repetition unit, and on their polymorphism information content (PIC) found in previous studies (Cregan et al., 1999). The twenty genomic SSR allowed effective coverage of the whole soybean genome. The use of EST-SSR markers permits direct investigation of candidate genes involved in metabolic pathways and their association with important agronomic traits. Genomic SSRs are generally more polymorphic than SSR markers derived from EST (Song et al., 2004). Therefore, the two marker types were combined to better investigate the genetic diversity and, simultaneously, to search for candidate genes.
Table 1. Accession number, plant introduction code and geographic origin of the 79 soybean accessions investigated in the present study.
Table 2. Genomic (the first 20 loci) and functional (the remaining 10 loci) microsatellite primers used to assess the genetic diversity among 79 soybean accessions (1).
Pesq. agropec. bras., Brasília, v.45, n.3, p.276-283, mar. 2010
PCR amplifications were performed in a 15-μL final volume containing 20 ng of template DNA, 0.2 μmol L⁻¹ of each forward and reverse primer, 200 μmol L⁻¹ of each dNTP, 1.5 mmol L⁻¹ MgCl₂, 10 mmol L⁻¹ Tris-HCl (pH 8.9), 50 mmol L⁻¹ KCl and 1.5 U Taq DNA polymerase (Invitrogen, São Paulo, Brazil). Reactions were performed in a Bio-Rad thermocycler (MyCycler, Bio-Rad, USA) as follows: 94°C for 2 min; 32 cycles of 94°C for 1 min, the annealing temperature specific to each primer pair (Table 2) for 1 min, and extension at 72°C for 1 min; and a final elongation step at 72°C for 10 min. Amplification products were separated by electrophoresis on 7% (w/v) denaturing polyacrylamide gels, with 7 mol L⁻¹ urea and 1X TBE, at constant power (70 W), for approximately 3 to 5 hours, run alongside a 10-bp ladder as a size standard, and silver-stained according to Creste et al. (2001). Amplified fragments of distinct sizes were considered to be different alleles.
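The reaction composition and cycling program can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that the template was pipetted from the 10 ng per µL working dilution described for the extraction step, it reads each of the 32 cycles as three 1-min steps, and it ignores ramping times. Variable names are hypothetical.

```python
# Quantities per 15-uL reaction, derived from the stated final concentrations.
REACTION_VOLUME_UL = 15.0
TEMPLATE_NG = 20.0
DNA_WORKING_NG_PER_UL = 10.0  # assumption: template taken from the adjusted dilution

# Final concentrations in umol/L (1.5 mmol/L MgCl2 = 1500 umol/L).
final_umol_per_l = {"each primer": 0.2, "each dNTP": 200.0, "MgCl2": 1500.0}

# Volume of DNA solution needed to deliver 20 ng of template.
template_volume_ul = TEMPLATE_NG / DNA_WORKING_NG_PER_UL

# umol/L multiplied by uL gives pmol.
amounts_pmol = {name: conc * REACTION_VOLUME_UL
                for name, conc in final_umol_per_l.items()}

# Programmed hold time: initial denaturation + 32 three-step cycles + final elongation.
program_minutes = 2 + 32 * (1 + 1 + 1) + 10

print(template_volume_ul, amounts_pmol, program_minutes)
```

Under these assumptions each reaction receives 2 µL of template and the programmed holds add up to 108 min.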
Allelic and genotypic frequencies for each locus were calculated using the TFPGA software (Miller, 1997). The number of alleles per locus (A), expected heterozygosity ($H_e$) and observed heterozygosity ($H_o$) were estimated with the GDA software (Lewis & Zaykin, 2000). The polymorphism information content (PIC), a measure of allelic diversity at a given locus, was calculated for all 30 loci according to the formula given by Wang et al. (2008): $\mathrm{PIC}_i = 1 - \sum_{j=1}^{n} P_{ij}^2$, where $P_{ij}$ is the frequency of the j-th allele for the i-th marker and n is the number of alleles at that marker.
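The per-locus statistics can be illustrated with a minimal sketch. It assumes the commonly used form of PIC, one minus the sum of squared allele frequencies, and diploid genotypes coded as pairs of fragment sizes. It is not the TFPGA or GDA implementation, and the function name and example data are hypothetical.

```python
from collections import Counter

def locus_summary(genotypes):
    """Summarize one locus from a list of (allele, allele) tuples, one per accession."""
    alleles = [allele for pair in genotypes for allele in pair]
    counts = Counter(alleles)
    total = len(alleles)
    freqs = {allele: n / total for allele, n in counts.items()}
    # PIC in the simplified form: 1 minus the sum of squared allele frequencies.
    pic = 1.0 - sum(p * p for p in freqs.values())
    # Observed heterozygosity: share of accessions carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return {"A": len(freqs), "freqs": freqs, "PIC": pic, "Ho": ho}

# Hypothetical fragment sizes (bp) for four accessions at a single locus.
example = [(150, 150), (150, 153), (153, 153), (156, 156)]
summary = locus_summary(example)
print(summary["A"], summary["PIC"], summary["Ho"])
```

In this simplified form PIC coincides with the uncorrected expected heterozygosity, so a separate estimator would be needed to reproduce a sample-size-corrected value.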
To assess the genetic relationships among the accessions, Rogers-W dissimilarity matrices were calculated and clustered using NTSYSpc software, version 2.2 (Rohlf, 2005), employing the unweighted pair-group method with arithmetic mean (UPGMA) to generate the dendrograms.
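The clustering step can be sketched in a few lines. The example assumes that Rogers-W denotes the Rogers distance as modified by Wright, that is, the square root of the summed squared allele-frequency differences divided by twice the number of loci, and it uses a plain UPGMA loop. It is not the NTSYSpc code, and all names and data are hypothetical.

```python
import math

def allele_freqs(genotype):
    """Within-accession allele frequencies at one locus, e.g. (150, 153) -> 0.5 each."""
    freqs = {}
    for allele in genotype:
        freqs[allele] = freqs.get(allele, 0.0) + 1.0 / len(genotype)
    return freqs

def rogers_w(acc_x, acc_y):
    """Modified Rogers distance between two accessions typed at the same loci."""
    total = 0.0
    for geno_x, geno_y in zip(acc_x, acc_y):
        fx, fy = allele_freqs(geno_x), allele_freqs(geno_y)
        for allele in set(fx) | set(fy):
            total += (fx.get(allele, 0.0) - fy.get(allele, 0.0)) ** 2
    return math.sqrt(total / (2 * len(acc_x)))

def upgma(names, dist):
    """Return the merge history as (cluster_x, cluster_y, height) tuples."""
    clusters = [(name,) for name in names]
    d = {frozenset([(a,), (b,)]): dist[frozenset([a, b])]
         for i, a in enumerate(names) for b in names[i + 1:]}
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        x, y = sorted(pair)
        merges.append((x, y, d[pair]))
        merged = x + y
        clusters = [c for c in clusters if c not in (x, y)]
        for c in clusters:
            # Size-weighted mean equals the average over all member pairs.
            dxc = d.pop(frozenset([x, c]))
            dyc = d.pop(frozenset([y, c]))
            d[frozenset([merged, c])] = (len(x) * dxc + len(y) * dyc) / len(merged)
        del d[pair]
        clusters.append(merged)
    return merges

# Hypothetical genotypes: four accessions, two loci each.
data = {
    "A": [(100, 100), (200, 200)],
    "B": [(100, 100), (200, 200)],
    "C": [(100, 100), (210, 210)],
    "D": [(110, 110), (220, 220)],
}
names = list(data)
dist = {frozenset([a, b]): rogers_w(data[a], data[b])
        for i, a in enumerate(names) for b in names[i + 1:]}
merges = upgma(names, dist)
print(merges)
```

Cutting the resulting merge history at a chosen height, as done with the dendrogram cutoff, yields the groups.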
Genetic dissimilarity matrices obtained from genomic and EST-SSR data were compared to measure the degree of relationship between them, by computing the correlation (r) and the Mantel test statistic (Z), with 1,000 permutations, using NTSYSpc software, version 2.2 (Rohlf, 2005). The comparison aimed to determine whether genomic and EST-SSR data yield concordant estimates of genetic dissimilarity.
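The matrix comparison can be illustrated with a minimal permutation sketch. It assumes the usual definitions: Z is the sum of products of corresponding off-diagonal elements, r is their Pearson correlation, and significance is obtained by jointly permuting the rows and columns of one matrix. The one-sided p-value convention and all names are assumptions, not details of the NTSYSpc routine.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mantel(mat_a, mat_b, permutations=1000, seed=0):
    """Return (Z, r, one-sided p) for two symmetric dissimilarity matrices."""
    n = len(mat_a)
    pairs = list(combinations(range(n), 2))
    a = [mat_a[i][j] for i, j in pairs]

    def stats(order):
        # Reorder rows and columns of mat_b together, then compare with mat_a.
        b = [mat_b[order[i]][order[j]] for i, j in pairs]
        return sum(x * y for x, y in zip(a, b)), pearson(a, b)

    z_obs, r_obs = stats(list(range(n)))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if stats(order)[1] >= r_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return z_obs, r_obs, p_value

# Hypothetical matrices: the second is exactly twice the first.
m1 = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
m2 = [[2 * v for v in row] for row in m1]
z, r, p = mantel(m1, m2)
print(z, r, p)
```

Because the two example matrices are proportional, r is essentially 1; real genomic and EST-SSR matrices would give a lower value.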

Results and Discussion
All analyzed loci were polymorphic (Table 2). This result was expected due to the wide geographic distribution of the accessions. The primer-pairs used showed 259 alleles distributed throughout 30 loci. The number of alleles per locus ranged from 2 to 21, with an average of 8.63. Among the genomic SSR, the locus Sat_001 from the linkage group (LG) D2 had the highest number of alleles (21), whereas Satt126, from LG B2, presented the lowest number of alleles (3). Among the EST-SSR, locus GYGY had the highest number of alleles (14), whereas loci AW508 (LG L) and PHYA1 had only two different alleles. Similar results were found by Fu et al. (2007), when they analyzed Canadian soybean cultivars and exotic germplasm using SSR markers.
Allelic frequencies were calculated for each locus, and all investigated alleles exhibited a frequency higher than 1%, characterizing the polymorphism within the population (Table 3). Only 14 alleles exhibited a frequency higher than 50%, in agreement with the great divergence observed between the accessions. The majority of the alleles with frequencies higher than 50% corresponded to EST-SSR loci. The accessions displayed a high frequency of rare alleles: of the total of 259 alleles, 59 were exclusive (Table 4).
The following accessions exhibited exclusive alleles in two loci: 2, 6, 10, 31, 37, 42, 49, 59, 71 and 78. Hyten et al. (2006) concluded that, during soybean domestication, 50% of the genetic diversity and 81% of the rare alleles were lost, and that there were changes in 60% of allelic frequencies. Moreover, the introduction of a few accessions in producing countries might have caused losses of approximately 79% of the rare alleles previously found in domestic populations of soybean. A large number of rare alleles may contribute to soybean breeding, since they are absent from elite cultivars.
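Identifying exclusive alleles amounts to listing, for every locus, the alleles carried by exactly one accession. A minimal sketch under that definition follows; the data layout and names are hypothetical and do not reproduce the software used in the study.

```python
from collections import defaultdict

def exclusive_alleles(data):
    """Map each accession to the (locus index, allele) pairs found in no other accession."""
    carriers = defaultdict(set)  # (locus index, allele) -> accessions carrying it
    for accession, genotypes in data.items():
        for locus, genotype in enumerate(genotypes):
            for allele in set(genotype):
                carriers[(locus, allele)].add(accession)
    result = defaultdict(list)
    for key, accessions in carriers.items():
        if len(accessions) == 1:
            result[next(iter(accessions))].append(key)
    return dict(result)

# Hypothetical genotypes: three accessions, two loci each.
data = {
    "A": [(100, 100), (200, 200)],
    "B": [(100, 100), (200, 203)],
    "C": [(103, 103), (200, 200)],
}
exclusive = exclusive_alleles(data)
print(exclusive)
```

Counting the entries per accession gives the number of loci with exclusive alleles, the quantity used to rank the accessions.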
PIC values for the 30 primer-pairs used ranged from 0.166 at locus PHYA1 to 0.921 at locus Sat_001, with an average of 0.626 (Table 2). For the genomic markers alone, PIC ranged from 0.360 at locus Satt509 to 0.921 at locus Sat_001, with a mean value of 0.714. The PIC value of functional markers ranged from 0.166 at locus PHYA1 to 0.748 at locus PRP1, with an average of 0.450. Therefore, a high mean PIC value for genomic SSR (0.714) and a medium mean value for EST-SSR (0.450) were identified in the present work. These observations indicate great diversity between the accessions and also demonstrate that the selected primers are highly informative and useful for further studies on soybean genetic diversity.
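As a consistency check on the reported figures, the overall mean PIC should equal the locus-weighted mean of the two marker classes, and the mean number of alleles per locus should follow from the allele total. A short worked calculation, using only values reported in the text:

$$\overline{\mathrm{PIC}} = \frac{20 \times 0.714 + 10 \times 0.450}{30} = \frac{18.78}{30} = 0.626, \qquad \bar{A} = \frac{259}{30} \approx 8.63$$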
The dendrogram built from all analyzed loci (genomic and functional), assuming a cutoff point of 0.82, exhibits five groups and several subgroups, in partial agreement with geographic origin (Figure 1). A noteworthy group is constituted by accessions 1, 75, 79, 35, 59, 63, 46, 68, 47 and 69, since all of them are from Eastern Asia (Japan, Korea, Northeast China, Russia, Philippines and Malaysia). Moreover, this group also concentrates the highest frequencies of rare alleles, in agreement with the fact that this geographic region is a center of origin and diversity of soybean. Other groups consistent with their geographic distribution are: Afghanistan (30 and 31) and Pakistan (33); Nepal (15 and 71) and Malaysia (74); USA (49 and 78) and Central China (60). The remaining group is the largest and contains all African and South American accessions, as well as some Asian accessions. Subgroups are identifiable within this large group, and they comprise accessions consistent with geographical origin, such as: China (2 and 17), Japan (42), Korea (3) and Pakistan (26); Vietnam (51, 52 and 64), China (37 and 56) and Japan (41); two very close accessions from South China (62 and 67); all three accessions from South Africa (8, 11 and 16) along with an accession from Mozambique (29); the accessions from Sudan, Uganda and Tanzania (53, 58 and 77); and Liberia (54) and Kenya (48). A significant portion of South American accessions clustered together in a single subgroup: Guatemala (13 and 20), Peru (39 and 70), Argentina (34) and Suriname (25). A small group comprising accessions from El Salvador, Peru and Argentina (9, 12 and 72) was also present.
These groups and subgroups are in agreement with previous studies, which suggested similar clustering patterns (Abe et al., 2003; Hymowitz & Kaizuma, 1981). Abe et al. (2003) proposed a cluster composed of accessions from Japan, Korea and Russia, as shown in the current study. Hymowitz & Kaizuma (1981) found similar results. Furthermore, these authors also indicated the existence of a group from Nepal. Perry & McIntosh (1991) suggested the existence of an African group, in agreement with our findings, since all African accessions were clustered in a single group.
Molecular data, analyzed separately for genomic and functional loci, resulted in slightly different dendrograms, indicating that each marker type captures the variability present in soybean germplasm differently. In fact, the degree of relationship between the matrices derived from genomic and expressed loci, calculated by the Mantel test (Mantel, 1967), was low (r = 0.28**) but significant, which reveals some coherence between the two datasets.

Conclusions
1. The genetic diversity of the investigated accessions is high, distributed over five groups and several subgroups, and exhibits a moderate level of association between genetic divergence and geographical origin of accessions.
2. Genetic diversity of soybean is effectively investigated using both genomic and functional microsatellite markers, which allow a more complete coverage of the existing genetic variation.

Figure 1. Dendrogram representation of the accession clustering. The distances were calculated using dissimilarity matrices of Rogers-W and clustered according to UPGMA.

Table 3. Allelic frequencies, allele distribution on frequency ranges and percentage of the total genetic pool for each range of the 79 soybean accessions.
The exclusive alleles, each present in a single genotype, corresponded to approximately 2.5% of the total genetic pool. For the alleles with frequency lower than 5%, the number increased to 118, representing 8.7% of the total genetic pool. The accessions with the highest number of exclusive alleles were genotypes 19 and 63, each exhibiting exclusive alleles in three loci; accession 65, in four loci; and accession 35, in five loci (Table 4).
(1) Except exclusive alleles.

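Table 4. Soybean accessions exhibiting exclusive alleles and allele number.
Number of loci with exclusive alleles: accessions
1: …, 3, 4, 5, 8, 15, 18, 24, 27, 33, 43, 44, 45, 46, 47, 48, 51, 55, 60, 68, 69, 74 and 76
2: 2, 6, 10, 31, 37, 42, 49, 59, 71 and 78
3: 19 and 63
4: 65
5: 35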