Using genetic diversity information to establish core collections of Stylosanthes capitata and Stylosanthes macrocephala

Stylosanthes species are important forage legumes in tropical and subtropical areas. S. macrocephala and S. capitata germplasm collections that consist of 134 and 192 accessions, respectively, are maintained at the Brazilian Agricultural Research Corporation Cerrados (Embrapa-Cerrados). Polymorphic microsatellite markers were used to assess genetic diversity and population structure with the aim of assembling a core collection. The mean values of HO and HE for S. macrocephala were 0.08 and 0.36, respectively, whereas the means for S. capitata were 0.48 and 0.50, respectively. Rogers' genetic distance varied from 0 to 0.83 for S. macrocephala and from 0 to 0.85 for S. capitata. Analysis with STRUCTURE software distinguished five groups among the S. macrocephala accessions and four groups among those of S. capitata. Nei's genetic diversity among these groups (GST) was 27% in S. macrocephala and 11% in S. capitata. Core collections were assembled for both species. For S. macrocephala, all of the allelic diversity was represented by 23 accessions, whereas only 13 accessions were necessary to represent all allelic diversity for S. capitata. The data presented herein provide evidence of population structure in the Embrapa-Cerrados germplasm collections of S. macrocephala and S. capitata, which may be useful for breeding programs and germplasm conservation.


Introduction
The genus Stylosanthes Sw. (Fabaceae) consists of approximately 48 species distributed throughout the tropical regions of the Americas, Africa and Asia (Costa and Ferreira, 1984; Mannetje, 1984; Kumar and Sane, 2003). Brazil is considered the major center of Stylosanthes diversity, comprising 45% of all the species within this genus (Ferreira and Costa, 1979; Stace and Cameron, 1984). The central region of Brazil is recognized as having the highest phenotypic variation and endemism for this genus (Costa N, 2006, PhD thesis, Universidade Técnica de Lisboa, Lisbon, Portugal).
Some Stylosanthes species are used as pasture legumes and thus have economic importance in tropical and subtropical regions (Edye and Cameron, 1984). Some of these species can also be used for soil improvement through nitrogen fixation, regeneration of degraded wastelands, and for promoting water and soil conservation (Chakraborty, 2004). seeds and dry matter, and its inflorescences have a high nutritional value (Williams et al., 1984; Costa N, 2006, PhD thesis, Universidade Técnica de Lisboa, Lisbon, Portugal).
Major collections of important crop plants are held in gene banks around the world. These collections serve as repositories of the biodiversity available for each species and thus are a valuable resource for genes useful to plant breeders. The efficient maintenance and use of germplasm are commonly restricted by the lack of genetic information and/or by the large numbers of accessions in these collections (Virk et al., 1995). Molecular markers, along with morpho-agronomic data and ecological descriptions of sampling sites, have proven to be relevant for evaluating germplasm (Westman and Kresovich, 1997; Zong et al., 2009). The use of molecular markers can also help to select material for establishing a core collection, i.e., a group of accessions from an existing germplasm collection that is chosen to represent the genetic spectrum of the entire collection (Hao et al., 2006). Microsatellites or simple sequence repeats (SSRs) have proven to be among the most suitable markers for such purposes (Huang et al., 2002; Hao et al., 2006; Landjeva et al., 2006; Wang et al., 2006; Ebana et al., 2008; Blair et al., 2009; Cipriani et al., 2010).
In this study, we evaluated the genetic diversity and population structure in accessions of the Embrapa-Cerrados germplasm collections of S. macrocephala and S. capitata using polymorphic SSRs. Based on this diversity information, we determined the minimum sample size acceptable for a core collection of each species.

DNA extraction and PCR
A total of 326 accessions from the Embrapa-Cerrados germplasm collections were used in this study: 134 accessions of S. macrocephala and 192 of S. capitata (Tables 1 and 2). The SSR markers developed by Santos et al. (2009a) (13 S. macrocephala SSR loci) and Santos et al. (2009b) (15 S. capitata SSR loci) were used to assess the genetic diversity of these accessions.
Total DNA was extracted from leaves of three plants from each accession according to the cetyltrimethylammonium bromide method described by Faleiro et al. (2003). PCR amplifications were performed using a PTC-200 (MJ Research) thermocycler in a 20-µL final reaction volume consisting of 1X PCR buffer, 1.5 mM MgCl2, 0.25 mM of each dNTP (Invitrogen), 0.8 µM of each primer, 1 U Taq DNA polymerase (Invitrogen) and 20 ng genomic DNA. The amplification protocol consisted of an initial denaturation step at 94°C for 1 min, followed by 30 cycles of 94°C for 1 min, 60°C for 1 min and 72°C for 1 min, with a final extension step at 72°C for 5 min. PCR-amplified DNA fragments were separated by electrophoresis on 6% denaturing polyacrylamide gels at 75 W for approximately 2 h and then stained with silver nitrate according to Creste et al. (2001). Alleles were scored by comparison to a 10-bp DNA ladder (10-330 bp range) (Invitrogen).

Data analysis
Allele frequencies, observed and expected heterozygosities (HO and HE) and Rogers' genetic distance modified by Wright (1978) were calculated using the Tools for Population Genetic Analysis (TFPGA) software (Miller, 1997). Population structure was inferred using STRUCTURE 2.0 software (Pritchard et al., 2000), and the accessions were assigned to groups based on their genotypes. STRUCTURE uses model-based clustering in which a Bayesian approach identifies clusters based on their fit to Hardy-Weinberg and linkage equilibria.
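The two heterozygosity statistics can be illustrated with a minimal Python sketch. It assumes diploid scoring and uses the uncorrected estimator HE = 1 - sum(p_i^2); TFPGA may apply a sample-size correction, and the function name and toy genotypes below are hypothetical, not taken from the study.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (HO, HE) for a single locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual,
    scored as diploid (an assumption of this sketch).
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are taken over the 2n allele copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg equilibrium (uncorrected).
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return ho, he

# Hypothetical locus with two alleles (sizes in bp) scored in four individuals.
locus = [(150, 150), (150, 154), (154, 154), (150, 150)]
ho, he = heterozygosity(locus)
print(ho, he)
```

An HO well below HE, as in this toy locus, is the pattern reported for S. macrocephala.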
The optimum number of populations (K) was selected after ten independent runs, each with a burn-in period of 300,000 iterations followed by 400,000 MCMC replications, using a model that does not allow for admixture or correlated allele frequencies. The procedure described by Evanno et al. (2005) was used to estimate the most probable number of distinct genetic groups (K) in each germplasm collection. Nei's GST among the groups defined by the STRUCTURE analysis was calculated using the software FSTAT (Goudet, 2001).
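The Evanno et al. (2005) criterion picks the K at which the rate of change of the log-likelihood is largest relative to its spread across runs. The sketch below computes ΔK from per-run Ln P(D) values as |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd L(K), which is one common way of evaluating the statistic. The function name and the likelihood values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """Compute Evanno's delta K.

    lnp: dict mapping K -> list of Ln P(D) values, one per independent run.
    Returns a dict K -> delta K for every K that has both neighbours K-1 and K+1.
    """
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # delta K is undefined at the ends of the tested range
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        result[k] = second_diff / stdev(lnp[k])
    return result

# Hypothetical Ln P(D) values from three runs per K.
runs = {
    1: [-5000, -5002, -4998],
    2: [-4600, -4604, -4596],
    3: [-4500, -4510, -4490],
    4: [-4480, -4500, -4460],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated for the smallest or largest K tested.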
Genetic relationships among the accessions based on the genotypic data and Rogers' genetic distance were estimated using the Neighbor-Joining method in DARwin 5.0 software (Perrier and Jacquemoud-Collet, 2006).
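For reference, Rogers' distance as modified by Wright (1978) between two samples is the square root of the summed squared allele-frequency differences divided by twice the number of loci, so it ranges from 0 (identical frequencies) to 1 (no shared alleles at any locus). A minimal sketch follows; the function name and the input layout are assumptions made for illustration.

```python
from math import sqrt

def modified_rogers_distance(freqs_x, freqs_y):
    """Rogers' distance as modified by Wright (1978).

    freqs_x, freqs_y: lists with one dict per locus, mapping allele -> frequency.
    Both lists must cover the same loci in the same order.
    """
    n_loci = len(freqs_x)
    total = 0.0
    for px, py in zip(freqs_x, freqs_y):
        # Alleles absent from one sample have frequency 0 there.
        for allele in set(px) | set(py):
            total += (px.get(allele, 0.0) - py.get(allele, 0.0)) ** 2
    return sqrt(total / (2 * n_loci))

# Hypothetical two-locus example: the samples differ only at the first locus.
x = [{150: 1.0}, {200: 0.5, 204: 0.5}]
y = [{154: 1.0}, {200: 0.5, 204: 0.5}]
d_xy = modified_rogers_distance(x, y)
print(d_xy)
```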
Finally, by using the software COREFINDER (Cipriani et al., 2010) we assembled a core collection that should represent 100% of the genetic diversity present within the entire collection.
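The idea of capturing all alleles with as few accessions as possible can be illustrated with a simple greedy set-cover heuristic. This is a stand-in chosen for illustration, not the algorithm implemented in COREFINDER, and the accession names and alleles below are hypothetical.

```python
def greedy_core(accession_alleles):
    """Pick accessions until every allele in the collection is represented.

    accession_alleles: dict mapping accession name -> set of (locus, allele) pairs.
    Greedy heuristic: at each step take the accession that adds the most
    alleles not yet covered (ties broken alphabetically).
    """
    remaining = set().union(*accession_alleles.values())
    core = []
    while remaining:
        best = max(
            sorted(accession_alleles),
            key=lambda name: len(accession_alleles[name] & remaining),
        )
        core.append(best)
        remaining -= accession_alleles[best]
    return core

# Hypothetical collection of four accessions scored at two loci.
toy = {
    "acc1": {("L1", 150), ("L2", 200)},
    "acc2": {("L1", 154), ("L2", 200)},
    "acc3": {("L1", 150), ("L1", 154), ("L2", 204)},
    "acc4": {("L2", 200)},
}
core = greedy_core(toy)
print(core)
```

A greedy pass does not guarantee the smallest possible core, but every allele in the input is covered when it terminates.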

Results
We used SSR markers developed for S. macrocephala and S. capitata to genotype all of the accessions in germplasm collections of both species. In S. macrocephala, 61 alleles were identified at 13 microsatellite loci, and 51 alleles were identified at 15 loci in S. capitata. In S. macrocephala the range was 2 to 11 alleles per locus (4.7 on average) (Table 3), with HE values ranging from 0.02 to 0.85 (0.36 on average) and HO values varying from 0.01 to 0.17 (0.08 on average), thus representing a low level of genetic diversity. In S. capitata, the number of alleles per locus ranged from 2 to 9 (3.4 on average) (Table 4), and the mean HO and HE were 0.48 and 0.50, respectively.
The method of Evanno et al. (2005) was used to define the maximal ΔK, which was at K = 5 in the S. macrocephala germplasm collection, based on the STRUCTURE analysis (Figure 1). Cluster analysis revealed that 75 of the accessions (56%) were assigned to a single group with more than 80% probability, whereas the other 59 accessions represented a mixture of different groups. Group D comprised the largest number of non-mixed accessions, with 79% of the individuals in this cluster showing more than 80% probability of membership. In contrast, most accessions in groups C and E had less than 80% probability of membership (59% and 62%, respectively). The descriptive data calculated for the individual clusters revealed that HO ranged from 0.03 in group D to 0.14 in group C, and that HE values varied from 0.14 in group D to 0.38 in group C.
The STRUCTURE procedure clustered the S. capitata germplasm accessions into four groups (Figure 2), wherein 131 accessions (68%) were assigned to a single group with more than 80% probability of membership, and the remaining 61 accessions represented a mixture of different groups. Group D contained the largest proportion of accessions assigned with more than 80% membership probability (97%), whereas group A contained the highest percentage of mixed accessions (61%). HO values ranged from 0.40 in group A to 0.56 in group C, and HE values varied from 0.40 in group A to 0.49 in groups C and D.
Nei's genetic diversity among the groups (GST) was calculated to infer the proportion of genetic diversity due to differences among the groups clustered by STRUCTURE in both species. GST values were 27% and 11% for S. macrocephala and S. capitata, respectively.
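To make the meaning of GST concrete, the sketch below computes GST = (HT - HS) / HT for a single locus from group allele frequencies, where HS is the mean within-group expected heterozygosity and HT is the expected heterozygosity of the pooled (mean) frequencies. It uses unweighted means and no sample-size correction, so it only approximates what FSTAT reports; the function name and frequencies are hypothetical.

```python
def gst(group_freqs):
    """Nei's GST for one locus.

    group_freqs: list of dicts (one per group) mapping allele -> frequency.
    """
    n_groups = len(group_freqs)
    # HS: average expected heterozygosity within groups.
    hs = sum(1.0 - sum(p * p for p in f.values()) for f in group_freqs) / n_groups
    # HT: expected heterozygosity computed from the mean allele frequencies.
    alleles = set().union(*group_freqs)
    mean_freq = {a: sum(f.get(a, 0.0) for f in group_freqs) / n_groups for a in alleles}
    ht = 1.0 - sum(p * p for p in mean_freq.values())
    return (ht - hs) / ht

# Hypothetical pair of strongly differentiated groups.
example = gst([{"A": 0.9, "B": 0.1}, {"A": 0.1, "B": 0.9}])
print(example)
```

Read this way, a GST of 27% indicates that roughly a quarter of the total diversity lies among groups and the remainder within them.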
We used DARwin software to build a Neighbor-Joining (NJ) tree from the Rogers' genetic distances (Figures 1 and 2). In this analysis, the clusters formed by STRUCTURE with high levels of mixed accessions (less than 80% probability) became dispersed along the NJ tree.
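The 80% rule used to separate assigned from mixed accessions can be sketched as follows, assuming that the membership probabilities of each accession are available as a list with one value per group. The function name, group indexing and probabilities are hypothetical.

```python
def classify(memberships, threshold=0.8):
    """Split accessions into confidently assigned and mixed.

    memberships: dict mapping accession name -> list of membership
    probabilities, one per group. An accession is assigned to a group only
    if its largest probability exceeds the threshold.
    """
    assigned, mixed = {}, []
    for name, probs in memberships.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] > threshold:
            assigned[name] = best  # index of the group with the highest probability
        else:
            mixed.append(name)
    return assigned, mixed

# Hypothetical membership probabilities for three accessions and three groups.
q = {
    "acc1": [0.95, 0.03, 0.02],
    "acc2": [0.50, 0.45, 0.05],
    "acc3": [0.10, 0.85, 0.05],
}
assigned, mixed = classify(q)
print(assigned, mixed)
```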
We assembled representative core collections for both species (Figure 3), aiming to obtain 100% of the genetic diversity observed in this study. This goal was accomplished with 23 accessions of S. macrocephala and 13 accessions of S. capitata. The alleles identified in this study were fully represented in these core collections.

Discussion
The SSR markers analyzed in this work were suitable for evaluating the genetic information in the accessions of S. macrocephala and S. capitata. Santos et al. (2009c) observed the same range of alleles per locus (2 to 11) in 20 accessions of this same S. macrocephala germplasm collection, but with a smaller average of four alleles per locus. In S. capitata, another study of 20 accessions of the same germplasm collection, analyzed using eight microsatellites, observed 2 to 7 alleles per locus with an average of 3.3 (Santos et al., 2009b). In S. guianensis (Aubl.) Sw., the analysis of 20 loci in 20 accessions revealed between two and seven alleles per locus, with an average of four (Santos et al., 2009c). However, when the number of S. guianensis accessions was increased to 150, the range (2 to 11) and average (4.7) of alleles per locus matched those observed here for S. macrocephala (Santos-Garcia MO, 2009, PhD thesis, Universidade Estadual de Campinas, Campinas, Brazil). The allele sizes of S. macrocephala were consistent with the expected sizes reported in Santos et al. (2009a,c), with the exception of a few differences that occurred when higher numbers of alleles were observed for the same loci. The S. capitata accessions exhibited high levels of heterozygosity.
Vander Stappen et al. (2002) showed that allotetraploid Stylosanthes species have high levels of fixed heterozygosity, which may explain the heterozygosity observed in the germplasm collection described in this study. As we used bulk samples, the observed heterozygosity could also be explained by outcrossing and the inclusion of heterozygous individuals, or by heterogeneity in the gene bank accessions (Zhang et al., 2008).
The genetic distances observed in this study were higher than those previously reported for other species of the genus Stylosanthes. One possible explanation is that a larger number of accessions was analyzed here than in other studies. Furthermore, the types of molecular markers used in the previous studies were generally less polymorphic than our SSR markers. Barros et al. (2005) studied a subset of 86 accessions from the same S. macrocephala germplasm collection studied here using 15 RAPD primers and reported genetic distances ranging from 0.02 to 0.42. Hence, the microsatellite markers used herein revealed more genetic variation than the RAPD markers, similar to what has been shown in studies on other species (Powell et al., 1996; Sun et al., 1999; Laborda et al., 2005). When evaluated using RAPD markers, the genetic dissimilarity in S. scabra J. Vogel was 0.06 among the accessions from Brazil, Colombia and Venezuela, and for S. guianensis, it averaged 0.26 among 31 accessions (Kazan et al., 1993). The genetic distances among 42 S. guianensis accessions varied from 0.05 to 0.69 when measured using AFLP analysis (Chang-Shun et al., 2004), and a recent analysis of 150 S. guianensis accessions using 20 microsatellite markers also resulted in high genetic distance values (Santos-Garcia et al., 2012).
The population structure in the accessions of S. macrocephala and S. capitata was examined using STRUCTURE 2.0, which uses a Bayesian clustering approach to probabilistically assign individuals to populations based on their genotypes. The analysis of population structure using the model-based approach of Pritchard et al. (2000) provided support for the existence of genetic structure in these germplasm collections. Accordingly, five groups were formed among the S. macrocephala accessions, and four groups were formed among the S. capitata accessions.
The observed and expected heterozygosities were calculated considering the clusters as independent populations. Within the S. macrocephala groups we found that group C had the highest level of genetic diversity, whereas group D was the most homogeneous, with a low rate of heterozygosity. For S. capitata, the results showed no differences among groups. Such homogeneity was not unexpected because most of the accessions of the S. capitata collection were sampled in two locations only. When calculating the Nei's G ST value among the groups formed by the STRUCTURE analysis approach, the S. macrocephala values were similar to other studies on species belonging to the Fabaceae family (Hamrick and Godt, 1996). In the S. capitata groups, the G ST values were lower than those found for other Stylosanthes species. AFLP studies estimated a 30% variation between S. humilis accessions from Mexico and South America (Vander Stappen et al., 2000), and another analysis on S. humilis H. B. K., based on AFLP, estimated 59% variation among groups. In contrast, the estimated variation among groups of S. viscosa (L.) Sw. was 66%, which is a higher degree of genetic difference than that observed for either of the species in our study (Sawkins et al., 2001).
The sampling locations of the accessions of the S. macrocephala germplasm collection are listed in Table 1. The samples were collected in the Brazilian States of Bahia, Goiás, Minas Gerais, Piauí and the Distrito Federal, though information regarding the exact site of collection is lacking for several accessions.
Group A (Figure 1) consisted of accessions from Bahia and Goiás, and groups B and E included accessions from Bahia and Minas Gerais. Group C consisted mostly of accessions from Bahia, whereas group D included accessions from Bahia, Goiás, and the Distrito Federal. Barros et al. (2005) described 10 groups of S. macrocephala inferred from RAPD markers; 75% of all of the accessions were clustered into only one group, whereas seven of the remaining groups contained no more than two accessions. This clustering of 75% of the accessions into the same group limited the analysis of the genetic diversity and population structure in the S. macrocephala collection. Furthermore, the grouping created difficulties for comparing the RAPD-derived clusters with those inferred from microsatellites. In this work, the Bayesian approach made it possible to identify patterns of genetic variation among five S. macrocephala clusters and clarified the relationships among accessions within the same RAPD cluster previously described by Barros et al. (2005). Our results showed that the accessions collected in Bahia were distributed throughout all five of the groups obtained with STRUCTURE and that the group consisting mostly of accessions collected in this state exhibited the highest levels of genetic diversity. Based on these results, we hypothesize that the state of Bahia might be the center of origin of S. macrocephala. However, data from natural populations are necessary to confirm this hypothesis.
The sampling locations of the accessions of the S. capitata germplasm collection are listed in Table 2. The plants were collected in several Brazilian states, along with the Distrito Federal, and samples were also obtained from Colombia and Venezuela. The Colombia accession (CPAC 1618) is a mixture of several Brazilian accessions developed by the Instituto Colombiano Agropecuario (ICA) as the "Capita" variety and is considered a reference for S. capitata. The Capita variety was used as a standard to check the phenotypic characterization of the S. capitata germplasm. Most of the accessions were collected in Goiás and Bahia (54 and 39, respectively), representing 48% of the total collection. Groups A, B and C contained higher numbers of Bahia and Goiás accessions, whereas group B contained more samples from Bahia than from Goiás. Group D also contained several Bahia and Goiás accessions, but the majority of its accessions were from Minas Gerais. The only accession from Colombia was allocated to group B. The eight accessions from Venezuela were distributed among groups A, B, C and D, with five clustering in group C and one in each of the other groups. Group A comprised a great heterogeneity of localities, with accessions collected from all of the Brazilian states and South American countries, except for São Paulo and Colombia. Groups B and C contained the majority of the accessions from the northeastern states of Brazil and Goiás (central western region), whereas group D had more accessions from the southeastern states.
Due to sampling issues, many of the Brazilian states were poorly represented, and the genetic groups defined by STRUCTURE could not be correlated with geographic regions. Thus, for a more complete study of the genetic diversity of S. capitata in Brazil, new samples must be acquired, especially from natural populations.
Using DARwin software, we constructed an NJ tree based on the Rogers' genetic distances for S. macrocephala (Figure 1) and S. capitata (Figure 2). For S. macrocephala, groups B and D, which contained the highest number of accessions assigned with more than 80% probability in the STRUCTURE analysis, mostly remained clustered together in the tree. In contrast, other groups with more mixed individuals were randomly distributed along the NJ tree. Similar results were obtained for S. capitata, in which group A, with more mixed accessions, was also dispersed over the NJ tree. For the remaining groups, the majority of accessions clustered together in the NJ tree.
When directly compared, the results of the STRUCTURE and the NJ tree analyses revealed certain differences related to the number of groups and their genetic structure, but such differences are expected because these methods are based on distinct assumptions. Model-based approaches, such as STRUCTURE, are more efficient than distance-based methods in discriminating genetic groups, as cluster identification is not affected by the genetic distance or graphical representation chosen (Pritchard et al., 2000). Nevertheless, a combined analysis using different approaches may provide a better definition of the genetic diversity and structure in both of the Stylosanthes collections.
Genetic diversity is the basis for genetic improvement, and consequently, knowledge about germplasm diversity has a significant impact on plant breeding (Huang et al., 2002). Costa and Schultze-Kraft (1993) performed a clustering analysis for S. capitata based on geographical regions and morpho-agronomic characteristics. As we used SSR markers obtained from genomic DNA, it is not possible to infer an association between the genetic markers and the phenotypic characters of the accessions. The groups obtained through molecular marker analysis are thus different from the ones obtained by Costa and Schultze-Kraft (1993), and both should be of importance to Stylosanthes breeders. In classical plant breeding programs, selection is based on phenotypic evaluation, and improved progenies are obtained by crossing individuals that have superior phenotypes and, in general, are also genetically distant. Studies using molecular markers are complementary to phenotypic evaluation (Costa and Schultze-Kraft, 1993), and both are fundamental to genetic breeding programs.
Core collections were herein assembled for both Stylosanthes species, aiming to represent the entire genetic diversity identified in this study. The COREFINDER analysis showed that for S. macrocephala, 100% of the alleles found in this study could be represented by a core collection of 23 accessions. For S. capitata, only 13 accessions were necessary to represent 100% of the observed genetic diversity. Thus, we found that only a relatively small number of accessions were indeed necessary to represent the molecular diversity revealed in this study.
Certain factors may have contributed to the low number of accessions in the core collections suggested here. First, in terms of numbers of individuals collected in each region, the germplasm collection does not equally represent all of the distribution regions. As stated before, the germplasm collection includes some regions, such as the state of Goiás, with 54 different accessions, while others have only a few representatives. We think this unequal representation may to some extent compromise the genetic diversity present in the collection and is likely reflected in the reduced number of individuals necessary to fully represent allelic diversity. In addition, S. capitata is an allotetraploid species that exhibits high levels of heterozygosity, which may contribute to reducing the size of the core collection (Cipriani et al., 2010). Sampling proportion and representation of base collection variation are the two most important characteristics to be observed when establishing a core collection (Hao et al., 2006). Brown et al. (1987) suggested that the number of accessions in the core should account for 5-10% of the base collection, representing at least 70% of its genetic diversity. Van Hintum (1999) recommended that the sampling proportion should vary between 5% and 20% of the base collection, depending on the main objective. Both of the core collections proposed here represent 100% of the molecular diversity found in this study, with the number of accessions accounting for 17% and 7% of the base collection for S. macrocephala and S. capitata, respectively.
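The sampling proportions quoted above follow directly from the collection sizes reported in this study, as this short check shows. The variable names are illustrative only.

```python
# (core size, base collection size) as reported in this study.
collections = {
    "S. macrocephala": (23, 134),
    "S. capitata": (13, 192),
}

# Sampling proportion as a rounded percentage of the base collection.
proportions = {
    species: round(100 * core / base)
    for species, (core, base) in collections.items()
}

for species, pct in proportions.items():
    in_brown = 5 <= pct <= 10      # 5-10% range suggested by Brown et al. (1987)
    in_hintum = 5 <= pct <= 20     # 5-20% range recommended by Van Hintum (1999)
    print(species, pct, in_brown, in_hintum)
```

Both proportions fall within the 5-20% range, and only the S. capitata core also falls within the narrower 5-10% range.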
Our results demonstrate the great potential of using molecular data to construct a core collection and thus improve the management and utilization of the Stylosanthes germplasm collection of Embrapa-Cerrados. Nevertheless, because we used a relatively small number of genomic markers for the genetic analysis, the data presented here should not be used alone when deciding which accessions from the germplasm collection should be discarded or maintained. Additional molecular markers, including more SSRs and single nucleotide polymorphisms (SNPs), should be used to provide better coverage of the genome. This information should be coupled with phenotypic data for traits of interest, such as phenology and disease resistance traits, to make a final decision on the accessions to be maintained. As a first step, further genotyping and phenotyping should begin with the core collection proposed here and be expanded to other accessions as necessary. In addition, the core collection can also be used in the selection of parents for future crosses, based on both the genetic distance and the phenotypic traits of the accessions.
Another issue that requires consideration is the genetic purity of the accessions used in this work. It was previously shown by our group that S. capitata and S. guianensis can cross-pollinate (Santos-Garcia et al., 2010), but breeders have not accounted for cross-pollination during Stylosanthes seed multiplication. Here, we demonstrated a high level of heterozygosity in S. capitata and some undefined genetic groups in the results obtained with STRUCTURE and the Neighbor-Joining tree. These results might have been influenced by cross-contamination among accessions whose seed multiplication plots were established close to each other in the field.
In this work, we used polymorphic microsatellite markers to evaluate the genetic diversity of two Stylosanthes germplasm collections, and the results revealed a population structure among the accessions of both species. Our work indicates that even a small number of microsatellite markers is informative for genetic diversity studies in Stylosanthes species, providing a rapid and low-cost procedure for screening Stylosanthes germplasm collections. The results for S. macrocephala suggest some correlation between the region of collection and distribution among the groups based on the SSR markers. The same conclusion could not be reached for S. capitata because the collection does not equally represent the regions of distribution of this species in terms of quantity of accessions from each region, thereby indicating a need to improve sampling for this collection. The data from this study will certainly provide valuable information to geneticists and breeders for future improvement and conservation of Stylosanthes species.