Genetic diversity analysis of common beans based on molecular markers

A core collection of the common bean (Phaseolus vulgaris L.), representing the genetic diversity of the entire Mexican holdings, is kept at the INIFAP (Instituto Nacional de Investigaciones Forestales, Agricolas y Pecuarias, Mexico) Germplasm Bank. After evaluation, the genetic structure of this collection (200 accessions) was compared with that of landraces from the states of Oaxaca, Chiapas and Veracruz (10 genotypes from each), as well as a further 10 cultivars, using four amplified fragment length polymorphism (AFLP) +3/+3 primer combinations and seven simple sequence repeat (SSR) loci, in order to define genetic diversity, variability and mutual relationships. Data were subjected to cluster (UPGMA) and molecular variance (AMOVA) analyses. AFLP analysis produced 530 bands (88.5% polymorphic), while SSR primers amplified 174 alleles, all polymorphic (8.2 alleles per locus). AFLP indicated that the highest genetic diversity was found in ten commercial-seed classes from two major groups of accessions, from Central Mexico and from Chiapas, which appears to be an important center of diversity in the south. A third group included genotypes from the Nueva Granada, Mesoamerica, Jalisco and Durango races. SSR analysis indicated a reduced number of shared haplotypes among accessions, whereas the largest component of AMOVA variation was found within accessions. Genetic diversity observed in the common-bean core collection represents an important sample of the total Phaseolus genetic variability at the main Germplasm Bank of INIFAP. Molecular marker strategies could contribute to a better understanding of the genetic structure of the core collection as well as to its improvement and validation.


Introduction
Mexico is a major center of domestication and genetic diversity of the common bean (Phaseolus vulgaris L.) (Logozzo et al., 2006), which, together with maize (Zea mays L.), is the most important source of protein in the Mexican diet and the basis of the daily meal for most people countrywide. In Mexico, wild and weedy populations have been considered primarily as genetic sources for germplasm and cultivar development (Chacón et al., 2005). Nevertheless, domestication has caused genetic drift in populations planted in the primary domestication centers (Gepts, 2004). Although selection based mainly on seed class has reduced genetic diversity in cultivars developed in several production areas, the variety of consumer preferences has helped to maintain it (Rosales-Serna et al., 2005). Common-bean cultivars originally arose from the domestication of multiple wild, weedy landrace populations (Gepts and Debouck, 1991; Chacón et al., 2007). This genetic diversity is currently being maintained at the INIFAP Germplasm Bank located near Texcoco.
There are 7,846 accessions in the INIFAP common-bean Germplasm Bank. Previous attempts have been made to select and characterize representative samples; however, managing such a large number of accessions proved difficult. Detailed evaluation of the numerous accessions is essential for creating small, representative and manageable core collections, which, as thoroughly representative samples, facilitate the characterization of crop diversity. Thus, a preliminary common-bean core collection was formed in 2004 to represent the complete diversity held in the Bank. Multiple analyses and corrections were required to maximize the representativeness of the core collection and to increase its use in genetic breeding programs. Subsequently, in 2006, a subset of 200 accessions was selected to represent all the common-bean holdings in the Bank. Finally, the core collection was formed by selecting accessions differing in morphology, phenology, disease resistance and seed traits (culinary quality) (Vargas et al., 2006). Geographical origin was also considered, by representing the 30 Mexican states where beans are planted. A larger number of cultivars was included from states where beans are traditionally planted and where domestication could have occurred, such as Jalisco (Payró-de la Cruz et al., 2005).
A complete understanding of the genetic diversity and population structure of the common bean is essential for its conservation and management, but limited germplasm characterization is a major obstacle to the systematic use of common-bean diversity in genetic breeding programs. Classical methods for characterizing genetic diversity in plants include the use of morpho-agronomic traits to establish genetic relationships among commercial cultivars, landraces and wild relatives (Newbury and Ford-Lloyd, 1997). Several types of DNA markers developed to study genetic diversity and crop evolution are now considered better than earlier methods, such as morphological markers, for documenting the organization of diversity (Charcosset and Gallais, 2002; Gaitán-Solís et al., 2002; Blair et al., 2009; Kwak and Gepts, 2009; Burle et al., 2010). Human-directed selection of common-bean populations has influenced crop evolution: cultivars that originated through domestication in adjacent areas are now considered more similar to one another than to germplasm from distant regions. Differences have been found among populations from the southern (Mesoamerica race), central (Jalisco race) and northern (Durango race) regions (Singh et al., 1991; Rosales-Serna et al., 2005). Molecular characterization is required, not only to corroborate previous findings based on morpho-agronomic characterization, but also to make more efficient use of germplasm in crop breeding. Molecular markers would also help to improve the representativeness of the core collection while using a reduced number of cultivars.
In this work, we report estimates of genetic diversity in the INIFAP common-bean core collection obtained with AFLP and SSR markers, and the relationships of the collection with both landraces and cultivars.

Germplasm
The INIFAP common-bean core collection includes 200 accessions collected from 30 states of Mexico. Ten cultivars released by INIFAP were also included, as well as 10 landraces from each of the states of Oaxaca, Veracruz and Chiapas, as out-groups. Germplasm was classified by geographic origin (state) (Table 1) and commercial-seed class (color) (Table 2) (Vargas et al., 2006; Vargas-Vázquez et al., 2008).

DNA extraction
Genomic DNA was isolated from young, fully expanded 15-day-old leaves (the first trifoliate leaf) of ten plants per accession grown under greenhouse conditions, and then bulked by accession, following the protocol of Dellaporta et al. (1983). Since the common bean is a predominantly self-pollinating species, high levels of heterogeneity within accessions were not expected.
596 Gill-Langarica et al.

AFLP analysis
AFLP analysis followed Vos et al. (1995). Genomic DNA was digested with two restriction enzymes (EcoRI and MseI), and four AFLP primer combinations were used to amplify selective fragments. The oligonucleotide primers used for the AFLP pre-amplification step were EcoRI + A (5'-AGACTGCGTACCAATTC/A-3') and MseI + A (5'-GACGATGAGTCCTGAGTAA/A-3'). Pre-amplification was followed by a second, selective amplification step with three selective nucleotides on each primer (+3/+3). The resulting products were separated by electrophoresis on 6% polyacrylamide gels using an automated sequencing system (IR2 model; Li-Cor, Lincoln, NE, USA). Gels were scored and the binary matrix was constructed with Cross Checker v2.9 software (Buntjer, 1999).
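As a toy illustration of the digestion step only, the following Python sketch locates EcoRI (G^AATTC) and MseI (T^TAA) cut sites on the top strand of a DNA string and keeps the fragments flanked by one EcoRI end and one MseI end, the fragment class typically detected in standard AFLP. It assumes the EcoRI/MseI enzyme pair implied by the pre-amplification primers; adapter ligation, the +1 pre-amplification and the +3 selective amplification are not modeled, and the function names and example sequence are hypothetical.

```python
import re

# Recognition sites and top-strand cut offsets (toy model, hypothetical names).
ECORI_SITE, ECORI_CUT = "GAATTC", 1  # G^AATTC
MSEI_SITE, MSEI_CUT = "TTAA", 1      # T^TAA


def cut_positions(seq, site, offset):
    """Return top-strand cut positions for every (possibly overlapping) site."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def ecori_msei_fragments(seq):
    """Digest seq with both enzymes; keep fragments with one EcoRI and one MseI end."""
    seq = seq.upper()
    cuts = [(p, "E") for p in cut_positions(seq, ECORI_SITE, ECORI_CUT)]
    cuts += [(p, "M") for p in cut_positions(seq, MSEI_SITE, MSEI_CUT)]
    cuts.sort()
    fragments = []
    # Consecutive cuts delimit internal fragments; the two sequence ends are ignored.
    for (start, end_a), (stop, end_b) in zip(cuts, cuts[1:]):
        if {end_a, end_b} == {"E", "M"}:
            fragments.append(seq[start:stop])
    return fragments


if __name__ == "__main__":
    toy = "AAGAATTCGGGTTAACCC"  # hypothetical sequence
    for frag in ecori_msei_fragments(toy):
        print(len(frag), frag)
```

EcoRI-EcoRI and MseI-MseI fragments are discarded here for simplicity; in a real AFLP run their fate depends on which primer is labeled.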

SSR analysis
Seven SSR loci (BM143, BM152, BM164, BM183, BM188, BM210 and GATs91), developed from previously reported genomic sequences (Yu et al., 2000; Blair et al., 2006), were used to analyze the entire core collection. PCR amplification of each SSR followed the conditions reported by Yu et al. (2000). Each 20 μL reaction contained 75 ng of DNA, 0.16 μM of each primer (sense and antisense), 2 μL of 10X PCR buffer, 1.5-2.5 mM Mg2+ (depending on the primer), 2 mM dNTPs and 1 U of Taq DNA polymerase. Amplification products were separated on 6% polyacrylamide gels (1300 V for 2 h). Hyperladder V (Promega®) was used as the molecular-weight ladder, and PCR products were visualized with a silver staining kit (Promega, Madison, WI, USA). Gels were documented for allele detection with the Kodak Molecular Imaging System v. 4.0 (Eastman Kodak, Rochester, USA). Raw allele size calls were then binned and assigned whole-integer allele values with the AlleloBin software program (Idury and Cardon, 1997).
Diversity of Mexican bean core collection 597
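The idea of binning raw size calls into integer alleles can be sketched in a few lines of Python. This is a deliberately simple gap-based approach, not the algorithm implemented in AlleloBin: sizes are sorted, a new bin starts whenever the gap to the previous size exceeds a tolerance (an assumed parameter, here 1 bp), and every member of a bin receives the rounded bin mean. The function name and example sizes are hypothetical.

```python
def bin_alleles(raw_sizes, tolerance=1.0):
    """Group raw fragment sizes (bp) into bins and return integer allele calls.

    Sizes are sorted; a new bin starts whenever the gap to the previous size
    exceeds `tolerance`. Each bin is labeled with its rounded mean size, and the
    calls are returned in the same order as the input.
    """
    if not raw_sizes:
        return []
    order = sorted(range(len(raw_sizes)), key=lambda i: raw_sizes[i])
    bins = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if raw_sizes[cur] - raw_sizes[prev] > tolerance:
            bins.append([])  # gap too large: start a new allele bin
        bins[-1].append(cur)
    calls = [0] * len(raw_sizes)
    for members in bins:
        allele = round(sum(raw_sizes[i] for i in members) / len(members))
        for i in members:
            calls[i] = allele
    return calls


if __name__ == "__main__":
    sizes = [150.2, 150.6, 149.9, 153.1, 152.8, 156.0]  # hypothetical raw calls
    print(bin_alleles(sizes))
```

Gap chaining can merge a continuous smear of sizes into one bin; methods that use the repeat-unit length, as dedicated binning software does, are more robust for real data.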

Data analysis
AFLP bands were numbered according to molecular weight and scored as one (present) or zero (absent). The level of polymorphism was expressed as the percentage of polymorphic bands among all fragments amplified with each primer combination (Table 3). SSR bands were also numbered according to molecular weight, and the number of alleles per locus was determined with GenAlEx 6.0 software (Peakall and Smouse, 2006).
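The polymorphism percentage described above is a simple ratio. The Python sketch below computes it from a 0/1 band matrix (rows are accessions, columns are bands), treating a band as polymorphic when it is not scored identically in every accession, and then reproduces the overall figure reported in the Results (469 polymorphic bands out of 530). The function name and toy matrix are hypothetical.

```python
def percent_polymorphic(matrix):
    """Return (polymorphic bands, total bands, % polymorphic) for a 0/1 band matrix.

    Rows are accessions; columns are scored bands (1 = present, 0 = absent).
    A band is polymorphic if it is not scored identically in every accession.
    """
    n_bands = len(matrix[0])
    polymorphic = sum(1 for j in range(n_bands) if len({row[j] for row in matrix}) > 1)
    return polymorphic, n_bands, 100.0 * polymorphic / n_bands


if __name__ == "__main__":
    toy = [[1, 0, 1], [1, 1, 1], [1, 0, 0]]  # hypothetical: 3 accessions x 3 bands
    print(percent_polymorphic(toy))           # the first band is monomorphic
    # Totals reported in the Results: 469 polymorphic of 530 bands.
    print(round(100 * 469 / 530, 1))          # 88.5
```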
Amplified bands per primer combination (AFLP) and allele frequencies per locus (SSR) were used to calculate the genetic diversity index (DI) of Nei (1978). For the SSR loci, GenAlEx 6.0 was used to estimate the average number of alleles (A), the effective number of alleles (EA), the number of polymorphic loci (P) and the percentage of polymorphic loci (%), for accessions grouped by commercial-seed class and by geographical origin.
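For readers who want to see the quantities behind DI and EA, here is a minimal Python sketch of one common formulation: gene diversity h = 1 - Σ p_i², optionally multiplied by n/(n - 1) as a small-sample correction in the spirit of Nei (1978), and EA = 1/Σ p_i². It assumes, as the Methods state for the SSR data, that each accession contributes one haploid genotype, so n is the number of scored accessions. Applied to 0/1 AFLP band scores, the uncorrected formula reduces to 2p(1 - p); the exact estimator used by GenAlEx 6.0 may differ. Function names and data are hypothetical.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each allele (or band state); None marks missing data."""
    counts = Counter(a for a in alleles if a is not None)
    n = sum(counts.values())
    return {a: c / n for a, c in counts.items()}, n


def gene_diversity(alleles, unbiased=True):
    """Nei-type gene diversity 1 - sum(p^2), with optional n/(n-1) correction."""
    freqs, n = allele_frequencies(alleles)
    h = 1.0 - sum(p * p for p in freqs.values())
    if unbiased and n > 1:
        h *= n / (n - 1)
    return h


def effective_alleles(alleles):
    """Effective number of alleles, EA = 1 / sum(p^2)."""
    freqs, _ = allele_frequencies(alleles)
    return 1.0 / sum(p * p for p in freqs.values())


if __name__ == "__main__":
    locus = [150, 150, 152, 152, None]  # hypothetical SSR calls, one per accession
    print(len(allele_frequencies(locus)[0]), gene_diversity(locus), effective_alleles(locus))
    band = [1, 1, 1, 0]                 # hypothetical AFLP band scores
    print(gene_diversity(band, unbiased=False))  # 2p(1-p) with p = 0.75
```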
The 0/1 matrix of the AFLP markers was used to calculate genetic dissimilarity according to Nei (1978). A Neighbor-Joining dendrogram (Saitou and Nei, 1987) was constructed, and its robustness was assessed by bootstrapping, with 1,000 replicates drawn from the original data and accessions classified by commercial-seed class and by geographical origin. All calculations, including dendrogram construction, were carried out with DARwin 5.0 (Perrier and Jacquemoud-Collet, 2006). For the SSR data, genetic dissimilarity based on simple-matching dissimilarity measures was calculated from the amplified-allele frequencies per locus of all accessions; Neighbor-Joining dendrograms were then constructed by commercial-seed class and by geographical origin and checked by bootstrapping.
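The simple-matching dissimilarity and the bootstrap step can be sketched in Python as follows. The dissimilarity between two accessions is the share of scored loci at which their alleles differ; each bootstrap replicate resamples loci with replacement and recomputes the whole matrix. In practice each replicate matrix would be passed to a tree-building method such as Neighbor-Joining and the support for each cluster counted; that step is not shown, and DARwin's own procedure may differ in detail. All names and data are hypothetical.

```python
import random


def simple_matching_dissimilarity(a, b):
    """Share of loci at which two accessions differ; loci missing in either are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no loci scored in both accessions")
    return sum(x != y for x, y in pairs) / len(pairs)


def dissimilarity_matrix(profiles, loci=None):
    """Square matrix of pairwise dissimilarities, optionally over selected locus indices."""
    if loci is None:
        loci = range(len(profiles[0]))
    sub = [[p[j] for j in loci] for p in profiles]
    n = len(sub)
    return [
        [0.0 if i == k else simple_matching_dissimilarity(sub[i], sub[k]) for k in range(n)]
        for i in range(n)
    ]


def bootstrap_matrices(profiles, replicates=1000, seed=1):
    """Yield one matrix per replicate, resampling loci with replacement."""
    rng = random.Random(seed)
    n_loci = len(profiles[0])
    for _ in range(replicates):
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]
        yield dissimilarity_matrix(profiles, loci)


if __name__ == "__main__":
    # Hypothetical haploid SSR profiles (allele sizes) for three accessions at three loci.
    profiles = [[150, 200, 310], [150, 204, 310], [152, 204, 312]]
    print(dissimilarity_matrix(profiles))
    print(sum(1 for _ in bootstrap_matrices(profiles, replicates=10)))
```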
Haplotypes were defined from the SSR data. Genetix 4.0 (Belkhir et al., 1996-2004) was used to convert genotype data into the file format used by Arlequin 3.11 (Excoffier and Schneider, 2005), with which shared haplotypes were estimated. As the common bean is a predominantly autogamous species, the accessions studied were considered homozygous lines, and a haploid genome (Papa and Gepts, 2003) was assumed for data analysis. Finally, the similarity matrix was used for hierarchical analysis of molecular variance (AMOVA) (Excoffier et al., 1992) in Arlequin 3.11. Accessions were classified by either geographical origin or commercial-seed class. In both cases, variation was partitioned into three hierarchical levels, viz., among groups (commercial-seed class or state of origin), among accessions within groups, and within accessions. Genetic differentiation among levels was assessed with ΦPT values, whose significance was tested with 1,000 permutations in all analyses.
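The core of an AMOVA can be illustrated with a simplified, two-level version (among groups and within groups) of the hierarchical design described above. In this Python sketch, squared distances between haploid multilocus profiles are taken as the number of differing loci or bands, sums of squared deviations are converted into variance components, and ΦPT is the among-group share of the total; significance is estimated by permuting group labels. It is not the Arlequin or GenAlEx implementation, it omits the intermediate "accessions within groups" level, it requires at least two groups and more samples than groups, and all names and data are hypothetical.

```python
import random
from itertools import combinations


def mismatch_distance(a, b):
    """Squared distance for haploid profiles: number of loci (or bands) that differ."""
    return sum(x != y for x, y in zip(a, b))


def phi_pt(dist2, groups):
    """Two-level AMOVA: return PhiPT = var_among / (var_among + var_within)."""
    n = len(groups)
    labels = sorted(set(groups))
    g = len(labels)
    ssd_total = sum(dist2[i][j] for i, j in combinations(range(n), 2)) / n
    ssd_within, sizes = 0.0, []
    for lab in labels:
        idx = [i for i in range(n) if groups[i] == lab]
        sizes.append(len(idx))
        ssd_within += sum(dist2[i][j] for i, j in combinations(idx, 2)) / len(idx)
    ms_among = (ssd_total - ssd_within) / (g - 1)
    ms_within = ssd_within / (n - g)
    n0 = (n - sum(s * s for s in sizes) / n) / (g - 1)  # average group size correction
    var_among = (ms_among - ms_within) / n0  # may be negative, as in real AMOVA output
    total = var_among + ms_within
    return var_among / total if total else 0.0


def permutation_test(dist2, groups, n_perm=1000, seed=1):
    """Shuffle group labels; p = share of permutations with PhiPT >= observed."""
    rng = random.Random(seed)
    observed = phi_pt(dist2, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if phi_pt(dist2, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical 0/1 profiles for four accessions in two groups.
    profiles = [[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
    groups = ["north", "north", "south", "south"]
    d = [[mismatch_distance(p, q) for q in profiles] for p in profiles]
    print(permutation_test(d, groups, n_perm=1000))
```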

Results
Sixteen commercial-seed classes were visually identified in the INIFAP preliminary core collection, with a variable number of accessions per state and seed class (Tables 1 and 2). Seed-class diversity was highest in Aguascalientes (23 accessions with 10 seed classes), Jalisco (22 accessions with nine classes), Puebla (16 accessions with nine classes) and Zacatecas (14 accessions with nine classes). States with more commercial-seed classes tended to have more accessions. Chiapas was the exception, with a high number of accessions (21) but a low number of seed classes (six). The 'cream' class, found in 19 states, was represented by 51 accessions, 'marbling' by 18 accessions in 12 states, 'black' by 24 accessions in 10 states, and 'cream-mottled' by 14 accessions in 10 states.

AFLP analysis
The four primer combinations yielded 530 reproducible AFLP bands, of which 469 (89%) were polymorphic (Table 3). Accessions from the core collections of Jalisco, Aguascalientes and Oaxaca were the most diverse, with cream and brown seed genotypes showing the highest DIs, whereas landraces from the state of Oaxaca, and cultivars and accessions from Veracruz, showed the lowest. The low DIs also observed in gray (0.17), red-mottled (0.18) and pink-striped (0.18) accessions were probably due to the small number of accessions of these seed types (Tables 4 and 5). AMOVA detected significant genetic differentiation (p < 0.01) among accessions within groups and within accessions only when germplasm was classified by geographical origin (Tables 6 and 7).

SSR analysis
The highest number of alleles (A) per locus, average number of alleles and effective number of alleles were all observed in the states with the most accessions, such as Aguascalientes, Jalisco and Chiapas (Table 4). The highest A, average number of alleles, effective number of alleles and DI were observed in cream, black and purple beans, and the lowest in gray, pink-striped and red beans (Table 5). Similar results were observed for EA, which confirmed the lowest DI values in pink-striped accessions. EA per SSR locus was seven, and all loci were polymorphic (Tables 4 and 5). AMOVA detected significant genetic differentiation (p < 0.01) among groups and within accessions when germplasm was classified by either geographical origin or commercial-seed class. The largest share of genetic variance (> 70%) was found within populations (Tables 6 and 7).

Genetic diversity and relationships
Applying AMOVA to the AFLP and SSR data indicated that most genetic variation occurred within accessions rather than at the other two hierarchical levels, viz., among groups and among accessions within groups, although a high percentage of genetic variation did occur among groups when germplasm was analyzed with AFLPs according to the origin of accessions. Nonetheless, ΦPT values indicated genetic differentiation among groups and within accessions when germplasm was analyzed with SSR markers, regardless of classification by geographical origin or commercial-seed class (Tables 6 and 7).
Cluster analysis divided the germplasm into three groups. For SSRs, the first group included accessions from southern and central Mexico (Chiapas, Aguascalientes), commercial varieties, and landraces from Veracruz and Chiapas, whereas for AFLPs it included landraces from Chiapas and Veracruz, and cultivars and germplasm from Chiapas and Tamaulipas. The second group comprised accessions from Jalisco, Aguascalientes, Tamaulipas, Chihuahua, Durango, Oaxaca and Zacatecas (SSRs), as well as accessions from southern (Chiapas), central (Aguascalientes, Jalisco) and northern (Chihuahua, Durango, Zacatecas) Mexico (AFLPs). The third group comprised accessions from the north (Zacatecas, Chihuahua, Durango) (SSRs), as well as accessions from the north together with landraces from Veracruz, which clustered separately from the rest of the germplasm (AFLPs) (Figures 1 and 2). As regards commercial-seed classes, the first group included three seed types with greater diversity in southern Mexico, namely red-mottled, purple and yellow (SSRs), as well as red-mottled, brown, black and brown-striped (AFLPs). The second group comprised accessions with cream, light-purple and brown-striped seed types (SSRs), as well as pink, yellow and light-purple (AFLPs). The third group included pink, cream, marbling, and cream mottled with black types (SSRs), as well as purple with marbling (AFLPs) (Figures 3 and 4).

Discussion
High genetic diversity was detected in the INIFAP common-bean core collection, and this diversity increased when germplasm was analyzed by geographical origin or seed type. Germplasm diversity and variation in the number of accessions per state made it difficult to establish an ideally balanced common-bean core collection. The representativeness of this collection could be improved by using balanced samples from all the Mexican states, according to the number of accessions per state and commercial-seed class. Selecting an additional 200 accessions is also possible, since core collections may include 5 to 20% of the total collection (Gepts, 2006). Similar diversity-index results were found for both AFLP and SSR markers. The high polymorphism rates detected with AFLPs could be useful for common-bean germplasm characterization. Polymorphism levels were high compared with previous reports on common beans using RAPDs (Duarte et al., 1999; Beebe et al., 2000), although they were lower with AFLPs than with SSRs. SSR polymorphism was either higher than in other reports (Gómez et al., 2004; Blair et al., 2006; Díaz and Blair, 2006; Rossi et al., 2009; Burle et al., 2010) or similar (Kwak and Gepts, 2009). The polymorphic SSR markers in the present study had from 6 to 13 alleles, with an average of 8.21 alleles per locus. In contrast, Blair et al. (2006) reported an average of 11 alleles per locus in an SSR analysis of a worldwide common-bean collection, whereas Blair et al. (2009) reported over 72 alleles, with an average of 18 alleles per locus, in an SSR analysis of an international collection of common beans from the Andean and Mesoamerican gene pools.
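The arithmetic behind the suggestion to add 200 accessions can be checked quickly against the 7,846 accessions held in the Bank. The Python sketch below expresses a core-collection size as a percentage of the Bank and converts the 5 to 20% range cited above into whole numbers of accessions; on this reading, the current 200-accession core is about 2.5% of the holdings, and 400 accessions would reach about 5.1%. The function names are hypothetical.

```python
import math

TOTAL_ACCESSIONS = 7846  # INIFAP common-bean Germplasm Bank holdings, as stated in the text


def percent_of_bank(n_accessions, total=TOTAL_ACCESSIONS):
    """Core-collection size as a percentage of the whole Bank."""
    return 100.0 * n_accessions / total


def size_range(low_pct=5, high_pct=20, total=TOTAL_ACCESSIONS):
    """Smallest and largest whole number of accessions within low_pct..high_pct of the Bank."""
    return math.ceil(total * low_pct / 100), math.floor(total * high_pct / 100)


if __name__ == "__main__":
    print(round(percent_of_bank(200), 2))  # current core collection
    print(round(percent_of_bank(400), 2))  # with 200 additional accessions
    print(size_range())
```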
The high diversity observed in cream (0.86 and 0.30 with SSR and AFLP data, respectively) and brown (0.81 and 0.30 with SSR and AFLP data, respectively) seed accessions could be due to independent domestication events involving these commercial seed classes, as these seed colors occur in numerous wild populations of the common bean in Mexico (Vargas-Vázquez et al., 2008). As these commercial classes are very popular with consumers in the central and western regions, bean breeding has focused on developing cultivars with these grain traits. In contrast, less diverse seed types resulted from localized domestication events. In both cases, consumer preference influenced selection during domestication, and grain size, cooking traits and taste did so later on. The present results reinforce the theory of multiple domestication centers (Chacón et al., 2005) and of constant germplasm movement among producing regions, from the south towards the center and vice versa. For example, an intensive breeding program has been under way in Mexico for more than 30 years to develop new cream-seeded beans ('azufrados') using Andean gene-pool germplasm. Major results include Andean x Mesoamerican gene-pool hybrids, the so-called 'peruano' beans, as well as cream-seeded genotypes from Mexican and Peruvian crosses (Voysest, 2000). Significant genetic differentiation among groups and within accessions was detected, with the highest proportion of genetic variance assigned to variation within populations. The present results point to a weak or unclear genetic structure, comparable to that reported for wild populations of the Andean gene pool: using RAPDs, Cattan-Toupance et al. (1998) found a much higher within-population variance component in germplasm from Argentina (> 67%) than that estimated by Papa and Gepts (2003) in wild and domesticated beans from the Mesoamerican gene pool of Mexico (44 to 58%), a more marked geographic structure of genetic diversity thus conditioning genetic differentiation. Germplasm with limited geographical structure and little differentiation among populations and regions can show a much higher within-population component of genetic diversity. These findings can be attributed to seed exchange among farmers and to homogeneous selection in different environments (Papa and Gepts, 2003).
No shared haplotypes were found among the INIFAP common-bean core collection accessions, in contrast to previous findings involving the Jalisco and Durango races (Chacón et al., 2005; Díaz and Blair, 2006). This absence of shared haplotypes indicates careful genotype selection in constructing a highly diverse collection. However, these data need confirmation with a larger number of SSR loci, to obtain better statistical support and to assert clearly that each accession is genetically distinct from every other accession in the collection. Nevertheless, both the uniqueness of the genotypes and the representativeness of accessions from all regions were supported. With both marker strategies, the relationship between genetic diversity levels and the number of accessions confirmed that improving the representativeness of the INIFAP core collection requires balanced numbers of accessions based on classification criteria such as commercial-seed class, geographic region of origin, or both. Representation could be increased by capturing the maximum number of alleles from each commercial-seed class or agro-ecological origin. To improve the core collection, balanced selection of the more diverse classes, as well as effective sampling of the less diverse types, is required.
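The check for shared haplotypes described in the Methods reduces to grouping accessions by their complete multilocus allele profile. The Python sketch below does this under the same haploid assumption used in the study, returning only profiles carried by more than one accession. It does not handle missing data, and, as noted above, with only seven loci an absence of shared profiles is a relatively weak test of distinctness. Names and data are hypothetical.

```python
from collections import defaultdict


def shared_haplotypes(genotypes):
    """genotypes: accession -> allele sizes, one per SSR locus (haploid, complete data).

    Returns {haplotype: [accessions]} for haplotypes carried by more than one accession.
    """
    carriers = defaultdict(list)
    for accession, alleles in genotypes.items():
        carriers[tuple(alleles)].append(accession)
    return {hap: sorted(accs) for hap, accs in carriers.items() if len(accs) > 1}


if __name__ == "__main__":
    data = {  # hypothetical accessions scored at two loci
        "acc1": (150, 200),
        "acc2": (150, 200),
        "acc3": (152, 200),
    }
    print(shared_haplotypes(data))
```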
The high genetic diversity found within accessions makes it difficult to select representative accessions from each commercial-seed class. Variation among accessions within groups also needs to be exploited for germplasm selection and for broadening the genetic base of the common-bean core collection. The fixation indices ΦPT (AFLP) and FST (SSR) could be used as additional tools for germplasm selection, although the SSR data should be interpreted with caution, as only a few microsatellites were analyzed. The high genetic diversity encountered in germplasm collected in central Mexico further supports previous inferences that domestication events took place in that region. The genetic complex found here could be a main source of diversity for the southern and northern regions. Germplasm dispersal from the Central Highlands was mainly directed towards the south, since similarities were observed between germplasm from these two regions. Clear differences were detected between some commercial-seed types from the north and germplasm collected in the Central Highlands. Other commercial-seed types from the north were genetically similar to germplasm from central and southern Mexico, most likely owing to human migration and seed movement.
The genetic diversity observed in the common-bean core collection represents an important sample of the total genetic variability contained in the main INIFAP Germplasm Bank of Phaseolus. Although significant genetic diversity has been included in this collection, further analysis is required, including the definition of genetic diversity within accessions and among commercial-seed classes, states and/or regions of origin, genetic races and gene pools. AFLP and SSR markers are important tools for better understanding the genetic relationships among accessions and germplasm, for selecting accessions, and for constructing and validating the INIFAP core collection.