Quantification of the diversity among common bean accessions using the Ward-MLM strategy

The present work aimed at evaluating the divergence among common bean accessions by their agronomic, morphological and molecular traits, based on the Ward-MLM procedure. A collection of 57 accessions from the gene bank of Universidade Federal do Espírito Santo was used in this study, of which 31 were landraces from the community Fortaleza, in the municipality of Muqui, ES, Brazil; 20 were provided by Embrapa Trigo; and 6 were commercial cultivars. Five agronomic traits (plant cycle, number of seeds per pod, number of pods per plant, weight of 100 seeds, and grain yield), five morphological traits (growth habit, plant size, seed shape, seed color, and commercial group) and 16 microsatellite primers were evaluated. High genetic variability was detected in the 57 common bean accessions for the morphological, agronomic and molecular traits. The Ward-MLM procedure showed that the ideal number of groups was five, according to the pseudo F and pseudo t² criteria. The accessions of Andean origin had heavier seeds than the others and formed a cluster. The Ward-MLM statistical procedure is a useful technique to detect genetic divergence and to cluster genotypes by simultaneously using morphological, agronomic and molecular data.


Introduction
Common bean (Phaseolus vulgaris L.) is considered one of the main sources of protein in the Brazilian diet, with a per capita consumption higher than 17 kg year⁻¹ (Burle et al., 2010).
In Espírito Santo state, common bean is produced almost exclusively by small farmers, who have been selecting local varieties adapted to their agricultural and socioeconomic conditions for several generations (Fonseca et al., 2007). However, competition for productivity and quality has led these farmers to adopt the genetically improved cultivars available in the market and to abandon local varieties of great significance for agrobiodiversity. To avoid "genetic erosion", it is necessary to collect and preserve these varieties in germplasm banks, making them available for future breeding programs (Gepts, 2006). However, the value of a germplasm bank depends on the information available to promote its use. In this sense, the characterization and evaluation of accessions in germplasm banks are very important: they provide better knowledge of these accessions, enable the detection of genotypes that may be used in plant breeding programs, and allow the identification of possible duplicates in the gene banks.
Accession classification and the quantification of genetic diversity in germplasm banks aim at identifying similar groups based on separate analyses of the quantitative traits (such as plant height, pod weight and days to flowering) and the qualitative ones (such as seed color, seed shape, the presence or absence of a certain trait, or a molecular marker) (Mohammadi & Prasanna, 2003; Crossa & Franco, 2004; Gonçalves et al., 2009). Knowledge of the accessions in a gene bank should be detailed enough to allow the identification of different accession sets or clusters. The data set obtained during characterization is commonly analyzed separately for each type of descriptor or variable (quantitative or qualitative). However, a joint analysis that simultaneously considers qualitative and quantitative characterization data is an interesting alternative for breeders and gene bank curators, since it allows a better quantification of genetic variability (Gonçalves et al., 2009).
The modified location model (MLM), proposed by Franco et al. (1998), is an interesting strategy to quantify variability using quantitative and qualitative variables simultaneously. The MLM has two stages. In the first one, the Ward clustering method (Ward Junior, 1963) defines the groups using the dissimilarity matrix of Gower (Gower, 1971). In the second stage, the mean vector of the quantitative variables is estimated by the MLM procedure for each subpopulation, regardless of the values of the qualitative variables. This procedure has been used for different purposes and with various crops, such as maize (Gutiérrez et al., 2003; Franco et al., 2005; Ortiz et al., 2008), oilseed radish (Padilha et al., 2005), tomato (Gonçalves et al., 2009), snap bean (Barbé et al., 2010) and pepper (Sudré et al., 2010).
The present work aimed at evaluating the divergence among common bean accessions by simultaneously using the agronomic, morphological and molecular traits, based on the Ward-MLM procedure.

Materials and Methods
A collection of 57 common bean accessions from the gene bank of Universidade Federal do Espírito Santo (Table 1) was used in this study: 31 local accessions from the community Fortaleza, in the municipality of Muqui, ES, Brazil; 20 accessions provided by Embrapa Trigo; and 6 commercial cultivars (Carioca, Serrano, Iapar 31, Iapar 44, Iapar 81 and Pérola).
The accessions were evaluated and characterized using morphological and agronomic descriptors. For the morpho-agronomic characterization, an experiment was carried out under field conditions in Alegre municipality, ES, Brazil (20°45'49" S, 41°28'59" W, 150 m altitude), in a randomized block design with three replicates. The experimental unit was composed of five 1.2 m long rows, spaced 0.5 m apart, with a seeding rate of 10 seeds per meter. The first and last rows, and the first and last plants of each row in a plot, were treated as borders. Plants were managed in a conventional cultivation system following the crop recommendations of Vieira et al. (2006), including soil preparation, pest and disease control, and harvest.
Ten descriptors were used for the morpho-agronomic evaluation and characterization. Five were morphological: growth habit (GH), classified as determinate bush, indeterminate bush, indeterminate prostrate, or indeterminate climber; plant size (PZ), ranked as erect, semi-erect, or prostrate; seed shape (SS), ranked as spherical, elliptical, oblong/short reniform, oblong/average reniform, or oblong/long reniform; seed color (SC), classified as uniform or non-uniform; and commercial group (CG), with the types White, Carioca, Jalo, Mulatinho, Black, Rosinha, Purple, and Others. The other five descriptors were agronomic: crop cycle (CC), in days; number of seeds per pod (NSP); number of pods per plant (NPP); weight of 100 seeds (WS), in grams; and grain yield (GY), in kilograms per hectare.
The 57 accessions were sown in labeled plastic pots containing commercial substrate and kept in a greenhouse until the first trifoliate leaves emerged. Samples of the first trifoliate leaves were collected and immediately placed in liquid nitrogen; they were then identified and stored in a biofreezer (-86 °C) until DNA extraction. DNA was extracted according to the Doyle & Doyle (1987) protocol, with the modifications proposed by Abdelnoor et al. (1995).
Sixteen pairs of SSR primers were selected for common bean (Gaitán-Solís et al., 2002; Blair et al., 2003). The criteria for selecting these primers were high polymorphism, the distribution of the SSRs across the linkage groups of the bean consensus map (so as to achieve greater genome coverage), and the correlation between the SSRs and agronomic traits. Amplification was carried out in a total volume of 15 μL, containing MgCl₂ (2.4 mmol L⁻¹), Tris-KCl pH 8.3 (0.25 mmol L⁻¹), dNTP (0.25 mmol L⁻¹ of each nucleotide), 0.6 µmol L⁻¹ of each primer, one unit of Taq polymerase, and 30 ng of DNA.
The amplifications were done in a Techne TC-412 thermocycler (Techne, Staffordshire, UK) under the following conditions: an initial stage of 5 min at 94 °C; 30 cycles of 1 min at 94 °C, 1 min at 50 °C, and 2 min at 72 °C; and a final stage of 10 min at 72 °C. For the primers with two annealing temperatures, only the cycles were altered: 9 cycles of 20 s at 94 °C, 20 s at 58 °C, and 20 s at 72 °C, plus 25 further cycles of 20 s at 94 °C, 20 s at 60 °C, and 20 s at 72 °C.
The significance of the quantitative variables was first analyzed by the F test, at 5% probability. For the molecular data, the polymorphic information content (PIC) was calculated.
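The text does not state which PIC expression was used. As an illustration only, the sketch below assumes the form given by Botstein et al. (1980), PIC = 1 - Σ pᵢ² - Σᵢ<ⱼ 2 pᵢ² pⱼ², where pᵢ is the frequency of the i-th allele at a locus. The function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def pic(allele_freqs):
    """Polymorphic information content of one locus (assumed Botstein form)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    squares = [p * p for p in allele_freqs]
    # Sum over all allele pairs i < j of 2 * p_i^2 * p_j^2
    cross = sum(
        2.0 * squares[i] * squares[j]
        for i in range(len(squares))
        for j in range(i + 1, len(squares))
    )
    return 1.0 - sum(squares) - cross


# Hypothetical frequencies, for illustration only.
monomorphic = pic([1.0])      # a single allele: no information
two_equal = pic([0.5, 0.5])   # two equally frequent alleles
```

Under this assumption a monomorphic locus has a PIC of zero, which is consistent with the zero values reported for the monomorphic primers.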
Later, the quantitative and qualitative variables were analyzed simultaneously, using the Ward-MLM strategy to compose the accession groups through the CLUSTER and IML procedures of the SAS program (SAS Institute, 2000). For the Ward clustering method, the distance matrix was obtained with Gower's algorithm (Gower, 1971). The ideal number of groups was defined according to the pseudo F and pseudo t² criteria (SAS Institute, 2000).
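To make the Gower step concrete, the following minimal sketch computes the dissimilarity between two accessions described by mixed variables. It assumes the common unweighted form of Gower's coefficient: range-scaled absolute differences for quantitative variables and simple mismatch (0 or 1) for qualitative ones, averaged over all variables. It is not the SAS code used in the study, and the variable names, values and range are hypothetical.

```python
def gower_distance(a, b, ranges):
    """Unweighted Gower dissimilarity between two records.

    a, b   -- dicts with the same keys (one key per descriptor)
    ranges -- dict mapping each quantitative descriptor to its range
              (max - min) in the whole collection; descriptors that are
              absent from `ranges` are treated as qualitative
    """
    total = 0.0
    for key in a:
        if key in ranges:
            # Quantitative: absolute difference scaled by the range
            if ranges[key] > 0:
                total += abs(a[key] - b[key]) / ranges[key]
        else:
            # Qualitative: 0 if the states match, 1 otherwise
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Two hypothetical accessions: weight of 100 seeds (g) and growth habit.
acc_1 = {"ws": 20.0, "gh": "determinate bush"}
acc_2 = {"ws": 40.0, "gh": "indeterminate climber"}
d = gower_distance(acc_1, acc_2, ranges={"ws": 40.0})
```

The matrix of such pairwise values for all accessions is what a Ward-type clustering routine would then take as input.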
Boxplot analysis was carried out for the quantitative data to visualize the groups formed. Differences among groups, the correlations between the variables, and the canonical variables were evaluated graphically using the CANDISC procedure of the SAS program (SAS Institute, 2000). The distance for the joint distribution of the quantitative and qualitative variables proposed by Matusita (1955), adapted by Krzanowski (1983) and later by Franco et al. (1998), was used to determine the dissimilarity among the groups formed.

Results and Discussion
In the univariate analysis of the five quantitative variables, a significant effect (p<0.01) was observed for all traits, which implies the existence of genetic variability among the studied accessions. Plant cycle, weight of 100 seeds, number of seeds per pod and number of pods per plant showed environmental coefficients of variation (CVe) below 10%, which is considered low and indicates that these traits are less affected by environmental variation (Table 2). For grain yield, the CVe was 12.24%, which is considered a medium value. These values indicate good experimental accuracy and lend reliability to the observed results.
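The text does not spell out how the CVe was obtained. A minimal sketch, assuming the usual definition from the analysis of variance (100 times the square root of the residual mean square, divided by the grand mean of the trait), is given below. The function name and the numbers are hypothetical and do not come from Table 2.

```python
import math


def environmental_cv(residual_mean_square, grand_mean):
    """Environmental coefficient of variation, in percent (assumed definition)."""
    return 100.0 * math.sqrt(residual_mean_square) / grand_mean


# Hypothetical values: residual mean square of 4.0 and trait mean of 20.0
cv_example = environmental_cv(residual_mean_square=4.0, grand_mean=20.0)
```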
In the commercial group classification, Black (50.88%) predominated, followed by Carioca (12.28%), Mulatinho (12.28%), Others (10.53%), Rosinha (8.77%) and Jalo (5.26%). Fonseca et al. (2007) evaluated 122 bean accessions collected in 14 municipalities of the mountain region and southern Espírito Santo and verified the predominance of the Black group, with 40.16%, which shows the higher acceptance of this group by producers and consumers of that region. However, among the local varieties of the community Fortaleza specifically, a wide variety of commercial groups was found in cultivation: 11, 7, 5, 4, 3 and 1 accessions of the groups Black, Mulatinho, Rosinha, Others, Jalo and Carioca, respectively.
The molecular characterization of the 57 accessions showed that 13 out of the 16 evaluated primers were polymorphic, with a total of 29 polymorphic alleles. The number of alleles per locus varied from 2 to 4, with an average of 2.23 (Table 3). The highest number of alleles was observed at the locus of the SSR BM141 primer, with 4 alleles. Similar numbers of alleles per locus were found in common bean by Campos et al. (2007), Hanai et al. (2007) and Kumar et al. (2009). The polymorphic information content (PIC) varied from 0 (BMd10, BM142 and BM199) to 0.51 (BM141), with an average of 0.32 (Table 3). Kumar et al. (2009) evaluated 115 accessions of common bean (70 Indian local varieties, 24 cultivars and 21 exotic varieties) using 17 SSR primers and found PIC values between 0 and 0.684, with an average of 0.29, which agrees with the results of the present work.
The Ward-MLM procedure determined that the ideal number of groups was five, according to the pseudo F and pseudo t² criteria, since the maximum values of these parameters were reached at this point (Table 4). According to Mingoti (2007), the pseudo F and pseudo t² tests, which determine the ideal number of accession groups in a cluster analysis, are similar to a hypothesis test in which each clustering step corresponds to a test comparing the mean vectors of the two clusters joined to form a new group. Attention should therefore be given to the larger values of pseudo F and pseudo t², since they correspond to the lowest significance probabilities and thus to the strongest rejection of the equality of means. If the equality of the mean vectors is rejected, the two groups should not be merged into a new group.
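As an illustration of how a pseudo F value relates between-group and within-group variation, the sketch below assumes the usual Calinski-Harabasz form: the between-group sum of squares divided by (g - 1), over the within-group sum of squares divided by (n - g), for n observations in g groups. It is not the SAS computation used in the study, it does not cover pseudo t², and the data are hypothetical.

```python
def pseudo_f(points, labels):
    """Pseudo F statistic for a given partition (assumed Calinski-Harabasz form).

    points -- list of equal-length tuples of quantitative values
    labels -- group label of each point, in the same order
    Requires at least two groups and a non-zero within-group sum of squares.
    """
    n = len(points)
    dims = range(len(points[0]))
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    g = len(groups)
    grand_mean = [sum(p[d] for p in points) / n for d in dims]
    between = 0.0
    within = 0.0
    for members in groups.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in dims]
        # Squared distance from group centroid to grand mean, weighted by size
        between += len(members) * sum((centroid[d] - grand_mean[d]) ** 2 for d in dims)
        # Squared distances from each member to its own centroid
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in dims)
    return (between / (g - 1)) / (within / (n - g))


# Hypothetical one-variable example with two well-separated groups.
f_value = pseudo_f([(0.0,), (2.0,), (10.0,), (12.0,)], [0, 0, 1, 1])
```

Computing this value for each candidate number of groups and comparing the results is one way to see where the partition separates the accessions most clearly.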
Groups 1 and 2 comprised 61.40% of the accessions (Table 4). Group 1 was composed of 17 accessions: 12 from Embrapa Trigo, three commercial cultivars (Carioca, IAPAR 31 and IAPAR 44) and two varieties from the community Fortaleza. Group 2 was composed of 18 accessions: 13 from the community Fortaleza, four from Embrapa Trigo, and one commercial cultivar (Serrano). These two groups were highly similar in their agronomic and morphological traits; the largest differences were observed for weight of 100 seeds, grain yield and seed shape (Figure 1 and Table 5). In the molecular data, the band patterns of the two groups differed, which allowed their separation into different groups. Most accessions of group 1 were from Embrapa Trigo, while varieties from the community Fortaleza prevailed in group 2.
Group 3 was composed of 11 accessions: seven from the community Fortaleza, two commercial cultivars (Pérola and Carioca) and two from Embrapa Trigo (Table 4). This group had the accessions with the longest cycles (average of 76.09 days) and the lowest yield (average of 1,029.20 kg ha⁻¹) (Figure 1). As for the commercial group, the type Carioca prevailed (five accessions), followed by Rosinha, Mulatinho, Black and Others, with 3, 1, 1 and 1 accessions, respectively (Table 5). Group 4 was composed of seven accessions from the community Fortaleza and only one accession from Embrapa Trigo, and was the group with the highest 100-seed weights (between 30.5 and 49.29 g per 100 grains) (Figure 1). This group also gathered only accessions with determinate (type I) growth habit and oblong/long reniform seed shape (accessions BGF-7, BGF-24, BGF-25, BGF-26, BGF-27 and BGF-51). Group 5 comprised the accessions BGF-2, BGF-19 and BGF-46, which had the highest numbers of pods per plant, with 15.23, 15.33 and 15.00, respectively (Figure 1). Accessions 3 and 51 were the most productive of all the evaluated accessions, with grain yield estimates of 2,654.69 and 2,768.15 kg ha⁻¹, respectively.
An adequate separation of the accessions was observed among the groups formed by simultaneously using the agronomic, morphological and molecular data with the Ward-MLM methodology. Ortiz et al. (2008) worked with 50 accessions of eight maize races from high-altitude regions of Peru, with six agronomic descriptors, and concluded that the Ward-MLM procedure soundly classifies the accessions and may serve as an additional refinement of, and a complement to, the racial classification based on visual evaluation. Sudré et al. (2010) studied the genetic diversity of 56 Capsicum spp. accessions using 26 morpho-agronomic descriptors and verified that the Ward-MLM procedure allowed the separation of the species C. annuum, C. frutescens, C. baccatum and C. chinense into different groups.
The first two canonical variables explained 87.86% of the variability between groups (Figure 2). This value indicates that the graphical representation of the first two canonical variables is appropriate for visualizing the relationships among groups and among accessions within the same group. The weight of 100 seeds showed the highest correlation with the first canonical variable, followed by number of pods per plant and number of seeds per pod, with values of -0.82, 0.73 and 0.71, respectively. For the second canonical variable, the highest correlations occurred for crop cycle and grain yield, with 0.61 and -0.57, respectively.
Except for groups 1 and 2, all groups were separated in the plot of the first two canonical variables (Figure 2). Groups 1 and 2 were highly similar in their morpho-agronomic traits (Figure 1 and Table 5). Group 4 was the farthest from the other groups, possibly because its accessions are from the Andean gene pool, which has larger seeds, an important trait of this group (Figure 1). The distances between groups agreed with the plot of the canonical variables: groups 1 and 2 were the closest, with a distance of 5.81, while group 4 was the farthest from the other groups (Table 6).

Conclusions
1. There is genetic variability among the 57 common bean accessions studied, considering the morphological, agronomic and molecular traits.
2. The Ward-MLM statistical procedure is a useful technique to detect genetic divergence and to cluster genotypes using data obtained simultaneously from morphological, agronomic and molecular descriptors.

Figure 1. Boxplot showing the minimum and maximum values and the 25th, 50th and 75th percentiles of the descriptors plant cycle, number of seeds per pod, number of pods per plant, weight of 100 seeds and yield, for the five groups (G1-G5) formed by the Ward-MLM strategy.

Table 1. Identification, origin and commercial group of the 57 accessions of common bean from the Universidade Federal do Espírito Santo gene bank (BGF-Ufes).

[...] g per 100 grains, respectively, and they are probably derived from the Andean gene pool, characterized by the predominance of cultivars with the "T" type phaseolin and larger seeds, while the Mesoamerican gene pool is characterized by [...]

Table 2. Analysis of variance of the five traits evaluated in 57 accessions of common bean from the gene bank of Universidade Federal do Espírito Santo.

Table 3. Base sequence, annealing temperature (Tm), linkage group (GL), number of alleles (Al) and polymorphic information content (PIC) of the 16 SSR primers used.

Table 4. Clustering of the 57 accessions according to the Ward-MLM procedure.