Identification of common bean (Phaseolus vulgaris) duplicates using agromorphological and molecular data

We used agromorphological and random amplified polymorphic DNA (RAPD) molecular marker data to identify duplicate common bean (Phaseolus vulgaris L., Fabaceae) accessions in the Common Bean Germplasm Bank of the Agronomical Institute (Banco de Germoplasma de Feijoeiro do Instituto Agronômico de Campinas, IAC, SP, Brazil). A total of 116 accessions with the same names and similar agromorphological traits were analyzed. Divergence between the accessions was first evaluated from the agromorphological descriptors by single-linkage clustering of Euclidean distances. This multivariate analysis identified four duplicate accessions (Carioca Lustroso, Bico de Ouro, Jamapa and Preto), while 17 other same-name accessions were suspected duplicates because of their low divergence. Accessions with low genetic distances (suggesting that they were duplicates) were then compared using RAPD markers, which confirmed the four duplicates identified by the multivariate analysis; of the other 17 suspected accessions, only two, IAPAR 57 and Sacavem, were confirmed as duplicates. These results show that the combined use of agromorphological and molecular information allowed a better characterization of the accessions in the common bean germplasm bank.


Introduction
Germplasm banks are places where samples of genotypes, improved varieties, landraces, wild species and species related to a given species of interest (generically termed accessions) are stored under adequate conditions (Zimmerman et al., 1996). Characterizing the accessions allows the genetic variability in the germplasm to be quantified and structured, which is highly important for improvement programs and for the conservation of genetic diversity. To characterize a germplasm basically means to identify and describe differences between the accessions. Besides information on the origin of the material (passport data), differences related to the agricultural performance of the accessions are normally also considered (e.g. yield, growth and flowering habit, and responses to pathogens and pests), as well as botanical differences described by taxon-specific descriptors (Vanderborght, 1988).
The Common Bean Germplasm Bank of the Agronomical Institute (IAC-BGF) (Banco de Germoplasma de Feijoeiro do Instituto Agronômico, IAC, Campinas, SP, Brazil) holds 1500 accessions representing the two principal centers of origin of the species (Andean and Mesoamerican), as well as ecotypes from different South American countries and a large number of lines from Brazilian and international genetic improvement programs, most of which were obtained through germplasm interchange. Although germplasm interchange is advantageous because it broadens the variability available for traits of agricultural interest, it almost inevitably leads to the appearance of duplicates (repeated accessions), which increases the costs of germplasm maintenance and characterization. On the other hand, accessions with the same name do not always correspond to the same genotype, and in such cases a putative duplicate could be discarded in error.
Multivariate techniques, such as dissimilarity measures (Euclidean distance, Mahalanobis' generalized distance) combined with clustering methods such as single linkage, allow duplicates to be identified. These analyses are generally carried out on agromorphological (phenotypic) data to produce dissimilarity matrices for the accessions, which are then used in cluster analyses to identify and group the accessions by their similarity. Another multivariate technique is principal component analysis, which aims to visualize the proximity of accessions and any links between the analyzed descriptors. However, Tatieni et al. (1996) argue that the phenotypic traits traditionally used to characterize germplasm and estimate genetic divergence may be of limited value because they are generally influenced by the environment and by the developmental stage of the plant. Isoenzyme and DNA markers, which are little influenced by the environment, are therefore more adequate for germplasm characterization. Singh et al. (1991a) suggest that agromorphological, biochemical and molecular data should be combined in diversity studies because this combination offers complementary results.
Based on these premises, the objective of the present study was to identify duplicate common bean accessions in the IAC-BGF and to verify whether accessions with the same name correspond to the same genotype. Multivariate analysis was applied to agromorphological and random amplified polymorphic DNA (RAPD) data, and principal component analysis was used to quantify the variation and identify redundant agromorphological descriptors.

Evaluated accessions
The IAC-BGF contains 1500 active accessions, 116 of which (Table 1) were selected for duplicate identification because they had similar names and agromorphological descriptors (Table 2). The selected accessions were sown in 2002 and 2003 at the Regional Center of the University of Espírito Santo do Pinhal (Centro Regional Universitário de Espírito Santo do Pinhal, CREUPI), Espírito Santo do Pinhal, São Paulo state, Brazil. For seed propagation and descriptor evaluation, the experimental field consisted of 3-m rows spaced 0.5 m apart, without replication, with a row of Phaseolus vulgaris cultivar (cv.) IAC-Carioca Eté sown at intervals of 10 accessions as a local control.

Evaluation of the agromorphological descriptors
Seventeen agromorphological descriptors referring to plant, pod and seed traits and to the reaction to the anthracnose pathogen (Colletotrichum lindemuthianum) were evaluated. The reaction to anthracnose was evaluated in the laboratory by inoculating the 116 accessions, under controlled temperature and humidity, with the three principal C. lindemuthianum physiological races most frequently occurring in São Paulo state (races 31, 65 and 89); these isolates were provided by the IAC Plant Health Research and Development Center (Centro de Pesquisa e Desenvolvimento de Fitossanidade do Instituto Agronômico, IAC).

Statistical analyses
Univariate analysis was performed to quantify the existing variability of each descriptor by means of the coefficient of phenotypic variation. Multivariate analyses, namely Euclidean distance, principal component analysis and clustering of the accessions by the single-linkage method, were then used to quantify the genetic divergence among the accessions. Because the descriptors were measured on different scales, the data were standardized before these analyses: the mean $X_{ij}$ obtained for descriptor $j$ of accession $i$ was divided by the standard deviation $s_j$ of descriptor $j$, giving the standardized value $Z_{ij} = X_{ij}/s_j$, which has unit variance.
Principal component analysis was used to visualize the variation and to identify the less important descriptors; redundant descriptors were identified statistically using the criterion of Jolliffe (1973), complemented by the procedure of Dias et al. (1997). The descriptors with the largest loading coefficients in the eigenvectors of the last principal components (i.e. those accounting for the smallest part of the total variance) were considered for discarding. Divergence between accessions was evaluated with a Euclidean distance dissimilarity matrix, $d_{ii'} = \sqrt{\sum_{j} (Z_{ij} - Z_{i'j})^2}$, and the single-linkage clustering method was applied to this matrix to produce a dendrogram showing the relative positions of the accession groups. Accessions were considered possible duplicates when the dissimilarity between them was zero. All multivariate analyses were performed with the 'Genes' software (Cruz, 2001).
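The standardization, distance and clustering steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the 'Genes' implementation: the sample standard deviation is assumed for $s_j$, the single-linkage routine is a naive version suitable only for small examples, and the data matrix is hypothetical. Pairs of accessions at zero distance are reported as possible duplicates, following the criterion in the text.

```python
import math
import statistics
from itertools import combinations


def standardize(data):
    """Scale each descriptor (column) by its standard deviation: Z_ij = X_ij / s_j.

    As in the text, values are divided by s_j but not centred; centring would
    not change Euclidean distances. The sample standard deviation is assumed.
    """
    n_desc = len(data[0])
    sds = [statistics.stdev([row[j] for row in data]) for j in range(n_desc)]
    # A constant descriptor (s_j = 0) is left unscaled; it adds nothing to distances.
    return [[row[j] / (sds[j] or 1.0) for j in range(n_desc)] for row in data]


def euclidean_matrix(z):
    """Symmetric matrix of Euclidean distances between accessions (rows of z)."""
    n = len(z)
    d = [[0.0] * n for _ in range(n)]
    for i, k in combinations(range(n), 2):
        d[i][k] = d[k][i] = math.dist(z[i], z[k])
    return d


def single_linkage(d):
    """Naive single-linkage clustering; returns merges as (members_a, members_b, height)."""
    clusters = {i: frozenset([i]) for i in range(len(d))}
    merges = []
    next_id = len(d)
    while len(clusters) > 1:
        # Single linkage: cluster distance = smallest distance between any two members.
        (a, b), height = min(
            (((p, q), min(d[i][k] for i in clusters[p] for k in clusters[q]))
             for p, q in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        clusters[next_id] = clusters.pop(a) | clusters.pop(b)
        next_id += 1
    return merges


def zero_distance_pairs(d, tol=1e-9):
    """Pairs of accessions whose dissimilarity is (numerically) zero: possible duplicates."""
    return [(i, k) for i, k in combinations(range(len(d)), 2) if d[i][k] <= tol]


# Hypothetical data: 4 accessions x 3 descriptors (accession 1 repeats accession 0).
data = [
    [3.0, 250.0, 1.0],
    [3.0, 250.0, 1.0],
    [4.0, 210.0, 2.0],
    [6.0, 180.0, 2.0],
]
d = euclidean_matrix(standardize(data))
merges = single_linkage(d)
print(zero_distance_pairs(d))  # [(0, 1)]
print(merges[0])               # ([0], [1], 0.0)
```

The merge heights are non-decreasing, which is what lets a single-linkage dendrogram be drawn; identical accessions join at height zero before any other pair.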
Some accessions with similar names and agromorphological descriptors showed low, but non-zero, dissimilarity, raising the possibility that they were duplicates. RAPD molecular markers were therefore used to determine whether or not they were duplicates.

Molecular analyses
Suspected duplicate accessions were sown in pots in a greenhouse. Young leaves were collected for genomic DNA extraction according to the protocol of Rafalski et al. (1996). DNA samples of each accession were RAPD-amplified in a total volume of 15 µL containing 0.4 µM primer, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl2, 4 ng DNA, 100 µM of each deoxyribonucleotide (dNTP) and 1.0 unit of Taq polymerase (Invitrogen). Amplifications were conducted in a PTC-100 thermocycler (MJ Research) programmed for 4 min at 95 °C, followed by 45 cycles of 1 min at 95 °C, 1 min at 35 °C and 1.5 min at 72 °C, and a final extension of 7 min at 72 °C. Fifteen Operon Technologies primers were used (AS13, A18, AZ20, AB03, F10, H18, H20, AL09, W13, C08, L4, B3, J01, I16 and G08), and each amplification was performed in triplicate to confirm the results. Fragments were separated by electrophoresis on 1.5% agarose gel, stained with ethidium bromide and visualized under ultraviolet light.
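As a quick arithmetic check on the thermal profile, the programmed hold times can be summed. Ramp rates of the thermocycler are not given in the text, so they are left out here and the actual run is somewhat longer than this figure.

```python
# Programmed hold times (minutes) of the RAPD thermal profile described above.
# Ramp times between temperatures are not reported, so they are ignored here.
initial_denaturation = 4.0       # 95 °C
cycle_steps = [1.0, 1.0, 1.5]    # 95 °C denaturation, 35 °C annealing, 72 °C extension
n_cycles = 45
final_extension = 7.0            # 72 °C

total_hold = initial_denaturation + n_cycles * sum(cycle_steps) + final_extension
print(f"{total_hold} min (about {total_hold / 60:.1f} h) of hold time")
```

With 45 cycles of 3.5 min each, the cycling phase dominates: 157.5 of the 168.5 programmed minutes.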

Results
Based on the coefficient of variation, the most variable descriptors were the reactions to anthracnose races 31, 65 and 89, pod profile and seed gloss (Table 2). In the multivariate analysis, these same descriptors accounted for 73.38% of the total variation among the accessions.
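The text does not state how the relative contributions in Table 2 were computed. One common decomposition for Euclidean distances, assumed here for illustration, sums the squared differences between all pairs of accessions separately for each descriptor; because squared Euclidean distance is additive over descriptors, the shares add up to 100%. The data below are hypothetical.

```python
from itertools import combinations


def relative_contributions(z):
    """Percentage of the summed squared Euclidean distances due to each descriptor.

    z holds standardized values (rows = accessions, columns = descriptors).
    Because d_ik^2 = sum_j (z_ij - z_kj)^2, the sum over all pairs of
    accessions splits exactly into one term per descriptor.
    """
    n_desc = len(z[0])
    s = [0.0] * n_desc
    for a, b in combinations(z, 2):
        for j in range(n_desc):
            s[j] += (a[j] - b[j]) ** 2
    total = sum(s)
    return [100.0 * x / total for x in s]


# Hypothetical standardized data: 3 accessions x 2 descriptors.
z = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
print(relative_contributions(z))  # [20.0, 80.0]
```

Under this assumption, a combined figure such as the one reported for the most variable descriptors is simply the sum of their individual shares.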
Principal component analysis showed that thousand-seed mass was a redundant descriptor. Of the 17 principal components, the first two accounted for 33% of the variation, the first three for 42.75% and the first four for 50.52%, while the first eight components accounted for 70% of the variation (Table 3). The traits responsible for separation along the first principal component were the number of seeds per pod, seed profile and reaction to anthracnose race 65. Separation along the second principal component was affected by pod width, seed shape and reaction to anthracnose race 89, and along the third principal component by growth habit and seed gloss.
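The accumulated percentages reported in Table 3 follow from dividing the running sum of eigenvalues by their total. A minimal sketch with hypothetical eigenvalues (not those of Table 3):

```python
from itertools import accumulate


def cumulative_variance(eigenvalues):
    """Accumulated % of total variance for the first 1, 2, ... principal components."""
    total = sum(eigenvalues)  # for unit-variance descriptors this equals their number
    return [100.0 * c / total for c in accumulate(eigenvalues)]


def n_components_for(eigenvalues, threshold_pct):
    """Smallest number of leading components whose accumulated variance reaches the threshold."""
    for k, pct in enumerate(cumulative_variance(eigenvalues), start=1):
        if pct >= threshold_pct:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues for 5 standardized descriptors (they sum to 5).
eigs = [2.0, 1.5, 0.8, 0.5, 0.2]
print([round(p, 2) for p in cumulative_variance(eigs)])  # [40.0, 70.0, 86.0, 96.0, 100.0]
print(n_components_for(eigs, 80))                        # 3
```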
Although the Sacavem (67) and Huasano (76) accessions were very close in terms of genetic distance they were not considered duplicates because their seeds had different colors.
Homonymous accessions that showed a high degree of similarity but were not always identified as duplicates by the cluster analysis of agromorphological descriptors were subjected to RAPD analysis. Accession Jamapa (82), which did not group with its homonymous accessions Jamapa 55 and 87 (Figure 1), was also included: cultivar Jamapa is known worldwide and widely used in genetic improvement programs, which justified its inclusion in the marker study. Of the 15 primers tested, only seven (H18, F10, AZ20, W13, G08, AL09 and AS13) were polymorphic (Figure 2). The RAPD markers confirmed the results of the analysis based on agromorphological data, in which four clusters of accessions with total similarity were formed, i.e. accessions within the same cluster were genetically identical (Figures 2.1, 2.2, 2.3 for accessions Jamapa 55 and 87, and 2.4). Genetic identity was confirmed for only two of the suspected duplicate accessions (IAPAR 57 and Sacavem, Figures 2.5 and 2.6), whereas different RAPD banding patterns were found for the other accessions (Figures 2.3 for accession Jamapa 82, 2.7, 2.8, 2.9, 2.10, 2.11 and 2.12).
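The comparison of banding patterns can be sketched by scoring each accession as the set of bands it shows across the polymorphic primers and treating identical sets as candidate duplicates. The Jaccard coefficient included here is an assumed, commonly used similarity measure for dominant markers; the text does not name a coefficient, and the accession labels and band sizes are invented.

```python
from itertools import combinations

# Hypothetical RAPD scoring: for each accession, the set of (primer, fragment size in bp)
# bands seen on the gel. Primer names come from the text; accession labels and
# band sizes are invented for illustration.
profiles = {
    "accession A": {("H18", 800), ("F10", 650), ("AZ20", 1200)},
    "accession B": {("H18", 800), ("F10", 650), ("AZ20", 1200)},
    "accession C": {("H18", 800), ("F10", 900), ("AZ20", 1200), ("W13", 500)},
}


def identical_pairs(profiles):
    """Pairs of accessions with exactly the same banding pattern (candidate duplicates)."""
    return [(x, y) for x, y in combinations(profiles, 2) if profiles[x] == profiles[y]]


def jaccard(a, b):
    """Shared bands divided by bands present in either accession (1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


print(identical_pairs(profiles))                                  # [('accession A', 'accession B')]
print(jaccard(profiles["accession A"], profiles["accession C"]))  # 0.4
```

Only monomorphic primers add shared bands to every pair, so restricting the scoring to polymorphic primers changes similarity values but not which profiles are identical.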

Discussion
In the IAC common bean genetic improvement program, any cultivar recommended for planting must be resistant to C. lindemuthianum races 31, 65 and 89, because these races predominate in São Paulo state (Carbonell et al., 1999). The three anthracnose-race descriptors together contributed 61.88% of the divergence among accessions, indicating that the common bean accessions in the active IAC germplasm bank vary widely in their response to these three pathogen races. Notably, the accessions analyzed in our study were chosen because they had similar names and agromorphological descriptors and represented only 7.7% of the accessions in the active IAC common bean germplasm bank (116 of 1500), which suggests that the variability would be even higher if all the accessions in the germplasm bank had been included.
Thousand-seed mass was discarded because the analyzed accessions were fairly uniform for this trait, the majority (75%) having a mean mass of 200 to 250 g per thousand seeds. This descriptor also showed a moderate correlation (r = 0.58**) with seed profile and was therefore considered redundant and discarded. Other discarding simulations were performed, but principal component analysis did not reveal any other redundant descriptors. Fonseca and Silva (1999) obtained different results when identifying duplicates in the Embrapa Common Bean Germplasm Bank, where the 100-seed weight descriptor was the main contributor to divergence among the common bean groups Amendoim, Bagagô and Chita Fina, accounting for 47% of the total variation.
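The discarding criterion described in the Statistical analyses section (largest loading coefficients in the last principal components, after Jolliffe, 1973) can be sketched as below. The descriptor names and loading values are made up for illustration, and the routine is a simplified reading of the criterion rather than the exact procedure of Dias et al. (1997) or of the 'Genes' software.

```python
def jolliffe_discard(loadings, descriptors, n_last):
    """Flag descriptors for discarding from the last `n_last` principal components.

    loadings[j][k] is the eigenvector coefficient of descriptor j on component k,
    with components ordered by decreasing eigenvalue. Working back from the last
    component, the not-yet-flagged descriptor with the largest absolute
    coefficient is flagged as redundant.
    """
    n_comp = len(loadings[0])
    flagged = []
    for k in range(n_comp - 1, n_comp - 1 - n_last, -1):
        candidates = [j for j in range(len(descriptors)) if j not in flagged]
        best = max(candidates, key=lambda j: abs(loadings[j][k]))
        flagged.append(best)
    return [descriptors[j] for j in flagged]


# Hypothetical coefficients: 3 descriptors (rows) x 3 components (columns).
descriptors = ["D1", "D2", "D3"]
loadings = [
    [0.40, 0.30, 0.86],
    [0.60, -0.20, -0.50],
    [0.69, 0.93, 0.10],
]
print(jolliffe_discard(loadings, descriptors, 1))  # ['D1']
print(jolliffe_discard(loadings, descriptors, 2))  # ['D1', 'D3']
```

Each flagged descriptor is then a candidate for removal, to be confirmed, as in the text, by re-running the analysis without it.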
The percentage of variation explained by the first principal components was similar to that obtained by Singh et al. (1991b) in the evaluation of 306 common bean accessions in the Germplasm Bank of the International Center for Tropical Agriculture (CIAT, Centro Internacional de Agricultura Tropical), who found that the first three components explained 43% of the variation. Castineiras (1990) evaluated 60 common bean cultivars using 34 quantitative and qualitative descriptors and found that the first three components accounted for 37% of the total variation and that 23 of the 34 descriptors could be discarded. Rodrigues et al. (2002) showed that, after discarding 25 of the 40 descriptors evaluated in 37 common bean cultivars, the first four components explained 69% of the total variation. Chiorato et al. (2005) evaluated 993 common bean accessions from the IAC Germplasm Bank, found that the first two components accounted for 33% of the variation and concluded that only 18 of the 23 agromorphological descriptors were needed to evaluate the accessions. Results in the literature suggest that 10 to 20 descriptors should be used to characterize common bean germplasm banks, because using more is costly and adds little to the analysis. However, it should be remembered that the most informative and variable descriptors are not always the same for all the accessions in a germplasm bank. Another important point is the lack of association between common bean descriptors, which means that discarding redundant descriptors does not significantly increase the percentage of variation allocated to the first principal components.
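A limiting case, assumed here only for illustration, shows why weak association between descriptors keeps the variance of the leading components low. If the $p$ standardized descriptors were completely uncorrelated, their correlation matrix would be the identity and every eigenvalue would equal 1:

$$
\mathbf{R} = \mathbf{I}_p \;\Longrightarrow\; \lambda_1 = \lambda_2 = \cdots = \lambda_p = 1,
\qquad
\frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_p} = \frac{k}{p}.
$$

With $p = 17$ descriptors, the first two components would then explain $2/17 \approx 11.8\%$ of the variation, and discarding one descriptor would raise this only to $2/16 = 12.5\%$. The descriptors evaluated here are not fully uncorrelated (the first two components explained 33%), so this extreme case only illustrates the direction and small size of the effect.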
Our multivariate analysis based on agromorphological descriptors detected four duplicate accessions but left the status of 17 other accessions in doubt; of these, RAPD analysis identified two as duplicates (Figures 2.5 and 2.6). The small dissimilarity observed among these accessions in the multivariate analysis (Figure 1) involved only width and the mean number of seeds per pod (data not shown). These quantitative traits are subject to environmental influences (including the sowing of accessions in different cropping seasons), which may account for the small differences observed.
It is possible that the 15 phenotypically similar, similarly named accessions that were not confirmed as duplicates by RAPD analysis had been mistakenly indexed when stored, perhaps because of clerical errors when the accessions were received from other institutions during interchange. Another possibility is that recombination with other accessions occurred during seed propagation and that the resulting segregation went unnoticed.
Our data suggest that, for the following accessions, one of the homonymous genotypes could be discarded without risk of the germplasm bank losing genetic variability: IAPAR 57 accessions 96 and 113; Carioca Lustroso 104 and 108; Bico de Ouro 90 and 93; Jamapa 55 and 87; Preto 81 and 88; and Sacavem 52 and 53. It also seems that Jamapa accession 82 should be discarded because of the differences found in its banding pattern relative to Jamapa accessions 55 and 87.
For the homonymous accessions identified by the molecular and multivariate analyses as not being duplicates (103 accessions sharing 15 similar names), we propose assigning new acronyms to improve the identification of the genotypes (accessions) in the active IAC Common Bean Germplasm Bank.
Our results show that the combined use of agromorphological information and molecular data allowed an improved characterization of the accessions in the IAC Common Bean Germplasm Bank and that this approach could be applied to other plant germplasm collections.

Figure 1 - Single-linkage dendrogram based on 17 agromorphological descriptors for 116 common bean accessions from the IAC Germplasm Bank. (G) Accessions considered duplicates; (I) accessions suspected of being duplicates.

Figure 2 - Random amplified polymorphic DNA (RAPD) molecular marker agarose gels for the genetic differentiation of duplicate accessions in the active IAC Common Bean Germplasm Bank. The accession numbers correspond to those presented in Figure 1.

Table 1 - Common bean accessions evaluated to identify duplicates because of similarities in their names and agromorphological descriptors.

Table 2 - Relative contribution (%) of the descriptors to the multivariate divergence, loading coefficients (LC) of the eigenvectors of the last principal component, and coefficient of phenotypic variation (CPV) of each descriptor.
*Greatest loading coefficient in the last component, indicating that the descriptor associated with this component is redundant.

Table 3 - Eigenvalues associated with the principal components and their relative and accumulated variances (%), for the 17 agromorphological descriptors evaluated in 116 common bean accessions of the IAC Germplasm Bank.