Isolation of (GA)n Microsatellite Sequences and Description of a Predicted MADS-box Sequence Isolated from Common Bean (Phaseolus vulgaris L.)

The isolation of (GA)n microsatellites using a highly microsatellite-enriched library is described for the first time in common bean (Phaseolus vulgaris L.). A relatively simple and effective method was used to isolate DNA repeats from microsatellite-enriched libraries, based on hybridization capture of repeat regions using biotin-conjugated oligonucleotides followed by non-radioactive colony hybridization. PCR products from 200 to 800 bp were obtained and cloned. Of the 60 clones sequenced, 21 yielded (GA)n microsatellites with n values equal to or higher than six. These (GA)n microsatellite-containing loci could be useful for further genetic mapping studies. A (GA)n microsatellite linked to a putative MADS-box gene was identified. This sequence, which represents the first MADS-box locus described to date in common bean, showed very high similarity with other known MADS-box sequences and was grouped within the AGL2 subfamily cluster of the Arabidopsis MADS-box genes. The vicinity of microsatellites to some genes is also discussed.


Introduction
Repetitive DNA covers up to 90% of the plant genome (Nagl et al., 1983). Tandemly repetitive DNA is classified into three major classes (Tautz and Renz, 1984): (i) satellite DNA, which shows repeat units with a length of up to 300 base pairs (bp), (ii) minisatellites, with monomers of 9-100 bp, and (iii) microsatellites or simple sequence repeats (SSRs), which exhibit repeat units of 1-6 bp in length. SSRs occur ubiquitously and abundantly in eukaryotic genomes. Each microsatellite is usually located at a single locus with large variation in the number of repeats among individuals. Thus, microsatellite loci are often multi-allelic (Saghai-Maroof et al., 1994). In addition to high levels of polymorphism, SSR sequences possess most of the desirable attributes of molecular markers, including information content, unambiguous designation of alleles, selective neutrality (although they can be subject to hitch-hiking effects), high reproducibility, codominance, and fast and easy assaying of genotypes. Microsatellite markers have therefore proved very useful for cultivar identification, pedigree analysis, the evaluation of genetic distance between organisms (Priolli et al., 2002), and genetic mapping (Yu et al., 2000).
The most abundant microsatellite in several well-known mammals is (AC)n (Beckmann and Weber, 1992), while in many plant species it is (AT)n or (AG)n (Wang et al., 1994). A high abundance of (GA)n microsatellites compared to other dinucleotide SSRs has been observed in plant genomes such as Oryza, Aegilops, Arabidopsis or Brassica (Gupta and Varshney, 2000; Guyomarc et al., 2002; Suwabe et al., 2002; Uzunova and Ecke, 1999). Previous studies on plant (GA)n microsatellites also show that they are well distributed throughout the genome, thus ensuring good genome coverage. In many cases, SSR-containing sequences are part of (in exons or introns), or linked to, genes of agronomic interest (see Yu et al., 2000).
The MADS box is a highly conserved sequence motif found in a family of transcription factors which play important roles in developmental processes and have been found in species from all the eukaryotic kingdoms. In plants, MADS-box genes are scattered throughout the entire genome and encode transcription factors which control diverse developmental processes ranging from root to flower and fruit development (Sommer et al., 1990; Theissen et al., 2000). In fact, many of the genes which direct flower development contain a MADS-box domain (Schwarz-Sommer et al., 1992). The MADS-box proteins of plants usually contain other domains, but the MADS domain is by far the most conserved region of these proteins. It is the major determinant of DNA binding, but it also performs dimerization and accessory factor-binding functions (Theissen et al., 2000). The MADS domain is about 56 amino acids long and is usually found at the N-terminus of the protein (Penueli et al., 1991). The MADS-box gene family is also an important source of plant evolutionary data. For instance, Winter et al. (1999) showed that the gnetophytes are more closely related to conifers than to flowering plants by constructing phylogenetic trees based on MADS-box gene families, and Theissen et al. (2000) demonstrated that the phylogeny of MADS-box genes is strongly correlated with the origin and evolution of plant reproductive structures, such as ovules and flowers, by reviewing current knowledge on MADS-box genes in ferns, gymnosperms and different types of angiosperms. Finally, MADS-box genes play a role in plant-microbe interactions, at least in nodule cells: their expression has been localized in infected alfalfa (Medicago sativa) root nodule cells (Heard and Dunn, 1995). So far, of the more than two hundred plant MADS-box genes described, only four are from legume species (two in Pisum sativum and two in Medicago sativa). Here we present the first MADS-box sequence described in Phaseolus to date, a sequence which is linked to an upstream (GA)n repeat.

Microsatellite isolation
DNA was extracted from trifoliate leaves of 15-day-old Phaseolus vulgaris L. seedlings, using the method described by Vallejos et al. (1992) with minor modifications. Ten µg of common bean genomic DNA were digested with five U/µg of RsaI restriction enzyme (GTAC target site). The size of the DNA fragments obtained was checked by agarose gel electrophoresis. Blunt-end DNA fragments were ligated by T4 DNA ligase (Promega, Madison, USA) to MluI self-complementary adaptors (10 µmol) RSA21 5'-CTCTTGCTTACGCGTGGACTA-3' and RSA25 5'-AGTCCACGCGTAAGCAAGAGCACA-3' according to Edwards et al. (1996). The ligation was checked by PCR amplification. Five ng of ligated DNA were amplified in a final volume of 25 µL with 1 µL of 10 µmol RSA21 primer in a buffer containing 10 mM Tris-HCl, pH 8, 100 mM KCl, 0.05% w/v gelatin and 1.5 mM MgCl2, using the following PCR program: denaturation at 95 °C for 1 min and 28 cycles of 94 °C for 40 s, 60 °C for 60 s, 72 °C for 120 s. The size of the amplified ligated fragments was checked by agarose gel electrophoresis and the rest were purified in anion exchange micro columns (GIBCO-BRL).
Microsatellite sequences were selected using a biotin-labeled microsatellite oligoprobe and streptavidin-coated magnetic beads, following the hybridization-based capture methodology adapted from Kijas et al. (1994) and Billote et al. (1999). Magnetic bead-based selection was carried out using the Magnetosphere Magnetic Separation Product Kit (Promega, Madison, USA). The oligoprobe used in this experiment was 5'-I*IIIITCTCTCTCTCTCTCTC-3', with the 5' inosine (I*) biotinylated. Purified PCR products were denatured at 95 °C for 15 min before adding three µL of 50 µM biotinylated oligoprobe and 13 µL of 20X SSC. Hybridization was carried out for 20 min at room temperature. Six hundred µg of pre-washed streptavidin-coated magnetic beads were used following the manufacturer's instructions. Aliquots of the solution recovered after the last elution were used as templates for a second PCR round under the conditions described above.
The purified PCR products were cloned into the pGEM-T plasmid (Promega, Madison, USA) following the manufacturer's instructions and the plasmid was used to transform a competent E. coli DH5α strain. About 1,000 colonies were blotted onto positively charged nylon membranes (Boehringer Mannheim) for hybridization with a digoxigenin-labeled (GA)10 probe. Colony hybridization and washing were carried out at 37 °C and at room temperature, respectively. Other conditions followed standard protocols. Chemiluminescent detection was carried out using CSPD® (Boehringer Mannheim). Positive clones were cultured in liquid LB medium, the plasmids were extracted, and the inserts were sequenced using the dideoxynucleotide chain termination method. Universal and reverse primers, the Thermo Sequenase Fluorescent Labelled Primer Cycle Sequence Kit (Amersham Pharmacia Biotech, N.J.), used following the manufacturer's instructions, and automatic sequencing with ALF™ Manager v. 2.6 (Amersham Pharmacia Biotech, N.J.) were used for DNA sequencing. About 60 colonies were tested; the sequences which contained interesting information were sequenced at least three times.
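As an illustration of the first step of the enrichment procedure, the RsaI digestion can be mimicked in silico. The sketch below is not part of the original protocol; it assumes only that RsaI recognizes GTAC and cuts between GT and AC to leave blunt ends. The function name rsai_digest and the example sequence are hypothetical.

```python
def rsai_digest(seq: str) -> list[str]:
    """Cut a linear DNA sequence at every GTAC site (GT^AC, blunt ends).

    Hypothetical helper for illustration; only one strand is considered,
    which is sufficient because GTAC is palindromic.
    """
    seq = seq.upper()
    site = "GTAC"
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + 2                    # cut falls between GT and AC
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)    # look for the next site
    fragments.append(seq[start:])        # trailing fragment
    return fragments


if __name__ == "__main__":
    example = "AAGTACCCGTACTT"           # made-up sequence with two sites
    for fragment in rsai_digest(example):
        print(len(fragment), fragment)
```

Concatenating the returned fragments restores the input sequence, which is a convenient sanity check when the function is applied to longer sequences.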

Amplification of microsatellite (SSR) sequences from genomic DNA
In order to check the amplification of SSR-containing sequences from common bean genomic DNA, specific primer pairs were designed from sequences flanking the SSRs using the OLIGOTEST V 2.0 program (Beroud et al., 1990). Primer pairs (18-24 nucleotides long) were designed with annealing temperatures ranging from 50 °C to 60 °C, and a variance limited to 4 °C. DNA was extracted from trifoliate leaves of two-week-old "Jules" plants. PCRs were carried out using 50 ng of DNA in a final volume of 25 µL with 10 pmol of each primer in a buffer containing 10 mM Tris-HCl, pH 8, 100 mM KCl, 0.05% w/v gelatin, 1.5 mM MgCl2, and 1.5 units of Taq DNA polymerase (Promega, Madison, USA). The following PCR program was used: denaturation at 95 °C for 1 min and 36 cycles of 94 °C for 40 s, the specific annealing temperature for 40 s, and 72 °C for 60 s, followed by a final extension at 72 °C for 10 min. The annealing temperature used for each specific PCR is shown in Table 1.
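The primer selection criteria above can be expressed as a simple filter. The following sketch is only an approximation: the algorithm used by OLIGOTEST is not described here, so the Wallace rule (2 °C per A/T, 4 °C per G/C) is swapped in as a stand-in temperature estimate, and the 4 °C "variance" is interpreted as the maximum difference between the two primers of a pair. The function names and example primers are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (Wallace rule): 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_pair_ok(fwd: str, rev: str,
                   min_len: int = 18, max_len: int = 24,
                   tm_min: int = 50, tm_max: int = 60,
                   max_tm_diff: int = 4) -> bool:
    """Check length, temperature window and temperature difference.

    Thresholds mirror the ranges quoted in the text; using the Wallace
    estimate for them is an assumption made for this sketch.
    """
    for primer in (fwd, rev):
        if not min_len <= len(primer) <= max_len:
            return False
        if not tm_min <= wallace_tm(primer) <= tm_max:
            return False
    return abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff


if __name__ == "__main__":
    fwd = "ATGCATGCATGCATGCATGC"   # made-up 20-mer
    rev = "ATGCATGCATGCATGCATAT"   # made-up 20-mer
    print(wallace_tm(fwd), wallace_tm(rev), primer_pair_ok(fwd, rev))
```

In practice the annealing temperature is usually set a few degrees below the estimated melting temperature and then adjusted empirically, as was done for the loci in Table 1.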

MADS-box sequence analysis
MADS-box comparative sequence analysis was carried out with the CLUSTAL W method (Thompson et al., 1994) using the standard parameters suggested in the program. The DNA Maximum Likelihood (Dnaml) method (Felsenstein, 1993) was used to construct the phylogenetic tree. Database searching was carried out using the BLAST web page http://www.ncbi.nlm.nih.gov/blast from the National Library of Medicine, USA. Some information was also obtained through the MADS-box home page http://www.mpiz-koeln.mpg.de/mads from the "Max-Planck-Institut für Züchtungsforschung", Köln, Germany. SSRs were screened within the sequences using the Simple Sequence Repeat Identification Tool (SSRIT), which is available at http://ars-genome.cornell.edu/rice/tools.html from the Rice Genome Data Base of Cornell University, USA.

Table 1 footnotes: 1 Core motifs are considered only when they are tandemly repeated at least three times within a sequence. 2 Motifs are listed according to their arrangement within each sequence. 3 Only the highest number of repeats for each microsatellite type is given. 4 Complex motifs may represent long, imperfect and multiple DNA repeats (consult the GenBank database).

Results and Discussion
Characteristics of (GA)n microsatellites of common bean
Small DNA fragments (200-800 bp) were obtained by PCR amplification. These fragments were cloned and the positive clones were easily identified using the colony hybridization method. A total of 21 out of 60 sequenced inserts contained (GA)n motifs of different sizes, arrangements and types (Table 1). Dimeric, trimeric and tetrameric tandem nucleotide repeat motifs were identified. Combinations of several types of microsatellites were found within a single sequence, which is a common feature of microsatellites isolated from other plant species (Wang et al., 1994). This is probably due to the fact that microsatellites are highly mutable. Some sequences contained a relatively simple arrangement of microsatellites (e.g. 14M1, which contains a (GA)19 motif). However, other sequences had motifs that were distributed in complex and multiple arrays (e.g. 210M3) (Table 1). Several of these complex motifs were imperfect repetitions of the basic motif and were most likely generated by microsatellite instability.
The majority of SSR-containing sequences (17 out of 21) were amplified from common bean genomic DNA (Figure 1) using primer combinations indicated in Table 1.For the other four sequences no product was obtained or non-specific products were observed at lower annealing temperatures.
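The screening criteria used for these clones (core motifs tandemly repeated at least three times, and (GA)n with n of at least six) can be illustrated with a small regular-expression search. This is a minimal sketch of perfect-repeat detection only, not the SSRIT implementation; the function name find_ssrs, its parameters and the example sequence are hypothetical, and imperfect or compound motifs such as those described above would need additional handling.

```python
import re


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 4,
              min_repeats: int = 3) -> list[tuple[str, int, int]]:
    """Return (motif, repeat_count, start) for perfect tandem repeats.

    Motifs that are themselves repeats of a shorter unit (e.g. GAGA)
    are skipped so that a (GA)n run is reported only once.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # Group 1 is the core motif; \1{k,} requires k further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            is_compound = any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit) if unit % k == 0
            )
            if is_compound:
                continue
            hits.append((motif, len(match.group(0)) // unit, match.start()))
    return hits


if __name__ == "__main__":
    example = "TTCC" + "GA" * 8 + "CCG" + "ATT" * 3 + "GC"  # made-up insert
    for motif, count, start in find_ssrs(example):
        print(f"({motif}){count} at position {start}")
```

Raising min_repeats to 6 and restricting the unit length to 2 reproduces the stricter threshold applied to the (GA)n motifs. Note that the leftmost match determines the reported phase of the motif, so a (GA)n run preceded by an A would be reported as (AG)n.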

Description of a MADS-box sequence
A DNA sequence 350 bp long, which included the self-complementary adaptors on both sides, a GA microsatellite, and a MADS-box sequence, was isolated. The microsatellite included two perfect GA motifs, (GA)10 and (GA)7, which are part of an imperfect (GA)26 sequence. The observed sequence differed from a perfect (GA)26 by the addition of an A and three base substitutions (Figure 2). Upstream from this GA region was a sequence rich in A + T pairs (65%). The methionine codon of the MADS-box was immediately downstream from the GA microsatellite sequence (Figure 2). Analysis of the deduced amino acid sequence and a search of the available protein sequence databases revealed highly significant similarity with other MADS-box peptide sequences, i.e., it was identical to DEFH72 isolated from Antirrhinum majus, NSMADS3 from Nicotiana sylvestris, AGL9 from Arabidopsis thaliana, and MDMADS1 from Malus domestica (Figure 3).
known MADS sequence motifs showed that PVMADS is related to the AGL2 group of the MADS-box gene family (Figure 4). The AGL2 family of MADS-box genes is normally involved in floral growth and development (Theissen et al., 2000). The highest similarity of the MADS-box sequence and the linked region was with the DEFH72 locus of Antirrhinum majus, which is also linked to a shorter upstream microsatellite motif (i.e. an imperfect (GA)11 motif). The MADS sequence described here is the first reported in Phaseolus, but it is very likely that there are additional MADS-box genes distributed throughout the bean genome, since plant species have a large number of MADS-box genes. For instance, in Arabidopsis the MADS-box gene family consists of at least 28 different genes, and in maize at least 50 different MADS-box genes are dispersed throughout the genome (Fischer et al., 1995).
Vicinity of satellite elements to some important genes
This Phaseolus putative MADS-box gene is located downstream from an imperfect (GA)21 microsatellite (Figure 2). The proximity of microsatellite elements to some important genes has been reported in common bean (Yu et al., 2000) as well as in other plant species. For example, Li et al. (2000) reported a polymorphic microsatellite linked to the dextrinase gene of barley, and a satellite sequence is linked to the rDNA of lentil (Fernández, 2002) and common bean (Falquet et al., 1997). Furthermore, several important plant resistance genes are linked to SSRs or Inter Simple Sequence Repeats (ISSRs) (e.g., Yu et al., 1996). Thus, the linkage of microsatellite motifs to some important genes is relevant for several applied genetic studies. Microsatellites are ideal markers; however, their development is expensive and time-consuming. In this study, we present a new microsatellite set in Phaseolus that can be useful for gene mapping and other purposes.

Figure 2 - Nucleotide and deduced amino acid sequences of a putative MADS-box sequence of Phaseolus vulgaris (PVMADS). The imperfect microsatellite motif (GA)26, contiguous to the N-terminus of the MADS sequence, is in italics.

Figure 4 - An unrooted phylogenetic tree of the MADS-box DNA sequences of Arabidopsis and the common bean sequence (PVMADS) constructed using the Kimura two-parameter model. Sequence members of the AGL2 subfamily are in bold. Bar indicates distance.

Table 1 - Description of the 21 microsatellite clones isolated from common bean, Phaseolus vulgaris L., and microsatellite amplification conditions.