Application of a double-enrichment procedure for microsatellite isolation and the use of tailed primers for high throughput genotyping

The number of microsatellite loci and their allelic diversity increase the accuracy and informativeness of genetic estimates; however, the isolation of microsatellite loci is both laborious and expensive. We used (GATA)n and (GACA)n tetranucleotide probes and single- and double-enrichment hybridization to construct and screen a genomic library with an increased proportion of DNA fragments containing repeat motifs. Repeats were found using both types of hybridization, but the double-enrichment procedure recovered sequences of which 100% contained (GATA)n and (GACA)n motifs. Microsatellite loci primers were then designed with an M13R-tail or CAG-tag to produce scorable PCR products with minimal stutter. The approach used in this study suggests that double-enrichment is a worthwhile strategy when isolating repeat motifs from eukaryotic genomes. Moreover, the use of tailed microsatellite primers provides increased resolution for compound microsatellite loci, with a significant decrease in costs.

Microsatellites, also known as simple sequence repeats (SSRs), are present throughout the eukaryotic genome, often at high concentrations. They are useful markers for a wide range of analyses because in many taxa they show high levels of intraspecific genetic variability. These markers are also easily detectable by the polymerase chain reaction (PCR), producing highly reproducible results when compared with other molecular markers (Tautz et al., 1986; Wright & Bentzen, 1994; O'Connell & Wright, 1997). Applications of microsatellite markers include biomedical diagnosis of diseases (Girardet et al., 2005), genome mapping (Park et al., 2005), parentage analysis and relatedness of individuals or groups (Porta et al., 2006), assessing demographic history (Jacobsen et al., 2005) and examining the genetic structure of subpopulations and populations (Diniz et al., 2004). The information revealed by the genetic data obtained from these markers is a function of the number of loci and allelic diversity, both of which contribute interactively to increase the accuracy and success of genetic estimates and consequently the reliability of inferences (Brookfield & Parkin, 1993; Chakraborty & Jin, 1993; Blouin et al., 1996). However, the isolation of microsatellite loci is not only demanding but also expensive because of the need for numerous polymorphic microsatellites and, consequently, specific fluorescent primers. Many techniques have been applied (Zane et al., 2002; Fujishima-Kanaya et al., 2003; Waldbieser et al., 2003; Chen et al., 2005), but there is no guarantee that a particular method will result in a large number of loci being identified. In addition, current mainstream analysis methods involve the use of fluorescently labeled primers, which are expensive, especially when large numbers of loci are required.
In an effort to increase the proportion of genomic DNA fragments containing repeat motifs, a simple and inexpensive methodology is described and optimized for the construction of an enriched genomic library. Moreover, one primer from every pair designed for the microsatellite flanking sequences was M13-tailed or CAG-tagged to produce microsatellite primer sets that were easier to score and did not require expensive fluorescent labeling.
Genetics and Molecular Biology, 30, 2, 380-384 (2007) Copyright by the Brazilian Society of Genetics. Printed in Brazil www.sbg.org.br
The spiny lobster Panulirus argus was used as a model organism in this study. Specimens were collected along the Brazilian coastline between latitudes 03°43' and 03°54' South. Samples were taken from the fresh tissue of the walking legs of the lobsters and immediately preserved in 20% (v/v) dimethyl sulfoxide (DMSO) in saturated sodium chloride solution (Dawson et al., 1998). Genomic DNA was extracted from the tissue using the phenol/chloroform-isoamyl alcohol protocol described by Sambrook et al. (1989).
Two genomic libraries enriched for tetranucleotide repeats were prepared as outlined in Hamilton et al. (1999) and McPherson et al. (2001), with minor modifications (Diniz et al., 2005). The first library was single-enriched for (GACA)4 and (GATA)7. The second library was constructed using an additional round of enrichment after the first hybridization capture (biotin/streptavidin) technique (Kijas et al., 1994), with the same tetranucleotide probes. Genomic DNA (~200 ng/μL) was digested separately with HincII, RsaI, BstUI and HaeIII at 37°C overnight. Digests were recovered using PCR Purification columns (Qiagen) according to the manufacturer's instructions, and the terminal ends were dephosphorylated using calf intestinal phosphatase. The digested, dephosphorylated genomic DNA was cleaned of all modifying enzymes and then ligated to double-stranded linkers, termed SNX (StuI, NheI and XmnI) for the restriction sites they contain or form when dimerisation occurs (SNX-F: 5'-CTAAGGCCTTGCTAGCAGAAGC-3' and SNX-R: 5'-pGCTTCTGCTAGCAAGGCCTTAGAAAA-3'). The ligation reaction used T4 DNA ligase in the presence of XmnI to prevent blunt-ended dimers between complementary linkers. Linker-ligated fragments were PCR amplified with the SNX-F linker (Hamilton et al., 1999) as primer. Amplifications were carried out in a 50 μL reaction volume containing 20-100 ng DNA, 1x Thermopol® buffer supplemented with 1.5 mM MgCl2, 50 μM of each dNTP, 0.5 U Taq DNA polymerase, and 0.3 to 0.5 μM of SNX-F linker. The PCR amplifications were carried out on an MJ Research DNA Engine Tetrad PTC-225 thermal cycler using the following protocol: 95°C for 5 min, followed by 20 cycles (95°C for 45 s, 62°C for 1 min and 72°C for 2 min) and then 72°C for 30 min. Unless otherwise stated, all materials came from New England Biolabs, USA.
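The linker design described above can be checked with a short script. The following is a minimal sketch, not part of the original protocol: it uses the two SNX sequences exactly as printed (5' phosphate omitted) and the standard recognition sequences for StuI (AGGCCT), NheI (GCTAGC) and XmnI (GAANNNNTTC). The function and variable names are illustrative only.

```python
import re

# Linker sequences as given in the text (5' phosphate of SNX-R omitted).
SNX_F = "CTAAGGCCTTGCTAGCAGAAGC"
SNX_R = "GCTTCTGCTAGCAAGGCCTTAGAAAA"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Recognition sequences; the IUPAC 'N' is written as a regex character class.
SITES = {
    "StuI": "AGGCCT",
    "NheI": "GCTAGC",
    "XmnI": "GAA[ACGT]{4}TTC",
}


def sites_in(seq: str) -> set:
    """Names of the enzymes whose recognition sequence occurs in seq."""
    return {name for name, pattern in SITES.items() if re.search(pattern, seq)}


# SNX-R is the reverse complement of SNX-F plus a 3' AAAA extension.
overhang = SNX_R[len(SNX_F):]

# Top strand of a blunt-end linker dimer (two linkers joined head to head).
blunt_dimer = SNX_F + reverse_complement(SNX_F)

print(SNX_R.startswith(reverse_complement(SNX_F)), overhang)
print(sorted(sites_in(SNX_F)))
print(sorted(sites_in(blunt_dimer)))
```

Under these assumptions, the single linker carries the StuI and NheI sites, while the XmnI site appears only at the junction of a linker dimer, which is consistent with including XmnI in the ligation to cut such dimers.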
The PCR products were used for subtractive hybridization with biotinylated tetranucleotide probes (GACA)4 and (GATA)7 (Operon Technologies) bound to magnetic beads (Dynal Biotech Inc.). Following hybridization, the beads were washed with saline sodium citrate buffer in the presence of 0.5 ng/μL of SNX-F primer. After the final wash, 30 μL of Tris-EDTA buffer (10 mM Tris, 0.1 mM EDTA, pH 8.0) was added to the beads, and the mixture was incubated at 98°C for 15 min to release the DNA from the probes. The enriched DNA recovered from the beads was amplified again with the SNX-F linker to generate double-stranded DNA using the PCR conditions described above. Amplified, enriched DNA was cleaned with purification columns (Qiagen), and a second round of hybridization was performed using a small fraction of the post-enrichment amplified inserts and the same tetranucleotide probes and hybridization conditions described above. Double-enriched DNA was recovered from the beads and PCR amplified once again under the same thermal profile as first described, except that the number of cycles was varied between 10 and 25.
Amplified, enriched DNA from the single- and double-enrichment procedures was ligated into the pDrive cloning vector (Qiagen PCR cloning kit) and used to transform OneShot® competent Escherichia coli (Invitrogen), which were plated onto Luria-Bertani agar (LB; Difco) supplemented with 50 μg/mL ampicillin. Plasmid inserts were amplified directly from the transformed E. coli colonies (PCR screening) using primers that flank the cloning site (M13-F: 5'-GGAAACAGCTATGACCATG-3' and M13-R: 5'-GTAAAACGACGCCAGTG-3') and size-fractionated on 1.5% (w/v) agarose gels. Recombinant E. coli clones giving a positive signal for the insertion of a 500-1000 base-pair (bp) P. argus DNA sequence were incubated overnight on LB-ampicillin, and the plasmid DNA was extracted and purified using the Qiaprep® spin miniprep kit (Qiagen). Each clone was cycle-sequenced (96°C for 3 min, followed by 40 cycles of 96°C for 20 s, 52°C for 20 s and 60°C for 4 min) in one direction using the M13-F primer and the CEQ DTCS-Quick Start kit (Beckman Coulter Inc.) and sequenced on a CEQ 8000XL DNA Analysis System (Beckman Coulter Inc.). Confirmed positives were further sequenced for the opposite strand using the SP6 (5'-CATACGATTTAGGTGACACTATAG-3') and M13-R universal primers.
Microsatellite loci primers were designed on the unique flanking regions of each locus using PRIMER 3 (Rozen & Skaletsky, 2000). An oligonucleotide tail corresponding to the 5'-GGAAACAGCTATGACCAT-3' M13-R universal primer (Oetting et al., 1995; Boutin-Ganache et al., 2001) or the 5'-CAGTCGGGCGTCATCA-3' CAG tag (Hauswaldt & Glenn, 2003) was added to the 5' end of one primer of each pair to facilitate fluorescent labeling of the PCR products during amplification. This approach allowed amplification with three primers: a tailed STR primer conjugate (forward, F_T, or reverse, R_T), a non-tailed STR primer (F or R) and a labeled primer (T_L) carrying either hexachloro-6-carboxyfluorescein (HEX) or 6-carboxyfluorescein (FAM) fluorescent dye. The labeled primer was the M13R or CAG tag oligonucleotide and therefore could be used with any primer containing the same sequence as the tail. Each tail-primer combination was selected using the computer program GENRUNNER v 3.02 (Hastings Software, USA).
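The tailing step can be illustrated with a minimal sketch. The two tail sequences are those given above; the locus-specific primer and the 200 bp amplicon size are hypothetical placeholders, and the helper names are illustrative rather than taken from any published tool. Because the tail is incorporated at the 5' end of one primer, the labeled product is expected to be longer than the untailed amplicon by the length of the tail.

```python
# Tail sequences as given in the text (5' -> 3').
TAILS = {
    "M13R": "GGAAACAGCTATGACCAT",
    "CAG": "CAGTCGGGCGTCATCA",
}


def add_tail(locus_primer: str, tail_name: str) -> str:
    """Prepend the chosen tail to the 5' end of a locus-specific primer."""
    return TAILS[tail_name] + locus_primer.upper()


def tailed_product_size(untailed_size_bp: int, tail_name: str) -> int:
    """Expected product size once the tail is incorporated into the amplicon."""
    return untailed_size_bp + len(TAILS[tail_name])


# Hypothetical 20-mer standing in for a real locus-specific forward primer.
example_forward = "ACGTTGCAAGCTTGACCTGA"

tailed_forward = add_tail(example_forward, "CAG")
print(tailed_forward)
print(len(tailed_forward))
# Hypothetical 200 bp amplicon amplified with an M13R-tailed primer.
print(tailed_product_size(200, "M13R"))
```

When comparing allele sizes with data generated from untailed primers, this offset (18 nt for the M13R tail, 16 nt for the CAG tag) would need to be taken into account.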
HaeIII-digested samples produced mainly high (> 3000 bp) molecular weight fragments with no specific banding pattern and were subjected to further experiments. Over-amplification of ligated DNA and repeat-enriched fragments was kept to a minimum when PCR was carried out with 15 cycles.
The libraries enriched for tetranucleotide repeat motifs showed high levels of microsatellite enrichment (Figure 1). Each enriched library had 32 provisional positive clones sequenced. The single-enriched library showed very similar numbers of sequences containing no repeat motifs and sequences recovered in duplicate, except for the (GATA)7 library, which had four times more sequences containing no repeats than the library probed with (GACA)4 (Figure 1A). Moreover, this more commonly used sequence enrichment resulted in three sequences being unsuitable for primer design because they had either short or no available flanking regions (Figure 1A). The mean percentage of sequences with repeat motifs from single-enriched libraries was high (67%), ranging from about 9 microsatellite sequences (55%) for (GACA)4 to about 13 microsatellite sequences (80%) for (GATA)7 libraries (Figure 1B). The overall high success rate of the single-enrichment protocol employed here was probably due to the presence of large numbers of these tetranucleotide repeats within the P. argus genome coupled with the effectiveness of the hybridization capture (biotin/streptavidin) enrichment technique (Kijas et al., 1994). Double-enrichment produced libraries in which all of the sequences contained (GACA)n and (GATA)n motifs (16 sequences each). Out of the 32 sequences isolated after double-enrichment, six had limited flanking sequence for primer design or were second-copy clones. The high frequency of positive clones found in the recombinant colonies also suggests that PCR screening for fragments of appropriate length (500-1000 bp inserts) considerably increases the chance of detecting clones with repeat arrays and with longer flanking regions for primer design.
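Scoring a sequenced clone as containing or lacking a repeat motif can be sketched as a simple pattern search. This is an illustrative example only, not the procedure used in the study: the clone sequence is invented, the function name is hypothetical, and the threshold of three consecutive units is an arbitrary assumption.

```python
import re


def find_repeat_runs(seq: str, motif: str, min_units: int = 3) -> list:
    """Return (start, units) for each uninterrupted run of motif in seq.

    Only runs with at least min_units consecutive copies are reported.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_units},}}")
    return [
        (match.start(), len(match.group()) // len(motif))
        for match in pattern.finditer(seq.upper())
    ]


# Invented insert: (GATA)6 and (GACA)4 separated by non-repeat sequence.
clone = "TTCG" + "GATA" * 6 + "CCTAG" + "GACA" * 4 + "TT"

gata_runs = find_repeat_runs(clone, "GATA")
gaca_runs = find_repeat_runs(clone, "GACA")
print(gata_runs)
print(gaca_runs)
```

A search of this kind only locates uninterrupted arrays; imperfect arrays would be split into separate runs, so any classification by repeat type would still require inspecting the runs and the sequence between them.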
Microsatellite sequences from both procedures were characterized by length and type according to Weber (1990) (Figure 1B). Most tandem arrays recovered from the single-enrichment procedure were compound (11 sequences) or imperfect repeats (3 sequences). This was even more noticeable in the double-enriched library, where 21 microsatellite sequences were compound repeats and 7 sequences were imperfect repeats. Compound microsatellites were frequently associated with dinucleotides: (AC/TG)n in the case of the GACA-enriched library and (GA/CT)n in the case of the GATA-enriched library. The relative frequency of perfect repeat sequences was higher in single-enrichment, especially for the GATA repeat motif. Microsatellite sequences have been deposited in GenBank under accession numbers AY536335 to AY536353 and further characterized (including the primer sequences) by Diniz et al. (2004, 2005). Genotyping 32 P. argus samples at all the microsatellite loci isolated in this study showed clear bands with no shadow bands and minimal stutter (Figure 2). Shadow or stutter bands, which occur during PCR due to replication slippage, are not a common difficulty in scoring tetranucleotide loci, but may be a problem for compound microsatellite loci (i.e. tetra- plus dinucleotides), which can produce stutter fragments at levels seen with dinucleotides (Hauge & Litt, 1993; O'Reilly et al., 2000). The tailing of one of the microsatellite primers per locus has been found to improve scoring of stuttering fragments from di-, tri- or tetranucleotide microsatellite amplifications (Oetting et al., 1998; Boutin-Ganache et al., 2001). This increases the robustness of scoring and eliminates the end-labeling of one of the PCR primers for each specific locus, thus reducing the costs associated with labeling primers with fluorescent dye (Yu et al., 2001).
However, the inclusion of such tails does increase the chance of formation of hairpins and primer-dimers (Hauswaldt & Glenn, 2003). Therefore, each tail-primer combination should be tested using a specialized program such as GENRUNNER to minimize potential secondary structures. Because the number of primers in the PCR reaction is increased, optimization of each primer set could be more labour-intensive. However, this is a drawback that can easily be overcome, and the benefits of the procedure far outweigh the drawbacks.
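The kind of problem a tail can introduce is illustrated by the crude check below. It is a rough heuristic written for this example, not the algorithm used by GENRUNNER or any other program: it only asks whether the last k bases at the 3' end of one oligonucleotide are perfectly complementary to some stretch of another (or of itself). The locus-specific primers are hypothetical, and the seed length of five bases is an arbitrary assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_can_anneal(primer_a: str, primer_b: str, k: int = 5) -> bool:
    """True if the last k bases of primer_a can pair with a k-mer in primer_b."""
    seed = primer_a[-k:]
    return reverse_complement(seed) in primer_b


CAG_TAG = "CAGTCGGGCGTCATCA"  # tail sequence as given in the text

# Hypothetical locus-specific primers (not from the study).
risky_primer = "ACGTTGCAAGCTTGGATGAC"   # 3' end ATGAC pairs with GTCAT in the tag
benign_primer = "ACGTTGCAAGCTTGACCTGA"

risky_tailed = CAG_TAG + risky_primer
benign_tailed = CAG_TAG + benign_primer

# Self-comparison flags a possible hairpin or self-dimer in the tailed primer.
print(three_prime_can_anneal(risky_tailed, risky_tailed))
print(three_prime_can_anneal(benign_tailed, benign_tailed))
```

A dedicated program would also weigh mismatches, internal loops and thermodynamic stability, so a check like this can only serve as a first filter.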
Our results show that double-enrichment should be considered as a way to overcome low percentages of clones containing tandem repeats. In this study, we were able to increase the proportion of sequences containing repeats to 100%. Even though we did not notice any problem with recombination in the PCR following the hybridization steps, care should be taken to avoid this possible complication. Keeping PCR cycles to a minimum seems to be an effective way to overcome the problem. The use of a double-enrichment protocol and the addition of a sequence tag to the microsatellite amplification primers provide an inexpensive approach for increasing the number of loci that can be screened and decrease the costs of genotyping. We are conducting further work to apply these markers to population discrimination of P. argus throughout its range.