In silico analysis of polymorphic microsatellites in penaeid shrimp and construction of a free-access database

We performed an in silico analysis of all microsatellites described so far for penaeid shrimp whose polymorphic behavior has previously been analyzed. The objective of the study was to evaluate the structural characteristics of these microsatellites and to identify patterns that allow the nature of these sequences in the penaeid genome to be characterized. All data were compiled in a free-access database constructed specifically for this study. Three hundred non-mononucleotide polymorphic microsatellite loci described for 12 shrimp species of the family Penaeidae were analyzed, and simple and compound microsatellites with di-, tri-, tetra-, penta- and hexanucleotide motifs were found. Dinucleotides and trinucleotides were the most frequent motifs among both simple and compound microsatellites. Although a bias related to the different microsatellite isolation methodologies could not be ruled out, part of this abundance may reflect some degree of conservation of microsatellite motifs among the different species. Motif variability was pronounced both within and between species, indicating that these repeats differentiate highly dynamically in this animal group. This study not only sheds light on the structure of the microsatellites present in the penaeid shrimp genome but also produced the free-access Penaeid Shrimp Microsatellite Database (available at http://www.shrimp.ufscar.br), which may be very useful for optimizing the use of these microsatellites.

Over the last few decades, microsatellite sequences have been widely used in many animal groups, including shrimp, principally in economically important species (Wolfus et al., 1997; Bierne et al., 2000; Xu et al., 2001). Due to their high level of allelic variation and codominant nature, microsatellite loci have been extremely useful for the determination of genetic diversity, stock discrimination, identification of lineages and individuals, the establishment of pedigrees, the development of breeding programs, linkage map studies and the identification of loci related to commercially important characteristics (Moore et al., 1999; Ozaki et al., 2000).
In the past half decade, different research groups have been striving to characterize microsatellite markers in the genomes of a number of penaeid shrimp species (Cruz et al., 2002; Meehan et al., 2003; Pérez et al., 2005). However, validating these sequences has generally not been easy. The penaeid genome appears to contain very long microsatellite repeats, which hinders the cloning and sequencing of microsatellites together with both of their flanking regions. A few comparisons show that the microsatellites found in penaeids (Tassanakajon et al., 1998; Cruz et al., 2002) are approximately twice the size of those found in other animal groups (Estoup et al., 1993; Brooker et al., 1994), and in some cases four to five times larger.
A substantial number of polymorphic microsatellite loci have now been characterized for different penaeid species. This important source of markers is available in public databanks and/or in indexed journals that can be accessed online. We performed an in silico analysis of the polymorphic microsatellite loci described for penaeids up to June 2006, with the aim of characterizing the structural patterns of these sequences and compiling a free-access database containing the information needed for their optimized use.
The microsatellite nucleotide sequences analyzed were retrieved using their GenBank (NCBI) ID codes and/or through an electronic search of journals available online. All sequences were converted to FASTA format and analyzed with the Tandem Repeats Finder (TRF) program, version 3.21 (Benson, 1999), to identify the repeats and locate the flanking regions. TRF generated data files describing the motif regions and their respective flanking regions. These data were compared with those described for the corresponding locus in the scientific literature and deposited in the free-access Penaeid Shrimp Microsatellite Database, the address of which is given below in the Internet resources section.
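To make the repeat-identification step more concrete, the sketch below parses FASTA text and scans each sequence for perfect (uninterrupted) tandem repeats, reporting every tract together with its flanking sequence. It is a deliberately simplified stand-in and not the TRF algorithm: TRF uses a probabilistic model that also detects approximate repeats containing mismatches and indels, whereas this scanner only finds exact copies. The function names, the minimum of five copies and the 20-base flank width are illustrative assumptions.

```python
# Illustrative sketch only: a simple exact-match scanner, NOT the TRF algorithm.
from dataclasses import dataclass


@dataclass
class Repeat:
    motif: str
    start: int         # 0-based start of the repeat tract
    end: int           # exclusive end of the repeat tract
    copies: int        # number of complete, uninterrupted motif copies
    left_flank: str
    right_flank: str


def read_fasta(text: str) -> dict:
    """Parse FASTA text into {identifier: sequence} (identifier = first word of header)."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def find_perfect_repeats(seq, min_k=2, max_k=6, min_copies=5, flank=20):
    """Scan left to right for uninterrupted tandem repeats of 2- to 6-base motifs.

    At each position the shortest motif reaching min_copies copies is reported,
    together with up to `flank` bases on either side; the scan then resumes
    after the end of the tract.
    """
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(min_k, max_k + 1):
            motif = seq[i:i + k]
            # Skip truncated motifs at the sequence end and homopolymer motifs
            # (e.g. "AA"), since mononucleotide repeats are not considered here.
            if len(motif) < k or len(set(motif)) == 1:
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies:
                found = (motif, copies)
                break
        if found:
            motif, copies = found
            end = i + copies * len(motif)
            hits.append(Repeat(motif, i, end, copies,
                               seq[max(0, i - flank):i], seq[end:end + flank]))
            i = end
        else:
            i += 1
    return hits
```

Applied to a hypothetical 34-base record containing eight TC copies between two short flanks, `find_perfect_repeats` reports a single TC tract (positions 9 to 25) with its left and right flanking sequences, which is the kind of motif/flank description that was then checked against the literature.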
The database allows researchers to perform different searches and generate specific reports based on eight pre-established criteria: (1) author; (2) species; (3) ID code; (4) microsatellite type (simple or compound); (5) number of bases in the motif; (6) type of bases in the motif, e.g. AT-rich, CT-rich; (7) number and size of alleles per locus; and (8) results generated by TRF during the analysis, including any inconsistencies with the literature concerning the locus in question. The main inconsistencies observed in the sequences analyzed were: microsatellites not flanked by one or both of the described primers; repeats partially or totally absent from the described sequence; microsatellites described with motifs different from those found by TRF; and microsatellites with more than one GenBank ID, or whose ID numbers differed from those given in the paper that cited them.
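The search criteria can be pictured as optional filters applied to a list of locus records. The sketch below assumes a hypothetical record layout, since the database's actual schema and interface are not described here, and covers most of the criteria: allele size is omitted, and "type of bases" is approximated as "motif built only from the given bases", a simplification of labels such as AT-rich or CT-rich. All field names, ID codes and example values are placeholders, not real GenBank entries.

```python
# Hypothetical record layout and query helper; the real database schema is not
# described in the text, so every field name here is an illustrative assumption.
from dataclasses import dataclass, field


@dataclass
class LocusRecord:
    author: str
    species: str
    id_code: str
    ms_type: str                  # "simple" or "compound"
    motifs: list                  # repeat motifs found in the locus, e.g. ["GT", "GAA"]
    n_alleles: int
    trf_notes: list = field(default_factory=list)  # inconsistencies flagged after TRF


def query(records, author=None, species=None, id_code=None, ms_type=None,
          motif_len=None, bases=None, min_alleles=None, flagged=None):
    """Return records matching every criterion that is not None.

    motif_len: keep loci with at least one motif of this many bases.
    bases:     keep loci with at least one motif built only from these bases
               (e.g. "CT" matches "TC" and "CTT"); a simplified stand-in for
               labels such as "CT-rich".
    flagged:   True keeps loci with TRF/literature inconsistencies, False the rest.
    """
    hits = []
    for r in records:
        if author is not None and r.author != author:
            continue
        if species is not None and r.species != species:
            continue
        if id_code is not None and r.id_code != id_code:
            continue
        if ms_type is not None and r.ms_type != ms_type:
            continue
        if motif_len is not None and not any(len(m) == motif_len for m in r.motifs):
            continue
        if bases is not None and not any(set(m) <= set(bases) for m in r.motifs):
            continue
        if min_alleles is not None and r.n_alleles < min_alleles:
            continue
        if flagged is not None and bool(r.trf_notes) != flagged:
            continue
        hits.append(r)
    return hits
```

Because every criterion defaults to `None`, a report combining several criteria (for example, species plus motif length plus only loci flagged with inconsistencies) is just one call with the relevant keyword arguments set.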
Simple and compound microsatellites (Garza and Desmarais, 2000) with di-, tri-, tetra-, penta- and hexanucleotide motifs were found. Almost 52% of the microsatellites analyzed contained only one type of motif and were characterized as simple microsatellites. In a few cases, simple microsatellites presented interruptions between their repeat units and were therefore classified as imperfect microsatellites. Approximately 48% were compound microsatellites, composed of more than one motif type and sometimes interrupted by divergent bases. Approximately 50% of the loci studied were described for L. vannamei and about 24% for P. monodon, both of which are considered the most commercially important marine shrimp in the world. The remaining loci (just over 25%) were described for the other species; to date, only a single locus has been characterized and validated for the shrimp F. merguiensis (Table 1).
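The simple/compound/imperfect distinction can be expressed as a small classification rule. The sketch below assumes each locus has already been reduced to the motifs of its repeat tracts (for example, by a repeat finder) and applies a simplified version of the categories used in the text: one motif type in one tract is simple, one motif type split into several tracts is imperfect simple, and more than one motif type is compound. It then reports simple versus compound percentages, counting imperfect loci as simple, as the 52%/48% split above implies. The function names and the rotation-based motif normalization are assumptions for illustration, not the authors' procedure.

```python
# Simplified sketch; the paper follows Garza and Desmarais (2000), whose exact
# rules may differ. Input: the motifs of the repeat tracts in one locus.
from collections import Counter


def canonical(motif: str) -> str:
    """Smallest rotation of a motif, so that phase-shifted readings of the same
    tandem repeat (e.g. "TC" and "CT") are treated as one motif type.
    Reverse complements are deliberately NOT merged."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def classify(tract_motifs: list) -> str:
    if not tract_motifs:
        raise ValueError("a locus needs at least one repeat tract")
    motif_types = {canonical(m) for m in tract_motifs}
    if len(motif_types) > 1:
        return "compound"            # more than one motif type
    if len(tract_motifs) > 1:
        return "simple (imperfect)"  # one motif type, tract interrupted
    return "simple"


def simple_vs_compound(loci: list) -> dict:
    """Percentage of simple (perfect or imperfect) vs compound loci."""
    counts = Counter("compound" if classify(t) == "compound" else "simple"
                     for t in loci)
    total = sum(counts.values())
    return {kind: 100 * counts[kind] / total for kind in ("simple", "compound")}
```

For four hypothetical loci, `[["TC"], ["TC", "CT"], ["GT", "GAA"], ["TA"]]`, the rule yields two perfect simple loci, one imperfect simple locus and one compound locus, i.e. 75% simple and 25% compound.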
Dinucleotide sequences (interrupted or uninterrupted) were the predominant motif class, present in 56% of all microsatellites analyzed. Nearly 58% of the simple and 55% of the compound microsatellites contained dinucleotides, TC (19%) and GT (17%) being the most abundant motifs. Although a bias related to the different microsatellite isolation methodologies could not be ruled out, part of this abundance may reflect some conservation of microsatellite motifs among the different species. Trinucleotide sequences were also frequent, representing 27% of the simple and 39% of the compound microsatellites. Although a wide diversity of motifs has been described for this class, the most frequent were GAA (13%) among the simple trinucleotide microsatellites and ATT (7%) among the compound ones. Tetra-, penta- and hexanucleotides were observed at lower frequencies, although tetranucleotides appeared in more than 30% of the compound microsatellites.
The distribution trend observed for the full set of penaeid species was also observed when species were analyzed separately, except in P. esculentus, in which trinucleotide microsatellites predominated. Dinucleotide repeats appeared in 64% of the microsatellites described for L. vannamei and in 45% of those described for P. monodon, the two penaeid species with the largest number of known loci. However, the most abundant motifs in these two species were TA (24%) in L. vannamei and TC and CA, in equal proportions (6% each), in P. monodon, which differs from the pattern observed when all species are considered together. Tetranucleotides were also abundant in the compound microsatellites of both species, corroborating the results for the entire penaeid set.
Despite the exceptions observed, and similarly to what has been reported for other crustaceans and insects (Robainas et al., 2003), microsatellites containing dinucleotide repeats appear to be the most frequent in penaeid shrimp, followed by those containing trinucleotides. Moreover, although a few motifs in each of the classes analyzed may be more abundant, variability is pronounced both within and between species, indicating that these repeats differentiate highly dynamically in this animal group.
Thus, besides contributing to a better understanding of microsatellite structure in the penaeid shrimp genome, our analysis and the database that supports it may be very useful for further genetic studies in shrimp and for optimizing the use of these microsatellites.
Table 1 - Abundance and distribution of simple (S) and compound (C) polymorphic microsatellites among the 300 penaeid shrimp loci available from published work and/or genetic databases, by repeat motif size (di-, tri-, tetra-, penta- and hexanucleotide). N = number of described loci.