Development and characterization of polymorphic microsatellite markers for the sweet orange (Citrus sinensis L. Osbeck)

The aims of this study were to develop simple sequence repeat markers (SSRs or microsatellite markers) in citrus and to evaluate the efficiency of these markers for characterization of sweet orange. We developed SSRs from a genomic library of ‘Pêra IAC’ sweet orange enriched for AG/TC, GT/CA, TCA/AGT and AAC/TTG sequence repeats. We selected 279 sequences, from which 171 primer pairs were designed; the 113 pairs with the best banding patterns were retained. Characterization of sweet orange microsatellite loci revealed that AG/TC was the most abundant (69.9%) microsatellite class isolated, followed by GT/CA (15.9%), TCA/AGT (8%) and AAC/TTG (6.2%). The number of alleles ranged from 1 to 4, with a mean of 2 alleles per locus. Four microsatellite loci developed in this study were found to be useful for sweet orange DNA typing. The data obtained from the polymorphic microsatellite loci will be useful as tools in the selection of zygotic and nucellar plants, identification of seedlings etc. for the cultivars Pêra IAC, Lanceta, Pêra GS 2000, Lamb Summer, Lima, Lima Tardia, Lima Verde, Mimo do Céu, Valência Folha Murcha, Valência Folha Concha, Natal Murcha, Sangüínea and Baía Gigante.


Introduction
Citrus fruits are one of the main fruit crops, sweet orange (Citrus sinensis L. Osbeck) being the most representative and recognizable species of this group. The sweet orange originated from Asia and its hybrid characteristic seems to come from a cross between mandarin (Citrus reticulata Blanco) and pummelo (Citrus grandis L. Osbeck) (Davies and Albrigo, 1992; Nicolosi et al., 2000).
Citrus varieties show diversity in their morphological traits such as size and shape of canopy, color, size, type and ripening season of the fruits and the number of seeds per fruit. According to Hodgson (1967), sweet oranges are classified into four groups: common, low acidity, pigmented and navel oranges. Despite the existence of substantial diversity among cultivated genotypes in respect of morphological, physiological and agronomic traits, very little DNA variation has been detected using DNA markers such as variable number of tandem repeats (VNTR) loci (Orford et al., 1995), inter-simple sequence repeat (ISSR) markers and restriction fragment length polymorphisms (RFLPs) (Fang and Roose, 1997), random amplified polymorphic DNA (RAPD) (Targon et al., 2000) and isozymes (Novelli et al., 2000).
Research on germplasm characterization, genetic variation and breeding in oranges and other Citrus species has been hampered because of characteristics related to the reproductive biology of these species, i.e. high interspecific fertility, apomictic reproduction, polyembryony, a long juvenile phase and a paucity of polymorphic DNA markers (Bretó et al., 2001; Corazza-Nunes et al., 2002).
Microsatellites, or simple sequence repeats (SSRs), are short (1-6 base pairs, bp) tandem repeats of DNA sequences that are dispersed in all eukaryotic genomes (Zhao and Kochert, 1993). Genetic studies using microsatellite markers have increased rapidly because they are highly polymorphic, heterozygous conserved sequences which can be used as codominant markers (Gupta et al., 1996; Zane et al., 2002).
The objectives of the study reported in this paper were to develop and characterize new microsatellite markers from enriched citrus genomic libraries, not only to evaluate the usefulness of these markers for assessing intraspecific variability for the identification and characterization of sweet orange (C. sinensis) accessions but also to compare the patterns of genetic diversity of accessions separately and in genomic pools.

Plant materials
In this study we used 41 Citrus sinensis (L. Osbeck) cultivars (Table 1) representing different groups of oranges (Hodgson, 1967).

Construction of enriched libraries, sequencing and primers design
Genomic DNA from the sweet orange Citrus sinensis (L. Osbeck) cv. Pêra-IAC was used to construct libraries enriched in dinucleotide and trinucleotide repeats. The enrichment was based on the procedure described by Kijas et al. (1995) with modifications. Briefly, genomic DNA was extracted from the leaf tissue of cv. Pêra-IAC by the method of Murray and Thompson (1980) and partially digested with the Sau3AI restriction enzyme to produce DNA fragments of approximately 200 and 800 bp, which were excised and purified by the chloroform:phenol method, quantified and ligated with specific Sau3AI adapters. The libraries were enriched for dinucleotide (AG/TC and GT/CA) and trinucleotide (TCA/AGT and AAC/TTG) sequences by biotinylation of these oligonucleotides and separation with streptavidin magnetic beads (Promega, Madison, Wis.). The selected fragments were amplified using primer sequences complementary to the adapters, then ligated into the vector (pGEM-T Easy Vector Systems, Promega, Madison, Wis.) and transformed into E. coli host strain DH5α (CAPTACSM). Transformed colonies were cultivated on LB-agar (Sambrook et al., 1989) supplemented with 100 µg mL⁻¹ ampicillin, 100 mM IPTG (isopropyl-beta-D-thiogalactopyranoside) and 50 µg mL⁻¹ X-Gal (5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside). White colonies were transferred to fresh plates and a dot-blot was performed using nylon membranes (Hybond™-N, Amersham, Piscataway, N.J., USA). Positive clones were identified by hybridization with probes specific to the AG/TC, GT/CA, TCA/AGT and AAC/TTG repeats labeled with biotin-16-ddUTP, and the plasmids were isolated. Sequencing of the inserts was performed using the ABI 377 Big-Dye Terminator (Applied Biosystems, USA), and sequences containing microsatellites were used to design PCR primers complementary to the flanking region of the microsatellites using the Primer Designer 2 software (Lincoln et al., 1991). The primers were synthesized by Operon Technologies (USA) and Bio-Synthesis (USA). We gave each microsatellite marker a name consisting of the prefix CCSM (Centro Citros Sylvio Moreira) followed by a number.

Amplification and analysis of microsatellite loci
The microsatellite loci were amplified in a 25 µL reaction mixture containing 2.5 µL of 10X Buffer [100 mM Tris-HCl pH 8.3, 500 mM KCl, 25 mM MgCl2 and 0.01% (w/v) gelatin] (Gibco BRL, Life Technologies, USA), 200 µM of each dNTP (Gibco BRL, Life Technologies, USA), 24 pmol of each primer (forward and reverse), 100-150 ng of genomic DNA and 1.0 unit of Taq polymerase (Gibco BRL, Life Technologies, USA). Amplification was performed according to Kijas et al. (1995) using 55 °C for annealing, and by a touchdown program (TD1) of 30 cycles of 30 s at 94 °C, 30 s at 65-56 °C (touchdown 0.3 °C every cycle) and 5 s at 72 °C. Amplification products were separated on 3% (w/v) MetaPhor agarose (Karlan, FMC, USA) containing 0.5 ng/mL of ethidium bromide, and primers that showed clear and scorable amplification patterns were scored on gels containing 10% (w/v) polyacrylamide (29:1 acrylamide/bisacrylamide) and 5X TBE (Tris-Borate-EDTA) buffer. After each run, the gels were developed with silver nitrate according to the method of Beidler et al. (1982).
Preliminary analysis was conducted with 171 SSR loci and seven cultivars (LIM, HAM, PER, VAL, VFM, LAN and VFC). In addition, DNA extracted from each of these accessions was mixed in equimolar amounts to form a genomic pool. After initial analysis, 113 SSR loci showed interpretable PCR products and were used to screen the 41 cultivars shown in Table 1.
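The touchdown step described above can be sketched as a per-cycle annealing schedule. This is an illustration only: the function name and the assumption that the first cycle anneals at 65 °C, with the 0.3 °C drop applied after each cycle, are not taken from the text, which does not give the exact definition of program TD1.

```python
def touchdown_annealing(start_c=65.0, step_c=0.3, cycles=30):
    """Annealing temperature (deg C) for each cycle of a linear touchdown.

    Assumption: cycle 1 anneals at start_c and the temperature drops by
    step_c after every cycle (hypothetical reading of program TD1).
    """
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


schedule = touchdown_annealing()
for cycle, temp in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal at {temp:.1f} C")
```

Under this reading the last of the 30 cycles anneals at 56.3 °C, and the nominal 56 °C endpoint corresponds to the total drop of 30 × 0.3 = 9 °C.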
For each cultivar we determined the absence or presence and total number of alleles of each microsatellite marker, putative alleles being indicated by the estimated size in bp. The genetic information was assessed using the following parameters: number of alleles per locus; observed heterozygosity (Ho); expected heterozygosity (He); and the polymorphism information content (PIC) values based on the equation $1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele in the 41 genotypes tested. All parameters were calculated using the CERVUS 2.0 program (Marshall, 2001, University of Edinburgh, UK).
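For reference, the standard definitions of these parameters can be written out as below. This is a sketch under the assumption that the program reports the sample-size-corrected expected heterozygosity and the PIC of Botstein et al. (1980); neither is stated explicitly in the text. The symbols $n$ (number of genotypes scored) and $k$ (number of alleles at the locus) are introduced here for illustration.

```latex
H_o = \frac{\text{number of heterozygous genotypes}}{n}, \qquad
H_e = \frac{2n}{2n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right), \qquad
\mathrm{PIC} = 1 - \sum_{i=1}^{k} p_i^2 - \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} 2\,p_i^2\,p_j^2
```

The expression $1 - \sum_i p_i^2$ is the uncorrected gene diversity; the PIC subtracts a further term. For a locus with two equally frequent alleles scored in 41 genotypes, these definitions give $H_e \approx 0.506$ and $\mathrm{PIC} = 0.375$.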
Cluster analysis was performed with the numerical taxonomy and multivariate analysis software package NTSYS-pc version 2.02g (Rohlf, 1993) using the unweighted pair-group method with averages (UPGMA) (Sneath, 1978).

Description of microsatellites
A total of 279 sequences were chosen and checked for the presence of microsatellites, from which 171 microsatellite PCR primer pairs were synthesized and tested for polymorphism. Microsatellites are classified as perfect, imperfect or compound (Weber, 1990) and we found that most (78%) of our repeats were perfect (65% dinucleotide, 13% trinucleotide), with only 22% being compound and/or imperfect repeats containing both di- and trinucleotides. The locus with the lowest number of repeats was a five-repeat trinucleotide while the locus with the largest number of repeats was a 47-repeat dinucleotide.
We analyzed a total of 171 microsatellite primer pairs, 113 (66%) producing the expected DNA fragments in their PCR products while 32 (18.8%) produced nonspecific products and 26 (15.2%) showed complex (uninterpretable) patterns. For the 113 microsatellite loci which produced the expected PCR products, those with AG/TC dinucleotide repeats were the most frequent (69.9%), with trinucleotide repeats accounting for only 14.2% (8% TCA/AGT, 6.2% AAC/TTG). Of the 113 loci producing the expected PCR products, 63 were uninformative because they showed just one allele; they were not used in the PCR typing of the orange cultivars and will not be discussed further in this paper. We used the 50 remaining microsatellite loci in the PCR typing of the orange genotypes shown in Table 1; further details on these microsatellite primers are available at the CAPTACSM website (www.centrodecitricultura.br/SSR.htm). The number of putative alleles per locus ranged from one to four, with an average of two alleles per locus. The variation of PCR fragment sizes among the different alleles within the individual microsatellite loci tested ranged from 50 bp to 380 bp.
The CCSM13 locus gave two alleles (320 and 380 bp) for the majority of cultivars, with the exception of the PER, LAN, PGS and LSU cultivars, which shared the same 320/320 bp homozygous banding pattern. The CCSM17 locus showed two alleles (80 and 100 bp), with the LIM, LIT, LIV and MIM cultivars being 100/100 bp homozygotes while the remaining cultivars were 80/100 bp heterozygotes. The CCSM18 locus was made up of four alleles (150, 280, 300 and 380 bp), with the VFM, MIM and VFC cultivars being 150/380 bp heterozygotes while LAN, NAM and SGN were 280/300 bp heterozygotes, the remaining cultivars being 280/380 bp heterozygotes. The CCSM147 locus consisted of two alleles (100 and 120 bp), cultivars LIM, LIT, LIV, MIM and BGI being 100/100 bp homozygotes and the remaining cultivars 100/120 bp heterozygotes.
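The screening counts in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only counts quoted in the text; the variable names are illustrative.

```python
# Counts quoted in the text for the primer screening.
primer_pairs_tested = 171
outcomes = {
    "expected product": 113,
    "nonspecific product": 32,
    "complex pattern": 26,
}
monomorphic_loci = 63  # loci showing a single allele

# Share of each outcome among the primer pairs tested, in percent.
shares = {
    name: round(100 * count / primer_pairs_tested, 1)
    for name, count in outcomes.items()
}

# Loci left for typing after the monomorphic ones are set aside.
loci_typed = outcomes["expected product"] - monomorphic_loci

print(sum(outcomes.values()), shares, loci_typed)
```

The three outcome classes add up to the 171 pairs tested, and 113 − 63 leaves the 50 loci used for typing. Computed directly, the nonspecific share is 18.7%, marginally below the 18.8% quoted, which is presumably a rounding adjustment so that the three percentages total 100.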
The level of polymorphism of the 50 microsatellite loci was investigated in all 41 sweet orange cultivars (Table 2), with the expected heterozygosity ranging from 0.499 to 0.575 (mean He = 0.507) and the observed heterozygosity from 0.878 to 1.000 (mean Ho = 0.993). The CCSM18 locus showed the highest polymorphism information content (PIC = 0.473) and the CCSM147 locus the lowest (PIC = 0.371), mean PIC being 0.376.
A similarity dendrogram (Figure 2) was constructed using UPGMA cluster analysis, which grouped the 41 cultivars into eight groups and also showed that all these sweet orange cultivars are very closely related.
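As a worked check, the variability parameters of the four polymorphic loci can be recomputed from the genotype classes described for CCSM13, CCSM17, CCSM18 and CCSM147. This is a minimal sketch, not the CERVUS implementation: it assumes the sample-size-corrected expected heterozygosity and the PIC of Botstein et al. (1980), and the genotype counts are derived here from the cultivar lists in the text (41 cultivars in total), not copied from Table 2.

```python
from collections import Counter

# Genotype classes per locus: (allele_a, allele_b) -> number of cultivars.
# Counts are derived from the cultivar lists in the text (assumption).
LOCI = {
    "CCSM13": {(320, 320): 4, (320, 380): 37},
    "CCSM17": {(100, 100): 4, (80, 100): 37},
    "CCSM18": {(150, 380): 3, (280, 300): 3, (280, 380): 35},
    "CCSM147": {(100, 100): 5, (100, 120): 36},
}


def locus_stats(genotype_counts):
    """Return (Ho, He, PIC) for one locus from diploid genotype counts."""
    n = sum(genotype_counts.values())
    allele_counts = Counter()
    heterozygotes = 0
    for (a, b), count in genotype_counts.items():
        allele_counts[a] += count
        allele_counts[b] += count
        if a != b:
            heterozygotes += count
    freqs = [c / (2 * n) for c in allele_counts.values()]
    sum_sq = sum(p * p for p in freqs)
    ho = heterozygotes / n
    # Expected heterozygosity with the 2n / (2n - 1) sample-size correction.
    he = (1 - sum_sq) * 2 * n / (2 * n - 1)
    # (sum p^2)^2 - sum p^4 equals the sum over i < j of 2 * p_i^2 * p_j^2.
    pic = 1 - sum_sq - (sum_sq ** 2 - sum(p ** 4 for p in freqs))
    return ho, he, pic


for name, genotypes in LOCI.items():
    ho, he, pic = locus_stats(genotypes)
    print(f"{name}: Ho={ho:.3f} He={he:.3f} PIC={pic:.3f}")
```

Under these assumptions the sketch gives PIC values of 0.373, 0.373, 0.473 and 0.371 for CCSM13, CCSM17, CCSM18 and CCSM147, in agreement with the values quoted in the text, and it reproduces the reported extremes of He (0.499 and 0.575) and the lowest Ho (0.878).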
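The clustering step can be illustrated with a small pure-Python UPGMA. This is a toy sketch, not the NTSYS-pc analysis: it uses only four cultivars and the four polymorphic loci, with band presence/absence profiles derived from the genotypes described in the text, and it assumes a Jaccard coefficient because the similarity coefficient used for Figure 2 is not stated. The resulting tree is therefore not expected to match Figure 2, which is based on 50 loci and 41 cultivars.

```python
from itertools import combinations

# Band order: CCSM13 (320, 380), CCSM17 (80, 100),
# CCSM18 (150, 280, 300, 380), CCSM147 (100, 120); 1 = band present.
# Profiles are derived from the genotypes described in the text (assumption).
PROFILES = {
    "PER": (1, 0, 1, 1, 0, 1, 0, 1, 1, 1),
    "LAN": (1, 0, 1, 1, 0, 1, 1, 0, 1, 1),
    "LIM": (1, 1, 0, 1, 0, 1, 0, 1, 1, 0),
    "MIM": (1, 1, 0, 1, 1, 0, 0, 1, 1, 0),
}


def jaccard_distance(a, b):
    """1 - (shared bands / bands present in either profile)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    present = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - shared / present if present else 0.0


def upgma(profiles):
    """Return the merge history as (cluster_1, cluster_2, distance) tuples."""
    names = sorted(profiles)
    dist = {
        frozenset(pair): jaccard_distance(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }

    def average(c1, c2):
        # UPGMA: unweighted mean of all leaf-to-leaf distances.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


merges = upgma(PROFILES)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

With these four profiles, PER and LAN join first and LIM and MIM second, before the two pairs merge. Ties between equally distant pairs are broken by list order in this sketch, so a different tie-breaking rule could change the merge order on larger data sets.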

Discussion
The microsatellite enrichment procedure was highly successful, producing 50 informative microsatellite primer pairs out of 279 sequences from the genomic libraries of the citrus cultivars studied. These results mean that a large number of new microsatellite markers are now available as a useful tool in research involving citrus genera and species, details of these markers being available on the CAPTACSM website (www.centrodecitricultura.br/SSR.htm).
The origin and evolution of microsatellites are not well known, but they presumably originate from single or multiple mutational events such as the unequal recombination of chromatids, duplications during replication, base substitutions and insertion/deletions (Gupta et al., 1996; Zane et al., 2002). According to Kutil and Williams (2001), compound microsatellites are less persistent than perfect microsatellites because the former contain more imperfections and deletions and probably represent the last stage before degradation. Perfect microsatellites were the most frequent type in our study, followed by compound and imperfect microsatellites, and this suggests a certain degree of sequence stability and conservation in their recent evolution, although it is also possible that the use of enriched libraries may have enhanced the number of perfect microsatellites (Moriguchi et al., 2003).
The number of dinucleotide repeats detected was higher than that of trinucleotide repeats. According to Gupta et al. (1996), (AT)n repeats are the most frequent in plants, followed by (AG)n and (AC)n. In our study, analysis of 113 microsatellite loci indicated that the AG/TC repeat was the most frequently (69.9%) dispersed form of microsatellite detected. This is consistent with surveys of microsatellite repeats in other species (Moriguchi et al., 2003; He et al., 2003b; Alghanim and Almirall, 2003; Ferguson et al., 2004) in which the AG/TC motif was the most abundant. However, because dinucleotide repeats are more abundant and variable they also generate stutter bands (i.e. smeared bands) during amplification (Schlotterer, 1998). Although our PCR conditions were optimized, we found that of the 171 pairs of primers tested 58 pairs did not amplify interpretable patterns and that sequences with a higher number of repeats generated considerable stutter, these 58 microsatellites being excluded from the citrus genotype analysis. Rossetto et al. (1999) reported that agarose and acrylamide gels gave comparable electrophoresis during screening, while we found that although the banding patterns on 3% MetaPhor gel were satisfactory for all primers, the resolution on 10% polyacrylamide gel was more accurate when the interpretation involved small fragments.
The vast majority of microsatellite loci have highly conserved flanking regions (Zane et al., 2002), and most (113; 66%) of our sweet orange microsatellite loci showed patterns that could be utilized in other citrus species and related genera, as already done by some authors (Corazza-Nunes et al., 2002; Oliveira et al., 2002; Koehler-Santos et al., 2003; Cristofani et al., 2001; 2003).
Because of the very high similarity of DNA sequence in different sweet orange cultivars, genomic pools are not recommended for use in the selection of polymorphic primers because some genotypes could be easily missed during analysis. An example of this is the LIM cultivar, which showed only one 100 bp allele for the CCSM17 locus, because of which we were unable to score this locus as homozygous when using the genomic pool. Pooling of genomic DNA is thus unsuitable for samples with low polymorphism (i.e. sweet orange cultivars) but would be ideal for studies regarding the identification and selection of progenies whose markers were linked to a distinct phenotype (Ferreira and Grattapaglia, 1998).
Of the 113 microsatellite loci detected by us, 63 showed just one allele and were excluded from the genetic information analysis, which meant that we scored 50 microsatellite loci in the 41 sweet orange accessions tested. The total number of alleles found was 102 and the average number of alleles per locus was 2, this value being low when compared to data from plants such as Melaleuca (tea tree) (Rossetto et al., 1999), Cryptomeria (Moriguchi et al., 2003), Cannabis (Alghanim and Almirall, 2003), Litchi (Viruel and Hormaza, 2004) and peanut (He et al., 2003b; Ferguson et al., 2004). On the other hand, the majority of the sweet orange cultivars tested were heterozygous for almost all the microsatellite loci analyzed, as shown by the fact that the mean Ho value was 0.993 (details published as supporting information on the CAPTACSM website, www.centrodecitricultura.br/SSR.htm).
It is recognized that sweet oranges have a narrow genetic basis and that most morphological characters originated through mutations, sweet oranges being propagated vegetatively, as is the case for the majority of citrus species (Herrero et al., 1996; Bretó et al., 2001). The similarity between the sweet orange cultivars investigated by us can be seen in the dendrogram shown in Figure 2; this close relationship supports the view that most sweet orange cultivars arose through mutation.
Difficulties in obtaining markers for the characterization of sweet oranges were reported by Orford et al. (1995), who identified no differences among sweet orange cultivars analyzed using minisatellite markers, and similar results were observed by Novelli et al. (2000) using isozymes and by Targon et al. (2000) using RAPD markers. Low polymorphism was also described by Fang and Roose (1997), who detected only four ISSR markers able to differentiate some sweet orange cultivars and suggested that most cultivars probably originated from only one ancestor by mutation.
Genetic variability in citrus is considered to be the result of many factors, such as hybridization, mutation and type of reproduction (mostly apomictic). The low intraspecific diversity found in cultivated species such as sweet orange contrasts with the high variability of agriculturally important traits such as ripening period and color and size of fruits (Herrero et al., 1996). Because of these factors and the general lack of citrus molecular markers, the distinction between cultivars is still based mainly on morphological traits, especially fruit traits (Fang and Roose, 1997).
The four polymorphic microsatellite markers CCSM13, CCSM17, CCSM18 and CCSM147 represented 3.5% of the microsatellite loci analyzed by us, the polymorphism observed in these loci allowing the characterization of some of the sweet orange cultivars. The CCSM17 (PIC = 0.373) and CCSM147 (PIC = 0.371) loci were polymorphic in the low acidity sweet orange cultivar Mimo do Céu (normally classified as a common sweet orange, q.v. Table 1) and the navel orange Baía Gigante. A similar situation occurred for the CCSM13 locus, which was polymorphic in the common sweet oranges group (PIC = 0.373), while the CCSM18 locus (PIC = 0.473) showed polymorphism among common and pigmented sweet orange cultivars.
The data obtained from polymorphic microsatellite loci will be useful as tools in the selection of zygotic and nucellar plants, the identification of seedlings etc. for the following sweet orange cultivars: Pêra IAC, Lanceta, Pêra GS 2000, Lamb Summer, Lima, Lima Tardia, Lima Verde, Mimo do Céu, Valência Folha Murcha, Valência Folha Concha, Natal Murcha, Sangüínea and Baía Gigante.
Although the development of microsatellites is considered laborious and expensive when compared to other markers (Fang and Roose, 1997), data from other crops show that they are extremely efficient as reproducible and highly informative genetic markers (Zane et al., 2002; Viruel and Hormaza, 2004; He et al., 2003a).
Microsatellite markers specific for citrus are restricted to less than twenty pairs of primers designed by Kijas et al. (1995; 1997) but not evaluated in sweet oranges, although they show a high level of sequence conservation among Citrus, Poncirus and Severinia genera, where they have been used in various studies (Kijas et al., 1997; Ruiz et al., 2000; Pang et al., 2003). It is interesting to note that the level of conservation of the microsatellite loci developed in our study has also been observed in studies on grapefruit and mandarin (Corazza-Nunes et al., 2002; Koehler-Santos et al., 2003), in work on the saturation of genetic maps (Cristofani et al., 2003) and in the multiple analyses of the loci of inter-specific progenies (Cristofani et al., 2001; Oliveira et al., 2002). For this reason it is important that the microsatellite markers developed by us are made available for studies involving the molecular characterization of different citrus species and hybrids, to which end we have fully listed them on the CAPTACSM website (www.centrodecitricultura.br/SSR.htm).
The study reported in this paper is the first to present a large number of microsatellite markers exclusively developed for the characterization of sweet oranges. Four intraspecific polymorphic microsatellite markers were found to be able to distinguish between important cultivars and will certainly be useful for purposes such as the certification of varieties and the identification of zygotic plants from controlled crosses between sweet orange cultivars.

Figure 2 - Dendrogram illustrating the relationships among 41 sweet orange Citrus sinensis cultivars. The dendrogram was drawn based on unweighted pair group method with averages (UPGMA) cluster analysis using the similarity matrix derived from 50 microsatellite loci.

Table 1 - Sweet orange cultivars used in this study.


Table 2 - Description and variability parameters for 4 informative microsatellites in 41 sweet orange cultivars.
* F = Forward and R = Reverse. # Polymorphism information content.