Study of simple sequence repeat markers from coffee expressed sequences associated to leaf miner resistance

The objective of this work was to identify expressed simple sequence repeat (SSR) markers associated with leaf miner resistance in coffee progenies. SSR markers were identified by directed searches of the Brazilian Coffee Expressed Sequence Tags (EST) database. Sequence analysis of 32 selected SSR loci showed that 65% of the repeats are tetra-, 21% tri- and 14% dinucleotides. Expressed SSR are also frequently located in the 5'-UTR of gene transcripts, and most of the SSR-containing genes are associated with defense mechanisms. Polymorphisms were analyzed in progenies segregating for resistance to the leaf miner, corresponding to advanced generations of a Coffea arabica x Coffea racemosa hybrid. The frequency of SSR alleles was 2.1 per locus. However, no polymorphism associated with leaf miner resistance was identified. These results suggest that marker-assisted selection in coffee breeding should be performed on the initial cross, in which genetic variability is still significant.


Introduction
The species Coffea arabica is an allotetraploid with 2n = 4x = 44, the result of a natural hybridization between the diploid species C. eugenioides and C. canephora. The species is predominantly autogamous, while other Coffea species are diploid, allogamous and self-incompatible. The overall genetic variability of C. arabica is very restricted, not only in cultivated materials but also in wild accessions. This is a consequence of its mode of reproduction, of its narrow center of origin and of the low number of seeds used for its dispersion around the world (Maluf et al., 2005).
The leaf miner Leucoptera coffeella (Guérin-Méneville, 1842) (Lepidoptera: Lyonetiidae) is the major pest in coffee plantations, and production losses due to insect infestation can reach 50%. The leaf miner is an obligate parasite and requires coffee plants to complete its life cycle. After oviposition on leaves, the hatched larvae feed directly on parenchyma tissues, which reduces the foliar surface and eventually causes leaf drop (Magalhães, 1964). This damage reduces the photosynthetic area and, consequently, production and plant survival.
Since 1934, the Instituto Agronômico (IAC) has been looking for resistance sources among Coffea species to use as gene donors for the development of leaf miner resistant cultivars (Guerreiro-Filho et al., 1999). Different levels of resistance to the insect were observed among Coffea species (Guerreiro-Filho, 2006), and based on these responses they can be classified as highly resistant (C. stenophylla, C. brevipes, C. salvatrix and C. liberica), resistant (C. racemosa, C. kapakata, C. dewevrei and C. eugenioides), susceptible (C. congensis and C. canephora), and highly susceptible (C. arabica). Among these species, C. racemosa was chosen to donate resistance genes to C. arabica cultivars, since this species also exhibits drought resistance, intense flowering, fair productivity and efficiency in crosses with C. arabica (Carvalho & Mônaco, 1968).
Although the leaf miner defense mechanism in C. racemosa is not yet understood, genetic analysis demonstrated that resistance is dominant and controlled by two complementary genes (Guerreiro-Filho et al., 1999). Resistance genes were transferred to C. arabica varieties by crosses, and therefore a large number of hybrid progenies is available and under selection for resistance to leaf miner. However, this is a time-consuming process due to the long life cycle of coffee plants and to complex resistance evaluation methods (Guerreiro-Filho et al., 1999).
Thus, the development of genomic tools for assisted selection would represent a significant advance in coffee breeding programs. Molecular markers have been used in several plant species for assisted selection, mapping and fingerprinting. Among molecular markers, SSR are very useful for breeding and mapping purposes, as they are highly reproducible, multi-allelic, codominant, highly polymorphic and widely distributed in the genome (Caixeta et al., 2006).
With the recent completion of large-scale sequencing projects, including a coffee EST sequencing project, an alternative strategy to identify SSR loci is to search directly for those repeats in genome databases, especially in EST collections. Using this approach, SSR have been rapidly identified in several plant species such as sugarcane (Cordeiro et al., 2001), wheat (Eujayl et al., 2001) and Eucalyptus (Ceresini et al., 2005). The use of SSR associated with EST has several advantages over genomic SSR. First, as they are associated with an expressed gene that is involved in the development of a specific characteristic, there is a chance that the SSR affects this characteristic as well. Second, being associated with candidate genes, SSR could also be used for gene mapping. In addition, the presence of SSR in gene transcripts suggests a possible role in gene expression and function (Kantety et al., 2002).
The Coffee Genome Project was an initiative that involved several Brazilian research institutions and accomplished the sequencing of 200 thousand EST, which upon analyses resulted in about 35 thousand single genes.
The objective of this work was to identify expressed simple sequence repeat (SSR) markers associated with resistance to leaf miner in coffee progenies. Selected loci were amplified in a backcrossed C. racemosa x C. arabica hybrid population under selection, to verify possible cosegregation patterns between SSR and resistance to leaf miner.

Material and Methods
The segregating population used in this study represents an advanced generation of an interspecific hybrid between the susceptible species C. arabica and the resistant species C. racemosa. The complete genealogy of this program is shown in Table 1. The population consists of 136 plants, derived from open pollination of the resistant accession H14954-46 C1351 EP473. Plants were evaluated for resistance using an infected-disc methodology (Guerreiro-Filho et al., 1999). Only plants with scores corresponding to highly resistant (score 1) and highly susceptible (score 4) on a 1-4 scale were selected for further analyses. The following parental plants were also evaluated: cultivar Catuaí Vermelho IAC 81; C1195-5-6-2 (C. arabica x C1195-5-6); H13685-1 C1841 (IAC 81 x H11421-11); H14954-46 C1351 (IAC 62 x H13685-1).
Genomic DNA was extracted according to Maluf et al. (2005). DNA samples corresponding to resistant and susceptible plants were grouped for bulked segregant analysis (BSA), according to Michelmore et al. (1991). A total of 21 bulks were assembled, 11 of resistant and 10 of susceptible plants.
Table 1. Genealogy of evaluated coffee population. (1) Parental accessions evaluated in this study.
Directed searches were conducted in the Coffee EST Genome Database to identify SSR sequences, using both BLAST (Altschul et al., 1997) and Tandem Repeats Finder version 3.01 (Benson, 1999) software. Candidate EST-SSR were analyzed, and primers were selected using the Primer3 software (Rozen & Skaletsky, 1996). Sequences of the evaluated primers cannot be displayed due to a confidentiality agreement signed by Embrapa, IAC and Fapesp. A total of 315 EST-SSR had been previously identified, and 32 were selected for this study. The criterion for choosing EST-SSR was expression in libraries from leaves and/or suspension cells treated with some type of stress agent, such as inoculation with leaf miner or leaf rust, abiotic stress, or nonspecific defense inducers. This approach aimed at identifying SSR loci potentially associated with genes involved in defense mechanisms.
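The kind of repeat search described above can be illustrated with a minimal sketch. It is not the BLAST or Tandem Repeats Finder procedure used in this study: it is a plain regular-expression scan for perfect di-, tri- and tetranucleotide repeats, and the function name and the minimum copy numbers are hypothetical values chosen only for the example.

```python
import re

# Hypothetical minimum copy numbers per motif length (not taken from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for size, min_copies in min_repeats.items():
        # One motif of `size` bases followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "CTCT"),
            # so that each repeat is reported once, under its shortest motif.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)

# Toy sequence: a (CT)8 repeat and a (GATA)5 repeat separated by spacer bases.
example = "GGC" + "CT" * 8 + "AAGT" + "GATA" * 5 + "CC"
print(find_ssrs(example))
```

Imperfect or compound repeats, which dedicated tools also report, are outside the scope of this sketch.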
Microsatellites were initially amplified on parental genotypes to optimize reaction conditions. Afterwards, amplifications were conducted on all bulks, and the loci that exhibited polymorphism were amplified in all genotypes. Final reaction conditions were 40 ng of genomic DNA, 1X reaction buffer, 0.1 mM of dNTP, 2 mM of MgCl2, 5 pmol of each primer and 0.5 U of Taq DNA polymerase in a 25 µL final volume. The complete thermal cycle program was 3 min at 94°C, followed by 30 cycles of 1 min at 94°C, 1 min at the corresponding annealing temperature and 1 min at 72°C, and a final elongation of 5 min at 72°C. Amplified fragments were separated by electrophoresis on both 2% agarose gel and denaturing 6% polyacrylamide gel. Amplified fragments separated in polyacrylamide gel were silver stained according to Creste et al. (2001).
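For bookkeeping, the thermal program above can be written down as data. The sketch below is illustrative only: the variable and function names are hypothetical, the annealing temperature is left unset because it is locus-specific, and the total counts hold times only, ignoring ramping between temperatures.

```python
# Each step is (label, temperature in °C or None if locus-specific, minutes).
INITIAL = ("initial denaturation", 94, 3)
CYCLE = [
    ("denaturation", 94, 1),
    ("annealing", None, 1),   # annealing temperature depends on the primer pair
    ("extension", 72, 1),
]
N_CYCLES = 30
FINAL = ("final elongation", 72, 5)

def program_minutes():
    """Total hold time of the program, excluding temperature ramping."""
    per_cycle = sum(minutes for _, _, minutes in CYCLE)
    return INITIAL[2] + N_CYCLES * per_cycle + FINAL[2]

print(program_minutes())  # 3 + 30 * 3 + 5 = 98
```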
Amplification of SSR loci was performed only in plants that were classified as highly resistant (score 1) and highly susceptible (score 4). This strategy aimed at identifying polymorphisms truly associated with the resistance characteristic. The BSA strategy was also employed to optimize the analyses and avoid unnecessary tests. Primers flanking 32 SSR loci were tested for amplification pattern in the genomes of susceptible and resistant bulks. Amplified products were first separated in agarose gels, and those exhibiting single fragments were further separated through denaturing electrophoresis (Figure 1). In silico analyses included the characterization of EST-SSR regarding copy number, repeat motif, and identity of the SSR-containing genes, with the corresponding position of the SSR on the transcript. Analysis of SSR products was based on the presence or absence of amplified fragments and on the characterization of the number, size and frequencies of alleles within the population. The average allele number per locus (A) was calculated as the total number of alleles divided by the total number of evaluated loci.
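The two scoring steps described above, the average allele number per locus (A) and the comparison of fragment patterns among bulks, can be sketched as follows. This is a hypothetical illustration: the function names, the data layout and all locus names and allele sizes are invented for the example and are not data from this study.

```python
def average_alleles_per_locus(alleles_by_locus):
    """A = total number of distinct alleles / number of evaluated loci."""
    if not alleles_by_locus:
        raise ValueError("at least one locus is required")
    total = sum(len(set(sizes)) for sizes in alleles_by_locus.values())
    return total / len(alleles_by_locus)

def polymorphic_loci(bulk_profiles):
    """Return loci whose set of amplified fragments differs among bulks."""
    return sorted(
        locus
        for locus, bulks in bulk_profiles.items()
        if len({frozenset(fragments) for fragments in bulks.values()}) > 1
    )

# Invented allele sizes (bp) for three hypothetical loci.
alleles = {"locusA": [150, 162], "locusB": [210], "locusC": [180, 184, 190, 200]}
print(average_alleles_per_locus(alleles))

# Invented presence/absence profiles for one resistant (R1) and one susceptible (S1) bulk.
profiles = {
    "locusA": {"R1": {150, 162}, "S1": {150, 162}},
    "locusB": {"R1": {210}, "S1": {210, 214}},
}
print(polymorphic_loci(profiles))
```

Only loci flagged by the second function would then need to be amplified in every individual genotype, which is the saving that bulking is meant to provide.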

Results and Discussion
The results are displayed in Tables 2 and 3. Simple sequence repeat loci in coffee exhibited complex repeat motifs: 65% were tetranucleotide repeats, 21% tri- and 14% dinucleotides. However, dinucleotide motif copy numbers were higher than the others, and ranged from 17 to 30 copies. In similar studies, different results were observed for expressed SSR in several cereal species, in which trinucleotide repeats were more frequent, corresponding to 54-78% of all evaluated SSR (Varshney et al., 2002). In a survey of the sugarcane expressed genome database, Pinto et al. (2004) identified SSR motifs at frequencies similar to those observed in this study: 8.2% dinucleotide repeats, 30.5% tri- and 61.3% tetranucleotides. The high frequency of complex motifs observed in both sugarcane and coffee genomes may be due to the polyploid nature of these species, especially C. arabica, which resulted from the hybridization of different Coffea species, an event that could lead to chromosome and sequence rearrangements. Another hypothesis is that the high frequency of complex motifs results from DNA polymerase slippage caused by strand mispairing during the replication of long repeat sequences (Chistiakov et al., 2006).
The identity of SSR-containing genes was determined by manually verifying each automatic annotation of the corresponding gene sequences in the Coffee Genome Database. This analysis indicated that 65.5% of the selected genes coded for different known classes of proteins, including metabolic enzymes, structural proteins and transcription factors (Table 3). Some of those proteins have important roles in defense mechanisms, such as peroxidase and a PR homologue.
In general, the biological function of SSR-containing genes and the possible role of these repeated motifs in expressed sequences are poorly documented in the literature. In humans, trinucleotide repeats associated with several expressed genes have been implicated in neurological disorders. In this case, the normal gene contains from 6 to 34 CAG repeats, and disease symptoms are more severe when alleles containing higher repeat numbers are present (Sasaki et al., 1996).
The position of the SSR on the transcript may also imply a possible role for these repeats. In this study, 53.5% of all evaluated SSR loci were located in the 5' non-translated region of the transcript, or 5'-UTR (Table 3). This region is implicated in transcript stability and in recognition by the protein synthesis machinery. In plants and animals, SSR have been identified in the 5'-UTR of several genes. Analyses of the waxy gene in rice demonstrated that CT repeats present in the 5'-UTR are associated with variation in the amylose content of the grain (Bligh et al., 1995). Although this variability results from a SNP in the coding region near the SSR, the CT repeat affects a splicing site (Larkin & Park, 1999). Also, the presence of CCG repeats in the 5'-UTR of ribosomal protein genes of maize is associated with fertilization regulation (Dresselhaus et al., 1999).
All these preliminary reports indicate that expressed SSR could play an important role in protein synthesis and activity. If these genes are related to the development of an agronomic trait, such as resistance to a pathogen, the presence of an SSR may also affect the evolution of this trait. Clearly, a broad characterization of the expression pattern of SSR-containing genes is necessary to corroborate this possibility. In coffee, there is little information regarding gene expression in response to leaf miner. Using a subtractive hybridization approach, Mondego et al. (2005) identified a group of genes differentially expressed in resistant and susceptible coffee leaves upon leaf miner attack. However, no SSR was observed in those genes.
Only two out of the 32 primer pairs of selected SSR loci failed to amplify any fragment; the others amplified fragments corresponding to the expected size (Table 2). Allele sizes varied from 140 to 300 bp, except for some loci that exhibited alleles ranging from 700 to 900 bp, indicating the presence of an intron. Most loci amplified two alleles, but some loci, such as SSR17 and SSR19, amplified four alleles. This would be expected, since C. arabica is a tetraploid species. Similar results were also observed for other species. In barley, SSR loci amplified from both EST and total genome were compared, demonstrating that amplification of larger-size alleles is actually more frequent than that of alleles of the expected size (Thiel et al., 2003). Also, null SSR alleles derived from EST have been reported in other species such as rice (Cho et al., 2000) and wheat (Gupta et al., 2003).
The evaluated SSR loci amplified a total of 61 alleles, and the average number of alleles per locus was 2.1. In general, 80% of SSR loci amplified two alleles, 10% amplified four alleles and 10% amplified only one allele (Table 2). Although a considerable number of alleles was identified, only one locus was polymorphic among all genotypes evaluated, including parental genotypes. According to a previous report, SSR derived from EST are indeed less polymorphic than those identified in genomic samples (Thiel et al., 2003), which could explain the low polymorphism level observed in this study. Genomic SSR analyses in C. arabica cultivars amplified a similar number of alleles (65). However, the heterozygosity level (H) was estimated at 0.33, even though the genotypes evaluated are genetically homogeneous inbred lines (Maluf et al., 2005).
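As a consistency check, the reported allele-class frequencies reproduce the reported average when they are treated as rounded proportions of loci (an assumption made here only for the arithmetic):

```latex
\[
A \approx 0.80 \times 2 + 0.10 \times 4 + 0.10 \times 1 = 1.6 + 0.4 + 0.1 = 2.1
\]
```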
The polymorphic locus was SSR18, which amplified a total of four alleles, distributed in different bulks (Figure 1A). Polymorphism was observed in bulk 18, corresponding to susceptible genotypes (Figure 1C). The segregation pattern of the corresponding alleles in all genotypes demonstrated that the polymorphic alleles came from P1, the resistant BC2F5 genotype, and from P4, the susceptible cultivar Catuaí Vermelho IAC 81. However, only two plants exhibited the polymorphism, suggesting that this locus may be associated with agronomic characteristics other than resistance or susceptibility to leaf miner.
For this study, a segregating population from an interspecific cross between C. arabica and C. racemosa was selected. C. racemosa has been described as the species most genetically divergent from C. arabica (Ruas et al., 2003). Therefore, the selected population was expected to reveal significant genetic variability, although it represents an advanced generation (BC2F5) of the inbreeding program. Nevertheless, no polymorphism was observed in the accessions of earlier crosses evaluated in this study either. These results suggest that polymorphisms probably present in the initial crosses were rapidly eliminated during the first selection cycles.
Since selection for resistance to leaf miner was the major goal of this program, a low level of polymorphism in genes associated with defense mechanisms may be expected, as these genes are under selection pressure. In this sense, the choice of EST-SSR potentially associated with defense genes could also explain the absence of polymorphic SSR alleles. A random choice of SSR would probably reveal a higher number of polymorphic alleles, but these would not necessarily be associated with the resistance characteristic. In fact, the only polymorphic SSR locus identified here did not cosegregate with the response to leaf miner (Figure 1). All these observations suggest that EST-SSR are not suitable for assisted selection in coffee when advanced generations of breeding programs are under investigation.
Although no molecular marker was identified here, this exploratory study may be important for the future understanding of genetic events that occur during interspecific crosses, especially those involving polyploid species. In recent analyses, Adams et al. (2003) have shown that cotton, an allopolyploid species resulting from the natural hybridization of Gossypium herbaceum x Gossypium raimondii, has particular gene expression patterns, in which the expressed allele is not randomly selected with regard to ancestral species origin in different plant tissues. This is an example of an unknown genetic event associated with gene expression that favors the expression of a specific allele. One hypothesis to explain the role of expressed SSR is that these repeats could be implicated in the regulation of gene expression. If an SSR could be associated with genes that exhibit preferential expression in different biological situations, then selection of these genes could be performed using SSR loci as markers. In this context, the next step would be to characterize the expression of SSR-containing genes in coffee plants submitted to several situations and biological conditions.

Conclusions
1. Coffee simple sequence repeats-expressed sequence tags exhibit complex repeat motifs.
3. There is no correlation between simple sequence repeats polymorphisms in the genes evaluated and resistance to leaf miner in the population analyzed.

Table 2. Data-mining for EST-SSR on the Coffee EST Database.