Survey of transposable elements in sugarcane expressed sequence tags (ESTs)

The sugarcane expressed sequence tag (SUCEST) project has produced a large number of cDNA sequences from several plant tissues subjected or not to different stress conditions. In this paper we report the results of a search for transposable elements (TEs), which revealed a surprising number of expressed TE homologues. Of the 260,781 sequences grouped in 81,223 fragment assembly program (Phrap) clusters, a total of 276 clones showed homology to previously reported TEs using a stringent cut-off value of e-50 or better. Clones homologous to the Copia/Ty1 and Gypsy/Ty3 groups of long terminal repeat (LTR) retrotransposons were found, but no non-LTR retroelements were identified. All major transposon families were represented in sugarcane, including Activator (Ac), Mutator (MuDR), Suppressor-mutator (En/Spm) and Mariner. In order to compare TE diversity among grass genomes, we carried out a search for TEs described in the sugarcane-related species O. sativa, Z. mays and S. bicolor. We also present preliminary results showing the potential use of TE insertion pattern polymorphism as a molecular marker for cultivar identification.


INTRODUCTION
TEs are genetic units capable of movement within the genome. As a consequence of this mobility, these elements are mutagenic agents, and their activity produces structural changes in single genes or in the overall genome, followed by altered spatial and temporal patterns of gene expression and, ultimately, altered gene function. They are present in all living organisms, and in some large genomes (e.g. maize) they represent over 50% of nuclear DNA, providing an enormous source of variability that can be used to create novel genes or modify genetic functions (Bennetzen, 2000). Sugarcane (Saccharum officinarum) has an even more complex genome than maize, yet no information is available about its TEs. These considerations highlight the importance of assessing the contribution of TEs to understanding the complex sugarcane genome.
According to their mode of transposition, TEs are classified into two major classes: RNA-mediated transposable elements or retroelements (Class I) and DNA transposable elements or classical transposons (Class II). Retrotransposons transpose through an RNA intermediate, which is reverse transcribed to generate a cDNA copy that is integrated elsewhere in the genome. They are highly represented in plant genomes, where 4 of the 5 classes recognized in eukaryotic cells are found. Elements flanked by long terminal repeats (LTRs) are organized into two subclasses, copia and gypsy, which differ in the position of the integrase domain within the polyprotein. Retroelements without LTRs are the long interspersed nuclear elements (LINEs) and the defective small interspersed nuclear elements (SINEs). LINEs are autonomous, encoding the nucleocapsid (GAG) and reverse transcriptase (RT) proteins needed for transposition, while SINEs are only a few hundred bases long and need trans-acting RT polymerase and integrase to transpose. Finally, retroviruses, found only in animals, are derived from gypsy retrotransposons that have acquired an envelope protein allowing them to be infective (Xiong et al., 1990).
Transposons move as DNA molecules, and their activity was first identified and described in plants in the early work of McClintock (1946). This class comprises families of elements such as Ac (Doring et al., 1984), En/Spm (Gierl and Saedler, 1989) and MuDR (Barker et al., 1984) of maize. The main characteristic of these elements is the presence of the transposase (Tpase) gene flanked by terminal inverted repeats (TIRs); this gene encodes the protein responsible for the excision and insertion of the transposon. DNA transposable elements with the same TIR are considered members of the same family (Bennetzen, 2000), and these TIRs may range from 11 bp for Ac to a few hundred bases for MuDR.
There is another group of TEs, the miniature inverted-repeat transposable elements (MITEs), that to date has not been included in either of the previously described classes because their transposition mechanism is still unclear. They are short (125-500 bp) and share features with both transposons and retrotransposons: they have TIRs (10-15 bp) like transposons and a high copy number like retrotransposons (Zhang et al., 2000). MITEs have been described as non-autonomous elements, and it has been hypothesized that they either have an as yet undiscovered autonomous element or use host functions, such as the DNA replication protein complex, to transpose (J. Casacuberta, pers. communication). Although the vast majority of MITEs lack a transposase, Tourist and Stowaway-like elements containing putative transposases have recently been described (Le et al., 2000; Turcotte et al., 2001).
Sugarcane is a high-level allopolyploid derived from an interspecific hybridization between the wild species Saccharum spontaneum (2n = 36-128), which contributes vegetative vigor and resistance traits, and the cultivated Saccharum officinarum (2n = 70-140), which is responsible for the high sugar content (Ming et al., 1998). The result is a large and very complex genome with a chromosome number between 100 and 130 (D'Hont et al., 1996).
The sugarcane EST sequencing project (SUCEST) produced a large number of partial cDNA sequences from different tissues subjected or not to different stress conditions. The SUCEST database represents an invaluable source of information for a little-known and genetically complex species like sugarcane. After a thorough search of the SUCEST database, we identified a varied spectrum of expressed TEs. In this paper we present these data and analyze the polymorphic pattern of a particular clone homologous to the Ac family.

SUCEST ESTs database screening
The SUCEST database contains partial sugarcane cDNA sequences from 37 libraries representing different tissues and physiological and stress conditions (Vettore et al., 2001). Sequence comparisons against GenBank were available for keyword searching (Telles et al., 2001). The TE survey was carried out on 260,781 sequences grouped in 81,223 fragment assembly program (Phrap) clusters (Telles and Silva, 2001). The keywords used were "transposable element", "transposase", "transposon" and "retrotransposon". Because of the enormous amount of data, and to avoid spurious matches, a very stringent expectation cut-off value (e-50 or better) was used.
The public protein and nucleotide databases of GenBank were also searched with the same keywords in order to detect all mobile genetic elements described in the sugarcane-related species Zea mays, Oryza sativa and Sorghum bicolor.
SUCEST clones were assigned to a family according to the best sequence alignment against a fully characterized element using the BLAST program (Altschul et al., 1990).
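The screening logic described above (keyword match, expectation cut-off, best alignment per clone) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SUCEST pipeline: the `Hit` record, the function names and the example hits are all hypothetical, and "e-50 or better" is read here as an e-value of at most 1e-50.

```python
from dataclasses import dataclass

# Keywords quoted in the text; the cut-off is interpreted as e-value <= 1e-50
KEYWORDS = ("transposable element", "transposase", "transposon", "retrotransposon")
E_VALUE_CUTOFF = 1e-50


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one BLAST match of a clone against GenBank."""
    clone: str        # EST clone identifier
    description: str  # description line of the matched entry
    e_value: float    # BLAST expectation value


def is_te_hit(hit: Hit) -> bool:
    """True if the description mentions a TE keyword and the hit passes the cut-off."""
    text = hit.description.lower()
    return hit.e_value <= E_VALUE_CUTOFF and any(k in text for k in KEYWORDS)


def best_te_hit_per_clone(hits):
    """For each clone, keep the qualifying hit with the lowest e-value."""
    best = {}
    for hit in filter(is_te_hit, hits):
        current = best.get(hit.clone)
        if current is None or hit.e_value < current.e_value:
            best[hit.clone] = hit
    return best


# Made-up hits, for illustration only
example_hits = [
    Hit("clone_A", "Ac transposase [Zea mays]", 1e-114),
    Hit("clone_A", "putative transposase [Oryza sativa]", 1e-60),
    Hit("clone_B", "copia-type retrotransposon polyprotein", 1e-30),  # fails cut-off
    Hit("clone_C", "ribosomal protein L3", 1e-80),                    # no TE keyword
]
selected = best_te_hit_per_clone(example_hits)
```

In this toy input only `clone_A` survives, and it is assigned to the entry behind its best-scoring hit; the family label itself would come from the characterized element in that entry.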

Expressed TEs in sugarcane
To analyze the number and types of expressed TEs present in the SUCEST database, we used a keyword search with an expectation cut-off value of e-50 or better, identifying 276 TE-homologous clones representing 117 Phrap clusters. A list of all the identified clusters can be found on the SUCEST home page (http://sucest.lad.dcc.unicamp.br/private/mining-reports/UT/UT-mining.htm). Since we were looking at elements that can present several polymorphic insertions along the genome, and a cluster may group mRNAs coming from different insertions, we decided to consider individual clones instead of clusters in our analyses.
The proportions of transposon (54%) and retrotransposon (46%) homologues differed slightly. Class I TEs were further classified into LTR retroelements (Copia/Ty1 or Gypsy/Ty3) and non-LTR retroelements. Table I presents the number of clones identified for each category. The number of expressed Copia/Ty1 elements was significantly higher than that of Gypsy/Ty3, and no non-LTR retrotransposons were found. Clones were assigned to a family according to the best alignment against a fully characterized element using the BLAST program, which enabled us to classify the 276 SUCEST clones into 21 different families (Figure 1). It is possible that some of the TEs described in this paper represent new TE families specific to sugarcane, but this can only be confirmed after a detailed analysis of the complete sequences of the elements.
As expected, no MITEs were identified since these defective elements are not transcribed.

Comparison with related species
In order to compare the TEs found in sugarcane with those reported for related species, a species-specific keyword search was made in public GenBank for all mobile elements described for Zea mays, Oryza sativa and Sorghum bicolor, yielding a total of 5348, 18 and 543 entries respectively. All the entries were analyzed to eliminate redundancy and spurious matches and were subsequently organized according to the classes to which they belong. Table II was generated from this study plus the sugarcane data, and represents all known TEs described to date for these 4 grass species. Of the total of 71 different TEs, 60.6% belong to Class I, 23.9% belong to Class II and 15.5% are MITEs.
Since plant TEs were discovered in maize, this species became the model for the study of plant TEs and, consequently, the vast majority of mobile genetic elements were originally described in Z. mays. This fact is clearly reflected in Table II, where 60.4% of the retrotransposons, 64.7% of the transposons and 63.6% of the MITEs shown were first described in maize. Only 12 and 4 TEs were first characterized in rice and sorghum respectively, while all the rice transposons were found by homology to previously described elements. Although retroelements are more represented, only the MuDR and Ac families are present in all four species investigated.
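As a quick arithmetic check on the class breakdown of the 71 TEs, the percentages can be reproduced from integer counts. The counts used below (43, 17 and 11) are an assumption: they are the whole numbers that sum to 71 and round to the reported percentages, and they are not read from Table II.

```python
# Counts per class inferred from the reported percentages (assumption, not from Table II)
counts = {"Class I": 43, "Class II": 17, "MITEs": 11}

total = sum(counts.values())

# Share of each class, rounded to one decimal place as in the text
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} of {total} = {pct}%")
```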

Polymorphic insertion patterns of an Ac homologue Tpase
To evaluate the level of polymorphism generated among different sugarcane cultivars by any of the TE homologues described, we chose an Ac-like clone and a PCR-based strategy. Clone SCCCCL4010E02, sequenced from a callus-derived library, is similar to the Ac Tpase at an e-114 value. Full-length sequencing of this clone yielded a 1081 bp poly(A)+ fragment homologous to the central part of the 3.5 kb Ac element mRNA (Kunze et al., 1987). This result indicated that SCCCCL4010E02 may be a partial cDNA clone in which the 5' end is missing, an mRNA from a truncated element, or both. A pair of primers with an expected PCR product of 517 bp was designed for the sugarcane Ac-homologous clone. No introns were expected within this PCR fragment according to the original Ac element sequence. Genomic PCRs were carried out using total genomic DNA isolated from different sugarcane cultivars. Each cultivar has its own fingerprint, with some shared fragments. The expected 517 bp band is present in all cultivars, but its intensity varies. Polymorphic bands of the presence-absence type (1.6 and 1.1 kb bands) indicate the presence of other insertions of this Ac-homologous element in the genome, with variable sizes of the amplified fragment (probably due to insertions within the 517 bp PCR product), or concatemers of the same element. Differences in intensity (0.9 and 0.5 kb bands) indicate variable copy numbers of that particular locus in the different cultivars. These results strongly suggest that an Ac-like transposable element was active during the establishment of these cultivars (Figure 2A).
To confirm the specificity of the designed primers, some of the amplified bands were cloned, sequenced and analyzed against GenBank. The results showed that those bands were homologous to the Ac element, demonstrating that the PCR patterns indeed correspond to different insertions of Ac homologues (data not shown).
Figure 2B shows the amplification pattern for 4 individuals of the segregating progeny of the SP80-1043 X SP82-3349 cross. There are no new polymorphic bands with respect to the parental lines, meaning that no major rearrangements associated with the element occurred during the cross. On the other hand, contributions of both parents to the Ac-like element composition of the progeny can be observed, as no single individual has exactly the same pattern as either of the parental lines.
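The two observations made on the progeny (no band absent from both parents, and no individual identical to a parent) amount to simple set comparisons on presence-absence band patterns. The sketch below illustrates that reasoning only; the band sizes assigned to each parent and individual are hypothetical, not the patterns of Figure 2B, and band intensity is ignored.

```python
def novel_bands(progeny, parent_a, parent_b):
    """Bands present in a progeny individual but in neither parent."""
    return set(progeny) - (set(parent_a) | set(parent_b))


def same_as_a_parent(progeny, parent_a, parent_b):
    """True if the individual's pattern equals the pattern of one parent."""
    return set(progeny) == set(parent_a) or set(progeny) == set(parent_b)


# Hypothetical presence-absence patterns, band sizes in bp (illustration only)
parent_a = {517, 900, 1600}
parent_b = {517, 1100}
progeny = [
    {517, 900, 1100},
    {517, 1100, 1600},
    {517, 900, 1100, 1600},
    {517, 1600},
]

# No individual carries a band that is missing from both parents
no_new_bands = all(not novel_bands(p, parent_a, parent_b) for p in progeny)

# No individual reproduces either parental pattern exactly
none_identical = not any(same_as_a_parent(p, parent_a, parent_b) for p in progeny)
```

With real gel data, band calling would need a size tolerance before patterns are compared as sets; that step is omitted here.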

DISCUSSION
Sugarcane revealed a surprisingly large and diverse spectrum of expressed transposable elements. Considering that TEs are the most abundant non-genic DNA in plants, providing a rich source for genome evolution, and that they are especially abundant in grass genomes (Bennetzen, 2000; Zhang and Peterson, 1999), evaluating the contribution of transposable elements is essential for understanding sugarcane genomics. Ubiquitous and abundant, retroelements constitute more than 50% of the maize genome, and they could be equally or even more numerous in other plant species with large complex genomes like sugarcane (Bennetzen, 2000; Mao et al., 2000). In contrast, retrotransposon expression has been demonstrated for only very few higher plants, probably because of fine transcriptional control, which makes it difficult to find the temporal and spatial conditions under which expression occurs (Takeda et al., 1999; Grandbastien et al., 1998); this suggests a strategy by which these elements perpetuate themselves without compromising the host genome. Based on these considerations, our results agree with the expectation of finding more expressed transposons (54%) than retrotransposons (46%).
How does sugarcane support such a large number of expressed TEs? Plants adapt to genome perturbations such as changes in ploidy: over 50% of all plant species are polyploid or have undergone periods of polyploidy during their evolutionary history. Transposable elements that are silenced in a diploid may be activated in the new genetic environment of the polyploid, since genetic redundancy in the polyploid may buffer the potentially deleterious effects of transposition (Voytas and Naylor, 1998). On the other hand, plant genome expansion is an inevitable consequence of TE insertion: the accumulation of retrotransposon blocks between genes is a major factor in the size difference between the maize genome and those of its smaller relatives (Federoff, 2000).
Genomic studies in rice by Turcotte et al. (2001) and Mao et al. (2000) report that Gypsy/Ty3 retrotransposons are more abundant than Copia/Ty1 elements. In contrast, our results show that, in sugarcane, the number of expressed Copia/Ty1 elements was significantly higher than that of Gypsy/Ty3. We also found no non-LTR retrotransposons in our search of the SUCEST database, which indicates that these TEs are poorly expressed in sugarcane. Moreover, they appear to be poorly represented in grass genomes, as shown in Table II and previously described in rice (Mao et al., 2000).
In this paper we have reported 21 different expressed TEs in sugarcane. This finding strongly suggests that these elements are transpositionally active, but it is important to point out that defective TEs can also be expressed, as seems to be the case for the Ac homologue analyzed here. If we consider that, prior to the SUCEST project, sugarcane was a largely unexplored species and that our research concerned transcriptionally active elements only, the diversity of sugarcane TEs presented in this paper is comparable to that of maize, where to date 39 different elements have been described (Table II).
Our novel approach gave us a genomic overview of potentially active transposable elements, in contrast to the classic strategies, in which TEs were first described at the DNA level and their transcription and transposition were studied only later.
In sugarcane, somaclonal variation is a common phenomenon caused either by tissue culture procedures or by vegetative propagation (Irvine et al., 1991). Given that our results reveal a wide spectrum of potentially active, previously unknown TEs in sugarcane, we can hypothesize that some of the somaclonal variation events reported in the literature result from TE activity.
Recent reports have evaluated the utility of TEs as a source for developing molecular markers, showing that TEs are highly polymorphic in terms of insertion sites and internal structure (Ellis et al., 1998). Furthermore, TE-derived markers efficiently distinguished rice cultivars (Fukuchi et al., 1993). Our preliminary analyses with the Ac homologue revealed element transposition during cultivar establishment and showed that TE-mediated fingerprinting could become a powerful tool for sugarcane.

Table I - Transposable element content of the SUCEST database.
a Unclassified.

Table II - Transposable element spectrum in sugarcane and the related species rice, maize and sorghum.