Identification and frequency of transposable elements in Eucalyptus

Transposable elements (TE) are major components of eukaryotic genomes and are involved in cell regulation and organism evolution. We analyzed 123,889 expressed sequence tags (EST) from the Eucalyptus Genome Project database and found 124 sequences representing 76 TE in 9 groups, of which the copia, MuDR and FAR1 groups were the most abundant. The low number of TE sequences may reflect highly efficient repression of these elements, a process called TE silencing. The frequency of TE groups in Eucalyptus libraries prepared from different tissues or physiological conditions of seedlings or adult plants indicated that developing plants express a much wider spectrum of TE groups than adult plants. These preliminary results identify the most relevant TE groups involved in Eucalyptus development, which is important for industrial wood production.

TE are divided into two classes according to the mode by which they move from one location to another in the genome. Class I TE are transcribed into RNA intermediates, reverse-transcribed and integrated at a new genome site. Because of this activity, members of this class are also called retroelements, which are further subdivided into retrotransposons that have long terminal repeats (LTRs) (e.g. copia-like and gypsy-like) and the so-called non-LTR retroelements (e.g. long (LINE) and short (SINE) interspersed nuclear elements). Class II TE have terminal inverted repeats (TIR) and move from one site to another in the genome through a 'cut-and-paste' process that involves the action of transposases, which interact with the TIR (Berg and Howe, 1989). Class II is further subdivided according to TE structure and sequence or to features of the target site duplication generated upon insertion (Capy et al., 1996). Examples of Class II groups are CACTA, which is flanked by inverted repeats that terminate in a conserved CACTA motif (Wicker et al., 2003); MuDR, which codes for a transposase (Benito and Walbot, 1997) and for other genes whose function remains unknown (Lisch et al., 1999), and which controls the expression of TE in the MU system, the most active and mutagenic transposable element system described in plants (Rossi et al., 2004); hAT, which is widespread among fungi, plants and animals (Rubin et al., 2001); and IS (insertion sequence), which is commonly found in bacteria (Schnetz and Rak, 1995). New TE have also been discovered, such as those in the Jittery group, which are homologous to the FAR1 and FHY3 genes (FAR1 family) and have regulatory functions (Hudson et al., 2003). The regulatory function of genes in the FAR1 family is proposed to involve a mechanism similar to the binding of a transposase to the TIR of a Class II TE in the MuDR group (Hudson et al., 2003).
Therefore, many of the genes that are currently classified in distinct groups of TE are likely to be involved in cell regulatory processes, so that following TE expression patterns under different physiological conditions could identify putative regulatory genes.
In the present investigation, we searched for TE in different tissues and physiological conditions in Eucalyptus, which is an important source of wood for industrial purposes. To this end, we identified TE in libraries of the Eucalyptus Transcriptome Project (FORESTS, Table 1) through a keyword search of the FORESTS database, using the keywords 'transposon' and 'transposase' and the names of each group of TE. Only the retrieved expressed sequence tags (EST) showing e-values $\leq 10^{-5}$ were considered. An additional search for TE sequences in the FORESTS database was done with blastn (Basic Local Alignment Search Tool, nucleotide-nucleotide), using query sequences from GenBank that represented every group of TE. The TE sequences retrieved from the FORESTS database were then classified based on their similarity to previously described sequences in GenBank. To achieve that, each Eucalyptus TE sequence was used as a query in a blastx analysis (translated queries against the GenBank protein database), and protein sequences showing e-values $\leq 10^{-5}$ and scores $\geq 80$ were considered.
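The two blastx thresholds described above can be illustrated with a short sketch. This is not the authors' pipeline: it assumes that hits were exported in BLAST's standard 12-column tabular format and that "score" means the bit score, neither of which the text states. The function names `parse_hits` and `passes_thresholds` and all sample rows are hypothetical.

```python
import csv
import io

# Thresholds stated in the text: e-value <= 1e-5 and score >= 80.
MAX_EVALUE = 1e-5
MIN_SCORE = 80.0

# Column order of BLAST's standard 12-column tabular output.
# Assumption: the paper does not say how the hits were stored.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(text):
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def passes_thresholds(hit):
    """Keep a hit only if it satisfies both cut-offs."""
    return hit["evalue"] <= MAX_EVALUE and hit["bitscore"] >= MIN_SCORE


# Hypothetical hits, invented for illustration only.
SAMPLE = (
    "EST_A\tprot_1\t62.5\t180\t60\t2\t10\t549\t35\t214\t3e-40\t160.0\n"
    "EST_B\tprot_2\t41.0\t90\t50\t3\t5\t274\t100\t189\t2e-06\t52.4\n"
    "EST_C\tprot_3\t38.0\t75\t45\t1\t1\t225\t12\t86\t0.004\t39.7\n"
)

kept = [hit["qseqid"] for hit in parse_hits(SAMPLE) if passes_thresholds(hit)]
print(kept)
```

In this made-up sample only EST_A is retained: EST_B has an acceptable e-value but a score below 80, and EST_C fails the e-value cut-off.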
This procedure for identification and classification of TE yielded 124 EST in 16 libraries of the FORESTS database, and these EST were grouped into 76 clusters (Table 2). Most of these clusters (57) were singletons, which may correspond to rarely expressed genes, although it is likely that some of these singletons represent different regions of a single gene, considering the large size of TE (up to 9 kb; Capy et al., 1996) compared to the size of the EST generated in the FORESTS project (~800 bp). The remaining 19 clusters were composed of 2 to 9 EST each (Table 2), and 16 of these clusters were composed of EST from more than one library, which indicates that some of the identified TE are expressed in more than one plant tissue.
Most of the identified TE were in Class I (retroelements) and belonged to the FAR1 (29.8%) or copia (22.6%) groups, whereas other retroelements, such as LINE and gypsy, were poorly represented (4.0 and 2.4%, respectively). Within Class II, the MuDR (16.9%) and hAT (12.1%) groups prevailed, followed by the CACTA (4.8%), non-classified LTR (4.0%) and IS (3.2%) groups (Table 2). These results are in agreement with the high amount of expressed MuDR found in sugarcane (Rossi et al., 2001). However, to our knowledge, the presence of TE in the IS group has never been reported in plants. Elements of this group are quite common in bacteria, where they act as enhancers (e.g. Schnetz and Rak, 1995). It remains to be seen whether elements in the IS group, which were found in libraries from roots and leaves, have a regulatory function in Eucalyptus.

[Table 2. Total row: 2, 13, 1, 12, 12, 1, 22, 3, 4, 8, 5, 8, 4, 9, 1, 1, 9, 1, 2, 4; column headers not available. *Clusters are divided in groups of TE. Cluster numbers are correlated to FORESTS codes as described in http://omega.rc.unesp.br/transposable/tabela.php. +: non-classified LTR.]
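As a consistency check on the group percentages reported above, they can be converted back into EST counts out of the 124 TE-related EST. The counts below are not taken from Table 2; they are inferred here by rounding, so they should be read as an assumption rather than as reported data.

```python
TOTAL_TE_EST = 124  # total TE-related EST reported in the text

# Percentages as reported in the text, per TE group.
PERCENT = {
    "FAR1": 29.8,
    "copia": 22.6,
    "MuDR": 16.9,
    "hAT": 12.1,
    "CACTA": 4.8,
    "LINE": 4.0,
    "non-classified LTR": 4.0,
    "IS": 3.2,
    "gypsy": 2.4,
}

# Inferred counts (assumption): nearest whole number of EST per group.
counts = {group: round(p * TOTAL_TE_EST / 100) for group, p in PERCENT.items()}

# Each inferred count should reproduce its percentage to one decimal place.
recomputed = {group: round(100 * n / TOTAL_TE_EST, 1) for group, n in counts.items()}

for group, n in counts.items():
    print(f"{group}: {n} EST ({recomputed[group]}%)")
print("sum of inferred counts:", sum(counts.values()))
```

The inferred counts add up to 124, and the reported percentages add up to 99.8%, which is what rounding to one decimal place would be expected to produce.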
To better compare the relative amounts of distinct groups of TE in Eucalyptus, we calculated the TE frequency $F$ by the equation $F = (n / N) \times 10^{3}$, where $n$ is the number of EST in a given group of TE in a given library (values in Table 2) and $N$ is the number of sequenced EST in this library (i.e. each of the 19 values in Table 1). The frequency FG was calculated for groups of TE according to the equation $FG = \sum F_G$, where $F_G$ represents each of the $F$ values within a given group of TE. FG values indicated FAR1 as the most frequent group, followed by copia, MuDR, hAT and CACTA groups, non-classified LTR and gypsy, LINE and IS groups (Figure 1).
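The frequency calculation described above can be sketched as follows. The library names, library sizes and EST counts are hypothetical placeholders, not values from Table 1 or Table 2, and the function names are chosen here for illustration.

```python
def te_frequency(n, total):
    """F = (n / N) * 10^3: TE-related EST per 1,000 sequenced EST."""
    if total <= 0:
        raise ValueError("library size N must be positive")
    return 1000 * n / total


# Hypothetical library sizes (N), standing in for Table 1.
LIBRARY_SIZES = {"LIB_A": 5000, "LIB_B": 8000}

# Hypothetical EST counts (n) per TE group and library, standing in for Table 2.
EST_COUNTS = {
    "copia": {"LIB_A": 2, "LIB_B": 4},
    "MuDR": {"LIB_A": 1},
}


def group_frequency(group):
    """FG: sum of the F values of one TE group over all libraries."""
    return sum(
        te_frequency(n, LIBRARY_SIZES[library])
        for library, n in EST_COUNTS[group].items()
    )


fg = {group: group_frequency(group) for group in EST_COUNTS}
for group, value in sorted(fg.items(), key=lambda item: item[1], reverse=True):
    print(f"{group}: FG = {value:.2f}")
```

Because each F value is normalized by the size of its own library before summing, a group found in a small library contributes more to FG than the same number of EST found in a large one.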
To compare the relative amounts of TE in different libraries, we calculated FL using $FL = \sum F_L$, where $F_L$ represents each of the $F$ values within a given library. Our results are shown in Figure 2, which also shows the relative contribution of each group of TE to the FL values. These values vary from zero to 1.96, with a mean of 0.92 (calculated for the FL values represented in Figure 2 plus the zero FL values of the LV1, SL6 and ST7 libraries). This finding indicates that FORESTS libraries contained, on average, close to 1 transposable element per 1,000 EST, suggesting that a comparable expression rate may occur in Eucalyptus. This average value is very low, considering that TE usually represent 50-90% of plant genomes (Flavel, 1986; SanMiguel et al., 1996). It indicates that only a small fraction of the TE that can be expressed in Eucalyptus is efficiently transcribed. Inhibition of the expression of TE has been called silencing (Okamoto and Hirochika, 2001) and has been found to be widespread in many organisms such as maize (Fedoroff and Chandler, 1994; Rudenko et al., 2003), Arabidopsis (Hirochika et al., 2000; Steimer et al., 2000) and Drosophila melanogaster (Jensen et al., 1999; Malinsky et al., 2000).
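The per-library sum and its mean can be sketched in the same way. The F values below are hypothetical; only the two totals used for the pooled rate (124 TE-related EST among 123,889 EST) come from the text. The sketch shows why libraries without TE have to enter the mean as zeros.

```python
from statistics import mean

# Hypothetical F values per library and TE group (not data from the paper).
# LIB_C stands for a library in which no TE-related EST was found.
F_VALUES = {
    "LIB_A": {"copia": 0.4, "MuDR": 0.2},
    "LIB_B": {"copia": 0.5},
    "LIB_C": {},
}

# FL: sum of the F values of all TE groups within one library.
fl = {library: sum(groups.values()) for library, groups in F_VALUES.items()}

# Mean over all libraries, zeros included, as done in the text.
mean_fl = mean(fl.values())

# Mean over TE-containing libraries only, shown for contrast.
mean_nonzero = mean(value for value in fl.values() if value > 0)

# Pooled rate from the totals given in the text.
pooled = 1000 * 124 / 123889

print(f"mean FL, all libraries: {mean_fl:.2f}")
print(f"mean FL, TE-containing libraries only: {mean_nonzero:.2f}")
print(f"pooled rate per 1,000 EST: {pooled:.2f}")
```

Leaving out the empty libraries would overstate the average. The pooled rate from the reported totals is about 1.0 per 1,000 EST, close to the reported mean FL of 0.92; the two need not be equal, because the mean weights every library equally regardless of its size.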
It is unclear whether the calculated frequencies are related to the expression levels of TE; however, the FL values that we found indicate a highly variable expression pattern in distinct Eucalyptus tissues, or even within a single tissue subjected to distinct physiological conditions. For instance, the CL1 and CL2 libraries, which were obtained from E. grandis calli in the dark or in the presence of light, respectively, differed dramatically in FL. CL1 had a high FL value and contained four different groups of TE (mostly MuDR and hAT, with small amounts of FAR1 and CACTA), whereas CL2 had a low FL value and contained TE only in the copia group.
In addition, libraries from seedlings (SL1, SL4, SL5, SL7 and SL8) contained almost all groups of Eucalyptus TE identified in our investigation, except members of the LINE and IS groups. This pattern contrasts strongly with that found in the BK1 library, which was obtained from eight-year-old trees and contained only FAR1 representatives. Similarly, we found a greater variety and frequency of groups of TE in the RT3 library (which was made from seedlings and contained almost all groups of TE identified in Eucalyptus, except CACTA and gypsy) than in the RT6 library (which was made from trees and contained only small amounts of copia and MuDR representatives). Finally, libraries from seedling stalk (ST2 and ST6) also had a great variety of groups of TE, whereas no transposable element was detected in any library from adult tree stalk (ST7). Taken together, these findings suggest that developing plants express a much wider spectrum of TE than adult plants.

Figure 1. Values in Table 2 were converted to frequency values within each of the groups of TE as described in the text. nc: non-classified LTR.

Figure 2. Values in Table 2 were converted to frequency values within each of the libraries as described in the text. Libraries LV1, SL6 and ST7 did not contain TE and therefore are not represented. The individual contribution of each group of TE to the FL values is represented by different colors.
Our current mining efforts have identified the clusters, and consequently the FORESTS clones, that contain TE genes which may be differentially expressed in distinct tissues or physiological conditions in Eucalyptus. Starting from this preliminary information, further studies on the expression of these genes can be carried out to identify the most relevant TE involved in plant development, which is important for wood production in Eucalyptus plantations.