Evolutionary history of the Tip100 transposon in the genus Ipomoea

Tip100 is an Ac-like transposable element that belongs to the hAT superfamily. First discovered in Ipomoea purpurea (common morning glory), it was classified as an autonomous element capable of movement within the genome. As Tip100 data were already available in databases, the sequences of related elements in ten additional species of Ipomoea and five commercial varieties were isolated and analyzed. Evolutionary analysis based on sequence diversity in nuclear ribosomal internal transcribed spacers (ITS) was also applied to compare the evolution of these elements with that of Tip100 in the genus Ipomoea. Tip100 sequences were found in I. purpurea, I. nil, I. indica and I. alba, all of which showed high levels of similarity. The results of phylogenetic analysis of transposon sequences were congruent with the phylogenetic topology obtained for ITS sequences, thereby demonstrating that Tip100 is restricted to a particular group of species within Ipomoea. We hypothesize that Tip100 was probably acquired from a common ancestor and has been transmitted vertically within this genus.


Introduction
Transposable elements (TEs), also referred to as "jumping genes" because of their ability to move within the genome, are important sources of genetic variability that have contributed to genome evolution (Biémont and Vieira, 2006; Slotkin and Martienssen, 2007; Naito et al., 2009; Blumenstiel, 2010). TEs are extremely variable in sequence, molecular organization and replication mechanism, and these characteristics have been used to classify them in a hierarchical manner (Wicker et al., 2007). Some transposable elements can also be domesticated by their host genomes, thereby contributing to important processes in the organism (Knon et al., 2009).
The transposable element Tip100 was initially identified in Ipomoea purpurea by Habu et al. (1998). It is a class II transposable element that moves through a DNA intermediate, and is classified in the order TIR and the superfamily hAT (Wicker et al., 2007). It possesses 11 bp-long terminal inverted repeats (TIRs), produces 8 bp target site duplications (TSDs) as co-products of mobilization, and has a conserved hATC (hAT family dimerization) protein domain in the transposase, all of which are characteristic features of the hAT superfamily (Kempken and Windhofer, 2001; Rubin et al., 2001; Arensburger et al., 2011).
Tip100 is an autonomous, freely moving element in I. purpurea (Ishikawa et al., 2002), and the color variegation patterns observed in the flowers of some strains have been attributed to it. Habu et al. (1998) demonstrated that this TE is inserted into either the 5' regulatory region or the intron of the Chalcone Synthase D gene (CHS-D). Its presence in this gene, which encodes the enzyme responsible for the first step of anthocyanin production, can modify flower color. Recurrent somatic excision of Tip100 from the CHS-D gene can generate the variegated patterns observed in some I. purpurea plants. Likewise, many other transposable elements are capable of affecting the genes of the anthocyanin pathway.
The genus Ipomoea is a member of the Convolvulaceae, one of the large families of Solanales. It includes numerous species that are mainly distributed in the Americas (Austin and Huáman, 1996; Austin and Bianchini, 1998). Some are simply weeds, whereas others are economically important, viz., sweet potatoes and ornamental plants, such as the morning glories I. purpurea and I. nil. Plants of this genus are appropriate biological models for research, since their exceptional diversity in morphology and habitat use makes them experimentally versatile (Stefanović et al., 2003; Clegg and Durbin, 2003).
In plants, nuclear ribosomal internal transcribed spacers (ITS) comprise one of the most useful sequences for phylogenetic studies at the species level (Feliner and Rosselló, 2007). Results from previous studies on Ipomoea using ITS sequences (Miller et al., 1999, 2004; Manos et al., 2001) are congruent with those based on morphological characteristics (McDonald and Mabry, 1992; Austin and Huáman, 1996; Austin and Bianchini, 1998).
In the present study, Tip100-related elements in ten Ipomoea species and five commercial cultivars were investigated. These species are representative of the three subgenera of Ipomoea, namely Eriospermum, Ipomoea and Quamoclit (Austin, 1975; Austin and Huáman, 1996; Austin and Bianchini, 1998). Our aim was to shed light on how the Tip100 transposable element is distributed among Ipomoea species, and how it may have evolved, by comparing the phylogenetic relationships of TE sequences with the host-species phylogeny.

DNA extraction
DNA was extracted from 0.1 g of leaf tissue from germinated plants, according to the protocol described by Oliveira et al. (2009). The species examined in this study and their origins are shown in Table 1.

PCR cloning and sequencing of Tip100 sequences
Primers, designed with Oligo 4.1 software (Rychlik, 1992), were based on the Tip100 sequence from I. purpurea (Habu et al., 1998). The forward primer (5'-CGTTCTCCTTTTGTTGGTGT-3') anneals in the putative regulatory region of the element at positions 621-640, and the reverse primer (5'-GCTTCTCAATGGGGCACTTC-3') anneals in the first region of the transposase ORF at positions 1526-1545. A non-coding sequence region was chosen, as this part is expected to be more variable and therefore phylogenetically more informative. PCR assays were performed in 10 µL volumes with 20 ng of genomic DNA, 0.2 U Taq DNA polymerase (Invitrogen), 1X Reaction Buffer, 1.5 mM MgCl2 and 200 pmol of each primer. The following thermocycler program was used: 94°C for 5 min; 30 cycles of 94°C for 45 s, 55°C for 30 s and 72°C for 60 s; and a final extension at 72°C for 7 min. The amplified fragments were cloned using the TA Cloning Kit pCR 2.1 Vector (Invitrogen). Plasmid DNA was isolated by miniprep alkaline lysis (Sambrook and Russell, 2001), and then precipitated with 13% PEG and 1.6 M NaCl. Thirty-five plasmids from all the species and varieties were selected for direct sequencing of both strands in a MegaBACE 500 automatic sequencer. The dideoxy chain-termination reaction was carried out with the DYEnamic ET kit (GE Healthcare). To obtain the sequence of each clone, reads were assembled using Gap4 software from the Staden Package (Staden, 1996), with assembly continuing until a confidence value higher than 30 was obtained. The Tip100 sequence described by Habu et al. (1998) (GenBank AB004906) was also included in the analysis. All the new sequences obtained in this study were deposited in GenBank (Accession No: HM014415-HM014422).
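As a quick consistency check on the primer design described above, the following minimal Python sketch computes the expected amplicon length from the stated primer coordinates and the reverse complement of the reverse primer. It assumes that both coordinate pairs are 1-based, inclusive positions on the same strand of the Tip100 reference sequence; the function names are illustrative and not part of any published tool.

```python
# Primer sequences and coordinates as stated in the text.
# Assumption: coordinates are 1-based, inclusive, on the reference strand.
FORWARD_PRIMER = "CGTTCTCCTTTTGTTGGTGT"
REVERSE_PRIMER = "GCTTCTCAATGGGGCACTTC"
FORWARD_START, FORWARD_END = 621, 640
REVERSE_START, REVERSE_END = 1526, 1545

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(forward_start, reverse_end):
    """Length of the fragment spanning both primers (inclusive coordinates)."""
    return reverse_end - forward_start + 1


# The fragment spans positions 621 to 1545, i.e. 925 bp.
print(amplicon_length(FORWARD_START, REVERSE_END))
# Sequence expected on the reference strand at the reverse primer site.
print(reverse_complement(REVERSE_PRIMER))
```

Under these assumptions the amplified fragment is 925 bp long, which agrees with a PCR product of roughly 900 bp.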

Analysis of transposon sequences
The identity of the cloned sequences was determined by BLAST searches (Altschul et al., 1990) in the NCBI and RepBase databases. Nucleotide sequences were aligned using Clustal W (Thompson et al., 1994) with default parameters. Cons software (Rice et al., 2000) was used to obtain consensus sequences of clones that diverged by less than 8.5% and belonged to the same species or variety. Mega 4 software (Tamura et al., 2007) was used to calculate sequence divergence under the Tamura 3-parameter model.
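To make the divergence measure concrete, the following Python sketch computes a Tamura 3-parameter distance for one pair of aligned sequences. It is an illustration only, not the Mega 4 implementation: it assumes pairwise deletion of gaps and ambiguous bases, and it assumes the pairwise form h = θ1 + θ2 - 2·θ1·θ2, where θ1 and θ2 are the G+C fractions of the two sequences. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -h·ln(1 - P/h - Q) - 0.5·(1 - h)·ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}


def tamura3_distance(seq1, seq2):
    """Tamura 3-parameter distance between two aligned DNA sequences.

    Sketch only: sites with gaps or ambiguous bases are skipped
    (pairwise deletion), which is an assumption of this example.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")

    # A transition stays within purines or within pyrimidines.
    transitions = sum(1 for a, b in pairs
                      if a != b and (a in PURINES) == (b in PURINES))
    transversions = sum(1 for a, b in pairs
                        if a != b and (a in PURINES) != (b in PURINES))
    p = transitions / n
    q = transversions / n

    gc1 = sum(1 for a, _ in pairs if a in "GC") / n
    gc2 = sum(1 for _, b in pairs if b in "GC") / n
    h = gc1 + gc2 - 2.0 * gc1 * gc2

    # With h == 0 and no transitions the first term vanishes.
    if h == 0.0:
        if p > 0.0:
            raise ValueError("distance undefined for these sequences")
        first = 0.0
    else:
        first = -h * math.log(1.0 - p / h - q)
    second = -0.5 * (1.0 - h) * math.log(1.0 - 2.0 * q)
    return first + second
```

For nearly identical sequences the corrected distance is close to the raw proportion of differing sites; the correction grows as sequences diverge.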

PCR and sequencing of internal transcribed spacers (ITS)
The primers used to amplify ITS sequences, viz., ITS92 (5'-AAGGTTTCCGTAGGTGAAC-3') and ITS75 (5'-TATGCTTAAACTCAGCGGG-3'), were described previously by Baldwin (1992). The amplified region corresponded to the two internal spacers (ITS1 and ITS2), as well as the complete 5.8S ribosomal gene region between them. PCR conditions were similar to those used for the Tip100 PCR runs, except for the temperature cycles, which were as follows: 94°C for 5 min; 35 cycles of 94°C for 40 s, 55°C for 30 s and 72°C for 80 s; and a final extension at 72°C for 7 min. The resultant PCR fragments were purified with 13% PEG and 1.6 M NaCl, and directly sequenced in a MegaBACE 500 automatic sequencer. The dideoxy chain-termination reaction was carried out with a DYEnamic ET kit (GE Healthcare). An ITS sequence for Merremia tuberosa (AF110909), obtained from GenBank, was used as the outgroup during analysis. The newly obtained sequences were deposited in GenBank (Accession No: HM14423-HM14437).

Analysis of ITS sequences
ITS sequences were processed in the same way as Tip100 sequences, except that sequence distances were calculated using the Tamura-Nei model in Mega 4 (Tamura et al., 2007), and Bayesian analysis was performed with a GTR+G model.

The Tip100 transposon
The molecular investigation of Tip100 homologous sequences in ten different Ipomoea species and five Ipomoea commercial varieties led to identification of the transposon in four species and four varieties, through positive PCR amplification of the expected 900 bp fragment (Table 1).
Sequence analysis showed the different cloned elements to be very similar, with levels of divergence varying from 0.0% to 2.8% (Table 2). The only exception was the Tip100 sequence in I. alba, which was more divergent (14.9%) from that in other species. The second highest divergence was 2.8%, between I. nil 'Candy Pink' and the Tip100 sequence described by Habu et al. (1998) for I. purpurea (Tip100-AB004906). The lowest levels of divergence were found between I. nil and I. nil 'Candy Pink' (0.1%), among I. purpurea 'Kniolas Black Knight', I. purpurea and I. purpurea 'Split Personality' (0.1%), and between I. purpurea and I. purpurea 'Split Personality' (0.0%).
The complete Tip100 transposase CDS contains 2,426 bp that encode 808 amino acids. The region analyzed in this study covers the first 268 bp of the 5' end of the transposase CDS, corresponding to 73 amino acids. In this region, amino acid sequences are well conserved among the different Ipomoea. Although some nucleotide changes were found, amino acid sequences and physicochemical properties remained conserved in the analyzed region. The only exception was a nucleotide loss at position 30 of the transposase ORF in the I. nil and I. nil 'Candy Pink' sequences, which caused an amino acid deletion (Figure S1, Supplementary Material).
Bayesian analysis indicated three clusters in an unrooted tree. As expected, the most divergent clade was formed by I. alba Tip100 (Figure 1). The second clade included the two transposons in I. nil and I. nil 'Candy Pink'. Posterior probability (1.00) conferred strong support for this clade. The third clade, also well supported (0.98), was formed by the Tip100 sequences in I. indica and I. purpurea, Tip100-AB004906 and I. purpurea commercial varieties.

ITS variability
PCR amplification of ITS sequences was uniform and positive for all the species and varieties tested (Table 1). The obtained PCR fragments matched the expected fragment size of 550 bp for the ITS1 and ITS2 spacers, and the 5.8S sequence.
Comparison among sequences indicated the largest divergence to be between Merremia tuberosa and I. quamoclit (36.2%). No sequence difference was observed between I. purpurea and the I. purpurea varieties (I. purpurea 'Kniolas Black Knight', 'Light Blue Star' and 'Split Personality') (Table 3).
ITS sequences appeared to be good markers for reconstructing the phylogenetic history of Ipomoea, since all the clades received highly satisfactory statistical support (Figure 2). Miller et al. (2004) proposed that Ipomoea is formed by two principal clades. Our results are in partial agreement with this proposal. The other species that were studied here, but were not included in the phylogenetic analysis done by Miller et al. (2004), are I. batatas, I. triloba and I. carnea. These three species appear to be basal to Clades I and II. The basal clade of I. batatas and I. triloba is strongly supported.

Discussion
Numerous transposable elements are known to be involved in the process of variegation in Ipomoea. By leading to wide diversification in flower pigmentation, they also represent an important evolutionary process. One of these elements is Tip100, which is inserted in the CHS-D gene. After extensive searches in the NCBI database with Blastn, Blastx and tBlastx, no similarities between the Tip100 sequence and other transposable elements came to light. Habu et al. (1998) classified Tip100 as a member of the Ac/Ds family (Kunze et al., 1997). However, according to more recent criteria for TE classification (e.g., Wicker et al., 2007), "two elements belong to the same family if they share at least 80% of sequence identity in their coding domain, or within their terminal repeat regions, or in both". Hence, Tip100 would not belong to the Ac/Ds family, since no close similarity was found between Tip100 and the other transposons of this family. Nevertheless, Tip100 sequences and structural characteristics clearly place this element in the hAT superfamily (Kempken and Windhofer, 2001; Rubin et al., 2001). Therefore, we propose that Tip100 belongs to a new TE family, which, to date, has only been observed in the genus Ipomoea.
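The family criterion quoted above can be expressed as a simple check. The Python sketch below computes percent identity over two pre-aligned sequences and applies the 80% threshold to the coding domain and to the terminal repeat regions. The function names are hypothetical, gap columns are skipped by assumption, and any further conditions of the published classification scheme beyond the quoted sentence are not modeled.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns, skipping gap columns.

    Assumes the two sequences are already aligned to equal length;
    '-' marks a gap. Skipping gaps is a choice made for this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def same_family(coding_identity=None, terminal_identity=None, threshold=80.0):
    """True if either available identity value meets the threshold.

    Mirrors the quoted rule: at least 80% identity in the coding
    domain, or in the terminal repeat regions, or in both.
    """
    values = [v for v in (coding_identity, terminal_identity) if v is not None]
    return any(v >= threshold for v in values)
```

Under this rule, an element that shows no close similarity to Ac/Ds members in either region falls below the threshold in both and is assigned to a different family.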
Recently, Arensburger et al. (2011) undertook a rigorous phylogenetic analysis of the hAT superfamily. They discovered that this superfamily is formed by two large families, namely Buster and Ac, and even indicated the existence of a third clade, possibly a new family, which currently contains only three members, viz., Tip100 of Ipomoea and two Tip100-related sequences, one from a hydra (H. magnipapillata) and the other from zebrafish (Danio rerio). These findings suggest that this possibly new family may be widely distributed.
The species included in this study are representatives of three Ipomoea subgenera. The well-supported phylogenetic relationships established by ITS analysis are congruent with other studies based on morphological and molecular data (McDonald and Mabry, 1992; Austin and Huáman, 1996; Austin and Bianchini, 1998; Miller et al., 1999, 2002, 2004; Stefanović et al., 2003), in which I. batatas, I. triloba and I. carnea were identified as members of the subgenus Eriospermum, I. purpurea, I. nil and I. indica as members of the subgenus Ipomoea, and I. cairica, I. coccinea, I. quamoclit and I. alba as part of the subgenus Quamoclit. In the present analysis, we found that I. alba is more closely related to species of the subgenus Ipomoea than to Quamoclit, as previously proposed by Miller et al. (1999, 2004). Furthermore, I. cairica, formally in the subgenus Ipomoea, appears as outgroup to the clade formed by the subgenera Quamoclit and Ipomoea, although the statistical support for this branch (0.82) is lower than for the other branches.
There is significant consistency between the phylogeny built from Tip100 sequences and the one constructed with ITS data, the latter including more species and representing the host phylogeny. As Tip100 was found only in a restricted clade in the ITS phylogeny (Figure 2, Clade II), we propose that this element was present in an ancestor of these related species, thereby implying that Tip100 was vertically transferred during evolution of the genus. Thus, it was more effectively maintained in the subgenus Ipomoea, where it apparently remains more conserved. Although Tip100 was more divergent in I. alba, this is consistent with its basal position in relation to the subgenus Ipomoea, possibly because it has had more time to diverge from the other species of this subgenus. Nevertheless, why Tip100 is restricted to only one cluster in Ipomoea is unknown. A possible explanation for the emergence of this element in these species could be horizontal transfer of Tip100 from an unknown donor to an ancestor of I. alba, I. indica and I. nil, and the I. purpurea cluster (Clade II, Figure 2). Horizontal transfer of TEs has been recognized as an important evolutionary force in eukaryotes (Keeling and Palmer, 2008), although few examples have been encountered in plants (Diao et al., 2006; Roulin et al., 2009). As an alternative explanation, the element may have been present in all the other clusters of the genus Ipomoea and then been stochastically lost.
A more plausible explanation for this peculiar TE occurrence could be the presence of Tip100 sequences in other species of the genus that have diverged throughout the evolution and expansion of these plants, since the evolutionary history of the genus Ipomoea is relatively recent, i.e., approximately 35 to 40 million years, as calculated by molecular clock inference (Clegg and Durbin, 2003). Hence, additional studies are required to determine whether TE arrival in this genus was through horizontal transfer, or whether it is an ancient genome component.
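The argument that Tip100 is confined to a single clade of the host phylogeny can be illustrated with a small monophyly check. In the Python sketch below, a rooted tree is written as nested tuples, and the function tests whether a set of taxa corresponds exactly to the leaves of one subtree. The topology shown is a hypothetical toy arrangement loosely based on the relationships described in the text; it is not a reproduction of Figure 2, and the function names are illustrative.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return {tree}
    names = set()
    for child in tree:
        names |= leaves(child)
    return names


def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the leaf set of some subtree of `tree`."""
    taxa = set(taxa)
    if isinstance(tree, str):
        return taxa == {tree}
    if leaves(tree) == taxa:
        return True
    return any(is_monophyletic(child, taxa) for child in tree)


# Hypothetical toy host tree (an assumption for illustration only).
TOY_HOST_TREE = (
    (("I. batatas", "I. triloba"), "I. carnea"),
    ("I. cairica",
     (("I. coccinea", "I. quamoclit"),
      ("I. alba", ("I. nil", ("I. indica", "I. purpurea"))))),
)

# Species in which Tip100 sequences were found, as stated in the text.
TIP100_SPECIES = {"I. alba", "I. nil", "I. indica", "I. purpurea"}

print(is_monophyletic(TOY_HOST_TREE, TIP100_SPECIES))
```

When the TE-bearing species form a single clade, one acquisition in the common ancestor of that clade followed by vertical transmission is the simplest explanation; a scattered distribution would instead require several losses or transfers.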