Regulatory elements involved in the post-transcriptional control of stage-specific gene expression in Trypanosoma cruzi: a review

Trypanosoma cruzi, a protozoan parasite that causes Chagas disease, exhibits unique mechanisms for gene expression such as constitutive polycistronic transcription of protein-coding genes, RNA editing and trans-splicing. In the absence of mechanisms controlling transcription initiation, organised subsets of T. cruzi genes must be post-transcriptionally co-regulated in response to extracellular signals. The mechanisms that regulate stage-specific gene expression in this parasite have become much clearer through the sequencing of its whole genome and through various proteomic and microarray analyses, which have demonstrated that at least half of the T. cruzi genes are differentially regulated during its life cycle. In this review, we attempt to highlight the recent advances in characterising cis and trans-acting elements in the T. cruzi genome that are involved in its post-transcriptional regulatory machinery.

Trypanosoma cruzi is an intracellular parasite that is transmitted by at least 40 different blood-sucking triatomine species to over 1,000 mammalian species (Brener et al. 2000). This parasite must adapt to enormous changes in its extracellular milieu, such as changes in environmental temperature as well as in the nutrients available inside each host. The parasite has a complex life cycle that is characterised by four stages: epimastigotes and metacyclic trypomastigotes are present in the insect vector, whereas intracellular amastigotes and bloodstream trypomastigotes are present in the mammalian host. Thus, this species must develop a broad set of molecular tools that allow it to multiply in the insect gut, to invade and multiply inside a large number of distinct mammalian cell types and to circumvent host immune defence systems. To achieve such phenotypic plasticity, T. cruzi relies on unique mechanisms that control the expression of its repertoire of about 12,000 genes. Because its genome is constitutively transcribed into long polycistronic primary transcripts, mRNAs for protein-coding genes must be processed through trans-splicing and polyadenylation reactions. The mRNAs must also interact with different protein factors in a complex post-transcriptional regulatory machinery that determines the levels of their protein products according to the cellular demands of the parasite in each stage of its life cycle.
In the same issue that the T. cruzi genome was published (El-Sayed et al. 2005a), a study describing its proteome was also reported (Atwood et al. 2005). Proteins extracted from whole-cell and subcellular lysates of the four stages of T. cruzi were analysed by mass spectrometry (epimastigotes, metacyclic trypomastigotes, amastigotes and trypomastigotes), which identified 2,784 proteins belonging to the 1,168 protein groups in the annotated T. cruzi genome. Although about 30% of the identified proteins were found at all life-cycle stages, at least 248 proteins were only expressed at one stage, thereby demonstrating significant changes in the relative abundance of T. cruzi proteins throughout its life cycle. One of the main findings in these proteomic analyses was that the four parasite stages use distinct energy sources (Atwood et al. 2005); intracellular amastigotes upregulated proteins involved in lipid-dependent energy metabolism, such as enzymes of the citric acid cycle, whereas enzymes capable of catalysing the conversion of histidine to glutamate were more abundant in the insect epimastigote stage. Heat-shock proteins (HSP) and proteins involved in vesicular trafficking were also preferentially detected in amastigotes. Furthermore, enzymes involved in antioxidant defence were upregulated during the transformation of epimastigotes into invasive metacyclic trypomastigotes, whereas bloodstream trypomastigotes upregulated the surface expression of several large gene families that are known to be involved in interacting with the mammalian host (Atwood et al. 2005).
In agreement with these proteomic data, global genomic analyses using microarray technology, partially confirmed by real-time polymerase chain reaction (RT-PCR), showed that transcript levels of at least 50% of the T. cruzi genes are significantly regulated during its life cycle (Minning et al. 2009). When these authors compared genes that were upregulated in only two stages, they found that 76% of the transcripts were upregulated similarly in stages that exist in the same host (vertebrate host: amastigotes and bloodstream trypomastigotes; insect host: epimastigotes and metacyclic trypomastigotes). Likewise, stages with similar biological functions shared a 20% overlap in gene upregulation. For instance, genes involved in DNA repair were upregulated in epimastigotes and amastigotes, which are the only dividing stages (Minning et al. 2009). Although these analyses demonstrate that the expression of most of the T. cruzi genome is under stringent control throughout its life cycle, relatively few studies have investigated the mechanisms that regulate stage-specific gene expression. Here, we review some of these studies and attempt to highlight the most recent advances in characterising cis and trans-acting elements in the genome that have been identified as part of the parasite post-transcriptional regulatory machinery.
T. cruzi genome and gene expression mechanisms -Sequencing of the T. cruzi CL Brener diploid genome, estimated to be 110.7 Mb, was completed in 2005 (El-Sayed et al. 2005a). Because it is a hybrid genome with a repetitive content of 50%, the CL Brener genome sequence has not been fully assembled and is instead represented by a redundant dataset distributed through several contigs, which is in contrast to the genomes of Trypanosoma brucei and Leishmania major that were published in the same issue of Science (Berriman et al. 2005, Ivens et al. 2005). Because homologous regions that display high levels of polymorphism were assembled separately, sequences from most of the genome are found in two contigs, each one corresponding to one haplotype (see www.TriTrypDB.org). More recently, Weatherly et al. (2009) generated consensus versions of each homologous chromosome pair using synteny maps for the T. brucei chromosomes. Ultimately, 41 chromosomes were assembled, which is well above the predicted number of T. cruzi chromosomes based on the early studies of pulsed-field gel electrophoresis analyses (Cano et al. 1995, Vargas et al. 2004). Although separated by hundreds of millions of years of evolution, genome organisation in T. cruzi is largely syntenic with that of the other two trypanosomatids (T. brucei and L. major), the three being collectively known as the Tri-Tryp genomes, and most of the species-specific genes, such as large surface protein gene families, occur at non-syntenic chromosome-internal and subtelomeric regions (El-Sayed et al. 2005b).
Similar to the two other Tri-Tryp genomes, the protein-coding genes are organised into long polycistronic transcription units. Transcription is initiated bi-directionally between two divergent gene clusters (Martínez-Calvillo et al. 2003) to produce polycistronic pre-mRNAs that are subsequently processed (Figure). With the exception of the spliced leader (SL) promoter, no promoter that is recognised by RNA polymerase II has been identified, and only a few transcription factors have been described (Cribb & Serra 2009, Cribb et al. 2010). Notably, even though orthologs for all of the conserved components of the RNA polymerase II complex have been identified in the Tri-Tryp genomes (Ivens et al. 2005), the transcription of two trypanosomatid genes, variant surface protein and procyclin genes in T. brucei, as well as several exogenous genes that are transfected into T. cruzi, is mediated by RNA polymerase I (Palenchar & Bellofatto 2006). Once the polycistronic pre-mRNA is produced, two coupled reactions generate mature monocistronic transcripts: trans-splicing and polyadenylation (Teixeira & DaRocha 2003). Every mature trypanosomatid mRNA carries an SL sequence, a capped sequence of the same 39 nucleotides at its 5' end, which is attached to the transcript in a process called trans-splicing (Liang et al. 2003). While no consensus sequence for polyadenylation or SL addition has been found in trypanosomatids, several studies have demonstrated that polypyrimidine-rich tracts within intergenic regions guide SL addition to the downstream gene and polyadenylation of the upstream gene, resulting in the generation of mature mRNAs (Hartmann et al. 1998, López-Estraño et al. 1998). The sequence requirements for T. cruzi, T. brucei and Leishmania mRNA processing were initially investigated by comparing expressed sequence tags and/or cDNAs with genomic sequences (Benz et al. 2005, Campos et al. 2008, Smith et al. 2008) and, more recently, they have been investigated using deep RNA sequencing (Kolev et al. 2010, Nilsson et al. 2010, Siegel et al. 2010). Campos et al. (2008) reported the average distance from the polypyrimidine tract to the SL addition site and to the polyadenylation site as well as the median lengths of the 5' and 3' untranslated regions (UTR) in T. cruzi genes. The median lengths of the 5' and 3' UTR sequences in T. cruzi were measured to be 35 and 264 nucleotides, respectively (Campos et al. 2008). These predictions became useful tools for optimising transfection vectors in trypanosomatids. Similar analyses of the composition of the UTRs from T. cruzi genes have revealed that simple sequence repeats can account for up to 20% of the nucleotide composition in the 5' UTR and up to 90% in the 3' UTR (Brandão & Jiang 2009). Kolev et al. (2010) used next-generation sequencing, or RNA-seq, to produce a single-nucleotide resolution genomic map of the T. brucei transcriptome, which revealed 1,114 new transcripts including 103 non-coding RNAs (ncRNAs). In addition, Nilsson et al. (2010) discovered more than 2,500 alternative splicing events in T. brucei, many of which appear to be under stage-specific regulation. Similar analyses of the T. cruzi transcriptome are currently underway and these studies will not only offer a unique opportunity to establish a complete catalogue of all parasite mRNAs with a precise determination of their relative levels and their 5' and 3' processing sites but may also reveal new and important RNA molecules, such as ncRNAs.
Figure: mRNA transcription and processing in trypanosomatids. Genes clustered in the genome are transcribed as polycistronic pre-mRNAs and processed by trans-splicing and polyadenylation reactions. Polypyrimidine tracts present in intergenic regions guide the insertion of a capped spliced leader (SL) sequence at the 5' end and the poly-A tail at the 3' end of transcripts, generating monocistronic mature mRNAs that accumulate at different levels in the cytoplasm.
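As an illustration of how polypyrimidine tracts in intergenic regions might be located computationally, the following minimal Python sketch scans a sequence for uninterrupted runs of pyrimidines. The function name, the minimum tract length of 10 nucleotides and the example sequence are assumptions made for this illustration only; they are not taken from the studies cited above, which relied on comparisons of cDNA or RNA-seq data with genomic sequences.

```python
import re


def find_polypyrimidine_tracts(seq, min_len=10):
    """Return (start, end, tract) for every uninterrupted run of C/T.

    min_len is a hypothetical threshold chosen for illustration.
    Coordinates are 0-based, with an exclusive end.
    """
    # Normalise case and treat RNA (U) and DNA (T) input alike.
    seq = seq.upper().replace("U", "T")
    pattern = re.compile(r"[CT]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


# Invented intergenic fragment: flanking sequence, a 14-nt tract, flanking sequence.
intergenic = "GATTACAGGA" + "TCTTTCCTTTCTCC" + "AGGACAAGATG"

for start, end, tract in find_polypyrimidine_tracts(intergenic):
    print(start, end, tract)
```

A real analysis would also need to tolerate occasional purines inside a tract and to relate each tract to mapped SL addition and polyadenylation sites; this sketch covers only the simplest case.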
One of the key characteristics revealed by sequencing the complete T. cruzi genome was the dramatic expansion of gene families that encode surface proteins. Heterogeneous arrays as large as 600 kb of genes that encode surface proteins were found to be clustered in the T. cruzi genome (El-Sayed et al. 2005b). In addition, long terminal repeat (LTR) and non-LTR retro-elements as well as other subtelomeric repeats account for the large proportion of repetitive sequences (50%) in this genome. The largest protein gene family in this genome encodes a group of surface proteins known as trans-sialidases (TS), with 1,430 members. TS are surface molecules that have been identified as virulence factors in T. cruzi and are responsible for transferring sialic acid from sialoglycoconjugates in the host to parasite mucins. Mucin-associated surface proteins (MASP) represent the second largest T. cruzi gene family, with a total of 1,377 members. Although MASP sequences account for ~6% of the parasite diploid genome, they were only identified during the annotation of the T. cruzi genome. MASPs are glycosylphosphatidylinositol (GPI)-anchored surface proteins that are preferentially expressed in trypomastigotes and are characterised by highly conserved N and C-terminal domains and strikingly variable and repetitive central regions (Bartholomeu et al. 2009). Together, the TS, MASP, mucin and GP63 gene families account for ~17% of the protein-coding genes. They are organised in clusters of tandem and interspersed repeats. Other large gene families include the previously described retrotransposon hot spot and dispersed gene family protein-1 families, which encode proteins with unknown functions and are found mostly at subtelomeric regions, similar to the TS genes. The T. cruzi genome also contains large gene families that encode glycosyltransferases, protein kinases, protein phosphatases, kinesins, amino acid transporters and helicases as well as several gene families that encode hypothetical proteins (El-Sayed et al. 2005b). In addition to contributing to the evasion of the host immune system, the existence of highly repetitive genes in the T. cruzi genome may also serve to increase gene expression levels in the absence of strong promoters.
Stage-specific gene expression and cis-acting regulatory elements in T. cruzi -As a consequence of polycistronic transcription and the lack of typical RNA polymerase II promoters, the expression of most trypanosomatid genes is regulated at the post-transcriptional level. Regulatory sequences that are present in the UTRs of different genes, mainly in the 3' UTRs, act as protein-binding sites, and these sequences are key elements in modulating individual mRNA levels during the parasite life cycle (De Gaudenzi et al. 2003). Before the availability of the whole genome, a combination of experimental approaches, such as parasite transfection with reporter genes and studies on a select group of T. cruzi genes, was used to identify various elements in the 3' UTRs that control mRNA abundance in response to changes in the parasite life cycle. That group includes members of the TS gene family (Weston et al. 1999, Jäger et al. 2008, Gentil et al. 2009), amastins (Coughlin et al. 2000), alpha and beta tubulins (Bartholomeu et al. 2002, da Silva et al. 2006), mucins (Di Noia et al. 2000) and HSP. Table provides a list of genes that have fully or partially characterised regulatory sequences. A more detailed description for some of these genes is given below.
TS are enzymes that catalyse the transfer of sialic acid from host cells to mucins located on the parasite surface membrane. Incorporation of sialic acid protects the parasite from the host immune system and facilitates the adhesion of Trypanosoma to mammalian host cells (Frasch 2000). The multigenic superfamily that encodes TS members can be classified into three groups, as previously described (Frasch 2000). TS expressed by epimastigotes, denoted eTS, have trans-sialidase activity but lack the SAPA repeat domain (Briones et al. 1995). TS expressed by trypomastigotes (tTS) have SAPA repeats and trans-sialidase activity, except for some members that possess a single mutation (Y to H) that encodes the enzyme without TS activity (Cremona et al. 1995). Using a luciferase reporter system, we have recently demonstrated that adding the 3' UTR from a tTS gene, which exhibits higher expression in trypomastigotes, resulted in increased luciferase activity in trypomastigotes without a corresponding increase in luciferase mRNA levels. These data indicate that this 3' UTR contains elements that mediate mRNA-specific translational control (Araújo et al. 2011).
FL-160 proteins belong to the tTS group and localise to the trypomastigote flagellum and flagellar pocket. Although transcription of FL-160 genes occurs constitutively, their transcripts are more abundant in trypomastigotes owing to greater mRNA stability at this stage, as has also been observed for other T. cruzi genes. Elements in the 3' UTR of FL-160 have been shown to regulate the differential expression of their mRNAs in transfection experiments (Weston et al. 1999). Analyses of 3' UTRs from other TS mRNAs showed that the 3' UTRs of tTS and eTS transcripts are highly conserved within each group but differ greatly between groups (Jäger et al. 2008). Similar to the results obtained by our group, the 3' UTRs of tTS mRNAs (but not eTS mRNAs) promoted high green fluorescent protein (GFP) expression in trypomastigotes and amastigotes when the parasites were transfected with a vector containing GFP cloned upstream of either an eTS or a tTS 3' UTR (Jäger et al. 2008). Although these studies suggest that the 3' UTRs of these TS genes contain elements that can control mRNA levels and translation in a stage-specific manner, the precise regulatory sequences and the exact mechanisms involved remain elusive. Avila et al. (2001) showed that polysomal mobilisation of mRNA is also important in controlling the expression of stage-specific genes. These authors analysed the differential expression of T. cruzi genes during metacyclogenesis and showed that although the mRNA of the metacyclogenin gene is expressed at similar levels in replicating and differentiating epimastigotes, there was a marked increase in mRNA associated with polysomes as well as an increase in protein levels in differentiating parasites.
Gp82, another member of the TS family (El-Sayed et al. 2005a), is recruited during host cell invasion to activate signalling cascades and efficiently trigger Ca2+ signals in metacyclic trypomastigotes and host cells (Yoshida et al. 2000). Gp82 expression has been shown to be greatly reduced in the non-virulent T. cruzi strain, CL-14, thereby supporting the hypothesis that this protein is required for cell invasion (Atayde et al. 2004). Quantitative RT-PCR and dot-blot hybridisation experiments have shown that metacyclic trypomastigotes upregulate gp82 mRNA (Songthamwat et al. 2007, Gentil et al. 2009). Whereas nuclear run-on assays showed that the transcriptional rates of gp82 genes were similar between epimastigotes and metacyclic trypomastigotes, northern blot assays showed that the half-life of gp82 mRNA is dramatically reduced in epimastigotes. In addition to mRNA stability, gp82 expression also seems to be augmented by increased translational efficiency, similar to the metacyclogenin mRNA, because gp82 mRNA was preferentially associated with polysomes in metacyclic trypomastigotes but not in epimastigotes (Gentil et al. 2009).
Recently, a novel protein family, T. cruzi trypomastigote, alanine, serine and valine rich proteins (TcTASV), that is encoded by 38 copies in the CL Brener genome was described by sequencing an epimastigote-subtracted trypomastigote cDNA library (García et al. 2010). Most of the clones that were identified in the subtracted cDNA library encoded transcripts that were preferentially expressed in the trypomastigote stage of T. cruzi. Notably, all TcTASV members as well as other trypomastigote-specific genes contained a conserved 280-bp element in their UTRs. However, it remains unclear whether this conserved element represents a regulatory signal.
Alpha and beta tubulins are the components of microtubules (Nogales et al. 1999), which are the sole cytoskeletal constituents in subpellicular microtubules, flagellar axoneme, the basal body and the mitotic spindle in trypanosomatids (Kohl & Gull 1998, Gull 2001). T. cruzi tubulin genes are organised as a cluster of alternating copies of alpha and beta tubulins (Maingon et al. 1988, Soares et al. 1989). Although transcription rates of tubulin genes are similar in epimastigotes and amastigotes, the alpha and beta tubulin transcript levels are three to six-fold higher in epimastigotes compared to trypomastigotes and amastigotes (Gonzalez-Pino et al. 1997, Bartholomeu et al. 2002) due to the increased mRNA half-life in epimastigotes (Bartholomeu et al. 2002). Alpha and beta tubulin mRNA levels and translation rates undergo a marked reduction during the differentiation of dividing epimastigotes into non-dividing metacyclic trypomastigotes (Rondinelli et al. 1986). Transient transfection assays using the luciferase gene cloned upstream of the alpha tubulin 3' UTR demonstrated that elements in this region are responsible for the increased stability of alpha tubulin mRNA in epimastigotes (da Silva et al. 2006); elements with similar functions may also be present in the 3' UTR of beta tubulin mRNA (Bartholomeu et al. 2002). Higher levels of unpolymerised tubulins were found in amastigotes and trypomastigotes, which suggests that there is an autoregulatory mechanism to induce tubulin mRNA destabilisation in these two stages of the parasite life cycle (da Silva et al. 2006). Furthermore, treatment of epimastigotes with vinblastine, a drug that prevents microtubule assembly, reduced alpha and beta tubulin mRNA levels, whereas the levels of GAPDH mRNA were unchanged (Urményi et al. 1992, da Silva et al. 2006).
In an attempt to identify regulatory elements involved in modulating the half-lives of alpha and beta tubulin transcripts in response to changes in microtubule dynamics, epimastigotes were transiently transfected with vectors containing the luciferase gene associated with different regions of these mRNAs. These analyses clearly indicated the presence of elements in the 3' UTR of alpha tubulin transcripts that are responsible for the increased stability of this mRNA in epimastigotes (Araújo et al. 2011). Moreover, these studies suggested the involvement of other sequences, possibly in the coding region, that acted as additional cis-acting elements to modulate tubulin mRNA stability in response to changes in microtubule dynamics (da Silva et al. 2006).
Amastins are surface glycoproteins that are most abundant in amastigotes. The T. cruzi genome contains 12 amastin members (El-Sayed et al. 2005a), eight of which are arranged in a tandem cluster in which they alternate with tuzin genes. Tuzin is a G-like protein whose mRNA level is constitutive during the T. cruzi life cycle (Teixeira et al. 1995). Although the transcription rate is similar between epimastigotes and amastigotes, amastin mRNA abundance is 50-fold higher in amastigotes compared to epimastigotes, which is a result of the longer amastin transcript half-life in amastigotes (Coughlin et al. 2000). Linker scanning mutagenesis identified a 203-bp element in the 3' UTR of amastin mRNA that conferred elevated stability to its transcript as well as to a luciferase reporter in the intracellular stage. Luciferase activity assays performed after transient transfection of amastigotes not only confirmed the presence of regulatory sequences in this 203-bp element, but also demonstrated that this element acts in an orientation- and position-dependent manner (Coughlin et al. 2000).
Mucins are surface glycoproteins that act as sialic acid receptors in reactions catalysed by TS (Buscaglia et al. 2006). They also act as highly effective stimulators of the innate immune response (Junqueira et al. 2010). Amastigotes and bloodstream trypomastigotes express a group of T. cruzi mucins that vary in size (80-200 kDa). Epimastigotes and metacyclic trypomastigotes express smaller mucins (35-50 kDa), named T. cruzi small mucin-like gene family (TcSMUG) (Frasch 2000), which are classified into two groups (S and L) based on their transcript size. These proteins have conserved N and C-terminal portions, but the central portion, as well as the 3' UTR of TcSMUG transcripts, is variable. Similar to all T. cruzi genes described so far, the transcription rates of TcSMUG genes are similar across the different parasitic stages, but mRNA stabilisation mechanisms result in increased TcSMUG mRNA levels in epimastigotes (Di Noia et al. 2000). AU-rich elements (ARE) are one of the most prominent cis-acting regulatory elements in the 3' UTRs of eukaryotic mRNAs (Gingerich et al. 2004) and they were found in the 3' UTRs of both S and L TcSMUG mRNAs. Di Noia et al. (2000) have shown that metacyclic trypomastigotes transfected with a vector containing the chloramphenicol acetyl transferase (CAT) reporter gene cloned upstream of an ARE-truncated version of the L TcSMUG 3' UTR exhibited higher CAT activity than parasites transfected with the full-length L TcSMUG 3' UTR. This element was found to act as a destabilising cis-acting factor specifically in metacyclic trypomastigotes because no difference was observed in CAT activity in epimastigotes transfected with either of these vectors (Di Noia et al. 2000). As discussed in the next section, TcUBP1, an RNA-binding protein (RBP), recognises the ARE site in TcSMUG mRNA and forms a large protein complex with TcUBP2 and the poly(A)-binding protein to control stage-specific expression by changing mRNA stability in T. cruzi (D'Orso & Frasch 2002).
Trypanosomatids are subjected to heat shock when they leave the insect vector and infect mammalian cells. The heat-shock response may be part of the differentiation process and may be a protection mechanism for parasite survival (McNicoll et al. 2006). HSPs are involved in protein translation, folding, unfolding, translocation and degradation. Analyses of the whole genomes of T. cruzi, T. brucei and Leishmania demonstrated the presence of various HSP families (Folgueira & Requena 2007). HSP70 binds to unfolded proteins and its activity is controlled by ATP hydrolysis. Heat shock has been shown to induce the expression of HSP70 and DnaJ proteins in T. cruzi (de Carvalho et al. 1990, Tibbetts et al. 1998). HSP70 mRNAs exhibited four-fold higher expression in epimastigotes after heat shock (Requena et al. 1992). This upregulation is associated with a twofold increase in the HSP70 mRNA half-life, from 60 min in epimastigotes at 29°C to 120 min in epimastigotes at 37°C. The temperature-dependent regulation of HSP70 mRNA stability is dependent on both UTRs, as shown by CAT assays with protein extracts from epimastigotes that were transiently transfected with the CAT gene flanked by the HSP70 5' and/or 3' UTRs. A U-rich region in the 3' UTR of HSP70 mRNA was also identified as one of the cis-acting elements involved in the stabilisation of RNA at 37°C. In vitro degradation assays also demonstrated that RNAs transcribed in vitro that contained the full-length 3' UTR or the U-rich region were degraded after incubation with epimastigote extracts that had previously been incubated at 37°C.
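To make the half-life figures above concrete, the short Python sketch below applies a simple first-order decay model to the two HSP70 mRNA half-lives quoted in the text (60 min at 29°C and 120 min at 37°C). The model itself, the constant-synthesis assumption and all function names are illustrative assumptions introduced here; they are not taken from the cited studies.

```python
import math


def decay_constant(half_life_min):
    """First-order decay constant k (per minute) for a given half-life."""
    return math.log(2) / half_life_min


def fraction_remaining(t_min, half_life_min):
    """Fraction of an initial mRNA pool left after t_min minutes,
    assuming first-order decay and no new synthesis."""
    return 0.5 ** (t_min / half_life_min)


def steady_state_fold_change(old_half_life, new_half_life):
    """Fold change in steady-state level if only the half-life changes.

    Assumes a constant synthesis rate, so that the steady-state level
    is proportional to the half-life.
    """
    return new_half_life / old_half_life


# Half-lives quoted in the text for HSP70 mRNA in epimastigotes.
for label, t_half in (("29 C", 60.0), ("37 C", 120.0)):
    print(label, "fraction left after 120 min:", fraction_remaining(120, t_half))

print("fold change from stability alone:", steady_state_fold_change(60.0, 120.0))
```

Under these assumptions, doubling the half-life by itself would account for a twofold rise in steady-state mRNA level, so a larger observed increase would require additional contributions that this toy model does not capture.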

RNA-binding proteins are essential trans-acting elements for controlling T. cruzi gene expression -The T. cruzi genome encodes a large variety of RBPs that likely play a major role in the post-transcriptional regulation of mRNA levels as well as other aspects of RNA metabolism (El-Sayed et al. 2005a). These RBPs may show diverse affinity and specificity for target RNAs and may mediate distinct protein-protein interactions through specific domains, thereby allowing them to be versatile regulators of gene expression. Despite the large number of examples illustrating the importance of the post-transcriptional control of mRNA levels and translational efficiency in T. cruzi and despite genomic data demonstrating the abundance of sequences that encode RBPs, only a few published studies have provided experimental evidence for the role of an RBP in controlling gene expression in T. cruzi (Kramer & Carrington 2011). Recent proteomic analyses of complexes that bind to polyadenylated mRNAs have identified several proteins that are known to be involved in mRNA metabolism, such as PABP1, CCR4, CAF1, NOT1, TIA-1 like protein, TcUBP1, TcPUF6 and TcDHH1. Protein complexes that contain these proteins were shown to interact with mRNAs derived from exponentially growing epimastigotes or epimastigotes under nutritional stress. During stress, alterations in these protein complexes could mediate mRNA storage or degradation to reduce translational rates. Among the limited number of RBPs involved in T. cruzi mRNA metabolism that have been characterised are poly(A)-binding protein (Batista et al. 1994), TcUBP1 (D'Orso & Frasch 2001), SLRBP (Xu et al. 2001), Pumilio proteins (PUF) (Dallagiovanna et al. 2005, Caro et al. 2006), zinc finger proteins (Espinosa et al. 2003, Mörking et al. 2004) and the RNA helicase TcDHH1 (Holetz et al. 2007).
Transcripts that are not translated or targeted for degradation are directed to "P-bodies" (processing bodies) and stress granules to compartmentalise the mRNA pool, which is also a mechanism for post-transcriptional control of gene expression (Kulkarni et al. 2010). TcDHH1 is a DEAD-box RNA helicase that can also localise to cytoplasmic stress granules (Holetz et al. 2007) and proteins that are involved in translation, metabolism, cytoskeleton assembly or heat-shock defence can associate directly or indirectly with TcDHH1. Immunoprecipitation with an antibody against TcDHH1 combined with microarray hybridisation demonstrated that TcDHH1-associated transcripts in epimastigotes are mainly expressed in other stages of the parasite life cycle, such as MASP and amastin mRNAs. In addition, tandem-affinity purification and mass spectrometry analyses have shown that a Pumilio protein, TcPUF6, participates in a protein complex containing TcDHH1 (Dallagiovanna et al. 2008).
PUF proteins are characterised by the presence of eight PUF repeats, each of which consists of 40 amino acids (Edwards et al. 2001), and have been found in all analysed eukaryotic organisms (Wickens et al. 2002, Quenault et al. 2011). The repeat region is essential and sufficient to bind specific RNA sequences. The outer surface of these proteins can interact with different proteins while the inner surface is composed of aromatic and charged amino acids that bind RNA (Edwards et al. 2001, Wickens et al. 2002). The T. cruzi genome encodes 10 PUF proteins (Caro et al. 2006), which are also present in T. brucei and Leishmania infantum (Luu et al. 2006, Folgueira et al. 2010). TcPUF1 and TcPUF2 share many similarities with higher eukaryotic PUF proteins, whereas TcPUF7, TcPUF8 and TcPUF10 are more divergent. In silico analyses and triple-hybrid assays have indicated that T. cruzi PUF proteins can be classified into the following two subgroups: (i) those that can bind the nanos response element sites described in Drosophila, such as TcPUF1 and TcPUF2, and (ii) those that may interact with UGUR sequences, such as TcPUF3, TcPUF4, TcPUF5, TcPUF6 and TcPUF9.
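As a simple illustration of what a search for UGUR sequences could look like, the Python sketch below reports every position at which the four-nucleotide pattern UGU followed by a purine (R = A or G, the standard IUPAC code) occurs in an RNA sequence. The function name and the example sequence are hypothetical, and a sequence match of this kind is only a candidate site; it does not by itself indicate binding by any PUF protein.

```python
import re


def find_ugur_sites(rna):
    """Return 0-based start positions of every UGUR match (R = A or G).

    A lookahead is used so that overlapping matches are all reported.
    DNA-style input (T instead of U) is accepted and converted.
    """
    rna = rna.upper().replace("T", "U")
    return [m.start() for m in re.finditer(r"(?=UGU[AG])", rna)]


# Invented 3' UTR fragment used only to exercise the function.
utr_fragment = "AAUGUAUAUGUGCC"
print(find_ugur_sites(utr_fragment))
```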
TcPUF6 is a constitutively expressed protein that is found in discrete cytoplasmic foci and does not associate with polysomes or ribosomes. Several RNAs have been identified by the tandem affinity purification (TAP) methods to interact with TcPUF6. TAP performed on epimastigotes that overexpressed TcPUF6 demonstrated that approximately 270 transcripts that interact with TcPUF6 exhibited decreased levels, such as mRNAs encoding proteases, kinases and transporters. These results indicate that TcPUF6 may be a destabilising factor that downregulates mRNA levels in a specific manner (Dallagiovanna et al. 2008). Importantly, the interaction between TcPUF6 and TcDHH1 was only observed in epimastigotes but not in metacyclic trypomastigotes (Dallagiovanna et al. 2008).
Proteins containing an RNA recognition motif (RRM) are the most abundant RBPs in eukaryotes (Cléry et al. 2008). The RRM domain can bind to cis-acting elements ranging from two to eight nucleotides in length, and proteins with this domain exhibit a large range of affinities and specificities. RRMs are found in proteins involved in various post-transcriptional events, such as RNA processing, transport, translation, degradation and stability (Maris et al. 2005). However, few RRM-containing proteins have been characterised in T. cruzi. TcRBP19 is a 17-kDa protein with a single RRM and it is preferentially expressed in amastigotes at low levels (Pérez-Díaz et al. 2007). Electrophoretic mobility shift assays revealed that TcRBP19 exhibits an affinity for cytosine-rich sequences. TcRBP28 is an RBP that contains an Ala-Lys-Pro-rich repetitive motif at its N-terminus, which is also present in other T. cruzi proteins. Notably, TcRBP28 was recognised by antibodies from patients with Chagas disease (Pais et al. 2008). Although its function in T. cruzi is unknown, it shares homology with two nuclear RBPs that are involved in the nuclear export pathway in T. brucei, p34 and p37 (Hellman et al. 2007). The best-characterised RRM proteins in T. cruzi belong to the TcUBP family, which consists of six distinct members that contain a single RRM as well as different auxiliary domains that may be involved in protein-protein interactions. The 3' UTRs of TcUBP mRNAs share only 25% homology, thereby suggesting that these proteins are differentially expressed during the T. cruzi life cycle (De Gaudenzi et al. 2003). In fact, TcUBP1 is more highly expressed in amastigotes and trypomastigotes, which is also the case for TcUBP6, the most divergent member of the family. TcUBP5 is more highly expressed in trypomastigotes, whereas TcUBP2 and TcUBP3 are more highly expressed in epimastigotes compared to the other life cycle stages (De Gaudenzi et al. 2003).
TcUBP1 is the first trans-acting element that was identified in trypanosomatids, and it recognises the ARE in the 3' UTR of TcSMUG mRNAs as well as GU-rich sequences. Overexpression of this protein in trypomastigotes resulted in a decreased TcSMUG transcript half-life (D'Orso & Frasch 2001). Together with TcUBP2 and polyadenylation binding protein, TcUBP1 is part of a ribonucleoprotein (mRNP) complex that recognises U-rich motifs (D'Orso & Frasch 2002). RNA-protein interactions were analysed by mRNP immunoprecipitation with specific antibodies against each RRM-type protein, which identified 24 mRNAs that encoded known proteins as well as 10 hypothetical genes. The mRNAs that associated with TcUBP1 were found to encode mucin, amastin and proteases, such as cruzipain and GP63, as well as members of the TS family. In contrast, the transcripts that interacted with TcUBP3 were found to encode mainly ribosomal proteins (Noé et al. 2008). A stem-loop structure of 30-35 bases was identified in the majority of TcUBP-associated mRNAs; this element was more frequently represented in the 3' UTRs of mRNAs and may represent the TcUBP1 binding site (Noé et al. 2008). Although TcUBP1 has been shown to be predominantly cytoplasmic (De Gaudenzi et al. 2003), all TcUBP members were found to associate with mRNA granules in stressed epimastigotes, suggesting that T. cruzi makes use of mRNA granules for transient transcript protection (Cassola et al. 2007). Low levels of TcUBP1 and TcUBP2 have also been found in the nucleus under normal conditions and these proteins have been shown to accumulate in the nucleus with their target mRNAs under arsenite stress (Cassola & Frasch 2009).
In addition to RBPs, distinct subclasses of noncoding RNAs (ncRNAs) have been found to play important regulatory roles in gene expression in eukaryotes. A recent comparative genomic analysis revealed several novel ncRNAs that are conserved across multiple trypanosomatid genomes (Doniger et al. 2010). Although the T. cruzi genome lacks most elements of the RNA interference machinery (DaRocha et al. 2004, El-Sayed et al. 2005a), the existence of ncRNA in this parasite has not been ruled out yet. The search for functional ncRNAs in T. cruzi is currently underway in a few laboratories and this may add yet another layer of complexity to this multifaceted system for controlling gene expression in this parasite.
Conclusions and future perspectives - In the last 10 years, there have been major advances in understanding the mechanisms that control gene expression in T. cruzi due to the availability of its sequenced genome and to the improvements in protocols used for the genetic manipulation of this parasite. After initial studies demonstrating that individual genes are differentially expressed during the T. cruzi life cycle, genome sequence analyses have revealed several RBPs that are involved in many aspects of RNA metabolism, such as regulating T. cruzi mRNA stability and translation. These proteins are now being characterised in much more detail using a combination of bioinformatics and genetic manipulation approaches. With the advent of next-generation sequencing technologies, such as RNA-sequencing, we are entering a new era of more powerful methodologies for studying gene expression, including genome-wide studies within a single experiment. Therefore, we can expect to have a much more complete analysis of the complex transcriptional landscape of the T. cruzi genome, which will include the discovery of new genes as well as the generation of a comprehensive list of genes with stage-specific expression, differential splicing, RNA editing and allele-specific expression. Moreover, these new methodologies may also allow for the identification of new elements, such as ncRNAs, which have already been described in other organisms as trans-acting regulators of gene expression.