Analysis of energetically biased transcripts of viruses and transposable elements

RNA interference (RNAi) is a natural endogenous process by which double-stranded RNA molecules trigger potent and specific gene silencing in eukaryotic cells and is characterized by target RNA cleavage. In mammals, small interfering RNAs (siRNAs) are the trigger molecules of choice and constitute a new class of RNA-based antiviral agents. In an efficient RNAi response, the antisense strand of siRNAs must enter the RNA-induced silencing complex (RISC) in a process mediated by thermodynamic features. In this report, we hypothesize that silent mutations capable of inverting thermodynamic properties can promote resistance to siRNAs. Extensive computational analyses were used to assess whether continuous selective pressure that promotes such mutations could lead to the emergence of viral strains completely resistant to RNAi (i.e., prone to transfer only the sense strands to RISC). Based on our findings, we propose that, although synonymous mutations may produce functional resistance, this strategy cannot be systematically adopted by viruses since the longest RNAi-refractory sequence is only 10 nt long. This finding also suggests that all mRNAs display fluctuating thermodynamic landscapes and that, in terms of thermodynamic features, RNAi is a very efficient antiviral system since there will always be sites susceptible to siRNAs.

RNA interference (RNAi) is a process by which double-stranded RNA molecules (dsRNAs) trigger potent, specific post-transcriptional gene silencing in eukaryotic cells and is characterized by target RNA cleavage (Fire et al., 1998;Martinez et al., 2002). RNAi has been successfully applied in reverse genetics (for gene function identification), biotechnology (development of genetically modified plants and animals) and experimental therapeutics (by silencing key replication genes in pathogens) (Castanotto and Rossi, 2009). Some of the natural roles of RNAi include controlling viral replication (Ding, 2010) and transposon mobilization within the cellular environment (Levin and Moran, 2011).
Small interfering RNAs (siRNAs), the triggers of choice for gene knockdown in mammalian cells, consist of two short (21 nucleotides, nt; Figure S1) RNA strands that promote effective cleavage of target RNAs (Elbashir et al., 2001). Once a viral sequence (a gene or the complete genome) is available, specific siRNAs can be designed against it to control pathogen replication in vitro or in vivo.
The basic mechanism of siRNA action has been established. Inside the infected cell, siRNAs are transferred to an endogenous protein complex known as RISC (RNAinduced silencing complex) where one strand is cleaved (referred to as the passenger strand) while the other is retained intact (referred to as the guide strand). RISC then patrols the cytoplasm to search for transcripts with perfect complementary sequences to the guide strand, after which the identified RNA targets are then cleaved, i.e., silenced (Castanotto and Rossi, 2009).
Efficient gene silencing requires that the antisense RNA strand (complementary to the targeted transcript) be the guide strand. This strand choice is not random in the siRNA pathway since the strand with the thermodynamically more unstable 5' terminus is preferentially retained in the RISC (Khvorova et al., 2003;Schwarz et al., 2003). This characteristic is determined by the four base pairs (bp) at the two termini of the siRNA (hereafter simply referred to as 'termini'). The energy difference between these termini determines which strand will be the guide strand.
To date, three resistance mechanisms are known for artificial siRNA-based agents: 1) virally-encoded suppressor proteins (Hao et al., 2011), 2) negative regulators such as eri-1 protein, non-toxic concentrations of cadmium and low temperatures (Kameda et al., 2004), and 3) point mutations in the targeted transcript (Holz et al., 2012). Suppressor proteins and negative regulators can sequester/cleave siRNAs or hinder their biosynthesis, whereas point mutations hinder target cleavage by RISC because of imperfect base pair complementarity. In the latter situation, resequencing the targeted site allows the design of a new effective siRNA (restoration of perfect base pair complementarity).
Viruses are under selective pressure since RNAi is a natural antiviral system. We therefore reasoned that under such conditions viruses could escape through a different mechanism, i.e., synonymous mutations that invert the thermodynamic energy within the target site ( Figure S1). This special case differs entirely from a simple point mutation since re-sequencing of the targeted site would not allow the design of an effective siRNA because the guide strand would be the sense strand.
We decided to investigate whether, under hypothetical continuous selective pressure, viral genomes could evolve to develop a biased energy gradient along their entire sequences, i.e., be able to transfer only the sense strands to RISC ( Figure S2). If this were possible then such viruses would be refractory to RNAi defense systems. To test this hypothesis, we undertook computational analyses with a newly developed computer program designed to generate energetically biased transcripts (EBTs), i.e., an RNA from which all derived siRNAs tend to transfer the sense strand to RISC.
One means of generating an EBT is to align siRNA termini in a biased fashion ( Figure S2). Initially, to evaluate 4-mer terminus length sequences (the natural size of siRNA termini), we calculated the thermodynamic values based on Khvorova et al. (2003) for all possible 1,024 5-mer RNA sequences. For this, let N be the vector of nucleotides {A, U, G, C}, Z the vector of 2-mer RNA sequences {AA, AC, AG, AU, CA, CC, CG, CU, GA, GC, GG, GU, UA, UC, UG, UU} and Q(Z x Z y ) the thermodynamic value for each 2-mer RNA within the 5-mer sequence. The algorithm initially evaluated whether this sequence showed increasing or decreasing order of Q values and then advanced one nucleotide N j forward in the 5-mer sequence (Z i+1 ). From this new nucleotide position, new Q values were calculated for all 2-mers backwards along the sequence (Z i-3 ...Z i+1 ). If, after these calculations, the sequence retained its increasing or decreasing ordering then the algorithm included a new nucleotide ahead of the sequence, otherwise it stopped. This analysis was done to include all of the nucleotides over time in N j (Figure 1). The algorithm was implemented in a computer program (T-ASSEMBLER) using Perl scripts and is available from: http://www.laboratoriomultiusuario.com.br/index.php?op-tion=com_content&view=article&id=75&Itemid=74.
We also determined the biased alignment by considering termini of different sizes, ranging from 4 to 10 base pairs. This upper limit was based on the fact that conventional siRNA consists of a central region of 19 bp such that the longest possible non-overlapping terminus is 9 bp in size. Alignments were done using T-ASSEMBLER after changing the corresponding parameters. The EBT sequences generated were translated in all possible six frames using the freeware program Gene Runner ver. 3.05 (Hastings Software, Inc.).
Our hypothesis was that EBTs would emerge from the selective constraint that RNAi exerted upon viruses and transposable elements. A BLAST search (Altschul et al., 1990) for short, nearly exact matches was used to determine whether our computer-generated EBTs were detectable in these two classes of genetic elements. In view of the time required for processing, only the longest generated EBT Secolin et al. 869 Figure 1 -Algorithm workflow. The algorithm calculates the free energy of each siRNA terminus (5 bp) in a stepwise fashion. For example, the free energy of the hypothetical terminus ACCGAU corresponds to the sum of the free energies of the following 2-mer RNA sequences (Z): AC+CC+CG+GA+AU. These Z values were originally described by Khvorova et al. (2003). Once all possible termini have their free energies determined, the algorithm aligns them based on nucleotide sequence and decreasing and increasing free energies ( (10 nt in length) was run in searches against viral sequences. The BLAST search was optimized by altering some parameters in which the virus name was indicated in "Organism Optional" and the "Low Complexity Regions" filter was removed. The viruses searched were human immunodeficiency virus (GenBank no. 11676), hepatitis C virus (GenBank no. 11103), rabies virus (GenBank no. 11292), influenza A virus (GenBank no. 11320) and human herpesvirus 1 (GenBank no. 10298). We tested seven sequences (all of which were the longest EBTs for conventional termini) for transposon-related hits in the human genome. The search was done by indicating 'RNA reference sequences (refseq_rna)' as the "Database", 'Homo sapiens' in "Organism Optional", 'transposon'/'retrotransposon' in "Entrez Query Optional" and removing the "Low Complexity Regions" filter. All other parameters were set as default. We considered only hits that displayed complete identity to the query.
Computational analyses using T-ASSEMBLER revealed that for 4 bp-termini the number of termini with unique nucleotide sequences was 1,024; the number of termini with unique free energy (G) values was 72 and the number of unique longest termini was 7. However, these EBTs can be no longer than ten nucleotides (Table 1). Each one of these decanucleotides consisted of the biased alignment of seven 4-bp termini.
As expected, an increase in terminus size was accompanied by increases in almost all of the other parameters, including the G values (Table 1). For 5-bp termini, for example, the number of termini with unique sequences quadruplicated, the number of domains with unique G values increased by 26 units, the longest aligned sequence was extended by 3 bp and the range of energy difference increased by 2.2 units (from 9.3 kcal/mol in 4-bp termini to 11.5 kcal/mol in 5-bp termini). In a hypothetical 9-bp termini scenario, although the number of termini with unique sequences expanded to over one million, the number with unique G values increased only to 285. More importantly, the longest aligned sequence was only 19 nt.
Only one of the seven longest EBTs (UUUUAGGGUG) identified in 4-bp termini was used as a query in the search for identical hits in GenBank. This sequence showed total identity to two sites ("envelope glycoprotein" and "truncated gag protein") in human immunodeficiency virus and to three sites in hepatitis C virus: (polyprotein gene, genomic sequence and E1-E2 region). Three other sites were identified in influenza virus A: hemagglutinin, neuraminidase and segment 4. Only partial identities were obtained with two other viruses: rabies virus (genomic sequence; 9 out of 10 nt) and human herpesvirus 1 isolate (UL4 gene; 9 out of 10 nt).
There were no perfect matches (hits) for transposable elements. Most hits were only partial (7-9 nt), as was the case for the "Homo sapiens piggyBac transposable element derived 3 (PGBD3)" for which the sequence UUUUACCCUC showed partial identity with TTTTACCCT. This set of seven EBTs frequently displayed hits with the same group of transposable elements, namely, Homo sapiens piggyBac transposable element derived (PGBD), SET domain and mariner transposase fusion gene, Homo sapiens tigger transposable element derived 2 and Homo sapiens harbinger transposase derived 1 (HARBI1). A search for retrotransposon sequences produced similar results, as shown in Table 2.
RNAi is a natural immune system (Ding, 2010) that acts by dicing the invading viral dsRNAs. The result is a pool of virus-derived siRNAs that in turn is used to load RISCs against this same replicating pathogen until it is eliminated. Alternatively, synthetic siRNAs may be designed and administered as an antiviral therapy.
A careful analysis of literature reports yielded experimental data in accordance with our basic hypothesis. One of these examples involved hepatitis B virus (HBV) that 870 Escaping RNAi via silent mutations was isolated from patients, multiplied in cell cultures and challenged with an RNAi-based treatment (Wu et al., 2005). A plasmid expressing a short hairpin RNA (shRNA) was designed against a conserved region encoding two overlapping genes ("Pol" and "S") in the same strand (frame +1 and +3). This process led to the selection of a resistant variant after a few cycles of replication in vitro. Sequencing of these clones revealed that they displayed a single point mutation in the 3' terminus of the target site (corresponding to the 5' terminus of the antisense strand of the siRNA). More intriguingly, this mutation was silent for both open reading frames and thus affected the fifth nucleotide in the terminus.
Calculation of the free energy associated with a 4-bp terminus requires the use of five base pairs (Khvorova et al., 2003). When the specific mutation within the HBV sequence was analyzed the original siRNA used to control viral replication was found to have a DG of -0.2 kcal/mol (DG = the G value of the antisense strand minus the G value of the sense strand). This value was close to zero, indicating that both sense and antisense strands had nearly the same chance of being retained in the RISC. However, the point mutation created a DG of -1.0 kcal/mol that led to retention of the sense strand in RISC. This report by Khvorova et al. (2003) provided very clear experimental evidence of the basic principle investigated here, namely, inversion of the thermodynamic features by a silent mutation in viruses escaping siRNA-based therapies.
Since RNAi exerts selective pressure on viruses and transposable elements it would be expected that, in the long term, these synonymous mutations would convert their genomes into inverted energetically-biased sequences (EBTs). Such viruses would be refractory to RNAi. To test this hypothesis, we used computational analyses to generate EBTs and verify whether they occurred within viral and transposable element-related sequences.
Although the number of termini with unique sequences for conventional 4-bp termini was large (1,024), the longest EBTs were restricted to only ten nucleotidesshorter than a single siRNA. Consequently, it is impossible to have viral transcripts (or genomes) that are refractory throughout their entire sequences. If the siRNA structure were different, i.e., termini 5-9 bp in length, the number of unique sequences would be 4,096 and 1,048,576 respectively. However, although this parameter increases geometrically, all the others expand arithmetically (Table 1). Thus, even in an extreme situation in which the siRNA terminus was extended to its maximum length (half of the entire structure -9 bp), EBTs would be not longer than 18 bp. Secolin et al. 871   Consequently, although synonymous mutations capable of inverting terminal energetics represent a potentially new mechanism of resistance, they can be used for only a limited number of sites within viral transcripts, without systematically covering the entire viral genome. Based on these considerations, we predict that a considerable number of sites in a pathogen's genome will always remain susceptible to siRNAs as it is not possible to create a long EBT. Hence, in terms of thermodynamics, RNAi is indeed a very efficient antiviral system.
One of the EBTs identified (UUUUAGGGUG) was found in all of the viruses analyzed (some of the most relevant to human medicine), with all of the viruses having 2-3 sites with full identity to this sequence. Additionally, all seven decanucleotide EBTs displayed hits to exactly the same restricted group of transposable elements. Although these sites were identified, they did not correspond to RNAi-refractory domains since they were shorter than a single siRNA (21 nt). These regions probably emerged as a result of other evolutionary forces rather than selection for refractoriness to gene silencing.
Previous work has shown that several RNAs display "thermodynamic landscapes" that continuously fluctuate between positive and negative values (Pereira et al., 2007). These landscapes do not exhibit long regions of uninterrupted ascending (or descending) values, i.e., a thermodynamic gradient, and the positive/negative values correlate directly with the accessibility/inaccessibility to siRNAs. Consequently, all of the RNAs analyzed displayed amenable/refractory regions to RNAi.
The findings of the present study agree with the observation in the preceding paragraph. Since our computational analyses revealed that it was not possible to align thermodynamic domains over long extensions, i.e., it was impossible to generate long EBTs, then RNAs will naturally fluctuate between positive and negative thermodynamic values. More importantly, since we undertook extensive, complete alignment procedure, our current findings generalize the previous observations by providing evidence that no long EBTs exist and that all RNAs therefore have fluctuating thermodynamic landscapes. In thermodynamic terms, all RNAs display sites amenable to RNA interference.
It is noteworthy that greater differences between the thermodynamic stabilities of termini result in a trend of only one strand preferentially entering RISC. However, cases of miRNA precursors in which both strands (5' and 3') occur in RISC have been reported. This may occur because: (a) the difference between termini stabilities is close to zero (such that both strands tend to enter RISC) or (b) because accessory proteins help to load the unfavored strand. This situation does not invalidate our hypothesis since condition (a) is also takes into consideration termini thermodynamics and situation (b) may apply only to endogenous miRNA sequences and not to viral-derived siRNAs.
The first evolutionary resistance strategies may have been based on synonymous mutations that inverted the termini thermodynamics. However, as shown here, this explanation cannot be systematically applied to the entire genome so that other more elaborate strategies were required. Suppressor proteins, which represent a much more challenging evolutionary acquisition, must have emerged in the face of such pressure.
The new mechanism of resistance to siRNAmediated silencing proposed here is based on certain silent mutations that cause the inversion of thermodynamic features within the targeted site, leading to inefficient loading of RISC with the sense strand. We believe that this strategy could be adopted by viruses of any nature (singlestranded/double stranded viruses, DNA or RNA viruses and viruses that infect insects, plants or animals) since RNAi acts on messenger RNAs that are present in any viral life cycle.
An analysis of experimental data reported in the literature clearly supports the concept of synonymous mutations as a strategy to escape siRNAs. Although RNAi exerts selective pressure on viruses, our computational analyses revealed that this strategy cannot be systematically adopted by viruses since the longest possible EBT is only 10 nt long. Our data therefore do not support our initial hypothesis that viral genomes developed a biased energy gradient along their entire sequences. Nevertheless, our findings confirm previous observations and allow the generalization that all RNAs display fluctuating thermodynamic landscapes. They also reveal that, thermodynamically, RNAi is a very sophisticated antiviral system from which pathogens cannot systematically escape by synonymous mutation.