Identification and characterization of two critical sequences in SV40PolyA that activate the green fluorescent protein reporter gene

Alu repeats or Line-1-ORF2 (ORF2) inhibit expression of the green fluorescent protein (GFP) gene when inserted downstream of this gene in the vector pEGFP-C1. In this work, we studied cis-acting elements that eliminated the repression of GFP gene expression induced by Alu and ORF2, and characterized the sequences of these elements. We found that sense and antisense PolyA of simian virus 40 (SV40PolyA, 240 bp) eliminated the repression of GFP gene expression when inserted between the GFP gene and the Alu (283 bp) repeats in pAlu14 (in which 14 tandem Alu repeats were inserted downstream of the GFP gene in the vector pEGFP-C1) or between the GFP gene and ORF2 (3825 bp) in pORF2. Antisense SV40PolyA (PolyAas) induced stronger gene expression than its sense orientation (PolyA). Of four 60-bp segments of PolyAas (1F1R, 2F2R, 3F3R and 4F4R) inserted independently into pAlu14, only two (2F2R and 3F3R) eliminated the inhibition of GFP gene expression induced by Alu repeats. Deletion analysis revealed that a 17-nucleotide AT repeat (17ntAT; 5′-AAAAAAATGCTTTATTT-3′) in 2F2R and the fragment 3F38d9 (5′-ATAAACAAGTTAACAACAACAATTGCATT-3′) in 3F3R were critical sequences for activating the GFP gene. Sequence and structural analyses showed that 17ntAT and 3F38d9 included imperfect palindromes and may form a variety of unstable stem-loops. We suggest that the presence of imperfect palindromes and unstable stem-loops in DNA enhancer elements plays an important role in GFP gene activation.


Introduction
Historically, considerable attention has been given to proteins and their encoding genes. However, with completion of the human and mouse genomes and a better understanding of eukaryotic gene expression, the noncoding sequences of genes have attracted increasing attention. Noncoding sequences are widespread in eukaryotic genomes and contain important genetic information (Eggleston, 2005; Maeshima and Eltsov, 2007; Satzinger, 2008; Depken and Schiessel, 2009) that includes promoters, enhancers and insulators (Tour and Laemmli, 1988), noncoding RNA that directs DNA methylation (Furey and Haussler, 2003), the regulation of axon formation (Dietzel and Belmont, 2001), and small RNA genes (Eggleston, 2005).
Alu and Line-1 repeat elements represent about 10% and 17% of the whole human genome, respectively, and are the most important noncoding sequences (Belgnaoui et al., 2006; Polak and Domany, 2006). Alu elements were initially considered to have no role in gene stability and expression, but recent work has shown that these elements can extensively influence gene expression. In previous work, we have shown that Alu tandem repeats and Line-1-ORF2 (ORF2) inhibited green fluorescent protein (GFP) gene expression when inserted downstream of this gene in the pEGFP-C1 vector (Wang et al., 2009a,b). Downstream noncoding gene sequences are highly structured and contain important regulatory elements such as 3' UTRs, transcription termination signals (Andreassi and Riccio, 2009) and enhancers (Mao et al., 2010). In this study, we found that sense and antisense SV40PolyA eliminated this repression when inserted between the GFP gene and the Alu repeats or ORF2, and that antisense SV40PolyA (PolyAas) induced stronger gene expression than its sense orientation (PolyA). We also examined the effects of small fragments of PolyAas on GFP gene expression to identify which PolyAas sequences activated this gene and found that two fragments were critical for activating GFP gene expression. Both fragments include imperfect palindromes and may form incomplete stem-loop structures, which we propose as a mechanism for activating GFP gene expression.

Construction of expression vectors
The pAlu14 and pORF2 expression vectors were constructed as described elsewhere (Wang et al., 2009a,b) by inserting 14 head-to-tail tandem Alu (283 bp) elements or an ORF2 (3825 bp) downstream of the GFP gene in the pEGFP-C1 vector.
Primers were designed with sites for restriction enzymes (EcoR I or Hind III/Xba I; Kpn I/Nhe I) and the polymerase chain reaction (PCR) was used to amplify the synthetic DNA sequences (as templates) that contained mutated sites and fragments of PolyAas DNA. The PCR products were digested with restriction enzymes and inserted between the GFP gene and Alu repeats in pAlu14 or between the GFP gene and ORF2 in pORF2. When the compatible ends of DNA fragments digested with Xba I and Nhe I were ligated by T4 DNA ligase, the recognition sites for both Xba I and Nhe I were destroyed. Using this approach, expression vectors carrying two tandem insertion sequences were obtained. The primers and templates used for construction of the expression vectors are shown in Tables 1 and 2, respectively. Underlined sequences in the tables indicate restriction enzyme cleavage sites.
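The loss of both sites after ligation can be checked with a few lines of code. The sketch below assumes the standard recognition sequences and cut positions for Xba I (T^CTAGA) and Nhe I (G^CTAGC), which are not stated in the text; the function names are illustrative only.

```python
# Assumed recognition sites (standard values, not given in the text).
# Both enzymes cut after the first base and leave a 5'-CTAG overhang.
SITES = {"XbaI": "TCTAGA", "NheI": "GCTAGC"}
CUT_OFFSET = 1  # bases to the left of the cut on the top strand


def ligated_junction(left_enzyme: str, right_enzyme: str) -> str:
    """Top-strand junction formed when the left half of one cut site
    is ligated to the right half of another through compatible ends."""
    left_half = SITES[left_enzyme][:CUT_OFFSET]
    right_half = SITES[right_enzyme][CUT_OFFSET:]
    return left_half + right_half


def recut_by(junction: str) -> list:
    """Names of the enzymes whose recognition site survives in the junction."""
    return [name for name, site in SITES.items() if site in junction]


for left, right in [("XbaI", "NheI"), ("NheI", "XbaI")]:
    junction = ligated_junction(left, right)
    print(left, "+", right, "->", junction, "| recut by:", recut_by(junction))
```

Under these assumptions the hybrid junctions are TCTAGC and GCTAGA, neither of which matches either recognition site, so an inserted fragment cannot be excised again by the same pair of enzymes. This is what allows a second copy to be added in tandem.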

Cell culture and transfection
HeLa cells were cultured in Dulbecco's modified Eagle's medium (DMEM) with 10% fetal calf serum. Cells were plated in each well of a 24-well plate at 0.9 x 10^5 cells/well and cultured at 37°C in 5% CO2 for 30-36 h. The cells were transiently transfected with 0.4 µg of expression vector DNA using 2 µL of Lipofectamine 2000 reagent (Invitrogen, USA), according to the manufacturer's instructions, and subsequently cultured for an additional 30-36 h. The transfected cells were used for RNA extraction and fluorescence assays.

Assessment of GFP fluorescence
Transfected HeLa cells were fixed in 4% paraformaldehyde and the expression of GFP protein was assessed by using fluorescence microscopy (Nikon TE2000-U, Japan). Images were obtained under normal and fluorescent illumination.

Northern blotting
Total RNA from transfected cells was extracted with Trizol® reagent (Invitrogen, USA). RNA was electrophoresed in 1.2% agarose gels containing 0.4 M formaldehyde and then transferred to nylon membranes (pore diameter 0.45 µm; Osmonics, USA). A 590-bp fragment from the GFP gene in the pEGFP-C1 vector was amplified by PCR using the forward primer 5'-GGGCGAGGGCGATG-3' and the reverse primer 5'-CTTGTACAGCTCGTCCATGC-3'. The PCR product was purified by agarose gel electrophoresis and radiolabeled with [α-32P]-dCTP (Furui, China) using the random primer labeling system (TaKaRa, Japan). The nylon membranes blotted with RNA were hybridized with α-32P-radiolabeled DNA probes at 42°C in 50% formamide containing 5x SSC (saline sodium citrate), 5x Denhardt's solution and 100 µg of salmon sperm DNA/mL for 24 h in a UL2000 hybriLinker (UVP, USA). The membranes were washed twice at room temperature with a solution of 1x SSC-0.1% SDS and then washed three times with a solution of 0.1x SSC-0.1% SDS at 68°C prior to autoradiography. The membranes were subsequently stripped by washing twice at 80°C for 1 h in a solution containing 50% formamide-5% SDS-50 mM Tris (pH 7.4), and then hybridized with an α-32P-radiolabeled probe for neoRNA (containing the cassette for neomycin resistance). A 671-bp fragment from the neo gene in the pEGFP-C1 vector was amplified by PCR using the forward primer 5'-CACAACAGACAATCGGCTGCT-3' and the reverse primer 5'-AGCGGCGATACCGTAAAAGCAC-3'. The probe for neoRNA was prepared using the random primer labeling system and the 671-bp neo fragment as the template.

PolyA and PolyAas eliminate the repression of GFP gene expression induced by Alu repeats or ORF2
Northern blotting showed that there was almost complete repression of GFP expression in HeLa cells transfected with the expression vectors pAlu14 and pORF2 (Figure 1A). The neo gene was used as a control to assess the efficiency of transfection with the GFP gene. Because both genes were carried on the same expression vector, variation in transfection efficiency could not account for the differences observed in the experimental results.

The effects of PolyAas segments on GFP gene expression
To determine which segments in PolyAas eliminated the repression of GFP gene expression caused by Alu repeats, we produced four 60-bp segments of PolyAas (1F1R, 2F2R, 3F3R and 4F4R; Figure 2A) that were then inserted between the GFP gene and Alu repeats in the pAlu14 vector used to transiently transfect HeLa cells. Northern blotting showed that 1F1R and 4F4R did not stimulate GFP gene expression (Figure 2B, lanes 1 and 4 vs. lane 6) whereas 2F2R and 3F3R did (Figure 2B, lanes 2 and 3 vs. lane 6).

The effects of 2F2R and its deleted fragments on GFP gene expression
To determine which fragments of 2F2R were responsible for the activation of GFP gene expression, we deleted selected regions of the 2F2R DNA (Figure 3A). The bases in the 3' end of 2F2R were deleted and the single sequence or double tandem sequences of deleted 2F2R (45R, 30R, 22R, 19R and 16R) were inserted into pAlu14. Fragments 45R, 30R and 22R activated GFP gene expression (Figure 3B, lanes 2, 3 and 4 vs. lane 13, and lanes 8, 9 and 10 vs. lane 13), whereas 19R and 16R induced weaker GFP gene expression (Figure 3B, lanes 5, 6, 11 and 12 vs. lane 13). The double tandem sequences of 2F2R and their deleted sequences induced stronger GFP gene expression than the corresponding single sequences (Figure 3B, lanes 7-12 vs. lanes 1-6, respectively). Although 45R, 30R and 22R all enhanced GFP gene expression, the activation of 45R was weaker than that of 2F2R (Figure 3B).

The effects of the 22R fragment and its deleted sequences on GFP gene expression
The deletion of three upstream bases (5'-GTG) and two downstream bases (GT-3') of fragment 22R yielded a 17-nucleotide AT repeat (17ntAT; sequence and position shown in Figure 3A). 17ntAT activated GFP gene expression to the same extent as 22R (Figure 4B, lane 2 vs. lane 5), indicating that the five deleted bases were not important for GFP gene activation. The 19R fragment, i.e., 22R from which three downstream bases (TGT-3') had been deleted, caused much lower GFP gene expression (Figure 4B, lane 3 vs. lane 5), indicating an important role for these bases in GFP gene activation. Double tandems of 17ntAT produced more transcripts than the corresponding single sequence (Figure 4B, lane 2 vs. lane 4). Figure 4A shows the sequences inserted into pAlu14.
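The relationships among 22R, 19R and 17ntAT can be made explicit by rebuilding the fragments from the deletions described here. This is a minimal sketch: the 22R and 19R sequences are inferred from the stated deletions rather than quoted from a table, and the variable names are illustrative.

```python
NT17_AT = "AAAAAAATGCTTTATTT"  # 17ntAT, as given in the text

# 22R = 17ntAT plus the flanks that were deleted to obtain it
# (5'-GTG upstream and GT-3' downstream); inferred, not quoted.
R22 = "GTG" + NT17_AT + "GT"

# 19R = 22R with its three downstream bases removed.
R19 = R22[:-3]

print(len(NT17_AT), len(R22), len(R19))  # 17 22 19
print(R22[-3:])                          # TGT, the bases missing from 19R

# 19R keeps the upstream GTG but lacks the last T of 17ntAT.
print(R19 == "GTG" + NT17_AT[:-1])       # True
```

Under this reconstruction the fragment lengths match their names, and the only part of 17ntAT that is absent from 19R is its final T.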

The effects of 3F3R fragments on GFP gene expression
To identify which fragments in 3F3R enhanced GFP gene expression, we deleted sections of 3F3R DNA (Figure 5A). The single sequences of deleted 3F3R (3F46, 3R49, 3F135 and 3F235) were inserted into pAlu14. Figure 5B shows that 3F46 activated GFP gene expression, whereas 3R49, 3F135 and 3F235 did not.

The effects of 3F46 deletions on GFP gene expression
To identify the 3F46 cis-element responsible for gene activation, we deleted the nucleotides upstream of 3F46 (Figure 6A) and constructed expression vectors. Northern blotting showed that 3F46d2-3F46d8 (deletion of 2-8 bases upstream of 3F46) still activated GFP gene expression (Figure 6B, lanes 1-7 vs. lane 11), whereas 3F46d9 caused only weak activation (Figure 6B, lane 8 vs. lane 11) and 3F46d10 produced hardly any activation (Figure 6B, lane 9 vs. lane 11).

The effects of base deletions downstream of 3F38 (3F46d8) on GFP gene expression
The deletion of selected nucleotides was used to establish the downstream boundary for GFP gene activation by fragment 3F38 (Figure 7A). Northern blotting showed that 3F38d1-3F38d6, 3F38d8 and 3F38d9 activated the GFP gene (Figure 7B, lanes 1-8 vs. lane 14) whereas 3F38d10-3F38d13 did not (Figure 7B, lanes 9-12 vs. lane 14). These results identified 3F38d9 as the critical sequence of 3F38 for GFP gene activation.

The effects of mutations in 22R DNA on GFP gene expression
Analysis of the DNA sequence of 22R indicated that this fragment may form an incomplete stem-loop structure that included a loop (3 nt), an initial stem (3 bp), a bulge (2 nt) and a second stem (3 bp) (Figure 8A). We examined the influence of loop base type and loop length on the ability of 22R to influence GFP gene activation by introducing mutations in these regions and inserting the fragment into the vector pAlu14 for transfection in HeLa cells. The loop base combination TGC (22R*2, wild type) induced the strongest GFP gene expression (Figure 8C, lane 9 vs. lanes 1-5), whereas changing the loop base number from 0 nt to 6 nt showed that a 3 nt loop (22R*2, wild type) produced the strongest GFP gene expression (Figure 8C, lane 9 vs. lanes 6, 7, 8, 10, 11 and 12). Although most of the loop mutants were able to enhance GFP gene expression, they were generally less effective than the wild type fragment (22R) (Figure 8C, lanes 2, 4, 5, 6, 7, 8, 11 and 12 vs. lane 13). This finding indicates that if there are no changes in the palindromes flanking the loop then many types of loops can enhance GFP gene expression.

Discussion
SV40PolyA activates luciferase reporter gene expression in HeLa cells (Zhi-Li et al., 2001). In this study, the insertion of sense or antisense PolyA between the GFP gene and Alu repeats or ORF2 in the vectors pAlu14 or pORF2 resulted in partial recovery of GFP gene expression repressed by Alu repeats or ORF2. This finding indicated that sense and antisense PolyA enhanced GFP gene expression, with PolyAas causing greater induction than PolyA. Nolan et al. (1996) found that reversing the orientation of DRE (a 27-bp enhancer) dramatically decreased growth hormone gene expression, indicating that the binding of transfactors to the DRE and the interaction of this complex with the TATA region are directional.
The wild-type SV40 enhancer contains a double tandem duplication (72-bp repeat). The single 72-bp repeat contains three functional elements (A, B and C) that range in size from 15 to 22 bp (Shepard et al., 1988). Although PolyA is a short sequence (240 bp) it contains various regions that may differ in their ability to activate genes. To examine this hypothesis, we produced four segments of PolyAas (1F1R, 2F2R, 3F3R and 4F4R) (Figure 2A) and inserted them separately downstream of the GFP gene in pAlu14. 2F2R and 3F3R abolished the inhibition of GFP gene expression induced by tandem Alu repeats. To determine which portions of 2F2R activated the GFP gene, we deleted bases from the 3' end of this segment and found that fragment 22R activated the GFP gene, with double tandem sequences having a stronger effect than the corresponding single sequences. None of the other 2F2R fragments (19R, 16R, Secloop and Poly4) significantly activated the GFP gene.
The 5' and 3' UTRs of viral genomes are highly structured and are critical for controlling viral biological processes. The stem-loop structure is important for gene activation (Dai et al., 1997) and structure-prediction software predicts various stem-loop structures in these regions. Most of the stem-loop structures in viral genomes show bulge sequences in their stems (Yu and Markoff, 2005; Rosskopf et al., 2010; Nickens and Hardy, 2008). To explain the results obtained with the 2F2R fragments, we hypothesized that 22R contained an imperfect palindrome and formed an incomplete stem-loop structure that included a loop (3 nt), an initial stem (3 bp), a bulge (2 nt) and a second stem (3 bp). Fragment 19R [22R with three downstream bases (TGT) deleted] produced fewer transcripts (Figure 4B, lane 3 vs. lane 5), whereas 17ntAT [22R with three upstream bases (GTG) and two downstream bases (GT) deleted] activated the GFP gene when inserted into pAlu14, indicating that the third base (T) from the downstream end of 22R is important for GFP gene activation. 17ntAT was the smallest sequence in 22R to form an incomplete stem-loop structure. The stem-loop structures were destroyed in 19R and 16R, and neither fragment activated the GFP gene significantly, which suggested that an incomplete stem-loop structure (Figure 8A) was important for GFP gene activation. Examination of the 17ntAT sequence (5'-AAAAAAATGCTTTATTT-3') suggested that it was capable of forming a variety of incomplete, unstable stem-loops. Figure 8A shows one of the presumed stem-loop structures.
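The imperfect palindrome in 17ntAT can be illustrated by comparing the two arms that flank the TGC loop. The sketch below is a naive arm-by-arm comparison, not a thermodynamic folding prediction, and it examines only one of the possible pairings; the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


NT17_AT = "AAAAAAATGCTTTATTT"  # 17ntAT, as given in the text
LOOP = "TGC"                   # wild-type loop bases

loop_start = NT17_AT.index(LOOP)
left_arm = NT17_AT[:loop_start]               # bases 5' of the loop
right_arm = NT17_AT[loop_start + len(LOOP):]  # bases 3' of the loop

# In a perfect palindrome the left arm equals the reverse complement
# of the right arm; every difference is an unpaired position in the stem.
partner = reverse_complement(right_arm)
mismatches = [i for i, (a, b) in enumerate(zip(left_arm, partner)) if a != b]

print(left_arm, partner, mismatches)  # AAAAAAA AAATAAA [3]
```

In this pairing a single central mispair (an A on each arm, 2 nt in total) separates two 3-bp A·T stems, which is consistent with the loop (3 nt), stem (3 bp), bulge (2 nt) and stem (3 bp) arrangement proposed for 22R.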
To determine the 3F3R sequences involved in GFP gene activation we produced four overlapping fragments (3F46, 3R49, 3F135 and 3F235) of this segment (Figure 5A). The four fragments were inserted separately between the GFP gene and the Alu repeats in the pAlu14 vector that was then used to transfect HeLa cells. Only 3F46 activated the GFP gene. Sequential (one by one) deletion of bases upstream of 3F46 showed that removal of the first eight bases (3F46d8) had little influence on GFP gene activation, whereas elimination of the ninth base (3F46d9) markedly attenuated this activation, indicating a critical role for this base (Figure 6B, lane 8). Sequential (one by one) deletion of the downstream bases of 3F38 (fragment 3F46 in which eight bases were deleted) showed that removal of the first nine bases did not markedly affect GFP gene activation whereas the removal of bases 10-13 eliminated the activation of this gene. Together, these findings indicated that the critical sequence in 3F3R for GFP gene activation was 3F38d9 (5'-ATAAACAAGTTAACAACAACAATTGCATT-3'). This sequence contained 29 bases (A = 15, C = 5, G = 2, T = 7), with the fragment from A12 to C20 containing three AAC repeats that were flanked by GTT and TTG sequences which formed stem-loop structures with the AAC repeats. Fragment 3F38d9 was thus similar to 17ntAT in that both of them formed unstable stem-loop structures.
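The base composition and repeat positions quoted for 3F38d9 can be recomputed directly from its sequence. This is a minimal sketch using only the Python standard library; positions are reported 1-based to match the A12 and C20 labels, and the variable names are illustrative.

```python
import re
from collections import Counter

F38D9 = "ATAAACAAGTTAACAACAACAATTGCATT"  # 3F38d9, as given in the text

# Base composition of the fragment.
counts = Counter(F38D9)
print(len(F38D9), dict(sorted(counts.items())))

# Locate the run of three consecutive AAC repeats (1-based, inclusive).
run = re.search(r"(?:AAC){3}", F38D9)
start, end = run.start() + 1, run.end()
print(start, end, F38D9[start - 1:end])

# Trinucleotides upstream and downstream of the repeats that could
# pair with them (GTT and TTG are both complementary to AAC/CAA).
print(F38D9[8:11], F38D9[22:25])
```

The counts give 29 bases with A = 15, C = 5, G = 2 and T = 7, the AAC repeats span positions 12 to 20, and GTT (positions 9-11) and TTG (positions 23-25) lie on either side of them.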
Base mutations and variations in the number of bases (from 0 nt to 6 nt) in the 22R loop showed that the TGC loop (22R, wild type) induced the strongest gene expression, although most of the loop mutants showed some ability to induce this gene (Figure 8C). This finding suggested that an unstable stem-loop structure was required for GFP gene activation by 22R, with many loops partly satisfying this criterion. Changes in loop bases influence the stability of stem-loop structures (Lamoureux et al., 2006), and stem-loop structures with 3-4 base loops may be specifically stabilized and have shorter folding times (Kuznetsov et al., 2001, 2008). These findings may help to explain the importance of 3 nt loops in GFP gene activation. DNA cruciform structures can be formed when intrastrand pairing occurs between complementary bases of inverted repeat sequences in double-stranded DNA. Cruciform formation is energetically less favorable than B-form DNA so that the extrusion of these structures from duplex DNA requires the driving energy provided by negative supercoiling (Sean et al., 2009). Hairpin structures in the cruciform promoter for the bacteriophage N4 virion RNA polymerase are extruded at physiological superhelical density (Chou et al., 1999). The palindromes in double-stranded DNA may form incomplete stem-loop structures over a short range (Darlow and Leach, 1998). For this reason, the structures formed by these palindrome sequences may play an important role in regulating gene expression in cells.
The critical sequences of 2F2R and 3F3R involved in gene activation have two characteristics in common, namely, (1) they can form various stem-loop structures that increase the probability of creating stem-loops by random impact and (2) the stem-loop structures are incomplete and unstable, which ensures that stem-loops promptly revert to a double helix state. Based on these findings, we propose that sequences containing suitably imperfect palindromes activate gene expression by dynamic fluctuations between unstable stem-loop structures and double-strand forms. Additional experiments are required to confirm this hypothesis.