The histone genes cluster in Rhynchosciara americana and its transcription profile in salivary glands during larval development

Abstract In this work we report the characterization of the Rhynchosciara americana histone genes cluster nucleotide sequence. It spans 5,131 bp and contains the four core histones and the linker histone H1. Putative control elements were detected. We also determined the copy number of the tandem repeat unit through quantitative PCR, as well as the unequivocal chromosome location of this unique locus in chromosome A band 13. The data were compared with histone clusters from the genus Drosophila, which are the closest known homologues.


Introduction
Rhynchosciara americana is a dipteran belonging to the family Sciaridae, commonly known as dark-winged fungus gnats.These are known for its exuberant polytene chromosomes and developmentally regulated DNA amplification loci present in several tissues, the so-called DNA puffs.R. americana has been the object of research since the 1950s (Nonato and Pavan, 1951), but besides the wellknown physiology of its polytene chromosomes and their puffs (Santelli et al., 1991(Santelli et al., , 2004;;Frydman et al., 1993, Penalva et al., 1997), the underlying overall molecular biology and evolution are poorly understood.
In R. americana salivary glands, the last cycle of DNA replication encompasses a gene amplification phenomenon generating DNA puffs (Breuer and Pavan, 1955;Ficq and Pavan, 1957;Rudkin and Corlette, 1957).This well-documented polyteny cycle lasts for 5 or 6 days.(Machado-Santelli andBasile,1973, 1975).Due to its long duration, it is possible to precisely follow the replication and transcription processes.However, several structural characteristics of these unusual chromosomes present in Sciaridae are still unknown, such as the amplification control elements.
We started to focus on the histone class of genes as an approach to better understand chromatin structure aspects in the salivary glands of Rhynchosciara, and to provide molecular markers for this species.The replication-dependent histones and their variants are common chromatin components of great interest because of their involvement in the modulation of the chromatin transcriptional status and with the replication process.
Data from several laboratories have implicated regulatory regions of the histone genes as targets for factors acting under the control of the Cdk2/cyclin E complex (Ewen, 2000).In mammals, the factor NPAT, identified as substrate of that complex, is a linker element between cell cycle regulators and histone gene transcriptional activation (Zhao et al., 2000;Gao et al., 2003).Identification of Cajal bodies (CB) participating in this process, as well as the Stem Looping Binding Protein (SLBP), brought new elements to the understanding of the cell cycle-dependent activation of the transcriptional program (G1-S transition) (Marzluff, 2005).
Histones, as the main chromatin organizing proteins, are extremely conserved among eukaryotes.Due to their slow rate of modification through evolution they became landmarks for phylogenetic analysis of distant organisms (Thatcher and Gorovsky, 1994).Usually, histone genes are arranged in two forms: (1) in a cluster containing the five histone genes repeated in tandem several times, such as in Drosophila and Chironomus, or (2) dispersed in the genome forming incomplete clusters or as single, isolated genes, like in humans (Matsuo and Yamazaki, 1989;Nagoda et al., 2005;Hankeln and Schmidt, 1991).
Previous studies have identified the R. americana histone gene cluster using a D. hydei probe, and semiquantitative Southern blotting experiments were able to estimate its copy number as approximately 150 tandemly arranged units.Fluorescent in situ hybridization analysis localized its chromosomal site in region 13 of chromosome A, near the pericentromeric region (Santelli et al., 1996).
To advance the study of R. americana histone genes, we now report the complete sequence of the R. americana histone gene cluster and the description of its canonical and putative regulatory elements.The data are also compared with counterparts in the genus Drosophila genus, due the phylogenetic proximity between the two genera.

Library construction
A recombinant phage lDASHII (Stratagene) -RaHis (Santelli et al., 1996) previously isolated using a Drosophila hydei histone probe was cut with NotI restriction enzyme and the isolated insert was extracted with phenol.Approximately 50 mg of this DNA was sheared using the nebulization method (shotgun strategy) and fragments were separated in low melting point agarose and size selected (1.0 -1.5 kb).Approximately 2 mg of DNA were then treated with Klenow/PNK enzyme (Amersham/Pharmacia) in a blunting reaction and cloned in pUC18 (Sigma) using T4 DNA Ligase (Gibco).

Sequencing and assembly
Recombinant E. coli DH5a selected by IPTG/XGAL grown in 1 mL of Circle Grow (USB) medium were pelleted (2,200 x g for 10 min) in microplates, following SDS/NaOH (10%/0.2M) lysis and potassium acetate (3 M) debris precipitation.The lysate was filtered using a MAGV N22 filter plate (Millipore).The plasmid solution was precipitated with isopropanol, washed with ethanol 80% and finally suspended in 30 mL of 10 mM Tris.Sequencing reactions were performed in a PCR GeneAmp 9700 thermocycler (Perkin Elmer), using BigDye terminator technology (96 °C for 1 min, 35 cycles of 96 °C for 20s, 50 °C for 1min and-60 °C for 4 min).The samples were precipitated with ethanol/Na acetate (3M) supplemented with glycogen (1g/L) and run in an ABI 377 automatic sequencer (Applied Biosystems).
The electropherograms were analyzed and assembled using Phred, Phrap and Consed software packages (Ewing et al., 1998;Ewing and Green, 1998;Gordon et al., 1998).We adopted the standard Phrap assembly parameters and considered a sequence as of high quality if it had at least 150 contiguous bp with quality ³ 20 inferred by Phred.The error estimated by the Phred/Phrap/Consed suite for this assembly was 1/10,000 bp.

Sequence analysis
The initial analyses were carried out through Genscan software (Burge and Karlin, 1997), posterior sequence comparisons, dot-maps and pattern searches were performed with Emboss packages (Rice et al., 2000).ClustalX was utilized to build alignments (Thompson et al., 1997).
The sequences of histone gene clusters from Drosophila hydei (X17072), Drosophila melanogaster (X14215) and Chironomus thummi (X56335) were downloaded from NCBI and used for comparisons.

Quantitative PCR
The repeat unit copy number was estimated through real time PCR.The primers H2Aq1 (TTTCGGCAAT AGGACTGCTT) and H2Aq2 (ATTGGAATTGGCTGG TAACG) were designed to amplify a histone cluster fragment and RaCHAP-Q1 (GCTTAACAAATGAATCA GTC) and RaCHAP-Q2 (ACTCATTAAACAAAAG GTCA) to amplify a T-complex Chaperonin 5 fragment (Rezende-Teixeira, personal communication), a gene present in single copy in Drosophila.The SYBR Green reactions were assembled as instructed by the kit manufacturer (ABgene), with template DNA extracted from testis of adult flies according to Rezende-Teixeira et al. (2012).Amplification rates were estimated considering Chaperonin as a single copy gene also in Rhynchosciara, using the mathematical method described by Pfaffl (2001).
The experiments were performed in triplicate and dilutions of the cloned histone cluster itself served to establish the reference curve (all the 5.1 kb cloned in pBluescript KS, Stratagene).The primers listed above were designed to produce amplification products containing amplicons of 100-200 bp, all of which were specific products, which was especially important in the case of histone RaH3, which shares many similarities with RaH3.3 (Siviero et al., 2006).

In situ hybridization
Salivary gland chromosomes fixed with ethanolacetic acid (3:1) were squashed in 45% acetic acid.The coverslips were removed by freezing the slides on dry ice.The preparations were then washed in PBS, denatured for 3-5 minutes in NaOH (0.07 M), washed in a series of alcohol and dried.Probes were labelled using the dUTP-digoxigenin Random-primer kit (Roche) according to manufacturer instructions and denatured by heating just before the hybridisation step.The hybridized probes were revealed by fluorescein labelled anti-digoxigenin serum (Roche) and the chromosomes were counterstained with propidium iodide.The preparations were analysed in a laser scanning confocal microscope, LSM-510 (Zeiss), and were considered positive the regions labelled in most of the chromosome optical sections.

General sequencing results
A large insert of the histone gene cluster repeat was sub-cloned from a previously isolated recombinant phage and sequenced by shotgun technique.The histone repeat unit in R. americana has 5,131 bp (GenBank accession number: AF378198), containing single sites for different restriction enzymes: for EcoRV in the RaH1-RaH3 spacer; for HindIII in the RaH3 coding region, and for PvuI in the RaH2B portion (Figures 1 and 2).As commonly observed in other histone clusters, the intergenic spacers are AT-rich.The orientation of the individual histone genes is similar to those in Drosophila melanogaster and Drosophila hydei, except for the H1 gene (Figures 1 and 2).The R. americana cluster also shares other similarities with its D. hydei homologue: like its length, which is only 22 bp shorter, a CG content, 37% for R. americana and 38% for D. hydei (Table 1), and a strong nucleotide identity in the coding regions (Table 2).
Through quantitative real time PCR it was possible to obtain an estimate of 159 copies of the histone cluster unit (± 24 Median Absolute Deviation) in the haploid genome of R. americana, which is similar with observations for the genus Drosophila (110-150 copies).

5' UTR elements
Among the predicted promoter regions, canonical TATA-boxes were found only upstream of the RaH3, RaH4 and RaH2B genes, while RaH1 and RaH2A presumptive promoters have TATA-like sequences.In the putative leader sequences of the genes RaH1, RaH3, RaH4 and RaH2B, the element AGTGAA (or AGTG-like), was found near the start codon of RaH3 and RaH2B, and in the promoter region of RaH1, RaH4 and RaH2B.As observed in D. hydei (Kremer and Hennig, 1990), only RaH3 showed an AGTGA block near the start codon (Figure 1).These AGTG-like elements were suggested to have a specific function related to histone expression in the genus Drosophila (Matsuo and Yamazaki, 1989;Matsuo, 2003).This hypothesis is now reinforced by the presence of the same AGTG-like elements in the R. americana histone repeat unit.The present data actually suggest that the role of these elements can be extrapolated for the Sciaridae family as a whole.

3' UTR elements
In the 3' untranslated region (3' UTR) of each histone gene one palindrome or near-palindrome sequence block was found.These are the hairpin formers (Figures 1 and 3) involved with the regulation of mRNA processing of the cell-cycle dependent histone genes (Birchmeier et al., 1982).The inverted repeat present in RaH1 gene is more divergent when compared with the palindromes of the other core histones genes, as was also observed in D. melanogaster and D. hydei (Kremer and Hennig, 1990).The RaH1 distance from the stop codon (48 bp) is only slightly longer than the one observed in the core histones genes (RaH3: 23 bp; RaH4:40 bp; RaH2A:44 bp; RaH2B: 39bp), and is almost half the distance when compared to the Drosophila homologs.
Interestingly, with the exception of RaH4, these inverted repeats found in R. americana can be grouped by similarity with their homologs in the Drosophila genus (Figure 3), indicating a possible common origin of these elements.The palindromes for H3 histones from R. americana and H3/H4 from D. melanogaster and D. hydei are a perfect match.The inverted repeat from RaH4 is imperfect, it is 1 bp shorter in the 5' extremity and similar to the palindromes found in the H2B genes of these three insects.Identity between hairpins of the H4 and H2B genes is also observed in Chironomus thummi, which is only a single base different from the correspondent hairpin in Rhynchosciara.
Interestingly, the element CAA(T/G)GAGA, which is common in the genus Drosophila and is related to the binding of snRNA, or other similar consensus blocks were not found in R. americana (Mowry and Steitz, 1987;Soldati and Schumperli, 1988).Further research will thus be needed to determine whether R. americana is able to use another sequence to bind snRNAs, or whether it has a mechanism for mRNA maintenance which is somewhat different from the usual one.
Poly-A signals were predicted for the five histone genes, but the prediction algorithm scores were too low for RaH1, RaH3 and RaH2A, so they will not be considered here.The poly-A signals found in RaH4 and RaH2B corresponded to the consensus sequence AATAAA, however no 3' U-rich regions were observed, even putatively down- stream CA poly-adenylation sites being present (Manley and Takagaki, 1996).In the annelid C. variopedatus, canonical poly-A signals were found for the five genes in the histone cluster (del Gaudio et al., 1998).The authors hypothesized that the presence of these termination signals in present in all ancestral histones was substituted during the evolution by the hairpin structure, which corresponds to the observation that these signals are unequally distributed among the histone genes in the Rhynchosciara cluster.Double termination signals were reported for the H2B, H3 and H4 genes in D. melanogaster by Akhmanova et al. (1997).Based on the present data, however, we cannot conclude whether these poly-adenylation signals are actually functional in R. americana.

Other interesting elements
Strikingly, the genes RaH4 and RaH2A share almost the same 44 bases region in the 5' portion (84% identity), starting 5 bp upstream of the start codon (GAAAA) and encompassing the coding segment for the first 13 amino acids, which are, as expected, also very similar (Figure 1A).This identity level is not observed in the same portions of the clusters from D. melanogaster, D. hydei or C. thummi.
These 44 bases, including the 5 non-coding base pairs, may reflect the a common origin of these genes, or at least the same origin of this initial domain.
Other noteworthy elements are two identical sequence blocks in the 3' UTR of RaH2A and RaH2B (AATAATAATAATACA), and two similar blocks in the 3' UTR of RaH4 (AAATAAATAATACA) and RaH1 (AATAATATAATACA), all positioned 38-45 bp downstream of the palindromes.These elements resemble A-boxes, and their position suggests that they may play a role in mRNA termination.

Coding regions
Codon usage determined for the histone genes of R. americana (Tables S2-S7) shows several points of similarity with the histone codon usage for D. hydei.A few differences lay on the Rhynchosciara preferential codons: UUG (Leu), CCA and CCG (for Pro), AAA (Lys), CAA (Gln) and GGU (Gly).The only stop codon used in these genes is UAA.
The GC content in the coding regions of the Rhynchosciara histone gene cluster ranges from 44.2% (RaH1) to 47.9% (RaH4) with a mean value of 45.9% (Supplementary material, Tables S2-S7).These values are lower than those observed in D. melanogaster (mean 51.4%) and D. hydei (mean 48.9%).The GC content at the 3 rd codon bases (GC3s) is also much lower than that observed in the genus Drosophila, 38.4% (average) (Table 3 and Table  S1); the lowest GC3s is 34.6% in the RaH3, lower than the GC3s from histone H3 of 16 Drosophila species analyzed by Matsuo (2003).In that study the author proposes that a reduction in the population size of a common ancestor of D.  hydei and D. americana may have relaxed the selective pressure, increasing the frequency of substitutions at the 3 rd codon bases for A and T.However, the intriguingly low GC3s content in Rhynchosciara suggests two possible explanations: 1) the genera Rhynchosciara and Drosophila would share a common ancestor, so the same conditions of selection pressure and relaxation that occurred for D. hydei and D. americana could affect R. americana; or 2) since Rhynchosciara belongs to an ancient branch divergence in Diptera, the common ancestor could already have had a low GC3s content at this locus.

Histone proteins
In other organisms, several motifs and domains were identified in histone DNA and protein sequences, such as promoters, nuclear localization signals (NLS) and functional histone domains.In R. americana, at least for H1, H3 and H2B, we have identified NLS motifs, and the linking motif of H1 fits all the parameters assumed for others sequences.As expected, all the core characteristic domains for this class of proteins (histone domain/InterPro Search) are present and are nearly identical when compared to Drosophila and Chironomus.

Localization of the histone repeat in Rhynchosciara
To determine the possible existence of other histone clusters spread across the R. americana chromosomes, fluorescent in situ hybridization assays were performed.These assays evidenced a single positive band in region 13 of chromosome A when using the repeat unit as probe.The same band was seen under a wide range of stringency conditions (Figure 4).This result suggests the inexistence of other clusters for the replication-dependent histone genes, as is also the case in D. virilis (Nagel and Grossbach, 2000) and in D. americana (Nagoda et al., 2005).This puts in evidence a strong synteny for the histone gene cluster in the genus Rhynchosciara, as the same locus was reported to contain the histone cluster in R. baschanti and R. hollaenderi (Stocker and Gorab, 2000).Southern blot analysis of DNA from salivary glands from a recently discovered species, Rhynchosciara sp., showed a unique HindIII 5.1 kb band labeling (result not shown) that corresponds to Siviero et al. 585  These results indicate that the unique cluster present in the 13A locus, containing all the replication-dependent histone genes, may be a common feature among Rhynchosciara species.

Transcription profile in salivary glands
To analyze the role of these canonical histones during polytene chromosome development, we determined the transcriptional profile in salivary glands of each of the histone genes present in the cluster of Rhynchosciara.The expression levels were measured by absolute quantification of the respective transcripts.We believe that this information can contribute to a better understanding of chromatin alterations occurring during the gene amplification or polytenization processes, and to the understanding the role of histone variants such as RaH3.3 detected in this tissue (Siviero et al., 2006).
The expression profiles (Figure 5) show that all histones have greatly increased transcription levels from the moment on when cocoon spinning starts.This was expected, since this period encompasses the last polytenization cycle and the beginning of gene amplification in certain puffs.
Interestingly, the RaH4 mRNA levels were much higher than those for the other histone genes at all stages an-586 Rhynchosciara histone genes cluster  alyzed.This may reflect an increased consumption of protein due to a higher turnover, or it may be a consequence of a shorter half-life of this mRNA, since in Rhynchosciara this gene has an imperfect hairpin in its 3' UTR portion.The lowest detected mRNA levels were observed for RaH2A and RaH2B, which may be a direct consequence of the use of other variants of these proteins (Park and Luger, 2008;Lindner, 2008), very common in other organisms, or can be caused by a longer half-life of the mRNA.
While the nucleosomal histones (core histones) presented a transcription peak in the initial cocoon-spinning period (IR), with a consequent decrease in other stages, the Histone RaH1 showed a higher level of transcription in the period of 3C puff expansion.Even in prepupae, the RaH1 levels were higher than during IR.This pattern of transcription may be related to chromatin modifications necessary for gland histolysis and later metamorphosis, such as higher chromosome condensation (Brandão et al., 2014).

Final remarks
The characterization of the R. americana histone gene cluster adds new insights to this evolutionarily and genomically poorly studied system.These data indicate that R. americana has a single cluster of the five histone genes repeated approximately 159 times.Sequencing and analysis of the histone clusters from the other known Rhynchosciara species (R. baschanti, R. hollaenderi, R. milleri, R. papaveroi and R. sp) will now be greatly facilitated and could introduce important molecular data for an evolutionarily much more ancient dipteran genus than Drosophila, possibly providing relevant data on the evolution of the Diptera branch as a whole.

582
Figure 1 -The Rhynchosciara americana histone genes.A) Repeat unit sequence, highlighted are putative control elements.B) Restriction map and GC content.

Figure 2 -
Figure 2 -Orientation of histone genes in clusters from different organisms.

Figure 4 -
Figure 4 -In situ hybridization using histone gene repeats as probe labeled with digoxigenin and revealed by a FITC-antidigoxigenin antibody, indicating a single locus in region 13 of chromosome A. A) four chromosomes from a single salivary gland cell.B) Split image of chromosome A showing labeled region A13.C) Chromosome counterstained with propidium iodide.D) merged images.

Figure 5 -
Figure 5 -Transcription profiles of the Rhynchosciara americana canonical histone genes during larval development through absolute quantification.The vertical axis shows transcript numbers for each developmental period: 2P -second period of the fourth Instar; IR -3 rd period of the fourth instar, corresponding to initial cocoon spinning; B -4 th period of the fourth instar, corresponding to Puff 2B formation; 5 th period of fourth instar corresponding to Puff 3C formation; and 6 th period of fourth instar corresponding the prepupal stage.Shown are means and standard errors of the mean for the triplicate samples.

Table 2 -
Nucleotide identities between R. americana and other Diptera histone coding regions and comparison among histone proteins from these organisms, denoting identity and similarity.(Dm-D.melanogaster, Dh -D.hydei, Ct -C.thummi)

Table 3 -
Bases distribution in the 3 rd codon position and GC content of the coding region and in the 3 rd codon base.
the size of the R. americana repeat.The new Rhynchosciara species was collected in Ubatuba, SP, and its proper description is in progress (Machado-Santelli G.M., unpublished results).