Identification and in silico analysis of the Citrus HSP 70 molecular chaperone gene family

The completion of the genome sequencing of the Arabidopsis thaliana model system provided a powerful molecular tool for comparative analysis of gene families present in the genomes of economically relevant plant species. In this investigation, we used the sequences of the Arabidopsis Hsp70 gene family to identify and annotate the Citrus Hsp70 genes represented in the CitEST database. Based on sequence comparison analysis, we identified 18 clusters that were further divided into five subgroups encoding four mitochondrial mtHsp70s, three plastid csHsp70s, one ER luminal Hsp70 BiP, two HSP110/SSE-related proteins and eight cytosolic Hsp/Hsc70s. We also analyzed the expression profile of each Hsp70 transcript by digital Northern in different organs and in response to stress conditions. The EST database revealed a distinct population distribution of Hsp70 ESTs among isoforms and across the organs surveyed. The Hsc70-5 isoform was highly expressed in seeds, whereas BiP, mitochondrial and plastid Hsp70 mRNAs displayed a similar expression profile in the organs analyzed, and were predominantly represented in flowers. Distinct Hsp70 mRNAs were also differentially expressed during Xylella infection and Citrus tristeza virus infection as well as during water deficit. This in silico study sets the groundwork for future investigations to fully characterize the function of the Citrus Hsp70 family and underscores the relevance of Hsp70s in response to abiotic and biotic stresses in Citrus.


Introduction
The Hsp70 family of stress-related proteins comprises a series of highly conserved ATPase molecular chaperones that are distributed ubiquitously in all prokaryotes and in the major subcellular compartments of eukaryotes. They have been implicated as key mediators of the correct folding of newly synthesized proteins, translocation of proteins across membranes, assembly and disassembly of multimeric protein complexes, as well as disassembly of stress-induced protein aggregates and targeting of abnormal proteins to degradation (Hartl, 1996; Frydman, 2001). Hsp70s were first described as heat shock proteins induced by a rapid increase in temperature, but subsequently were recognized as general stress-related proteins that respond to a variety of stress conditions and are also required to assist proper folding of proteins under normal temperature. In general, the constitutively expressed members of the Hsp70 family, referred to as Hsc70 (heat shock cognate), assist in the proper folding of newly synthesized proteins and the translocation of proteins into organelles, whereas the stress-induced members are involved in the re-folding and degradation of misfolded proteins under adverse environmental conditions (Hartl, 1996; Miernyk, 1997; Frydman, 2001). Furthermore, eukaryotic cytosolic Hsp70s also play a role in controlling the activity of regulatory proteins, such as the transcriptional repression of HSF (heat shock factor) (Shi et al., 1998) and the modulation of steroid receptor function (Morishima et al., 2000).
The Hsp70s are structurally organized into a highly conserved ATPase domain at their N-terminal portion and a peptide-binding domain at their C-terminus. Successive cycles of peptide binding and release are coupled to ATPase activity, which requires the participation of co-chaperones such as DnaJ/Hsp40 and GrpE (Bukau and Horwich, 1998).
Among Hsps, the Hsp70 family encompasses the most conserved members, exhibiting about 50% sequence identity between representatives of such diverse organisms as E. coli (DnaK) and eukaryotes (Hsp70s). In the same eukaryotic organism, Hsp70s from different compartments maintain high conservation of sequence and, in general, the function of the distinct isoforms may be predicted according to their subcellular localization.
In plants, several Hsp70 genes have been identified in the last decade (Boston et al., 1996; Guy and Li, 1998; Vierling, 1991) but a global picture of the plant Hsp70 family has emerged with the completion of the genome sequencing of Arabidopsis thaliana (Sung et al., 2001a; 2001b; Borges et al., 2001; Cagliari et al., 2005). The fully sequenced Arabidopsis genome contains 18 identified Hsp70 genes, which, based on subcellular localization and sequence identity, have been categorized into five subfamilies: the cytosolic Hsp/Hsc70s, the plastid cpHsc70s, the mitochondrial mtHsc70s, the endoplasmic reticulum (ER) luminal BiPs and the subfamily of Hsp91. The plant Hsp/Hsc70 subfamily is highly conserved, encompassing homologous genes that share more than 75% sequence identity. The members of both plastid and mitochondrial Hsc70 subfamilies are most related to DnaK, the bacterial Hsp70 homolog, whereas the ER luminal BiPs and Hsp70s from other compartments are closely related to the cytosolic Hsp70s (Wimmer et al., 1997). Of the 18 genes in Arabidopsis, 14 are E. coli DnaK homologs and four are HSP110/SSE homologs (Lin et al., 2001).
Although the Hsp70 family maintains highly conserved structural and functional features across evolutionarily distant organisms, some structural characteristics are unique to plants. These include the presence of multiple BiP genes in the genome of some plant species and the presence of a predictive localization motif for organellar Hsp70 proteins at the C-terminus of the plant genes (Sung et al., 2001a). In fact, plant BiP subfamilies have been identified in tobacco (5 genes), in soybean (4 genes), in Arabidopsis (3 genes), in Eucalyptus (5 genes) and in maize (at least 2 genes) (Denecke et al., 1991; Wrobel et al., 1997; Cascardo et al., 2000; 2001; Fontes et al., 1991; Sung et al., 2001b; Cagliari et al., 2005). As for plant Hsp70 localization motifs, they are contained in the unique and highly conserved C-terminal region of each Hsp70 from the same subgroup. They correspond to the EEVD motif for the cytosolic Hsp70, the HDEL sequence for the ER-resident Hsp70s (BiPs), the PEAEYEEAKK sequence for mitochondrial Hsp70s and the PEGDVIDADFTDSK motif for the chloroplast-localized homologs (Guy and Li, 1998). While the HDEL sequence functions as an ER retention signal, the other conserved sequences appear to act as an interaction site for co-chaperones that are specific to the organelles in which the Hsp70s are located.
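The C-terminal signature motifs listed above lend themselves to a simple lookup. The following is a minimal sketch, not the procedure used in this study: the function name and the 30-residue search window are illustrative assumptions, and only the four motifs and their compartments come from the text.

```python
# C-terminal signature motifs as listed in the text (Guy and Li, 1998).
CTERM_MOTIFS = {
    "EEVD": "cytosol",
    "HDEL": "endoplasmic reticulum",
    "PEAEYEEAKK": "mitochondrion",
    "PEGDVIDADFTDSK": "chloroplast",
}


def predict_compartment(protein_seq, window=30):
    """Return the compartment whose motif occurs in the last `window` residues.

    `window` is an illustrative assumption, not a value from the study.
    Returns None when no motif is found, for example for a contig that
    covers only the N-terminal region of the protein.
    """
    # Drop a trailing stop-codon symbol, then keep only the C-terminal tail.
    tail = protein_seq.upper().rstrip("*")[-window:]
    # Check longer motifs first so a short motif cannot shadow a longer one.
    for motif in sorted(CTERM_MOTIFS, key=len, reverse=True):
        if motif in tail:
            return CTERM_MOTIFS[motif]
    return None
```

A motif match of this kind only suggests a compartment; truncated contigs lacking the C-terminus return None and would still need a dedicated localization predictor.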
Biotic and abiotic stresses, such as salinity, temperature extremes, oxidative stress, and fungal and virus infection, represent major constraints for agriculture and are responsible for 50% of crop loss worldwide (Wang et al., 2003). The elucidation of the mechanisms by which plants respond to stresses and acquire tolerance is highly relevant for breeding programs. Since Hsp70 plays a crucial role in protecting against stresses and in the re-establishment of cellular homeostasis, the characterization of the plant Hsp70 family and its complex regulation is likely to lead to new strategies to enhance crop tolerance to environmental stress. Consistent with this observation, overexpression of Hsp70 has been shown to be positively correlated with the acquisition of thermotolerance and with an increase in tolerance to water deficit and salt stress (Lee and Schöffl, 1996; Leborgne-Castel et al., 1999; Alvim et al., 2001; Sung and Guy, 2003). However, the underlying mechanisms of the Hsp70-mediated multiple cross tolerance and the role of Hsp70 during stress conditions have not yet been completely deciphered. Given the differential regulation of the plant Hsp70 gene family and the likely functional diversity of its members, a comprehensive characterization of the Hsp70 family in a plant species is required to understand how Hsp70s contribute to cell function and protection against specific environmental stressors. Here, we used an in silico approach to identify the Citrus Hsp70 genes and characterize the expression pattern of this family. We searched the CitEST database for tissue-specific abundance of Hsp70 mRNAs in libraries from different tissues and under different stress conditions. We identified at least 18 Hsp70 sequences from Citrus and showed that the Hsp70 isoforms display a diverse pattern of tissue-specific expression and are differentially regulated in response to biotic and abiotic stresses.

Material and Methods
In this investigation, we analyzed the CitEST database (http://biotecnologia.centrodecitricultura.br) to identify clusters that potentially encode Hsp70s. For the sequence analysis we used the BLAST program (Altschul et al., 1997) and we searched for each Citrus HSP70 gene using an Arabidopsis representative sequence of each Hsp70 subgroup (AT5G42020.1, AT1G16030.1, AT5G49910.1, AT1G79930.1, AT3G12580.1, AT4G37910.1) as prototypes. The searches were performed using tBLASTn with an expected value (E) cutoff of E-60. Partially overlapping ESTs were assembled into contigs and each derived cluster was considered a potential gene. The tentative consensus DNA sequences derived from the overlapping ESTs were used as prototypes for a new search cycle, which was performed using BLASTn with an expected value cutoff of E-60.
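The E-value filtering step described above can be illustrated with a short sketch. It assumes hits are available as tab-separated text in the default 12-column layout of BLAST+ tabular output (-outfmt 6); the study does not state which output format was used, and the function name and EST identifiers below are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6); assumed here.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, max_evalue=1e-60):
    """Return {query id: set of subject ids} for hits with evalue <= max_evalue."""
    hits = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(COLUMNS, row))
        if float(record["evalue"]) <= max_evalue:
            hits.setdefault(record["qseqid"], set()).add(record["sseqid"])
    return hits


# Hypothetical example: one hit passes the E-60 cutoff, one does not.
example = (
    "AT5G42020.1\tEST_0001\t88.5\t200\t23\t0\t1\t200\t3\t602\t1e-75\t280\n"
    "AT5G42020.1\tEST_0002\t40.0\t90\t54\t2\t10\t99\t1\t270\t2e-12\t65\n"
)
selected = filter_hits(example)
```

The ESTs retained for each Arabidopsis prototype would then be passed on to contig assembly, which this sketch does not cover.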
The possible genes were translated and the deduced protein sequences were analyzed. The percentage of amino acid sequence identity was determined using the ClustalW program (Thompson et al., 1994) or the LALIGN program (Pearson, 1990). To generate the dendrogram, the sequence alignments were obtained in PHYLIP format with ClustalW and the tree was generated using the Drawgram program (Thompson et al., 1994). The subcellular localization of the HSP70 gene products was predicted using the PREDOTAR software (http://urgi.infobiogen.fr/predotar/predotar.html). To evaluate gene expression in different libraries, in silico transcriptional profiling (digital Northern analysis) was performed by counting the ESTs sequenced for each annotated Hsp70 gene and dividing that count by the total number of ESTs sequenced in each library. The relative frequency of each gene was expressed over a library size corrected to 10,000 ESTs and the differential gene expression was statistically analyzed as described by Audic and Claverie (1997).
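The digital Northern normalization and the Audic and Claverie (1997) statistic can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in this study: the function names are hypothetical, and the text does not specify how the reported probability levels were derived from the test.

```python
import math


def ests_per_10k(est_count, library_size, scale=10_000):
    """Relative frequency of a gene, scaled to a library of `scale` ESTs."""
    return est_count * scale / library_size


def audic_claverie_pmf(y, x, n1, n2):
    """P(y | x) following Audic and Claverie (1997).

    Probability of observing y ESTs of a gene in a library of n2 ESTs,
    given that x ESTs of the same gene were seen in a library of n1 ESTs.
    Computed in log space to avoid overflow of the factorials.
    """
    ratio = n2 / n1
    log_p = (y * math.log(ratio)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1 + ratio))
    return math.exp(log_p)


def prob_at_least(y, x, n1, n2):
    """P(Y >= y | x), the upper tail of the distribution above."""
    return 1.0 - sum(audic_claverie_pmf(k, x, n1, n2) for k in range(y))
```

A small upper-tail probability for the count in the second library indicates that the gene is over-represented there relative to the first library; the threshold applied to that probability is a choice left to the analyst.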

Results and Discussion
The Citrus Hsp70 multigene family comprises at least five subfamilies
The analysis of the CitEST database, which was generated through sequencing of ESTs from cDNA libraries prepared from different tissues and species of Citrus (Citrus sinensis, Citrus reticulata, Citrus limonia, Citrus aurantium and Citrus aurantifolia), suggests the existence of at least 18 Hsp70 genes in the Citrus genome (Table 1). As the libraries displayed great variation in size and tissue representation among the species, the generated data were from a collective and global analysis. While this strategy produced statistically meaningful data, it did not rule out the possibility that the highly conserved isoforms were allelic variations of the same gene present in distinct Citrus species. Based on sequence comparison (Figure 1), prediction of subcellular localization (Table 1) and the presence of characteristic domains of the different HSP70 subgroups from other plant species (Sung et al., 2001a), the Citrus Hsp70-encoded proteins were grouped into five distinct classes or subfamilies. The four major subfamilies are represented by eight cytosolic Hsp/Hsc70, one ER-resident BiP, four mitochondrial mtHsp70 and three csHsp70 from the chloroplast. The fifth subgroup is represented by two Hsp110/SSE-related high molecular weight Hsp70s (Table 1). Of the 18 probable genes in Citrus, only four represent full-length sequence contigs, Hsc70-5, mtHsp70-1, csHsp70-1 and Hsp70-1 (Table 1), and in the BiP-1 sequence, a few amino acid residues are missing at the N-terminal region, as compared to the Lycopersicon esculentum BiP deduced protein sequence (GenBank protein accession number gi:1346172).
The Citrus Hsp70 categorization was consistent with the dendrogram shown in Fig. 1, which was constructed by comparing the Citrus full-length Hsp70 sequences with the Arabidopsis counterpart of each subgroup (http://www.arabidopsis.org). Each representative of the Citrus Hsp70 subclasses is clustered together with its Arabidopsis counterpart. Furthermore, the Hsp70s from the mitochondrial subfamily are most closely related to the members of the chloroplast subfamily, whereas BiP is most related to the cytosolic Hsp/Hsc subfamily, indicating that BiP diverged evolutionarily from the members of the other organellar subfamilies (mitochondria and chloroplast). The close relatedness between the plastid and mitochondrial Hsp70s seems to be a characteristic of the plant Hsp70 family, as it has also been observed for the spinach and Arabidopsis Hsp70 families (Guy and Li, 1998; Sung et al., 2001b).
All the other identified contigs represented truncated sequences of Hsp70s. The Hsc70-6, Hsc70-7 and Hsc70-8 genes possess reads that cover only the C-terminal portion of the proteins, whereas the reads of Hsc70-1, Hsc70-3 and Hsc70-4 cover only the N-terminal region of the protein. As the sequences of the N-terminal and C-terminal encoding contigs do not overlap, the possibility remains that some of these partial Hsp70 sequences represent the same gene. Among the different Hsc70 isoforms, the sequence conservation reaches 93%-98% identity using the full-length Hsc70-5 as the basis for comparison (data not shown). Although these truncated Hsc70 isoforms are more closely related to the cytosolic full-length Hsc70-5, and hence a neighbor-joining analysis would predict that they belong to the same cytosolic group, there is not enough sequence information to predict their correct subcellular localization. Nevertheless, the Hsc70-6 and Hsc70-8 isoforms contain an EEVD motif at their C-termini that corresponds to a plant Hsp70 cytoplasm-localization signal (Guy and Li, 1998; Table 1) and, therefore, they may be assigned as cytosolic Hsp70s.
Similarly, for the other subfamilies, we identified C-terminal-covering reads for csHsp70-2 and mtHsp70-3, and also some reads restricted to N-terminal sequences for csHsp70-3, mtHsp70-2 and mtHsp70-4. We did not identify any read that could determine whether any of these terminal sequences complement each other to yield a fully sequenced gene. Among the different mtHsp70 isoforms, the sequence identity varies from 82% to 98% if the full-length mtHsp70-1 is used as the basis for comparison. The isoforms from the chloroplast subfamily share 89% to 95% sequence identity and the two isoforms from the Hsp110/SSE-related subfamily are 95% identical.
Plant Hsp70s form a family of stress-related proteins that has been further subdivided into categories based on amino acid sequence homology and the subcellular locations of the proteins (Sung et al., 2001a). Although the size and complexity of the Citrus Hsp70 family described here closely resemble those described for this family in other plant species (Guy and Li, 1998; Sung et al., 2001b; Borges et al., 2001; Cagliari et al., 2005), the ultimate understanding of the entire complement of Citrus Hsp70 genes depends on the identification of the full-length sequences of all representatives of the family.

Tissue expression pattern of the Citrus Hsp70 genes
For insights into the diverse roles of Hsp70s and to determine whether the different Citrus isoforms are functionally redundant or divergent, we next conducted a digital Northern analysis on the expression pattern of these identified Citrus genes in different organs and in response to abiotic and biotic stresses. We analyzed the frequency of the ESTs of the different Hsp70 isoforms and only those Hsp70s that were represented by at least five reads were considered (Table 1). Figure 2 shows a comparison of relative mRNA prevalence across representative organs. Except for the Hsc70-5 transcript that displayed a statistically significant prevalence in seeds as compared to the other organs analyzed, the transcripts of all the other isoforms were not detected in mature seeds. Similar transcriptional profiling has been observed for the members of the Arabidopsis family in which a cytosolic Hsp70 isoform (Hsc70-12) is predominantly expressed in seeds, whereas the luminal and organellar Hsp70 transcripts are barely detected in mature dry seeds (Sung et al., 2001b). The decline in the expression of Hsp70 genes at later stages of seed development may be offset by the increased expression of other chaperones as in the Citrus cytosolic Hsc70-5 isoform. Alternatively or additionally, it may implicate a minor role of Hsp70 genes in mature seeds. The expression of the luminal BiP has been monitored during seed development in several plant species (Carolino et al., 2003). In general, an increase in BiP expression during seed development parallels the accumulation of seed storage proteins, decreasing rapidly toward the end of seed maturation. This developmental regulation of BiP during seed maturation is coordinated with the cellular secretory activity of seeds and implies a relevant role of BiP in newly-synthesized storage protein folding.
Although we did not observe statistically significant differences in the prevalence of other Hsp70 transcripts across the range of organs surveyed, BiP-1 and Hsc70-6 as well as csHsp70-1 and mtHsp70-1 displayed similar expression patterns in all organs, with transcript levels predominating in flowers. Other Hsp70 isoforms, such as csHsp70-3, Hsc70-4 and Hsp70-2, were dramatically underrepresented in all organs. Thus, considering the level of transcript abundance, several Hsp70 genes display a broad expression pattern consistent with a generalized housekeeping function, whereas other Hsp70 mRNAs are rare and may represent temperature- or other stress-induced forms. The tissue-specific libraries have representatives of different species (Citrus sinensis, Citrus reticulata, Citrus limonia, Citrus aurantium and Citrus aurantifolia) and although they were analyzed collectively, as required to obtain statistically meaningful data, we observed a differential expression of the Citrus Hsp70 family among the species (data not shown). The physiological implications of this finding are unknown and whether the different species display distinct tolerance levels to environmental stresses as a function of Hsp70 expression remains to be determined.

Hsp70 expression during biotic and abiotic stress
We were particularly interested in evaluating the expression of the Citrus Hsp70 gene family in response to physiological stresses that drastically affect Citrus productivity, such as Xylella infection, viral infection and water deficit. The expression pattern of Hsp70 genes from Citrus sinensis and Citrus reticulata during the process of Xylella infection is shown in Figure 3A. BiP-1, csHsp70-3 and mtHsp70-1 were strongly induced during Xylella infection and their response was paralleled by the opposite trend in the cytosolic Hsp70 representatives, whose transcript levels declined upon infection. While induction of csHsp70-3 and mtHsp70-1 may be a direct result of oxidative stress caused by the bacterial infection, BiP induction may reflect the activation of the unfolded protein response (UPR) pathway. This signaling pathway, which is activated by ER stressors that cause the accumulation of unfolded proteins in the endoplasmic reticulum (ER) lumen, triggers the coordinated up-regulation of a set of folding enzymes and ER molecular chaperones, including BiP and the Ca2+-binding protein calreticulin (Kaufman, 1999).
In order to determine whether Xylella infection provoked ER stress and hence activation of the UPR, we searched for the calreticulin ortholog in the CitEST database, as an additional representative of the UPR-activated genes (Martinez and Chrispeels, 2003). Consistent with activation of the UPR, the calreticulin gene is strongly up-regulated by Xylella infection in Citrus reticulata (data not shown), which is considered resistant to the bacterium. In contrast to the Hsp70 expression pattern elicited by Xylella infection, only the mtHsp70-1 transcript displayed a statistically significant increase in the Citrus tristeza virus-infected Poncirus trifoliata leaves (Fig. 3B). The reason for this selective response to viral infection is not clear, but it may be related to an enhanced demand for chaperone function in the mitochondria of infected cells, linked to an exhaustive use of ATP-demanding biosynthetic pathways for replication and expression of the viral genome. Alternatively, it may represent a specific protection mechanism of plant cells against viral infection.
We also searched for Hsp70 transcript abundance in EST libraries prepared from water deficit-stressed root mRNA (Fig. 3C). Both mtHsp70-1 and Hsc70-5 mRNAs are selectively up-regulated in response to dehydration. Induction of mtHsp70-1 may reflect a protective mechanism of plant cells against the deleterious effects caused by the excessive accumulation of reactive oxygen species (ROS) under dehydration (Bartels, 2001). The induction of Hsc70-5 under water deficit may be particularly relevant biologically, as it is also induced during late seed maturation (Fig. 2) when seed dehydration takes place under a desiccation-tolerant regime that retains seed viability. In fact, the pattern of Hsc70-5 expression resembles that of Lea (late embryogenesis abundant) and small Hsp genes involved in seed desiccation tolerance (Ingram and Bartels, 1996; Wehmeyer and Vierling, 2000). Lea proteins and small Hsps accumulate during seed maturation and desiccation, and increase in vegetative tissue when plants are exposed to water deficit. The molecular mechanisms that confer desiccation tolerance and allow seeds to be dried without loss of viability are incompletely defined, although Lea proteins and small Hsps are believed to be involved (Ingram and Bartels, 1996; Wehmeyer and Vierling, 2000). Whether Citrus Hsc70-5 plays a specific role in seed desiccation tolerance or in the general protection of cellular components during plant cell dehydration will require further investigation.
amino acid signature for different subgroups of the Hsp70 protein. aa: number of amino acid residues covered by the cluster/best hit full-length protein; ER: endoplasmic reticulum; ND: not determined.

Figure 1 - Evolutionary relationship (neighbor-joining analysis) of full-length Citrus Hsp proteins with members from different subgroups of the Arabidopsis thaliana Hsp70 protein family. The dendrogram was constructed using the Clustal program. Each shaded area represents one of the five subgroups of the Hsp70 protein family. From the top: cytosolic Hsp/Hsc70, endoplasmic reticulum Hsp70 (BiP), mitochondrial Hsp70, plastid Hsp70, high molecular weight Hsp70 (Hsp110/SSE). The values on the branches indicate the number of bootstrap replicates supporting the branch.

Figure 2 - Expression of the Citrus Hsp70 genes in the different tissues of Citrus. The expression pattern was calculated as indicated in Material and Methods and normalized according to library size. An asterisk (*) indicates an 80% probability of increased Hsc70-5 expression in seeds as compared to the other organs analyzed. A plus sign (+) indicates more than 95% probability of increased expression of Hsc70-5 in seeds as compared to the seed expression of the other Hsp70 members analyzed.

Figure 3 - Expression of the Citrus Hsp70 genes in response to different stresses. (A) Expression pattern in response to Xylella infection. An asterisk (*) indicates an 80% probability of differential expression among the different infection stages. (B) Expression pattern in response to Citrus tristeza virus (CTV) infection. An asterisk (*) indicates an 80% probability of differential expression between infected and uninfected plants. (C) Expression pattern in the Citrus root under drought stress. A plus sign (+) indicates more than 95% probability of increased expression in stressed roots as compared to unstressed roots.

Table 1 - Citrus Hsp70 protein family.