
Survey of glycine-rich proteins (GRPs) in the Eucalyptus expressed sequence tag database (ForEST)

Abstract

The occurrence of quasi-repetitive glycine-rich peptides has been reported in different organisms. Glycine-rich regions are proposed to be involved in protein-protein interactions in some mammalian protein families. In plants, a set of glycine-rich proteins (GRPs) was characterized several years ago, and since then a wealth of new GRPs has been identified. GRPs can have very diverse sub-cellular localizations and functions; the only feature common to all of them is the presence of glycine-rich repeat domains. In several plant genera, the expression of GRP genes is developmentally regulated and is also induced by physical, chemical and biological factors. In addition to this highly modulated expression, several GRPs show tissue-specific localization: GRPs specifically expressed in xylem, phloem, epidermis, anther tapetum and roots have been described. In this paper, the structural and functional features of these proteins in Eucalyptus are summarized. Since this is the first description of GRPs in this species, particular emphasis has been given to the expression pattern of these genes, analyzed through their abundance and prevalence in the different cDNA libraries of the Eucalyptus Genome Sequencing Project Consortium (ForEST). A comparison of GRPs from Eucalyptus and other species is also discussed.

Key words: glycine-rich; GRP; Eucalyptus.


RESEARCH ARTICLE


Silvia Nora Bocca (I); Claudia Magioli (I); Amanda Mangeon (I); Ricardo Magrani Junqueira (I); Vanessa Cardeal (I); Rogério Margis (I, II); Gilberto Sachetto-Martins (I)

(I) Universidade Federal do Rio de Janeiro, Instituto de Biologia, Departamento de Genética, Laboratório de Genética Molecular Vegetal, Rio de Janeiro, RJ, Brazil

(II) Universidade Federal do Rio de Janeiro, Instituto de Química, Departamento de Bioquímica, Rio de Janeiro, RJ, Brazil

Send correspondence to Gilberto Sachetto-Martins, Universidade Federal do Rio de Janeiro, Ilha do Fundão, Instituto de Biologia, Departamento de Genética, Laboratório de Genética Molecular Vegetal, CCS, Bloco A, sala A2-076, 21944-970 Rio de Janeiro, RJ, Brazil. E-mail: sachetto@biologia.ufrj.br


Introduction

Glycine-rich proteins (GRPs) are characterized by the presence of domains that show little sequence conservation and are highly enriched in glycine residues. Typically, these glycine-rich domains are arranged in (Gly)n-X repeats. Although the first genes encoding GRPs were isolated from plants, proteins with characteristic repetitive glycine stretches have been reported in a wide variety of organisms, from cyanobacteria to animals (reviewed in Sachetto-Martins et al., 2000).

The structure and modulation of plant GRP genes have been intensively investigated, showing that these genes are highly regulated during development as well as by several external stimuli. In many cases, their expression pattern has also been shown to be tissue-specific. These characteristics have been the most intensively studied aspects of GRP genes because they point to possible biotechnological applications of their promoters.

Since the first reports describing plant GRPs as cell wall-associated proteins (Showalter, 1993), many other GRPs with different domain organizations and sub-cellular localizations have been reported. This diversity led to the concept that GRPs should not be considered a family of related proteins but rather a broad group of proteins that share a common structural domain (Sachetto-Martins et al., 2000).

The diverse but highly specific expression pattern of grp genes, taken together with the distinct sub-cellular localization of some GRP groups, indicates that these proteins are implicated in several independent physiological processes (Condit, 1993; Keller and Heierli, 1994; Sachetto-Martins et al., 1995; Magioli et al., 2001; Franco et al., 2002). Based on what is known about their general architecture, sequence motifs, sub-cellular localization, and gene expression pattern and modulation, some inferences can be made regarding their function.

GRPs can be classified into four major groups (Figure 1) based on their primary structure (reviewed in Sachetto-Martins et al., 2000 and Fusaro et al., 2001). Class I GRPs are known as classic GRPs. They may contain a signal peptide followed by a glycine-rich region with GGGX repeats, and a structural function is attributed to them because of their cell wall localization (Cassab, 1998). Class II GRPs may or may not have a signal peptide and contain a glycine-rich region followed by a cysteine-rich region at their C-terminus. For one member of this class, AtGRP-3, the cysteine-rich domain has been shown to interact with cell wall-associated receptor kinases (WAKs) (Park et al., 2001). Class III comprises proteins with a lower glycine content and a great diversity of structures. The best known proteins of this class are the oleosin GRPs. Oleosins are alkaline proteins found on the surface of plant oil bodies, where, together with the phospholipid layer, they play a structural role in stabilizing the triacylglycerols. Previous work has demonstrated that many of the major pollen coat proteins are derived from endoproteolytic cleavage of oleosin GRPs that originally accumulate within the large cytoplasmic lipid bodies of tapetal cells (Ferreira et al., 1997; Murphy et al., 2001). Class IV GRPs are RNA-binding GRPs. In addition to the glycine-rich region, they may contain several motifs, including the RNA-recognition motif, the cold-shock domain and zinc fingers (Fusaro et al., 2001).


In this article, a search for GRPs in the Eucalyptus transcriptome is reported. Several GRPs were identified and classified into the major groups previously established. The survey was extended to proteins that, despite not being considered canonical GRPs, contain short glycine-rich domains.

Materials and Methods

Sequence data, alignment and phylogenetic analysis

Protein sequences of reported plant GRPs were used to query the ForEST expressed sequence tag (EST) database with the TBLASTN algorithm (Altschul et al., 1997). Because glycine-rich domains are low-complexity sequences, TBLASTN was run with default parameters but without filtering the query for low compositional complexity. The complete list of sequences used as baits includes the 86 proteins reviewed in Sachetto-Martins et al. (2000), 8 sequences recently described in a complete survey of Arabidopsis glycine-rich RNA-binding proteins (Lorkovic and Barta, 2002), a wheat cold-shock domain GRP (Karlson et al., 2002), a Pinus taeda cell wall GRP (Allona et al., 1998), Arabidopsis UBA2 (Lambermon et al., 2002) and 4 Arabidopsis cold-shock domain GRPs (Karlson and Imai, 2003). Additionally, several GRP sequences recently identified in a complete analysis of a sugarcane EST database were also used as baits (Fusaro et al., 2001). These sugarcane sequences represent each of the different GRP classes and were chosen because they were the least similar to other published GRPs within the complete sugarcane set. All GRP clusters found in Eucalyptus libraries were translated to obtain their putative protein sequences. When the translated ORF showed an evident frameshift caused by an apparent sequencing error, the sequence was edited manually. The protein sequences obtained were used in a second round of TBLASTN searches against the non-redundant protein database at the National Center for Biotechnology Information (NCBI) to identify their closest homologues. Additional domains were detected using the Prosite (http://bo.expasy.org/prosite) and Pfam (http://www.sanger.ac.uk/Software/Pfam/search.shtml) prediction programs. The possible presence of a signal peptide was predicted with the SignalP server (http://www.cbs.dtu.dk/services/SignalP).
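For readers who want to repeat a comparable search, the sketch below shows how it might be scripted today with the NCBI BLAST+ `tblastn` program. This is not the pipeline used in this study; it assumes BLAST+ is installed and that the ForEST clusters have been formatted as a local nucleotide BLAST database. The file and database names (`baits.fasta`, `forest_clusters`) are hypothetical. The option `-seg no` disables SEG low-complexity masking of the query, which would otherwise hide glycine-rich stretches, and the parser groups tabular (`-outfmt 6`) hits by the cluster they hit.

```python
import subprocess
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def tblastn_command(query_fasta, db):
    """Build a tblastn call with SEG low-complexity filtering switched off."""
    return ["tblastn", "-query", query_fasta, "-db", db,
            "-seg", "no",      # keep glycine-rich, low-complexity regions unmasked
            "-outfmt", "6"]    # tab-separated, one hit per line


def clusters_hit(tabular_text):
    """Map each subject (EST cluster) to the set of bait proteins that hit it."""
    hits = defaultdict(set)
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        row = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        hits[row["sseqid"]].add(row["qseqid"])
    return dict(hits)


def run_search(query_fasta, db):
    """Run the search (requires BLAST+ on PATH) and return cluster -> baits."""
    result = subprocess.run(tblastn_command(query_fasta, db),
                            capture_output=True, text=True, check=True)
    return clusters_hit(result.stdout)
```

Keeping command construction separate from execution makes it easy to inspect the exact options before running a long search.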

Multiple alignments of the proteins deduced from the ForEST clusters and of the bait sequences were performed using the ClustalW program (Thompson et al., 1994). Unrooted trees were calculated using the Molecular Evolutionary Genetics Analysis (MEGA) software (Kumar et al., 2000) with the neighbor-joining method and p-distances, using the pairwise deletion option for the treatment of amino acid gaps in the multiple alignments of GRPs. Confidence levels for the nodes of the phylogenetic tree were determined with 2000 replications using the Internal Branch test (Sitnikova et al., 1995).
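The p-distance with pairwise deletion can be made concrete with a short sketch. The Python below is illustrative only; it is not the MEGA implementation, and the toy alignment is invented. For each pair of aligned sequences, columns with a gap in either sequence are skipped, and p is the number of differing residues divided by the number of sites actually compared. The resulting matrix is the kind of input a neighbor-joining program takes; tree building and the internal branch test are not shown.

```python
from itertools import combinations

GAP = "-"


def p_distance(a, b):
    """Proportion of differing residues between two aligned sequences.

    Pairwise deletion: a column is skipped only if this particular pair has a
    gap there, so each pair uses as many sites as possible.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no overlapping sites to compare")
    return differences / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping name -> aligned sequence."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m, n in combinations(names, 2)}
```

With pairwise deletion, different pairs may be compared over different numbers of sites, which retains more information than removing every gapped column from the whole alignment.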

Eucalyptus cDNA libraries

All Eucalyptus sequences used in this work were obtained from the Eucalyptus Genome Sequencing Project Consortium (ForEST) and derive from cDNA libraries specific to different Eucalyptus tissues, organs or growth conditions (for detailed information see https://forests.esalq.usp.br/Librariesinfo.html). The libraries are: BK1 (stem from 8-year-old E. grandis trees); CL1 (E. grandis dark-grown callus); CL2 (E. grandis light-grown callus); FB1 (flower buds, flowers and fruits); LV1 (young plant leaves); LV2 (leaves from adult trees with phosphorus and boron deficiency); LV3 (leaves colonized by Thyrinteina); RT2 (roots from young plants); RT3 (roots from greenhouse-cultivated young plants); RT4 (roots from water stress-resistant young plants); RT5 (roots from water stress-susceptible young plants); RT6 (roots from frost-resistant and frost-susceptible trees); SL1 (dark-grown E. grandis seedlings exposed to 3 h of light); SL4 (dark-grown E. globulus seedlings); SL5 (dark-grown E. saligna seedlings); SL6 (dark-grown E. urophylla seedlings); SL7 (dark-grown E. grandis seedlings); SL8 (dark-grown E. camaldulensis seedlings); ST1 (stem from young healthy plants); ST2 (stem from young plants susceptible to water stress, mRNAs of 0.6 to 2 kb); ST5 (stem from young healthy plants); ST6 (stem from young plants susceptible to water stress, mRNAs of 0.8 to 3 kb); ST7 (stem from frost-resistant and frost-susceptible trees); WD2 (E. grandis wood).

Results and Discussion

Distribution of glycine-rich protein genes in the ForEST database

GRPs were previously subdivided into four major classes according to the presence of conserved domains and the pattern of sequence repeats (Table 1 and Figure 1). Three classes are defined by the pattern of the glycine-rich repeats (class I, GGGX; class II, GGXXXGG; class III, GXGX), while two further groups are defined by the type of conserved functional motif: the oleosin GRPs, a sub-group of class III, and the RNA-binding GRPs of class IV.
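As a rough illustration of the repeat-based criteria, the sketch below counts non-overlapping matches of each repeat pattern in a protein sequence and reports the most frequent one. It is a simplification, not the procedure used in this survey: X is assumed here to be any residue other than glycine, the helper names are hypothetical, and assignment to the oleosin or RNA-binding groups would still require detection of their conserved domains.

```python
import re

# Repeat patterns defining GRP classes I-III. X is assumed to be any
# non-glycine residue ([^G]); matches are counted without overlap.
REPEAT_PATTERNS = {
    "GGGX": re.compile(r"GGG[^G]"),
    "GGXXXGG": re.compile(r"GG[^G]{3}GG"),
    "GXGX": re.compile(r"G[^G]G[^G]"),
}


def repeat_profile(seq):
    """Number of non-overlapping matches of each repeat pattern in `seq`."""
    seq = seq.upper()
    return {name: len(pattern.findall(seq))
            for name, pattern in REPEAT_PATTERNS.items()}


def dominant_repeat(seq):
    """Most frequent repeat pattern, or None if no pattern occurs."""
    profile = repeat_profile(seq)
    best = max(profile, key=profile.get)
    return best if profile[best] > 0 else None
```

For example, a histidine-rich GGGH array such as `GGGHGGGHGGGH` yields `{"GGGX": 3, "GGXXXGG": 0, "GXGX": 0}`.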

Since several GRP genes show tissue-specific expression in other plants, the distribution of the reads from each cluster among the different ForEST libraries was also analyzed (see Tables 2 to 10). The ForEST database comprises 123,889 EST sequences (reads), arranged in 33,080 clusters and derived from 19 different cDNA libraries constructed from different plant tissues under different culture conditions. Clusters found in only one or two libraries were considered to be predominantly expressed in a tissue-specific pattern, and several clusters identified in this search showed this characteristic.
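The one-or-two-library criterion can be expressed in a few lines of Python. In this sketch, reads are assumed to be available as (cluster, library) pairs, and the cluster identifiers in any example are hypothetical; the library codes are those listed in Materials and Methods.

```python
from collections import Counter


def libraries_per_cluster(reads):
    """Count reads per library for each cluster.

    `reads` is an iterable of (cluster_id, library_code) pairs, one per EST read.
    """
    per_cluster = {}
    for cluster, library in reads:
        per_cluster.setdefault(cluster, Counter())[library] += 1
    return per_cluster


def putatively_tissue_specific(per_cluster, max_libraries=2):
    """Clusters whose reads come from at most `max_libraries` distinct libraries."""
    return sorted(cluster for cluster, counts in per_cluster.items()
                  if len(counts) <= max_libraries)
```

Because the filter counts libraries rather than reads, a cluster represented by only one or two reads always passes it, so read depth is worth inspecting alongside the result.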

The search for genes encoding GRPs in Eucalyptus resulted in 153 potential genes (clusters), which were distributed among the classes mentioned above (Table 1). Although no sequences showed the characteristic GGXXXGG repeat pattern, the search retrieved a number of other Eucalyptus sequences with a mixed pattern of repeats (Table 6), among which were clusters with the conserved motifs that characterize dehydrins (Table 7). As expected for an angiosperm with wet-type stigmas, no Eucalyptus ESTs with similarity to oleosin GRPs were found.

The analysis was also extended to twelve other proteins that contain short glycine-rich domains, even though these domains represent only a small proportion of the complete protein (Table 10).

Eucalyptus clusters encoding GRPs with GGGX repeats

GGGX repeats are frequently found in GRPs with a high overall glycine content (40 to 70%) distributed throughout the protein sequence (Table 2). GRPs of this kind usually have a predicted signal peptide at their N-terminal end. The best characterized protein of this class is PvGRP1.8, a structural protein from bean specifically associated with the primary cell walls of elongating protoxylem elements (Keller et al., 1989). Studies using antibodies against PvGRP1.8 indicated that it forms a three-dimensional protein network that stabilizes the protoxylem elements (Ryser and Keller, 1992; Ryser et al., 1997; Ringli et al., 2001).

Thirty Eucalyptus clusters with GGGX repeats were found. Eleven of them encode GRPs that are highly enriched in histidine, resulting in a GGGH repeat pattern (Table 2). Fourteen clusters showed apparent tissue-specific expression, 9 of them being expressed exclusively in one library. Interestingly, two clusters (EGEQWD2247G05.g and EGEZWD2203C11.g) were observed only in libraries prepared from wood tissues, which makes them interesting candidates for studies of wood biogenesis.

As previously noted (Sachetto-Martins et al., 2000; Fusaro et al., 2001), this class of GRPs is a rather heterogeneous set of proteins whose sequence similarity is limited to the repetitive glycine residues. The alignments obtained contained many gaps and regions without sequence overlap, which made the construction of a dendrogram impossible. Functional characterization of members of this class could help establish a clear classification of these proteins.

Eucalyptus clusters encoding GRPs with C-terminal domains rich in cysteine

Some GRPs are grouped together based on the similarity of their N- and C-terminal domains to soybean nodulin 24 (Sandal et al., 1992). Usually, the C-terminal end of these nodulin-like GRPs is cysteine-rich, and their glycine-rich repeats follow the GGXXXGG pattern, with Y, H, R, N or Q as the most frequent amino acids in the tripeptide between the glycine pairs (Sachetto-Martins et al., 2000).
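The residue composition of the tripeptide in GGXXXGG repeats can be tallied with a regular expression. The sketch below assumes X is any non-glycine residue and uses a lookahead so that overlapping repeats sharing a glycine pair are all counted; the function name and example sequence are invented for illustration.

```python
import re
from collections import Counter

# X assumed to be any non-glycine residue; the lookahead (?=...) lets
# overlapping GGXXXGG repeats match, and the group captures the tripeptide.
GGXXXGG = re.compile(r"(?=GG([^G]{3})GG)")


def tripeptide_residue_counts(seq):
    """Tally the residues found in the XXX positions of GGXXXGG repeats."""
    counts = Counter()
    for tripeptide in GGXXXGG.findall(seq.upper()):
        counts.update(tripeptide)  # each residue of the tripeptide counted once
    return counts


print(tripeptide_residue_counts("GGYHRGGNQYGG").most_common(1))  # [('Y', 2)]
```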

The direct interaction of AtGRP-3, a protein belonging to this class of GRPs, with the cell wall-associated kinase WAK1 was recently demonstrated. The interaction occurs between the cysteine-rich C-terminal end of AtGRP-3 and the extracellular domain of WAKs (Park et al., 2001). WAK1 is a member of the WAK receptor kinase family, which links the plasma membrane to the extracellular matrix (Verica and He, 2002). WAKs are proposed to recognize different environmental signals through the interaction of their diverse extracellular domains with cell wall molecules and to transduce those signals into the cell. Wak1 and Atgrp-3 are both induced by salicylic acid treatment. Moreover, exogenously added AtGRP-3 up-regulates the expression of Wak1, Atgrp-3 and PR-1 in Arabidopsis protoplasts. Taken together, these data suggest that AtGRP-3 regulates WAK1 function by binding to its cell wall domain, and that the interaction of WAK1 with AtGRP-3 occurs in a pathogenesis-related process in planta (Park et al., 2001).

Ten clusters encoding GRPs with a Cys-rich C-terminal end were found in the ForEST database (Table 3). None of them shows the GGXXXGG repeat pattern typical of this group of GRPs. To analyze the similarity of these 10 sequences to the reported nodulin-like GRPs, all sequences were aligned and an unrooted tree was constructed (Figure 2). Seven clusters were found to be more closely related to petunia PtGRP-2 and tobacco gGRP-8, two others are closer to a group of GRP sequences from Medicago sativa, and one seems to be more divergent from all previously reported sequences of this group.


Eucalyptus clusters encoding GRPs with GXGX repeats

The GXGX repeat pattern is generally observed in GRPs with an average glycine content of 20%. Like the GGGX group (Table 2), this group shows a high degree of structural diversity and probably includes several different types of GRPs. In Eucalyptus, 46 clusters encoding this type of GRP were identified (Table 4 and Table 5).

As observed for the Eucalyptus sequences with GGGX repeats, several sequences of this group are also rich in histidine, resulting in a GHGH repeat pattern. Three other clusters have Pro/Gly-rich sequences. Sequences that, in addition to the glycine-rich domains, are enriched in other amino acids (arginine, alanine or methionine) were also found (Table 4).

A predicted N-terminal signal peptide, which may reflect an extracellular localization, was observed in twelve of the GXGX Eucalyptus GRP clusters (Table 4 and Table 5).

As with all GRPs grouped only on the basis of their repeat pattern, most of the GXGX GRP sequences form a heterogeneous group of proteins with no significant sequence similarity outside the glycine-rich repetitive domains.

Notably, 3 GRP clusters with GHGH repeats share high sequence identity with a Gly/His-rich protein of an endosymbiotic fungus of Eucalyptus (Table 4). These sequences may therefore represent fungal contamination of the plant mRNA population and should be considered possible non-plant GRPs.

In several species, cell wall-associated proteins with preferential expression in vascular tissues have been reported (Showalter, 1993). GRPs localized in vascular tissues are thought to provide elasticity and tensile strength during vascular development (Cassab, 1998), and most wood quality-related traits are linked to the properties of the cell wall during this process. Despite the economic importance of wood biogenesis, few reports exist to date on the role of cell wall-associated proteins in the development of the vasculature.

A GRP with GXGX repeats from Pinus taeda (Allona et al., 1998; Zhang et al., 2000), as well as its proposed ortholog in Pinus pinaster (Le Provost et al., 2003), was found to be differentially expressed in the xylem of different wood types. It has been proposed that both Pinus proteins, reported as GRPs, might be involved in the determination of wood properties (Le Provost et al., 2003). However, only the Pinus taeda protein (AAB66348) has a high glycine content with a GXGX repeat pattern; the Pinus pinaster protein (AAF75823) was apparently misclassified as a GRP on the basis of its partial similarity to the Pinus taeda sequence. Searching the ForEST database with the Pinus taeda GRP sequence identified a closely related Eucalyptus cluster (EGJEST2212B07.g), with 61% similarity over 119 amino acids, which is also highly similar to the Arabidopsis gene At4g30460 (Table 5). Both the Pinus and the Eucalyptus proteins are rich in glycine and serine and have a predicted N-terminal signal peptide, as expected for a putative cell wall protein. The high degree of conservation between the Pinus and Eucalyptus sequences suggests that the identified Eucalyptus cluster may be the ortholog of the Pinus taeda gene, and that this gene is an interesting candidate for study because of its possible involvement in wood biogenesis in both conifers and angiosperm trees.

Eucalyptus clusters encoding GRPs with a mixed pattern of repeats

In addition to the classic repeats observed in previously described plant GRPs, the ForEST database also contains a set of GRPs with a mixed pattern of repeats (Table 6).

Ten of them encode GRPs with GXGX repeats combined with domains containing 8 to 15 tandem repeats of the pentapeptide GYPPX (where X is usually Q). Strictly speaking, these proteins should be considered glycine/proline-rich proteins (GPRPs). The XYPPX motif is found in a wide variety of proteins, including annexin and the carboxy-terminal tail of certain rhodopsins. The motif was proposed to form polyproline beta-turn helices, but its molecular function is unknown (Matsushima et al., 1990). Eucalyptus sequences with GYPPQ repeats may be functionally related to PtaADH1 (AF101786), a proline-rich sequence from Pinus taeda recently characterized as a cell wall structural protein with GYPQ repeats. The observation that PtaADH1 mRNA is mainly expressed in vascular tissue and that its expression is modified in different types of wood led to the proposal that it may be involved in wood biogenesis (Zhang et al., 2000).
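Tandem arrays of the GYPPX pentapeptide can be measured with a repeated regular-expression group. In the following illustrative sketch, X is allowed to be any residue and the helper name is hypothetical; the function returns the number of units in the longest uninterrupted array, which for the Eucalyptus GPRP clusters described above would fall between 8 and 15.

```python
import re

# One or more consecutive GYPPX units; X is any residue (usually Q).
GYPPX_RUN = re.compile(r"(?:GYPP[A-Z])+")


def longest_gyppx_run(seq):
    """Number of units in the longest uninterrupted GYPPX tandem array."""
    runs = [len(m.group(0)) // 5 for m in GYPPX_RUN.finditer(seq.upper())]
    return max(runs, default=0)
```

A single substitution that breaks the GYPP core ends the run, so degenerate units would split one array into two shorter ones.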

Fifteen other sequences show a mixed pattern of GGGX and GXGX repeats and share identity with dehydrins (Table 7). Dehydrins are classified as group 2 late embryogenesis abundant (LEA) proteins (Wise, 2003) and are also termed responsive to abscisic acid (RAB) proteins. They form a subset of evolutionarily conserved, glycine-rich, hydrophilic proteins induced in maturing seeds, or in vegetative tissues following abscisic acid treatment, as well as in response to salinity, dehydration or cold stress (reviewed in Allagulova et al., 2003). Dehydrins are characterized by a highly conserved, Lys-rich 15-amino-acid motif that is repeated 1 to 12 times in the C-terminal part of the protein. This dehydrin motif, referred to as the K-segment (EKKGIMDKIKEKLPG), was found in 8 of the 15 Eucalyptus GRP clusters with sequence similarity to dehydrins (Table 7). The same clusters also contain a conserved Ser stretch that is commonly found in many dehydrins and is thought to be involved in nuclear localization. The N-terminal region of many proteins of this group contains a third conserved sequence, termed the Y-segment ((V/T)DEYGNP).
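A simple way to screen sequences for the two dehydrin signatures quoted above is sketched below. The K-segment is matched by Hamming distance to the 15-residue consensus, since real K-segments vary; the threshold of 3 mismatches is an arbitrary assumption, not a criterion used in this study. The Y-segment is matched exactly as (V/T)DEYGNP. Function names are hypothetical.

```python
import re

K_SEGMENT = "EKKGIMDKIKEKLPG"          # consensus quoted in the text
Y_SEGMENT = re.compile(r"[VT]DEYGNP")  # (V/T)DEYGNP


def k_segment_hits(seq, max_mismatches=3):
    """0-based start positions of 15-residue windows that differ from the
    K-segment consensus at no more than `max_mismatches` positions."""
    seq = seq.upper()
    k = len(K_SEGMENT)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, K_SEGMENT))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def y_segment_hits(seq):
    """0-based start positions of exact Y-segment matches."""
    return [m.start() for m in Y_SEGMENT.finditer(seq.upper())]
```

A mismatch-tolerant scan is only a first filter; a profile or HMM (for example the Pfam dehydrin family) would be the more usual way to detect divergent K-segments.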

Some dehydrins are known to be preferentially induced under specific stresses, while others are constitutively expressed. Among the Eucalyptus GRPs identified as possible dehydrins, one cluster (EGEQRT5201H10.g) is strikingly abundant in libraries of stems of plants susceptible to dehydration. Its most similar sequence is RAB18, an A. thaliana dehydrin strongly induced both in water-stressed and in ABA-treated plants but only slightly responsive to cold (Welin et al., 1994).

Eucalyptus clusters encoding RNA-binding GRPs

Several different types of plant RNA-binding GRPs have been identified. They contain an RNA-binding motif in their N-terminal half followed by a C-terminal region rich in glycine residues. Most of these proteins have the conserved RNA-binding motif termed RRM (RNA-Recognition Motif), which spans 80-100 amino acid residues and contains two short, highly conserved sequences, RNP-1 and RNP-2 (Alba and Pages, 1998). A different type of RNA-binding motif observed in the N-terminus of plant GRPs is the CSD (Cold-Shock Domain), in which only the RNP-1 sequence is conserved (Sachetto-Martins et al., 2000). In addition to their RNA-binding motifs, some GRPs contain a variable number of CCHC (CX2CX4HX4C) retroviral-like zinc fingers within the C-terminal glycine-rich region.
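The zinc-finger spacings given in this article can be written as regular expressions. The sketch below counts exact CX2CX4HX4C (CCHC) matches and, for comparison, the CCCH spacing (CX8CX5CX3H) cited later for other putative RNA-binding GRPs. It assumes strict spacing, so fingers with variant spacing would be missed; the function name is hypothetical and the example sequences are invented.

```python
import re

ZINC_FINGER_PATTERNS = {
    "CCHC": re.compile(r"C[A-Z]{2}C[A-Z]{4}H[A-Z]{4}C"),  # CX2CX4HX4C
    "CCCH": re.compile(r"C[A-Z]{8}C[A-Z]{5}C[A-Z]{3}H"),  # CX8CX5CX3H
}


def count_zinc_fingers(seq):
    """Non-overlapping matches of each zinc-finger spacing in `seq`."""
    seq = seq.upper()
    return {name: len(pattern.findall(seq))
            for name, pattern in ZINC_FINGER_PATTERNS.items()}
```

Counting fingers this way is the kind of check behind statements such as "two zinc fingers in the glycine-rich domain", although an incomplete cluster can only give a lower bound.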

RNA-binding GRPs can be classified into four sub-classes based on the combination of structural domains they contain (Figure 1, Table 1). Proteins of the first sub-class have a conserved RRM motif at the N-terminal end, followed by a glycine-rich region with GGYGG repeats (Sachetto-Martins et al., 2000). GRPs of the second sub-class have a similar organization but contain a CCHC zinc finger within their glycine-rich region. Proteins of the third sub-class have a cold-shock domain at the N-terminus and 1 to 7 CCHC zinc fingers in their glycine-rich region (Sachetto-Martins et al., 2000; Karlson and Imai, 2003). Finally, unlike the proteins described above, sub-class IV RNA-binding GRPs have two copies of the RRM motif followed by a C-terminal glycine-rich region (Fusaro et al., 2001).
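The four sub-classes amount to a decision rule on the order of domains from N- to C-terminus. The sketch below encodes that rule under simplifying assumptions: domain annotations (for example from Pfam) are given as an ordered list of labels, "GLY" marks the glycine-rich region, and CCHC zinc fingers inside that region are listed after it. The labels and the function are hypothetical, not part of any annotation tool.

```python
def rna_binding_grp_subclass(domains):
    """Assign sub-class I-IV from an N- to C-terminal list of domain labels.

    Hypothetical labels: 'RRM', 'CSD', 'GLY' (glycine-rich region) and
    'CCHC' (a zinc finger inside the glycine-rich region, listed after 'GLY').
    Returns None for combinations outside the four sub-classes.
    """
    domains = list(domains)
    if "GLY" not in domains:
        return None
    gly = domains.index("GLY")
    head = domains[:gly]        # domains N-terminal to the glycine-rich region
    tail = domains[gly + 1:]    # labels following it
    if any(label != "CCHC" for label in tail):
        return None             # e.g. an RRM C-terminal to the glycine-rich region
    zinc_fingers = len(tail)
    if head == ["RRM"]:
        return "I" if zinc_fingers == 0 else "II"
    if head == ["CSD"] and 1 <= zinc_fingers <= 7:
        return "III"
    if head == ["RRM", "RRM"] and zinc_fingers == 0:
        return "IV"
    return None
```

Domain orders outside these four combinations, such as an RRM located C-terminal to the glycine-rich region, return None and would need manual inspection.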

Twenty-seven Eucalyptus clusters encoding RNA-binding GRPs were identified and classified according to the structural organization of their domains. To analyze their relationships with previously characterized RNA-binding GRPs, a phylogenetic tree was constructed (Figure 3).


Sixteen clusters belong to sub-class I (Table 8). Among these, 7 showed an expression pattern limited to one or two libraries, suggesting that they may represent tissue-specific genes. The Eucalyptus sub-class I RNA-binding GRPs split into two separate groups (Figure 3). One group is closely related to the Arabidopsis glycine-rich RNA-binding proteins (AtGR-RBPs) 2, 3, 4, 5 and 6. The other group is more closely related to genes coding for RNA-binding proteins from Nicotiana sylvestris (RGP-1a, -1b and -1c), Nicotiana glutinosa (NgRBP) and Euphorbia (EeGRRBP-1 and -2). Interestingly, the N. sylvestris genes were reported to undergo tissue-specific alternative splicing and were suggested to produce truncated polypeptides as well as functional RNA-binding polypeptides (Hirose et al., 1993). The high number of clusters in this sub-class and the close relationships among them may indicate that at least some of these sequences correspond to alternatively spliced forms of the same gene. Both Eucalyptus groups of sub-class I RNA-binding GRPs are more closely related to previously reported sequences from dicot plants, whereas several sugarcane sequences included in the phylogenetic tree are preferentially related to sequences from monocot plants such as Zea mays (MA16 and CHEM2) and Sorghum vulgare (S1 and S2).

RNA-binding GRPs of sub-class II are the least abundant of all RNA-binding GRPs and are apparently plant-specific (Lorkovic and Barta, 2002). These proteins combine the N-terminal RRM motif with a CCHC-type zinc finger inside the glycine-rich C-terminal domain. Only two clusters with these characteristics were found in the ForEST database (Table 8). One of them (EGJEFB1029H07.g) is very similar to the tobacco nuclear protein RZ-1 (Hanano et al., 1996), while the other (EGSBSL1048F09.g) is closely similar to a still uncharacterized Arabidopsis protein (Table 8 and Figure 3).

Sub-class III RNA-binding GRPs were represented by 4 clusters in the ForEST database, two of which were found in only one or two libraries and thus correspond to putative tissue-specific genes (Table 8). Three clusters (EGJECL2215H02.g, EGEPRT3325H02.g and EGUTBK1006H11.g) grouped close to the Arabidopsis cold-induced proteins AtGRP-2 and AtGRP-2b, which have two zinc fingers in their glycine-rich domains. The remaining cluster (EGEQFB1001F04) appears more closely related to two other Arabidopsis sequences (At2g17870 and At4g36020) that were also shown to be cold-regulated (Karlson and Imai, 2003) but have a longer C-terminal end with 7 zinc fingers interspersed in the glycine-rich region. Two zinc fingers were observed in all Eucalyptus sequences except one cluster whose C-terminal end is incomplete, so its number of zinc fingers could not be determined.

Five Eucalyptus clusters encoding GRPs with multiple RRM domains were classified as sub-class IV (Table 8). Two of them (EGACST2105B03.g and EGCEFB1016C10.g) share high similarity with Arabidopsis UBA1 proteins, and the phylogenetic analysis groups them with Arabidopsis UBA2c (Figure 3). UBA1 and UBA2 proteins bind RNA in vitro with specificity for oligouridylates and interact with UBP1, an hnRNP-like protein associated with poly(A)+ RNA in the cell nucleus. It has been suggested that UBA proteins may act as components of a complex that recognizes U-rich sequences in plant 3'-UTRs, contributing to the stabilization of mRNAs in the nucleus (Lambermon et al., 2002). The three remaining sub-class IV clusters (EGCEST222E05.g, EGEQRT3201H05.g and EGUTLV1248B11.g) are similar to Arabidopsis heterogeneous nuclear ribonucleoproteins (hnRNPs), RNA-binding proteins that form complexes with RNA polymerase II transcripts and are proposed to regulate pre-mRNA processing (Krecic and Swanson, 1999). While metazoan hnRNPs have a glycine-rich C-terminal domain in addition to the two N-terminal RRMs, only two of the six predicted Arabidopsis hnRNPs have a glycine-rich C-terminal domain (Lorkovic and Barta, 2002). The only two sugarcane sequences identified as sub-class IV RNA-binding GRPs (Fusaro et al., 2001) grouped with the hnRNP-like proteins.

In addition to the sequences classified in the four sub-classes of RNA-binding GRPs described above, 8 Eucalyptus clusters were found to encode GRPs containing other conserved domains usually found in RNA-binding proteins (Table 9). One cluster (EGQHLV2253F10.g) has a conserved domain characteristic of Gar1, a small nucleolar RNP protein that has a typical glycine/arginine-rich domain and is required for pre-rRNA processing and pseudouridylation (Bagni and Lapeyre, 1998). Two clusters (EGEQFB1001B12.g and EGEPST6161C06.g) have a CCCH-type (CX8CX5CX3H) zinc finger; different CCCH zinc finger-containing proteins have been shown to interact with the 3' untranslated regions of various mRNAs. Three clusters (EGEQLV2200C06.g, EGEQRT3101G05.g and EGEQRT3300C03.g) contain a domain found in the HABP4 family proteins and in the PAI-1 mRNA-binding protein. HABP4 has been observed to bind hyaluronan as well as RNA, the latter with lower affinity. The PAI-1 mRNA-binding protein specifically binds the mRNA of type-1 plasminogen activator inhibitor (PAI-1) and is thought to be involved in regulating mRNA stability. Finally, one cluster (EGQHLV2243F09.g) has the conserved LSM domain present in proteins that bind and stabilize snRNPs involved in pre-mRNA splicing.

Because the only RNA-binding motifs in these proteins are domains of this kind, their RNA-binding function could not be predicted unequivocally, and they were therefore classified as putative RNA-binding GRPs. Particularly interesting is cluster EGUTSL1044E03.g. It could be considered a true RNA-binding GRP since it has an RRM motif, but unlike RNA-binding GRPs of sub-classes I, II or IV, this domain is located at the C-terminal end of the protein. The sequence most similar to this cluster corresponds to a rice mRNA encoding a glycine-rich protein with a C-terminal RRM motif in combination with RanBP2-type zinc fingers at the N-terminal end. This domain organization has not been reported before for a GRP and could represent a new class of still uncharacterized RNA-binding GRPs. Since the Eucalyptus cluster is incomplete at the N-terminus, the presence of zinc fingers could not be determined.

Eucalyptus clusters encoding proteins with glycine-rich domains

In addition to the GRPs with semi-repetitive glycine-rich domains described here, several proteins that present short domains with high glycine content, usually without a characteristic pattern of repetition, were also found (Table 10). These proteins were classified as proteins with glycine-rich domains. Their glycine-rich domains range from 32 to 130 amino acids, with 35-81% glycine. Glycine-rich stretches shorter than 30 amino acids were not included in this classification.
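The classification criteria above (a minimum glycine content and a minimum stretch length of 30 amino acids) can be approximated with a sliding-window scan. The following is a minimal sketch under stated assumptions: the 30-residue window, the 0.35 glycine threshold and the trimming step are illustrative choices, not the parameters used by the authors, and `glycine_rich_regions` is a hypothetical helper.

```python
def glycine_rich_regions(protein: str, window: int = 30,
                         min_fraction: float = 0.35,
                         min_length: int = 30) -> list[tuple[int, int, float]]:
    """Return (start, end, glycine_fraction) for glycine-rich stretches.

    start/end are 0-based and end-exclusive. Default parameters are
    illustrative assumptions, not the settings used in the survey.
    """
    seq = protein.upper()
    n = len(seq)
    covered = [False] * n

    # Slide a fixed-size window, updating the glycine count incrementally,
    # and flag every residue inside a window that reaches the threshold.
    if n >= window:
        gly = seq[:window].count("G")
        for s in range(n - window + 1):
            if s > 0:
                gly += (seq[s + window - 1] == "G") - (seq[s - 1] == "G")
            if gly / window >= min_fraction:
                for k in range(s, s + window):
                    covered[k] = True

    # Merge consecutive flagged residues into candidate regions.
    regions = []
    i = 0
    while i < n:
        if not covered[i]:
            i += 1
            continue
        j = i
        while j < n and covered[j]:
            j += 1
        next_i = j
        # Trim glycine-poor flanks picked up at the window edges.
        while i < j and seq[i] != "G":
            i += 1
        while j > i and seq[j - 1] != "G":
            j -= 1
        # Discard stretches shorter than the minimum length.
        if j - i >= min_length:
            regions.append((i, j, round(seq[i:j].count("G") / (j - i), 2)))
        i = next_i
    return regions


if __name__ == "__main__":
    # Hypothetical toy protein: a 40-residue glycine-rich block between
    # two glycine-free flanks.
    toy = "MKLV" * 5 + "GGSGGYGGAG" * 4 + "MKLV" * 5
    print(glycine_rich_regions(toy))  # [(20, 60, 0.7)]
    # A 20-residue poly-G stretch falls below the 30-residue cut-off.
    print(glycine_rich_regions("A" * 30 + "G" * 20 + "A" * 30))  # []
```

Because every residue in a qualifying window is flagged, merged regions initially extend into glycine-poor flanks, which is why they are trimmed to the first and last glycine. A merged region can also have a lower glycine fraction than the individual windows that produced it, so the reported fraction should be checked against the chosen cut-off.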

Of the 16 Eucalyptus sequences that have glycine-rich domains, 7 are similar to known RNA-binding proteins, including the rRNA-processing protein fibrillarin, several DEAD-box RNA helicases, a nucleotide excision repair protein, a bHLH transcriptional regulator and a nucleolin-like protein. The presence of a short glycine-rich domain in a number of proteins involved in RNA metabolism suggests that this domain may contribute to the RNA-binding function of these proteins.

Concluding Remarks

Although a large number of genes encoding GRPs have been identified in plants to date, only a few GRPs have been characterized, and their functions remain speculative. However, it is becoming clear that GRPs play important roles in very diverse processes such as signal transduction, stress response, transcriptional regulation and development.

The highly specific but diverse expression patterns of grp genes, together with the distinct sub-cellular localization of some GRP groups, indicate that these proteins are involved in several independent physiological processes. Although the role of GRPs in plant cells is not yet clearly defined, studies of these proteins have provided new and interesting insights into the molecular and cell biology of plants. Complexly regulated promoters and distinct mechanisms of gene expression regulation have been demonstrated (Keller and Heierli, 1994; Franco et al., 2002). New protein targeting pathways, as well as the export of GRPs from different cell types, have been discovered (Ryser et al., 1997; Murphy and Ross, 1998). These data show that GRPs can serve as useful markers for many physiological processes and/or as models for understanding distinct aspects of plant biology (Sachetto-Martins et al., 2000). The results obtained here point to interesting roles for GRPs in plant physiology. The characterization of the grp genes in Eucalyptus could lead to new strategies for manipulating growth and stress signaling in this crop.

Acknowledgments

S.N.B. is the recipient of a CNPq post-doctoral fellowship. C.M. is a recipient of a CAPES Prodoc fellowship. V.C.J. is supported by a PIBIC-CNPq fellowship. A.M. and R.M.J. were supported by Master's degree fellowships from CAPES. G.S.M. is indebted to the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and the Fundação Carlos Chagas de Amparo à Pesquisa do Rio de Janeiro (FAPERJ) for financial support.

Received: May 31, 2004; Accepted: April 1, 2005.

Associate Editor: Claudia Monteiro-Vitorello

  • Albà MM, Culiáñez-Macià FA, Goday A, Freire MA, Nadal B and Pagès M (1994) The maize RNA-binding protein, MA16, is a nucleolar protein located in the dense fibrillar component. Plant J 6:825-834.
  • Albà MM and Pagès M (1998) Plant proteins containing the RNA-recognition motif. Trends Plant Sci 3:15-21.
  • Allagulova ChR, Gimalov FR, Shakirova FM and Vakhitov VA (2003) The plant dehydrins: Structure and putative functions. Biochemistry (Mosc) 68:945-951.
  • Allona I, Quinn M, Shoop I, Swope K, St Cyr S, Carlis J, Riedl J, Retzel E, Campbell M, Sederoff R and Whetten RW (1998) Analysis of xylem formation in pine by cDNA sequencing. Proc Natl Acad Sci USA 95:9693-9698.
  • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
  • Bagni C and Lapeyre B (1998) Gar1p binds to the small nucleolar RNAs snR10 and snR30 in vitro through a nontypical RNA binding element. J Biol Chem 273:10868-10873.
  • Cassab GI (1998) Plant cell wall proteins. Annu Rev Plant Physiol Plant Mol Biol 49:281-309.
  • Condit CM and Meagher RB (1986) A gene encoding a novel glycine-rich structural protein of petunia. Nature 323:178-181.
  • Condit CM (1993) Developmental expression and localization of petunia glycine-rich protein 1. Plant Cell 5:277-288.
  • de Oliveira DE, Franco LO, Simoens C, Seurink J, Coppieters J, Botterman J and van Montagu M (1993) Inflorescence-specific genes from Arabidopsis thaliana encoding glycine-rich proteins. Plant J 3:495-507.
  • Ferreira MA, Almeida-Engler J, Miguens FC, van Montagu M, Engler G and de Oliveira DE (1997) Oleosin gene expression in Arabidopsis thaliana coincides with accumulation of lipids in plastids and cytoplasmic bodies. Plant Physiol Biochem 35:729-739.
  • Franco LO, de O Manes CL, Hamidi S, Sachetto-Martins G and de Oliveira DE (2002) Distal regulatory regions restrict the expression of cis-linked genes to the tapetal cells. FEBS Lett 517:13-18.
  • Freire MA and Pagès M (1995) Functional characterization of the maize RNA binding protein MA16. Plant Mol Biol 29:797-807.
  • Fusaro A, Mangeon A, Magrani Junqueira R, Benício Rocha CA, Cardoso Coutinho T, Margis R and Sachetto-Martins G (2001) Classification, expression pattern and comparative analysis of sugarcane expressed sequence tags (ESTs) encoding glycine-rich proteins (GRPs). Genet Mol Biol 24:263-273.
  • Hanano S, Sugita M and Sugiura M (1996) Isolation of a novel RNA-binding protein and its association with a large ribonucleoprotein particle present in the nucleoplasm of tobacco cells. Plant Mol Biol 31:57-68.
  • Hirose T, Sugita M and Sugiura M (1993) cDNA structure, expression and nucleic acid-binding properties of three RNA-binding proteins in tobacco: Occurrence of tissue-specific alternative splicing. Nucleic Acids Res 21:3981-3987.
  • Karlson D, Nakaminami K, Toyomasu T and Imai R (2002) A cold-regulated nucleic acid-binding protein of winter wheat shares a domain with bacterial cold-shock proteins. J Biol Chem 277:35248-35256.
  • Karlson D and Imai R (2003) Conservation of the cold shock domain protein family in plants. Plant Physiol 131:12-15.
  • Keller B, Schmid J and Lamb CJ (1989) Vascular expression of a bean cell wall glycine-rich protein - β-glucuronidase gene fusion in transgenic tobacco. EMBO J 8:1309-1314.
  • Keller B and Heierli D (1994) Vascular expression of the grp1.8 promoter is controlled by three specific regulatory elements and one unspecific activating sequence. Plant Mol Biol 26:747-756.
  • Krecic AM and Swanson MS (1999) hnRNP complexes: Composition, structure, and function. Curr Opin Cell Biol 11:363-371.
  • Kumar S, Tamura K, Jacobsen I and Nei M (2000) MEGA2: Molecular Evolutionary Genetics Analysis, version 2.0. Pennsylvania and Arizona State Universities, University Park, Pennsylvania and Tempe, Arizona.
  • Lambermon MH, Fu Y, Wieczorek Kirk DA, Dupasquier M, Filipowicz W and Lorkovic ZJ (2002) UBA1 and UBA2, two proteins that interact with UBP1, a multifunctional effector of pre-mRNA maturation in plants. Mol Cell Biol 22:4346-4357.
  • Le Provost G, Paiva J, Pot D, Brach J and Plomion C (2003) Seasonal variation in transcript accumulation in wood-forming tissues of maritime pine (Pinus pinaster Ait.) with emphasis on a cell wall glycine-rich protein. Planta 217:820-830.
  • Lorkovic ZJ and Barta A (2002) Genome analysis: RNA recognition motif (RRM) and K homology (KH) domain RNA-binding proteins from the flowering plant Arabidopsis thaliana. Nucleic Acids Res 30:623-635.
  • Magioli C, Barrôco RM, Benício Rocha CA, de Santiago-Fernandes LD, Mansur E, Engler G, Margis-Pinheiro M and Sachetto-Martins G (2001) Somatic embryo formation in Arabidopsis and eggplant is associated with expression of a glycine-rich protein gene (Atgrp-5). Plant Sci 161:559-567.
  • Matsushima N, Creutz CE and Kretsinger RH (1990) Polyproline, beta-turn helices. Novel secondary structures proposed for the tandem repeats within rhodopsin, synaptophysin, synexin, gliadin, RNA polymerase II, hordein, and gluten. Proteins 7:125-155.
  • Murphy DJ and Ross JHE (1998) Biosynthesis, targeting and processing of oleosin-like proteins, which are major pollen coat components in Brassica napus. Plant J 13:1-16.
  • Murphy DJ, Hernández-Pinzón I and Patel K (2001) Role of lipid bodies and lipid-body proteins in seeds and other tissues. J Plant Physiol 158:471-478.
  • Ni Z, Sun Q, Liu Z, Wu L and Wang X (2000) Identification of a hybrid-specific expressed gene encoding novel RNA-binding protein in wheat seedling leaves using differential display of mRNA. Mol Gen Genet 263:934-938.
  • Obokata J, Ohme M and Hayashida N (1991) Nucleotide sequence of a cDNA clone encoding a putative glycine-rich protein of 19.7 kDa in Nicotiana sylvestris. Plant Mol Biol 17:953-955.
  • Park AR, Cho SK, Yun UJ, Jin MY, Lee SH, Sachetto-Martins G and Park OK (2001) Interaction of the Arabidopsis receptor protein kinase Wak1 with a glycine-rich protein, AtGRP-3. J Biol Chem 276:26688-26693.
  • Ringli C, Keller B and Ryser U (2001) Glycine-rich proteins as structural components of plant cell walls. Cell Mol Life Sci 58:1430-1441.
  • Ryser U and Keller B (1992) Ultrastructural localization of bean glycine-rich protein in unlignified primary walls of protoxylem cells. Plant Cell 4:773-783.
  • Ryser U, Schorderet M, Zhao GF, Studer D, Ruel K, Hauf G and Keller B (1997) Structural cell wall proteins in protoxylem development: Evidence for a repair process mediated by a glycine-rich protein. Plant J 12:97-111.
  • Sachetto-Martins G, Fernandes LD, Felix DB and de Oliveira DE (1995) Preferential transcriptional activity of a glycine-rich protein gene from Arabidopsis thaliana in protoderm derived cells. Int J Plant Sci 156:460-470.
  • Sachetto-Martins G, Franco LO and de Oliveira DE (2000) Plant glycine-rich proteins: A family or just proteins with a common motif? Biochim Biophys Acta 1492:1-14.
  • Sandal NN, Bojsen K, Richter H, Sengupta-Gopalan C and Marcker KA (1992) The nodulin 24 protein family shows similarity to a family of glycine-rich plant proteins. Plant Mol Biol 18:607-610.
  • Showalter AM (1993) Structure and function of plant cell wall proteins. Plant Cell 5:9-23.
  • Sitnikova T, Rzhetsky A and Nei M (1995) Interior-branch and bootstrap tests of phylogenetic trees. Mol Biol Evol 12:319-333.
  • Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673-4680.
  • Verica JA and He ZH (2002) The cell wall-associated kinase (WAK) and WAK-like kinase gene family. Plant Physiol 129:455-459.
  • Welin BV, Olson A, Nylander M and Palva ET (1994) Characterization and differential expression of dhn/lea/rab-like genes during cold acclimation and drought stress in Arabidopsis thaliana. Plant Mol Biol 26:131-144.
  • Zhang YI, Sederoff R and Allona I (2000) Differential expression of genes encoding cell wall proteins in vascular tissues from vertical and bent loblolly pine trees. Tree Physiol 20:450-457.
  • Send correspondence to

    Gilberto Sachetto-Martins
    Universidade Federal do Rio de Janeiro
    Ilha do Fundão, Instituto de Biologia
    Departamento de Genética
    Laboratório de Genética Molecular Vegetal
    CCS, Bloco A, sala A2-076
    21944-970 Rio de Janeiro, RJ, Brazil
    Email:
  • Publication Dates

    • Publication in this collection
      04 Jan 2006
    • Date of issue
      2005

    Sociedade Brasileira de Genética Rua Cap. Adelmio Norberto da Silva, 736, 14025-670 Ribeirão Preto SP Brazil, Tel.: (55 16) 3911-4130 / Fax.: (55 16) 3621-3552 - Ribeirão Preto - SP - Brazil
    E-mail: editor@gmb.org.br