Survey of glycine-rich proteins (GRPs) in the Eucalyptus expressed sequence tag database (ForEST)

The occurrence of quasi-repetitive glycine-rich peptides has been reported in different organisms. Glycine-rich regions are proposed to be involved in protein-protein interactions in some mammalian protein families. In plants, a set of glycine-rich proteins (GRPs) was characterized several years ago, and since then a wealth of new GRPs has been identified. GRPs may have very diverse sub-cellular localizations and functions. The only feature common to all GRPs is the presence of glycine-rich repeat domains. The expression of genes encoding GRPs is developmentally regulated and is also induced, in several plant genera, by physical, chemical and biological factors. In addition to this highly modulated expression, several GRPs also show tissue-specific localization. GRPs specifically expressed in xylem, phloem, epidermis, anther tapetum and roots have been described. In this paper, the structural and functional features of these proteins in Eucalyptus are summarized. Since this is the first description of GRPs in this species, particular emphasis has been given to the expression pattern of these genes by analyzing their abundance and prevalence in the different cDNA libraries of the Eucalyptus Genome Sequencing Project Consortium (ForEST). The comparison of GRPs from Eucalyptus and other species is also discussed.


Introduction
Glycine-rich proteins (GRPs) are characterized by the presence of domains that show little sequence conservation and are highly enriched in glycine residues. Typically, these glycine-rich domains are arranged in (Gly)n-X repetitions. Although the first genes encoding GRPs were isolated from plants, proteins with characteristic repetitive glycine stretches have been reported in a wide variety of organisms, from cyanobacteria to animals (reviewed in Sachetto-Martins et al., 2000).
The structure and modulation of plant GRP genes have been intensively investigated, showing that they are highly regulated during development as well as under the influence of several external stimuli. In many cases, their expression pattern has also been shown to be tissue-specific. These characteristics are the most intensively studied aspects of GRP genes since they point to possible biotechnological applications of their promoters.
Since the first reports describing plant GRPs as cell wall associated proteins (Showalter, 1993), many other GRPs with different domain organizations and sub-cellular localizations appeared in the literature. This diversity led to the concept that GRPs should not be considered as a family of related proteins but as a wide group of proteins that share a common structural domain (Sachetto-Martins et al., 2000).
The diverse but highly specific expression pattern of grp genes, taken together with the distinct sub-cellular localization of some GRP groups, clearly indicates that these proteins are implicated in several independent physiological processes (Condit, 1993; Keller and Heierli, 1994).
GRPs can be classified into four major groups (Figure 1) based on their primary structure (reviewed in Sachetto-Martins et al., 2000 and Fusaro et al., 2001). GRPs from class I are known as classic GRPs. They may contain a signal peptide followed by a glycine-rich region with GGGX repeats. A structural function is attributed to proteins of this class due to their cell wall localization (Cassab, 1998). The class II GRPs may or may not have a signal peptide and contain a glycine-rich region followed by a cysteine-rich region at their C-terminus. For one member of this class, AtGRP-3, this cysteine-rich domain has been shown to interact with cell wall associated receptor kinases (WAKs) (Park et al., 2001). Class III contains proteins with lower glycine content that show a great diversity of structures. The best known proteins from this class are oleosin GRPs. Oleosins are alkaline proteins on the surface of oil bodies in plants. They play a structural role in stabilizing the triacylglycerols of the oil bodies together with the phospholipid layer. Previous work demonstrated that many of the major pollen coat proteins are derived from an endoproteolytic cleavage of oleosin GRPs that originally accumulate within the large cytoplasmic lipid bodies of tapetal cells (Ferreira et al., 1997; Murphy et al., 2001). GRPs from class IV are RNA-binding GRPs. Besides the glycine-rich region, these GRPs may contain several motifs, including the RNA-recognition motif, the cold-shock domain and zinc fingers (Fusaro et al., 2001).
In this article, a search for GRPs in the Eucalyptus transcriptome is reported. Several GRPs were identified and classified into the major groups previously established. The survey was extended to proteins that, despite not being considered canonical GRPs, contain domains of limited extension that are rich in glycine.

Materials and Methods
Sequence data, alignment and phylogenetic analysis
Protein sequences of reported plant GRPs were used to query the ForEST expressed sequence tag (EST) database with the TBLASTN algorithm (Altschul et al., 1997). Since glycine-rich domains are low complexity sequences, the TBLASTN default parameters were used without filtering the query for low compositional complexity. The complete list of sequences used as baits includes the 86 proteins reviewed in Sachetto-Martins et al. (2000), 8 sequences recently described from a complete survey of Arabidopsis glycine-rich RNA binding proteins (Lorkovic and Barta, 2002), a wheat cold shock domain GRP (Karlson et al., 2002), a Pinus taeda cell wall GRP (Allona et al., 1998), Arabidopsis UBA2 (Lambermon et al., 2002) and 4 Arabidopsis cold shock domain GRPs (Karlson and Imai, 2003). Additionally, several GRP sequences recently identified in a complete analysis of a sugarcane EST database were also selected as baits (Fusaro et al., 2001). These sugarcane sequences belong to each of the different GRP classes and were chosen for being the least similar to other published GRPs among the complete sugarcane set. All GRP clusters found in the Eucalyptus libraries were translated to obtain their putative protein sequences. When an evident frameshift, apparently caused by a sequencing error, was observed in the translation of an ORF, the sequence was edited manually. The protein sequences obtained were used in a second round of TBLASTN searches against the non-redundant protein database at the National Center for Biotechnology Information (NCBI) to identify their closest homologues. Additional domains were detected using the Prosite (http://bo.expasy.org/prosite) and Pfam (http://www.sanger.ac.uk/Software/Pfam/search.shtml) prediction programs. The possible presence of a signal peptide in the sequences was predicted with the SignalP server (http://www.cbs.dtu.dk/services/SignalP).
Multiple alignments of proteins deduced from the ForEST clusters and bait sequences were performed using the ClustalW program (Thompson et al., 1994). Unrooted trees were calculated using the Molecular Evolutionary Genetics Analysis (MEGA) software (Kumar et al., 2000). The neighbor-joining and p-distance methods were used with the pairwise deletion option for the treatment of amino acid gaps in the multiple alignments of GRPs. For construction of the phylogenetic trees, the confidence levels for the nodes were determined with 2000 replications using the Internal Branch test (Sitnikova et al., 1995).
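The translation step described above can be illustrated with a minimal sketch. It is not the procedure used for ForEST; it assumes the standard genetic code, marks unknown codons as X, and the function names (reverse_complement, translate, six_frames) are hypothetical. Comparing the six frames of a cluster is one simple way to spot the frame changes that suggest a sequencing-induced frameshift.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna, frame=0):
    """Translate one reading frame; codons with ambiguous bases become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frames(dna):
    """Return the three forward and three reverse translations of a read."""
    rc = reverse_complement(dna)
    frames = {f"+{f + 1}": translate(dna, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames

print(six_frames("ATGGGTGGCGGA")["+1"])
```

In practice the frame with the longest glycine-rich open reading frame would be inspected first; that heuristic is an assumption of this sketch, not a rule stated in the text.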
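As an illustration of the distance measure named above, the following sketch computes a p-distance between two aligned sequences with pairwise deletion, that is, alignment columns are discarded only when one of the two sequences being compared has a gap there. It is not MEGA's implementation; the function name and the use of "-" as the gap symbol are assumptions.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a column is ignored only if one of these two
    sequences carries a gap at that position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return float("nan")  # no comparable sites
    return sum(a != b for a, b in pairs) / len(pairs)

# Seven comparable sites, one difference (A versus S).
print(p_distance("GGGA-GGY", "GGGSHGGY"))
```

Pairwise deletion retains more sites than complete deletion, which matters for GRP alignments where gaps are frequent outside the glycine-rich repeats.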

Results and Discussion
Distribution of glycine-rich protein genes in the ForEST database
GRPs were previously subdivided into four major groups according to the presence of conserved domains and the pattern of sequence repeats. The four different classes of GRPs are shown in Table 1 and Figure 1. Three groups are based on the pattern of the glycine-rich repeats (class I, GGGX; class II, GGXXXGG; class III, GXGX) and the two other groups are based on the type of functional conserved motif (one sub-group from class III, the oleosin glycine-rich proteins, and class IV, the RNA-binding GRPs).
The distribution of each EST sequence between the different ForEST libraries was also analyzed (see Tables 2 to 10). The ForEST database comprises 123,889 EST sequences, arranged in 33,080 clusters. These EST sequences (reads) came from 19 different cDNA libraries constructed from different plant tissues under different culture conditions. Since several GRP genes present tissue-specific expression in other plants, the distribution of the reads from each cluster per library was analyzed. All clusters that were found in only one or two libraries were considered as predominantly expressed in a tissue-specific pattern. Several clusters identified in this search presented this characteristic.
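The three repeat patterns can be made concrete with a small sketch that counts them in a protein sequence. The regular expressions are an assumption of this example: X is taken to mean any residue other than glycine, and matches are counted without overlap. The classification in this survey also relied on conserved domains and manual inspection, which the sketch does not attempt.

```python
import re

# Assumed encodings of the repeat patterns; X = any non-glycine residue.
REPEAT_PATTERNS = {
    "GGGX": re.compile(r"GGG[^G]"),
    "GGXXXGG": re.compile(r"GG[^G]{3}GG"),
    "GXGX": re.compile(r"G[^G]G[^G]"),
}

def count_repeats(protein):
    """Count non-overlapping occurrences of each repeat pattern."""
    protein = protein.upper()
    return {name: len(p.findall(protein)) for name, p in REPEAT_PATTERNS.items()}

print(count_repeats("GGGYGGGH"))   # a GGGX-type stretch
print(count_repeats("GHGHGYGA"))   # a GXGX-type stretch
```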
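The criterion used here (reads found in only one or two libraries) can be sketched as a simple filter over per-library read counts. The data layout, the function name and the example counts are hypothetical and are not taken from the ForEST tables.

```python
def tissue_specific_clusters(read_counts, max_libraries=2):
    """Return clusters whose reads come from at most `max_libraries` libraries.

    `read_counts` maps cluster name -> {library name: number of reads}.
    """
    flagged = {}
    for cluster, per_library in read_counts.items():
        libraries = sorted(lib for lib, n in per_library.items() if n > 0)
        if 0 < len(libraries) <= max_libraries:
            flagged[cluster] = libraries
    return flagged

# Toy counts, invented for illustration only.
example = {
    "cluster_A": {"wood": 5, "leaf": 0, "root": 0},
    "cluster_B": {"wood": 2, "leaf": 3, "root": 1},
    "cluster_C": {"leaf": 1, "root": 4},
}
print(tissue_specific_clusters(example))
```

A cluster with very few reads will satisfy this criterion by chance, so a minimum read count could reasonably be added; no such threshold is stated in the text.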
The search for genes encoding GRPs in Eucalyptus resulted in 153 potential genes (clusters) that were distributed in the classes mentioned above (Table 1). While no sequences were found to present the characteristic pattern of repeats GGXXXGG, our search retrieved a number of other Eucalyptus sequences having a mixed pattern of repeats (Table 6). Among these sequences, clusters with conserved motifs that characterize dehydrins were found (Table 7). As expected for an angiosperm with wet-type stigmas, no Eucalyptus ESTs with similarity to oleosin-GRPs were found.
The analysis was also extended to twelve other proteins that contain domains of limited extension that are rich in glycine even though these domains represented a small proportion of the complete protein (Table 10).

Eucalyptus clusters encoding GRPs with GGGX repeats
The GGGX repeats are frequently found in GRPs that present a high total content of glycine (40 to 70%) distributed throughout the protein sequence (Table 2). This kind of GRP usually has a predicted signal peptide at its N-terminal end. The best characterized protein of this class is PvGRP1.8, a structural protein from bean specifically associated with the primary cell walls of elongating protoxylem elements (Keller et al., 1989). Studies using antibodies against PvGRP1.8 indicated that PvGRP1.8 forms a three-dimensional protein network that stabilizes the protoxylem elements (Ryser and Keller, 1992; Ryser et al., 1997; Ringli et al., 2001). Thirty Eucalyptus clusters with GGGX repeats were found. Several clusters (11) encode GRPs that are highly enriched in histidine, resulting in a GGGH repetition pattern (Table 2). Fourteen clusters presented an apparent tissue-specific expression, with 9 being expressed exclusively in one library. Interestingly, two clusters (EGEQWD2247G05.g and EGEZWD2203C11.g) were observed only in libraries prepared from wood tissues, making them interesting genes for study in relation to wood biogenesis.
As previously noted (Sachetto-Martins et al., 2000; Fusaro et al., 2001), this class of GRPs represents a rather heterogeneous set of proteins with sequence similarity limited to the repetitive glycine residues. The alignments obtained presented many gaps and regions with no sequence overlap, which made the construction of a dendrogram impossible. The functional characterization of members of this class could help to establish a clear classification of these proteins.

Eucalyptus clusters encoding GRPs with C-terminal domains rich in cysteine
Some GRPs are grouped together based on the similarity of their N- and C-terminal domains with soybean nodulin 24 (Sandal et al., 1992). Usually, the C-terminal end of GRPs that are similar to nodulins is cysteine-rich and the glycine-rich repeats found in these sequences are GGXXXGG, with Y, H, R, N or Q as the most frequent amino acids in the tripeptide between the glycine residues (Sachetto-Martins et al., 2000).
The direct interaction of AtGRP3, a protein belonging to this class of GRPs, with the cell wall-associated kinase WAK1 was recently demonstrated. The interaction occurs between the cysteine-rich C-terminal end of AtGRP3 and the extracellular domain of WAKs (Park et al., 2001). WAK1 is a member of the WAK receptor kinase family that links the plasma membrane to the extracellular matrix (Verica and He, 2002). WAK kinases are proposed to recognize different environmental signals through the interaction of their diverse extracellular domains with cell wall molecules and to transduce those signals to the cell. Wak1 and Atgrp-3 are both induced by salicylic acid treatment. Moreover, exogenously added AtGRP-3 up-regulates the expression of Wak1, Atgrp-3 and PR-1 in Arabidopsis protoplasts. Taken together, these data suggest that AtGRP-3 regulates Wak1 function through binding to the cell wall domain of Wak1 and that the interaction of Wak1 with AtGRP-3 occurs in a pathogenesis-related process in planta (Park et al., 2001).
Ten GRPs containing a C-terminal Cys-rich end were found in the ForEST database (Table 3). None of them presents the typical GGXXXGG repetition pattern usually found in this group of GRPs. In order to analyze the similarities of these 10 sequences with the reported GRPs that are similar to nodulins, all the sequences were aligned and an unrooted tree was constructed (Figure 2). Seven clusters were found to be more related to petunia PtGRP-2 and tobacco gGRP-8; two others are closer to a group of GRP sequences from Medicago sativa; and one seems to be more divergent from all the previously reported sequences of this group.

Eucalyptus clusters encoding GRPs with GXGX repeats
This last pattern of glycine repeats, GXGX, is generally observed in GRPs with an average glycine content of 20%. Similar to the GGGX group (Table 2) this GRP group shows a high degree of structural diversity and probably contains several different types of GRPs. In Eucalyptus, forty-six different clusters were identified encoding this type of GRP (Table 4 and Table 5).
As noted for the Eucalyptus sequences with GGGX repeats, several sequences of this group are also rich in histidine, resulting in the repetition pattern GHGH. Three other clusters show Pro/Gly-rich sequences. Sequences that, in addition to the glycine-rich domains, are also enriched in other amino acids (arginine, alanine or methionine) were also found (Table 4).
A predicted N-terminal signal sequence, which may reflect a possible extracellular localization, was observed in twelve clusters from the GXGX Eucalyptus GRPs (Table 4 and Table 5).
As occurs with all GRPs grouped only on the basis of their pattern of repeats, most of the GXGX GRP sequences comprise a heterogeneous group of proteins with no significant sequence similarity outside the glycine-rich repetitive domains.
It is noteworthy that 3 GRP clusters with GHGH repeats share high sequence identity with a Gly/His-rich protein of an endosymbiotic fungus of Eucalyptus (Table 4). One could speculate that those sequences may represent fungal contamination in the plant mRNA population and should be considered as possible non-plant GRPs.
In several species, cell wall associated proteins with preferential expression in vascular tissues have been reported (Showalter, 1993). GRPs localized in vascular tissues are thought to provide elasticity and tensile strength during vascular development (Cassab, 1998) and most of the wood quality-related traits are linked to the properties of the cell wall during this process. Despite the economic importance of wood biogenesis, few reports exist to date on the role of cell wall associated proteins in the development of vasculature.
the Arabidopsis gene At4g30460 (Table 5). Both Pinus and Eucalyptus proteins are rich in glycine and serine and present a predicted N-terminal signal peptide as expected for a putative cell-wall protein. The high degree of conservation between the Pinus and Eucalyptus sequences indicates that the Eucalyptus cluster identified may be the Pinus taeda ortholog and that this gene is an interesting candidate to be studied due to its possible involvement in wood biogenesis in conifers and angiosperm trees.

Eucalyptus clusters encoding GRPs with a mixed pattern of repeats
In addition to the classic repeats observed in the previously described plant GRPs, the ForEST database also contains a set of GRPs with a mixed pattern of repetition (Table 6).
Ten of them encode GRPs with GXGX repeats combined with domains that contain 8 to 15 tandem repeats of the pentapeptide GYPPX (where X is usually Q). Strictly, these proteins should be considered as glycine/proline-rich proteins (GPRPs). The motif XYPPX is found in a wide variety of proteins including annexin and the carboxy tail of certain rhodopsins. The motif was proposed to form polyproline beta-turn helices but its molecular function is unknown (Matsushima et al., 1990). Eucalyptus sequences with GYPPQ repeats may be functionally related to PtaADH1 (AF101786), a proline-rich sequence from Pinus taeda recently characterized as a cell wall structural protein with GYPQ repetitions. The observation that PtaADH1 mRNA is mainly expressed in vascular tissue and that its expression is modified in different types of wood led to a proposal that it may be involved in the process of wood biogenesis (Zhang et al., 2000).
Fifteen other sequences present a mixed pattern of GGGX and GXGX repeats, sharing identity with dehydrins (Table 7). Dehydrins are classified as group 2 late embryogenesis abundant proteins (Wise, 2003). They are also termed responsive to abscisic acid (RAB). These proteins form a subset of evolutionarily conserved glycine-rich, hydrophilic proteins induced in maturing seeds or vegetative tissues following abscisic acid treatment as well as in response to salinity, dehydration or cold stress (reviewed in Allagulova et al., 2003). Dehydrins are characterized by the presence of a highly conserved Lys-rich 15-amino-acid motif that appears repeated from 1 to 12 times in the C-terminus of the protein. This dehydrin motif, referred to as the K-segment (EKKGIMDKIKEKLPG), was found in 8 out of the 15 Eucalyptus GRP clusters that present sequence similarity with dehydrins (Table 7). The same clusters also present a conserved Ser stretch that is commonly found in many dehydrins and is thought to be involved in nuclear localization. The N-terminal sequence of many proteins of this group presents a third conserved sequence termed the Y-segment (V/T DEYGNP).
It is known that some dehydrins are preferentially induced under specific stresses while others have a constitutive expression. Among the Eucalyptus GRPs identified as possible dehydrins, one cluster is strikingly over-expressed in libraries of stems of plants susceptible to dehydration (EGEQRT5201H10.g). Its closest similar sequence is RAB18, an A. thaliana dehydrin strongly induced both in water-stressed and ABA-treated plants but only slightly responsive to cold (Welin et al., 1994).
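A search for K-segment-like stretches can be sketched as a sliding-window comparison against the consensus EKKGIMDKIKEKLPG. The tolerance of three mismatches and the function name are assumptions made for illustration; the text does not state how much divergence from the consensus was accepted.

```python
K_SEGMENT = "EKKGIMDKIKEKLPG"

def find_k_segments(protein, max_mismatches=3, consensus=K_SEGMENT):
    """Return (0-based position, window) pairs resembling the K-segment."""
    protein = protein.upper()
    width = len(consensus)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((start, window))
    return hits

# Invented test sequence: one exact and one divergent K-segment copy.
toy = "MA" + "EKKGIMDKIKEKLPG" + "GG" + "EKKGLMEKIKEKLPG"
for position, window in find_k_segments(toy):
    print(position, window)
```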

Eucalyptus clusters encoding RNA-binding GRPs
Several different types of plant RNA-binding GRPs have been identified. They contain an RNA-binding motif in their N-terminal half followed by a C-terminal region rich in glycine residues. Most of these proteins have the conserved RNA-binding motif termed RRM (RNA-Recognition Motif), encompassing 80-100 amino acid residues in which two short sequences, RNP-1 and RNP-2, are highly conserved (Alba and Pages, 1998). A different type of RNA-binding motif observed in the N-terminus of plant GRPs is the CSD (Cold-Shock Domain), with only the RNP-1 sequence conserved (Sachetto-Martins et al., 2000). In addition to their RNA-binding motifs, some GRPs contain a variable number of CCHC (CX2CX4HX4C) retroviral-like zinc fingers inside the C-terminal glycine-rich region.
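The CCHC shorthand CX2CX4HX4C reads as a cysteine, two residues of any kind, a cysteine, four residues, a histidine, four residues and a final cysteine. A minimal sketch of how such shorthand can be turned into a regular expression and used to locate candidate zinc fingers is given below. The function names are hypothetical, and a pattern match alone does not establish a functional zinc finger; the domain predictions in this survey came from Prosite and Pfam.

```python
import re

def shorthand_to_regex(motif):
    """Turn shorthand such as 'CX2CX4HX4C' into a regular expression."""
    return re.sub(r"X(\d+)", lambda m: ".{" + m.group(1) + "}", motif)

def find_motif(protein, motif):
    """Return 0-based start positions of non-overlapping motif matches."""
    pattern = re.compile(shorthand_to_regex(motif))
    return [m.start() for m in pattern.finditer(protein.upper())]

# Invented sequence with one CCHC-type spacing embedded in glycine-rich text.
toy = "GGRGG" + "CYNCGEPGHFAREC" + "GGYGG"
print(shorthand_to_regex("CX2CX4HX4C"))
print(find_motif(toy, "CX2CX4HX4C"))
```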

RNA-binding GRPs can be classified into four different sub-classes based on the combination of the structural domains they present (Figure 1, Table 1). Proteins from the first sub-class show an RRM conserved motif at the N-terminal end, followed by a glycine-rich region with GGYGG repeats (Sachetto-Martins et al., 2000). GRPs from the second sub-class show a similar organization, but present a CCHC zinc finger inside their glycine-rich region. Proteins from the third sub-class are organized with a cold-shock domain at the N-terminus and a number of CCHC zinc fingers in their glycine-rich region that varies from 1 to 7 (Sachetto-Martins et al., 2000; Karlson and Imai, 2003). Finally, sub-class IV RNA-binding GRPs present two copies of the RRM motif followed by a C-terminal glycine-rich region, unlike the previously described proteins (Fusaro et al., 2001).
Twenty-seven Eucalyptus clusters encoding RNA-binding GRPs were identified and classified according to the structural organization of their domains. In order to analyze the relationships between them and other related RNA-binding GRPs already characterized, a phylogenetic tree was constructed (Figure 3).
Sixteen clusters belong to sub-class I (Table 8). Among these, 7 presented a pattern of expression limited to only one or two libraries, indicating that they probably represent tissue-specific genes. The sequences from Eucalyptus sub-class I RNA-binding GRPs split into two separate groups (Figure 3). One group is closely related to the Arabidopsis glycine-rich RNA-binding proteins (AtGR-RBPs) 2, 3, 4, 5 and 6. The other group is more related to genes coding for RNA-binding proteins from Nicotiana sylvestris (RGP-1a, -1b and -1c), Nicotiana glutinosa (NgRBP) and Euphorbia (EeGRRBP-1 and -2). Interestingly, the N. sylvestris genes were reported to present tissue-specific alternative splicing and were suggested to produce truncated polypeptides as well as functional RNA-binding polypeptides (Hirose et al., 1993). The high number of clusters belonging to this sub-class of RNA-binding proteins and the close relationship they present may indicate that at least some of these sequences correspond to alternatively spliced forms of the same gene. Both Eucalyptus groups of sub-class I RNA-binding GRPs are more related to other previously reported sequences from dicot plants, while several sugarcane sequences included in the phylogenetic tree are preferentially related to sequences from monocot plants like Zea mays (MA16 and CHEM2) and Sorghum vulgare (S1 and S2).
RNA-binding GRPs from sub-class II are the least abundant among all the RNA-binding GRPs and are apparently plant-specific (Lorkovic and Barta, 2002). The domain organization of these proteins presents a CCHC-type zinc finger inside the glycine-rich C-terminal domain in combination with the N-terminal RRM motif. Only two clusters with these characteristics were found in the ForEST database (Table 8). One of the clusters (EGJEFB1029H07.g) is very similar to the tobacco nuclear protein RZ-1 (Hanano et al., 1996) while the other (EGSBSL1048F09.g) is closely similar to a still uncharacterized Arabidopsis protein (Table 8 and Figure 3).
Sub-class III RNA-binding GRPs were represented by 4 clusters in the ForEST database. Two of them were isolated from only one or two libraries, corresponding to putative tissue-specific expressed genes (Table 8). Three clusters (EGJECL2215H02.g, EGEPRT3325H02.g and EGUTBK1006H11.g) grouped close to the Arabidopsis cold-induced proteins AtGRP-2 and AtGRP-2b, proteins that have two zinc fingers in their glycine-rich domains. The remaining cluster (EGEQFB1001F04) appears more related to two other sequences from Arabidopsis (At2g17870 and At4g36020) that were also shown to be cold-regulated (Karlson and Imai, 2003) but have a longer C-terminal end with 7 zinc fingers interspersed in the glycine-rich region. Two zinc fingers were observed in all the Eucalyptus sequences, with the exception of one cluster that is incomplete at its C-terminal end, which made it impossible to determine the number of zinc fingers in this cluster.
Five Eucalyptus clusters encoding GRPs with multiple RRM domains were classified as belonging to sub-class IV (Table 8). Among them, two clusters (EGACST2105B03.g and EGCEFB1016C10.g) share high similarity with Arabidopsis UBA1 proteins. Comparison analysis indicates that they group together with Arabidopsis UBA2c (Figure 3). UBA1 and UBA2 proteins bind RNA with specificity for oligouridylates in vitro and interact with UBP1, an hnRNP-like protein associated with poly(A)(+) RNA in the cell nucleus. It has been suggested that UBA proteins may act as components of a complex that recognizes U-rich sequences in plant 3'-UTRs, contributing to the stabilization of mRNAs in the nucleus (Lambermon et al., 2002). The three remaining clusters from RNA-binding GRP sub-class IV (EGCEST222E05.g, EGEQRT3201H05.g and EGUTLV1248B11.g) are similar to Arabidopsis heterogeneous nuclear ribonucleoproteins (hnRNPs), RNA-binding proteins that form complexes with RNA polymerase II transcripts and are proposed to regulate pre-mRNA processing (Krecic and Swanson, 1999). While metazoan hnRNPs have a glycine-rich C-terminal domain in addition to the two N-terminal RRMs, only two out of the six predicted Arabidopsis hnRNPs have a C-terminal domain rich in glycine (Lorkovic and Barta, 2002). The only two sugarcane sequences identified as sub-class IV RNA-binding GRPs (Fusaro et al., 2001) grouped together with the hnRNP-like proteins.
In addition to sequences classified in the four previously described sub-classes of RNA-binding GRPs, 8 clusters encoding GRPs that present other conserved domains usually found in RNA-binding proteins were found in Eucalyptus (Table 9). One cluster (EGQHLV2253F10.g) has a conserved domain characteristic of Gar1, a small nucleolar RNP that possesses a typical glycine/arginine-rich domain and is required for pre-rRNA processing and pseudouridylation (Bagni and Lapeyre, 1998). Two clusters (EGEQFB1001B12.g and EGEPST6161C06.g) have a CCCH (CX8CX5CX3H) type zinc finger. It has been shown that different CCCH zinc finger-containing proteins interact with the 3' untranslated region of various mRNAs. Three clusters (EGEQLV2200C06.g, EGEQRT3101G05.g and EGEQRT3300C03.g) were identified with a domain found in proteins that include the HABP4 family and the PAI-1 mRNA-binding protein. HABP4 has been observed to bind hyaluronan as well as RNA, but the latter with a lower affinity. PAI-1 mRNA-binding protein specifically binds the mRNA of type-1 plasminogen activator inhibitor (PAI-1), and is thought to be involved in the regulation of mRNA stability. Finally, one cluster (EGQHLV2243F09.g) was found with the conserved LSM domain present in proteins that bind and stabilize snRNPs involved in pre-mRNA splicing.
Since proteins containing such domains as their only RNA-binding motifs could not be predicted unequivocally as having an RNA-binding function, they were classified as putative RNA-binding GRPs. Particularly interesting is the cluster EGUTSL1044E03.g. It could be considered a true RNA-binding GRP since it presents an RRM motif, but unlike RNA-binding GRPs of sub-classes I, II or IV this domain is located at the C-terminal end of the protein. The sequence with the highest similarity to this cluster corresponds to a rice mRNA that encodes a glycine-rich protein with a C-terminally located RRM motif in combination with RanBP2-type zinc fingers at the N-terminal end. This kind of domain organization has not been reported before for a GRP and could represent a new class of still uncharacterized RNA-binding GRPs. Since the Eucalyptus cluster is incomplete at the N-terminus, the presence of zinc fingers could not be determined.

Eucalyptus clusters encoding proteins with glycine-rich domains
In addition to the GRPs showing glycine-rich domains with semi-repetitive structure described here, several proteins that present short domains with high glycine content and usually without a characteristic pattern of repetition were also found (Table 10). These proteins were classified as proteins with glycine-rich domains. Those clusters presented glycine-rich domains ranging from 32 to 130 amino acids with 35-81% of glycine. Glycine-rich stretches shorter than 30 amino acids were not included in this classification.
Out of the 16 Eucalyptus sequences that have glycine-rich domains in their structure, 7 are similar to known RNA binding proteins including the ribosomal RNA processing fibrillarin, several DEAD box RNA helicases, a nucleotide excision repair protein, a bHLH transcriptional regulator and a nucleolin-like protein. The presence of a short glycine-rich domain in a number of proteins involved in RNA metabolism suggests that this domain may play a role in the RNA binding function of these proteins.
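The length and composition limits quoted above suggest a simple sliding-window scan for glycine-rich domains. The sketch below is one possible reading, not the procedure used in this survey: it assumes a fixed 30-residue window, a 35% glycine cut-off, and merging of overlapping windows, and the function name is hypothetical. Because whole windows are merged, the reported boundaries are approximate and extend somewhat beyond the glycine-rich core.

```python
def glycine_rich_domains(protein, window=30, min_fraction=0.35):
    """Return merged (start, end) spans, 0-based and end-exclusive,
    where every contributing window has at least `min_fraction` glycine."""
    protein = protein.upper()
    domains = []
    for start in range(len(protein) - window + 1):
        fraction = protein[start:start + window].count("G") / window
        if fraction >= min_fraction:
            end = start + window
            if domains and start <= domains[-1][1]:
                domains[-1] = (domains[-1][0], end)  # extend previous span
            else:
                domains.append((start, end))
    return domains

# Invented sequence: a 40-residue stretch with 50% glycine between alanine runs.
toy = "A" * 40 + "GA" * 20 + "A" * 40
print(glycine_rich_domains(toy))
```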

Concluding Remarks
Although a large number of genes encoding GRPs have been identified in plants to date, only a few GRPs have been characterized and their functions remain speculative. However, it is becoming clear that GRPs play important roles in very diverse processes such as signal transduction, stress response, transcriptional regulation and development.
The highly specific but diverse expression pattern of grp genes, taken together with the distinct sub-cellular localization of some GRP groups, clearly indicates that these proteins are implicated in several independent physiological processes. Notwithstanding the absence of a clear definition of the role of GRPs in plant cells, studies conducted with these proteins have provided new and interesting insights into the molecular and cell biology of plants. Complexly regulated promoters and distinct mechanisms of gene expression regulation have been demonstrated (Keller and Heierli, 1994; Franco et al., 2002). New protein targeting pathways, as well as the export of GRPs from different cell types, have been discovered (Ryser et al., 1997; Murphy and Ross, 1998). These data show that GRPs can be useful markers for many physiological processes and/or models to improve the understanding of distinct aspects of plant biology (Sachetto-Martins et al., 2000). The results obtained here point to interesting roles for GRPs in plant physiology. The characterization of the grp genes in Eucalyptus could lead to new strategies for the manipulation of growth and stress signaling in this crop.