Identification and characterization of homeobox genes in Eucalyptus

Homeobox genes encode transcriptional factors, usually involved in molecular control of plant developmental patterns. They can be divided into several classes according to conserved sequences within the homeobox region and the presence of specific additional sequences. Based on these conserved sequences, we developed a search procedure to identify possible homeobox genes in the Eucalyptus Genome Sequencing Project Consortium (FORESTs) database. We were able to identify 50 Eucalyptus sequences (EST-contigs) containing the homeodomain sequence. Phylogenetic analysis was applied to this ESTs-contigs and 44 of them were found to have similarities with one of three well-known homeobox classes: Bell, Knox and HD-Zip, and their sub-classes. However, no EST-contig grouped with the fourth important homeobox class, the PHD-finger homeobox. On the other hand, two sequences have showed pronounced similarity to the Arabidopsis thaliana Wuschel gene, considered an “atypical” homeobox gene. Hierarchical clustering analysis of the expression pattern of these putative Eucalyptus homeobox genes revealed the presence of ten distinct expression groups. Combining phylogenetic analysis and expression patterns for some of the Eucalyptus genes revealed interesting aspects about some of the potential homeobox genes, which might lead to a better understanding of the Eucalyptus biology and to biotechnological applications.


Introduction
The homeobox is a semi-conserved sequence motif of about 180 base pairs, which was first found in morphogenesis controlling genes of Drosophila melanogaster.Homeobox sequences encode 60 amino acid sequences, collectively referred as the homeodomain, which is highly conserved among animal, fungal, and plant proteins.The homeodomain sequences fold into a characteristic DNA-binding structure, composed of three a-helices separated by a loop and a turn (Gehring, 1987;Laughon, 1991;Scott et al., 1989).Homeodomain-containing proteins are basically transcription factors that control many developmental processes and other cellular mechanisms, such as positional information, spatial patterning, cell fate determi-nation and cell differentiation (Lawrence and Morata, 1994).
In plants, homeobox genes are involved in a variety of functions, including developmental programs and response to stress.They are usually divided into four main classes, namely Knox, Bell, PHD-finger and HD-Zip, based on some conserved features inside the homeodomain sequence, as well as on the presence of additional sequences outside the domain (Kerstetter et al., 1994).Members of each class show characteristic structural and functional properties and can be further grouped into sub-classes or families (Sessa et al., 1998).
Most of the members of the Knox (Knotted1-like) class are associated with the maintenance and growth of the shoot meristems (Bowman and Eshed, 2000).Knox genes harbor a second conserved region just upstream of the homeodomain, the ELK domain, the basis on which they may be classified in two families: Knox I and Knox II genes (Bharathan et al., 1999).Knox I, such as KNOTTED-1 (kn1) and SHOOTMERISTEMLESS (STM), are preferentially expressed in shoot and floral meristems (Reiser et al., 2000), while Knox II genes seem to have much wider expression domains.
In the Bell class, the homeodomain is located within the C-terminal third of the protein (Reiser et al., 1995).They usually show regions rich in serine/threonine and proline outside the homeodomain (Quaedvlieg et al., 1995), which may function as transcription-activating domains.The Bell1 and ATH1 are examples of Bell 1 class genes.They are required for integument development and signal transduction during photomorphogenesis, respectively (Quaedvlieg et al., 1995).
Proteins encoded by both Knox and Bell genes present a three-amino acid extension in the loop connecting the first and second helices of the homeodomain (Bürglin, 1997).They are thus classified in the TALE (Three Amino acid Loop Extension) super-class (Chan et al., 1998;Bellaoui et al., 2001;Becker et al., 2002).
The Plant Homeodomain (PHD) finger genes encode proteins containing a Cys4-His-Cys3 motif, which is thought to bind two zinc ions, even though the space between the cysteine/histidine residues may be too variable (Aasland et al., 1995).They are frequently associated with chromatin-mediated transcriptional regulation, as in the case of the Arabidopsis Pathogenesis-related homeodomain protein (PRHA), which regulates elicitor-mediated expression of the parsley pathogenesis-related protein 2 gene (PR2) (Korfhage et al., 1994).
The Homeodomain-leucine zipper (HD-Zip) gene class encodes to proteins in which the homeodomain is closely associated with a leucine zipper (Schena andDavis, 1992, 1994).They are separated into four different families (HD-Zip I, II, III and IV) depending on specific differences of the leucine zipper motif (Chan et al., 1998;Aso et al., 1999;Ingram et al., 2000;Sakakibara et al., 2001;Ageez et al., 2003).This kind of homeodomain organization is present only in plants and it is speculated that HD-Zip genes originated in plant lineage by exon exchange between a homeodomain gene and a leucine zipper containing sequence (Schena and Davis, 1992).Proteins belonging to HD-Zip I and II classes are very similar with respect to the architecture of the HD-Zip domain (Sessa et al., 1993).On the other hand, HD-Zip III proteins present an insertion of four amino acids both between helix 2 and 3 of the homeodomain and between helix 3 and the leucine zipper domain (Baima et al., 1995;Sessa et al., 1998), changing the spacing between the homeodomain and leucine zipper.The HD-Zip IV family has a leucine zipper motif separated in two sub-domains by a loop of 10 amino acid residues.Many examples of HD-Zip class genes are found in the well-known model plant Arabidopsis thaliana, performing a wide array of functions.The ATHB-1 (HD-Zip I) gene is associated with leaf development (Aoyama et al., 1995); the ATHB-2 (HD-Zip II) protein probably has a role in me-diating cell elongation (Carabelli et al., 1996); the ATHB-8 (HD-Zip III) protein acts in the differentiation of the vascular system (Baima et al., 1995) and the ATHB-10 (HD-zip IV) regulates trichome development and suppresses root hair formation (Di Cristina et al., 1996).
Finally, there are a number of genes that have been recognized as having the homeodomain motif which do not fit in any of these main classes.Examples include the atypical homeobox WUSCHEL protein from A. thaliana which is essential for shoot apical meristem formation and maintenance (Mayer et al., 1998).
In this work we searched for potential homeobox genes in the Eucalyptus Genome Sequencing Project Consortium (FORESTs) database and used phylogenetic methods to classify those among these well-established groups.We were able to identify 50 high quality assemblies of homeobox genes after an extensive search on the FOREST database.To better understand homeobox gene expression at plant system level and to identify differentially expressed and tissue-specific genes, we conducted a digital expression analysis.By clustering genes according to their relative abundance in the various EST libraries, expression patterns of genes across various tissues were generated and genes with similar patterns were grouped.This information can be useful in devising strategies for the biotechnological improvement of Eucalyptus trees for industrial use and may assist in the study of homeobox in other plant species.

Databases and procedures for searching Eucalyptus homeobox sequences
The primary data used in this work were the so-called EST-contigs sequences, from the FORESTs project database.In the FORESTs project, they have been assembled from approximately 130,000 ESTs obtained in the sequencing of 19 Eucalyptus cDNA libraries, corresponding to different tissues and physiological states (Detailed information on cDNA libraries, sequencing, clustering and other features of the FORESTs project may be found in https://forests.esalq.usp.br/).Besides Eucalyptus ESTcontigs, Arabidopsis thaliana homeobox sequences, obtained from The National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) and from The Institute for Genomic Resources (TIGR, http://www.tigr.org/tdb/e2k1/ath1/), were used for comparison.
In any single FORESTs database search, just one target amino acid sequence was compared to all ESTcontigs nucleotide sequences in all six frames.This was done with the tBLASTn algorithm (Altschul et al., 1997) implemented in the FORESTs system.In the first search, the target amino-acid sequence was a consensus homeodomain sequence (qrcqyvvgreREELAKQLNLTERQVK VWFQNRRAKxKKdqsrdlekra) generated by the COBBLER program (Henikoff and Henikoff, 1997).The searches were conducted using the TBLASTN algorithm (Altschul et al., 1997) with the BLOSUM60 scoring matrix.Sequences with significant similarity to the COBBLER consensus (E-value of 10 -5 ) were retrieved from the database and were screened for the presence of homeodomain using the InterProScan (Apweiler et al., 2001) and PRODOM (Servant et al., 2002) programs.Sequences that met these criteria were then used as target amino-acid sequences in new searches against the FOR-ESTs database to locate further potential homeobox sequences.In this new round full EST-contigs were the target sequences, and the searches were conducted using the TBLASTN algorithm with the BLOSUM80 scoring matrix.Newly found EST-contigs harboring significant similarity (again E-value of 10 -5 ) to any of the previously found were also investigated using InterProScan and PRODOM programs.This procedure was repeated until no new contigs were found.Finally, sequences from each selected EST-contig were used for BLAST search against Genebank in order to verify its closest sequence deposited in the world data bank.

Phylogenetic and expression analyses
To classify the selected EST-contigs, we first performed a multiple alignment of their homeodomain region using the CLUSTALW program (Higgins et al., 1994).Phylogenetic analysis was conducted using the MEGA 2.1 software (Kumar et al., 2001).Among the various options available, we chose the Neighbor-joining method (Saitou and Nei, 1987), derived from a p-distance matrix, and the pair-wise deletion option was adopted, excluding amino acid gaps from the sequence alignment.
The expression analysis used the so-called 'digital northern blot' (Carraro et al., 2001;Lambais, 2001) to represent the expression profile of the potential homeobox genes and to identify differential expression among the diverse Eucalyptus tissues/treatments used in the FORESTs project.In this approach, the frequency of reads for each EST-contig in each selected library is computed and then normalized with respect to the number of reads in the library and to the total number of reads in all libraries.The obtained values form a matrix relating contigs and libraries.We then used the Cluster and Tree View programs (Eisen et al., 1998) to determine similarity in gene expression patterns among EST-contigs and among libraries.In both cases, aggregation was made by hierarchical clustering, based on Spearman Rank correlation matrix, and substituting formed clusters by their average pattern.Digital blot matrix was ordered, according to similarities in the patterns of gene expression and displayed as an array, where the normalized number of reads for each EST-contig in each specific library is represented in gray scale.

FORESTs database searches
In order to search for homeobox genes in the Eucalyptus transcriptome, a consensus homeodomain sequence was generated by the COBBLER program and used to screen the FORESTs dataset.The first round of searches in the FORESTs database, using the COBBLER consensus sequence, selected a total of 2336 putative homeobox EST-contigs, but only forty-three were found to have the homeodomain in the inspection using InterProScan and PRODOM programs.
Each of these forty-three ESTs-contigs were used to perform the second round of searches in the FORESTs database in which we found seven additional putative Eucalyptus homeobox sequences.An additional search was done using the conserved sequences outside of the homeodomain of the families HD-Zip, Knox, Bell and PHDfinger but no additional EST-contig was found.A total of 50 EST-contigs were found to contain the homeodomain region in the FORESTs database and, considering the extensive searches and distinct methodologies used, we judge that this should be very close to the actual number of homeobox sequences present in the FORESTs database.

Phylogenetics analysis
To be able to identify the phylogenetic groups based on the homeodomain regions of the 50 sequences from the FORESTs database, we aligned and clustered them together with nine A. thaliana homeobox genes, whose genetic, biochemical and taxonomic data have been extensively studied in the literature.Those insertions did not change the structure of the phylogenetic tree, since an identical tree can be obtained from the selected ESTcontigs alone (data not shown).The alignment of these sequences showed the existence of the highly conserved amino acid residues, including the absolutely conserved Trp-49 and a conserved secondary structure consisting of a helix-loop-helix-turn-helix motif (Figure 1).
The phylogenetic tree shows eight statistically wellsupported groups, which were named as Bell, Knox I, Knox II, HD-Zip I, HD-Zip II, HD-Zip III, HD-Zip IV and Wuschel-like, in agreement with sequence features and the presence of the A. thaliana "class marker" genes in each of them (Figure 2).From the 50 EST-contigs encoding to homeobox proteins present in the FOREST databank, 58% of the EST-contigs belong to the HD-Zip class, 16% to the Bell class, 14% to the Knox class and 4% has shown similarity to the unconventional homeobox gene WUSCHEL.Interestingly, our search and analysis did not identify any EST-contig of the PHD-finger family in the FORESTs database.This result is surprising since members of PHD-finger family have been found in Arabidopsis and rice ESTs libraries.To further investigate this issue we searched the FOREST databank for EST-contigs contain-Homeobox gene family in Eucalyptus 513 ing a PHD-finger motif.We found 2336 EST-contigs with significant similarity to the PHD-finger COBBLER consensus (YCSVCGKPDDGGELLQCDGCDRWYHQTCL GPPLLlEEPDGKWYCPKCK), but none of these ESTcontigs encoded for homeobox proteins.The misrepresentation of PHD-finger homeobox genes may be due to their atypical sequence features.These genes usually have extremely long transcripts that might impair reverse transcription and/or subsequent cDNA cloning (Alba et al., 2004).
Four of the Eucalyptus EST-contigs (EGMCC11220 F10, EGEQSL4002E06, EGMCF81096E11 and EGBM LV3289B05) did not cluster to any of these well-defined groups (Figure 2).Blast search against Genebank revealed that the homeodomain region from ungrouped sequences have similarities with other uncharacterized sequences from A. thaliana (CAB77810 is the best hit for EGEQSL 4002E06 and EGBMLV3289B05; NP_199231 for EGMC CL1220F10 and AAF16763 for EGMCFB1096E11).To further typify these sequences, a phylogenetic analysis was performed with 97 homeobox sequences from Arabidopsis  and sequences with significant similarities found in genebank blast search.Our results confirmed that contigs EGMCCL1220F10.g,EGEQSL4002E06.g,EGMCFB 1096E11.g and EGBMLV3289B05.g, together with at least 15 sequences from Arabidopsis, present a high divergent homeodomain and, thus, were not grouped into one of the traditional plant homeobox families (data not shown).Almost all Eucalyptus EST-contigs in the Bell class showed the conserved sequence "QVSNWFINARVR" in the C-terminal region of the homeodomain (Figure 1).This conserved sequence resulted in good statistical support of this group.Knox sequences also showed a high level of similarity, especially in helix 3 and in the N-terminal end, and were also well supported as a monophyletic group.The two classes of Knox proteins were also easy to distinguish.There is strong support for the Knox II group, even though the evidence for Knox I group is weaker (Figure 2).The difference in the tightness of associations between the two Knox subfamilies was noted before in other plants (Bharathan et al., 1999).
Knox and Bell homeobox proteins may be grouped into the TALE superclass.This class is characterized by three extra amino acids between homeodomain helix 1 and helix 2 (Chan et al., 1998;Bellaoui et al., 2001;Becker et al., 2002).However this characteristic three amino acid insertion was not found in two Eucalyptus EST-contigs classified as Bell (EGQHLV3278D04 and EGABST2040C11) and in one classified as Knox (EGCEST6032A06), probably due to incomplete sequencing (Figure 1).Although these two homeodomain families harbor specific structural similarities, our phylogenetics reconstruction revealed only weak statistical support for the TALE super class (Figure 2).This uncertain evolutionary relationship between TALE members has already been observed in the literature (Bürglin, 1997;Becker et al., 2002).
Our analysis identified twenty-nine potential HD-Zip genes from the FORESTs EST-contigs, among which nineteen are HD-Zip I, four HD-Zip II, three HD-Zip III and three HD-Zip IV (Figure 2).HD-Zip I genes were the least supported clade (53%).Analysis of the conserved amino acids among HD-Zip genes (Sakakibara et al., 2001) showed that the HD-Zip I family had an evolutionary rate higher than the others after the split of the monocot and eudicot plants.It has been postulated that this higher evolutionary rate in HD-Zip I genes is due to the absence of strict interactions with cofactors.Because of that, phylogenetic trees often do not assign the HD-Zip I family to a monophyletic group (Sakakibara et al., 2001).On the other hand, the HD-Zip II, HD-Zip III and HD-Zip IV families were highly supported as monophyletic groups (99%, 100% and 99%, respectively).The high similarity of HD-Zip II genes corroborates with the hypothesis that they form a specialized and recent group (Chan et al., 1998).All contig sequences grouped with the HD-Zip III protein REVOLUTA, from A. thaliana, showed the characteristic four amino acid insertion between helix 2 and helix 3 (Figure 1).
Two Eucalyptus sequences formed a monophyletic group with the Arabidopsis WUSCHEL gene (91%), and were thus called Wuschel-like (Figure 2).WUSCHEL have an "atypical" homeodomain sequence that is present only in plants.These "atypical" homeodomain sequences present a four amino acid insertion in the turn of the homeodomain, like HD-Zip III proteins.However, the Wuschel-like sequences has also an additional insertion of one or two amino acids in the loop region of the homeodomain (Figure 1) (Kamiya et al., 2003).The two FORESTs Wuschel-like sequences have structural similarities with this "atypical" group of homeobox genes and may function in analogous biochemical ways (Figure 1).

Expression pattern
It is already recognized that analysis of gene expression can be performed in silico, based on the frequency of sequence tags in cDNA libraries, allowing comparisons of the expression profiles of specific genes in plant tissues (Ewing et al., 1999).This digital expression analysis method has advantages over the conventional microarray approaches in that the liability of the later can be reduced due to cross-hybridization of closely related sequences (Kuo et al., 2002;Lipshutz et al., 1999) and to the development of stable probe secondary structures (Southern et al., 1999).
The Hierarchical Clustering analysis performed over this obtained expression pattern revealed the presence of ten distinct gene expression groups.For each of these groups we assigned the name of the library for which their components are overrepresented in terms of the normalized number of ESTs (Figure 3).The groups were: dark formed calli (group I); dark growing seedlings with three hours of light exposition (group II); roots (group III); leaves damaged by Thyrinteina (group IV); stem from drought stress susceptible seedlings (groups V and VI); leaves of Phosphate/Boron deficient plants susceptible to canker and rust (group VII); dark growing seedlings (group VIII); flower buds, flowers and fruits (group IX); and wood (group X).Two of the EST-contigs (EGBGBK1047B10 and EGCCRT6012A08) were not grouped in any of these (Figure 3).Considering the normalized number of reads, we can say that the Eucalyptus homeobox genes are preferentially expressed in six libraries: Dark growing seedlings E. urophylla (SL6, 7.93% of total number of reads); stem from frost resistant and susceptible plants (ST7, 7.08%); stem from drought stress susceptible seedlings (ST2, 7%); leaves damaged by Thyrinteina for seven days (LV3, 6.67%); stem from drought stress susceptible seedlings (ST6, 6.46%); and flower buds, flowers and fruits (FB1 -6.29%).The result observed for the libraries ST2, ST6 and FB1 is mainly due to the higher number of different EST-contigs, 16, 19 and 19 respectively.On the other hand, the libraries SL6, ST7 and LV3 have few EST-contigs, 3, 6 and 9 respectively, but they present high expression levels.The high number of reads and EST-contigs associated with abiotic stress (drought stress, frost resistance and suscepti-bility) suggests that Eucalyptus homeobox genes may be important in stress adaptation.
The expression profiles from tissue/treatment libraries were also clustered.EST-contig collections derived from the most similar tissues typically clustered together (Figure 3 top).This was observed for one of the cluster (a), where all three stem libraries and the wood library are included, representing wood forming tissues.Interestingly, the library of dark growing seedlings of E. grandis (SL7) is  also part of this cluster.The cluster (b) included almost all the rest of the libraries.The libraries (SL4, CL2, LV1 and BK1) do not have enough homeobox ESTs to sustain any correlation.The comparison of these tissue/treatments libraries did not display any consistent trend.
The gene expression groups III, V and VI (roots and plants under drought stress) showed a high number of EST-contigs whose expression are restricted to one or few libraries, indicating that these homeobox genes might be involved in distinctive aspects related to root development and drought stress adaptation.On the other hand, most of the homeobox genes belonging to group X showed a broader distribution over the libraries.This indicates that homeobox genes of the group X might be involved with more general aspects of plant physiology/development like, for instance, vascular development in different tissues.
HD-Zip genes were the type of homeobox genes most commonly found in the FORESTs database.These homeobox genes are involved in a wide range of process in plants and are expressed in different organs and developmental stages.Their expression patterns are modified by processes causing developmental responses like light, hormones, stress and wounding (Schena et al., 1993;Aoyama et al., 1995;Baima et al., 1995;Carabelli et al., 1996;Söderman et al., 1996;Aso et al., 1999;Ingram et al., 2000;Sakakibara et al., 2001;Ageez et al., 2003).Eucalyptus HD-Zip I genes were also present in almost all gene expression groups (Figure 3), but the greatest number of ESTcontigs (seven out of 19) were observed in expression groups V and VI (stem from drought stress susceptible seedlings).Three HD-Zip I genes from Arabidopsis (ATHB-6, ATHB-7 and ATHB-12) have their expression induced during drought stress (Söderman et al., 1996;Lee and Chun, 1998), and this same expression pattern was observed for the potential Eucalyptus homologous of ATHB-7 (EGQHST6232C03) and ATHB-12 (EGMCST2237G10 and EGCEST2224H03).The association between drought stress and the expression of those genes provides initial insights about their possible function.
The atypical homeobox WUSCHEL gene is essential for shoot apical meristem formation and maintenance in A. thaliana (Mayer et al., 1998).Recently it was shown that a WUSCHEL-like gene is also involved in rice root apical meristem formation (Kamiya et al., 2003).This contributes to the hypothesis that maintenance of shoot and root apical meristems are regulated by similar processes.The presence of an EST-contig homologous to WUSCHEL in the Eucalyptus roots indicates that this hypothesis may also be true for tree species (EGEQRT3102D08 -group III, Figure 3).The other Eucalyptus WUSCHEL-like gene has broader expression domain including floral buds, roots and callus.Interestingly, this EST-contig has a higher degree of similarity with the Populus tremula HB2 protein, a protein involved in wood differentiation.Populus HB2 protein is expressed in cambial cells, in xylem and phloem cells undergoing radial enlargement (Hertzberg and Olsson, 1998).

Concluding Remarks
The FORESTs database was searched for homeobox genes and 50 Eucalyptus EST-contigs were identified as sharing significant sequence similarity with the homeobox domain.Sequence alignment and phylogenetic studies provided sound basis for classifying 44 of them (88%) into one of three important homeobox groups: Bell, Knox or HD-Zip.Two sequences were found to have similarity with atypical homeobox genes, like the A. thaliana WUSCHEL gene.Based on a digital expression analysis we found ten subsets from the 50 EST-contigs with similar expression patterns.Sequence comparison revealed some Eucalyptus genes closely related to A. thaliana genes involved in the control of developmental programs, as well as in stress response.Our digital expression analysis facilitated these cross-species comparisons.For instance, we found possible homologous to ATHB-7 and ATHB-12 that were preferentially expressed in stems of plants under drought stress.Thus, such genes may act as mediators of the plant growth response to limiting water conditions in leaf, stem and other plant organs, like their potential homologous genes do in A. thaliana.We also identified numerous homeobox genes associated with biotic stress, wood development, and pathogen responses, which may be valuable in future efforts to elucidate the genetic mechanisms underlying these processes in trees.
Application of biotechnology for tree improvement offers great potential, and transcription factors, such as those encoded by homeobox genes, may play important roles in such improvements.Homeobox genes control multiple biochemical pathways and cellular processes, and several researchers have recently reported progress in identifying homeobox genes in various plant species, including trees (Hertzberg and Olsson, 1998).For instance, wood is made of secondary xylem and many of its features are determined by xylem growth and cell wall biosynthesis in its cells.Identification of genes involved in xylem formation and/or cell wall biosynthesis would lead to studies on their function that can provide mechanisms to manipulate wood features of interest.The same holds for genes involved in stress resistance, growth, reproductive development and other traits important for tree breeding.Thus, the data obtained in the FORESTs project can be very helpful in developing strategies for the biotechnological improvement of Eucalyptus trees.Understanding the molecular mechanisms underlying Eucalyptus growth, development and stress reactions can provide important insights into tree development and reveal the means by which tree characteristics could be modified for the improvement of their industrial properties.

Figure 1 -
Figure 1 -Multiple sequence alignment of 50 FORESTs and nine Arabidopsis homeodomain sequences.(a) Schematic diagram illustrating a typical homeobox protein.(b) Black boxes indicate residues that are present in all sequences, while alignments present in more than 60% but less than 100% of the sequences are shaded in gray.Shading was automatically defined, using the GeneDoc program (http://www.psc.edu/biomed/genedoc/).Residues representing the absolutely conserved Trp-49 are indicated (arrow).The conserved sequence "QVSNWFINARVR" in the C-terminal region of the Bell class homeodomain is underlined.Open boxes indicate extra amino acid residues present in HD-ZIP III and Wuschel-like proteins.The names of Arabidopsis homeodomain sequences are in bold letters.

Figure 3 -
Figure 3 -Digital northern blot representing the expression profile of the potential Eucalyptus homeobox genes.The normalized number of reads for each EST-contig in each specific library is represented in gray scale.The different EST-contigs are represented by the rows, while the libraries are represented by the columns.Hierarchical clustering method was used to group homeobox genes with similar expression patterns, resulting in ten expression groups.EST-contig collections were also clustered (bars and the letters a and b above the draw).The expression groups are indicated by roman numerals to the left of the array: I -E.grandis dark formed calli (CL1); II -E.grandis dark growing seedlings with three hours of light exposition (SL1); III-roots from developing plants (RT3); IV-leaves damaged by Thyrinteina for seven days (LV3); V-stem from drought stress susceptible seedlings prepared with 0.6 to 2.0 kb DNA fragments (ST2); VI-stem from drought stress susceptible seedlings prepared with 0.8 to 3.0 kb DNA fragments with 0.8 to 3.0 kb(ST6); VII-leaves of Phosphate/Boron deficient plants susceptible to canker and rust (LV2); VIII-E.globulus dark growing seedlings (SL4); IX-Flower buds, flowers and fruits (FB1); and X-E.grandis wood (WD2).