Overall picture of expressed Heat Shock Factors in Glycine max, Lotus japonicus and Medicago truncatula

Heat shock (HS) leads to the activation of molecular mechanisms, known as HS-response, that prevent damage and enhance survival under stress. Plants have a flexible and specialized network of Heat Shock Factors (HSFs), which are transcription factors that induce the expression of heat shock proteins. The present work aimed to identify and characterize the Glycine max HSF repertory in the Soybean Genome Project (GENOSOJA platform), comparing them with other legumes (Medicago truncatula and Lotus japonicus) in view of current knowledge of Arabidopsis thaliana. The HSF characterization in leguminous plants led to the identification of 25, 19 and 21 candidate ESTs in soybean, Lotus and Medicago, respectively. A search in the SuperSAGE libraries revealed 68 tags distributed in seven HSF gene types. From the total number of obtained tags, more than 70% were related to root tissues (water deficit stress libraries vs. controls), indicating their role in abiotic stress responses, since the root is the first tissue to sense and respond to abiotic stress. Moreover, as heat stress is related to the pressure of dryness, a higher HSF expression was expected at the water deficit libraries. On the other hand, expressive HSF candidates were obtained from the library inoculated with Asian Soybean Rust, inferring crosstalk among genes associated with abiotic and biotic stresses. Evolutionary relationships among sequences were consistent with different HSF classes and subclasses. Expression profiling indicated that regulation of specific genes is associated with the stage of plant development and also with stimuli from other abiotic stresses pointing to the maintenance of HSF expression at a basal level in soybean, favoring its activation under heat-stress conditions.


Introduction
Heat stress is one of the major factors limiting the productivity and adaptation of crops, especially when temperature extremes coincide with critical stages of plant development. Most plant development takes place at temperatures between 10 and 40 °C. Temperatures below or above this range generally cause temperature-induced stresses (Treshow, 1970; Hsu et al., 2010). In the case of heat stress, the rate of temperature change and the duration and degree of high temperatures all contribute to the intensity of the stress. A plant's inherent degree of adaptation to heat stress is an important determinant of its ability to survive a stress period (Efeoglu, 2009). However, the expression of HSF and HSP genes has also been observed under other abiotic and biotic stresses, as cited by Pirkkala et al. (2001). In response to various inducers such as elevated temperatures, salinity, drought, oxidants, heavy metals, and bacterial and viral infections, most HSFs acquire DNA binding activity to the heat shock element (HSE), thereby mediating transcription of the heat shock genes, which results in accumulation of heat shock proteins (HSPs). Among important transcription factors, heat shock factors (HSFs) are essential for the transcription of many HSP coding genes that are active in response to sublethal heat stress, leading to increased tolerance against a subsequent, otherwise lethal, heat shock (Treshow, 1970; Hsu et al., 2010).
After stress perception, intracellular changes lead to a molecular cascade of events, initiated by HSF activation and subsequent expression of HSPs limiting stress damage (Hsu et al., 2010). In general, HSF proteins have a common core structure comprising an N-terminal DNA binding domain (DBD) characterized by a central helix-turn-helix (HTH) motif, an adjacent domain with a heptad hydrophobic repeat (HR-A/B) which is involved in oligomerization, a short peptide motif essential for nuclear import [nuclear localization signal (NLS)] and export [nuclear export signal (NES)], and a C-terminal AHA type activation domain (Mittal et al., 2009; Hsu et al., 2010).
Through the DNA binding domain, activated HSFs bind to conserved cis-acting elements called heat shock elements (HSEs). HSEs are located in the promoters of HSP genes and are defined as adjacent and inverse repeats of the motif 5'-nGAAn-3', for instance 5'-nGAAnnTTCnnGAAn-3' (Schöffl et al., 1998).
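The HSE definition above can be illustrated with a short pattern search. The following is only a minimal sketch: it assumes an HSE is approximated by three adjacent, alternating nGAAn/nTTCn units (as in the example motif), and the function name `find_hse` and the test sequence are hypothetical, not part of the original analysis.

```python
import re

# Assumption: a minimal HSE is three adjacent, alternating units,
# either nGAAn-nTTCn-nGAAn or its inverse arrangement nTTCn-nGAAn-nTTCn.
# A lookahead is used so that overlapping occurrences are also reported.
HSE_RE = re.compile(
    r"(?=([ACGT]GAA[ACGT]{2}TTC[ACGT]{2}GAA[ACGT]"
    r"|[ACGT]TTC[ACGT]{2}GAA[ACGT]{2}TTC[ACGT]))"
)


def find_hse(promoter: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 15-mer) for each HSE-like site."""
    sequence = promoter.upper()
    return [(m.start(), m.group(1)) for m in HSE_RE.finditer(sequence)]


if __name__ == "__main__":
    # Invented promoter fragment, for illustration only.
    print(find_hse("ccAGAATCTTCTAGAAGcc"))
```

Real promoters often carry imperfect or longer HSE arrays, so a strict pattern such as this one would miss degenerate sites; it is meant only to make the repeat structure concrete.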
Some HSFs have been cloned and characterized from various plant species (Baniwal et al., 2007), revealing that the network of HSF genes is highly flexible and specialized in this group. Details regarding the overall HS response network were initially not clear. However, studies in Arabidopsis revealed that 21 HSFs form a complex network, in which AtHSFA1a and AtHSFA1b play important roles in the induction of HSP genes in the early stage of the HS response (HSR).
An insight into the response of HSPs and HSFs to different abiotic stresses was provided through a number of genome-wide microarray datasets. Arabidopsis HSFs and HSPs were strongly induced by heat, cold, salinity and osmotic stresses. Furthermore, overlapping responses of HSPs and HSFs to heat and other abiotic stresses were reported, indicating that these genes are important elements in the crosstalk among different response pathways (Hu et al., 2009). In rice, over-expression of OsHsp17.7 enhanced tolerance to heat and UV-B as well as to drought (Sato and Yokoya, 2008). Hu et al. (2009) identified rice HSF and HSP genes and analyzed their expression profiles under different abiotic stresses. A whole-genome microarray analysis was carried out to investigate expression changes of rice HSF and HSP genes in response to heat stress. By comparing their experimental data with other expression data under salt, cold, and drought conditions, Hu et al. (2009) found that the rice HSF and HSP families responded to different stresses in an overlapping relationship. The analysis also indicated that some HSF and HSP genes exhibited specific expression patterns in response to distinct stress types.
In Arabidopsis, for example, the major role of the representatives of the HsfA4/A5 group, which is generally not involved in the conventional heat stress response, may reside in cell type-specific functions connected with the control of cell death triggered by pathogen infection and/or reactive oxygen species (Baniwal et al., 2007).
Although the flexible network of HSF genes has been well studied in plants, there is little information available regarding the structure and function of HSF genes in legumes. Additionally, no comparison of HSF orthologs has been carried out until now among legumes. In this study, we used well-described Arabidopsis HSF proteins as seed sequences in order to identify and characterize the pool of HSF genes present in the Glycine max genome and perform a comparative analysis against Lotus japonicus and Medicago truncatula genomes, so as to trace the panorama of the HSF genes in these leguminous plants.

Material and Methods
Based on 21 well-described Arabidopsis HSF genes in the AfTDB database, BLASTp searches (Altschul et al., 1990) were carried out for similar sequences against the GENOSOJA database. GENOSOJA connects public and project soybean data (Nascimento et al., 2012). In total, the initiative provides information on 60,747 unigenes from the NCBI, Phytozome and Soybean full-length cDNA databases (Nascimento et al., 2012). Comparative searches were made in the Medicago truncatula and Lotus japonicus databases. After searching the GENOSOJA databank, only orthologs presenting the complete characteristic HSF DNA-Binding Domain (DBD) were considered for subsequent analysis. Using the soybean, Medicago and Lotus HSF candidates obtained, together with the Arabidopsis seed sequences, a comparative analysis with 69 aligned proteins was performed. A dendrogram was generated by the Neighbor-Joining (NJ) method with 2,000 bootstrap replications in the program MEGA v. 5.0 (Tamura et al., 2011), in order to infer HSF groups and classes within the analyzed legumes. For this purpose, the Arabidopsis coding sequences that did not present similarity (orthology) with the studied legumes were excluded from the phenetic analysis. To prevent the influence of different sequence sizes, the alignments were trimmed to exclude unequal 5' and 3' extremities.
To evaluate the HSF-related tags represented in the SuperSAGE libraries generated by the GENOSOJA project, a comparative analysis using the same seed sequences and the MegaBLAST algorithm was carried out according to Altschul et al. (1990). For this purpose the parameters were adjusted to an e-value equal to or less than 0.1 and word size equal to 7. The low complexity filter was deactivated. Results considered only tags with identity equal to or larger than 23 bp.
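The tag-selection criteria above (e-value of at most 0.1 and at least 23 identical bases) can be expressed as a simple filter. This is a sketch under stated assumptions: it presumes the hits were exported in the standard 12-column tabular BLAST format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), it estimates the number of identical bases from percent identity and alignment length, and the function name and the example hits are invented for illustration.

```python
def filter_tag_hits(lines, max_evalue=0.1, min_identical=23):
    """Keep tabular BLAST hits that satisfy the e-value and identity cut-offs.

    Each line is assumed to follow the standard 12-column tabular layout.
    Returns tuples of (tag id, subject id, identical bases, e-value).
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        evalue = float(fields[10])
        # Identical bases estimated from percent identity x alignment length.
        identical = round(pident * length / 100.0)
        if evalue <= max_evalue and identical >= min_identical:
            kept.append((fields[0], fields[1], identical, evalue))
    return kept


if __name__ == "__main__":
    # Invented example hits, not data from the study.
    example = [
        "tag_001\test_A\t100.00\t26\t0\t0\t1\t26\t100\t125\t1e-08\t48.1",
        "tag_002\test_B\t84.62\t26\t4\t0\t1\t26\t5\t30\t0.02\t30.0",
        "tag_003\test_C\t100.00\t24\t0\t0\t1\t24\t9\t32\t0.5\t40.0",
    ]
    for hit in filter_tag_hits(example):
        print(hit)
```

Because SuperSAGE tags are short, the permissive e-value and the small word size are what allow such alignments to be reported at all; the 23-base identity requirement then does most of the selective work.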
The GENOSOJA databank comprises six SuperSAGE libraries and allowed the generation of three comparisons, including two from root tissues subjected to water deficit stress and one inoculated with Asian Soybean Rust fungus (Phakopsora pachyrhizi). For the water deficit libraries, seeds of a drought tolerant cultivar (Embrapa 48) and a drought susceptible cultivar (BR 16) were germinated on filter paper for four days in a growth chamber at 25 ± 1°C and 100% relative humidity (RH). Seedlings were placed in 36 L boxes containing 50% Hoagland's solution (Hoagland and Arnon, 1950), continuously aerated and replaced on a weekly basis. These boxes were then transferred to a greenhouse under natural photoperiod of approximately 12/12 h light/dark cycle, temperature of 30 ± 5°C and 60 ± 10% RH. The plants were allowed to grow until the V4 stage (Fehr et al., 1971). The experimental plan was a randomized complete block 2x7 factorial design with three repetitions. The treatments included two cultivars (BR 16 and Embrapa 48) and seven water deficit periods (0, 25, 50, 75, 100, 125 and 150 min). Water stress was applied by removing the plants from the hydroponic solution and leaving them in boxes without nutrient solution for up to 150 min under ambient-air exposure. For each stress exposure time, roots from 10 plants were collected, pooled and frozen in liquid nitrogen before storage at -80°C. The samples from the above-mentioned exposure times were bulked together, generating a library from the drought tolerant genotype Embrapa 48 after stress, which was compared with the negative control (T0); the same procedure was also applied to the drought sensitive genotype (BR 16 cultivar). The comparison regarding Asian Soybean Rust infection was generated from leaves of the resistant cultivar PI561356 collected at different times (12, 24 and 48 h) after spraying with a P. pachyrhizi spore suspension (6 x 10^5 uredospores mL^-1). The urediniospores were collected from Phakopsora pachyrhizi infected soybean fields in the state of Mato Grosso, Brazil, and maintained for over 10 generations on the susceptible cv. BRSMS-Bacuri. The suspension of spores was sprayed onto three plants per pot at the V2 to V3 growth stage (Fehr and Caviness, 1977). The same solution without the spores was used for the false inoculations (Mock). The different times were bulked together to form a single resistant library, which was compared with the false inoculated negative control collected at the same time points.
Considering the identified G. max EST transcripts, standard statistical methods (see Eisen et al., 1998) were used to arrange the HSF genes according to their gene expression pattern, generating a graphic in which colors (green, red and black) indicate down-, up- and unregulated genes, respectively, while gray stands for absence of information. The gene expression data analyzed were collected from soybean under a variety of challenging and control conditions available in the GENOSOJA database. To obtain a picture of how HSFs contribute to sensing environmental up-shifts in temperature, we applied Self-Organizing Maps followed by pairwise average-linkage cluster analysis to normalized gene expression data (Eisen et al., 1998). Relationships among genes and libraries were represented by dendrograms in which branch lengths reflect the degree of gene co-expression.
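The clustering step can be pictured with a small, self-contained example. The sketch below is not the Cluster 3.0 implementation and omits the Self-Organizing Map ordering; it assumes a Pearson-correlation distance (1 - r) computed over libraries where both genes have data, joins clusters by pairwise average linkage, and uses hypothetical names (`pearson_distance`, `average_linkage`, `STATE_COLOUR`) and invented profiles.

```python
from itertools import combinations
from math import sqrt

# Colour code described in the text; None marks absence of information.
STATE_COLOUR = {"down": "green", "up": "red", "unregulated": "black", None: "gray"}


def pearson_distance(a, b):
    """1 - Pearson r over positions where both profiles have values."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 1.0  # assumption: treat insufficient data as uncorrelated
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 1.0  # flat profile: correlation undefined
    return 1.0 - sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []

    def dist(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    # Invented normalized expression values across four libraries.
    demo = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [4, 3, 2, 1]}
    for merge in average_linkage(demo):
        print(merge)
```

The merge distances returned here correspond to the heights at which branches join in the dendrogram, which is why co-expressed genes end up on short, adjacent branches.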
An available genome browser for soybean (Phytozome database) was used to anchor identified EST candidate sequences on G. max virtual chromosomes, aiming to identify their distribution, relative position, and abundance. For this purpose the MegaBLAST tool was used to identify the exact location of the HSF genes in the genome, using at least 80% identity as a parameter. For the construction of a virtual karyotype representation, a CorelDRAW12 graphic application was used. The soybean chromosome information for the schematic representation was obtained from the SOYBASE site. For the design of chromosomes, considering the need for high-resolution bands (data anchored in the genome), a proportion of 1:1 (cm:Mb) was adopted for all chromosomes; thus, for the sequence positioning, each millimeter corresponded to 100,000 bp. On the representation each transversal black line corresponded to an HSF gene.
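The drawing scale described above reduces to simple arithmetic: at 1 cm per Mb, one millimeter represents 100,000 bp. The helper names and the example coordinates below are hypothetical and serve only to make the conversion explicit.

```python
BP_PER_MM = 100_000      # 1 mm on the drawing = 100,000 bp (from 1 cm : 1 Mb)
BP_PER_CM = 1_000_000    # 1 cm on the drawing = 1 Mb


def bp_to_mm(position_bp: int) -> float:
    """Distance in mm from the chromosome start at which a locus is drawn."""
    return position_bp / BP_PER_MM


def chromosome_length_cm(length_bp: int) -> float:
    """Length in cm of the drawn chromosome."""
    return length_bp / BP_PER_CM


if __name__ == "__main__":
    # Invented coordinates: a locus at 2.5 Mb on a 50 Mb chromosome.
    print(bp_to_mm(2_500_000))               # 25.0 mm from the top
    print(chromosome_length_cm(50_000_000))  # 50.0 cm long
```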

Results and Discussion
Heat and cold can have damaging consequences on both vegetative and reproductive tissues. Temperature changes can also regulate plant movements, resetting internal clocks and diurnal synchronization, flowering and germination in some species (Ruelland and Zachowski, 2010). Moreover, temperature changes can induce metabolic changes so that plants adapt and tolerate moderate cold, freezing and heat stresses (Ruelland and Zachowski, 2010). HSFs are important components of the heat shock regulatory network, with a single gene identified in yeast and Drosophila, while vertebrates have only four genes of this category (Swindell et al., 2007). Nevertheless, unlike other organisms, plant genomes encode extraordinarily complex HSF families, both in terms of the total number of genes (usually more than 20) and in terms of their structural and functional diversification. This abundance and diversity can also be seen in legumes. An extensive BLAST search of Arabidopsis HSF orthologs in soybean, Lotus and Medicago EST databases led to the identification of a total of 25, 19 and 21 expressed sequences, respectively (Table 1).

HSF expressed sequence tags
The characteristic HSF domains were complete in 24, 13 and 17 orthologous candidates identified among the three species, respectively (Table 1). From the 21 types of Arabidopsis HSF genes only 13 types were identified in soybean and Lotus and 14 in Medicago (Table 1). In our evaluation, HSFA1B, HSFA6A, HSFA7B and HSFA9 were absent in all species analyzed (Table 1). HSFA1A and HSFA1B interact as regulators responsible for immediate-early transcription of a subset of HS genes in Arabidopsis, and are independently important for the initial phase of HS-responsive gene expression, while their interaction enhances the expression of their target genes (Li et al., 2010). The absence of HSFA1B may render soybean more sensitive to heat stress, but another class A HSF may alternatively play this role (Sung et al., 2003; Kotak et al., 2004; Li et al., 2004). Whether another gene substitutes for HSFA1B in soybean could be tested by heterologous expression of HSFA1B; if different pathways exist, the over-expression of HSFA1B might change the performance of soybean plants, especially under heat stress.
Other members of class A, such as HSFA9, are less active or may be active only under certain conditions. The reason seems to be the presence of interesting regulators (HSFs or other transcription factors) with specialized functions. In fact, HSFA9 was found to be specific to seed development in sunflower and was exclusively detected in yellow siliques of Arabidopsis mRNA (Kotak et al., 2004). Hence the lack of identification of some HSF classes may correlate with specialized functions other than those represented among the conditions analyzed herein.
A similar result was reported by Nover et al. (2001) after carrying out an analysis of HSFs in A. thaliana. Among the 21 described genes, HSFs A3, A6A, A6B, A7B, B2A and B3 could not be detected in any of the tissues analyzed (etiolated seedlings, roots, leaves from vegetative plants, stems, flowers, siliques, and developing seeds) or conditions (heat stressed leaves and cell cultures vs. control). According to the authors, it was not surprising that no matching EST was found in libraries created exclusively from RNA isolated from control tissues; a serious limitation of the data from EST libraries for these studies is the lack of samples from heat stressed tissues.
Comparing the obtained results with the data available in the Legume Transcription Factor database (Legume TFDB, Mochida et al., 2009a), Kotak et al. (2004) listed 34 soybean sequences, a higher number of HSF representatives than those in GENOSOJA, but these authors did not indicate the methods and procedures used in the acquisition of these HSFs. Finally, the soybean candidates identified herein represent the active (expressed) HSFs bearing the complete DBD. This set size was similar to that described for Arabidopsis and also for the Lotus and Medicago orthologs identified in this study, both of which are evolutionarily close to soybean (Fabaceae family, Papilionoideae subfamily). Notwithstanding, it is important to highlight that evolutionary studies and haploid genome analysis suggested that the soybean genome experienced a tetraploidization event approximately 10-15 million years ago. Since then, the soybean genome has gone through gene rearrangements and deletions, reverting to a diploid state. Therefore, soybean multigene families, including the heat shock factor family, may contain highly related but diversified genes (Mochida et al., 2009b).

HSF matching to SuperSAGE tags
Regarding SuperSAGE, 68 different tags could be identified, including 26 tags unique to water deficit experiments with the tolerant comparison (water deficit stressed Embrapa 48 vs. control), 28 tags unique to water deficit experiments with the susceptible comparison (water deficit stressed BR16 vs. control) and 14 regarding Asian Soybean Rust (PI561356 inoculated vs. control) (Table 2; Figure 1). No common tags were identified. It is important to note that among 25 HSF EST clusters, 18 had no representative in the tags database, while five clusters were represented in all libraries. The sequence Gmax_HSFB1_SJ09-E1-R06-064-B09-UC.F was not identified in 'Embrapa 48' and 'PI561356' libraries, and Gmax_HSFB3_Contig20961 was present in the water deficit stressed libraries only. When looked at from a different point of view, from the 14 HSF types compared, only six HSF types (HSFB1, HSFA1E, HSFB2A, HSFB3, HSFA8 and HSFA4A) were identified (Table 2; Figure 1), indicating their induction during the stress response.
Despite the small number of identified sequences in the Asian rust 'PI561356' stress analysis, when compared to water deficit experiments, the presence of HSF representatives indicates the involvement of HS-response also during biotic stresses. The stress condition by itself can activate non-specific stress-responsive pathways, due to the debility caused to plants by biotic stressful conditions, which can activate a crosstalk among different stress related pathways, as observed in other plants (Glombitza et al., 2004; Kido et al., 2011). Moreover, it is important to consider the tissue from which the library was generated, since leaves are among the first organs to present stress symptoms (especially to abiotic ones). These are necessary for the maintenance of photosynthesis and evapotranspiration processes to ensure plant survival.
Moreover, the Gmax_HSFB3_Contig20961 gene seems to be expressed specifically under abiotic stress, such as water deficit.
The analysis of SuperSAGE transcript abundance revealed a higher number of orthologous tags for the Gmax_HSFB1_Contig12262 cluster (more than 50% of the identified SuperSAGE tags), followed by Gmax_HSFA1E_Contig12828 (Table 2; Figure 1A). There is evidence suggesting that HSFB1 plays a special role in gene activation as a cooperative partner of HSFA1 and that coexpression of low levels of HSFB1 with HSFA1 can result in strong synergistic effects in reporter gene activation. Experiments in tomato showed that HSFB1 acts as a novel type of coactivator and may be able to cooperate with HSFA1a or other activators to control expression of certain housekeeping genes (Bharti et al., 2004).
Evaluating the results for the comparisons among water deficit libraries (susceptible vs. tolerant), a similar proportion of HSF genes was observed, with the exception of the Gmax_HSFB1_SJ09-E1-R06-064-B09-UC.F transcript, which was recorded exclusively in the susceptible genotype. In both libraries, Gmax_HSFB1_Contig12262 (Figure 1B) was the most represented, indicating that HSF genes are expressed under water stress conditions in a similar way in both susceptible and tolerant cultivars.
As expected, most SuperSAGE tags were identified from water deficit libraries. However, it is worth noting that more than 60% of the HSF gene types obtained from soybean ESTs were not identified in the SuperSAGE comparisons, suggesting that the seed EST sequences used were not complete, lacking the necessary 3' extremity for anchoring of SuperSAGE tags. This opens the possibility of identifying additional candidates upon using other annotation approaches. A role of these factors in water deficit response may exist, since their expression was reported also in association with other abiotic stresses (Kotak et al., 2007). Moreover, the 68 identified tags could be potentially useful for 3' RACE (3'-rapid amplification of cDNA ends) experiments to identify the complete transcript, besides expression validation using RT-qPCR with the same mRNA samples.

Structure and evolution of HSF candidates in soybean, Medicago, Lotus and Arabidopsis
The functional properties of HSFs are attributed to conserved structural domains, with the highest degree of conservation being observed for the DNA-binding domain (DBD) composed of helix-turn-helix (HTH) structures, and an adjacent domain with a heptad hydrophobic repeat (HR-A/B) which is involved in oligomerization. In addition, there are two further characteristic components: (i) the short peptide motif essential for nuclear import (NLS: nuclear localization signal) and export (NES: nuclear export signal), and (ii) a C-terminal AHA type activation domain (Li et al., 2010). Primarily based on the structural features of the oligomerization domain, plant HSFs are classified into three evolutionarily conserved classes, namely A, B and C, bearing 14 sub-classes. The high degree of conservation within the HSF family is corroborated by our in silico analysis, as in the generated dendrogram it was possible to observe the differentiation of sequences according to their classes, and within each class there was a grouping of sequences according to their subclasses (Figure 2). A clear differentiation between HSF classes A and B from a basal ancestral sequence has been established, as expected, since class A and C HSFs differ from class B and non-plant HSFs by an additional 21 or 7 amino acids, respectively, which separate the two subdomains HR-A and HR-B of the hydrophobic region. Furthermore, the AHA type acidic activation domain is exclusively represented by class A members (Mittal et al., 2009).
With respect to class A, two main groups emerged in the present evaluation: one (I) with HSFA4 and HSFA5 representatives and the other (II) with the remaining HSFA and HSFC members (Figure 2). This is a predictable result, since HSFs A4 and A5 form a group distinct from the remaining HSFs by structural features of their oligomerization domains and by a number of conserved signatures. This is also consistent with their role, since A4 HSFs are potent activators of heat stress-related gene expression, whereas A5 HSFs act as specific repressors of HSFA4 activity, while other members of class A are not affected due to the high specificity of their oligomerization domains (Baniwal et al., 2007).
The second group included three branches, with a basal one including HSFA8 and HSFC1 (Figure 2). Although class C is more similar to class A than to B, it was expected that this class would behave as a separate group. Nevertheless, the high diversity in the response of different HSF genes to different stresses suggests that there is a high degree of specialization regarding the response of specific HSFs to a particular stress condition. This is consistent with the fact that both HSFA8 and HSFC presented increased expression under cold stress (Miller and Mittler, 2006), indicating that this adaptive response to tolerate cold conditions may be responsible for characteristics shared by these two genes. In fact, in the multiple alignment analysis, two regions comprising 15 residues each (amino acid positions 125 to 139 and 154 to 168) were shared by both HSFA8 and HSFC protein sequences, though absent in other class A HSF members. Furthermore, peculiarities shared by HSFCs, such as deletions of six amino acids at position 106-111 and probable mutations in two segments (intervals: 161-168 and 195-220) may justify the differentiation of class C proteins from class A ones, as evidenced in the dendrogram.
Regarding the specific function of class C, remarkably little information is currently available. According to Nover et al. (2001), HSFCs were well represented in expressed sequence tags (ESTs) from libraries of tomato, soybean, potato, barley and Arabidopsis. The HSFC type is clearly separated from all others by sequence details of the DBDs and by the characteristics of the HR-A/B region. However, the significance of these extended oligomerization domains in class A and C HSFs for the coiled-coil structure and oligomerization behavior is not yet clear.
We noted a conservation in the position and function of AHA motifs and NES in the C-terminal regions of class A. These regions, in addition to the flanking amino acid residues, were sufficient to identify the HSFs without prior knowledge about the respective DBDs or HR-A/B regions (Kotak et al., 2004). Furthermore, the results were positive for ESTs encoding representatives of HSF groups A1, A2 and A6 (Kotak et al., 2004). Thus, it can be inferred that the observed grouping formed by HSFA1, HSFA2 and HSFA6B in the dendrogram (Figure 2) was based on the similarity of AHA motifs and NES in the C-terminal regions.
It is noteworthy that the C-terminal domains (CTDs) of class B HSFs are completely different, justifying their isolation in a separate branch, composed of two main groups. The first one includes the B3 sub-class members together with a single member of the B2 sub-class from L. japonicus. This unexpected grouping of the Lotus B2 sub-class member seems to result from a deletion in a region rich in alanine, valine, isoleucine and methionine. Apparently, this deletion was responsible for the exclusion of this sequence from the branch including the remaining class B members. The second group includes the B1, B2 and B4 sub-classes, which are separated in different branches according to their sub-classes (Figure 2). This grouping may be explained by differences observed in a cluster containing arginine and lysine residues close to the C-terminus of HSFB1, probably responsible for permanent nuclear localization (Heerklotz et al., 2001), and also by the fact that similar motifs were found in other representatives of this group and in groups B2 and B4, but not in the HSFB3 sub-class, which is the smallest of all HSFs identified so far.
Although our knowledge is still limited, functional diversification seems to be the main reason for the coexistence of more than 20 HSF types in plants (Baniwal et al., 2007). A systems analysis of tomato HSFs revealed two interesting peculiarities: (i) there are at least four different HSF groups (Scharf et al., 1990; Treuter et al., 1993; Bharti et al., 2000) belonging to two classes (i.e., class A with HSFs A1, A2, and A3 and class B with HSFB1), and (ii) two of the four HSFs (HSFA2 and B1) are heat stress-inducible proteins (Kotak et al., 2004).
In most cases, all identified gene classes and sub-classes were expressed and identified in the four evaluated legumes, suggesting that the family members diverged before the species differentiated. Alternatively, such gene classes and sub-classes may have already functioned as independent genes in the common ancestor, thus favoring divergent evolution.

HSF expression in soybean
Plant cells constitutively express a pool of HSF proteins that are maintained in an inactive state. Certain results suggest that heat-induced protein denaturation participates in the activation of these HSFs (Yamada et al., 2007). This molecular device is normally based on changes in protein conformation and can respond very quickly, playing therefore a central role in transcriptomic remodeling induced upon heat exposure. Accordingly, all HSFs expressed in soybean identified in this study were derived from experiments in the absence of heat stress.
Moreover, it is well known that heat often occurs in combination with drought or other stresses that cause extensive agricultural losses worldwide. HSFs serve as the terminal components of signal transduction, mediating the expression of HSPs and other HS-induced transcripts, but their diverse temporal and spatial expression has also been demonstrated under the influence of other abiotic stresses (Kotak et al., 2007).
HSFs are involved in stress sensing and signaling but can also take part in the regulation of other cellular processes, including development, where a role is strongly suggested by expression profiles in libraries of tissues from young stages. The only exceptions seen herein were mature adult and drought-stressed leaves, where the expression of HSFB1 and HSFB2A1 was markedly down- and up-regulated, respectively (Figure 3).
Plant HSFs may also function as H2O2 sensors, as is also the case in humans and Drosophila, where HSFs directly sense H2O2 and assemble into homotrimers in a redox-regulated manner. HSFA2 controls expression under prolonged HS and recovery conditions. Interestingly, its expression is induced by high light intensity and exposure to H2O2, emphasizing its importance under various stress conditions (Miller and Mittler, 2006). HSFA4A and HSFA8 are likely to act as sensors of reactive oxygen species (ROS), with HSFA5 acting as a repressor of HSFA4. Indeed, in soybean the profiles of HSFA4A and HSFA8 were quite similar, considering the number of libraries where they were detected. On the other hand, and considering the same libraries, HSFA5 was absent, except in immature seeds containing globular embryo stages where none of the three genes were detectably expressed (Figure 3). It is also interesting to note that HSFB1.1 was up-regulated in seven-day-old root libraries (R02) and in seedlings (without cotyledons) (S11), situations in which HSFB2A.2 was down-regulated, indicating that these genes may act as antagonists during the initial phases of plant development. This assumption is corroborated by the fact that HSFB1.1 was down-regulated, while HSFB2A.2 was up-regulated in the mature root library (L08).
The similarity in expression patterns of HSF genes in specific libraries (in specific developmental stages or conditions) indicates that the activation of these genes might be evoked by the same cis-regulatory elements in their promoters. Such co-expression was observed for HSFA2.1, HSFA2.2, HSFA6B.1 and HSFA4A.1 in the library S07 from 'seed coats of greenhouse grown plants'. Co-expression could indicate that these genes play the same role or are co-participants in the same pathway.
The induction of transcriptomic remodeling through the HSF network is very important but complex, as it involves several HSFs. This network is only a part of the orchestration that contributes to survival under high temperature stress. The panel exposed by our work suggests that HSFs also mediate cross-talk between signaling cascades in soybean for HS and other abiotic stresses, with possible roles in soybean development. Nevertheless, the questions raised here may have to be addressed in subsequent experiments in which the tissues and conditions should be pooled for different and sequential time points.
Figure 3 - Hierarchical clustering (Cluster3.0) of up-regulated (red), down-regulated (green) and non-regulated (black) soybean EST clusters (p < 0.05) related to HS response; gray stands for absence of information.

Distribution of HSF genes in the soybean genome
The comparative analysis of G. max EST sequences (25 in total) and genomic sequences enabled the identification of 62 loci bearing HSF genes (Table 3; Figure 4) from 65 HSFs previously described for soybean (Mochida et al., 2009a), a crop with a presumed recent polyploid past (McClean et al., 2010). Of the 25 candidates obtained, two did not align significantly with the characterized heat shock factor genes, which may be explained by differences between the cultivars used in the genomic and expression sequencing projects. In addition, three genes described for soybean were not identified among the EST sequences, indicating a lack of expression of these genes in the libraries of the GENOSOJA database. Differences among the analyzed cultivars may also explain this lack of similarity.
With respect to the genomic distribution of the HSF family, nine gene clusters could be identified on chromosomes 01, 03, 04, 05, 10, 11 and 19 (Figure 4). According to Mochida et al. (2009a) these clusters may consist of paralogous genes. In soybean, the relative physical distribution of transcription factor genes is of interest, and two types of clusters can be distinguished based on their evolutionary history. The first type consists of a series of genes that arose through repeated tandem duplications (originating from a founding locus). The second type, which is not considered as consisting of paralogous genes, probably arose independently and then relocated to form these duplications and clusters (Mochida et al., 2009b). Pairs of duplicated genes on different chromosomes are common, and gene clusters of three or more highly related genes are also widely found (Mochida et al., 2009a). Depending on the distance between them, a few of the duplicated genes on the same chromosome could be classified, somewhat arbitrarily, either as tandem duplicates or as non-tandem duplicates (Mochida et al., 2009a).
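As an illustration of how physically clustered loci might be flagged, the following Python sketch groups genes that lie on the same chromosome within a chosen distance of one another. It is a sketch under stated assumptions and not the procedure used to build Figure 4: the function `find_clusters`, the gene names, the coordinates and the `max_gap` threshold are all hypothetical, and the choice of threshold is arbitrary.

```python
from collections import defaultdict


def find_clusters(loci, max_gap):
    """Group loci on the same chromosome whose neighbors lie within max_gap bp.

    loci is a list of (gene, chromosome, start_position) tuples.
    Returns a list of (chromosome, [genes]) with at least two genes each.
    """
    by_chrom = defaultdict(list)
    for gene, chrom, pos in loci:
        by_chrom[chrom].append((pos, gene))

    clusters = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])  # order loci by position
        current = [members[0]]
        for pos, gene in members[1:]:
            if pos - current[-1][0] <= max_gap:
                current.append((pos, gene))  # close enough: extend the cluster
            else:
                if len(current) > 1:
                    clusters.append((chrom, [g for _, g in current]))
                current = [(pos, gene)]  # start a new candidate cluster
        if len(current) > 1:
            clusters.append((chrom, [g for _, g in current]))
    return clusters


# Hypothetical loci; names and coordinates are placeholders for illustration.
loci = [
    ("hsf_a", "01", 1_000),
    ("hsf_b", "01", 50_000),
    ("hsf_c", "01", 9_000_000),
    ("hsf_d", "03", 200),
    ("hsf_e", "02", 500),
]

print(find_clusters(loci, max_gap=100_000))
```

With these placeholder values only `hsf_a` and `hsf_b` are reported as a cluster. Physical proximity alone does not distinguish tandem duplicates from genes that relocated independently, so sequence similarity between cluster members would still have to be evaluated.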
Moreover, none of the EST clusters aligned on chromosome 12. This was expected, since no HSF family members have been described on this chromosome (Mochida et al., 2009b), while other chromosomes (02, 06, 15 and 18) presented a single representative of the group.

Concluding Remarks
Results from the present investigation indicate that gene duplication and diversification occurred during plant evolution, whilst differences in their expression patterns caused species-specific variability in the composition of the HSF family members, which can be divided into three different classes and several sub-classes according to their particular motifs and residue-specific rich regions. Although not all of the previously described genes could be found for the three species studied when using a transcriptomic approach, we expect that experiments directed at heat-stress conditions may provide additional sequences related to the HS response, including other HSF genes. Furthermore, the absence of soybean ESTs for some HSF members did not impair the evaluation of the distribution of the HSF family in the soybean genome. The family is present in 19 of the 20 chromosomes, including clustered distribution in some.
Figure 4 - In silico hybridization of HSF sequences against the SOYBASE database. Schematic representation of clusters that were anchored in soybean based on BLAST similarity results (see Table 3 for correspondence between EST cluster identification and HSF described genes).
To understand the complexity of a plant's HSF family and stress response systems in general, it is important to consider that when plants became adapted to terrestrial habitats they evidently had to face, and become specialized for, rapidly changing and extreme environmental conditions. The present approach represents the first evaluation considering only expressed HSF genes, revealing 25 expressed ESTs and 68 SuperSAGE tags, with emphasis on root tissue (water deficit) libraries. Some HSF candidates that are present in Arabidopsis but apparently missing from the transcriptome of the evaluated legumes (for example HSFA1B) may be important candidates for biotechnological approaches in soybean and other legumes directed towards increasing their performance under temperature stress conditions. Moreover, some genes found to be induced under water deficit may constitute interesting target genes for inferences regarding the association of heat and cold stresses, especially considering current climate change scenarios.