In silico identification of known osmotic stress responsive genes from Arabidopsis in soybean and Medicago

Plants experience various environmental stresses, but tolerance to these adverse conditions is a very complex phenomenon. The present research aimed to evaluate a set of genes involved in osmotic response, comparing soybean and Medicago with the well-described model plant Arabidopsis thaliana. Based on 103 Arabidopsis proteins from 27 categories of osmotic stress response, comparative analyses against the Genosoja and Medicago truncatula databases allowed the identification of 1,088 soybean and 1,210 Medicago sequences. The analysis showed a high number of sequences and high diversity, comprising genes from all categories in both organisms. Genes with unknown function were among the most representative, followed by transcription factors, ion transport proteins, water channels, plant defense, protein degradation, cellular structure, organization & biogenesis and senescence. An analysis of sequences with unknown function allowed the annotation of 174 soybean and 217 Medicago sequences, most of them concerning transcription factors. However, for about 30% of the sequences no function could be attributed using in silico procedures. The establishment of a gene set involved in osmotic stress responses in soybean and barrel medic will help to better understand the survival mechanisms for this type of stress condition in legumes.


Introduction
In the course of evolution, plants have acquired a myriad of developmental and metabolic strategies to cope with the adverse effects of environmental stresses during vegetative growth and reproduction (Parry et al., 2005), making stress tolerance a complex phenomenon.
Stress perception and the immediate induction of signals that culminate in adaptive responses are key steps leading to plant stress tolerance. Differences in stress tolerance between genotypes, or between developmental stages of a single genotype, may arise from peculiarities in signal perception and transduction mechanisms (Chinnusamy et al., 2004). Under osmotic stress conditions, diverse sets of physiological responses are activated, including metabolic and defense systems used to sustain growth and ensure survival.
The stress-inducible genes are classified into two major groups: one of them protects the plant directly against stresses, whereas the other regulates gene expression and signal transduction (Valliyodan and Nguyen, 2006).
Because plant tolerance against osmotic stress is a complex multigenic trait, a demand exists for genome wide analysis, including 'omics' approaches suitable for uncovering important gene sets involved in this important process (Hirayama and Shinozaki, 2010).
Soybean is an example of a non-model plant with plentiful transcriptome information available. Among available databases, the Genosoja platform connects public and restricted data, providing 60,747 unigenes (Nascimento et al., 2012, this issue).
The identification of candidate genes in soybean and barrel medic will provide additional evidence of the response mechanisms for osmotic stresses in Fabaceae, yielding useful information for crop improvement. As osmotic stress cannot be solved solely via remedial land management, tolerant crops (able to maintain cellular turgor and osmotic balance) may contribute significantly to reducing this economic burden. The key to plant engineering for osmotic tolerance lies in the knowledge of the underlying mechanisms of plant adaptive responses (Hariadi et al., 2011).
In the present work the main categories of osmotic stress genes known from A. thaliana were identified in the soybean (Genosoja Project) and barrel medic (M. truncatula database) transcriptomes through an in silico approach, in order to contribute to a better understanding of the early molecular adaptation to osmotic (drought and salinity) stress in both leguminous plants.

Materials and Methods
In a previous study based on 7,000 Arabidopsis genes, Seki et al. (2002) identified 103 coding genes distributed over 27 functional categories (Table 1) whose expression increased more than fivefold in response to osmotic stress. The protein sequences of these stress-inducible genes were obtained from the RIKEN Arabidopsis Full-Length Clone Database and used as query sequences.
After this step, a local database with the retrieved sequences was generated in order to search for similar sequences in the Genosoja platform (Nascimento et al., 2012) and the M. truncatula database (Quackenbush et al., 2000) using the tBLASTn algorithm (Altschul et al., 1990) with an e-value cut-off of 1e-05. The results were recorded in another local database for further analyses and for comparisons among the studied organisms and with information from the literature.
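As an illustration of the e-value filter described above, the following minimal Python sketch keeps only hits at or below the 1e-05 cut-off. It assumes that the tBLASTn results were exported in the standard 12-column BLAST+ tabular layout (-outfmt 6), which the text does not state; the sequence identifiers and rows are made up, and applying the cut-off inclusively is also an assumption.

```python
import csv
import io

EVALUE_CUTOFF = 1e-05  # cut-off stated in the text

# Standard 12-column BLAST+ tabular layout (-outfmt 6); assumed, not stated in the text.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, cutoff=EVALUE_CUTOFF):
    """Return {query id: set of subject ids} for hits with e-value <= cutoff."""
    hits = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(COLUMNS, row))
        if float(record["evalue"]) <= cutoff:
            hits.setdefault(record["qseqid"], set()).add(record["sseqid"])
    return hits


# Made-up rows for illustration only (hypothetical identifiers).
example = (
    "AtSeed1\tGm_unigene_001\t62.5\t120\t45\t0\t1\t120\t10\t369\t3e-40\t150\n"
    "AtSeed1\tGm_unigene_002\t30.1\t80\t50\t2\t5\t84\t1\t240\t0.002\t35.0\n"
    "AtSeed2\tGm_unigene_003\t55.0\t200\t90\t0\t1\t200\t4\t603\t1e-05\t98.2\n"
)
hits = filter_hits(io.StringIO(example))
print(hits)
```

Collecting subject identifiers in a set per query means that a unigene matched several times by the same seed sequence is counted only once.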
In view of the different number of seed sequences per category, the results obtained from each category and organism were normalized. The soybean and Medicago genes with unknown function were submitted to the AutoFACT program (Koski et al., 2005), and annotated according to the data available in the largest functional annotation databanks (KEGG, COG, PFAM, SMART, nr). This step was performed in order to categorize these sequences and assign function to them, based on a comparative analysis.

Results and Discussion
The stress-inducible gene products were classified into two main groups: (I) those that are at the front line of defense, protecting the plant against adverse conditions, and (II) those that regulate gene expression and signal transduction in the stress response (Seki et al., 2003). The first group included proteins that probably act in the protection of plant cells from dehydration, such as the enzymes required for the biosynthesis of various osmoprotectants, LEA proteins, antifreeze proteins, chaperones and detoxification enzymes. The second group included signaling molecules (Seki et al., 2003). Twenty-seven categories of these two groups, classified according to Seki et al. (2002), were analyzed, resulting in 1,088 (soybean) and 1,210 (Medicago) sequences (Table S1, supplementary material). In both genomes the 'unknown protein' category was the most representative (Figure 1), with 268 candidates for soybean and 331 for Medicago, followed by the 'cellular structure organization and biogenesis', 'plant defense' and 'transport protein ion channel carrier' categories (Figure 1). The highest number of sequences was found for genes with 'unknown function', a very common category in expression assays regarding osmotic stress response in plants. This category attracts great interest from researchers, since those genes represent a clear source of new candidates for breeding purposes. Previous studies highlighted the importance of analyzing the role of stress-induced genes, not only for a further understanding of the molecular mechanisms of stress tolerance in higher plants, but also for improving crop performance using gene manipulation (Seki et al., 2002).
Osmotic stress greatly affects cells both at the micro level (e.g., membrane structure) and at the macro level (e.g., the physiology of the whole plant), with results that reflect the variety of responses involved in the acquisition of tolerance. At the microcellular level, the activation of genes in the categories 'cellular structure, organization and biogenesis' (soybean: 62; Medicago: 66) and 'transport protein ion channel carrier' (soybean: 64; Medicago: 60) was observed, showing the importance of the maintenance of cellular structures and of the control of ion exchange with the environment.
Furthermore, we observed the activation of genes in the category 'plant defense' (soybean: 66; Medicago: 60), indicating the presence of a cross-talk process between pathways, a common mechanism in plants under stressful conditions. In addition to stress-specific adaptive responses, plants also share responses that protect them from more than one type of stress (Seki et al., 2002; DeFalco et al., 2010; Nuruzzaman et al., 2010), a response also observed in cowpea, another Fabaceae member (Kido et al., 2011).
Amongst the candidates of the second group of responses, composed of genes involved in signal transduction and regulation of expression (203 in soybean and 190 in Medicago; Figure 2), the category transcription factor (TF) was the most prevalent, representing up to 80% in soybean and 82% in Medicago (Figure 2). The high number of transcription factors suggests that transcriptional regulation is an important mechanism in the signal transduction triggered by osmotic stresses in both legumes.
A surprising result was the absence of a bZIP representative in the soybean database, while in Medicago this category was represented by three candidates (Figure 3). This transcription factor has been identified in many plants and is known to participate in various responsive pathways, including abiotic stress response.
Among the transcription factors, the DREB/ERF and zinc-finger families had the highest number of sequences (Figure 3). For DREB/ERF this result was expected, since of the more than 1,600 transcription factors encoded by A. thaliana, 9% are members of the DREB/ERF-like family (Dietz et al., 2010). The result for the zinc-finger family was also expected, given the versatility of functions of this family and the structural variety of its proteins. According to Takatsuji (1998), plants seem to have adopted preexisting prototype zinc-finger motifs, generating new zinc-finger domains to adapt them to various regulatory processes. The zinc-finger domain can be present in a number of transcription factors and plays critical roles in interactions with other molecules. Mutations in some of the genes coding for zinc-finger proteins have been found to cause profound developmental aberrations or defective responses to environmental cues (Takatsuji, 1998). Zinc-finger proteins are required for key cellular processes including transcriptional regulation, development, pathogen defense, and stress responses (Ciftci-Yilmaz and Mittler, 2008). A recent study of rice showed that the C2H2-type zinc-finger family alone was represented by 189 members and demonstrated that at least 26 of them respond to different environmental stresses (Agarwal et al., 2007). Moreover, Gong et al. (2010), in a study on transcriptional regulation in drought-tolerant tomato genotypes, also identified and characterized the zinc-finger family as the main activated group during the drought response. It is important to note that the number of seed sequences used in the search differed among categories; the 'unknown protein' category, for example, was represented by 37 sequences, while the 'bZIP transcription factor' category comprised a single sequence.
Thus, the categories with the most orthologs were expected to be those searched with the largest number of query sequences.
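One simple way to account for unequal numbers of seed sequences is to divide the hit count of each category by its number of seeds and then express the result as a share of the total. The text does not give the formula actually used, so the Python sketch below is only an assumption; the function names are hypothetical, and only the two categories whose seed counts are stated here are included, using the Medicago counts reported above.

```python
def hits_per_seed(hit_counts, seed_counts):
    """Divide the raw hit count of each category by its number of seed sequences."""
    return {cat: hit_counts[cat] / seed_counts[cat] for cat in hit_counts}


def as_percent_share(values):
    """Express each value as a percentage of the sum over all categories."""
    total = sum(values.values())
    return {cat: 100.0 * value / total for cat, value in values.items()}


# Counts taken from the text; only these two categories have stated seed numbers.
seed_counts = {"unknown protein": 37, "bZIP transcription factor": 1}
medicago_hits = {"unknown protein": 331, "bZIP transcription factor": 3}

per_seed = hits_per_seed(medicago_hits, seed_counts)
shares = as_percent_share(per_seed)

for category in per_seed:
    print(f"{category}: {per_seed[category]:.2f} hits per seed, "
          f"{shares[category]:.1f}% of the normalized total")
```

Under this assumed scheme the raw gap between the two categories (331 versus 3 hits) narrows considerably, which illustrates why raw counts alone favor categories with many query sequences.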
As for the remainder, after normalizing the results, proportionally the most representative categories (7% each) were 'water channel proteins', 'protein degradation' and 'senescence-related' (Figure 4). All categories analyzed may contribute to an improvement in osmotic tolerance, although some functions are more relevant than others. Proteins associated with ion channels and water channels are essential for acquiring resistance to soluble salts and water shortage, the former controlling the entry and exit of ions such as Na+, which are toxic at high concentrations, and the latter controlling water loss to the environment. Besides these proteins, those falling into the category 'protein degradation' are required for protein turnover and recycling of essential amino acids, while 'senescence-related' genes are key components in the abiotic stress response, with genes controlling subcellular changes that lead to tolerance (Seki et al., 2002). [...] Medicago: 1.721). Nevertheless, this variation may be related to the conditions under which the data were generated and deposited, as well as to the number of sequences available in the respective databases. Additionally, species-specific features could be responsible for these variations, to a lesser extent.
Regarding the category 'Unknown Protein', screened candidates from soybean (268) and Medicago (331) were subjected to the AutoFACT program in order to assign function to these sequences, allowing the recognition of the function of 174 and 217 sequences, respectively.
As a result, 42 G. max and 57 M. truncatula sequences were categorized according to the COG (Clusters of Orthologous Groups) functional database into five categories (Table 2; Figure 5). Within each category, the annotation revealed that the sequences carry the same description as the matched sequences deposited in the databank. For example, the 'Amino acid transport and metabolism' functional category was represented only by 'Amino Acid Permease' sequences (Table 2). Two Medicago candidates, which were functionally classified into the 'Carbohydrate transport and metabolism' category, were also annotated in the KEGG database as involved in the beta-galactosidase pathway (Galactose Metabolism; Glycan Structure - degradation) (Table 2).
The remaining previously 'unknown' sequences were annotated as shown in Table 3. The analysis through AutoFACT allowed a function assignment to 132 soybean and 160 Medicago sequences. In general, the highest number of sequences was categorized as transcription factors, essential genes participating in the transcriptional regulation of plants. Although it was possible to annotate about 65% of the sequences, 35% of 'unknown' soybean and 34% of 'unknown' Medicago sequences remained without an identified putative function. These are relevant data to be worked out in future functional studies, since they may represent new genes not yet described and unique to legumes.
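The annotation coverage can be checked with simple arithmetic from the counts given above (268 and 331 'unknown' candidates; 42 and 57 COG-classified sequences; 132 and 160 sequences annotated otherwise). The short Python sketch below is only a consistency check on these reported numbers, not part of the original analysis; the variable names are illustrative.

```python
# Counts reported in the text for the 'unknown protein' category.
unknown = {"soybean": 268, "Medicago": 331}
cog_classified = {"soybean": 42, "Medicago": 57}
otherwise_annotated = {"soybean": 132, "Medicago": 160}

coverage = {}
for species, total in unknown.items():
    annotated = cog_classified[species] + otherwise_annotated[species]
    unresolved = total - annotated
    coverage[species] = (annotated, unresolved, 100 * annotated / total)
    print(f"{species}: {annotated} annotated ({100 * annotated / total:.1f}%), "
          f"{unresolved} unresolved ({100 * unresolved / total:.1f}%)")

# soybean: 174 annotated (64.9%), 94 unresolved (35.1%)
# Medicago: 217 annotated (65.6%), 114 unresolved (34.4%)
```

The sums reproduce the 174 soybean and 217 Medicago sequences reported as annotated, and the unresolved fractions agree with the stated 35% and 34%.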
In conclusion, even in the absence of libraries restricted to osmotic stress in the Genosoja databank, this study indicated that most of the genes involved in the osmotic stress pathways were expressed by the non-stressed soybean and Medicago libraries, at least at a baseline level. The data also revealed that soybean and Medicago are a rich source of stress-responsive candidates, which can also be applied to improve soybean and other legumes. The study also highlights the existence of significant diversity for most genes, useful for comparative physiological assays. The obtained data are available for gene-targeted functional evaluation using qRT-PCR, as well as other biotechnological approaches. The molecular differences detected between the compared libraries will permit the identification of important candidates by additional approaches including PCR walking, as previously done for other crops (e.g., Coemans et al., 2005).
The identified candidates are also being monitored in further expression assays carried out in the Genosoja project (considering contrasting combinations of tolerant and susceptible plants under drought stress as compared with their negative control in a time frame) providing a more complete picture of genes involved in osmotic stress response and useful for breeding and biotechnological purposes.

Supplementary Material
The following online material is available for this article: Table S1 - Identified candidates among abiotic stress responsive gene categories in soybean and Medicago genomes.
This material is available as part of the online article from http://www.scielo.br/gmb.

License information:
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Table S1 - Identified candidates among abiotic stress responsive gene categories in soybean and Medicago genomes based on selected Arabidopsis seed sequences, as well as number of other hits, e-value and score, against the respective database of Medicago truncatula (Mt) and Glycine max (Gm).