Boron transport in Eucalyptus. 2. Identification in silico of a putative boron transporter for xylem loading in eucalypt

Boron (B) is a low-mobility plant micronutrient whose molecular mechanisms of absorption and translocation are still controversial. Many factors are involved in tolerance to boron excess or deficiency. Recently, the first protein linked to boron transport in biological systems, BOR1, was characterized in Arabidopsis thaliana. This protein is involved in boron xylem loading and is similar to bicarbonate transporters found in animals. There are indications that BOR1 is a member of a conserved protein family in plants. In this work, the FORESTs database was used to identify sequences similar to this protein family, looking for a probable BOR1 homolog in eucalypt. We found five consensus sequences similar to BOR1; three of them were then used in multiple alignment analysis. Based on amino acid similarity and in silico expression patterns, one consensus sequence was identified as a candidate BOR1 homolog, which should help guide experimental assays to identify the function of this protein family in Eucalyptus.


Introduction
Boron (B) is an essential nutrient for plant development, required mainly to maintain cell wall integrity. Boron deficiency or toxicity causes productivity losses in different crops around the world (Nuttall, 2000; Leite, 2003). Many problems of boron deficiency or excess are linked to its low mobility in the majority of cultivated plants (Ruiz, 2001). Considering these facts, plant genes involved in boron absorption capacity and transport are potential targets in plant breeding.
Until recently, the most commonly accepted theory for boron uptake was that boric acid entered the root apoplast (extracellular space) only by passive transport. However, Nuttall (2000), Dordas et al. (2000) and Dordas and Brown (2000) have shown that boron absorption can also occur by facilitated diffusion through transmembrane channels, the aquaporins (Chrispeels et al., 1999).
Once in the root cortical region, borate (B(OH)4-) moves radially to reach the xylem. The portion absorbed by passive diffusion travels across the root through the symplast (the intracellular region linked by plasmodesmata). Boron absorbed through the apoplast must first enter the cell (symplast) to reach the xylem because of the Casparian band, an apoplastic barrier in the endodermis. When these solutes enter the xylem, they return to the apoplast, since vessel elements are made of dead cells.
The process in which a nutrient leaves the symplast and enters the xylem through an ion-efflux channel is called xylem loading (Peres, 2002). This seems to be a key step in the accumulation of ions in shoots, as demonstrated for phosphate (Poirier et al., 1991; Hamburger et al., 2002) and potassium (Gaymard et al., 1998). BOR1, characterized by Takano et al. (2002), is the first protein linked to boron transport in biological systems and is related to boron xylem loading. Among the ten hypothetical transmembrane domains of BOR1, Takano et al. (2002) found a difference of two amino acids in the second transmembrane domain of the putative protein expressed by Arabidopsis mutants that require higher levels of boron.
Database searches show that proteins related to Cl-/HCO3- ion exchangers and Na+/HCO3- co-transporters in yeast and mammals are similar to BOR1, but there is no other characterized protein similar to BOR1 in plants. However, EST searches using BOR1 as a query found many similar expressed sequences in monocots and dicots. These data, as suggested by Takano et al. (2002), indicate that BOR1 is probably a member of a family of highly conserved membrane proteins in plants.
In this work, we present the first report of ESTs highly similar to BOR1 in forest trees, using the FORESTs database to identify and characterize the putative expression of ESTs related to the BOR1 family in eucalypt. We gave special emphasis to the identification of a consensus sequence that corresponds to a putative homolog of BOR1 in Eucalyptus.

ESTs database
We used the FORESTs database (https://forests.esalq.usp.br) as a source of eucalypt ESTs. It was composed of 123,889 partial cDNA sequences from various Eucalyptus tissues, grouped into 33,080 clusters. Descriptions of the cDNA libraries and sequence nomenclature are available at https://forests.esalq.usp.br.

Data mining of BOR1-related sequences
The amino acid sequence of the putative protein encoded by BOR1 (GenBank accession BAC20173) was used as a query to identify FORESTs cluster consensus sequences in a tBLASTn search (Altschul et al., 1997). The minimum criterion for annotation was an e-value lower than e-50. The sequences related to BOR1 were translated using the ESTScan tool (Iseli et al., 1999) and revalidated by BLASTP. The sequences whose translation contained the transmembrane domain linked to the difference in boron absorption by Takano et al. (2002) were selected for multiple alignment analysis.
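The annotation threshold described above can be illustrated with a minimal Python sketch. It assumes the search results are available as tab-separated lines with the twelve standard BLAST tabular columns (subject identifier in column 2, e-value in column 11); the original work does not state which output format was used, and the function name and the contig names in the usage note are hypothetical.

```python
E_VALUE_CUTOFF = 1e-50  # "lower than e-50", as stated in the text


def filter_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return {subject_id: best_evalue} for hits whose e-value is below cutoff.

    Assumes the 12-column BLAST tabular layout (an assumption, not a detail
    given in the paper): qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        # Keep only hits under the cutoff, remembering the best one per subject.
        if evalue < cutoff and evalue < best.get(subject_id, float("inf")):
            best[subject_id] = evalue
    return best
```

Called on a list of result lines for hypothetical subjects "contigA" and "contigB", the function returns only those subjects with at least one hit below the cutoff, together with the lowest e-value observed for each.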

Multiple alignment
In the multiple alignment, we used the identified eucalypt sequences and the expressed sequences used by Takano et al. (2002) for comparison with the BOR1 gene. We extended these comparisons to include other expressed sequences from the GenBank dbEST and SUCEST databases (Vettore et al., 2003). The expressed sequences from SUCEST and GenBank were also translated into putative proteins using ESTScan2 (Iseli et al., 1999). The conserved domain of this protein family was identified using the MEME program (Bailey and Elkan, 1994). These domains and 10 amino acids on each side of the conserved region were aligned using CLUSTALX (Thompson et al., 1997). We used these data to draw a neighbor-joining dendrogram (Saitou and Nei, 1987) with a bootstrap test of 1000 replicates in the MEGA2 software (Kumar et al., 2001). It is important to note that this strategy was chosen because no EST covered the full-length sequence of the BOR1 protein.
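The step of taking the conserved region plus 10 amino acids on each side can be sketched as follows. This is only an illustration under stated assumptions: the motif position is taken as a 0-based start index and a length, the function name is hypothetical, and the window is simply truncated when the translated EST ends before 10 flanking residues are available (the text does not say how such cases were handled).

```python
def extract_window(protein, motif_start, motif_length, flank=10):
    """Return the motif plus up to `flank` residues on each side.

    `motif_start` is a 0-based index into `protein` (an assumption made for
    this sketch). The window is clipped at the sequence ends, so partial
    translations yield a shorter region.
    """
    start = max(0, motif_start - flank)
    end = min(len(protein), motif_start + motif_length + flank)
    return protein[start:end]
```

With a toy sequence of 15 "A" residues, a 5-residue motif and 15 "C" residues, the call extract_window(seq, 15, 5) keeps 10 residues on each side of the motif; the resulting windows would then be passed to the alignment program.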

Data normalization
In the annotated Eucalyptus contigs, the read frequency from each tissue was normalized by dividing the number of reads in each tissue by the total number of sequenced reads in that tissue and multiplying the result by 100,000.
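The normalization above amounts to expressing each contig as reads per 100,000 sequenced reads of a library. A minimal Python sketch follows; the function name and the read counts in the usage note are invented for illustration and are not values from the FORESTs database.

```python
def normalize_reads(contig_reads, library_totals, scale=100_000):
    """Convert raw read counts per tissue into reads per `scale` library reads.

    contig_reads:   {tissue: reads of this contig found in that tissue}
    library_totals: {tissue: total reads sequenced from that tissue}
    """
    normalized = {}
    for tissue, reads in contig_reads.items():
        total = library_totals[tissue]
        # Multiply first, then divide, as in: reads / total * 100,000.
        normalized[tissue] = reads * scale / total
    return normalized
```

For example, with made-up counts, normalize_reads({"RT": 5, "BK": 1}, {"RT": 10000, "BK": 2000}) gives 50.0 for both tissues: a single read in a small library weighs as much as five reads in a library five times larger, which is why small libraries can distort the apparent expression pattern.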

Inference of ESTs expression patterns
The normalized annotated contig data were used to study expression patterns in each tissue by hierarchical clustering (Eisen et al., 1998). This was performed using a non-centered relational matrix and the average-linkage method, through the Cluster program v2.20 (Eisen et al., 1998). Data were displayed in the TreeView program v1.60 (http://rana.lbl.gov/EisenSoftware.htm). We also used the Audic and Claverie (1997) method to give statistical support to expression patterns based on the number of ESTs identified in different libraries; we considered p < 0.05 as the cutoff value to identify contigs with differential expression in any tissue.
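The Audic and Claverie (1997) statistic can be sketched in Python as below. The probability of observing y reads of a contig in a library of N2 reads, given x reads in a library of N1 reads, is p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)]. The function names are hypothetical, and the tail convention (taking the smaller of the two cumulative tails as the p-value) is an assumption of this sketch; the text does not state which implementation or convention was used.

```python
from math import exp, lgamma, log


def ac_probability(x, y, n1, n2):
    """Audic-Claverie p(y | x) for library sizes n1 (for x) and n2 (for y)."""
    ratio = n2 / n1
    # Work in log space so that large counts do not overflow the factorials.
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1)
        - lgamma(x + 1)
        - lgamma(y + 1)
        - (x + y + 1) * log(1 + ratio)
    )
    return exp(log_p)


def ac_pvalue(x, y, n1, n2):
    """Smaller cumulative tail of p(k | x) around the observed y.

    Assumption of this sketch: a one-tailed cumulative probability is used,
    choosing whichever tail (k <= y or k >= y) is smaller.
    """
    lower = sum(ac_probability(x, k, n1, n2) for k in range(y + 1))
    upper = max(0.0, 1.0 - sum(ac_probability(x, k, n1, n2) for k in range(y)))
    return min(lower, upper)
```

Under these assumptions, a contig would be flagged as differentially expressed between two libraries when ac_pvalue(x, y, n1, n2) falls below the 0.05 cutoff mentioned above.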

Results
This study identified five eucalypt consensus sequences related to BOR1 in the FORESTs database (Table 1). In a BLASTP analysis of the hypothetical proteins encoded by these clusters, all of them returned BOR1 and/or related sequences as their first hit (Table 1). Contigs EGEQRT330A02.g and EGEQRT3102H04.g, which did not contain the transmembrane domain, had BOR1 as the second-best hit in BLASTP. They showed stronger similarity to a sequence on chromosome 3 of Arabidopsis whose function is still not characterized. Of the sequences that contained the transmembrane domain, only contig EGEQFB1002H05.g did not have BOR1 as the first match in the BLASTP analysis. This contig had greater similarity to a sequence on chromosome 1 of Arabidopsis, also not characterized.
The conserved domain found with the aid of the MEME program in the deduced amino acid sequences used in the multiple alignment analysis corresponded to part of the second transmembrane region of the BOR1 protein. Incorporating 10 amino acids at each end of the conserved region permitted evaluation of the transmembrane domain extension and its adjacent regions in the multiple alignment (Figure 1). We used this analysis to draw a dendrogram (Figure 2) that gives some indication of the possible relationship among members of the putative BOR1 protein family. Contig EGEQRT300E03.g, which was preferentially expressed in roots according to the Audic and Claverie (1997) statistical test (Table 2), was grouped in the same clade as BOR1. Although the bootstrap value was lower than 50%, contig EGEQFB1002H05.g, found only in the flower library, grouped with two hypothetical proteins of chromosome 1 of Arabidopsis thaliana (Figures 2 and 3). Curiously, contig EGEQBK1086D09.g, statistically more expressed in the BK (bark) library (Table 2), was grouped with two barley sequences.
Concerning the expression patterns, all the sequenced libraries presented ESTs related to the putative BOR1 family, with the exception of stems susceptible and resistant to frost and water deficiency (Figure 4). If we consider the normalized data, these reads were more expressed in the root and BK libraries. However, we must consider that the number of sequenced reads in the BK library was much lower than in all other libraries, which could cause sampling deviation in expression patterns and reinforces the idea of a probable preferential expression in roots.

Discussion
Data analyses supported the hypothesis that the Arabidopsis thaliana xylem loading boron transporter is part of a conserved gene family in plants that encodes putative membrane proteins, probably related to anion transport. These proteins could be involved in other biological processes related to the efflux of boron or other ions. This is similar to phosphate xylem loading, as proposed by Hamburger et al. (2002).
The dendrogram and multiple alignment analysis distinguished animal and yeast proteins from the plant sequences, although it did not distinguish monocot from dicot plant sequences. This is probably due to the comparison methodology, which did not focus on distinguishing orthologous from paralogous genes, that is, sequences that are produced by gene duplication and can also be related to BOR1 but have different functions (Gibas and Jambeck, 2002). Nevertheless, we observed some groupings that indicate a possible phylogenetic relationship; for example, the Lycopersicon esculentum and L. pennellii sequences that grouped with a chromosome 3 sequence of A. thaliana (At3g06450), and the sugarcane sequence, which differs from the maize sequences by only one amino acid (Figure 1).
Eucalypt contigs that did not show the transmembrane region studied by Takano et al. (2002) are probably homologous to the At3g62270 protein, encoded on chromosome 3 of A. thaliana and similar to BOR1, in accordance with the multiple alignment data. Of the three eucalypt contigs in the neighbor-joining tree (Figure 2), only EGEQBK1086D09.g did not confirm the BLASTP analysis. This cluster had BOR1 as a first hit in BLASTP, but grouped in the dendrogram with Hordeum vulgare (barley) sequences. This discrepancy could be related to the conserved region selected for multiple alignment, which could have generated this artifact. Its expression pattern was also different from the other clusters (Figure 3), and possibly caused a statistical deviation in expression analyses. If we exclude this contig from the analysis, there is a preferential expression of BOR1-related contigs in root.
The dendrogram confirmed the local alignment (BLASTP) analyses for the EGEQFB1002H05.g and EGEQRT300E03.g consensus sequences (Figure 2). Considering that the bootstrap value was close to 50%, EGEQFB1002H05.g, found only in the flower library, showed indications of being related to the At1g15460 and At1g74810 proteins in Arabidopsis. In the same manner, the sequence that grouped best with BOR1 in the dendrogram and in BLASTP was EGEQRT300E03.g. This is the most probable candidate to contain the partial sequence of the eucalypt BOR1 homolog. Our analyses also suggest that the deduced protein of this contig may perform the same function as BOR1 in Arabidopsis thaliana, based on the in silico analyses of expression patterns and the large BOR1 region that is covered by the translation of this expressed sequence. Thus, experimental studies that involve the sequences assembled in this cluster could be important in characterizing the function of this putative protein.

Conclusion
The identification of eucalypt ESTs similar to BOR1 reinforces the idea that this gene is a member of a conserved gene family in plants, expressed in different tissues, and involved in diverse biological processes. These data indicate the need for more detailed studies to understand the function of these ion transporters in different plant tissues, mainly in root. We also conclude that the FORESTs database could be an initial source of information for studies that intend to follow this path.

Figure 1 - Multiple alignment of the conserved domain in proteins that contained the second transmembrane domain of BOR1, with 10 amino acids added on each side of the common domain, run in the Clustal X program. The GenBank accession (gi) of each sequence is indicated in each line. On the right is the analyzed region length.

Figure 3 - Subtree detailing the relationship between contig EGEQFB1002H05.g and hypothetical proteins of Arabidopsis thaliana. Bootstrap values are shown as percentages. Sequences are represented by their GenBank accession (GI) and the relevant species.

Figure 4 - Hypothetical expression pattern of sequences similar to BOR1 found in the FORESTs database, evaluated with normalized data. Each EST contig is represented in a line and each tissue with sequenced cDNAs in a column. Black means no expression, and the intensity of red indicates the expression intensity of each read in a tissue. Legend: ST - stem; FB - flower bud; BK - bark, xylem, heartwood and medulla; WD - wood; RT - root; CL - callus; LV - leaves; SL - seedlings.

Figure 2 - Neighbor-joining tree showing the relationship between hypothetical conserved domains in BOR1-like sequences and putative Eucalyptus proteins. The tree was drawn using MEGA2, with a bootstrap of 1000 replicates. Only bootstrap values of 50% or higher are shown. Eucalypt sequences are represented as in the FORESTs database (https://forests.esalq.usp.br) or the SUCEST database (http://sucest.lad.ic.unicamp.br); other sequences are represented by their GenBank accession (GI) and the relevant species.

Table 1 )
Takano et al. (2002)lished criteria. Three clusters have the transmembrane domain indicated by Takano et al. (2002), related to xylem loading boron transport deficiency.

Table 1 - Eucalypt EST contigs related to BOR1 in the tBLASTn search, showing their respective e-value, identity, number of gaps and shared region with the query protein. Data were confirmed with translated sequences in BLASTP.

Table 2 - Contigs with differential expression in the FORESTs database, analyzed by the Audic and Claverie (1997) statistical test, with p < 0.05.