Eucalyptus ESTs associated with resistance to herbicide inhibitors of aromatic and branched-chain amino acid synthesis

Herbicides inhibit enzymatic systems of plants. Acetolactate synthase (ALS, EC 4.1.3.18) and 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS, EC 2.5.1.19) are key target enzymes for herbicide action. Hundreds of compounds inhibit ALS. This enzyme is highly variable, enabling the selective control of weeds in a number of crops. Glyphosate, the only commercial herbicide inhibiting EPSPS, is widely used for non-selective control of weeds in many crops. Recently, transgenic crops resistant to glyphosate were developed and have been adopted by farmers. The aim of this study was to mine the eucalypt expressed sequence tags (ESTs) in the FORESTs Genome Project database (https://forests.esalq.usp.br) for sequences related to these enzymes. Representative amino acid sequences from the NCBI database associated with ALS and EPSPS were compared with ESTs from the FORESTs database using the tBLASTn option of the BLAST tool. The best-matching reads and clusters from FORESTs, represented as nucleotide sequences, were compared back against the NCBI database to evaluate the level of similarity with available sequences from different species. One and seven clusters were identified as showing high similarity with EPSPS and ALS sequences from the literature, respectively. The alignment of EPSPS sequences allowed the identification of conserved regions that can be used to design specific primers for additional sequencing.

The mechanisms of action of herbicides associated with the ALS and EPSPS enzymes have been described by several authors, such as Mousdale and Coggins (1991), Hess (1993), Hess (1997b), and Hess and Bridges (2002). The enzyme acetolactate synthase (ALS), which is the site of action of sulfonylureas, imidazolinones and other groups of herbicides, acts on the synthesis route of the branched-chain amino acids leucine, isoleucine and valine (Nelson and Leningher, 2000).
According to the authors mentioned above, ALS is characterized by its great variability; it is therefore possible to develop herbicides able to discriminate among plants with a high degree of similarity (for example, control of dicotyledon weeds in dicotyledon crops, or grass weeds in grass crops). The great diversity of this enzyme system, together with the wide use of herbicides that act upon it, is responsible for the large number of weed species reported as resistant to these herbicides.
The ALS enzyme is widely studied, and has been sequenced from several plant species susceptible or resistant to herbicides that act upon it. The nucleotide sequences that confer resistance, initially present at low frequencies in weed populations that developed resistance, are the same found in genotypes of cultivated plants equally tolerant to herbicides. It is likely that the sequences that condition resistance are present in at least some individuals of many plant species, making it viable to obtain resistant genotypes without using transgenic techniques. Hess (1997b), Tan and Medd (2002) and Yu et al. (2003) observed that the resistance to ALS-inhibiting herbicides has been attributed to single point mutations, which can occur at multiple sites in the ALS gene. Base changes in at least four protein domains have been associated with in vivo resistance in field plants. The most common mutation in biotypes selected by sulfonylureas is located in the highly conserved "Domain A" site that codes for 13 amino acids, where any alteration of the codon for proline confers resistance, primarily to the sulfonylureas and triazolopyrimidines. A tryptophan → leucine mutation in "Domain B" has been associated with broad cross-resistance to representatives of all four families of ALS-inhibiting herbicides. In "Domain C", an alanine → threonine mutation appears to confer resistance only to imidazolinones. An alanine → valine substitution in "Domain D" is reported to confer broad cross-resistance, as in the case of the mutation in "Domain B".
The EPSPS enzyme (5-enolpyruvylshikimate-3-phosphate synthase, E.C. 2.5.1.19) acts on the synthesis pathway of aromatic amino acids and of shikimic acid, which are involved in the production of many compounds that belong to the secondary metabolism of plants, mainly related to growth, wood quality, and allelopathic effects, in addition to tolerance to pests and diseases, with emphasis on indole acetic acid, lignin, flavonoids, and tannins (Mousdale and Coggins, 1991; Hess, 1993; Hess, 1997b; Nelson and Leningher, 2000; Hess and Bridges, 2002).
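The single-point mutations in ALS described earlier in this section can be illustrated with a short comparison of a reference peptide and a variant. This is only a sketch: the 13-residue Domain A motif and the variant peptide below are assumptions introduced for illustration and are not given in the text, and the function names are hypothetical.

```python
# Assumed 13-residue Domain A reference motif (not taken from the text).
DOMAIN_A = "AITGQVPRRMIGT"

# Hypothetical variant peptide carrying a proline -> serine substitution.
VARIANT = "AITGQVSRRMIGT"


def substitutions(reference, variant):
    """List (position, ref_residue, variant_residue) for equal-length peptides.

    Positions are 1-based and relative to the start of the peptide.
    """
    if len(reference) != len(variant):
        raise ValueError("peptides must have the same length")
    return [
        (i, r, v)
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


def proline_altered(reference, variant):
    """Return True if any proline of the reference is replaced in the variant."""
    return any(r == "P" for _, r, _ in substitutions(reference, variant))


if __name__ == "__main__":
    print(substitutions(DOMAIN_A, VARIANT))
    print(proline_altered(DOMAIN_A, VARIANT))
```

A screen of this kind only flags candidate substitutions; whether a given change confers resistance still has to be established experimentally.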
Glyphosate (N-phosphonomethyl-glycine) is the only commercially available compound that acts upon the EPSPS enzyme. It is a systemic, non-selective, broad-spectrum herbicide, with translocation via the symplast, and absorption facilitated by proteins that transport phosphate groups, which are present in the plasma membrane. This herbicide is widely used in eucalyptus plantations as the major tool for weed control. Without transgenic transformation, all plant species are sensitive to glyphosate to a greater or lesser degree, indicating little functional variability of EPSPS at the compound's binding site. Glyphosate is considered a non-selective herbicide for eucalyptus, and therefore should be applied as a spray directed to the weeds.
The EPSPS enzyme is encoded in the nucleus and works within the chloroplast (Stauffer et al., 2001), catalyzing the reaction between shikimate-3-phosphate and phosphoenolpyruvate, which produces enolpyruvylshikimate-3-phosphate and inorganic phosphate (Peterson et al., 1996). According to Hess (1993), glyphosate is a non-competitive and competitive inhibitor, respectively, in relation to those two substrates. EPSPS inhibition leads to an accumulation of shikimate in the vacuoles and an increase in the flow of carbon into this pathway, which is exacerbated by the loss of feedback control.
According to Kruse et al. (2000), approximately 35% of plant dry mass is represented by derivatives of the shikimate pathway and 20% of the carbon fixed via photosynthesis follows this metabolic route. Singh and Shaner (1998) concluded that glyphosate blocked the activity of EPSPS and increased the accumulation of shikimic acid. The authors also describe a method that can be used to evaluate the concentrations of shikimic acid even in dead leaves.
It is worth noting that Rippert et al. (2004) demonstrated that the shikimic acid pathway is indirectly involved in a second mechanism of action of herbicides. The compound DKN (the active isoxaflutole metabolite) acts on the conversion of hydroxyphenylpyruvate (produced from tyrosine) into homogentisate. Homogentisate is, in turn, used in the production of tocopherols, tocotrienols, and Vitamin E, which are important antioxidant compounds. The authors observed that the increase in shikimate production could lead to increases in the accumulated amounts of tocopherols, tocotrienols, and Vitamin E, and could increase resistance to DKN, a herbicide with great potential for use in eucalyptus.
The processes for obtaining plants that are resistant or tolerant to glyphosate are discussed by Shaner and Bridges (2002) and Meilan et al. (2002), and are nearly always based on the overexpression of the gene responsible for producing the EPSPS enzyme or on the use of amino acid sequences that give the enzyme a greater affinity for phosphoenolpyruvate (PEP), which binds to shikimate-3-phosphate when producing EPSP (as, for example, in the configuration that corresponds to the CP4 EPSPS gene from Agrobacterium presented by Harrison et al., 1996). In soybean cultivars resistant to the herbicide, both modifications are present. There is a third, less frequently used alternative, which consists of the insertion of the GOX gene (glyphosate oxidoreductase, responsible for transforming glyphosate into glyoxylate and the inactive metabolite AMPA, aminomethylphosphonic acid).
The broad spectrum of control and absence of selectivity in glyphosate, with the exception of transgenic crops, indicate the absence of functional variability in the EPSPS enzyme, specifically at the glyphosate binding site. However, the work by Rippert et al. (2004) indicates the presence of functional variability for the synthesis of aromatic amino acids; it is possible to alter or optimize the synthesis pathway of these compounds, with important effects on growth and on plant tolerance to stresses and to herbicides that inhibit the HPPD enzyme. The papers show that the enzyme can be found in different amounts in different genotypes (probably as a function of the presence of different promoters), resulting in different rates of production of aromatic amino acids, even if the nucleotide sequence is maintained.
The objective of this work was to locate Eucalyptus ESTs corresponding to the ALS (acetolactate synthase) and EPSPS (5-enolpyruvylshikimate-3-phosphate synthase, E.C. 2.5.1.19) genes, which are directly related to resistance to herbicides and could optimize the process of obtaining plants that are resistant to the products that act upon those enzymes.

Material and Methods
The Eucalyptus Genome Project (Projeto Genoma do Eucalipto - FORESTs), developed by a consortium of four companies in the forestry industry, in an agreement with FAPESP, and executed with the participation of 20 laboratories from the State of São Paulo associated with the AEG network (https://forests.esalq.usp.br), obtained 123,889 reads constructed from expressed sequence tags (ESTs) of cDNA libraries, mainly derived from Eucalyptus grandis tissues. The tissues were removed from different organs of plants submitted to different growing conditions. The makeup and coding of the libraries are described in Table 1.
The search for ALS and EPSPS enzymes was performed using the BLAST tool (Altschul et al., 1997). The amino acid sequences for those enzymes, described for different plant species, were compared with the information from the FORESTs project database using the "tBLASTn" option, allowing the identification of clusters associated with them. Only clusters effectively aligned with the amino acid sequences were selected, using an e-value < e-70 as a selection criterion.
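The selection step can be sketched as a simple filter over search hits. The example below assumes a simplified four-column, tab-separated hit list (query, cluster, percent identity, e-value); this layout, the query labels, the two "hypothetical_cluster" entries and all numeric values are invented for illustration and are not the project's actual output. Only the e-70 cutoff and the two real cluster names come from the text.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-70  # selection criterion stated in the text

# Hypothetical hits: query, cluster, % identity, e-value (values are invented).
EXAMPLE_HITS = (
    "ALS_query\tEGEQBK1001E10.g\t78.5\t1e-150\n"
    "ALS_query\thypothetical_cluster_A\t41.0\t3e-12\n"
    "EPSPS_query\tEGEQRT3102H01.g\t85.2\t1e-180\n"
    "EPSPS_query\thypothetical_cluster_B\t35.7\t2e-05\n"
)


def select_clusters(handle, cutoff=E_VALUE_CUTOFF):
    """Return {query: [clusters]} keeping only hits with e-value below cutoff."""
    selected = {}
    for query, cluster, _identity, evalue in csv.reader(handle, delimiter="\t"):
        if float(evalue) < cutoff:
            selected.setdefault(query, []).append(cluster)
    return selected


if __name__ == "__main__":
    print(select_clusters(io.StringIO(EXAMPLE_HITS)))
```

A strict cutoff such as e-70 trades sensitivity for specificity: weak or partial matches are discarded, so short ESTs covering only part of a gene may be missed.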
The nucleotide sequences of the selected clusters were compared with the NCBI (National Center for Biotechnology Information) and GenBank amino acid sequence databases after translation in all possible frames. The procedure allowed us to confirm the alignment with sequences of the ALS or EPSPS enzymes from different plant species, to find the translation frame for each cluster and to obtain identity percentages and similarity probability values (e-values) for the alignments.
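Translation "in all possible frames" means three forward and three reverse-complement reading frames. The following is a minimal sketch of that step using the standard genetic code; the function names and the toy sequence are illustrative assumptions, and the search tools used in the study perform this translation internally.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return {frame: protein} for frames +1, +2, +3, -1, -2 and -3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence, not taken from the FORESTs database.
    for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(frame, protein)
```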
Based on the translation frame that produced the best alignments and using the software GENERUNR, the nucleotide sequence corresponding to each cluster was translated into amino acids for the identification and analysis of open reading frames (ORFs). The amino acid sequences corresponding to the ORFs were aligned with the amino acid sequences of different plant species (with e-value < e-70, as previously described). The software CLUSTAL was used to align the sequences and to estimate the phylogenetic distances represented in consensus phylogenetic trees obtained from a total of 1000 bootstrap trials. The trees were built using the neighbor-joining method and were plotted with TreeView.
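To make the distance and bootstrap steps concrete, the sketch below computes a simple p-distance (proportion of differing residues) between aligned sequences and draws one bootstrap replicate by resampling alignment columns with replacement. This is an assumption-laden illustration: the p-distance stands in for whatever distance measure CLUSTAL applies, the toy alignment is invented, and tree building itself is not shown.

```python
import random


def p_distance(a, b):
    """Proportion of differing residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name1, name2): distance} for every ordered pair of sequences."""
    names = list(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m in names
        for n in names
    }


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Invented toy alignment; names and residues are placeholders.
    toy = {"seqA": "MKV-LT", "seqB": "MKVALT", "seqC": "MRVALS"}
    print(distance_matrix(toy))
    rng = random.Random(0)  # fixed seed so the replicate is reproducible
    print(bootstrap_alignment(toy, rng))
```

In a full analysis, a tree would be built from each of the 1000 replicates and the support for each branch reported as the fraction of replicate trees containing it.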

Results and Discussion
Amino acid sequences corresponding to the ALS and EPSPS enzymes found in the literature for other plant species were compared with the sequences in the FORESTs project database, and high levels of similarity were found. The main results of this initial similarity search are presented in Table 2.
The nucleotide sequences of the clusters corresponding to the ALS enzyme were compared with the NCBI (National Center for Biotechnology Information) and GenBank amino acid sequence databases after translation in all possible frames, and the results are presented in Table 3. Due to the high number of alignments with e-values smaller than e-70, they were counted and the numbers are presented in the fourth column of Table 3. It was possible to identify ORFs 210 to 679 amino acids in length corresponding to ALS sequences from the literature.
Table 4 lists the numbers of reads that make up the clusters related to the ALS enzyme, classified by tissue of origin. It must be pointed out that the 32 reads related to ALS were present in 12 of the 19 libraries, indicating the expression of this gene in different tissues and conditions.
The results showed that the cluster EGEQBK1001E10.g is the consensus of 14 reads, corresponds to a sequence of 2,611 bases and encloses a complete sequence of the acetolactate synthase enzyme. The start codon was located at position 268, and the sequence was terminated by a stop codon at position 2,365. Coding was obtained for a total of 679 amino acids.
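Start and stop codon positions such as those reported above can be located by scanning each forward reading frame for an ATG followed by an in-frame stop codon. The sketch below is a minimal illustration on an invented toy sequence; the position conventions (1-based, stop reported at the first base of the stop codon) are assumptions chosen for this example and may differ from those of the software used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Yield (start, stop, n_codons) for forward-strand ORFs.

    start is the 1-based position of the A in ATG, stop is the 1-based position
    of the first base of the stop codon, and n_codons counts the codons from ATG
    up to, but not including, the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    yield start + 1, i + 1, n_codons
                start = None


def longest_orf(seq):
    """Return the ORF with the most codons, or None if no complete ORF exists."""
    return max(find_orfs(seq), key=lambda orf: orf[2], default=None)


if __name__ == "__main__":
    # Toy sequence, not taken from the FORESTs database.
    print(longest_orf("CCATGGCTAAATGATT"))
```

ORFs that start with ATG but run off the end of the sequence without a stop codon are not reported by this sketch, which matters for ESTs covering only part of a transcript.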
The alignments of amino acid sequences from the literature indicated the presence of several conserved regions from 1 to 28 amino acids in length. The longest conserved region encloses domain A, where any alteration of the codon for proline confers resistance to sulfonylureas and imidazolinones. When the sequences from Eucalyptus were included in the alignments, no domain could be observed. Even alignments using only the sequences from Eucalyptus did not allow the identification of conserved regions that could be used to design specific primers for further sequencing of the ALS gene.
The absence of domains in Eucalyptus is in conflict with the information available in the literature for several other plant species and indicates the need for further and better sequencing of the ALS gene. Tan and Medd (2002) describe the primers used to clone this gene in Raphanus raphanistrum, but the nucleotide sequences from the literature can also be used to design the specific primers necessary for additional sequencing.
The phylogenetic distance matrix obtained from the amino acid sequences from the literature and the sequence corresponding to the cluster EGEQBK1001E10.g is represented in Figure 1. This cluster was selected because it encodes a complete sequence corresponding to the ALS gene. The GenBank accession numbers for the 14 sequences from the literature are listed in Table 5. The phylogenetic tree presents three main branches enclosing the cluster EGEQBK1001E10.g (isolated from the other species), five Poaceae and ten eudicotyledon species. The species grouping, except for Eucalyptus, is consistent with the information about the control spectra of herbicides inhibiting ALS. Most of the compounds act mainly on sedges, grasses or broadleaved weeds. Few compounds have broad-spectrum control.
The results show that additional sequencing is necessary to develop assisted breeding procedures aiming to identify individuals and to obtain eucalyptus varieties resistant to ALS-inhibiting herbicides. Considering the high number of molecules that act on this site, including preemergence and postemergence herbicides from different manufacturers and with different control spectra and environmental dynamics, resistance to ALS inhibitors is quite significant from a practical standpoint.
For the EPSPS enzyme, only cluster EGEQRT3102H01.g was identified with a significant e-value (Table 2). With regard to the origin of the 11 reads that made up the cluster, 4, 4, 2, and 1 corresponded to libraries consisting of nursery seedling roots, E. grandis wood, E. globulus seedlings grown in the dark, and leaves colonized by the Thyrinteina sp. caterpillar for 7 days, respectively. The cluster EGEQRT3102H01.g corresponded to a sequence of 1,931 bases and encloses a complete sequence of the EPSPS enzyme. The start codon was located at position 322, and the sequence was terminated by a stop codon at position 1,897. Coding was obtained for a total of 523 amino acids.
For this cluster, several homologous genes from different species were found, with high similarity indices (Table 6). Consistent with information from the literature and with the absence of selectivity shown by glyphosate, the phylogenetic analysis indicated a high level of similarity among sequences corresponding to this enzyme from different plant species (Figure 2). The presence of a branch enclosing the four grass species studied (Lolium rigidum, Oryza sativa, Zea mays and Eleusine indica) must be pointed out.
The alignments of amino acid sequences from the literature and the sequence corresponding to the cluster EGEQRT3102H01.g indicated the presence of several conserved regions from 1 to 20 amino acids in length. It is viable to use these regions to design specific primers for additional sequencing aimed at locating nucleotide polymorphisms and different promoters associated with the EPSPS enzyme.
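Conserved regions of the kind described above can be located by scanning a multiple alignment for runs of columns in which every sequence carries the same residue. The sketch below assumes the sequences are already aligned to equal length with "-" as the gap symbol; the function name and the toy alignment are hypothetical and do not reproduce the EPSPS alignment from this study.

```python
def conserved_regions(aligned_seqs, min_length=1):
    """Return (start, end, residues) for runs of fully conserved, gap-free columns.

    start and end are 1-based, inclusive alignment columns.
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    reference = aligned_seqs[0]
    regions, run_start = [], None
    for col, column in enumerate(zip(*aligned_seqs)):
        conserved = len(set(column)) == 1 and column[0] != "-"
        if conserved and run_start is None:
            run_start = col  # a new conserved run begins here
        elif not conserved and run_start is not None:
            regions.append((run_start + 1, col, reference[run_start:col]))
            run_start = None
    if run_start is not None:  # run extends to the end of the alignment
        regions.append((run_start + 1, len(reference), reference[run_start:]))
    return [r for r in regions if len(r[2]) >= min_length]


if __name__ == "__main__":
    # Invented toy alignment for illustration only.
    toy = ["MAGTKLV-PE", "MAGSKLVAPE", "MAGTKLV-PD"]
    print(conserved_regions(toy))
    print(conserved_regions(toy, min_length=3))
```

For primer design, only the longer runs are useful, and the degeneracy of the underlying codons would still need to be checked at the nucleotide level.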
The identification of different promoters could allow control of the expression levels of the gene associated with EPSPS, altering, in turn, the production of lignin, tannins, flavonoids, IAA, tocopherols, tocotrienols, and Vitamin E, with likely consequences in terms of plant growth and of sensitivity or tolerance to pests, diseases, and glyphosate. The potential use of increased tyrosine production to reduce sensitivity to DKN (isoxaflutole) must also be pointed out, as evidenced in the work by Rippert et al. (2004).

Figure 1 - Alignment of the amino acid sequence for the acetolactate synthase enzyme (ALS, EC 4.1.3.18). Sequences for different organisms obtained from GenBank were compared with the longest sequence found in the FORESTs ESTs database. The tree was built using the neighbor-joining method with the ClustalX program, and was visualized with TreeView.

Figure 2 - Alignment of the amino acid sequences for the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS, EC 2.5.1.19). Sequences for different organisms obtained from GenBank were compared with the highest e-value clusters found in the FORESTs ESTs database. The tree was built using the neighbor-joining method with the ClustalX program, and was visualized with TreeView.

Table 1 - Codes and source tissues of cDNA libraries approved by the FORESTs project.

Table 2 - FORESTs clusters, identified through the tBLASTn tool, which showed similarity with the ALS and EPSPS enzymes.

Table 3 - Translation frames for the nucleotide sequences of the clusters, minimum and maximum e-values and sizes of the ORFs corresponding to the ALS enzyme.
(1) Number of amino acid sequences from the literature aligning with sequences corresponding to the clusters from Eucalyptus with e-values lower than e-70.

Table 4 - Read origin libraries corresponding to clusters with the greatest similarity for the ALS enzyme.

Table 5 - GenBank accession numbers for the sequences used to build the phylogenetic tree for the ALS enzyme.

Table 6 - Translation frames for the nucleotide sequence of the cluster EGEQRT3102H01.g, e-values for the alignments, amino acid sequence lengths corresponding to the EPSPS enzyme of different higher plant species and identity percentages in relation to the sequence of the cluster.