Genetics and Molecular Biology, 24 (1-4), 257-261 (2001)

Sequences from the sugarcane expressed sequence tag (SUCEST) database were analyzed based on their identities to genes encoding chalcone-synthase-like enzymes. The sorghum (Sorghum bicolor) chalcone synthase (CHS, EC 2.3.1.74) protein sequence (gi|12229613) was used to search the SUCEST database for clusters of sequencing reads that were most similar to chalcone synthase. We found 121 reads with homology to sorghum chalcone synthase, which we sorted into 14 clusters; these were divided into two groups (group 1 and group 2) based on the similarity of their deduced amino acid sequences. Clusters in group 1 were more similar to the sorghum enzyme than those in group 2 and had the consensus sequence of the active site of chalcone and stilbene synthases. Analysis of gene expression (based on the number of reads from a specific library present in each group) indicated that most of the group 1 reads were from sugarcane flower and root libraries. Group 2 clusters were more similar to the amino acid sequence of an uncharacterized pathogen-induced protein (PI1, gi|9855801) from the S. bicolor expressed sequence tag (EST) database. The group 2 cluster sequences and the PI1 protein are 90% identical, having two amino acid changes in the chalcone and stilbene synthase consensus sequence but conserving the cysteine residue at the active site. The PI1 EST has not previously been associated with chalcone synthase and has a different consensus sequence from the previously described chalcone synthase of sorghum. Most of the group 2 reads were from libraries prepared from sugarcane roots and from plants infected with Herbaspirillum rubrisubalbicans and Gluconacetobacter diazotrophicus. Our results indicate that we have identified a sugarcane chalcone synthase similar to the pathogen-induced PI1 protein found in the sorghum cDNA libraries, and it appears that both proteins represent new members of the chalcone and stilbene synthase super-family.


INTRODUCTION
Chalcone synthase (CHS) operates early in the biosynthetic pathway of flavonoids, secondary metabolites that play important roles in the interactions between plants and their environment. Peters et al. (1986) have shown that this synthase is involved in pigment formation, symbiosis, and plant defenses against pathogen attack and exposure to ultraviolet light. Shirley (1996) states that chalcone synthase is involved in flavonoid synthesis, where it catalyzes the condensation of acetate groups from malonyl-CoA with 4-coumaroyl-CoA to form naringenin chalcone, leading to the formation of flavonols, flavanones, isoflavonoids and anthocyanins. Lanz et al. (1991) have pointed out that this enzyme is well conserved among plants of different groups, and that it has a cysteine residue at amino acid 169 that is thought to be part of the 4-coumaroyl-CoA binding site and is required for enzyme activity. Phytoalexins (which play a role in plant-microbe interactions) can be derived from the flavonoid pathway, and activation of the plant defense response can often be detected by checking for the accumulation of chalcone synthase mRNA or by measuring the activity of this enzyme after inoculating a plant with microorganisms (Cui et al., 1996).
The chalcone synthase genes present a high degree of sequence similarity at the amino acid level and have been the object of numerous studies in dicotyledonous plants, where up to seven copies have been identified in several species (Koes et al., 1987; 1989; Durbin et al., 2000). In the monocotyledonous grasses, most of the genera studied (i.e. Zea, Oryza, Hordeum and Secale) have two copies of the chalcone synthase gene (Wienand et al., 1986; Franken et al., 1991; Rhode et al., 1991), although seven copies have been identified in Sorghum bicolor by Lo and Nicholson (1999) (gi|5305906|, gi|5305908|, gi|5305910|, gi|5305912|, gi|5305914|, gi|5305916|, gi|5305918|). Durbin et al. (2000) have pointed out that chalcone synthase is suitable for studies of gene duplication and investigations on the origin of gene families.
The sugarcane genus, Saccharum, is complex and characterized by high polyploidy and frequent aneuploidy. The genome of modern sugarcane cultivars has a chromosome number between 70 and 120, the cultivars being derived essentially from interspecific hybridization involving different species, e.g. Saccharum officinarum, Saccharum barberi, Saccharum sinense and the wild species Saccharum spontaneum and Saccharum robustum. The fact that the chalcone synthase gene family is not well characterized in the Gramineae (Oberholzer et al., 2000), together with the complexity of the sugarcane genome, makes sugarcane an interesting system in which to study this gene family. In the work presented in this paper, we used sequence similarity and expression profiling to show that sugarcane has two groups of chalcone synthase sequences and that one of these groups represents a new member of the chalcone synthase family that may be induced by the interaction between plants and microbes.
All of the analyses were done using the cluster consensi database. Sequences were assembled at SUCEST using the fragment assembly program Phrap (Green, 1996) as described by Telles and Silva (2001). Comparisons of sequence homology between SUCEST and GenBank (http://www.ncbi.nlm.nih.gov/) sequences were performed using various basic local alignment search tool (BLAST) programs (e.g. tBLASTn), and multiple alignments were done at http://ca.expasy.org using the CLUSTALW program. The Pfam (http://www.sanger.ac.uk/Pfam/) and Prosite (http://ca.expasy.org/prosite/) protein family databases were used to recognize the domains.
The SUCEST database is made up of sequencing reads from clones of cDNA libraries prepared from the mRNA of a variety of tissues in different stages of development growing under different conditions, some of the libraries having as many as 18,000 reads. To investigate gene expression, the number of reads in each cluster that came from a given library was compared with the total number of reads in that library. We searched EST libraries from different sugarcane tissues, i.e. the flower (FL) libraries of 47,887 reads; the root (RT) libraries of 23,168 reads; the AD and HR libraries (containing a total of 22,320 reads) from plantlets infected with Gluconacetobacter diazotrophicus or Herbaspirillum rubrisubalbicans, respectively; the apical meristem (AM) libraries of 20,078 reads; the seed (SD) library of 15,617 reads; the stem first internode (ST) library of 14,532 reads; the lateral bud (LB) library of 12,821 reads and the stem bark (SB) library of 11,313 reads.
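The normalization described above can be illustrated with a minimal sketch. The library sizes are the ones listed in the text; the function name, the dictionary layout and the example read count are hypothetical and are not taken from the SUCEST pipeline.

```python
# Total reads per library group, as listed in the text above.
LIBRARY_READS = {
    "FL": 47_887,     # flower
    "RT": 23_168,     # root
    "AD+HR": 22_320,  # plantlets infected with the two endophytes
    "AM": 20_078,     # apical meristem
    "SD": 15_617,     # seed
    "ST": 14_532,     # stem first internode
    "LB": 12_821,     # lateral bud
    "SB": 11_313,     # stem bark
}


def percent_of_library(reads_in_cluster: int, library: str) -> float:
    """Express the reads of one cluster (or group) as a percentage of its library."""
    return 100.0 * reads_in_cluster / LIBRARY_READS[library]


# Hypothetical example: a cluster holding 10 reads from the root libraries.
example = percent_of_library(10, "RT")
print(f"{example:.3f}% of the RT reads")
```

Dividing by the library size keeps large libraries such as FL from dominating the comparison simply because more clones were sequenced from them.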
We found that the deduced amino acid sequence of sugarcane chalcone synthase 1 was 402 amino acids long and 97% identical to all seven S. bicolor chalcone synthases described (gi|5305906|, gi|5305908|, gi|5305910|, gi|5305912|, gi|5305914|, gi|5305916|, gi|5305918|), 95% identical to Zea mays chalcone synthase c2 (Franken et al., 1991) and 93% identical to Oryza sativa chalcone synthase. We examined the 5' end nucleotide sequence of four full-length clusters and found differences of up to five base additions. All ten deduced open reading frames (ORFs) of the group 1 clusters presented differences in amino acid composition, suggesting that there may be ten different members of this enzyme in sugarcane.
Regarding sugarcane chalcone synthase 2, we found that the deduced amino acid sequence was 398 amino acids long and 81% identical to rice chalcone synthase. By searching the GenBank dbEST database with the nucleotide sequence of sugarcane chalcone synthase 2 we found that the sugarcane enzyme shared sequence identity with a 596 base-pair (bp) long pathogen-induced EST (PI1, GenBank sequence BE600826) from Sorghum bicolor. The sorghum EST has not previously been identified as chalcone synthase, and we found that it has a different amino acid composition at the consensus signature compared with all chalcone synthases identified so far in sorghum. We found that the deduced amino acid sequence of PI1 was 90% identical to our sugarcane chalcone synthase 2. Two of our four full-length clusters were identical at the 5' end, while the other two had one base substitution and two base additions. All four deduced ORFs of the group 2 clusters had differences in amino acid composition, and it seems that in this case there may be fewer than four chalcone synthase 2 homologues.
Comparison of sugarcane chalcone synthases 1 and 2 with chalcone synthase genes from other organisms
All members of the chalcone and stilbene synthase families present conserved sequence features. For example, Lanz et al. (1991) have shown that the major domain of these enzymes is the well-conserved active site with the consensus sequence R- where C is the active site residue. When we aligned the active site of the sugarcane chalcone synthases 1 and 2 to several other chalcone synthases (Table I), we found that the C residue at the center of the sequence aligned with the sorghum PI1 protein and the bcsA protein from Bacillus subtilis (gi|16079263|), although there were two conserved changes in the PI1 protein and sugarcane chalcone synthase 2 that did not match the conserved signature around the active residue. Jez and Noel et al. (2000) have reported that three other residues (His307, Asn340 and Phe219) form the catalytic center of chalcone synthase and are conserved in all chalcone synthase-like enzymes. We found that sugarcane chalcone synthases 1 and 2 both had the strictly conserved residues at the correct positions, along with regions important for secondary structure organization (Table II).

Sequencing analysis and expression of sugarcane chalcone synthase 2
Figure 1 shows that the full-length cDNA of sugarcane chalcone synthase 2 is 1637 bp (based on cluster analysis), with the translational start codon at nucleotide 105 and the TGA stop codon at nucleotide 1301. This ORF has a G+C content of 59% and encodes a polypeptide of 398 amino acids. At the translational start the sequence presents the characteristic adenine residue at position -3 and cytosine residue at position +5, and it has a putative polyadenylation signal (AATAAC) 124 nucleotides before the poly-adenine (poly A+) tail at base-pair position 1454-1460.
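The coordinates quoted above can be checked against the reported protein length with a short calculation. This is only a consistency sketch: it assumes that nucleotide 105 is the first base of the start codon and that nucleotide 1301 is the last base of the TGA stop codon, which the text does not state explicitly. All variable names are illustrative.

```python
CODON_LENGTH = 3

start_codon_first_base = 105  # from the text
stop_codon_last_base = 1301   # from the text; assumed to be the last base of TGA
reported_protein_length = 398  # amino acids, from the text

# Number of bases from the first base of ATG through the last base of TGA.
orf_span = stop_codon_last_base - start_codon_first_base + 1

# The span must be a whole number of codons for the reading frame to hold.
in_frame = orf_span % CODON_LENGTH == 0

# Every codon except the stop codon encodes one amino acid.
encoded_residues = orf_span // CODON_LENGTH - 1

print(orf_span, in_frame, encoded_residues)
```

Under this assumption the span is 1197 bases, that is 399 codons including the stop codon, which agrees with the 398 amino acids reported for the polypeptide.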
The sequence similarity and the conserved alterations at the consensus signature of sugarcane chalcone synthase 2 and the sorghum PI1 protein suggest that sugarcane chalcone synthase 2 may be a candidate for use in plant-microbe interaction studies. There are sequenced reads from 83 EST clones in chalcone synthase 1 and from 38 EST clones in chalcone synthase 2. We analyzed the differential expression of chalcone synthase 1 and chalcone synthase 2 in different tissues based on the number of sequenced reads from a given library occurring within each of the enzymes (Figure 2). This analysis showed that these enzymes exhibit a tendency to differential expression. It can be seen in Figure 2 that 0.13% of the reads in the root (RT) libraries and 0.04% of the reads in the flower (FL) libraries belong to chalcone synthase 1. There were no sequencing reads of chalcone synthase 1 in the lateral bud (LB) library. There were more chalcone synthase 2 reads (0.06%) than chalcone synthase 1 reads (0.02%) in the AD and HR libraries, which had been constructed from plantlets infected with Gluconacetobacter diazotrophicus (the AD library) or Herbaspirillum rubrisubalbicans (the HR library), two microbial endophytes associated with nitrogen fixation (Reinhold-Hurek and Hurek, 1998), and there were no chalcone synthase 2 reads in the flower library.
Chalcone synthase is a plant-specific polyketide synthase important for the biosynthesis of anti-microbial isoflavonoid phytoalexins, anthocyanin floral pigments and flavonoid inducers of Rhizobium nodulation genes (Dixon and Paiva, 1995; Long, 1989). Considering the number of sequenced clones of these libraries (AD1+HR1 = 22,320 and FL = 44,887) and the number of clones that are within each group, we have a three-fold increase in expression of chalcone synthase 2 in sugarcane infected with microorganisms, and no expression in flowers, compared with chalcone synthase 1 (Figure 2). The work presented in this paper has shown that sugarcane chalcone synthase 2 is very similar to the pathogen-induced sorghum PI1 protein. We propose that sugarcane chalcone synthase 2 has a different expression pattern and tissue specificity from sugarcane chalcone synthase 1, and that the expression pattern and tissue specificity of sugarcane chalcone synthase 2 are associated with the interaction between plants and bacteria. It is also probable that the same phenomenon exists in sorghum, with two distinct types of chalcone synthase being coded for in the genome of Sorghum bicolor.
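The three-fold figure follows directly from the percentages reported for the AD and HR libraries, as the sketch below shows. The read counts it derives are approximations back-calculated from rounded percentages and are not values reported in the paper; the variable names are illustrative.

```python
ad_hr_total_reads = 22_320  # AD + HR libraries, from the text
chs2_percent = 0.06         # chalcone synthase 2 reads in AD + HR, from the text
chs1_percent = 0.02         # chalcone synthase 1 reads in AD + HR, from the text

# Relative abundance of chalcone synthase 2 versus chalcone synthase 1.
fold_change = chs2_percent / chs1_percent

# Approximate read counts implied by the rounded percentages (not reported data).
approx_chs2_reads = ad_hr_total_reads * chs2_percent / 100
approx_chs1_reads = ad_hr_total_reads * chs1_percent / 100

# Bookkeeping check on the totals given in the text:
# 83 + 38 EST clones, and 10 + 4 clusters in groups 1 and 2.
total_reads = 83 + 38
total_clusters = 10 + 4

print(round(fold_change, 2), round(approx_chs2_reads), round(approx_chs1_reads))
print(total_reads, total_clusters)
```

The totals of 121 reads and 14 clusters match the figures given for the whole data set. With read counts this small the fold change is best read as a tendency rather than a precise measurement.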

DISCUSSION
Chalcone synthase, which is important for the biosynthesis of secondary metabolites, has been studied in various species, some of which have several copies of the gene encoding this enzyme (Wienand et al., 1986; Franken et al., 1991; Rhode et al., 1991). In our study we have characterized chalcone synthase-like enzymes of sugarcane and found that these enzymes can be divided into two different groups, each with several members.
Chalcone synthase group 1 comprises the well-conserved chalcone synthases found in plants and is highly homologous to the sorghum chalcone synthases. In sorghum there are seven different genes (gi|5305907|, gi|5305909|, gi|5305911|, gi|5305913|, gi|5305915|, gi|5305917|, gi|5305919|). Working with maize, Franken et al. (1991) have described two chalcone synthase genes, c2 and Whp (white pollen), which are 94% identical and have the same active site as other chalcone synthases but different expression patterns and tissue specificity, the c2 gene being expressed mostly in developing kernels while the Whp gene is normally expressed in the tassel. According to our research, there may be ten different members of the sugarcane chalcone synthase 1 group which are expressed in the seeds, roots, flowers and tassels of sugarcane plants.
The sugarcane chalcone synthase 2 group may contain new members of the chalcone synthase family. We base this view on the fact that the changes we found in the chalcone synthase family signature are also conserved in the PI1 protein deduced from the sorghum EST. All the other features of the putative sugarcane chalcone synthase 2 sequences are the same as those which have been proposed to be essential for chalcone synthases; these features are conserved in sugarcane chalcone synthases 1 and 2 and in the sorghum PI1 (BE600826) sequence.
From the distributions of sequencing reads within the chalcone synthase 1 and 2 groups it appears that members of these groups have different patterns of gene expression. In sugarcane, we found increased chalcone synthase 2 expression in plants infected with microorganisms. The presence of a similar pathogen-induced protein in sorghum cDNA libraries supports the view that in sugarcane, as in maize, chalcone synthase may be expressed for different purposes. It is also interesting to note that the gene sequencing study of Selman-Housein et al. (1999) placed sugarcane closer to maize than to sorghum, but our sequence analysis of chalcone synthases indicates that sugarcane is closer to sorghum than to maize. The research presented in this paper is the first report in grasses of differences in the sequence and expression patterns of chalcone synthase-like enzymes related to plant-microbe interaction. To confirm these results, the EST databases of sorghum and other plants could be examined to determine whether the chalcone synthase sequence and expression patterns that we found in sugarcane also exist in other crops.
Bold-face type indicates amino acid residues which are strictly conserved in all members of the chalcone and stilbene synthase family.

Figure 1 - Possible full-length cDNA sequence of sugarcane chalcone synthase 2 (CHS 2). The translational start signal and polyadenylation signal are in bold-face and underlined.

Table I - Alignment of sugarcane chalcone synthase (CHS) amino acid sequences with other chalcone synthases.
Sorghum bicolor (PI1)   159  RAMMYHQGCF AGGMVLR  176
Bacillus subtilis       135  RIPIWGLGCA GGASGLA  152
Consensus sequence           R.MMY.QGCF AGG.VLR
The common residue C is shown in bold and the A, H, and M characteristic residues of CHS2 and PI1 are underlined.
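The comparison summarized in Table I can be reproduced for the two rows shown with a minimal sketch. It treats each dot in the consensus as a position where any residue is allowed and assumes that the space inside each sequence is only a layout gap; the helper name `count_identities` is hypothetical.

```python
import re

# Consensus and sequences from Table I, with the layout gap removed.
# In a regular expression, '.' matches any single residue.
CONSENSUS = "R.MMY.QGCFAGG.VLR"
SEQUENCES = {
    "Sorghum bicolor (PI1)": "RAMMYHQGCFAGGMVLR",
    "Bacillus subtilis": "RIPIWGLGCAGGASGLA",
}

pattern = re.compile(CONSENSUS)
cys_index = CONSENSUS.index("C")  # column of the active-site cysteine


def count_identities(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


for name, seq in SEQUENCES.items():
    fits_consensus = pattern.fullmatch(seq) is not None
    has_cysteine = seq[cys_index] == "C"
    print(f"{name}: fits consensus={fits_consensus}, aligned Cys={has_cysteine}")

shared = count_identities(*SEQUENCES.values())
print(f"Identical positions between the two sequences: {shared}")
```

Both sequences carry the cysteine in the same column, but only the PI1 sequence fits the consensus over the whole window, which is in line with the bacterial protein being the more distant relative.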

Table II - The five characteristic motifs of plant chalcone synthase (CHS) and the distance between them in sugarcane CHS 1 and 2. The different regions correspond to the loops of the secondary structure of the enzyme.