Identification of sugarcane cDNAs encoding components of the cell cycle machinery

Data on cell cycle research in plants indicate that the majority of the fundamental regulators are conserved with other eukaryotes, but the controlling mechanisms imposed on them, and their integration into growth and development, are unique to plants. To date, most studies on cell division have been conducted in dicot plants. However, monocot plants have distinct developmental strategies that will affect the regulation of cell division at the meristems. In order to advance our understanding of how cell division is integrated with the basic mechanisms controlling cell growth and development in monocots, we took advantage of the sugarcane EST Project (SUCEST) to carry out exhaustive data mining to identify components of the cell cycle machinery. Results obtained include the description of distinct classes of cyclin-dependent kinases (CDKs); A, B, D, and H-type cyclins; CDK-interacting proteins; CDK-inhibitory and activating kinases; and pRB and E2F transcription factors. Most sugarcane cell cycle genes seem to be members of multigene families. As in dicot plants, CDKa transcription is not restricted to tissues with elevated meristematic activity, but the vast majority of CDKb-related ESTs are found in regions of high proliferation rates. Expression of CKI genes is far more abundant in regions of less cell division, notably in lateral buds. Shared expression patterns for a group of clusters were unraveled by transcriptional profiling, and we suggest that similar approaches could be used to identify genes that are part of the same regulatory network.


INTRODUCTION
The fundamental controls that coordinate the cell cycle, ensuring that mitosis does not proceed until DNA is completely duplicated, seem to be conserved in all eukaryotes. Such controls depend on cyclin-dependent kinase (CDK) complexes that regulate key transitions at the entry into S phase and mitosis. A universal model for the control of progression through the mitotic cycle has been proposed, and it involves the formation, activation and subsequent deactivation of these protein complexes. CDK complexes consist of a catalytic CDK subunit (a member of the CDC2-related family) and a regulatory cyclin subunit (Nurse, 1990). CDKs bind sequentially to a series of cyclins that are responsible for differential activation of the kinase during cell cycle progression. According to the phase of the cycle in which the cyclins are functional, they can be classified as G1, S and G2-phase cyclins. In addition, CDK activity is also regulated by post-translational modifications and by association with interacting proteins (Figure 1). Studies on the characterization of plant genes involved in controlling cell division strongly suggest that the cell cycle machinery is also conserved in plants (reviewed in Mironov et al., 1999).
Cell division plays a key role in plant development by giving rise to the cells that will expand and differentiate into various cell types, finally forming the plant body. These processes require essential co-ordination of cell division and developmental signals to drive plant growth. Cell divisions should be differentially activated and repressed to allow the formation of multicellular tissues. Therefore, throughout plant life, developmental controls must be perfectly integrated with the mechanisms regulating mitotic activity in the meristems to construct highly ordered structures. In addition, clearly discernible developmental strategies can be recognized between the two angiosperm classes. While in monocot plants secondary growth does not occur, in dicot plants cell division in the vascular cambium provides new cells for the secondary growth of the stem. Another important difference between the two classes occurs during leaf development. In grasses there is a clear separation between the regions of cell division and differentiation. Divisions occur mainly at the meristematic regions at the base of the leaf, and differentiation takes place along the leaf. Regions of cell proliferation and cell differentiation are not so obviously defined in dicot plants. In addition, in grasses the cells of the mature leaves are terminally differentiated, while in many dicot plants many cells retain the competence to restart cell division (Hemerly et al., 1993). Although substantial progress has been made in understanding the mechanisms controlling cell division in plants, most research has been conducted in dicot systems, and very little is known of the mechanisms regulating the cell cycle in grasses.
As a first step to study cell division in sugarcane, data mining in the SUCEST (sugarcane EST project) data set was conducted to identify genes involved in cell cycle control. Sequencing of randomly selected cDNAs is an effective approach to discover new genes and inspect their expression in a broad diversity of tissues and/or treatments. Our results show that a complete set of sugarcane genes encoding components of the cell cycle machinery is present in the SUCEST database. Moreover, more than one cluster was retrieved for most genes, indicating that they are encoded by multigene families. Sequence comparisons of the CDC2-related proteins revealed the existence of a new member of the family. Transcriptional profiles obtained for each gene revealed the existence of subsets of clusters with common expression patterns, and this finding could be used to design strategies to identify novel genes that may participate in the same regulatory network.

MATERIAL AND METHODS
The overall goal of this study was to retrieve from the SUCEST data set sugarcane homologues to all genes described in Figure 1. To achieve this, data mining in the SUCEST database was carried out using a combination of Blastx analyses, with plant genes as baits, and keyword searches in the SUCEST home page (http://sucest.lbi.dcc.unicamp.br/cgi-bin/prod/blast/form_maker.pl or http://sucest.lbi.dcc.unicamp.br/cgi-bin/prod/blastkwsearch/blastkwsearch.pl, respectively). Each cluster from the first collection of genes retrieved was compared again with the SUCEST data set using the Blastn program in order to extend the number of candidate genes. Then, both nucleotide and deduced amino acid sequences were used to compare all the clusters inside a gene family in order to confirm the number of candidate clusters.
Alignment of partial amino acid sequences from CDC2-related sequences was made using CLUSTALW (http://biomaster.uio.no/clustalw.html). The closest plant homologue to each sugarcane gene was included in the comparison. The resulting alignment was used to obtain a phylogenetic tree using the Mega2b3 program (Kumar et al., 1993). An unrooted phylogenetic tree was constructed using the "Neighbor Joining" method. Confidence in the resulting tree was evaluated by running a bootstrap test with 500 repetitions.
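The resampling step that underlies a bootstrap test can be sketched as follows. This is only an illustration under stated assumptions, not the procedure implemented in the Mega2b3 program: the function names, the use of a simple p-distance and the toy alignment are all hypothetical, and the Neighbor Joining step itself is omitted. Each replicate is built by sampling alignment columns with replacement, and a tree would then be rebuilt from each replicate.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing residues, ignoring columns with a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_alignment(alignment, rng):
    """Return one replicate: columns sampled with replacement.

    `alignment` maps a sequence name to an aligned string; all strings
    are assumed to have the same length.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical fragments, for illustration only).
toy_alignment = {"A": "PSTAIRE", "B": "PPTALRE", "C": "PPTTLRE"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(500)]
```

In a full analysis, a distance matrix and a tree would be computed for every replicate, and the support for a branch would be the fraction of replicate trees that contain it.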
Transcriptional profiles were obtained by calculating the percentage abundance of all reads of a given cluster in each cDNA library. In general, only cDNA libraries with more than 4500 reads were used. Exceptionally, when three or more reads of one cluster were found in a cDNA library with fewer than 4500 reads, these reads were considered in the final calculation. Percentage abundance was obtained as follows. First, a normalization factor for each gene was obtained by dividing the total number of reads in the cluster by the total number of reads of all cDNA libraries where the particular gene was expressed. Next, a normalized value for each individual cDNA library was calculated by multiplying the normalization factor by the number of reads in that cDNA library. Finally, percentage abundance was calculated by dividing the normalized value by the sum of all normalized values and multiplying the result by 100.
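The calculation described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study, and it rests on one labeled assumption: "the number of reads in the particular cDNA library" is read here as the number of reads of the cluster in that library. The function name, parameter names and example counts are hypothetical.

```python
def percentage_abundance(cluster_reads, library_sizes,
                         min_library_size=4500, min_reads_small_library=3):
    """Percentage abundance of one cluster across cDNA libraries.

    cluster_reads: library name -> reads of this cluster in that library
    library_sizes: library name -> total reads sequenced from that library
    """
    # Keep libraries where the cluster is expressed and that have more
    # than 4500 reads, or, exceptionally, smaller libraries holding
    # three or more reads of the cluster.
    kept = {}
    for lib, n in cluster_reads.items():
        if n <= 0:
            continue
        if library_sizes[lib] > min_library_size or n >= min_reads_small_library:
            kept[lib] = n
    if not kept:
        return {}

    # Normalization factor: reads in the cluster divided by the total
    # reads of all libraries in which the cluster is expressed.
    factor = sum(kept.values()) / sum(library_sizes[lib] for lib in kept)

    # Normalized value per library (ASSUMPTION: factor times the
    # cluster's reads in that library).
    normalized = {lib: factor * n for lib, n in kept.items()}

    # Percentage abundance: share of each normalized value in the total.
    total = sum(normalized.values())
    return {lib: 100.0 * value / total for lib, value in normalized.items()}


# Hypothetical counts, for illustration only.
example = percentage_abundance(
    {"AM": 6, "FL": 3, "LB": 1, "CL": 3},
    {"AM": 10000, "FL": 8000, "LB": 3000, "CL": 4000},
)
```

Under this reading the normalization factor is the same for every library of a given cluster, so it cancels in the final ratio and the result equals the share of the cluster's retained reads found in each library. If the intended quantity were instead weighted by library size, the normalized value would need to be defined differently.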

Cyclin-dependent kinases
Cyclin-dependent kinases (CDKs) play a pivotal part in the control of cell cycle progression in eukaryotes. They are specific serine/threonine kinases that are activated at defined cell cycle phases. Various homologues of CDKs have been isolated from different plant species (reviewed in Hemerly et al., 1999). The CDKs can be separated into three main classes. The group containing the PSTAIRE sequence, with the perfectly conserved 16 amino acid cyclin-interacting motif, forms the a-type class. A second class, the b-type, contains the PPTALRE or the PPTTLRE motifs at the equivalent position. The c-type class is formed by a group of homologues with divergent motifs that do not fit into either of the other classes.
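The motif-based classes just described can be expressed as a simple lookup. The sketch below is only an illustration of that rule, with hypothetical names and a hypothetical test fragment; it searches for the motif anywhere in the deduced protein sequence rather than at the aligned "equivalent position", which is a simplification.

```python
# Motif -> class, as described in the text above.
CDK_MOTIF_CLASSES = {
    "PSTAIRE": "a-type",
    "PPTALRE": "b-type",
    "PPTTLRE": "b-type",
}


def classify_cdk(protein_sequence):
    """Assign a CDK class from the cyclin-interacting motif.

    Sequences carrying none of the listed motifs are reported as
    "divergent". The text groups homologues with divergent motifs into
    the c-type class, but a partial sequence that lacks the motif
    region would also end up here, so this label is not a class call.
    """
    sequence = protein_sequence.upper()
    for motif, cdk_class in CDK_MOTIF_CLASSES.items():
        if motif in sequence:
            return cdk_class
    return "divergent"


# Hypothetical fragment, for illustration only.
example_class = classify_cdk("MEGVPSTAIREISLLKE")
```

A call based on a position-specific match within an alignment would be more robust, because it would not be misled by a chance occurrence of the motif elsewhere in the protein.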
A phylogenetic tree was constructed with the cDNA sequences from the members of the sugarcane CDC2-related family available in the SUCEST data set. Because not all clusters contained full-length clones, the alignment was constructed with amino acid sequences ranging from the conserved kinase catalytic domains I to VI (Hanks et al., 1988). Data mining in the SUCEST data set allowed the identification of two CDK-a homologous clusters. Interestingly, the rice Cdc2Os1 gene (the rice CDC2 functional homologue) shares more similarities with the Cdc2Sc1-1 cluster (Figure 2), indicating that a gene duplication occurred in the Saccharum progenitor species after the event that generated the Oryza sativa/Saccharum sp. separation. Because commercial sugarcane varieties are hybrids of Saccharum officinarum and Saccharum spontaneum, it would be interesting to verify whether each gene originated from one of the individual parental species. The expression profile shows that the Cdc2Sc1-2 transcript is less abundant (5 out of 25 total reads came from this gene, data not shown). Interestingly, all transcripts from this gene came from tissues with less meristematic activity, and one hypothesis is that this gene may have evolved to perform cell division in more restricted situations, while the other gene is responsible for cell divisions in regions of intense cell proliferation.
The CDK-related proteins are separated into six discrete families with a clear correlation with their function (Figure 2). The CDK-b1 class was represented in the EST bank by one cluster with the characteristic PPTALRE motif (Cdc2Sc2, Table I). Its closest homologue is the tobacco CdkB1-2. An interesting situation emerges in the CDK-b2 family. One cluster with the typical PPTTLRE family motif was identified (Cdc2Sc3). In addition, another SUCEST cluster, with a PPTAMRE motif (Cdc2Sc4), was found in the CDK-b2 branch of the tree. Database searches have not identified any CDK homologue in other eukaryotes with this motif. Because the bootstrap value separating these two clusters was quite high, the PPTAMRE-containing protein may represent a new gene. On the other hand, both clusters are together in a branch of the tree that includes typical CDK-b2 genes such as the alfalfa Cdc2MsF and the rice Cdc2Os3, and this observation could suggest that Cdc2Sc3 and Cdc2Sc4 are alleles. The determination of the complete sequence of the cDNAs should solve this question.
Clusters homologous to CDK-activating kinase (CAK), to CDK8 kinase and to the PITSLRE family of human protein kinases, and a cluster with a PITAIRE motif, are split from the CDK-a and CDK-b classes. Although the phylogenetic tree was constructed with partial amino acid sequences, the results reinforce previous suggestions that the CDK-b branch may have evolved from a CDK-a containing phylum (Lessard et al., 1999). Furthermore, it also indicates that an earlier duplication in the family of CDC2-related kinases separated the members of the family involved in the control of cell division from another group of proteins that evolved to assume more specialized functions.
An important step in CDK activation is phosphorylation of a conserved threonine residue by CAK (Mironov et al., 1999). While in some organisms, such as mammals, CAKs not only activate CDKs but also phosphorylate the carboxy-terminal domain (CTD) of RNA polymerase II, in yeast these activities are carried out by two separate kinases. It has been proposed that in plants both situations may exist. In rice, a CAK homologue (Cdc2OsR2) can carry out both biochemical activities (Yamaguchi et al., 1998), but an Arabidopsis CAK homologue (Cak1At) can only phosphorylate cyclin-dependent kinases (Umeda et al., 1998). A sugarcane CAK homologous gene was identified (Cdc2Sc7) in the SUCEST database, and its closest match is the rice Cdc2OsR2 gene. Surprisingly, the best hit in the Arabidopsis genome (74% homology) was not Cak1At but a putative CDC2-related protein.
The closest homologue to the sugarcane Cdc2Sc5 cluster was a putative CDC2-related kinase from Arabidopsis (Figure 2), and the best hits among proteins with known function were with the family of human PITSLRE protein kinases that are targets of caspases during apoptosis (Tang et al., 1998). Two clusters with the PITAIRE motif were identified in the sugarcane family of CDC2-related proteins (Cdc2Sc6). Animal and plant proteins with very similar sequences are found in databases, but a biochemical function has not been proposed yet. A sugarcane cluster with a novel PRQILRE motif (Cdc2Sc9) is highly homologous to the Cdc2Sc6 clusters in the conserved kinase catalytic domains, but quite divergent in less conserved regions. Further characterization of this cluster may reveal that this is a new member of the CDK family not yet described in eukaryotes. The Cdc2Sc8 cluster encodes a homologue to the human CDK8 gene that interacts with Cyclin C and is involved in transcription (Tassan et al., 1995).

Cyclins
It has been demonstrated that the triggering of the events that control cell cycle progression depends on the accumulation of the cyclin proteins. Cyclin binding to CDKs not only regulates kinase activity, but also determines the substrate specificity and subcellular localization of the kinase. Plant cyclins are classified as A, B or D-types (CycA, CycB and CycD), depending on their sequence homology to their animal counterparts. Within each type, plant cyclins are further separated into classes according to their sequence homology. While expression of CycA and CycB genes is cell cycle regulated, CycD mRNA levels do not fluctuate in actively dividing cells. There is a growing amount of evidence that CycDs are involved in mediating the response of plant cells to stimuli that control re-initiation of cell division (reviewed in Huntley and Murray, 1999).
Sugarcane cyclins identified in this study could be classified into subgroups, using the categorization scheme proposed by Renaudin and coworkers (1996). Clusters encoding CycA1, CycA2, CycA3, CycB1, CycB2, CycD1, CycD2, and CycD3-related genes were found in the SUCEST database, with varying numbers in each class (Table I). Interestingly, in the CycD2 class, one cluster displays slightly more similarity with the Arabidopsis CYCD4;1 gene than with the CYCD2;1 gene. Whether sugarcane also encodes a D4-type cyclin will only be confirmed by the determination of the full sequences of the clusters, but two observations suggest that this may indeed be the case. First, there was no nucleotide homology between this cluster and the other D2-type cyclins, despite the fact that it is a long cDNA. Second, the homology between the putative D4 cluster and the other D2 sugarcane cyclins was approximately the same as that found between the Arabidopsis CycD2 and CycD4 proteins. Our data mining in the SUCEST data set indicated the presence of two sugarcane clusters encoding cyclin H proteins, the regulatory subunit of the CAK kinase.

Other genes involved in the regulation of CDK activity
Besides cyclins, other proteins have been isolated based on their ability to bind to CDKs, among them the anchoring protein CKS or Suc1, which can bind both negative and positive regulators of CDKs (Mironov et al., 1999). In plants, it has been shown that the Arabidopsis CKS homologue, CKS1At, is capable of binding both Cdc2aAt and Cdc2bAt (Jacqmard et al., 1999). Sugarcane clusters with homology to CKS were identified in the SUCEST database. Their deduced amino acid sequences have only three conserved substitutions compared with the Arabidopsis gene. Both at the amino acid and at the nucleotide levels, the coding regions of the sugarcane genes were 100% identical. However, at the 5' UTR and 3' UTR of the sugarcane clusters there were minor differences that suggested that they were indeed different genes. Another group of proteins, called CDK inhibitors (CKIs), interact with CDKs and inhibit cell cycle progression. Although several CKIs have been identified in the genome of Arabidopsis, the sequence of only two of these genes has been reported (Wang et al., 1998; Lui et al., 2000). Our results indicate that sugarcane encodes at least four CKI genes. Sequence comparisons indicated that their homology was restricted to the carboxy-terminal domain, and a similar observation was made before regarding the homology among Arabidopsis CKI genes (Lui et al., 2000).
Modulation of CDK activity can also be accomplished by reversible phosphorylation of conserved tyrosine and threonine residues located in the catalytic subdomain I. Inhibitory phosphorylation is carried out by a dual specificity kinase, Wee1. Three sugarcane clusters homologous to the maize Wee1 gene (Sun et al., 1999) are present in the SUCEST bank. In higher animals and in the fission yeast, dephosphorylation of the above-mentioned residues is a key step in activating CDKs, and it is carried out by a protein phosphatase called Cdc25. There is genetic and biochemical evidence that tyrosine dephosphorylation on CDKs may be essential for cell cycle progression in plants (John, 1998; McKibbin et al., 1998). However, no CDC25 homologue has been identified in plants, despite the availability of large cDNA collections, such as the SUCEST data set, and the completion of the sequence of the Arabidopsis genome.
Recent work carried out in plants suggests that the pathway controlling the G1 to S transition in mammalian cells is also conserved in plants (Huntley and Murray, 1999). Key players in this regulatory path, such as the retinoblastoma gene and E2F transcription factors, have been identified in several plants. This view is strengthened by the observation that in maize leaves, levels of the retinoblastoma protein are inversely correlated with Cdc2-related protein accumulation detected with an anti-PSTAIR antibody (Huntley et al., 1998). Furthermore, as in higher eukaryotes, plant cyclin D proteins may act as sensors of external mitogenic stimuli (Riou-Khamlichi et al., 1999). In sugarcane, we identified seven retinoblastoma-related clusters in the SUCEST data set. However, they may in fact represent only four genes, because the best homology of four of these clusters was to the maize RRB1 gene (Ach et al., 1997). Another cluster was highly homologous to the maize RRB2 gene, and two further clusters were identified for the first time in grasses; they share the highest similarities with homologous genes in pea and poplar, respectively. Out of five sugarcane E2F-related clusters, three have the highest homology with the only published wheat E2F sequence (Ramirez-Parra et al., 1999), the other two clusters being homologous to different Arabidopsis E2F-related sequences.

Expression analysis based on relative abundance of EST/cDNA library
Large sets of ESTs from redundant non-normalized cDNA libraries can be used to detect evidence of differential expression of individual genes in distinct tissues and/or treatments. An estimate of the abundance of a given mRNA can be obtained from the frequency of the ESTs corresponding to this mRNA, divided by the total number of ESTs in the data set. Not only can the expression of individual genes be compared in several situations, but it is also possible to compare the mRNA expression of different genes. This is the concept of "digital Northern blot" or "transcriptional profile" (for a discussion of the advantages and limitations of this approach see Audic and Claverie, 1997 and Ohlrogge and Benning, 2000). One assumption of this strategy is that groups of genes with similar transcriptional profiles could be part of a common regulatory network.
We calculated the relative abundance of ESTs in each cDNA library, and a transcriptional profile of the sugarcane components of the cell cycle machinery was obtained (Table I). Previous analyses using promoter marker fusions and in situ hybridization in dicot plants have shown that CDK-a expression is not restricted to dividing cells, its abundance being correlated with the tissue competence to proliferate (Hemerly et al., 1993). In contrast, CDK-b is preferentially expressed in dividing cells (Fobert et al., 1996). Our transcriptional profile of these genes in the SUCEST data set indicated that a similar pattern exists in monocot plants. While Cdc2Sc1 transcripts were observed not only in actively dividing tissues (39% abundance in cDNA libraries other than RT, AM and FL), Cdc2Sc2, Cdc2Sc3 and Cdc2Sc4 mRNA levels were much reduced or even absent in tissues with less meristematic activity (79, 100 and 86%, respectively, in the RT, AM and FL libraries).
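Summary figures of this kind, the share of a cluster's abundance found in the RT, AM and FL libraries, can be derived from a per-library profile with a short helper. The sketch below is illustrative only: the function name and the example profile are hypothetical and are not values from Table I.

```python
# Libraries treated in the text as having high meristematic activity.
MERISTEMATIC_LIBRARIES = frozenset({"RT", "AM", "FL"})


def meristematic_share(profile, meristematic=MERISTEMATIC_LIBRARIES):
    """Percentage of a cluster's abundance found in meristematic libraries.

    profile: library name -> abundance (read counts or percentages).
    """
    total = sum(profile.values())
    if total == 0:
        raise ValueError("cluster has no reads in any library")
    inside = sum(value for lib, value in profile.items() if lib in meristematic)
    return 100.0 * inside / total


# Hypothetical profile, for illustration only.
share = meristematic_share({"RT": 2, "AM": 5, "FL": 1, "LB": 2})
```

The complement of this value gives the share found in libraries with less meristematic activity, which is how the two kinds of percentage quoted above relate to each other.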
Cdc2Sc5 and Cdc2Sc6 share approximately 60% homology at the amino acid level, and they are closer to each other than to any other member of the CDC2-related family (Figure 2, and other data not shown). Although no biological function has been proposed for PITAIRE CDKs, because of the sequence homology between Cdc2Sc5 and Cdc2Sc6, and their strikingly similar transcriptional profiles (Table I), it is tempting to speculate that the PITAIRE CDKs could also be involved in apoptosis.
A and B-type sugarcane cyclins were preferentially expressed in tissues with more meristematic activity, especially in apical meristems and flowers. In the A and B classes, CycA2 and CycB2 were the only sugarcane genes present in roots. Surprisingly, the transcriptional profile of the sugarcane CycD-type genes showed a much weaker positive correlation with meristematic activity. Only approximately 50% of the CycD1 and CycD2 ESTs were found in the RT, AM and FL cDNA libraries. CycD3 transcripts were absent from flowers and roots, but present in apical meristems. These observations are not easy to reconcile with data from animals and dicot plants, which show that cyclin D genes are absent from non-dividing tissues, their expression being induced by mitogenic stimuli (Huntley and Murray, 1999). The enrichment of cyclin D transcripts in tissues with low meristematic activity could indicate that sugarcane adopted a particular developmental strategy in response to external stimuli. Alternatively, the distinctive transcriptional profiles of cyclin D genes may reflect the restricted meristematic activity or competence to divide in a subset of the cells inside each tissue. This latter hypothesis is strengthened by the pattern of expression of the sugarcane retinoblastoma and E2F genes. Only 51% of the retinoblastoma and 58% of the E2F total ESTs were restricted to the RT, AM and FL cDNA libraries. However, there was a strong correlation between the presence and abundance of transcripts from both genes and those of the family of sugarcane cyclin D genes in the other cDNA libraries. In animal systems, phosphorylation of pRB by cyclin D kinases results in E2F release, leading to the G1 to S transition, and the same could be happening in sugarcane cells. More careful studies of cyclin D expression in specific tissues may help to answer this question.
Studies on CKI expression in dicot plants are compatible with a role in inhibiting or arresting cell division during plant development (Wang et al., 1998; Lui et al., 2000). Likewise, the transcriptional profile of the sugarcane CKI clusters indicates that they may serve a similar purpose in monocots, because 85% of all CKI transcripts were found in libraries prepared from tissues with low meristematic activity. Even the transcripts detected in the flowers were from a cDNA library constructed with mRNA extracted from flowers at a late stage of development. Noteworthy is the abundance of CKI transcripts in lateral buds. It has been shown in dicot plants that CKI can bind to both A-type cyclins and CycD3 (Wang et al., 1998; Mironov et al., 1999). Therefore, it is suggestive that in the sugarcane tissues showing the highest level of CKI transcripts, only A and D-type cyclin ESTs are present.
The results described here represent a first step towards understanding the regulation of cell division in sugarcane. Sugarcane clusters homologous to all known components of the cell cycle controlling machinery in plants were identified. In addition, the transcriptional profiles obtained allowed the identification of subsets of clusters showing similar expression patterns. In future studies, the integration of more traditional approaches to investigating gene expression with new strategies to retrieve relevant information from the SUCEST data set should advance research and render a wealth of new perspectives from which to investigate questions not only in sugarcane, but in plants in general.

Figure 1 - Schematic representation of cell cycle control in plants. Progression through the mitotic cycle involves the successive formation, activation and subsequent inactivation of cyclin-dependent protein kinases (CDKs). The kinases bind sequentially to a series of cyclins, which are responsible for differential activation of the kinase during the cell cycle. The G1 to S transition is thought to be controlled by CDKs containing D-type cyclins that phosphorylate the retinoblastoma protein, releasing E2F transcription factors. E2F are involved in the transcription of genes needed for the G1 to S transition. The G2 to M transition is carried out by CDK complexes containing CycA and CycB cyclins. CDK complexes are kept in an inactive state by phosphorylation by the Wee1 kinase, and by interaction with inhibiting proteins (CKIs). At the G2 to M boundary, activation of the kinase is brought about by release of the CKI protein, by positive phosphorylation (by CAK kinase), and by a still unidentified protein phosphatase.

Table I - Sugarcane ESTs encoding regulators of the plant cell cycle. Sugarcane cDNA libraries: RT: roots; RZ: leaf-root transition zone; SB: stem bark; LB: lateral buds; LR: leaf roll; AM: apical meristem; ST: first and fourth internodes; FL: flowers in different stages of development; SD: seeds in different stages of development; LV: leaves from etiolated plants; CL: calli; AD: plantlets without developed leaves and roots, infected with G. diazotroficans; HR1: plantlets without developed leaves and roots, infected with H. rubrisubalbicans. Asterisks (*) indicate the presence of a corresponding EST in cDNA libraries with less than 4500 reads.