A search for markers of sugarcane evolution

To determine the phylogenetic relationship between sugarcane cultivars and other members of the Saccharinae subtribe, we identified the fast-evolving ITS1-5.8S-ITS2 (ITS = internal transcribed spacer; 5.8S = 5.8S ribosomal DNA) region of the sugarcane genome in the Sugarcane Expressed Sequence Tag (SUCEST) genome project database. Parsimony analysis utilizing this region and homologs belonging to the 23 closely related Andropogoneae currently deposited in the GenBank database showed sugarcane as the sister group of Saccharum sinense. However, because there are few parsimony-informative characters and high homoplasy in the ITS1-5.8S-ITS2 region, we were not able to determine with confidence the phylogenetic relationship between sugarcane and some of the remaining members of the Saccharinae subtribe. To find alternatives for the phylogenetic reconstruction of sugarcane evolutionary history, we selected 17 markers (nuclear, chloroplastic or mitochondrial) from the SUCEST database, of which alpha-tubulin, ribosomal protein L16 (rpl16) and DNA-directed RNA polymerase beta chain (rpoC2) were found to have a low incidence of polymorphism and comparable, or even faster, rates of evolution than the ITS1-5.8S-ITS2 region. We suggest that these markers should be considered as preferential choices for phylogenetic studies of the Saccharinae subtribe.


INTRODUCTION
The Saccharum L. group is a polyploid complex within the Saccharinae subtribe of the Andropogoneae Dumort tribe, which is itself located within the Poaceae family (Sobral et al., 1994; Jacobs and Everett, 2000). The Saccharum L. group and Sorghum (Sorghinae subtribe) seem to have diverged from a common ancestor 5 million years ago (Al-Janabi et al., 1994) through a single and rapid radiation (Kellog, 2000; Spangler, 2000), accumulating few consistent mutations (Kellog and Watson, 1993; Mason-Gamer et al., 1998; Spangler et al., 1999). Because of this, phylogenetic reconstructions of Andropogoneae often result in clades with poorly supported relationships (Spangler, 1999). Even the fast-evolving ITS1 and ITS2 regions have shown few differences between or within the Andropogoneae species (Wang et al., 2000; Ainouche and Bayer, 1997). Therefore, the molecular systematics of the Andropogoneae would greatly benefit from markers with similar, or even faster, rates of evolution compared to that of the ITS regions. In this regard, the Sugarcane Expressed Sequence Tag (SUCEST) genome project is an outstanding source of information because it contains many of the genes expressed by sugarcane cultivars used in agriculture.
However, the identification of such markers may be complicated by the large size (2n = 18-170) of the Saccharum genome, whose expansion has produced new gene combinations and increased polymorphism and chromosome number (Ming et al., 1998). The genomic complexity of the sugarcane cultivars whose data are included in the SUCEST database is probably even greater, since they are derived from hybrids between Saccharum spontaneum and Saccharum officinarum.
The aim of the present investigation was to identify fast-evolving markers within the SUCEST database, investigate their polymorphism and select the most appropriate of these markers to study the evolution of closely related specimens within Andropogoneae. To accomplish this, we identified sugarcane internal transcribed spacer (ITS) ITS1 and ITS2 sequences and used them to infer the phylogenetic position of sugarcane within the Saccharinae. We also compared the rate of evolution of the ITS regions to the rate of evolution of nuclear, chloroplastic and mitochondrial cDNA sequences selected from the SUCEST database. Our results indicate that at least three sugarcane markers have comparable or even faster evolution rates than the ITS regions and can be considered as preferential choices for use in research on the molecular systematics of the Saccharinae subtribe.

Search for sugarcane sequences
Complete coding sequences of markers, including some commonly utilized in plant phylogenetic studies (Table I), were retrieved from the GenBank database and used as queries in a basic local alignment search tool (BLASTN) search against the SUCEST database. Results were treated as described below in order to select appropriate molecules for evolutionary studies.
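The selection of one sugarcane cluster per query marker can be sketched as follows. This is only an illustration: it assumes the search results are available as standard 12-column tab-separated BLAST output, and the function name, the E-value cutoff and the identifiers in the example are hypothetical rather than taken from the original analysis.

```python
import csv
import io

# Column names of the standard 12-column tabular BLAST output (assumed format).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-10):
    """Keep, for each query marker, the hit with the highest bit score.

    max_evalue is an illustrative cutoff, not a value reported in the text.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue  # too weak to be treated as a candidate homolog
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical query and cluster names, for illustration only.
example = "\n".join([
    "rpl16_query\tcluster_A\t98.5\t400\t6\t0\t1\t400\t10\t409\t1e-180\t650",
    "rpl16_query\tcluster_B\t85.0\t120\t18\t0\t1\t120\t5\t124\t1e-20\t110",
    "rpoC2_query\tcluster_C\t70.0\t40\t12\t0\t1\t40\t1\t40\t0.5\t30",
])
selected = best_hits(example)
```

In this example only `rpl16_query` keeps a hit (`cluster_A`); the single `rpoC2_query` hit is discarded because its E-value exceeds the cutoff.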

Retrieval of sequences homologous to sugarcane markers
Each of the identified sugarcane markers was utilized in a new BLASTN search against the GenBank database, and homologous sequences were retrieved and grouped into a new database for phylogenetic analysis.

Phylogenetic analysis
Conserved elements within the homologous sequences were aligned through the introduction of gaps using the program MALIGN (multiple alignment), version 2.8 (Wheeler and Gladstein, 1994), running with the score zero and quick parameters. Regions which could not be unambiguously aligned were excluded from the phylogenetic analyses, which were carried out through parsimony or maximum likelihood methods using the program PAUP* (phylogenetic analysis using parsimony and other methods), version 4.0b4a of Swofford (2000), and branch-and-bound search (Handy and Denng, 1982). Decay index was calculated according to Bremer (1988), using the Treeroot 2.0 program of Sorenson (1999).
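To make the parsimony criterion concrete, the sketch below scores a fixed rooted binary tree with Fitch's algorithm for unordered characters. It is a toy illustration and not the PAUP* implementation: the tree encoding (nested tuples), the function names and the four-taxon alignment are assumptions made for this example, and gaps, if present, would simply be counted as one more state.

```python
def fitch_length(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree   -- nested 2-tuples whose leaves are taxon names (strings)
    states -- dict mapping each taxon name to its character state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common      # children agree: no extra step needed
        changes += 1           # children disagree: one change on this node
        return left | right

    visit(tree)
    return changes


def tree_length(tree, alignment):
    """Sum of Fitch lengths over all columns of an alignment (dict taxon -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_length(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


# Toy data: taxa A-D are placeholders, not sequences from this study.
toy = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACA"}
with_clade = tree_length((("A", "B"), ("C", "D")), toy)      # 3 steps
without_clade = tree_length((("A", "C"), ("B", "D")), toy)   # 4 steps
```

In this toy case the shortest tree containing the clade (A, B) needs three steps, while the alternative arrangements need four. The difference of one step is the quantity that the decay index reports for that clade.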

RESULTS AND DISCUSSION
Many current investigations within the Andropogoneae tribe utilize the ITS1 and ITS2 regions because they are known to incorporate changes at comparatively high rates (Hershkovitz and Zimmer, 1997; Baum et al., 1998). The analysis of the sugarcane dataset indicates that cluster SCCCLR1CO5E04.g (107274) contains 590 base pairs corresponding to the complete coding sequence of the ITS1-5.8S-ITS2 region, which was utilized for the parsimony analyses summarized in Table II.
The position of sugarcane cultivars, represented by the SCCCLR1CO5E04.g cluster, was determined within the 23 closely related Andropogoneae taxa that have the ITS1-5.8S-ITS2 region deposited in the GenBank database. The generated alignment was made up of 603 nucleotides and resulted in 33 most-parsimonious trees (MPTs) with length = 373, consistency index (CI) = 0.7131 and retention index (RI) = 0.7902. In order to minimize the interference from homoplastic characters, data were submitted to successive approximation using the PAUP* option for rescaled consistency index (RCI, Farris, 1969; Carpenter, 1988). This procedure increased the goodness-of-fit and resulted in one of the 33 MPTs (length = 227.94614, CI = 0.8175, RI = 0.8877), which shows the SCCCLR1CO5E04.g cluster as the sister group of Saccharum sinense in a clade supported by a bootstrap index = 99 (Figure 1). A reasonable bootstrap index (88) was also obtained for the clustering of Saccharum robustum and Saccharum cultivar R46, but little support exists for the clustering of Saccharum barberi, Saccharum officinarum and Saccharum cultivar R48. Identical tree topology and comparable bootstrap indexes (not shown) were obtained for a single MPT (length = 47, CI = 0.9362, RI = 0.8235) derived from the analysis which included only Saccharum specimens and Miscanthus sinensis. The pairwise test applied to Saccharum and Miscanthus ITS1-5.8S-ITS2 sequences showed few differences, from 1 to 5%, and the alignment of these sequences resulted in only 12 parsimony-informative characters (Table II). By considering gaps as a fifth state, parsimony-informative characters increased to 13, but this did not increase the support for the Saccharum barberi, Saccharum officinarum and Saccharum cultivar R48 clade (not shown). In addition to bootstrap values, low decay index and distance values (Figure 1) indicate that tree nodes involving Saccharum species are supported by few mutations and that tree topology may be highly influenced by homoplasy.
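The fit indices quoted above follow standard definitions, which the sketch below writes out for a single character: CI is the minimum possible number of steps divided by the observed number, RI is (maximum - observed) / (maximum - minimum), and RCI is their product. The function name and the treatment of the undefined 0/0 case are assumptions of this example, not details taken from PAUP*.

```python
def fit_indices(min_steps, observed_steps, max_steps):
    """Return (CI, RI, RCI) for one character on a given tree.

    min_steps      -- fewest steps the character could need on any tree
    observed_steps -- steps it actually needs on the tree being evaluated
    max_steps      -- most steps it could need on any tree
    """
    if observed_steps <= 0:
        raise ValueError("indices are undefined for characters with no steps")
    ci = min_steps / observed_steps
    if max_steps == min_steps:
        ri = 0.0  # assumed convention: the character cannot distinguish trees
    else:
        ri = (max_steps - observed_steps) / (max_steps - min_steps)
    return ci, ri, ci * ri


# A character that needs 2 steps where 1 would suffice and 3 is the worst case.
ci, ri, rci = fit_indices(min_steps=1, observed_steps=2, max_steps=3)

# Product of the ensemble CI and RI reported for the 33 MPTs in the text.
ensemble_rci = 0.7131 * 0.7902
```

In successive approximation, each character is reweighted by a fit value of this kind and the search is repeated, so characters with extra steps contribute less to tree length in the next round. Applied to the ensemble values reported for the 33 MPTs, the product of CI and RI is about 0.56.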
In addition to parsimony, maximum-likelihood analysis was applied to the ITS1-5.8S-ITS2 alignment. To render the analysis computationally tractable, only Saccharum and Miscanthus sinensis representatives were considered. The most parameter-rich model (general time-reversible + proportion of invariant sites + rate heterogeneity modeled as a gamma distribution with six rate categories) was significantly more likely than the next best model, as determined by the likelihood ratio test. The selected model was applied to a starting tree that had been generated by neighbor-joining (Kimura 2-parameter distances) and tree bisection-reconnection (TBR) branch swapping. The single most likely tree (MLT) generated preserved clades which had high support in the single MPT, i.e. those involving the SCCCLR1CO5E04.g cluster and Saccharum sinense or Saccharum robustum and cultivar R46. However, the MPT clade containing Saccharum cultivar R48, Saccharum officinarum and Saccharum barberi was not present in the MLT (Figure 1). These results reinforce the low confidence in the poorly supported relationships obtained for this clade in the MPT.
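The Kimura 2-parameter distance used for the neighbor-joining starting tree can be computed from the proportions of transitions (P) and transversions (Q) between two aligned sequences as d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below is a minimal illustration of that formula; the function name is hypothetical, and skipping any column that contains a gap or an ambiguous base is an assumption of this example rather than a setting reported for the analysis.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumed handling: ignore gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1       # A<->G or C<->T
        else:
            transversions += 1     # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition (A -> G) in ten compared sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
```

For sequences as similar as those compared here, the corrected distance stays close to the raw proportion of differences. With one transition in ten sites, for example, it is about 0.11 against an uncorrected 0.10.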
It is not easy to determine clearly the phylogenetic relationship between sugarcane and other members of the Saccharinae subtribe, and we feel that it will be necessary to consider a larger number of consistent mutations than those accumulated by the ITS regions. To accomplish this, molecular analysis of Saccharinae should focus on data sets containing not only ITS markers but also other markers with relatively high evolutionary rates, and use methods such as successive weighting to minimize homoplastic noise.
We searched the SUCEST database to find fast-evolving markers, the searches being directed at three categories of molecules, i.e. nuclear or mitochondrial ribosomal DNA and spacers, nuclear or mitochondrial protein coding genes related to carbohydrate metabolism and electron transport, and chloroplast genes. These searches resulted in the identification of 17 cDNA clusters (Table III).
Since single clusters were identified for every marker, a low polymorphism incidence was detected in sugarcane. While the finding of a unique gene is consistent with the absence of polymorphism, this result must be interpreted with special caution. Identification of a given marker from a cDNA library occurs as a function of its relative abundance, so that its absence may just be the result of a low transcription rate. Thus, to refine the results of the polymorphism investigation we carried out a BLASTN search for reads in the SUCEST database utilizing each of the 17 identified sugarcane clusters as queries. This procedure resulted in the identification of small gene fragments, which were found either as a single molecule or as domains inserted in a larger sequence. These fragments were mostly found in nuclear 18S ribosomal RNA (18S rRNA) and elongation factor alpha (EF-1 alpha), and it appears that their distribution in sugarcane cDNA sequences may ultimately prove to be useful in the study of genome expansion in the Saccharum L. group.

Key: CI = consistency index, RI = retention index, PIC = parsimony-informative characters, RCI = rescaled consistency index (Farris, 1969; Carpenter, 1988), 5th state = analysis of Saccharinae taxa considering gaps as a 5th state.

Poaceae sequences were retrieved from the GenBank database and have the following accession numbers, according to the order in which they appear from the top to the bottom of the tree: AF345222, AF345230, AF345200, AF345241, AF345226, AF345239, AF019822, U04790, U04793, U04792, AF345217, AF345236, AF019824, AF345211, U04789, U04797, U04798, AF019819, U46626, U46637, U46603, AJ133707, AJ133708.

In order to evaluate the utility of the selected markers for evolutionary studies, we generated alignments using a given marker and its homologs from the Poaceae (Gramineae) family currently deposited in the GenBank database. The rates of evolution of each marker were estimated by determining the percentage of parsimony-informative characters (PIC) presented by each of the alignments. Table III shows that the ITS1-5.8S-ITS2 alignment contained 136 PIC out of 577 characters, i.e. 24%. In addition, the alpha-tubulin alignment presented 21% of PIC, indicating that alpha-tubulin has an evolution rate comparable to that found in the ITS1-5.8S-ITS2 region. On the other hand, rpl16 and rpoC2 have faster evolution rates, since their alignments contained 49% and 59% of PIC respectively. Thus it is reasonable to assume that sequencing of alpha-tubulin, rpl16 and rpoC2 from Saccharinae specimens would add considerable information to the phylogenetic analysis of this subtribe. These findings make such molecules preferential markers for future studies on the systematic position of sugarcane cultivars within the Saccharinae subtribe.
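The PIC percentage used here as a proxy for evolutionary rate can be illustrated with a short sketch. A column is counted as parsimony-informative when at least two different states each occur in at least two sequences. The function name, the `gap_as_state` switch and the toy alignment are assumptions made for illustration and do not reproduce the alignments of Table III.

```python
from collections import Counter


def count_pic(alignment, gap_as_state=False):
    """Count parsimony-informative columns in a list of aligned sequences."""
    n_sites = len(alignment[0])
    informative = 0
    for i in range(n_sites):
        column = [seq[i].upper() for seq in alignment]
        if not gap_as_state:
            column = [c for c in column if c != "-"]  # treat gaps as missing
        counts = Counter(column)
        # States present in at least two sequences.
        shared = [state for state, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            informative += 1
    return informative


# Toy alignment with placeholder sequences.
toy = ["AAGA-", "AAGC-", "ACTCA", "ACTAA"]
pic_gaps_missing = count_pic(toy)                  # 3 informative columns
pic_gaps_as_state = count_pic(toy, gap_as_state=True)  # 4 informative columns

# Percentage reported in the text for the ITS1-5.8S-ITS2 alignment.
its_percent = round(100 * 136 / 577)               # 24
```

The toy alignment also shows why counting gaps as a fifth state can raise the number of informative characters: the last column only becomes informative once the gap is treated as a state of its own.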

Table I - Molecular markers investigated in sugarcane and previous references to their utilization in phylogenetic studies. (-) indicates that the marker has not yet been used in a phylogenetic study.

Table III - Phylogenetic information of sugarcane evolution markers.