Phylogenetic relationships within Chamaecrista sect. Xerocalyx (Leguminosae, Caesalpinioideae) inferred from the cpDNA trnE-trnT intergenic spacer and nrDNA ITS sequences

Chamaecrista belongs to subtribe Cassiinae (Caesalpinioideae) and comprises over 330 species divided into six sections. Section Xerocalyx has undergone profound taxonomic reshuffling over the years. We therefore conducted a phylogenetic analysis of Cassiinae taxa using cpDNA trnE-trnT intergenic spacer and nrDNA ITS/5.8S sequences, in an attempt to elucidate the relationships within this section of Chamaecrista. Tree topologies were congruent between the two data sets, and both strongly supported the monophyly of the genus Chamaecrista. Our analyses reinforce the need to define new sectional boundaries in Chamaecrista, especially the inclusion of sections Caliciopsis and Xerocalyx in sect. Chamaecrista, which is considered paraphyletic here. Section Xerocalyx was strongly supported as monophyletic; however, the current data did not resolve C. ramosa (microphyllous) and C. desvauxii (macrophyllous) and their respective varieties as distinct clades, suggesting that speciation is still ongoing in this group.

The genus Chamaecrista was formerly treated as Cassia subgenus Lasiorhegma (Irwin and Barneby, 1982; Lewis, 2005). It is ecologically significant as the only genus within Cassiinae with concave extrafloral nectaries and roots bearing bacterial nodules (Irwin and Barneby, 1982).
Section Xerocalyx is easily recognizable, being distinguished by its parallel-nerved leaflets, strongly graduated and multistriate sepals, and a reduced chromosome number of 2n = 14. Nevertheless, it has undergone considerable taxonomic reformulation (Irwin and Barneby, 1982). While the section was still included in Cassia, Irwin (1964) recognized 16 species within Xerocalyx, defined by their morphological, chromosomal and chemical characteristics. Later, based on an arbitrary classification of morphological characters such as foliage size and petiole length, Irwin and Barneby (1982) proposed a profound reorganization of Xerocalyx, recognizing only three species with 22 varieties. More recently, using informative organographic characters and chorological data, Fernandes and Nunes (2005) rearranged the section into 10 species and 27 varieties.
Furthermore, Irwin and Barneby (1982) transferred, with some confidence, several specimens previously classified by Irwin (1964) as C. diphylla (L.) Greene to C. rotundifolia (Pers.) Greene. This illustrates the occasional confusion between C. diphylla and C. rotundifolia in herbaria. The two species are similar in the number, and sometimes the shape, of their leaflets, but they can be distinguished by leaflet venation, the presence of petiolar glands, the strongly graduated and multistriate calyx lobes, and the decandrous androecium (Irwin and Barneby, 1982). These disagreements over the classification of Xerocalyx raise the question of whether C. diphylla is more closely related to C. rotundifolia (sect. Chamaecrista) than to the other members of Xerocalyx.
The taxonomic incongruence within Xerocalyx persists. Because most studies have been based on herbarium specimens, which have not allowed truly discrete units to be discerned, alternative tools are worth employing to resolve this taxonomic instability. Recently, de Souza Conceição et al. (2009) analyzed the phylogeny of Chamaecrista using nuclear ITS and plastid trnL-F sequences from representatives of all six sections, and supported the monophyly of sect. Xerocalyx. However, the phylogenetic relationships within Xerocalyx were not discussed in detail.
To the best of our knowledge, we report here the first sequences of the trnE(UUC)-trnT(GGU) intergenic spacer region (trnE-trnT) of the cpDNA for the subfamily Caesalpinioideae (Leguminosae). This spacer lies within the trnD(GUC)-trnT(GGU) region, which has relatively high substitution rates compared with other chloroplast regions (Hahn, 2002; Shaw et al., 2005) and has been used effectively in phylogenetic studies at lower taxonomic levels (Friesen et al., 2000; Lu et al., 2001). We aimed to evaluate the usefulness of trnE-trnT spacer sequences for providing further information on the phylogeny of Cassiinae, focusing on Chamaecrista sect. Xerocalyx. Because comparing phylogenetic hypotheses derived from both nuclear and chloroplast sequences is crucial for obtaining additional resolution and representing true organismal relationships (Kuzoff et al., 1998), we also obtained sequences of the internal transcribed spacer (ITS)/5.8S region of the nuclear ribosomal DNA (nrDNA) cistron, which comprises the first spacer (ITS1), the 5.8S rRNA gene and the second spacer (ITS2), to further investigate whether these markers recover the true relationships within Cassiinae.

Taxonomic sampling
Twelve Chamaecrista specimens were sampled, with accessions from sect. Xerocalyx as the main focus of the study. To also represent phylogenetic diversity outside Chamaecrista, the ingroup included other species of Cassiinae. Samples were collected at different localities in six Brazilian states (Ceará, Piauí, Bahia, Tocantins, Goiás and Minas Gerais). Voucher specimens were deposited in the Herbarium Prisco Bezerra, Universidade Federal do Ceará, Fortaleza, Ceará, Brazil. Taxa, locality data, voucher specimens and GenBank accession numbers are listed in Table 1. The study comprises two datasets: the ITS/5.8S region of the nrDNA and the complete trnE-trnT intergenic spacer of the cpDNA.
The cpDNA dataset included 12 Chamaecrista specimens representing four of the six sections, with emphasis on Xerocalyx, six species of Senna and one of Cassia. Molecular analyses have shown tribe Cercideae to be the sister group of the remaining Leguminosae (Bruneau et al., 2001; Kajita et al., 2001). Therefore, Bauhinia pentandra (Bong.) Vog. ex Steud. (Cercideae) was selected as the outgroup for the trnE-trnT intergenic spacer analysis.

DNA extraction
Total genomic DNA was extracted from leaf material sampled from herbarium specimens. Samples (0.3 g) were ground in liquid nitrogen and digested for 1 h at 60°C in CTAB extraction buffer (2% w/v CTAB, 100 mM Tris-HCl, pH 8.0, 20 mM EDTA, 1.4 M NaCl, and 0.2% v/v 2-mercaptoethanol). Subsequent processing followed Foster and Twell (1996). DNA concentration was determined by measuring the absorbance at 260 nm (A260) of a tenfold dilution of each sample. The quality of all DNA preparations was checked by electrophoresis on 0.8% agarose gels according to Sambrook et al. (1989).
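As a worked illustration of the spectrophotometric step, the sketch below converts an A260 reading of a tenfold dilution into a stock dsDNA concentration. It assumes the widely used approximation that an A260 of 1.0 (1 cm path) corresponds to about 50 µg/mL of double-stranded DNA; the paper does not state its conversion factor, and the reading of 0.25 is hypothetical. The helper `template_volume_ul` is also hypothetical and only estimates how much stock would supply a given template mass.

```python
# Widely used approximation: A260 = 1.0 (1 cm path) ~ 50 ug/mL double-stranded DNA.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260: float, dilution_factor: float = 10.0) -> float:
    """Stock dsDNA concentration (ug/mL) estimated from the A260 of a diluted aliquot."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("A260 must be >= 0 and the dilution factor > 0")
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def template_volume_ul(target_ng: float, conc_ug_per_ml: float) -> float:
    """Hypothetical helper: uL of stock delivering target_ng (1 ug/mL equals 1 ng/uL)."""
    if conc_ug_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ug_per_ml


conc = dsdna_concentration_ug_per_ml(0.25)  # 0.25 x 50 x 10 = 125 ug/mL (125 ng/uL)
volume = template_volume_ul(1000, conc)     # 8 uL of stock for 1000 ng of template
```

Absorbance also counts RNA and other UV-absorbing contaminants, so such estimates are upper bounds; the agarose gel check described above complements them.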

PCR amplification and DNA sequencing
Amplification of the trnE(UUC)-trnT(GGU) intergenic spacer region of the cpDNA was performed using the primers trnET-F (5'-ATCGGATTTGAACCGATGAC-3') and trnET-R (5'-CCCAGGGGAAGTCGAATC-3'). These primers were designed based on the Lotus japonicus chloroplast genome sequence (GenBank accession number NC_002694; Kato et al., 2000). For the internal transcribed spacers (ITS1 and ITS2) and the 5.8S rRNA coding region of the nrDNA, the primers ITS4 (5'-TCCTCCGCTTATTGATATGC-3') and ITS5 (5'-GCAAGTAAAAGTCGTAACAAGA-3') were used, as suggested by Becerra and Venable (1999). Both amplification reactions were performed in a final volume of 25 µL containing 800-1000 ng of genomic DNA (template); 20 mM Tris-HCl, pH 8.4; 50 mM KCl; 1.5 mM MgCl2; 100 µM each of dATP, dCTP, dGTP and dTTP (GE Healthcare Life Sciences, Piscataway, NJ, USA); 12.5 pmol of each primer; and 0.5 units of Taq DNA polymerase (GE Healthcare Life Sciences). PCR was carried out in an MJ Research (Watertown, MA, USA) PTC-200 thermocycler. For the trnE-trnT spacer, the cycling program comprised an initial denaturation step (4 min at 94°C) followed by 35 cycles of 1 min at 94°C, 1 min at 58°C for primer annealing, and 1 min 30 s at 72°C for extension. The cycling program for the ITS/5.8S region comprised an initial denaturation step of 94°C for 4 min, followed by 35 cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 2 min. For both reactions, the last cycle was followed by a final extension of 9 min at 72°C, and the PCR products were then stored at 4°C until use. Negative controls containing all reaction components except DNA were always included to check for self-amplification or DNA contamination.
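The primer sequences above can be checked quickly for length, GC content and a rough melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a crude approximation for short oligonucleotides that ignores salt and primer concentrations; it is not the method the authors report using to set their annealing temperatures. The sequences are copied as printed in the text.

```python
from collections import Counter


def base_counts(primer: str) -> Counter:
    """Count bases in a primer, rejecting anything other than A/C/G/T."""
    seq = primer.upper().replace(" ", "")
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous A/C/G/T bases are supported")
    return Counter(seq)


def gc_fraction(primer: str) -> float:
    """Fraction of G + C bases in the primer."""
    counts = base_counts(primer)
    return (counts["G"] + counts["C"]) / sum(counts.values())


def wallace_tm(primer: str) -> int:
    """Wallace rule (rough, short oligos only): Tm ~ 2*(A+T) + 4*(G+C) degrees C."""
    counts = base_counts(primer)
    return 2 * (counts["A"] + counts["T"]) + 4 * (counts["G"] + counts["C"])


# Primer sequences (5'->3') as given in the text.
PRIMERS = {
    "trnET-F": "ATCGGATTTGAACCGATGAC",
    "trnET-R": "CCCAGGGGAAGTCGAATC",
    "ITS4": "TCCTCCGCTTATTGATATGC",
    "ITS5": "GCAAGTAAAAGTCGTAACAAGA",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.0%}, Wallace Tm = {wallace_tm(seq)} C")
```

For the four primers as printed, the rule gives about 58 °C for trnET-F, trnET-R and ITS4 and about 60 °C for ITS5; nearest-neighbour thermodynamic models give more realistic estimates when precise annealing temperatures matter.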
Once amplification specificity had been confirmed, PCR products were purified from the remaining reaction volume using the GFX PCR DNA and Gel Band Purification kit (GE Healthcare Life Sciences). DNA sequencing was performed with the DYEnamic ET terminator cycle sequencing kit (GE Healthcare Life Sciences), following the manufacturer's protocol. Sequencing reactions were analyzed on a MegaBACE 1000 automatic sequencer (GE Healthcare Life Sciences). Each PCR product was sequenced three times in both directions using the same primers employed for amplification. Sequencing of the ITS/5.8S region failed for several samples, possibly because most DNA samples were isolated from herbarium specimens.

Sequence alignment and phylogenetic analyses
Sequence quality was checked and overlapping fragments were assembled using the Phred/Phrap/Consed package (Gordon et al., 1998). BLASTn searches (Zhang et al., 2000) against GenBank were conducted to detect potential contaminant sequences. For the trnE-trnT spacer, the boundaries between coding and noncoding regions were determined by comparison with the Lotus japonicus cpDNA sequence (NC_002694). The boundaries of the ITS/5.8S region were determined by comparison with the nrDNA sequence of Senna tora (FJ572046), with the ITS2 region delimited using a method based on Hidden Markov Models (HMMs) (Keller et al., 2009). Uncertain base positions, generally located close to priming sites, were excluded from the phylogenetic analyses.
High-quality assembled sequences (Phred > 20) from each of the two datasets were aligned separately using ClustalX version 2.0.9 (Larkin et al., 2007) with default gap penalties, and the alignments were corrected manually in BioEdit version 7.0.3 (Hall, 1999) to minimize the number of changes (indels or nucleotide substitutions). Alignment files are available from the corresponding author upon request.
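The Phred > 20 threshold has a concrete meaning: a Phred score Q corresponds to an estimated base-call error probability P = 10^(-Q/10), so Q = 20 means a 1% chance that the call is wrong. The sketch below shows the conversion in both directions and one hypothetical way of masking calls that do not exceed the threshold as N before alignment; the masking helper is an illustration, not a description of the Phred/Phrap/Consed workflow.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Phred score Q -> probability that the base call is wrong, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion, Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def mask_low_quality(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Hypothetical helper: replace calls with Q not above min_q by 'N' (strict '> 20')."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists differ in length")
    return "".join(base if q > min_q else "N" for base, q in zip(seq, quals))


print(phred_to_error_prob(20))                     # about 0.01, i.e. a 1% error chance
print(mask_low_quality("ACGT", [30, 15, 21, 20]))  # ANGN
```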
Phylogenetic analyses were performed independently for each dataset in PAUP* version 4.0b10 (Swofford, 2002) and MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001). Maximum parsimony (MP) analyses were conducted using heuristic searches with tree-bisection-reconnection (TBR) branch swapping, ACCTRAN character optimization and the MulTrees option in effect, saving a maximum of ten most parsimonious trees per replicate across 500 random-addition replicates in an attempt to sample multiple islands of most parsimonious trees. A maximum of 10,000 trees was allowed to accumulate, which is sufficient to capture topological variation (Sanderson and Doyle, 1993). In all analyses, characters were weighted equally and character-state changes were treated as unordered. Indels were treated as missing data. Bootstrap support (BS) values for the optimal trees were calculated from 1,000 replicates using heuristic search settings identical to those of the original search.
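To make the parsimony criterion concrete, the sketch below scores a fixed rooted binary tree with Fitch's algorithm for unordered, equally weighted characters, treating gaps as missing data (compatible with any state) as in the analyses described above. It only computes the length of a given tree; the heuristic TBR search, random-addition replicates and bootstrapping performed in PAUP* are not reproduced. The taxon names and toy alignment are hypothetical.

```python
from typing import Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, tuple]

# Gap and ambiguity symbols are treated as missing data, i.e. compatible with any state.
MISSING = {"-", "?", "N"}
ALL_STATES = frozenset("ACGT")


def fitch(tree: Tree, states: dict[str, str]) -> tuple[frozenset, int]:
    """Fitch's algorithm for one unordered character: return (state set, changes)."""
    if isinstance(tree, str):
        state = states[tree]
        return (ALL_STATES if state in MISSING else frozenset(state)), 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Non-empty intersection: no extra change is needed at this node.
        return shared, left_changes + right_changes
    # Disjoint state sets: take the union and count one state change.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree: Tree, alignment: dict[str, str]) -> int:
    """Equally weighted parsimony length of a fixed tree, summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical toy data: four taxa, three aligned sites, one gap.
toy = {"t1": "AAG", "t2": "AAG", "t3": "GAT", "t4": "GA-"}
print(tree_length((("t1", "t2"), ("t3", "t4")), toy))  # 2
print(tree_length((("t1", "t3"), ("t2", "t4")), toy))  # 3
```

The first topology needs two steps and the second three, so parsimony prefers the first; a heuristic search repeats this kind of scoring over many rearranged topologies.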
The best-fit substitution model for the Bayesian inferences was selected under the Akaike Information Criterion (AIC) with MrModeltest version 2.3 (Nylander, 2004), an approach with several important advantages over other model-selection strategies (Posada and Buckley, 2004). Two independent Markov chain Monte Carlo (MCMC) analyses of five million generations each were run to estimate the parameters of sequence evolution and the posterior probabilities of trees. Trees were sampled every 100 generations. After the first 25% of the samples were discarded as burn-in, a 50% majority-rule consensus tree was computed to obtain the posterior probability (PP) of each node. Trees were visualized with TreeView (Page, 1996). The proportion of variable sites and the GC content were calculated using MEGA version 4.0 (Tamura et al., 2007).
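The posterior probability of a node is the fraction of post-burn-in sampled trees that contain that clade, and the 50% majority-rule consensus keeps the clades found in more than half of them. With five million generations sampled every 100 generations, each run yields roughly 50,000 trees, of which the first quarter (about 12,500) is discarded. MrBayes performs this summary itself; the sketch below only illustrates the calculation, representing each sampled tree as a plain set of clades (a simplification) and using hypothetical toy samples.

```python
from collections import Counter


def clade_posteriors(runs: list[list[set]], burnin_fraction: float = 0.25) -> dict:
    """Drop the first burnin_fraction of samples from each run, pool the rest,
    and return the fraction of retained trees containing each clade."""
    kept = []
    for run in runs:
        n_burn = int(len(run) * burnin_fraction)
        kept.extend(run[n_burn:])
    if not kept:
        raise ValueError("no trees left after burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(posteriors: dict, threshold: float = 0.5) -> dict:
    """Clades in the majority-rule consensus: present in more than half the trees."""
    return {clade: pp for clade, pp in posteriors.items() if pp > threshold}


# Hypothetical toy samples for four taxa; each clade is a frozenset of taxon names.
AB, CD = frozenset({"t1", "t2"}), frozenset({"t3", "t4"})
AC, BD = frozenset({"t1", "t3"}), frozenset({"t2", "t4"})
run1 = [{AC, BD}, {AB, CD}, {AB, CD}, {AC, BD}]
run2 = [{AC, BD}, {AB, CD}, {AB, CD}, {AB, CD}]

pp = clade_posteriors([run1, run2])  # first sample of each run removed as burn-in
consensus = majority_rule(pp)        # AB and CD, each with PP = 5/6
```

Clades with frequencies above 0.5 are always mutually compatible, which is why the majority-rule consensus is guaranteed to be a valid tree.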

DNA sequence characteristics
PCR amplification was not uniformly successful across loci and taxa. While good-quality sequences (Phred > 20) of the trnE-trnT intergenic spacer were obtained for all taxa, the ITS/5.8S region could not be sequenced for several specimens. We hypothesize that, despite the high copy number of the ITS/5.8S region in the nuclear genome, the DNA extraction method used for herbarium specimens did not preserve nuclear genomic DNA of adequate quality, in contrast to chloroplast DNA.
The sequence characteristics of each data set are summarized in Table 2. Complete ITS/5.8S sequences were difficult to align across the three genera studied, mostly in the ITS1 region. The length of the ITS/5.8S [...] 33.5% (Cassia fistula). The low GC content observed in the cpDNA alignment is mainly due to poly(A) and poly(T) stretches. A similar pattern has been reported for trnD-trnT sequences of Asteraceae (Shaw et al., 2005).
The ITS/5.8S dataset had a higher proportion of parsimony-informative sites (PIS) relative to aligned characters (60.5%) than the trnE-trnT intergenic spacer (7%). However, the ITS/5.8S region had the lower consistency and retention indices; values closer to 1 indicate less homoplasy. Homoplasy has been regarded as undesirable in phylogenetic data (Lyons-Weiler et al., 1996; Swofford et al., 1996) because homoplastic characters can mislead inference of the true branching history. Nevertheless, homoplastic data may still provide phylogenetic resolution, sometimes better than internally consistent data sets (Källersjö et al., 1999; Wenzel and Siddall, 1999).
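The summary statistics discussed here can be computed directly from an alignment and from per-character step counts. The sketch below counts variable and parsimony-informative sites (at least two states, each present in at least two taxa), computes GC content while ignoring gaps, and derives the ensemble consistency index CI = Σm/Σs and retention index RI = (Σg - Σs)/(Σg - Σm), where m, s and g are the minimum, observed and maximum possible numbers of changes per character. The toy alignment and the observed step counts s (taken here from two hypothetical trees) are illustrative only and are not the paper's data; PAUP* and MEGA may treat gaps and ambiguities differently.

```python
from collections import Counter

NUCLEOTIDES = frozenset("ACGT")  # gaps and ambiguity codes are ignored


def column_counts(alignment: list[str], i: int) -> Counter:
    """Counts of unambiguous nucleotides at alignment column i."""
    return Counter(seq[i] for seq in alignment if seq[i] in NUCLEOTIDES)


def alignment_summary(alignment: list[str]) -> dict:
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    variable = informative = 0
    for i in range(length):
        counts = column_counts(alignment, i)
        if len(counts) > 1:
            variable += 1
        # Parsimony-informative: at least two states, each shared by at least two taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    bases = Counter(b for seq in alignment for b in seq if b in NUCLEOTIDES)
    return {
        "aligned_sites": length,
        "variable_sites": variable,
        "informative_sites": informative,
        "pis_proportion": informative / length,
        "gc_content": (bases["G"] + bases["C"]) / sum(bases.values()),
    }


def min_changes(counts: Counter) -> int:
    """m: fewest changes any tree could require (number of states minus one)."""
    return max(len(counts) - 1, 0)


def max_changes(counts: Counter) -> int:
    """g: changes on a star tree (taxa with data minus the most common state)."""
    return sum(counts.values()) - max(counts.values(), default=0)


def ensemble_ci_ri(m: list[int], s: list[int], g: list[int]) -> tuple[float, float]:
    """Ensemble CI = sum(m)/sum(s); RI = (sum(g) - sum(s)) / (sum(g) - sum(m))."""
    M, S, G = sum(m), sum(s), sum(g)
    ci = M / S if S else float("nan")
    ri = (G - S) / (G - M) if G != M else float("nan")
    return ci, ri


aln = ["AAGT", "AAGC", "GATC", "GA-C"]  # hypothetical toy alignment
summary = alignment_summary(aln)         # 4 sites, 3 variable, 1 informative, GC = 7/15
cols = [column_counts(aln, i) for i in range(len(aln[0]))]
m = [min_changes(c) for c in cols]       # [1, 0, 1, 1]
g = [max_changes(c) for c in cols]       # [2, 0, 1, 1]
print(ensemble_ci_ri(m, [1, 0, 1, 1], g))  # (1.0, 1.0): no homoplasy
print(ensemble_ci_ri(m, [2, 0, 1, 1], g))  # (0.75, 0.0): one extra step
```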

Phylogenetic analyses
Descriptive statistics for the MP trees from the two datasets are summarized in Table 2. The 50% majority-rule consensus trees from the Bayesian analyses of the ITS/5.8S and trnE-trnT intergenic spacer data sets are shown in Figures 1 and 2, respectively. The MP and Bayesian analyses were largely congruent. The monophyly of Chamaecrista is well supported, and a sister relationship between Senna and Cassia was also recovered, although neither data set gave it strong support.
The monophyly of sect. Xerocalyx is strongly supported in all analyses (BS and PP above 98). Bayesian analysis of the ITS/5.8S region provided better resolution within Xerocalyx. However, neither region provided enough resolution to clearly resolve the relationships among the varieties of C. ramosa (Vogel) H.S. Irwin & Barneby and C. desvauxii (Collad.) Killip (Figures 1 and 2).

Discussion
The genus Cassia s.l., formerly comprising 600 species, was subjected to several taxonomic treatments that led to its segregation into three genera (Cassia s. str., Chamaecrista and Senna), which were then placed in subtribe Cassiinae (Irwin and Barneby, 1981, 1982). This separation was later supported by floral development (Tucker, 1996) and phenetic studies (Boonkerd et al., 2005).
In the present study, Senna and Cassia were recovered as monophyletic, corroborating previous molecular phylogenetic studies (Bruneau et al., 2001, 2008; Herendeen et al., 2003; Marazzi et al., 2006). Given the lack of taxon sampling outside Cassiinae, we cannot draw conclusions about the monophyly of the subtribe or the generic relationships within it, although our results favor a sister relationship between Cassia and Senna (Bruneau et al., 2001, 2008; Marazzi et al., 2006) rather than between Chamaecrista and Senna (Doyle et al., 1997; Kajita et al., 2001; Herendeen et al., 2003; De-Paula and Oliveira, 2008).
Numerous peculiarities of inflorescence structure (Tucker, 1996) and the presence of root nodules (Sprent, 2000) make Chamaecrista a particularly interesting taxon within Cassieae. More recently, biochemical and genetic studies have been conducted to elucidate the variability within Chamaecrista (e.g., Conceição et al., 2008a, b; Costa et al., 2007; Silva et al., 2007; de Souza Conceição et al., 2009). The present study sought to elucidate some of the phylogenetic relationships within this genus, focusing on sect. Xerocalyx.
Our Bayesian analysis of the ITS/5.8S sequences is highly congruent with previous results based on a combined dataset of ITS/5.8S and plastid trnL-F regions (de Souza Conceição et al., 2009), in which sections Apoucouita and Xerocalyx were supported as monophyletic while sections Absus and Chamaecrista were found to be paraphyletic. Moreover, in our ITS/5.8S tree, C. calycioides (sect. Caliciopsis) appeared as sister to members of sect. Chamaecrista. C. calycioides shows characteristics intermediate between sections Chamaecrista and Xerocalyx (Irwin, 1964): it resembles the herbaceous members of sect. Chamaecrista in morphology and chromosome number, whereas its closely parallel-striate sepal venation resembles that of Xerocalyx. Whether the members of Caliciopsis evolved independently of these two sections or represent a recombination of genetic material from both is unknown (Irwin and Barneby, 1982).
Sect. Chamaecrista comprises the largest number of species and is subdivided into six series. The contentious classification of this group, probably a consequence of explosive evolutionary radiation, was pointed out by Irwin and Barneby (1982). The smaller ser. Flexuosae H.S. Irwin & Barneby is represented here by C. flexuosa (L.) Greene, a species very distinct from other American Chamaecrista, mainly because of peculiar features such as the venulation of the leaflets and the angulate stems and leaf-stalks (Irwin and Barneby, 1982). The basal position of C. flexuosa relative to sects. Chamaecrista, Caliciopsis and Xerocalyx in the trnE-trnT spacer topology is congruent with previous work (de Souza Conceição et al., 2009) and may reflect the morphological features mentioned above. However, this topology was not supported by our ITS/5.8S analyses.
As observed by de Souza Conceição et al. (2009), ser. Prostratae (Benth.) H.S. Irwin & Barneby also appeared polyphyletic in our analyses. Furthermore, C. rotundifolia [ser. Bauhinianae (Collad.) H.S. Irwin & Barneby] grouped with Prostratae taxa [C. trichopoda and C. pilosa (L.) Greene] with robust node support. A resemblance between members of series Bauhinianae and Prostratae has been suggested before, despite certain differences such as the glandless petioles and the reduced pair of leaves found only in Bauhinianae (Irwin and Barneby, 1982). We further suggest that the features shared by C. diphylla and C. rotundifolia, such as the number and sometimes the shape of the leaflets, which cause confusion in the identification of herbarium specimens, do not represent a major evolutionary step. Thus, other characteristics, such as leaflet venation, the presence of a petiolar gland and the chromosome number (2n = 14), must be interpreted as synapomorphies of sect. Xerocalyx.
Sect. Xerocalyx has undergone continuous taxonomic reorganization. After its reformulation by Irwin and Barneby (1982), the section was considered a macrospecies in which evolutionary processes are still ongoing and have not yet produced truly discrete units at the subgeneric level. It forms an extremely distinct type, segregated into three species distinguished by the number (diphyllous: C. diphylla; tetraphyllous: C. ramosa and C. desvauxii) and size (microphyllous: C. ramosa; macrophyllous: C. desvauxii) of the leaflets. More recently, Fernandes and Nunes (2005) discussed the classification proposed by Irwin and Barneby (1982) and, considering it extremely subjective, proposed raising several varieties to species rank. The monophyly of Xerocalyx is strongly supported by our phylogenetic analyses. Within the tetraphyllous group, the Bayesian analysis of the ITS/5.8S data set revealed two clades, A and B (Figure 1), with robust node support. Clade B suggests that leaflet size cannot be considered a truly discrete character for distinguishing C. ramosa from C. desvauxii. Only clade A, however, was also recovered in the MP analysis, and with weak branch support. In the plastid analyses, only C. desvauxii var. glauca separated from the other tetraphyllous taxa, occupying the basal node. According to Fernandes and Nunes (2005), this variety, well characterized within Xerocalyx by its large size and considerable morphological variation, such as glaucescent leaflets and stipules, should be recognized at species rank as C. latistipula. Moreover, considering ecological, geographical, morphological, reproductive and genetic data, Costa et al. (2007) proposed that two other varieties, C. desvauxii var. latistipula and C. desvauxii var. graminea, be treated as distinct species. However, none of the trees analyzed had enough resolution to discriminate between the microphyllous and macrophyllous groups.
Torres et al. 249
Another interesting feature of sect. Xerocalyx is the paraphyly of C. desvauxii var. mollissima, represented by specimens from Morro do Chapéu, BA, and Formoso, MG. The specimen from Formoso, MG, is clearly placed in clade B (PP = 99), whereas the position of the other specimen was not clearly resolved. Although these results do not allow conclusive remarks, we suggest that cryptic species might have emerged within sect. Xerocalyx.
In the present work, our analyses reinforce the need for new sectional boundaries in Chamaecrista, especially the inclusion of sections Caliciopsis and Xerocalyx in sect. Chamaecrista, as suggested by de Souza Conceição et al. (2009). None of the trees analyzed recovered the microphyllous and macrophyllous groups as distinct clades. We therefore hypothesize that speciation is still ongoing in the tetraphyllous group, consistent with the macrospecies hypothesis of Irwin and Barneby (1982). Nevertheless, it is premature to draw final conclusions on species circumscriptions in the tetraphyllous complex. A more extensive taxonomic revision and phylogenetic study of this group are needed to resolve the taxonomic instability of section Xerocalyx.