Phylogeny of the Serrasalmidae (Characiformes) based on mitochondrial DNA sequences

Previous studies based on DNA sequences of mitochondrial (mt) rRNA genes showed three main groups within the subfamily Serrasalminae: (1) a “pacu” clade of herbivores (Colossoma, Mylossoma, Piaractus); (2) the “Myleus” clade (Myleus, Mylesinus, Tometes,Ossubtus); and (3) the “piranha” clade (Serrasalmus, Pygocentrus, Pygopristis, Pristobrycon, Catoprion, Metynnis). The genus Acnodonwas placed as the sister taxon of clade (2+3). However, poor resolution within each clade was obtained due to low levels of variation among rRNA gene sequences. Complete sequences of the hypervariable mtDNA control region for a total of 45 taxa, and additional sequences of 12S and 16S rRNA from a total of 74 taxa representing all genera in the family are now presented to address intragroup relationships. Control region sequences of several serrasalmid species exhibit tandem repeats of short motifs (12 to 33 bp) in the 3’ end of this region, accounting for substantial length variation. Bayesian inference and maximum parsimony analyses of these sequences identify the same groupings as before and provide further evidence to support the following observations: (a) Serrasalmusgouldingi and species of Pristobrycon (non-striolatus) form a monophyletic group that is the sister group to other species of Serrasalmusand Pygocentrus; (b) Catoprion, Pygopristis, and Pristobryconstriolatusform a well supported clade, sister to the group described above; (c) some taxa assigned to the genus Myloplus(M.asterias,Mtiete,Mternetzi,andMrubripinnis) form a well supported group whereas other Myloplusspecies remain with uncertain affinities (d) Mylesinus,Tometesand Myleussetigerform


Introduction
Piranhas and pacus (Serrasalmids) form a distinctive assemblage of characiform fishes. For a long time, they were considered a subfamily within the family Characidae. Recent phylogenetic studies of these fishes, however, strongly suggest that Characidae is non-monophyletic and that serrasalmids are not closely related to taxa originally placed in the subfamily Characinae, or other characid subfamilies (Zanata, 2000), but rather that they may be more closely related to Anostomoidea (Calcagnotto et al., 2005). All these arguments support the separate family status of piranhas and pacus; their relationships to other families within the order Characiformes, however, remain uncertain (Ortí and Meyer, 1997;Calcagnotto et al., 2005;Hubert et al., 2005). Species of the Serrasalmidae are endemic to the Neotropics and are distributed widely in all the major river systems of South America. At least 60 species (in 15 genera) have been recognized. This family includes the wellknown piranhas, notorious from accounts of their grouppredatory behavior, the seed-eating tambaquí, which is highly regarded as a food species, and the pacus. Several serrasalmid species are of economic importance and are used in aquaculture (Junk, 1984;Marshall, 1995;Araujo-Lima and Goulding, 1997).
Characteristic features of serrasalmids include a compressed body, a long dorsal fin with more than 16 rays and the presence of sharp serrae arising from modification of abdominal scales. The number of these serrae is variable, ranging from 6 to 9 in Acnodon to over 60 in Piaractus.
Serrasalmids occupy diverse habitats from lowland floodplains and flooded forests to upstream habitats in the headwater regions of river systems (Lowe-McConnel, 1975;Géry, 1977Géry, , 1984. They also display a range of trophic specializations, with three general feeding habits: carnivory, frugivory and lepidophagy (feeding on the scales of other fishes). Feeding habit is reflected in the mor-phology and patterns of dentition found among these taxa (Goulding, 1980). Carnivorous serrasalmids usually have one row of tricuspid teeth on each jaw, while frugivores have two series of incisor or molariform teeth on the premaxilla, one row of teeth on the dentaries, and often a pair of symphyseal teeth. The lepidophagous taxa have tuberculated teeth located on the outer side of the premaxilla that are used to remove scales from other fish. Not all species are specialists however, and their feeding habit varies with age and food availability (Nico and Taphorn, 1988;Winemiller, 1989;Leite and Jégu, 1990). The arrangement and morphology of teeth have been the main characters traditionally used in serrasalmid classification. Eigenmann (1915) erected the subfamilies Serrasalminae, containing six genera with one row of teeth on each jaw, and Mylinae, with nine genera having two rows of teeth on the premaxilla. The monotypic, lepidophagous genus Catoprion was included in the Mylinae. Classifications that followed also were based largely on dental morphology (Norman, 1929;Gosline, 1951;Géry 1977), and differed mainly in the assignment of ranks for some taxa (e.g. genera changed to subgenera).
In the first cladistic treatment of serrasalmid systematics, Machado-Allison (1983) inferred the presence of two lineages, labeled A and B (Figure 1a), which correspond to the Mylinae and Serrasalminae of Eigenmann, respectively, but including the genera Catoprion and Metynnis with the piranha clade. The first test of this hypothesis with molecular data (Ortí et al., 1996) used mitochondrial DNA (mtDNA) sequences, and recovered a phylogeny of the group containing three or four distinct lineages rather than two (Figure 1b) based on fragments of the 12S and 16S rRNA genes. Relatively low levels of sequence divergence among the rRNA genes, however, resulted in poor resolution within these groups, and a representative of the genus Pygopristis was not included in that study. The mtDNA data strongly suggested that Pristobrycon includes two components: Prystobrycon striolatus, closely allied to Catoprion, and the other species of Pristobrycon, more closely related to Serrasalmus and Pygocentrus. A recent phylogenetic study of species of Serrasalmus and Pygocentrus (Hubert et al., 2007) based on mitochondrial control region sequences provided higher resolution for this group. Within the "Myleus clade," mtDNA data (Ortí et al., 1996) were not able to resolve with confidence the relationships among the included taxa, but also did not support the monophyly of Myleus or the subspecies designations proposed by Géry (1972Géry ( , 1977: Myleus, Myloplus, Prosomyleus, and Paramyloplus. A morphological reassessment of elements included in Myleus 2002;Jégu et al., 2003) proposed the recognition of Myleus setiger (formerly Myleus pacu) as the only valid representative of the genus and moved the other components to the genus Myloplus (originally erected by Gill, 1896). We follow these taxonomic recommendations in this study.
Most recently, a multi-gene assessment of characiform phylogeny based on mitochondrial (16S and cytochrome b) and nuclear DNA (4 fragments) supported the distinctive groping of serrasalmids among charciforms (Calcagnotto et al., 2005). But since that study focused on higher-level relationships among characiforms, it included only 12 serrasalmid taxa, and obtained inconclusive results for within family relationships.
The current study aims to evaluate the previous findings with an extended data set, and also employ a more variable molecular marker to resolve relationships at the shallower nodes within each of the groups. The taxonomic sampling of the 12S and 16S mtDNA sequence data set is here extended from 34 to a total of 74 serrasalmid taxa (including Pygopristis). In an attempt to increase resolution among closely related species, 44 sequences from the mitochondrial control region (D-loop) representing all genera in 344 Ortí et al.  Ortí et al. (1996) based on partial mtDNA sequences of the 12S and 16S ribosomal RNA genes.
the family are employed. Albeit based on mtDNA sequence data only, this study represents the most comprehensive molecular systematic treatment of this group to date.

Taxon sampling
Representatives of all serrasalmid genera were sampled from their natural habitat and also obtained from commercial sources (aquarium trade). Several specimens per genus, and in some cases more than one specimen per species were used to confirm taxonomic identifications and also to control for intraspecific variation. Outgroup taxa were chosen from the Anostomoidea and Cynodontidae based on a recent analysis of characiform relationships that suggest a close relationship of these groups to serrsalmids (Calcagnotto et al., 2005). A complete list of taxa used for this study, their associated Genbank accession numbers, their source and (when present) voucher information are presented as Supplementary Material (Table S1).
All samples were sequenced using the BigDye Terminator cycle sequencing reaction kit (Applied Biosystems Inc.) on an automated DNA sequencer (Applied Biosystems 310 or an MJ Research Basestation) following manufacturer's instructions. All templates were sequenced completely in both directions. The nucleotide sequence data determined for the present paper were deposited in GenBank (see Supplementary Material, Table S1).

Phylogenetic analysis
All sequences were aligned with Clustal X (Thompson et al. 1997) using default parameters. Each fragment (12S, 16S, and D-loop) was aligned separately and the ribosomal gene alignments were subsequently verified using the secondary structure models described by Ortí et al. (1996). Alignment gaps that were inserted by ClustalX in putative stem regions that may imply disruption of hairpin structure were moved to contiguous loops or non-paired regions. D-loop sequences were compiled into two separate groups due to alignment ambiguities when alI sequences were aligned together. The two groups (the 'piranha clade' and the rest) differ substantially in total length for this fragment. Micro or minisatellite repeats within the variable control region were identified and excised from the sequences before the alignment. Indels for all resulting alignments were coded for phylogenetic analysis following the modified complex method described by Müller (2006) implemented in the program SeqState (Müller, 2005).
Alignments for each fragment were analyzed initially by the neighbor joining method (NJ: Saitou and Nei, 1987) to control for potential sequencing errors. Sequences that were found misplaced in the resulting tree were re-sequenced or eliminated from subsequent analyses. Given the degree of redundancy in taxonomic sampling, errors can be detected when sequences from putative congeneric or conspecific specimens are not placed together in the tree. Some exceptions to this procedure are discussed below. After the preliminary NJ analyses, the ribosomal 12S and 16S fragments were gauged for congruence in phylogenetic signal by the incongruence length difference (ILD) test (Farris et al., 1994;Farris et al., 1995), implemented as the partition homogeneity test in PAUP* 4.0 (Swofford, 2000). This test showed no significant difference among the two partitions, and the 12S and 16S sequences mtDNA Phylogeny phylogeny of Serrasalmids were concatenated for all further analyses. The D-loop data were compiled into 2 separate data sets, one for the 'piranha clade' and one for the remaining taxa of the family.
Tree searches were performed using PAUP* version 4.0b4a (Swofford, 2000), MrBayes 3.1.2 Ronquist, 2001, Ronquist andHuelsenbeck, 2003), and TreeFinder (Jobb, 2006). Maximum parsimony (MP) analyses performed in PAUP* used heuristic searches starting with stepwise addition trees and replicated 100 times, with each replicate starting with random input order of sequences. Branch swapping was performed by the tree-bisection-reconnection (TBR) method. The consistency index (CI) and the rescaled consistency index (RC) were computed for the best trees. Bootstrap values (BV) were used to estimate confidence in the resulting topology and were based on 100 replicates of heuristic search with starting trees obtained by stepwise addition. MP analyses were applied to the DNA sequence data alone or in combination with the coded indel characters. Modeltest 3.7 (Posada and Crandall, 1998) was used to determine the optimal model of nucleotide evolution for each data set. Maximum likelihood (ML) searches were performed with TreeFinder specifying the model determined by Modeltest. Bayesian inference (BI) was performed by running 4 MCMC chains simultaneously for 1 million generations, sampling every 100 steps (i.e., saving a total of 10,000 trees and parameter sets). MrBayes 3.1 by default runs two such MCMC chains simultaneously (Nruns = 2) and independently for each run, starting from different random trees. The value of 1 million generations was determined by examination of the average standard deviation of split frequencies (as they approach zero). At least two independent runs were performed to check for convergence. After each run, stationarity was verified by examination of the plot of generation versus the log probability of the data (the log likelihood values) and the burnin value was determined to summarize the results. This value was typically less than 2000 samples, but a conservative value of 5000 was usually chosen. The DNA data were analyzed under the 6-parameter model (Nst = 6) with invariant sites and rate variation (rates = invgamma). Indels were coded as a second partition for the Bayesian analyses under the Standard model of evolution allowing for among site rate variation (rates = gamma), and both partitions were unlinked for the analysis (unlink statefreq = (all) revmat = (all) shape = (all) pinvar = (all)), allowing each to have its own rate (prset applyto = (all) ratepr = variable). A consensus tree was computed using the sumt command and the posterior probabilities (PP) were obtained directly from the frequency of each partition among the post-burnin trees.

12S and 16S data
Mitochondrial rRNA sequences from a total of 74 serrasalmin taxa plus 9 outgroup species were collected (to-tal = 83). The total length of the combined 12S plus 16S alignment was 890 bp (347 bp of 12S and 543 bp of 16S). Length variation among sequences resulted in 58 additional indel characters coded by the modified complex method, 14 of which used a step matrix and the rest were unordered for MP analyses. None of the step matrices coded by SeqState was internally inconsistent. Of the total 948 characters, 580 were constant, 104 were variable but uninformative for parsimony, and 264 were informative. Pairwise sequence divergence ranged from 0 to 0.105 (uncorrected "p" or proportion of sites that differ) among the ingroup taxa, and from 0.055 to 0.144 between serrasalmids and the outgroup taxa. A total of 10 ingroup taxa were excluded from further analyses because their sequences were identical to another taxon that was included (see Supplementary Material, Table S1). Therefore a final data set of 73 taxa was used for phylogenetic inference.
Model parameters for the mtDNA sequence data suggest significant levels of among site rate variation (proportion of invariable sites = 0.51, and alpha = 0.56), typical of ribosomal DNA data. Indel characters also exhibited significant among-site rate variation (alpha = 1.0). The result obtained by Bayesian analysis (Figure 2) agrees with previous results based on mtDNA ( Figure 1b) and with the other inference methods used in this study. The Serrasalmidae form a distinct, strongly supported monophyletic group, containing three main clades: (1) a "pacu" clade, comprised of Colossoma, Mylossoma and Piaractus that is the sister group to the other serrasalmids, (2) the Myleus clade, containing Myleus, Mylesinus, Tometes and Ossubtus; and (3) the "piranha" clade, with the genera Serrasalmus, Pristobrycon, Pygocentrus, Pygopristis, Catoprion and Metynnis. The analyses are not conclusive with regard to the placement of Acnodon and also do not support the monophyly of the two species (A. normani and A. oligacanthus) included in the study. Results from ML analysis are almost identical to the BI tree ( Figure 2) differing only in the branching pattern with each major clade (mostly shown as polytomies in Figure 2). ML results also place A. oligacanthus within the Myleus clade, separate from A. normani (outside of the Myleus clade). Maximum parsimony analyses yielded 740 equally parsimonious trees (L = 1224, CI = 0.43, RC = 0.32), a strict consensus of which recovers the monophyly of the piranha and the pacu clades, but not of the Myleus clade. MP bootstrap analysis yields BV = 56 and 95 for these two clades, respectively, and no support for the Myleus clade. Interestingly, the monophyly of Acnodon was recovered in several (but less than 50%) of the 740 equally parsimonious trees. A posteriori reweighting of characters (based on the RC, Farris, 1969, Carpenter, 1988) reduced the number of MP trees to 81, a strict consensus of which shows the same relationships among the main groups that were obtained with BI ( Figure 2) but with a monophyletic Acnodon (A. normani + A. oligacanthus) as the sister group to the Myleus clade. 346 Ortí et al.
Relationships among taxa within these three groups are relatively well resolved for the pacu and piranha groups but not among taxa in the Myleus clade. Within the pacu group, the sister group relationship between Colossoma and Mylossoma is recovered by ML analysis, and supported by a PP = 0.98 (Figure 2), but it does not receive support from MP bootstrap analysis. The five species of Metynnis included in the study form a strong monophyletic group (PP = 1.0, BV = 92) and the genus is well supported (PP = 1.0 and BV = 78) as the sister group to all other taxa in the piranha clade (Figure 2). Pristobrycon striolatus forms a distinct taxon, branching off next in the piranha clade, and quite separate from the other putative Pristobrycon species (P. serrulatus and P. eigenmanni) that group tightly with Serrasalmus and Pygocentrus, the most derived group within the piranha clade. This result is robust in all methods of analysis and was reported before (Ortí et al.1996). The two specimens assigned to Pygopristis denticulatus used in this study do not form a monophyletic group in any analysis and their sequence divergence is 0.025 (uncorrected "p" value), a relatively large difference for an intraspecific comparison. As reported earlier (Orti et al., 1996) the relationships among species of Serrasalmus, Pygocentrus and Pristobrycon, are not resolved by the 12S and 16S sequence data.

Mitochondrial control region data
Complete mitochondrial control region sequences from 45 serrasalmid taxa were collected. In agreement with a priori expectations (e.g., Meyer, 1993), D-loop sequences display much higher levels of variation than the rRNA genes, with sequence divergences ranging from 0.017 to 0.256 (uncorrected "p" divergence) among serrasalmid taxa. However, this higher level of polymorphism also resulted in problematic sequence alignment. Up to about 500 bp of the 5' end of the control region (Domain I, Brown et al., 1986) contained tandem repeats in several taxa examined (Table 1). The repeated motifs ranged from 12 bp to 33 bp and some repeat patterns were imperfect (some repeats slightly different to the others). These motifs were repeated up to 17 times (in the case of Catoprion) and they accounted for most of the variation in length of the amplified fragments. Tandem repeat regions were not used for phylogenetic analyses because their homology among divergent serrasalmid taxa was impossible to assess. After excluding the repeated regions, the D-loop sequences were aligned separately for the "piranha" clade (18 taxa) and the other groups (27 taxa), resulting in alignment lengths of 1130 bp and 1180 bp, respectively. Indels were coded as above, resulting in the addition of 86 or 119 characters for each data set, respectively. Figure 3 shows the BI tree for the pacu and the Myleus clades based on the control region data. The data set analyzed consisted of 1299 total characters, of which 583 were constant, 214 were variable but parsimony-uninformative and 502 were parsimony-informative. MP analysis recov-mtDNA Phylogeny phylogeny of Serrasalmids 347   Figure 4 shows the BI tree for the piranha clade obtained with control region sequences. The data set analyzed consisted of 1216 total characters (1130 bp and 86 indel characters), of which 697 were constant, 189 were variable but parsimony-uninformative and 330 were parsimonyinformative. MP analysis recovered a single tree (L = 976, CI = 0.58, RC = 0.44) that is almost identical to the BI tree, differing only in the branching order among Serrasalmus, Pygocentrus and (non-striolatus) Pristobrycon taxa. MP bootstrap analysis results agree well with PPs obtained with BI ( Figure 4). The presence of three divergent groups of piranhas is well supported: (1) the genus Metynnis, (2) the Catoprion-Pygopristis-Pristobrycon striolatus group and (3) the Serrasalmus-Pygocentrus-group (Pristobrycon species other than Pristobrycon striolatus, such as P. serrulatus and P. eigenmanni, are here assigned to Serrasalmus). The control region data resolved with high confidence (PP = 1.0 and BV = 100) the relationships among Catoprion, P. striolatus, and Pygopristis that were not fully resolved in the 12S and 16S tree (Fig 2). Within the Serrasalmus clade, there is weak evidence (PP = 0.53) for affinities among Pygocentrus and one group of Serrasalmus species (S. manueli, S. maculatus, and S. rhombeus), and somewhat higher support (PP = 0.83 and BV = 54) for affinities among putative Pristobrycon (non-striolatus) with a different group of Serrasalmus species (S. gouldingi, S. serrulatus, and S. eigenmanni).

Discussion
This study represents the most complete molecular systematic treatment of serrasalmids to date. Building on 348 Ortí et al.  and indel characters (119 characters). The combined data were partitioned into two categories, each with its independent model (Nst = 6 and rates = pinvar for DNA, and the standard model with rates = gamma for indel characters, see text for more details). Numbers next to nodes are posterior probabilities followed by a dash and bootstrap values (BV) from the maximum parsimony analysis.
the previous study by Ortí et al. (1996), analyses presented here include additional taxa from all genera of the subfamily and add a new molecular marker (complete mtDNA control region sequences) to recover phylogenetic patterns or population structure among serrasalmids. As expected, the higher level of variation in the control region compared to the 12S and 16S mitochondrial rRNA genes provides better resolution of the relationships within each of the main clades (Figures 3 and 4). In agreement with previous estimates of rates of substitution-control region rates were found to range between 2.8 (Cann et al., 1984) to 5 times (Aquadro and Greenberg, 1983) the rate of the rest of the mtDNA genome-the higher rate observed for control region among serrasalmid taxa provided additional characters for phylogenetic inference of closely related species. The study also documents complex patterns of variation involving tandem repeats in the mitochondrial control region. This phenomenon has been generally accepted to account for size variation among vertebrate mitochondrial genomes (reviewed by Hoelzel, 1993, Rand, 1993 and has been was documented in fishes before (e.g. Bentzen et al., 1998).
Although based on a single molecular marker (mtDNA), the results of this study carry several taxonomic implications. Most notably, many of the generic designations in the family seem to lack support or are clearly contradicted by the data. Some of these conclusions are not new: Pristobrycon striolatus has previously been regarded as quite distinct from its congeners (Machado-Allison et al., 1989), differing in several morphological aspects and its well-supported grouping with Catoprion and Pygopristis is consistent with the finding of Ortí et al. (1996). Our present results confirm this observation and therefore we prefer to restrict Pristobrycon to the single species P. striolatus, and place all other taxa previously assigned to this genus in Serrasalmus. According to the classification of Géry (1977), the genus Serrasalmus contained the subgenera Pygopristis, Pristobrycon, Pygocentrus, Taddyella and the nominate subgenus Serrasalmus; Serrasalmus (Pristobrycon) striolatus was noted to resemble closely the subgenus Pygopristis. This observation is well supported by our molecular analysis of control region data, as this species forms a clade with Catoprion and Pygopristis (Figure 4), and is not closely related to the other specimen putatively assigned to Pristobrycon (#224 designated Serrasalmus serrulatus here) in the rRNA tree (Figure 2). Based on various morphological characters, Serrasalmus gouldingi is distinct from other members of the genus (Machado-Allison and Fink, 1996). In this analysis, it was found to be more closely related to the remaining Pristobrycon than it is to other species of Serrasalmus. This group containing S. gouldingi, S. eigenmanni and S. serrulatus is the sister group to the Serrasalmus-Pygocentrus clade. The genus Serrasalmus contains within it the genus Pygocentrus. Results from analysis of control region sequences of a dense taxonomic sampling for Serrasalmus and Pygocentrus provides strong evidence for the monophyly of Pygocentrus but its relationship to diverse components of Serrasalmus remains unresolved (Hubert et al., 2007). Some of the poor resolution obtained in our study is evidently the consequence of poor taxonomic sampling.
Some authors (e.g. Géry, 1977) have recognized the existence of four subgenera within Myleus, namely Myloplus, Paramyloplus, Prosomyleus and the nominate subgenus Myleus, within this genus. These subgeneric distinctions have been, as with all previous classifications, based primarily on dental morphology. Other authors, however, rejected these subgeneric distinctions due to the lack of autapomoprhies (Machado-Allison and Fink, 1995). The monophyly of subgenera within Myleus is not supported by analyses of mtDNA data. Analysis of the Myleus group reveals the polyphyly of the formerly designated genus Myleus and supports the taxonomic rearrangement proposed by Jégu and Dos Santos (2002) and Jégu et al. (2003), but relationships among the various components of this group remain tentative. The group formed by Myleus mtDNA Phylogeny phylogeny of Serrasalmids 349 Figure 4 -Phylogeny of the piranha clade, obtained with Bayesian inference based on control region mtDNA sequences (1216 total characters, 1130 bp and 86 indel characters). The combined data were partitioned into two categories, each with its independent model (Nst = 6 and rates = pinvar for DNA, and the standard model with rates = gamma for indel characters, see text for more details). Numbers next to nodes are posterior probabilities followed by a dash and bootstrap values (BV) from the maximum parsimony analysis.
setiger with Mylesinus and Tometes is relatively well-supported (PP = 1.00, BV = 67, Figure 3) suggesting strong affinities of Myleus with species designated to these genera. A robust group of Myloplus species (M. rubripinnis, M. asterias, M. tiete, and M. ternetzi) is also well supported by the control region data. As these analyses have shown, there are several taxonomic inconsistencies in this subfamily. While this study represents the most comprehensive molecular systematic treatment of this group, and utilizes a highly variable mtDNA marker to provide resolution of shallow nodes, placement of some taxa remains uncertain. In order to provide a strong foundation for taxonomic revision of the group, future studies would benefit from utilizing dense taxonomic sampling, nuclear gene sequences, together with mtDNA and morphological characters to help resolve some of these ambiguous relationships.