The complete mitochondrial genome sequence of the black-capped capuchin (Cebus apella)

The phylogenetic relationships of primates have been extensively investigated, but key issues remain unresolved. Complete mitochondrial genome (mitogenome) data have many advantages in phylogenetic analyses, but such data are available for only 46 primate species. In this work, we determined the complete mitogenome sequence of the black-capped capuchin (Cebus apella). The genome was 16,538 bp in size and consisted of 13 protein-coding genes, 22 tRNAs, two rRNAs and a control region. The genome organization, nucleotide composition and codon usage did not differ significantly from those of other primates. The control region contained several distinct repeat motifs, including a putative termination-associated sequence (TAS) and several conserved sequence blocks (CSB-F, E, D, C, B and 1). Among the protein-coding genes, the COII gene had lower nonsynonymous and synonymous substitution rates, while the ATP8 and ND4 genes had higher rates. A phylogenetic analysis using maximum likelihood and Bayesian methods and the complete mitogenome data for platyrrhine species confirmed the basal position of the Callicebinae and the sister relationship between Atelinae and Cebidae, as well as the sister relationship between Aotinae (Aotus) and Cebinae (Cebus/Saimiri) in Cebidae. These conclusions agreed with the most recent molecular phylogenetic investigations on primates. This work provides a framework for the use of complete mitogenome information in phylogenetic analyses of the Platyrrhini and primates in general.


Introduction
The phylogenetic analysis of primates has attracted much attention because of the potential for defining and understanding the processes that mold, shape and transform the human genome. Primate taxonomy based on morphological, adaptive, bio-geographical, reproductive and behavioral traits, with inferences from the fossil record (Goodman et al., 1998; Groves, 2001; Wilson and Reeder, 2005), is complex. Molecular genetic data have been used to infer many relationships among primate taxa (Hayasaka et al., 1988; Schneider et al., 1993; Moreira, 2002; Opazo et al., 2006; Hodgson et al., 2009; Wildman et al., 2009; Perelman et al., 2011). The phylogenetic relationships of Neotropical primates (Platyrrhini) in the suborder Haplorrhini are not as well understood as those of their closest relatives, the Old World monkeys and apes (Catarrhini) (Stewart and Disotell, 1998; Raaum et al., 2005). The monophyly of three major lineages (Atelidae, Cebidae and Pitheciidae) within the platyrrhines has been confirmed, but the relationships among these three groups have been difficult to resolve (Horovitz et al., 1998; Steiper and Ruvolo, 2003; Ray et al., 2005; Opazo et al., 2006; Schrago, 2007). The most recent studies support a basal division of the Pitheciidae and a sister relationship between Atelidae and Cebidae (Hodgson et al., 2009; Perelman et al., 2011). The relationships among different genera in the Atelidae and Pitheciidae are well established (Opazo et al., 2006; Wildman et al., 2009; Perelman et al., 2011), whereas the results for the Cebidae are more controversial, and various possible relationships among the Aotinae, Callitrichinae and Cebinae have been inferred (Kay, 1990; Schneider et al., 1993; Moreira, 2002; Opazo et al., 2006; Hodgson et al., 2009; Wildman et al., 2009; Perelman et al., 2011).
The most recent studies suggest a sister relationship between the Aotinae and Callitrichinae and a sister relationship between Aotinae/Callitrichinae and Cebinae (Cebus/Saimiri) (Hodgson et al., 2009; Perelman et al., 2011). The taxonomy of the genus Cebus is controversial, with different opinions on the classification of species within this group (Mittermeier and Coimbra-Filho, 1981; Groves, 2001; Silva Júnior, 2002); the phylogenetic relationships among species also remain unresolved, despite molecular and cytogenetic studies of this issue (Moreira, 2002; Amaral et al., 2008; Garcia-Cruz et al., 2011; Nieves et al., 2011; Perelman et al., 2011).
In molecular systematics, the topologies of phylogenetic trees vary with the molecular markers used and the number of taxa involved. A comparison of related data from different taxa can be helpful in clarifying controversial topologies. In recent years, the mitochondrial genome (mitogenome) has been widely used in phylogenetic studies because of its matrilineal inheritance, lack of extensive recombination and accelerated nucleotide substitution rates (Ingman et al., 2000; Zhang et al., 2008). A complete mitogenome contains more information on the evolutionary history of a species than individual genes, and the use of this genome in phylogenetic analyses reduces stochastic errors and minimizes the effect of homoplasy (Campbell and Lapointe, 2011). Mitogenomes can also be used as a source of molecular markers in conservation studies of endangered species (Krajewski et al., 2010). To date, the complete mitogenomes of 46 primate species have been sequenced, including five platyrrhine species (Aotus lemurinus, Ateles belzebuth, Callicebus donacophilus, Cebus albifrons and Saimiri sciureus) (Arnason et al., 2000; Hodgson et al., 2009).
Cebus apella (Linnaeus, 1758), the tufted or black-capped capuchin, occurs only in South America (Colombia, Ecuador, Peru, Bolivia, Brazil, French Guiana, Suriname and Venezuela) (Fragaszy et al., 2004). C. apella is under severe pressure from hunting and habitat loss and fragmentation throughout its range, with a sharp decline in numbers in recent years; the species has been included in CITES Appendix II since the 1970s (Nijman et al., 2011).
Although several molecular phylogenetic and cytogenetic studies of C. apella have been reported (Moreira, 2002; Amaral et al., 2008; Wildman et al., 2009; Nieves et al., 2011; Perelman et al., 2011), the complete mitogenome sequence of this species has not yet been described. In this work, we determined the complete nucleotide sequence of the C. apella mitogenome and characterized its genomic structure. We also compared the rates of nonsynonymous (Kn) and synonymous (Ks) substitutions in protein-coding genes (PCGs) with those of Homo sapiens and five other platyrrhine species. A phylogenomic tree that included six platyrrhine species confirmed the usefulness of complete mitogenome sequences for molecular phylogenetic investigations.

Materials and Methods
Tissue samples and genomic DNA extraction
Samples from one specimen of C. apella were collected from Kunming Zoo, Yunnan Province, China. The specimen was identified based on external characteristics, using the system of Groves (2001) and Nowak et al. (1999). Total genomic DNA was isolated from fresh muscle samples by using a Wizard Genomic DNA purification kit (Promega) according to the manufacturer's instructions.

Mitochondrial DNA amplification by PCR
The mitogenome was amplified by the polymerase chain reaction (PCR). The entire mitogenome of C. apella was obtained by using 12 primer sets to amplify contiguous and overlapping segments (Table 1) (Sorenson et al., 1999; Ingman et al., 2000). Adjacent fragments overlapped each other by at least 200 bp. PCR amplifications were done in a Mycycler Gradient thermocycler using a final volume of 50 µL that contained 20-50 ng of genomic DNA (0.5 µL), 2.5 mM of each dNTP (4 µL), 10 mM of each primer (1 µL), 5 µL of 10x buffer, 0.25 µL of Taq polymerase (5 U/µL; Takara) and 38.3 µL of sterile distilled water. The cycling parameters were: preliminary denaturation at 95°C for 5 min, followed by 35 cycles of denaturation at 95°C for 45 s, annealing at 55°C for 45 s and elongation at 72°C for 2 min, with a final elongation at 72°C for 10 min. The PCR products were electrophoresed on a 1.5% agarose gel and visualized by ultraviolet transillumination after staining with ethidium bromide. A negative control was included in each round of PCR.

Sequence assembly, annotation and analysis
The DNASTAR software package (Lasergene version 5.0; Madison, WI) was used for sequence assembly and annotation. The borders of PCGs and rRNA genes were determined by aligning the sequences with those of C. albifrons (AJ309866) and H. sapiens (NC_001807) in GenBank. The boundaries and orientations of tRNAs were identified by tRNAscan-SE version 1.21 under default settings. The rates of Kn and Ks in PCGs were calculated using PAML version 4 (Yang, 2007). Pairwise distances of selected taxa were inferred from mitochondrial DNA (mtDNA) by MEGA version 5 (Tamura et al., 2011). The complete nucleotide sequence was submitted to GenBank under accession No. JN380205.
Phylogenomic analyses were done with a Bayesian method implemented in MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001) and a maximum likelihood (ML) method implemented in PhyML 3.0 (Guindon et al., 2010). The best-fitting model of sequence evolution was obtained by using Modeltest version 3.7 (Posada and Crandall, 1998). The GTR+I+G model was selected as the model of best fit. For the Bayesian procedure, four independent MCMC chains were run simultaneously for 5,000,000 generations, with one tree sampled every 1000 generations. We discarded the first 1250 trees as burn-in and used the remaining 3750 sampled trees (for which the log likelihoods had converged to stable values) to construct a 50% majority-rule consensus tree. Two independent runs were used to provide additional confirmation of the convergence of the posterior probability distribution. In the ML analysis, a BIONJ tree was used as a starting tree to search for the ML tree with the GTR+I+G model. The robustness of the phylogenetic results was tested by bootstrap analysis with 1000 replicates.
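The sampling scheme of the Bayesian run can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical and are not MrBayes commands, the numbers are those given in the text, and any tree written at generation 0 is ignored.

```python
# Illustrative arithmetic only; names are hypothetical, values are taken from the text.
CHAIN_LENGTH = 5_000_000   # length of each MCMC run
SAMPLE_EVERY = 1_000       # one tree sampled per 1000 steps
BURN_IN_TREES = 1_250      # sampled trees discarded as burn-in


def sampling_summary(chain_length, sample_every, burn_in_trees):
    """Return (trees sampled, trees retained, burn-in fraction) for one run."""
    sampled = chain_length // sample_every
    retained = sampled - burn_in_trees
    return sampled, retained, burn_in_trees / sampled


sampled, retained, fraction = sampling_summary(CHAIN_LENGTH, SAMPLE_EVERY, BURN_IN_TREES)
print(f"sampled={sampled}, retained={retained}, burn-in={fraction:.0%}")
```

With these values the burn-in corresponds to the first quarter of the sampled trees, and the 3750 retained trees are the ones summarized in the consensus tree.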

Results and Discussion
Genome organization and composition
The complete mitogenome of C. apella is a circular molecule of 16,538 bp (Figure 1). This size falls within the range reported for other primate mitogenomes (Matsui et al., 2009; Matsudaira and Ishida, 2010; Roos et al., 2011), with the longest being 17,149 bp (Eulemur macaco) (Matsui et al., 2009) and the shortest being 15,467 bp (Pygathrix nemaeus). The number and arrangement of genes were the same as those of other primate mitogenomes (Boore, 1999). Table 2 shows the various features of the genome, together with the inferred start and stop codons. The complete mitogenome consisted of two rRNAs, 22 tRNAs, 13 PCGs and a control region. As expected, two rRNAs, 14 tRNAs and 12 PCGs were encoded on the H-strand, while the ND6 gene and eight tRNAs were encoded on the L-strand. Most of the PCGs were separated by one or more tRNAs.
The nucleotide composition of the C. apella mitogenome was biased towards adenine (A) and thymine (T), and the overall A+T content was 60.6%. There were seven regions in which genes overlapped, by a total of 70 bp, and 15 intergenic spacer regions that totaled 57 bp.
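A figure such as the overall A+T content can be reproduced directly from a sequence string. The following is a minimal sketch, assuming the sequence is available as plain text; the function names and the toy fragment are hypothetical and are not taken from the GenBank record.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Return the A+T percentage of seq (ambiguity codes such as N are ignored)."""
    composition = base_composition(seq)
    return composition["A"] + composition["T"]


# Hypothetical toy fragment, used only to show the call.
toy_fragment = "ATGACCATAATTCGCAAATAA"
print(f"A+T content: {at_content(toy_fragment):.1f}%")
```

Applied to the full 16,538 bp sequence, the same calculation would be expected to give the overall value reported above, provided that ambiguous bases are handled in the same way.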

Protein coding genes and codon usage
There were no significant differences in the sizes of the C. apella PCGs compared to other primates (Arnason et al., 1996). Most of the 13 PCGs began with an ATG start codon, except for ND2 and ND3 (ATT), COII and ND4L (GTG) and ND6 (TTA). Six PCGs (ND1 (TA), COII, COIII, ND2, ND3 and ND4 (T)) did not terminate with a complete stop codon triplet (Table 2). In mammalian mitogenomes, some peptide-coding genes end with T or TA rather than with a complete stop codon; in such cases, the terminal nt is contiguous with the 5' terminal nt of a tRNA gene (Wolstenholme, 1992; Arnason et al., 1996). Post-transcriptional polyadenylation is thought to convert the incomplete stop codon to a complete one (Ojala et al., 1981; Boore, 2004).
There were 3788 codons in the 13 mitochondrial PCGs, excluding incomplete termination codons. A bias towards a greater proportion of A and T is a common feature of primate mitogenomes and results in a corresponding bias in the encoded amino acids (Arnason et al., 1996; Yu et al., 2011). The overall AT composition of PCGs in C. apella was 60.9% and the most frequently used amino acids were Leu (12.67%), Ile (10.32%), Ser (9.35%) and Thr (8.98%).
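Amino acid usage of this kind can be tallied by translating each PCG with the vertebrate mitochondrial genetic code (NCBI translation table 2) and counting the encoded residues. The sketch below is one way to do this under stated assumptions: the function names and the toy sequence are hypothetical, stop codons are skipped, and a trailing T or TA is dropped, which mirrors the exclusion of incomplete termination codons.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_usage(cds):
    """Count encoded amino acids, skipping stop codons and any trailing partial codon."""
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3
    counts = Counter()
    for start in range(0, usable_length, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3])  # None for ambiguous codons
        if amino_acid is not None and amino_acid != "*":
            counts[amino_acid] += 1
    return counts


def usage_percent(counts):
    """Convert amino acid counts to percentages, most frequent first."""
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


# Hypothetical toy coding sequence ending in an incomplete stop codon (T).
toy_cds = "ATGATACTATGAT"
print(usage_percent(amino_acid_usage(toy_cds)))
```

Note that ATA (Met) and TGA (Trp) are read differently from the standard code, which is why a mitochondrial table is needed for counts like those reported above.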

Ribosomal and transfer RNA genes
Like other primate mitogenomes, the C. apella genome contained small (12S) and large (16S) subunits of rRNA (Yu et al., 2011). The 12S rRNA and 16S rRNA were located between the tRNA-Phe and tRNA-Leu (UUR) genes, separated by the tRNA-Val gene. The base compositions of the two rRNA genes were 23.8% T, 23.6% C, 35.7% A and 16.9% G, which generally agreed with the A+T-rich trend of the whole genome. Twenty-two tRNA genes were identified based on their respective anticodons and secondary structures, and ranged in size from 58 (tRNA-Val) to 96 (tRNA-Ser) nucleotides. All of the tRNAs folded into a canonical cloverleaf secondary structure, except for tRNA-Phe, tRNA-Met, tRNA-Thr, tRNA-Lys and tRNA-Trp. Gene sizes and anticodon nucleotides agreed with those described for other primates. However, G-G, A-C, and especially G-U wobbles and other atypical pairings were identified in the stem regions. These mutations appear to accumulate in mitochondrial genes, partly because mtDNA is not subject to recombination that would facilitate the elimination of deleterious mutations (Li et al., 2007). The postulated tRNA cloverleaf structures generally contained 7 bp in the aminoacyl stem, 5 bp in the T-stem and the anticodon stem, and 4 bp in the D-stem. Some tRNAs, e.g., tRNA-Val and tRNA-Gly, lacked one or two bp in the T-stem or anticodon stem.

Non-coding regions
The major non-coding region, i.e., the control region, of C. apella was located between the tRNA-Pro and tRNA-Phe genes and contained 1,096 bp. This region can be divided into three domains based on the distribution of variable nucleotide positions and the differential frequencies of nucleotides (Wang et al., 2008). The control region shows marked variability across taxonomic groups and among related species, but its sequence elements related to regulatory functions are highly conserved (Cui et al., 2007). We annotated the regulatory domains of C. apella, H. sapiens and five other platyrrhine species (Figure 2). Termination-associated sequence (TAS) motifs that act as a signal to terminate synthesis of the control region were found at the 5' end of the first domain. There were five conserved sequence blocks (F, E, D, C and B) in the second domain. The third domain had a conserved sequence block (CSB-1) that is important in regulating mtDNA replication. The sequences of H. sapiens were less similar to those of the platyrrhine species than the platyrrhine sequences were to each other.
The small non-coding region, a putative origin of light strand replication (OL), was located in a cluster of tRNA-Trp, tRNA-Ala, tRNA-Asn, tRNA-Cys and tRNA-Tyr (the WANCY region) and consisted of 34 nucleotides. This region could potentially fold into a stable stem-loop secondary structure with 10 bp in the stem and 14 nt in the loop. The conserved motif 5'-GCCCC-3' at the base of the stem within the tRNA-Cys gene has been associated with the transition from RNA synthesis to DNA synthesis.

Mitochondrial DNA variations in five platyrrhine species
The relative influence of natural selection can be determined by comparing the rates of Kn versus Ks in the coding region of protein genes (Jiang et al., 2009). The ND6 gene is generally excluded in such analyses because it is encoded by a different strand of the mtDNA and has a strikingly different nucleotide composition relative to other mitochondrial PCGs (Arnason et al., 1996). As shown in Table 3, the Kn/Ks ratios for all pairwise combinations varied among the 12 mtDNA PCGs, suggesting that differential selective constraints have acted on these genes (Cui et al., 2007). The Kn/Ks ratios of the ND1 and ATP8 genes were higher than those of other genes, especially that of ATP8 (Table 3). This finding suggests that the ND1 and ATP8 genes may have evolved faster than the other PCGs. Surprisingly, the COIII gene had a higher mutation rate than the COI and COII genes in C. apella; this pattern was not observed in the other platyrrhine species (Table 3).
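The distinction between synonymous and nonsynonymous change can be illustrated with a crude tally of codon differences between two aligned coding sequences. This is not the procedure implemented in PAML, which estimates Kn and Ks with correction for multiple substitutions and for the number of synonymous and nonsynonymous sites; the sketch only shows the classification step. All function names and the toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Codons containing gaps or ambiguity codes are skipped. This is a raw
    tally, not an estimate of Ks or Kn.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = nonsynonymous = 0
    usable_length = len(seq_a) - len(seq_a) % 3
    for start in range(0, usable_length, 3):
        codon_a = seq_a[start:start + 3].upper()
        codon_b = seq_b[start:start + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


def kn_ks_ratio(kn, ks):
    """Return Kn/Ks from already estimated rates, or None when Ks is zero."""
    return None if ks == 0 else kn / ks


# Hypothetical toy alignment: two silent changes and one amino acid replacement.
print(classify_codon_differences("ATGCTAACC", "ATACTGAAC"))
```

A ratio well below 1 is usually read as evidence of purifying selection, so genes with comparatively high ratios are the ones under weaker constraint.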

Phylogenomic relationships of six platyrrhine species
Mitochondrial sequences can be used not only to infer phylogenetic relationships and directly trace the evolution of gene rearrangements, but also to provide additional information for phylogenetic reconstructions (Braband et al., 2010). The molecular divergence and separation time of two C. apella populations have recently been investigated based on Cyt b sequence data (Casado et al., 2010); this study illustrates the value of mitochondrial sequences in ecological and conservation studies of primates. In contrast to the use of isolated or individual sequences, e.g., the Cyt b sequence, recent molecular studies have tended to use larger amounts of DNA data since this allows better tree resolution and provides better agreement with morphological studies (Zhang and Wake, 2009).
A phylogenetic analysis based on the complete mitogenome data from six platyrrhine species (A. lemurinus, A. belzebuth, C. donacophilus, C. albifrons, C. apella and S. sciureus) in conjunction with ML and Bayesian methods supported the sister relationship between C. apella and C. albifrons (100%) and the sister relationship between the Aotinae (Aotus) and Cebinae (Cebus/Saimiri) (Figure 3). The results also confirmed the basal position of the Callicebinae and the sister relationship between the Atelinae and Cebidae (Aotinae/Cebinae). These findings were supported by pairwise distance analysis in which the distance between C. apella and C. albifrons was the smallest while that between C. apella and C. donacophilus was the greatest, except for the outgroups (Table 4).
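The logic of a pairwise distance comparison can be sketched with the uncorrected p-distance, i.e., the proportion of aligned sites that differ. This is an assumption made for illustration: the distance model used in MEGA is not stated in the text, and the taxon labels and sequences below are hypothetical placeholders rather than real data.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among positions where both sequences have A, C, G or T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in "ACGT" and base_b in "ACGT":
            compared += 1
            if base_a != base_b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_table(alignment):
    """Return {(name_1, name_2): p-distance} for every pair in a name -> sequence mapping."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }


# Hypothetical toy alignment; labels and sequences are placeholders.
toy_alignment = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTACGTAT",
    "taxon_3": "ATGTTCGAAT",
}
for pair, distance in sorted(distance_table(toy_alignment).items(), key=lambda item: item[1]):
    print(pair, round(distance, 2))
```

Sorting the table by distance puts the most similar pair first, which is the same reading applied to Table 4 when the closest and most distant taxa are identified.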
Our results generally agreed with the conclusions of recent molecular phylogenetic work on primates (Hodgson et al., 2009; Perelman et al., 2011), although only a few species were included in our tree and the resulting phylogenetic information was limited. More complete mitogenome data are urgently needed to investigate the phylogeny of the Cebidae, Platyrrhini and primates in general. The sequence data presented here provide a contribution to this long-term goal.