The complete mitochondrial genome of Corydoras nattereri (Callichthyidae: Corydoradinae)

The complete mitogenome of Corydoras nattereri, a species of mailed catfish from southeastern Brazil, was reconstructed using next-generation sequencing techniques. The mitogenome was assembled using mitochondrial transcripts from the liver transcriptomes of three individuals, and produced a circular DNA sequence of 16,557 nucleotides encoding 22 tRNA genes, two rRNA genes, 13 protein-coding genes and two noncoding control regions (D-loop, OrigL). Phylogeographic analysis of closely related sequences of cytochrome c oxidase subunit I (COI) demonstrates high diversity among morphologically similar populations of C. nattereri. Corydoras nattereri is nested within a complex of populations currently assigned to C. paleatus and C. ehrhardti. Analysis of mitogenome structure demonstrated that an insertion of 21 nucleotides between the ATPase subunit-6 and COIII genes may represent a phylogenetically informative character associated with the evolution of the Corydoradinae.


Introduction
Corydoradinae are a species-rich group of armored freshwater catfishes that inhabit streams, rivers and floodplains throughout South America (Alexandrou et al., 2011). Together with the Callichthyinae, they comprise the Callichthyidae, a family of catfishes diagnosed by the presence of two series of bony plates on the sides of the body and one pair of barbels at the junction of the lips (Reis, 1998). The Corydoradinae comprises 227 nominal taxa and 188 valid species (Eschmeyer & Fong, 2015), assigned to the genera Aspidoras Ihering, 1907, Corydoras Lacépède, 1803, and Scleromystax Günther, 1864 (Britto, 2003). Corydoras is the most species-rich genus of catfishes, with over 160 described and nearly as many undescribed species (Alexandrou et al., 2011; Eschmeyer & Fong, 2015). According to Reis (2003), about two new species of Corydoras are described each year. While the Callichthyinae is relatively well known based on morphological (Reis, 1997, 1998, 2003) and molecular studies (Mariguela et al., 2013), the Corydoradinae remains poorly known, despite its great interest to the aquarium hobby. Phylogenetic studies that included species of Corydoras have been performed, primarily by Britto (2003), based on morphological characters, and more recently by Alexandrou et al. (2011), based on molecular data. In the latter study, however, a large proportion of the 52 taxa recognized could not be associated with a valid name, with many species being referred to informal "C-Numbers" (Fuller & Evers, 2005) available from the aquarium industry or to their geographic origin.
Corydoras nattereri Steindachner, 1876, is a widespread species of Corydoras in southeastern Brazil, ranging from the rio Mucuri, in Bahia, to Paranaguá Bay, in Paraná (Britto, 2007; Shimabukuro-Dias et al., 2004a). Corydoras nattereri and Scleromystax prionotos (Nijssen & Isbrücker, 1980) form a pair of color mimics where their distributions overlap (Alexandrou et al., 2011). The geographic range of C. nattereri spans about 1,350 km, encompassing numerous isolated coastal river drainages. With such a widespread distribution, it is not surprising that significant variability has been found among populations of C. nattereri. Oliveira et al. (1990) identified three different cytotypes among these populations that differ in chromosome number (2n = 40, 43 and 44), suggesting that more than one species is represented by the taxon. That cytogenetic variation among populations was later correlated with variation in DNA sequence data (Shimabukuro-Dias et al., 2004b).
Currently, partial sequences of the cytochrome c oxidase subunit I (COI) gene of Corydoras nattereri, a widely used DNA barcode marker, are publicly available from only two localities (Pereira et al., 2011, 2013). Sequences of additional mitochondrial genes have been made available by Alexandrou et al. (2011), but only specimens with imprecise locality data were listed in that study. Within the Corydoradinae, complete mitochondrial sequence data are available only for C. rabauti, from a specimen without associated locality data (Saitoh et al., 2003). Herein, we present the complete mitochondrial genome of C. nattereri based on three specimens with precise geographic provenance, thus providing a significant increment in the DNA data available for phylogenetic studies of the Corydoradinae.

Material and Methods
Specimens of Corydoras nattereri were collected in the rio Suruí (22.600556° S, 43.091667° W) at the Santo Aleixo district, Magé, Rio de Janeiro, Brazil. The fish were deposited at the Museu Nacional, Rio de Janeiro, UFRJ (MNRJ 41520), and tissues from three individuals (MNTI 8664-8666) were preserved in ethanol and RNALater. Total RNA was extracted from liver tissue following conventional phenol-chloroform extraction. RNA quality was assessed using the RNA Nano kit for Bioanalyzer (Agilent). Three individual cDNA libraries were constructed using the TruSeq RNA Sample kit v.2 (Illumina). Libraries were assessed for quality (Bioanalyzer, DNA1000 kit, Agilent) and quantity (Kapa Biosystems). Two separate runs (single-end and paired-end) were performed on an Illumina HiSeq 2500 using the TruSeq SBS kit v.3 (Illumina). Raw Illumina data were demultiplexed using the BCL2FASTQ software (Illumina). Illumina adaptors were trimmed from the reads with Trimmomatic (Bolger et al., 2014), and read quality was evaluated using FastQC (Babraham Bioinformatics). Only reads with a Phred score over 30 were used for the transcriptome assembly. Cleaned reads from the three individual fish were used for the de novo assembly of transcriptomes using the default parameters of Trinity (v.2.0.2) (Haas et al., 2013). Mitochondrial genomes were assembled using the mitochondrial transcripts from the liver transcriptome, following the approach described by Moreira et al. (2015a, 2015b). Briefly, mitochondrial transcripts were retrieved by running a BLASTN search against the mitogenome of the most closely related species with a complete mitogenome available, Corydoras rabauti (GI: 29501080) (Saitoh et al., 2003). Mitochondrial transcripts were edited according to the strand orientation reported in the BLASTN result and aligned with the mitogenome of C. rabauti in SeaView (Gouy et al., 2010) using the built-in CLUSTAL alignment algorithm. The sequence of each contig was manually checked for inconsistencies and gaps. Small gaps in the mitogenomes, in regions that code for transfer RNAs, were filled with Sanger sequencing data from PCR with specifically designed primers. As the identity among the three assembled mitogenomes was higher than 99.8%, the three sequences were combined into a single consensus sequence. The consensus mitogenome was annotated using the web-based services MitoFish and MITOS (Iwasaki et al., 2013; Bernt et al., 2013), and the origin of L-strand replication was identified based on Wong et al. (1983). To determine the sequencing depth of each base in the mitogenome, Bowtie v. 1.0.0 (Langmead et al., 2009) was used to align the reads to the assembled mitogenome. The aligned reads were viewed using the Integrative Genomics Viewer (IGV; Robinson et al., 2011; Thorvaldsdóttir et al., 2012), Tablet (Milne et al., 2010), and Geneious version 6 (http://www.geneious.com, Kearse et al., 2012).
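The identity check and consensus step described above can be illustrated with a minimal sketch. It assumes the mitogenomes have already been aligned to the same length. The function names, the handling of gaps and N, the tie rule, and the toy sequences are all illustrative assumptions, not the authors' procedure or data.

```python
from collections import Counter
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of aligned columns in which both sequences carry the same base.

    Columns with a gap ('-') or 'N' in either sequence are skipped
    (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


def majority_consensus(seqs):
    """Column-wise majority base; ties and columns without data become 'N'."""
    consensus = []
    for column in zip(*(s.upper() for s in seqs)):
        counts = Counter(base for base in column if base not in "-N")
        ranked = counts.most_common(2)
        if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            consensus.append("N")
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Hypothetical toy alignment standing in for the three individuals.
toy = {
    "ind1": "ACGTACGTAC",
    "ind2": "ACGTACGTAC",
    "ind3": "ACGTACGAAC",
}
identities = {
    (p, q): pairwise_identity(toy[p], toy[q]) for p, q in combinations(toy, 2)
}
consensus = majority_consensus(list(toy.values()))
```

With real data, the three identities would be compared against the 99.8% threshold before building the consensus.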
A Maximum Likelihood (ML) analysis of relationships was performed to position the mitogenomes within a taxonomic and phylogenetic context. All publicly available sequences of COI that exhibited nucleotide identity greater than 90% in a BLAST search of the GenBank Nucleotide Database and BOLD Systems were included in the analysis, as well as additional sequences of Corydoras nattereri, Callichthys callichthys, Scleromystax barbatus, and Aspidoras lakoi (the latter three used as outgroups) produced in the Museu Nacional (MNLM) laboratory (Table 1) using Sanger sequencing methods, and the corresponding COI sequence extracted from the C. rabauti mitochondrial genome (GenBank Accession AB054128). To ensure uniformity of coverage, only nucleotides from positions 58 to 699, and only sequences with full coverage of that segment, were included in the analysis, producing a matrix of nucleotide sequences from 35 fish. The ML analysis was performed using MEGA 6.06 (Tamura et al., 2013) under the Hasegawa-Kishino-Yano model (Hasegawa et al., 1985), selected by the corrected Akaike information criterion using jModelTest v.2.1.7 (Darriba et al., 2012). Initial tree(s) for the heuristic search were obtained by applying the Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach. A discrete Gamma distribution was used to model evolutionary rate differences among sites, with 5 categories (+G, parameter = 0.1563). Branch support was estimated with the bootstrap method, using 350 replications. The tree was rooted using Callichthys callichthys as outgroup.
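The coverage filter described above (positions 58 to 699, a 642-column window, with full coverage required) can be sketched as follows. This is only a hedged illustration: the helper names and the toy alignment are hypothetical, and the uncorrected p-distance computed at the end is a simpler measure than the HKY+G model used for the ML analysis.

```python
# Window used in the text: 1-based, inclusive alignment columns.
COI_WINDOW = (58, 699)
COI_WINDOW_LENGTH = COI_WINDOW[1] - COI_WINDOW[0] + 1  # 642 columns


def trim_segment(aligned: str, start: int, end: int) -> str:
    """Return the 1-based inclusive columns start..end of an aligned sequence."""
    return aligned[start - 1:end]


def has_full_coverage(segment: str, expected_len: int) -> bool:
    """True if the segment spans the whole window without gaps or missing data."""
    if len(segment) != expected_len:
        return False
    return not any(c in "-N?" for c in segment.upper())


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two equal-length segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper())) / len(a)


# Hypothetical 12-column toy alignment with a toy window of columns 3..10.
toy_alignment = {
    "s1": "AAACGTACGTAA",
    "s2": "AAACGTACGAAA",
    "s3": "---CGTACGTAA",  # lacks data at the start of the window
}
toy_start, toy_end = 3, 10
toy_len = toy_end - toy_start + 1

kept = {}
for name, seq in toy_alignment.items():
    segment = trim_segment(seq, toy_start, toy_end)
    if has_full_coverage(segment, toy_len):
        kept[name] = segment

divergence = p_distance(kept["s1"], kept["s2"])
```

On real data the same functions would be called with `COI_WINDOW` and `COI_WINDOW_LENGTH` in place of the toy window.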

Results
Sequencing depth. The complete mitochondrial genome sequence of C. nattereri is 16,557 bp long (GenBank accession nos. KT239008, KT239009 and KT239010; Fig. 1). A total of 152,877,464 100-bp reads were used to assemble the transcriptomes. On average, 12 transcripts were aligned to the reference mitogenome. These aligned transcripts were used to assemble the mitogenome of Corydoras nattereri, which was sequenced with an average coverage depth of 8,194 and a total of 1,378,370 aligned reads (Fig. 2). The sequencing depth varied greatly along the mitogenome sequence, from as low as 1 read to as high as 69,778 reads. Cytochrome oxidase subunits I, II and III were the protein-coding genes with the highest number of reads. Regions with lower sequencing depth tend to code for tRNAs. Five small gaps, varying from 21 to 183 nucleotides, were filled by conventional PCR and Sanger sequencing (Table 2).
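As a rough illustration of how per-base depth summaries such as those above can be derived from read alignments, the sketch below counts coverage over a set of alignment intervals. It assumes linear 1-based coordinates and ignores reads spanning the origin of the circular molecule. The function name and toy intervals are hypothetical. It also includes a back-of-envelope upper bound: if all 1,378,370 aligned reads covered their full 100 bp, the mean depth would be about 8,325, of the same order as the reported 8,194. The small difference would be consistent with some reads being shorter than 100 bp after trimming, which is an inference and not something stated in the text.

```python
def per_base_depth(genome_length, alignments):
    """Per-base read depth from (start, end) intervals, 1-based and inclusive.

    Uses a difference array so each read costs O(1) regardless of its length.
    """
    diff = [0] * (genome_length + 2)
    for start, end in alignments:
        if not 1 <= start <= end <= genome_length:
            raise ValueError(f"interval out of range: {(start, end)}")
        diff[start] += 1
        diff[end + 1] -= 1
    depth = []
    running = 0
    for pos in range(1, genome_length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


# Hypothetical toy example: a 10-bp "genome" and four aligned reads.
depth = per_base_depth(10, [(1, 4), (3, 6), (3, 8), (9, 10)])
mean_depth = sum(depth) / len(depth)
min_depth, max_depth = min(depth), max(depth)

# Upper-bound estimate from the figures reported in the text,
# assuming every aligned read contributes its full 100 bp.
naive_mean_depth = 1_378_370 * 100 / 16_557
```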
Table 2. Positions and lengths of the five gaps in the mitogenome assembled using mitochondrial transcripts sequenced on the Illumina HiSeq 2500. These gaps were filled using conventional PCR and Sanger sequencing with the species-specific primers listed.

Genome organization. The complete mitochondrial genome sequence of Corydoras nattereri contains the typical vertebrate features: 22 tRNA genes, 2 rRNA genes, 13 protein-coding genes and two noncoding control regions (D-loop, OrigL) (Table 3, Fig. 2). The majority of genes are encoded on the heavy strand, whereas ND6 and eight tRNAs are found on the light strand. All protein-coding genes use ATG start codons, except for COI, which uses GTG. Seven protein-coding genes are terminated with a complete stop codon, of which five end with TAA (ATP8, ATP6, ND4L, ND5 and ND6), one with TAG (ND1) and one with AGG (COI). The remaining protein-coding genes end with an incomplete stop codon, T (COII, COIII, ND2, ND3, ND4 and Cytb), which is completed by posttranscriptional polyadenylation. The 12S and 16S rRNA genes are separated by the tRNA-Val gene and are 945 and 1,667 bp long, respectively. The 22 tRNA genes range in size from 67 to 75 nucleotides, and the control region, located between the tRNA-Pro and tRNA-Phe genes, is 939 bp long. A 21-nucleotide insertion sequence between the ATPase subunit-6 and COIII genes was found in C. nattereri. The nucleotide composition of the heavy strand is 32.3% A, 25.7% T, 15.1% G, and 27.0% C.
Phylogenetic context. The phylogeographic analysis of publicly available COI sequences, together with additional COI sequences generated by our research group, confirmed our identification of the samples from the rio Suruí as Corydoras nattereri (Fig. 3). Our three samples form a monophyletic group with specimens of C. nattereri from the Paraíba do Sul and the Paraitinguinha rivers (upper Tietê drainage, upper Paraná basin) (Pereira et al., 2011, 2013). In contrast with the high similarity of the samples from these three basins, our samples from the rio Aldeia Velha (rio São João coastal basin) and the rio Itaúnas are considerably different. The latter are morphologically similar to C. nattereri, but their sequences are 2.5% to 3.6% divergent from the rio Suruí samples. Such high divergence suggests that the populations of Corydoras from the São João and Itaúnas river basins represent cryptic species.
Our phylogenetic analysis also demonstrates that the C. nattereri clade is nested within a large clade of samples identified in the literature (Pereira et al., 2011, 2013; Rosso et al., 2012) as C. paleatus and C. ehrhardti. Samples of C. ehrhardti (including new sequences produced here) form a monophyletic group that is also included in this larger clade, but samples of Corydoras paleatus form a complex of non-monophyletic populations. Within this large complex, samples of C. paleatus GU701809, GU701810, GU701812, GU701813, and GU701871, from the upper rio Paraná basin, form the monophyletic subunit most closely related to C. nattereri, but the bootstrap value for this sister-group relationship is low, indicating that further study of the C. paleatus species complex is still necessary.

Discussion
Despite the great diversity of Corydoras, this is the first mitogenome with voucher specimens sampled in their native habitat and deposited in a permanent collection. A mitogenome of C. rabauti was reported by Saitoh et al. (2003), but that study was based on a specimen without locality data obtained from the aquarium trade, and it does not mention a registration number of a voucher specimen in any biological collection.
Our analysis of COI sequences revealed high levels of genetic divergence and taxonomic complexity among samples closely related to C. nattereri. Specimens from the São João and Itaúnas river basins may represent cryptic species that are currently indistinguishable from topotype specimens of C. nattereri. Their level of divergence (2.5% to 3.6%) far exceeds the maximum intraspecific divergence (1.6%) reported among six species of Corydoras from the upper Paraná basin (Pereira et al., 2013). The analysis also demonstrated the complexity of relationships among populations of C. nattereri, C. paleatus, and C. ehrhardti. Within this context, our newly produced mitogenome of C. nattereri is likely to provide a solid base for identifying additional mitochondrial markers to be used in future studies designed to clarify the relationships among these and other callichthyid taxa.
Comparison of the mitogenome of Corydoras nattereri with that of C. rabauti (Saitoh et al., 2003) reveals features that are likely to be phylogenetically informative and useful in future studies. A 21-nucleotide insertion sequence between the ATPase subunit-6 and COIII genes was found in C. nattereri. This insertion corresponds to a 17-nucleotide insertion previously detected in C. rabauti (Saitoh et al., 2003). Most vertebrate mitochondrial genomes have a head-to-tail junction between the ATPase subunit-6 and COIII genes, and this insertion was considered phylogenetically uninformative in the study of Saitoh et al. (2003). Our discovery of an insertion in the homologous position of C. nattereri is interpreted as an apomorphic trait shared by the two species. Further investigation of the distribution and length of this insertion among Corydoradinae is likely to yield significant insights into the phylogeny of the group.

Fig. 1. Circular representation of the mitochondrial genome of Corydoras nattereri. Genes encoded on the heavy strand are shown in the outer circle and genes encoded on the light strand are offset inwards. The inner circle represents the GC content. The figure was generated by the online server MitoFish, http://mitofish.aori.u-tokyo.ac.jp (Iwasaki et al., 2013).

Fig. 2. Sequencing depth over the complete mitogenomes of the three individuals of Corydoras nattereri: KT239008 (A), KT239009 (B), and KT239010 (C). Read counts (y-axis) are shown on a logarithmic scale, and sharp decreases correspond to the punctuation model of mitochondrial transcription (positions correspond to those shown in Fig. 1 and Table 3). Black vertical bars indicate the positions of gaps that were filled with Sanger sequencing (Table 2). Reads were mapped to the mitogenomes using Bowtie and visualized in the Integrative Genomics Viewer (IGV, Bernt et al., 2013; Thorvaldsdóttir et al., 2013).

Fig. 3. Maximum likelihood tree (log likelihood = -2410.0522) of Corydoras samples with at least 90% similarity to the mitochondrial cytochrome oxidase I sequences of C. nattereri from the rio Suruí. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Bootstrap support is indicated next to selected branches. Samples of C. nattereri have the locality name appended to the sample ID (those for which mitogenomes were produced are from the rio "Surui"), outgroups have the genus name, and other samples of Corydoras have the species epithet appended to the ID code (Table 1).

Table 1. List of samples used in this study, ordered according to GenBank accession codes. The voucher museum catalog number is provided only for specimens that are being made available for the first time.

Table 3. Positioning of genes in the mitochondrial genome of Corydoras nattereri. Negative gap values indicate overlap.
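
The gap convention in this caption can be sketched as below, assuming that a gap is the number of nucleotides lying between the end of one gene and the start of the next, so that overlapping genes yield negative values. The gene names and coordinates are hypothetical placeholders and are not taken from Table 3.

```python
def intergenic_gaps(genes):
    """Gaps between consecutive genes given (name, start, end), 1-based inclusive.

    gap = next_start - previous_end - 1, so a negative value means the
    two genes overlap by that many nucleotides (assumed convention).
    """
    ordered = sorted(genes, key=lambda g: g[1])
    gaps = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        gaps.append((name_a, name_b, start_b - end_a - 1))
    return gaps


# Hypothetical coordinates: geneA and geneB overlap by 10 nt,
# geneB and geneC are separated by a 21-nt spacer.
toy_genes = [
    ("geneA", 1, 100),
    ("geneB", 91, 150),
    ("geneC", 172, 200),
]
gaps = intergenic_gaps(toy_genes)
```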