The complete mitochondrial genome of the small yellow croaker and partitioned Bayesian analysis of Sciaenidae fish phylogeny.

To understand the phylogenetic position of Larimichthys polyactis within the family Sciaenidae and the phylogeny of this family, the organization of the mitochondrial genome of the small yellow croaker was determined. The complete mitochondrial genome, 16,470 bp long, contains 37 mitochondrial genes (13 protein-coding, 2 ribosomal RNA and 22 transfer RNA genes), as well as a control region (CR), as in other bony fishes. Comparative analysis of initiation/termination codon usage in mitochondrial protein-coding genes of Percoidei species indicated that COI in Sciaenidae uses ATG/AGA, unlike other Percoidei fishes. Among Percoidei fishes, the absence of a typical conserved domain or motif in the control region is also common. Partitioned Bayesian analysis of 618 bp of COI sequence data was used to infer the phylogenetic relationships within the family Sciaenidae. An improvement in harmonic mean -lnL was observed when specific models and parameter estimates were assumed for partitions of the total data. The phylogenetic analyses did not support the monophyly of Otolithes, Argyrosomus, and Argyrosominae. L. polyactis was found to be most closely related to Collichthys niveatus; on this molecular evidence, the relationships within the subfamily Pseudosciaeninae should be reconsidered.

The small yellow croaker, Larimichthys polyactis, a very popular fish among consumers, is one of the most important commercial benthopelagic fishes in China and Korea. The species is extensively distributed in the Bohai, Yellow and East China Seas, with global landings having reached 320 thousand metric tons in 2000 (Seikai National Fisheries Research Institute, 2001). Nevertheless, intense fishing has reduced catches in the Yellow and East China Seas to such an extent that resources are now considered overexploited (Lin et al., 2008). In the past, the focus has been on catch statistics, size composition, early life history and feeding habits (Xue et al., 2004; Yan et al., 2006; Wan and Sun, 2006). In recent years, the genetic characteristics of the small yellow croaker have also come under investigation by means of molecular markers (Meng et al., 2003; Lin et al., 2009; Xiao et al., 2009). Information is still limited, however, and the complete mitochondrial genome sequence, critical in studies of genome evolution and species phylogeny, remains unavailable. Thus, much additional work is required to furnish the genetic and evolutionary data essential for species conservation, management and phylogenetic analysis.
Sciaenidae is a diverse and commercially important family, comprising 68 genera and about 311 species (Nelson, 2006). Despite specific studies of morphological and molecular phylogeny, instabilities in the phylogenetic relationships within the group have not yet been resolved, and the methods employed in previous molecular phylogenetic reconstructions have been, to some extent, empirical and simple. The failure to apply advanced methods such as maximum likelihood (ML) or Bayesian inference (BI), best-fit evolutionary models for specific data, and statistical testing of the different topologies derived from the same data matrix has probably contributed to mismodeling and systematic error in analysis. Mismodeling commonly occurs when a single model is applied to multiple data partitions that are best explained by separate models of DNA evolution, such as stems and loops in RNA, or codon positions in protein-coding genes. Another form of mismodeling arises when multiple data partitions, defined by the same general model, differ drastically in the specific model parameter estimates that maximize the likelihood score (Reeder, 2003; Brandley et al., 2005). An overall solution would be to apply appropriate models and their specified parameter estimates to each data partition, and subsequently incorporate this into a single ML tree-search (Yang, 1996). Methods for reconstructing phylogeny, based on partitioned data using Bayesian/Markov chain Monte Carlo (MCMC) methods, are now available (Huelsenbeck and Ronquist, 2001; Nylander et al., 2004). Since it models the data more accurately, partition-specific modeling, in other words partitioned Bayesian analysis, should reduce systematic error, thereby resulting in better likelihood scores and more accurate posterior probability estimates (Brandley et al., 2005).
In the present study, pre-defined complete mitochondrial genome sequences of the small yellow croaker were compared with those reported for Percoidei species, whereupon partitioned Bayesian analysis was applied to infer the phylogeny of Sciaenidae fishes.

Materials and Methods
Fish sample and DNA extraction
L. polyactis individuals were collected by trawling in the Zhoushan fishing grounds, East China Sea (Zhejiang Province, China). They were identified by morphology. Muscle tissue was removed and immediately preserved at -80 °C. Total DNA was extracted by the standard phenol-chloroform method (Sambrook and Russell, 2001) and visualized on 1.0% agarose gels.

PCR amplification and sequencing
As shown in Table S1, 11 sets of primers that amplify contiguous, overlapping segments of the complete mitochondrial genome of L. polyactis were used. The primers were designed from the reported complete mitochondrial genome sequence of the large yellow croaker (Cui et al., 2009). Notably, these primers are also very useful for amplifying the mitochondrial genomes of two other Sciaenidae species, Miichthys miiuy and Collichthys lucidus. Of the 11 pairs of primers, six (1F/R, 4F/R, 6F/R, 7F/R, 8F/R, and 10F/R) are capable of perfectly amplifying the mitochondrial genomes of both species. The total length of amplified products is approximately 12,300 bp. PCR assays were carried out in a final volume of 50 μL, each containing 5.0 μL of 10× Taq Plus polymerase buffer, 0.2 mM of dNTP, 0.2 mM of the forward and reverse primers, 2 units of Taq Plus DNA polymerase with proof-reading activity (TIANGEN, Beijing, China), and 1 μL of DNA template. Cycling conditions were 94 °C for 4 min, followed by 35 cycles of 94 °C for 50 s, 60 °C for 60 s and 72 °C for 2-3 min, followed by 1 cycle of 72 °C for 10 min. PCR was performed on a PTC-200 thermocycler (MJ Research, USA). The resultant PCR products were first electrophoresed on a 1% agarose gel to check integrity, then visualized with the Molecular Imager Gel Doc XR system (BioRad) and purified using a QIAEX II Gel Extraction Kit (Qiagen). The purified fragments were ligated into pMD18-T vectors (Takara, Japan), which were used to transform TOP10 E. coli cells according to the standard protocol. Positive clones were screened via PCR with M13+/- primers. Amplicons were sequenced using an ABI 3730 automated sequencer (Applied Biosystems) with M13+/- primers.

Sequence analysis
The sequence fragments so obtained were edited in the SeqMan program (DNAstar, Madison, WI, USA) for contig assembly to obtain a complete mitochondrial genome sequence. Annotation of protein-coding and ribosomal RNA genes, and definition of their respective gene boundaries, were carried out with DOGMA software (Wyman et al., 2004) using reference sequences of Percoidei available in GenBank. tRNA genes and their secondary structures were identified by means of tRNAscan-SE 1.21 software (Lowe and Eddy, 1997). Base composition, genetic distances, and codon usage were calculated in MEGA 4.0 software (Tamura et al., 2007). The putative origin of L-strand replication (OL) and CR, and conserved motifs, were identified via sequence homology and proposed secondary structure. The complete mitochondrial genome sequence has been deposited in the GenBank database under Accession Number GU586227.

Phylogenetic analysis
Multiple alignments of the COI sequences from 30 Sciaenidae species were performed using MEGA version 4.1. The data matrix of COI sequences was partitioned by codon position (Table 1). The appropriate model of sequence evolution for each partition (Table 2) was determined using the jModeltest program (Guindon and Gascuel, 2003; Posada, 2008), under the Akaike Information Criterion (AIC). Bayesian analysis for each data partitioning strategy consisted of two separate runs with four Markov chains in the MrBayes 3.1 program (Huelsenbeck and Ronquist, 2001). Each run, comprising 10 million generations, was sampled every 100 generations. The first 25% of the trees were discarded as burn-in, and the remainder were employed in constructing a 50% majority rule consensus tree. The results for each partitioning strategy were then compared using the Bayes factor as an aid in accepting the best phylogeny hypothesis for the sequence data matrix.
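The codon-position partitioning and the burn-in arithmetic described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the toy alignment are hypothetical, the alignment is assumed to start in frame at a first codon position, and the starting tree of a run is assumed not to be counted among the samples.

```python
def partition_by_codon_position(alignment):
    """Split each aligned sequence into 1st, 2nd and 3rd codon positions.

    `alignment` maps a taxon name to an in-frame coding sequence
    (assumption: column 1 of the alignment is a first codon position).
    """
    parts = {1: {}, 2: {}, 3: {}}
    for name, seq in alignment.items():
        for pos in (1, 2, 3):
            # Take every third column, starting at the chosen codon position.
            parts[pos][name] = seq[pos - 1::3]
    return parts


def retained_samples(generations, sample_every, burnin_fraction):
    """Number of sampled trees kept after discarding the burn-in fraction."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total - discarded


# Hypothetical two-taxon toy alignment (not real COI data).
toy = {"taxon_a": "ATGGCC", "taxon_b": "ATGGCT"}
parts = partition_by_codon_position(toy)

# Run settings quoted in the text: 10 million generations, sampled every 100,
# first 25% discarded.
kept = retained_samples(10_000_000, 100, 0.25)
```

Under these assumptions each run yields 100,000 sampled trees, of which 75,000 remain after burn-in, and a 618-bp alignment contributes 206 columns to each codon-position partition.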

Results and Discussion
Gene content, arrangement and base composition
The complete mitochondrial genome of L. polyactis was 16,470 bp long (Table 3), which is similar to those of not only teleost species but also terrestrial vertebrates. Its gene content conforms to the vertebrate consensus, containing the highly conserved set of 37 genes encoding 2 ribosomal RNAs (rRNAs), 22 transfer RNAs (tRNAs) and 13 proteins that are essential for mitochondrial respiration and adenosine triphosphate (ATP) production. As in other vertebrates, most of the genes are encoded on the H-strand, whereas ND6 and eight tRNAs (Glu, Ala, Asn, Cys, Tyr, Ser-UCN, Gln, and Pro) are encoded on the L-strand; all genes are similar in length to those of bony fishes (Oh et al., 2007, 2008). As reported in other vertebrates, there are four notable overlaps between genes, and the lengths of these overlaps are generally fixed. ATPase 8 and ATPase 6 overlap by 10 bp, ND4L and ND4 by seven bp, ND5 and ND6 by four bp, and ATPase 6 and COIII by one bp. The remaining overlaps are located between tRNA genes themselves, and between tRNA and protein-coding genes. The sizes of non-coding spacers range from 1 to 37 bp (Table 3). The largest of these, located between tRNA-Asn and tRNA-Cys, was recognized as the putative origin of L-strand replication. These non-coding spacers are of interest in the study of mtDNA evolutionary mechanisms. The base composition of L. polyactis was analyzed separately for rRNA, tRNA, and protein-coding genes (Table S2). In the latter, a pronounced anti-G bias was observed at the third codon positions (8.5%), which are free from selective constraints on nucleotide substitution. The A+T composition of the second codon position was relatively higher than in most Percoidei fishes, and pyrimidines were over-represented at this position (61.5%). Already observed in other vertebrate mitochondrial genomes, this has been attributed to the hydrophobic character of the proteins (Naylor et al., 1995). L. polyactis tRNA genes are A+T rich (54.5%), as in other vertebrates, whereas rRNAs are A+C rich (59.3%), as in other bony fishes (Zardoya and Meyer, 1997; Cheng et al., 2010).
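Position-specific figures such as the G content at third codon positions or the pyrimidine content at second positions can be obtained as in the following minimal Python sketch. The study itself used MEGA; the function names and the toy sequence here are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in `seq` (other symbols ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def composition_by_codon_position(cds):
    """Base composition at the 1st, 2nd and 3rd codon positions of an in-frame CDS."""
    return {pos: base_composition(cds[pos - 1::3]) for pos in (1, 2, 3)}


# Hypothetical toy coding sequence (not taken from the L. polyactis genome).
comp = composition_by_codon_position("ATGCTAGCC")
third_position_g = comp[3]["G"]                    # percentage of G at 3rd positions
second_position_py = comp[2]["C"] + comp[2]["T"]   # pyrimidine percentage at 2nd positions
```

For an L-strand gene such as ND6, the sequence would first have to be given in its coding orientation, which this sketch leaves to the caller.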

Protein-coding genes
As expected, 13 large open-reading frames were detected in the mitochondrial genome of L. polyactis. The T:C:A:G base composition of the 13 mitochondrial protein-coding gene sequences, 26.9:32.4:24.8:15.9, is summarized in Table S2. Bias in nucleotide frequencies is strand specific (Broughton and Reneau, 2006). In contrast to the H-strand genes, in the L-strand ND6 gene the most prominent bias is an anti-C bias at the third position (7.5%). The lengths of the 13 protein-coding genes of L. polyactis mitochondrial DNA were compared with the corresponding sequences of other Percoidei species; they were found to be conserved, with almost no variation among species. By comparing predicted initiation and termination codons of the 13 protein-coding genes among 23 Percoidei species (Table S3), it was apparent that most use ATG as the initiation codon (92.6%), with GTG in second place. In a few species, initiation codons could not be determined (shown by "?"). The situation for termination codons is similar: TAA, TAG and incomplete TA- or T-- are commonly used, but AGA and AGG rarely so. This condition is apparently common among vertebrate mitochondrial genomes, and it appears that complete TAA stop codons are created from the incomplete ones via post-transcriptional polyadenylation (Ojala et al., 1981). Furthermore, each protein has its preferred initiation and termination codons, although there are exceptions. For example, in most species, COII, COIII, Cytb, ND4L and ND4 use ATG/T--, ATG/TA-, ATG/T--, ATG/TAA and ATG/T--, respectively. In COI, however, there are two types of initiation/termination codon usage. COI genes in Sciaenidae mitochondrial DNA bear ATG/AGA as initiation and termination codons, whereas in other Percoidei fishes this is not the case (they possess GTG/TAA, GTG/AGG or GTG/T--). Studies in insects have shown positive correlations between the incidence of canonical initiation and termination codons and the relative rate of gene evolution (Szafranski, 2009). Whether this relationship also applies to fishes requires confirmation.
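The initiation/termination notation used above (for example ATG/AGA or GTG/T--) can be derived from an annotated gene sequence roughly as follows. This is a sketch under stated assumptions, not the procedure used to build Table S3: the function name is hypothetical, the gene is assumed to be annotated up to its last encoded base so that the length modulo 3 reveals an incomplete stop codon, and the set of complete stops is that of the vertebrate mitochondrial genetic code.

```python
# Complete stop codons in the vertebrate mitochondrial genetic code.
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def start_and_stop(cds):
    """Return (initiation codon, termination signal) for an annotated gene.

    Incomplete stops are written "T--" or "TA-"; "?" marks a 3' end that
    matches none of the expected patterns.
    """
    cds = cds.upper()
    start = cds[:3]
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        stop = last if last in COMPLETE_STOPS else "?"
    elif remainder == 1:
        # A single trailing T, completed to TAA by polyadenylation.
        stop = "T--" if cds[-1] == "T" else "?"
    else:
        # Two trailing bases TA, completed to TAA by polyadenylation.
        stop = "TA-" if cds[-2:] == "TA" else "?"
    return start, stop


# Hypothetical toy genes (not real sequences).
complete = start_and_stop("ATGGCCAGA")
one_base = start_and_stop("GTGGCCT")
two_base = start_and_stop("ATGGCCTA")
```

The sketch reports the first three bases as the initiation codon without validating them, so flagging undetermined starts would need an additional check.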
Codon usage in the 13 protein-coding genes identified in L. polyactis is shown in Table S4. For amino acids with a fourfold degenerate third position, codons ending in C are the most frequent, followed by codons ending in A and T, for alanine, proline, glycine, valine and threonine. However, for arginine and serine, A is more frequent than C. Among codons with twofold degenerate positions, C appears to be used more than T in the pyrimidine codon family, whereas the purine codon family ends mostly with A. Except for arginine, G is the least common third-position nucleotide in all the codon families. All these features are very similar to those observed in vertebrates (Hu et al., 2010; Yang et al., 2010).
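Third-position preferences within a codon family can be tallied with a short Python sketch such as the one below. The function names and the toy sequence are hypothetical, and the example uses the alanine family (GCN), which is fourfold degenerate; it does not reproduce Table S4.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in an in-frame coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def third_base_usage(counts, prefix):
    """Third-position usage for the four codons that share a two-base prefix."""
    return {base: counts[prefix + base] for base in "ACGT"}


# Hypothetical toy sequence: GCC GCC GCA GCT ATG.
counts = codon_counts("GCCGCCGCAGCTATG")
alanine = third_base_usage(counts, "GC")
```

Running the same tally over all 13 genes, family by family, would give the kind of counts from which statements such as "codons ending in C are the most frequent" are drawn.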

Ribosomal RNA gene and transfer RNA gene
As with other mitochondrial genomes, the genome contains a small (12S) and a large (16S) rRNA subunit gene, which are 950 bp and 1697 bp long, respectively (Table S2). As in other vertebrates, they are located between tRNA-Phe and tRNA-Leu(UUR), and are separated by tRNA-Val (Figure S1, Table 3). When compared with those reported for other Sciaenidae, and as in other vertebrates, both rRNA genes are conserved in A+T content, gene length and location. As with Gonostoma gracile (Miya and Nishida, 1999), preliminary assessment of the secondary structure of the L. polyactis rRNAs indicated that the present sequences could be reasonably superimposed on the proposed secondary structures of carp 12S and cow 16S rRNA (Gutell et al., 1993). Twenty-two tRNA genes, with lengths varying from 67 bp (tRNA-Cys and tRNA-Ser(AGY)) to 75 bp (tRNA-Lys), were interspersed throughout the entire genome. As reported in some other vertebrates (Miya et al., 2003; Kim and Lee, 2004; Oh et al., 2007), with the known exception of the tRNA-Ser(AGY) gene, all tRNA gene transcripts can be folded into typical cloverleaf secondary structures (Figure S2). Besides harboring anticodons identical to those used in other vertebrate mitogenomes, they conserve the aminoacyl, DHU (dihydrouridine), anticodon and TΨC (thymidine-pseudouridine-cytidine) stems. As shown in the rock bream (Oh et al., 2007) and Pseudolabrus fishes (Oh et al., 2008), the tRNA-Ser(AGY) found in the L. polyactis mitochondrial genome lacks a complete DHU arm. Like usual tRNAs (Ohtsuki et al., 2002), aberrant tRNAs can also fit into the ribosome by adjusting their structural conformation and function.

Main non-coding regions of Percoidei species
The putative OL was confirmed in L. polyactis. The OL sequences of L. polyactis and other Percoidei fishes are almost identical and are located in a cluster of five tRNA genes (the WANCY region), between the tRNA-Asn and tRNA-Cys genes. The putative OL, besides serving as the initiation site of L-strand replication, is capable of folding into a stable stem-loop secondary structure with 13 bp in the stem and 11 bp in the loop. Furthermore, there is a C-rich sequence in the loop, at which RNA primer synthesis can be initiated. This C-rich sequence has also been found in the OL loop of other fishes, such as Gadus morhua (Johansen et al., 1990) and Oncorhynchus mykiss (Zardoya et al., 1995). This feature supports the hypothesis that in vertebrates, primer synthesis is most probably initiated by a polypyrimidine tract (Taanman, 1999), and not by a stretch of thymines, as previously suggested (Wang and Clayton, 1985). The conserved sequence motif, 5'-GCCGG-3', was found at the base of the stem within tRNA-Cys. This motif seems to be involved in the transition from RNA to DNA synthesis (Hixson and Brown, 1986).
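Whether a candidate sequence can form a perfect stem-loop of a given geometry can be checked with a minimal Python sketch such as the following. It considers Watson-Crick pairs only (no G-U wobble and no free-energy calculation), the function names are hypothetical, and the test sequence is a made-up toy hairpin rather than the L. polyactis OL.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_perfect_hairpin(seq, stem_len, loop_len):
    """True if `seq` is stem + loop + stem with fully complementary stem arms."""
    seq = seq.upper()
    if stem_len <= 0 or len(seq) != 2 * stem_len + loop_len:
        return False
    five_prime_arm = seq[:stem_len]
    three_prime_arm = seq[-stem_len:]
    return five_prime_arm == reverse_complement(three_prime_arm)


# Toy hairpin: 5-bp stem (GCGGA / TCCGC) around a 4-nt C-rich loop.
toy_hairpin = "GCGGA" + "CCCC" + "TCCGC"
folds = is_perfect_hairpin(toy_hairpin, stem_len=5, loop_len=4)

# Length implied by the geometry quoted in the text: 13-bp stem, 11-nt loop.
ol_length = 2 * 13 + 11
```

A structure with a 13-bp stem and an 11-nt loop therefore spans 37 nt; real candidates with mismatches or wobble pairs would need a dedicated folding program instead of this exact-match test.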
The mitochondrial control region is located between tRNA-Pro and tRNA-Phe in mitochondrial DNA. Besides being the most variable region, it also contains certain conserved motifs that are associated with the initiation of DNA replication and transcription (Zhao et al., 2006). The control region of L. polyactis was identified and compared with those of other Percoidei fishes. These are also located between the two tRNAs (Pro and Phe), and range in size from 533 bp (Parargyrops edita, EF107158) to 1354 bp (Pagrus major, AP002949), all having an overall base composition rich in A and T (A+T = 60%). The variation in length is largely due to the number of conserved domains inserted in these species. Long tandem repeats were recognized in Monodactylus argenteus and Pagellus bogaraveo, with lengths of 56 bp and 183 bp, respectively. Slippage and mispairing during mitogenome replication may explain tandem repeats in the control region (Broughton and Dowling, 1997). Although this region is a unique and highly variable area in mitochondrial DNA, conserved domains and motifs are recognizable by multiple homologous sequence alignment and recognition site comparison. Control regions are also divided into a typical tripartite structure with an extended termination-associated sequence (ETAS) domain, central conserved-block domains (CSB-F, CSB-D, and CSB-E), and conserved sequence block domains (CSB-1, CSB-2, and CSB-3) (Sbisa et al., 1997). The conserved ETAS motif in most fishes is TACAT, with one palindromic sequence, ATGTA. In Coreoperca kawamebari, there is some variation in ETAS, which has the conserved TGCAT motif. The consensus sequence of ETAS in Percoidei fishes was identified as TACAT-TATGTAT---CACCAT----ATATTAAC CAT, where "-" indicates nucleotide variations such as transitions, transversions, or deletions, similar to that reported in sinipercine fishes (Zhao et al., 2006). CSB domains that are associated with the initiation of mitochondrial DNA replication, and other important functions of control regions, were detected. Consensus sequences are summarized in Table S5. While all these conserved blocks can be easily identified in most of the Percoidei species, incomplete control-region structures lacking some conserved domains were also detected (Table 4). Such obvious insertions and deletions imply the rapid evolution of the control region in Percoidei fishes, a possible source of information for dissecting the structure-function-evolution relationships of control regions.
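The relationship between the ETAS motif TACAT and its palindromic partner ATGTA, and a simple way of locating both in a control-region sequence, can be sketched in Python as follows. The function names are hypothetical and the test string is a made-up fragment, not a real control-region sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(seq, motif):
    """0-based start positions of every (possibly overlapping) motif occurrence."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# ATGTA is the reverse complement of TACAT, which is why the two motifs
# can pair with each other.
partner = reverse_complement("TACAT")

# Toy fragment containing one copy of each motif.
fragment = "TACATATATGTAT"
motif_hits = find_all(fragment, "TACAT")
partner_hits = find_all(fragment, partner)
```

Exact matching of this kind would miss variants such as the TGCAT motif noted for Coreoperca kawamebari, so a real scan would need to allow mismatches.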

Effect of partitioning on harmonic mean -lnL, topology, posterior probabilities and Bayes factors
Harmonic mean -lnL was used to measure how well each data partitioning strategy explained the entire data set. Partitioning the COI data set by codon position greatly improved harmonic mean -lnL (Table 5). The same result was reported by Brandley et al. (2005) and Brown and Lemmon (2007) when partitioning by codon position and by RNA gene-specific stems and loops. The inference is that, as different data partitions may evolve quite differently, partitioning can be considered a useful method for accommodating heterogeneity in the processes of molecular evolution. Consensus tree topologies inferred from the three analyses differed, yet all of these differences involved alternative placements of weakly supported nodes (Bayesian posterior probabilities < 95%). These differences depended on whether the COI sequences were partitioned. The most dramatic differences were noted not only in the deep nodes of the tree, but also in the relationships among the three main groups (Groups 1, 2 and 3 in Figures 1-3). No obvious differences in posterior probabilities were observed among analyses using different partition strategies. All Bayes-factor estimates were much higher than the criterion for strong evidence against a hypothesis. According to the Bayes factors, analysis employing the P3 partition strategy provided a decisively better explanation of the data than the others (Table 5 and Table 6). Thus, as this is the preferred hypothesis of the phylogeny of Sciaenidae fishes based on the present data, subsequent discussion will be limited to this tree (Figure 3).
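The comparison of partition strategies by Bayes factor can be illustrated with the following minimal Python sketch. It assumes the common approach of approximating each marginal likelihood by the harmonic mean of the sampled likelihoods and reporting 2 ln(BF); the function names are hypothetical, the numbers are invented rather than taken from Tables 5 and 6, and the cut-off of 10 for strong evidence is an assumed convention, since the text does not state the value used.

```python
import math


def log_harmonic_mean(log_likelihoods):
    """ln of the harmonic mean of likelihoods, computed from sampled lnL values.

    Uses the log-sum-exp trick so that very negative lnL values do not underflow.
    """
    n = len(log_likelihoods)
    negated = [-value for value in log_likelihoods]
    largest = max(negated)
    log_sum = largest + math.log(sum(math.exp(v - largest) for v in negated))
    return math.log(n) - log_sum


def two_ln_bayes_factor(ln_marginal_1, ln_marginal_0):
    """2 ln(BF) in favour of model 1 over model 0."""
    return 2.0 * (ln_marginal_1 - ln_marginal_0)


STRONG_EVIDENCE = 10.0  # assumed threshold, not stated in the text

# Invented harmonic-mean lnL values for a partitioned and an unpartitioned analysis.
ln_partitioned = -5000.0
ln_unpartitioned = -5100.0
support = two_ln_bayes_factor(ln_partitioned, ln_unpartitioned)
favours_partitioned = support > STRONG_EVIDENCE
```

Because the harmonic mean is dominated by the lowest sampled likelihoods, it is a rough estimator, which is one reason to read such comparisons together with the topological results.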

The phylogeny of the Sciaenidae family
Based on the characters of the gas bladder, sagitta, and mental pores, Zhu et al. (1963) divided the family into seven subfamilies, viz., Johniinae, Megalonibinae, Bahabinae, Sciaeninae, Otolithinae, Argyrosominae and Pseudosciaeninae. In this study, phylogenetic analysis revealed three distinct monophyletic groups (Groups 1, 2 and 3), a result very different from that of Zhu et al. (1963). Monophyly of the genera Otolithes and Argyrosomus is not supported. The proposal (Zhu et al., 1963) to group Argyrosomus and Nibea into the subfamily Argyrosominae is also without support since, in the phylogenetic tree presented herein, Argyrosomus and Nibea are placed in two distinct groups (Figure 3). Even though monophyly of Pseudosciaeninae is supported herein, the Bayesian posterior probability is relatively weak. Notably, within the subfamily Pseudosciaeninae, L. polyactis was found to be most closely related to Collichthys niveatus, and then to its congeneric species, L. crocea. Although previous molecular phylogenetic analyses reached the same conclusion on the phylogenetic positions of Otolithes, Argyrosomus, and Argyrosominae, the phylogenetic relationships within the subfamily Pseudosciaeninae are still far from clear (Meng et al., 2004; Xu et al., 2010). Based on different data and methods, our results once again suggest that Collichthys and Larimichthys may be merged into a single genus. These results are consistent with Chen (Chen QM, 2007, Dissertation, Jinan University, China) and Tong et al. (2007), in which non-monophyletic Larimichthys and Collichthys, respectively, were recovered. Nevertheless, Cheng et al. (2011) recently recovered monophyletic Collichthys and Larimichthys, in agreement with the morphological results of Zhu et al. (1963). Sampling errors, scarce data and mismodeling may have contributed to these disputes. Thus, the inclusion of further data from the mitochondrial and nuclear genomes, more accurate evolutionary models, and extensive taxonomic sampling, with careful identification integrated with information on morphological characters, is required for reconstructing the phylogeny of Sciaenidae.

Figure 2 - Consensus trees of Sciaenidae constructed using Bayesian analysis based on P2 partition-strategies. The accession numbers of species are those listed in Figure 1.

Figure 3 - Consensus trees of Sciaenidae constructed using Bayesian analysis based on P3 partition-strategies. The accession numbers of species are those listed in Figure 1.

Figure S1 - Gene map of the L. polyactis mitochondrial genome.

Figure S2 - Sequences of L. polyactis mitochondrial tRNA genes, represented in the clover-leaf form.
Pagrus auriga. The question mark denotes undetermined initiation codons.

Table 1 - Partition strategies used in this study.

Table 2 - Data partitions, their estimated models of sequence evolution, and total number of characters of each partition used in phylogenetic analysis.

Table 3 - Characteristics of the mitochondrial genome of L. polyactis.

Table 4 - Characteristics and recognized conserved domains of control regions in Percoidei species.

Table S1 - PCR primers used in the analysis of the L. polyactis mitochondrial genome.

Table S3 - Comparison of predicted initiation and termination codons of the 13 mitochondrial protein-coding genes among 23 species of Percoidei.

Table S4 - Codon usage in L. polyactis mitochondrial protein-coding genes.

Table S5 - Consensus sequences of conserved domains in the control region of Percoidei species.