The complete mitochondrial genome of Engyodontium album and comparative analyses with Ascomycota mitogenomes

Abstract
Engyodontium album is a widespread pathogen that causes various dermatoses and respiratory tract diseases in humans and animals. Despite its clinical importance, the basic genetic and molecular background of this species remains poorly understood. In this study, the mitochondrial genome sequence of E. album was determined using a high-throughput sequencing platform. The circular mitogenome is 28,081 nucleotides in length and comprises 17 protein-coding genes, 24 tRNA genes, and 2 rRNA genes. The nucleotide composition of the genome is A+T-biased (74.13%). Group-II introns were found in the nad1, nad5, and cob genes. The most frequently used codon in protein-coding genes was UAU. Isoleucine was the most common amino acid and proline the least common in protein-coding genes. The gene order is nearly identical to that of other Ascomycota mitogenomes. Phylogenetic analysis based on the shared protein-coding genes revealed that E. album is closely related to members of the Cordycipitaceae family, with a high-confidence support value (100%). The availability of the E. album mitogenome will shed light on the molecular systematics and genetic differentiation of this species.


Introduction
The fungus Engyodontium album is a member of the Cordycipitaceae family and is characterized by cottony, white colonies that produce numerous dry, tiny conidia. Evidence suggests that E. album has a cosmopolitan distribution and can infect a wide range of invertebrates and vertebrates, including arthropods, reptiles, birds, mammals, and humans (Zimmermann, 2007). Infections caused by E. album can induce mild to severe disease, including eczema vesiculosum (Hoog, 1972), granulomatous skin lesions, brain abscesses (Seeliger, 1983), and keratitis (McDonnell et al., 1984). In addition, some patients are infected without direct exposure to the fungus, e.g., by using bassianin, an E. album product (Tucker et al., 2004). With the incidence of E. album infection increasing throughout the world, it is necessary to explore the molecular characteristics and phylogenetics of E. album to support effective therapeutic strategies. Unfortunately, the taxonomy of the genus Engyodontium remains unsettled.
Mitochondria are responsible for cellular respiration and energy production in eukaryotic organisms (Henze and Martin, 2003). Mitochondrial DNA (mtDNA) is typically circular and has its own replication machinery, which is usually regulated by the nuclear genome (Hu et al., 2004). Owing to their high mutation rates, small sizes, and lack of recombination, mtDNAs have been widely used as informative molecular markers for phylogenetic analyses and species identification (Botero-Castro et al., 2013). Recently, mtDNA has also been used for DNA barcoding to facilitate identification in the fields of population genetics, comparative genomics, and evolutionary genomics (Kurbalija Novicic et al., 2015; Qiu et al., 2013). The mitochondrial genomes of fungi have been used as genetic markers for identification and classification purposes (Beaudet et al., 2013). In 1997, Canadian researchers defined the goals of the fungal mitochondrial genome project as the analysis of genome structure, gene content, and the evolution of gene expression in fungal mitochondria (Paquin et al., 1997). Fungal mitochondrial genomes are closed, circular DNA molecules with lengths ranging from 10 to 80 kb that encode respiratory chain subunits, ATP synthase complex subunits, ribosomal RNAs, and tRNAs (Paquin et al., 1997). As of November 2016, 339 fungal mitochondrial genomes had been deposited in the National Center for Biotechnology Information (NCBI) database. The mitochondrial genomes of Heterakis gallinae and Heterakis beramporia were amplified by Wang et al. (2016) to develop useful markers for systematic and population genetic studies. Liu et al. (2014) sequenced the complete mitochondrial genome of Micrura ignea and compared it with other nemertean mitogenomes. However, no complete mitochondrial genome sequence is yet available for the genus Engyodontium.
In this study, we completely sequenced the E. album mitogenome to characterize and classify it. We also analyzed the gene content and structure, as well as codon utilization associated with protein-coding genes (PCGs). Other fungal mitogenomes were comparatively analyzed to gain additional insights into their gene content, structure, organization, and phylogenetic relationships.

Materials and Methods
Sample collection and DNA extraction
E. album (strain: ATCC-56482), originally isolated from a fatal brain abscess in a female patient (Seeliger, 1983), was purchased from BeiNa Biological Technology Co., Ltd. (Suzhou, China). The strain was cultured at 24°C in ATCC 200 Yeast Mold Agar medium (BD 271120). Fungus samples were collected after washing twice with sterile water and then stored at -80°C. Total genomic DNA was isolated from the spores and mycelium using the E.Z.N.A. Fungal DNA Kit (Omega), according to the manufacturer's instructions. The integrity of the genomic DNA was checked on a 1% agarose gel, and the concentration was measured using a NanoDrop 2000 UV-Vis spectrophotometer (NanoDrop).
Sequence assembly, annotation, and analysis
E. album mtDNA was sequenced using an Illumina HiSeq2000 instrument and assembled using SPAdes software, version 3.6.1 (Bankevich et al., 2012). The Bandage 0.7.1 program was used to check the assembly path and confirm that the E. album mtDNA formed a circular molecule (Wick et al., 2015). Moreover, iterative mitochondrial baiting was used to further verify the accuracy of the sequence from head to tail. PCGs were annotated using NCBI's ORF-finder program (https://www.ncbi.nlm.nih.gov/orffinder/). Analysis of tRNA genes was conducted with the tRNAscan-SE 1.21 Search Server (http://lowelab.ucsc.edu/tRNAscan-SE/) (Lowe and Eddy, 1997). Complete ribosomal RNA genes were identified by alignment with the Lecanicillium saksenae mitogenome (GenBank accession no. KT585676) through BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The circular genome map was constructed using OGDRAW (http://ogdraw.mpimp-golm.mpg.de/cgibin/ogdraw.pl) (Lohse et al., 2007). The codon-usage frequency for each amino acid was determined with CodonW (Peden, 2000). The complete sequence of E. album mtDNA was deposited in GenBank under accession no. KX061492. Comparative analyses of the nucleotide sequences of each PCG and of the ribosomal DNA genes were conducted for Acremonium chrysogenum, Fusarium oxysporum, Hypocrea jecorina, L. saksenae, and Metacordyceps chlamydosporia. Strand bias was characterized by the AT skew and GC skew, calculated as (A% - T%)/(A% + T%) and (G% - C%)/(G% + C%), respectively. Mitochondrial genome sequences were compared using the Blast Ring Image Generator (BRIG; Tablizo and Lluisma, 2014), with E. album mtDNA serving as the reference sequence. To estimate the evolutionary-selection constraints on genes in the Hypocreales and Ascomycota taxa, common PCGs were chosen to calculate the ratio of nonsynonymous to synonymous changes (Ka/Ks). Codon alignments were performed before pairwise Ka, Ks, and Ka/Ks ratios were calculated using DnaSP software, version 5 (Librado and Rozas, 2009).
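The skew statistics defined above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name skews and the toy input are hypothetical, and the function works from base counts on a single strand, which gives the same ratios as the percentage-based formulas.

```python
from collections import Counter


def skews(seq: str) -> dict:
    """Return A+T content (%), AT skew, and GC skew for one strand."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {
        "at_content": 100.0 * (a + t) / total,
        # (A - T)/(A + T); negative values mean more T than A
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
        # (G - C)/(G + C); positive values mean more G than C
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,
    }


if __name__ == "__main__":
    # Toy sequence for illustration only, not mitogenome data.
    print(skews("AATTTGGGC"))
```

Because both skews change sign when the complementary strand is analyzed, the strand used should be reported alongside the values.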

Phylogenetic analysis
To determine the phylogenetic position of E. album, currently available complete or near-complete mitochondrial genomes of fungi were used for phylogenetic analysis. The clade including Phaeosphaeria nodorum and Sporothrix schenckii was set as the outgroup. A global analysis was performed using 13 PCGs (nad1-nad6, nad4L, cox1-cox3, atp6, atp8, and atp9) shared by E. album and the other mitochondrial genomes. These genes were individually aligned using the default settings of MAFFT (Katoh et al., 2005), and the 13 alignments were then concatenated using CLUSTAL X software, version 1.81 (Thompson et al., 2002). Finally, phylogenetic trees were constructed using RAxML version 8.1.12 and MrBayes 3.2 under the general time-reversible model (Stamatakis, 2014; Huelsenbeck and Ronquist, 2001). For each node of the ML tree, bootstrap support was calculated using 1000 replicates. For the Bayesian tree, 4 simultaneous chains were run for 10,000,000 generations, and the initial 10% of sampled values were discarded as burn-in.
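The concatenation step can be sketched in a few lines of Python. This is an illustrative stand-in, not the CLUSTAL X procedure used in the study: the function concatenate_alignments, its input layout (one dict of taxon name to aligned sequence per gene), and the choice to pad absent taxa with gaps are all assumptions.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Returns (supermatrix, partitions), where partitions holds the
    1-based (start, end) columns occupied by each gene.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    position = 0
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this gene is filled with gap characters.
            pieces[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((position + 1, position + width))
        position += width
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


if __name__ == "__main__":
    # Toy alignments with made-up taxon labels.
    gene1 = {"taxonA": "ATG-", "taxonB": "ATGC"}
    gene2 = {"taxonA": "TTA"}
    print(concatenate_alignments([gene1, gene2]))
```

The returned partition boundaries can be passed to tree-building software when a separate substitution model is wanted for each gene.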

Genome organization, structure, and composition
The complete mt genome of E. album is a circular molecule of 28,081 bp containing 17 PCGs, 24 transfer RNA genes, and 2 ribosomal RNA genes (large and small subunits; Figure 1). All mt genes of E. album are transcribed in the same direction. The average base composition of the complete E. album mitogenome is 37.39% A, 14.65% C, 11.21% G, and 36.74% T; the nucleotide composition is therefore biased toward A+T (74.13%). The E. album mt genome sequence was slightly skewed away from A in favor of T (AT skew = -0.01), and the GC skew was 0.14, similar to the values observed for other Cordycipitaceae family members.
The relative synonymous codon usage (RSCU) value measures how often a codon is used relative to the frequency expected if all synonymous codons for that amino acid were used equally. If there is no codon-usage bias, the RSCU value equals 1.00. A codon that is used less frequently than expected has an RSCU value of < 1.00, whereas a codon used more frequently than expected has an RSCU value of > 1.00 (Sharp et al., 1986). The results for the E. album mitogenome indicated that almost all amino acids (except for Met) showed codon-usage bias. The most frequently used codon in PCGs was UAU, followed by AUU and UAA, which is consistent with the (A+T)-rich content of the E. album mitogenome. CGC was the least used codon. Ile is the most commonly encoded amino acid in the E. album mitogenome, while Pro is the least common (Table 2).
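The RSCU definition above can be made concrete with a short Python sketch. It is a simplified illustration rather than a reimplementation of CodonW: the function name rscu is hypothetical, and the use of NCBI translation table 4 (UGA read as Trp) is an assumption, since the text does not state which genetic code was applied.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}
CODON_TABLE["TGA"] = "W"  # assumption: translation table 4 (UGA = Trp)


def rscu(cds_list):
    """Return {codon: RSCU} pooled over in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        for c in codons:
            # observed count divided by the count expected under equal usage
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    # Toy coding sequence: Met, Leu (TTA), Leu (TTA), Leu (TTG).
    print(rscu(["ATGTTATTATTG"]))
```

In this sketch, stop codons are treated as one more synonymous family, and single-codon amino acids such as Met always score 1.00, which is why Met cannot show codon-usage bias.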

Transfer and ribosomal RNA genes
Twenty-four tRNAs were recognized in the mt genome of E. album; they were interspersed between the rRNA genes and PCGs and ranged from 70 to 85 bp in length. Of these tRNAs, two forms each were identified for tRNA-Arg (AGN and CGN), tRNA-Ser (UCN and AGN), and tRNA-Leu (UUN and CUN) (Figure 2). These five tRNAs adopt a special structure that is widely found in the Sordariomycetes class and is a common feature of Hypocreales species. The E. album rrnL (16S rRNA) gene is located between tRNA-Pro and rps, while rrnS (12S rRNA) is located between atp6 and tRNA-Tyr. The lengths of the rrnS and rrnL genes are 1,468 bp and 2,244 bp, respectively, and their A+T contents are 65.46% and 67.98%, respectively.

Comparative analysis with other mt genomes
To better understand the gene contents and structure of this species in the Hypocreales order, which consists of six families, the mt genomes from L. saksenae (Cordycipitaceae), Fusarium oxysporum (Nectriaceae), Hypocrea jecorina (Hypocreaceae), Metacordyceps chlamydosporia (Clavicipitaceae), and Acremonium chrysogenum (Hypocreales incertae sedis) were chosen for comparative analysis. The genomes were similar in size, with the exception of F. oxysporum (Table 3). The results showed that genome size ranged from 25 kb to 42 kb. The AT-skew values for these species were all negative, while the GC-skew values were positive. As shown in Table 3, the AT-skew value of E. album is fairly close to that of M. chlamydosporia.
Our results showed clear differences in the gene contents of the mitogenomes studied (Table 4). They all contain genes encoding components of oxidative phosphorylation.
Comparison of the Hypocreales mtDNA sequences revealed that they were fairly well conserved, with almost 80% sequence identity in the genomic regions shared with E. album; major differences were confined to the regions containing the tRNA-Arg (8.8k-12k), nad5 (11.5k-12.5k), cob (14.4k-15.6k), orf148 and orf77 (18.1k-19.9k), and nad1 (20.3k-20.5k) genes. In addition, no gene-module rearrangement was observed among these species, as can be seen in the BRIG map (Figure 3).

Phylogeny analysis
To investigate the phylogenetic position of E. album and the internal relationships of the order Hypocreales, phylogenetic trees were constructed using the nucleotide sequences of 13 PCGs from 20 complete mitochondrial genomes belonging to the Ascomycota division. The trees reconstructed using the ML and Bayesian methods recovered distinct clades representing five orders: Hypocreales, Pleosporales, Eurotiales, Glomerellales, and Ophiostomatales (Figure 4). The species in three different families, namely Nectriaceae (F. The species in the Hypocreales order all clustered within the same clade. E. album grouped with species in the Cordycipitaceae family with strong node support (100% for ML and 1 for Bayes). Examination of the pairwise Ka/Ks ratios for the 13 common PCGs in the Hypocreales and Ascomycota taxa demonstrated that all of these genes have undergone purifying selection (Ka/Ks < 1) (Figure 5). Among the species in the Hypocreales order, the Ka/Ks ratio was higher in the cox1 (0.409), cox2 (0.329), and nad6 (0.263) genes than in other genes, while among the species in the Ascomycota division, the most variable genes were nad6 (0.597), cox1 (0.579), and nad5 (0.504).
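The interpretation of the Ka/Ks ratios can be expressed as a small Python sketch. The ratios are the ones reported above; the function names classify and rank are hypothetical, and the simple threshold at 1 is the conventional reading of Ka/Ks rather than a statistical test.

```python
# Ka/Ks ratios reported in the text for the three most variable genes.
HYPOCREALES = {"cox1": 0.409, "cox2": 0.329, "nad6": 0.263}
ASCOMYCOTA = {"nad6": 0.597, "cox1": 0.579, "nad5": 0.504}


def classify(ka_ks: float) -> str:
    """Conventional reading of a Ka/Ks ratio."""
    if ka_ks < 1:
        return "purifying"
    if ka_ks > 1:
        return "positive"
    return "neutral"


def rank(ratios: dict) -> list:
    """Genes ordered from the weakest to the strongest constraint."""
    return sorted(ratios, key=ratios.get, reverse=True)


if __name__ == "__main__":
    for label, ratios in (("Hypocreales", HYPOCREALES), ("Ascomycota", ASCOMYCOTA)):
        for gene in rank(ratios):
            print(label, gene, ratios[gene], classify(ratios[gene]))
```

All six reported ratios fall below 1, so each gene is labeled as under purifying selection; a higher ratio indicates a more relaxed constraint, not positive selection.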

Discussion
Many fungi have a significant adverse impact on global human and animal health (Campbell and Johnson, 2013). A particularly important example is the Cordycipitaceae family of fungi (Menzies and Turkington, 2015). E. album is a widespread species that poses allergic, pathogenic, or toxic risks to humans and mammals (Siegel and Shadduck, 1990; Goettel et al., 2001; Tucker et al., 2004; Balasingham et al., 2011). Despite advances in sequencing and bioinformatics technologies, the mitogenomes of these fungi have been characterized only to a limited extent. Here, we sequenced the whole mitochondrial genome of E. album and compared its genome structure, content, and phylogenetic relationships with those of other fungal mitogenomes. The mitochondrial genome of E. album is a circular DNA molecule of 28,081 bp in length. This size is comparable to that of previously sequenced mitogenomes of members of the Hypocreales order, such as A. chrysogenum (27,266 bp) (Eldarov et al., 2015), L. saksenae (25,919 bp) (Xin et al., 2017), and M. chlamydosporia (25,615 bp) (Ghikas et al., 2006). The average A+T content of the complete E. album mitogenome is 74.13%, similar to the A+T contents reported for A. chrysogenum (74.13%) and L. saksenae (74.13%) (Xin et al., 2017). The E. album mitogenome gene arrangement is identical to that of other Cordycipitaceae family members, such as Ophiocordyceps sinensis (Li et al., 2015), Beauveria pseudobassiana (Oh et al., 2015), Cordyceps militaris (Sung, 2015), and Hirsutella minnesotensis (Zhang et al., 2016). In addition, the PCGs of the E. album mt genome were inferred to start with ATG, which is consistent with the mt genomes of other Cordycipitaceae family members (Oh et al., 2015; Sung, 2015). The rrnS and rrnL genes are located between atp6 and tRNA-Lys, and between tRNA-Pro and rps, respectively. The GC contents of the E. album rrnS and rrnL genes are 34.54% and 31.82%, respectively, which is within the range of other Cordycipitaceae mitogenomes (Table 4).
For decades, there has been considerable debate concerning the taxonomic classification of Engyodontium species. E. album was previously included in the Beauveria genus. In 1940, it was reassigned to Tritirachium and classified as a member of the Moniliaceae family. However, E. album was later transferred to the Engyodontium genus (Hoog, 1972). Owing to insufficient morphological features, the phylogenetic framework of Engyodontium has been little explored, even though sequences of the 18S and 28S ribosomal RNA genes, the nuclear ribosomal internal transcribed spacer, and the cox1 gene are available (Seifert, 2009; Schoch et al., 2012). Alternatively, mt genome sequences may provide reliable genetic markers for examining the taxonomic status of E. album. Phylogenetic analysis indicated that species in Nectriaceae, Hypocreaceae, Clavicipitaceae, and Cordycipitaceae are well resolved. As expected for a member of the Cordycipitaceae family, E. album showed a close genetic relationship with other members of this family. This finding was also supported by AT/GC-skew values and by sequence differences in PCGs at both the nucleotide and amino acid levels among five representative Hypocreales species. However, no precise data yet exist regarding other lineages of Hypocreales. Therefore, a comprehensive phylogeny of Hypocreales should be reconstructed once more mt genome data become available, especially the mitogenome sequences of genera with currently incomplete sequences, such as Engyodontium and Elaphocordyceps.
In conclusion, the complete nucleotide sequence of the E. album mt genome was determined in this study. Comparative analysis showed that the structure, organization, and gene content of E. album mtDNA are highly similar to those of species in the Cordycipitaceae family. The availability of the complete mt genome sequence of E. album provides novel genetic markers for exploring cryptic/sibling species in the Hypocreales order; for preventing infection; and for further studies of the epidemiology, biology, population genetics, and phylogenetic systematics of E. album.