Fusion of the subunits α and β of succinyl-CoA synthetase as a phylogenetic marker for Pezizomycotina fungi

Gene fusions, yielding the formation of multidomain proteins, are evolutionary events that can be utilized as phylogenetic markers. Here we describe a fusion gene comprising the α and β subunits of succinyl-CoA synthetase, an enzyme of the TCA cycle, in Pezizomycotina fungi. This fusion is present in all Pezizomycotina with complete genome sequences and absent from all other organisms. Phylogenetic analysis of the α and β subunits of succinyl-CoA synthetase suggests that both subunits were duplicated and retained in Pezizomycotina while one copy was lost from other fungi. One of the duplicated copies was then fused in Pezizomycotina. Our results suggest that the fusion of the α and β subunits of succinyl-CoA synthetase can be used as a molecular marker for membership in the Pezizomycotina subphylum. If a species has the fusion it can be reliably classified as Pezizomycotina, while the absence of the fusion suggests that the species is not a member of this subphylum.


Introduction
Amongst eukaryotes, Fungi is the kingdom with the most sequenced genomes. The sequenced genomes cover fungal species broadly and include agriculturally, medically, and industrially significant species with very diverse genomes. The fungal kingdom is divided into Chytridiomycota, Neocallimastigomycota, Blastocladiomycota, Glomeromycota, Ascomycota, Basidiomycota, and Microsporidia (Hibbett et al., 2007). Ascomycota, which diverged from Basidiomycota an estimated 741-1195 million years ago (Scannel et al., 2007), is the largest phylum of the Fungi kingdom and is in turn divided into Saccharomycotina, Taphrinomycotina, and Pezizomycotina (Lumbsch and Huhndorf, 2007). Pezizomycotina is the largest subphylum of Ascomycota and contains more than 33,000 described species (Spatafora et al., 2006).
Pezizomycotina, often referred to as filamentous fungi, grow primarily in a filamentous form and degrade biomass to free sugars that are used as a source of energy and carbon (Arvas et al., 2007). Most species exhibit a dominant hyphal growth form, with almost all of the sexually reproducing forms possessing ascomata (Spatafora et al., 2006). Filamentous ascomycetes have been shown to be a monophyletic group distinguished from Taphrinomycotina and Saccharomycotina on the basis of ribosomal DNA sequence analysis (Alexopoulos et al., 1996; Spatafora et al., 2006).
In recent years, comparative genomics rather than gross morphology has been used to analyze specific subgroups of fungi. Notably, analysis of the subphylum Saccharomycotina has revealed major evolutionary events such as a recent whole-genome duplication and subsequent gene loss or subspecialization (Arvas et al., 2007; Dietrich et al., 2004). In Pezizomycotina, on the other hand, only a subset of protein families related to plant biomass degradation and secondary metabolism revealed signs of recent expansion, likely because of a repeat-induced point mutation mechanism for the active removal of duplicated sequences that has been identified at varying levels in Pezizomycotina species (Arvas et al., 2007). Gene duplication significantly influences evolution, as duplicated genes may be retained if they provide an advantage, either because of an increased gene dosage effect or because of increased functional diversification, or may alternatively be lost (Wagner, 1998; Li and Graur, 2000; Cornell et al., 2007). Multiple copies of the genes coding for the α and β subunits of succinyl-CoA synthetase have been shown to exist in several species of Aspergillus and in Neosartorya fischeri, placing succinyl-CoA synthetase among the subset of proteins showing gene expansion in Pezizomycotina (Flipphi et al., 2009).
Succinyl-CoA synthetase is an enzyme of the TCA cycle, the pathway in which acetyl-CoA derived from the catabolism of sugars and lipids is oxidized to carbon dioxide, yielding energy in the form of GTP, NADH, and FADH2 (Hanson, 2008). This enzyme catalyzes the reversible transformation of succinyl-CoA, orthophosphate, and GDP to the four-carbon metabolite succinate, GTP, and coenzyme A in a three-step substrate-level phosphorylation reaction (Kanehisa and Goto, 2000; Hamblin et al., 2008). Succinyl-CoA synthetase is composed of two subunits, α and β, which generally form dimers in eukaryotes and tetramers in prokaryotes.
Nucleotide specificity of succinyl-CoA synthetase is thought to be determined by the N-terminal domain of the β subunit (Hamblin et al., 2008), and the α subunit may contain a targeting sequence for the enzyme (Birney and Klein, 1995). While prokaryotes are capable of using both types of nucleotides, eukaryotic succinyl-CoA synthetase enzymes usually occur as isoforms specific for either GTP or ATP (Hamblin et al., 2008). The presence of multiple copies of the succinyl-CoA synthetase genes tallies with the finding that duplications are most common in Pezizomycotina when the proteins produce secondary metabolites (Arvas et al., 2007), and with the common belief that natural selection on metabolic phenotypes mainly targets enzymes that control the rate of turnover of molecules through a metabolic pathway, such as succinyl-CoA synthetase (Flipphi et al., 2009).
In Pezizomycotina, one set of the duplicated α and β gene copies for succinyl-CoA synthetase has become fused. Gene fusion, yielding formation of multidomain proteins, is a major driving force in protein evolution and is a rare evolutionary event in comparison to the frequency of point mutations. Consequently, gene fusion events were recently utilized as an evolutionary marker to pinpoint the root of the eukaryotic tree (Nara et al., 2000; Cavalier-Smith, 2002, 2003). Here we examine how the presence of the succinyl-CoA synthetase fusion gene can be used as a simple phylogenetic marker for Pezizomycotina, and attempt to trace the origin and nature of this fusion in Fungi.

GenBank sequence data
We used the fungi genome BLAST tool from NCBI to search all complete fungal genomes with predicted proteins in GenBank. We started with the sequence of a Pezizomycotina succinyl-CoA synthetase α and β fusion protein, and used blastp to identify all other succinyl-CoA synthetase proteins in different fungal genomes.
With the exception of Microsporidia species, we were able to identify at least one α and one β subunit in each fungal genome available in the Genomic BLAST page (Table S1). The searches returned both fused and non-fused genes. For aligning the fused proteins with non-fused α and β copies of the protein, the fusions were artificially split into α' and β' subunits. To increase the diversity of fungal species analyzed, we also included the sequences of the α and β succinyl-CoA synthetase subunits of two Neocallimastigomycota fungi. As an outgroup we used the α and β subunits of succinyl-CoA synthetase from three mammals, Homo sapiens, Mus musculus and Canis familiaris.
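The artificial split of a fusion protein into its α' and β' regions can be illustrated with a minimal sketch. It assumes the α region comes first in the fusion and that the boundary position has already been read off an alignment against non-fused subunits; the function name, the boundary value, and the toy sequence are hypothetical and are not taken from the study.

```python
def split_fusion(sequence: str, boundary: int) -> tuple[str, str]:
    """Split a fused protein into its alpha' and beta' regions.

    `boundary` is the 0-based index of the first residue assigned to the
    beta' region. It is assumed to come from an alignment of the fusion
    against non-fused alpha and beta subunits.
    """
    if not 0 < boundary < len(sequence):
        raise ValueError("boundary must fall strictly inside the sequence")
    return sequence[:boundary], sequence[boundary:]


# Toy sequence for illustration only; not a real succinyl-CoA synthetase.
fusion = "MKTAYIAKQRGGSLLVNKDT"
alpha_prime, beta_prime = split_fusion(fusion, boundary=10)
```

Each of the two resulting regions can then be added to the corresponding subunit alignment alongside the non-fused copies.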

Phylogenetic analyses
An alignment of the α subunits of succinyl-CoA synthetase for 53 organisms, including 25 Pezizomycotina, 14 Saccharomycotina, 2 Taphrinomycotina, 7 Basidiomycota, 2 Neocallimastigomycota as well as 3 metazoans to provide an outgroup, was created in ClustalX v.2.0.11 (Larkin et al., 2007) using default settings. A total of 80 sequences were aligned, including the region corresponding to the α subunit in the fusion genes in Pezizomycotina. A separate alignment of β and β' subunits of the same organisms was created in the same manner. In total, 83 sequences were used in the β subunit alignment, including the region corresponding to the β subunit in the fusion genes in Pezizomycotina.
Each set of aligned sequences corresponding to the α and β subunits was used as input to infer phylogenetic trees using maximum likelihood and Bayesian analysis. For the maximum likelihood analysis we first used the program ProtTest (Abascal et al., 2005) to select the amino acid substitution model that best fits the protein alignments. In both datasets, the best model of amino acid substitution was the LG model (Le and Gascuel, 2008), with invariable sites and across-site rate variation. We used PhyML 3.0 (Guindon and Gascuel, 2003), as implemented in SeaView (Galtier et al., 1996), to create maximum likelihood trees using the LG model of amino acid substitution (Le and Gascuel, 2008) with an optimized proportion of invariable sites and optimized across-site rate variation. Branch support was calculated using the approximate likelihood-ratio test (aLRT; Anisimova and Gascuel, 2006). The Bayesian analysis was performed using MrBayes (Ronquist and Huelsenbeck, 2003) with four Markov chain Monte Carlo chains and variable rates of substitution. Each chain was run for 1,000,000 generations and trees were sampled every 1,000 generations. The first 100 trees were excluded as burn-in and we used the remaining 900 trees to calculate clade support values (posterior probabilities). The trees were visualized and prepared for publication using Dendroscope (Huson et al., 2007) and FigTree.
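The sampling arithmetic of the Bayesian run, and the way clade support is derived from the retained trees, can be sketched as follows. This is an illustration under simplifying assumptions and is not MrBayes itself: one tree is counted per sampling interval, as in the text, and each sampled tree is represented simply as the set of clades it contains. The function names and toy trees are hypothetical.

```python
def retained_samples(generations: int, sample_every: int, burn_in: int) -> int:
    """Number of sampled trees left after discarding the burn-in."""
    n_sampled = generations // sample_every
    if burn_in > n_sampled:
        raise ValueError("burn-in exceeds the number of sampled trees")
    return n_sampled - burn_in


def clade_support(sampled_trees, clade, burn_in):
    """Fraction of post-burn-in trees that contain `clade`.

    Each tree is assumed to be given as a set of frozensets, one
    frozenset of leaf names per clade.
    """
    kept = sampled_trees[burn_in:]
    hits = sum(1 for tree_clades in kept if clade in tree_clades)
    return hits / len(kept)


# Settings reported in the text: 1,000,000 generations, one tree every
# 1,000 generations, first 100 trees discarded.
retained = retained_samples(1_000_000, 1_000, 100)

# Toy sample of five trees; the first is treated as burn-in.
fused = frozenset({"fusedA", "fusedB"})
other = frozenset({"fusedA", "yeast"})
toy_trees = [{other}, {fused}, {fused}, {other}, {fused}]
support = clade_support(toy_trees, fused, burn_in=1)
```

With the reported settings, `retained` evaluates to 900, matching the number of trees used in the text.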

Results
The fusion of the α and β subunits of succinyl-CoA synthetase can serve as a phylogenetic marker for Pezizomycotina fungi
We used the sequence of the fused α and β subunits of succinyl-CoA synthetase from Pyrenophora tritici-repentis, a Pezizomycotina, to search all the complete fungal genomes with protein predictions available in the Genomic BLAST webpage from NCBI. As of August 1, 2009, the Genomic BLAST webpage allowed the search of the protein sequences of 51 fungal genomes: 41 Ascomycota, 7 Basidiomycota and 3 Microsporidia. The 41 Ascomycota are divided into 25 Pezizomycotina, 14 Saccharomycotina, and 2 Taphrinomycotina. All genomes analyzed had both the α and β subunits of succinyl-CoA synthetase, with the exception of the three Microsporidia genomes, which completely lack both subunits of the protein and were not analyzed further. The genome of the Basidiomycota Postia placenta also appeared to lack both subunit genes; however, an additional search of the genome page for this organism at the Joint Genome Institute (JGI) revealed that the genes are present, and we used the sequences obtained from JGI in the analyses for this organism.
In addition to the sequences obtained from the Genomic BLAST searches we included the sequences of 2 Neocallimastigomycota fungi in the analysis to increase the diversity of the fungal species. The Ascomycota, together with Basidiomycota, form the subkingdom Dikarya, and are more closely related to each other than to Neocallimastigomycota or Microsporidia. Finally, we used the sequences of the α and β subunits of succinyl-CoA synthetase from three metazoans as an outgroup.
Out of the 25 Pezizomycotina fungi with complete genomes available in NCBI, we immediately detected the fusion gene in 22 genomes. According to the annotation in NCBI the fusion appears to be absent from Aspergillus niger, Coccidioides immitis and Gibberella zeae. Interestingly, each of these species has at least two α and two β subunits of succinyl-CoA synthetase. Thus, we further analyzed these genomes to determine whether the fusions are really absent or are simply not annotated.
In Aspergillus niger, there are two α subunits and two β subunits annotated in the genome; a tblastn search of the fusion protein against the A. niger genome showed that a β subunit gene immediately follows an α subunit gene in the same strand, indicating that the genes could be fused in this organism. Furthermore, a blastp search of the gene predictions in this organism's genome webpage in JGI revealed that a fusion gene is present (gene model number jgi|Aspni1|176118). To maintain consistency, we used the proteins in GenBank in the analyses. The proteins involved in the putative fusion protein in A. niger group with the fusion proteins in the other Pezizomycotina in the phylogenetic trees, further confirming that these two proteins are indeed fused in this genome (Figures 1 and 2, Table S1).
A similar problem occurs in Coccidioides immitis. A tblastn search revealed that two consecutive genes in the genome hit the fusion gene. In this case, however, the first gene only hits part of the α subunit while the second gene hits the remainder of the α subunit and the full length of the β subunit. We aligned both genes with the fusion genes, determined the boundaries between the α and β subunits assuming the proteins are fused, and used these sequences in the analyses. As in A. niger, the putative fusion proteins group with the fusion proteins in the other Pezizomycotina in the phylogenetic trees (Figures 1 and 2, Table S1).
Finally, while the fusion is not annotated in Gibberella zeae, in this organism a β subunit gene follows an α subunit gene. Both subunits are in the same strand but are 3 kb apart, and there seems to be a small intervening gene in the opposite strand between the subunits. A large intron between the subunits could have confounded the gene prediction software, and it is not unknown for a gene to be nested inside an intron of another (Kumar, 2009). It is worth noting that the version of the genome available in GenBank is based on a preliminary whole-genome shotgun assembly that has been superseded by a new assembly of the genome. Unfortunately, the new assembly does not seem to be available at the time of this writing. Interestingly, these two proteins group with the fusion proteins in the phylogenetic trees (Figures 1 and 2, Table S1), indicating that they could be fused. The alternative explanation would be that the proteins were once fused and that the insertion of a gene in between the α and β subunits broke the fusion into two working copies of the subunits; while plausible, this second explanation seems unlikely, as the insertion would have to have occurred right at the junction between the two subunits. This organism also has an extra copy of the α subunit that groups with the fusions, suggesting that the fusion gene was duplicated in this organism and the β subunit was eliminated in one of the copies of the fusion.
These results indicate that, even though it might not have been annotated as such, the fusion of the α and β subunits of succinyl-CoA synthetase is present in all 25 Pezizomycotina genomes available at GenBank, with the possible exception of that of G. zeae, in which the fusion protein seems to at least have once been present. In addition, the fusion gene was not detected in any other fungal genome, and a BLAST search of the entire NR database revealed that the fusion is only present in Pezizomycotina fungi.
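The manual check applied to the unannotated cases (a β subunit gene following an α subunit gene in the same strand) can be expressed as a short sketch. It assumes the input is a list of succinyl-CoA synthetase hits on one contig, for example from a tblastn search, with hypothetical field names; the `max_gap` threshold is an illustrative assumption and not a value used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    subunit: str  # "alpha" or "beta"
    strand: str   # "+" or "-"
    start: int    # 1-based, start < end regardless of strand
    end: int


def candidate_fusions(genes, max_gap=5000):
    """Return (alpha, beta) pairs that could form an unannotated fusion.

    A pair qualifies when the two hits are neighbours in the list, lie on
    the same strand, the alpha hit is upstream of the beta hit, and the
    gap between them does not exceed `max_gap` bases.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand != right.strand:
            continue
        # On the minus strand the upstream gene has the larger coordinates.
        first, second = (left, right) if left.strand == "+" else (right, left)
        gap = right.start - left.end
        if (first.subunit, second.subunit) == ("alpha", "beta") and 0 <= gap <= max_gap:
            pairs.append((first, second))
    return pairs


# Toy coordinates for illustration only.
same_strand = [Gene("hitA", "alpha", "+", 1000, 2000),
               Gene("hitB", "beta", "+", 5000, 6300)]
opposite = [Gene("hitA", "alpha", "+", 1000, 2000),
            Gene("hitB", "beta", "-", 2100, 3400)]
found = candidate_fusions(same_strand)
rejected = candidate_fusions(opposite)
```

A pair flagged this way is only a candidate; as in the text, confirmation would still rest on gene models and on where the proteins fall in the phylogenetic trees.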
The apparent presence of the fusion gene in most Pezizomycotina and its absence in all other organisms suggest that the fused succinyl-CoA synthetase gene can serve as a phylogenetic marker for Pezizomycotina. If a species has the fusion it can be reliably classified as Pezizomycotina; if a species does not have the fusion it most likely does not belong to this subphylum.

Evolution of the α and β subunits of succinyl-CoA synthetase in fungi
We used the sequences of the succinyl-CoA synthetase of 53 species to build phylogenetic trees for both the α and β subunits of the protein. The data set contained 48 Dikarya, 2 Neocallimastigomycota and 3 metazoans. The Dikarya are divided into 41 Ascomycota (25 Pezizomycotina, 14 Saccharomycotina, 2 Taphrinomycotina) and 7 Basidiomycota (Table S1). The sequences of the fusion genes in the 25 Pezizomycotina were split into their α and β constituents.
All sequences were aligned in ClustalX using default parameters and trees were built using MrBayes and PhyML. The resulting trees are shown in Figures 1 and 2. In all trees the fused genes form a monophyletic group (with strong bootstrap support) and the non-fused copies in Pezizomycotina form another monophyletic group (with strong bootstrap support).
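What it means for a set of sequences to form a monophyletic group can be made concrete with a small check on a rooted tree. The sketch below assumes a toy representation in which a tree is a nested tuple and a leaf is a string; the leaf names are hypothetical, and real analyses would read the trees produced by the phylogenetic software instead.

```python
def leaves(tree):
    """Set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some subtree of the rooted `tree` has exactly `group` as its leaves."""
    group = frozenset(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy rooted tree for illustration only.
toy_tree = (("fusedA", "fusedB"), (("unfusedA", "unfusedB"), "yeast"))
fused_is_clade = is_monophyletic(toy_tree, {"fusedA", "fusedB"})
mixed_is_clade = is_monophyletic(toy_tree, {"fusedA", "unfusedA"})
```

In this toy tree the fused sequences form a clade while a mixed set of fused and unfused sequences does not, which is the pattern described for the Pezizomycotina copies.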
The phylogenetic trees of the α subunits (Figure 1) suggest that this subunit duplicated before the divergence of Neocallimastigomycota from Dikarya. Saccharomycotina and Taphrinomycotina lost the copy that would later become fused in Pezizomycotina, while Neocallimastigomycota and Basidiomycota lost the other copy. The α subunit present in Neocallimastigomycota and Basidiomycota groups with the fused copy in Pezizomycotina, while the α subunit present in Saccharomycotina and Taphrinomycotina groups with the unfused copies of the subunit in Pezizomycotina. G. zeae has an extra copy of the α subunit that groups with the fused genes, indicating that it probably resulted from a recent duplication of the fused gene followed by the loss of the β subunit. This loss should not have a large phenotypic effect, as another copy of the fusion gene would be present in the genome.
The trees of the β subunit (Figure 2) indicate a different evolutionary story for this subunit. The β subunit duplicated in Dikarya after their divergence from Neocallimastigomycota, and after the divergence of basidiomycetes from ascomycetes, but before the divergence of Pezizomycotina from Saccharomycotina and Taphrinomycotina. The copy that would become fused in Pezizomycotina was lost from Saccharomycotina and Taphrinomycotina.
A second unfused copy of the β subunit is present in three Aspergillus species (A. clavatus, A. flavus and A. fumigatus) and in Neosartorya fischeri (Figure 2), all members of Trichocomaceae, and groups with the fusion proteins. This suggests that the β subunit duplicated a second time before becoming fused in Pezizomycotina and was lost from most Pezizomycotina with the exception of some Trichocomaceae.
Interestingly, one Basidiomycota, Ustilago maydis, also has two copies of each subunit of succinyl-CoA synthetase. One of the α and one of the β subunits group with the fusion gene in Pezizomycotina, which would suggest that this gene could have been the result of a lateral gene transfer between a Pezizomycotina and U. maydis. This would not be the first gene shown to have been transferred from Pezizomycotina to U. maydis (Shen et al., 2009). We checked the location of these genes in the genome of U. maydis and found that they are consecutive but in opposite strands, and therefore cannot be fused in this species. A closer look at the trees reveals that the sequence of the U. maydis subunits that group with the fusions has a long branch, and its grouping with the fusions could be explained by long branch attraction. The presence of two copies of each of the subunits in the genome of U. maydis might have left one copy of each subunit free to accumulate mutations, explaining why they would be in such a long branch. However, it is also possible that the fusion gene was horizontally transferred to U. maydis and a genome rearrangement broke the gene in two parts and reversed one of the subunits in relation to the other, although this explanation is less likely considering that such a rearrangement would need to have happened in such a way as to generate working versions of the genes.
With the exception of U. maydis, the Basidiomycota form a monophyletic group in the trees. The Neocallimastigomycota, Taphrinomycotina and Pezizomycotina also form monophyletic groups. The Saccharomycotina only form a monophyletic group in the maximum likelihood tree of the β subunit (Figure 2b); in all other trees Yarrowia lipolytica groups with the unfused copy of the proteins in Pezizomycotina and not with Saccharomycotina. This unexpected placement of Yarrowia lipolytica has been seen in other analyses as well (Arvas et al., 2007; Cornell et al., 2007) and has been attributed to Y. lipolytica ORFs bearing more similarity to other fungi than to Saccharomycotina (Arvas et al., 2007).

Discussion
Our results show that the subphylum Pezizomycotina is characterized by the presence of a fusion of the α and β subunits of succinyl-CoA synthetase that can be used as a phylogenetic marker to determine if an organism belongs in Pezizomycotina. If a species has the fusion it can be reliably classified as Pezizomycotina, while the absence of the fusion suggests that the species is not a member of this subphylum.
Phylogenetic analyses show that the two subunits of succinyl-CoA synthetase duplicated at different times in fungal evolution and then became fused in an ancestor of Pezizomycotina. The gene for the α subunit of succinyl-CoA synthetase duplicated in an ancestor of the Neocallimastigomycota and Dikarya. One copy of the gene was then lost in all fungi with the exception of Pezizomycotina. The gene for the β subunit of succinyl-CoA synthetase most likely duplicated in an ancestor of the Ascomycota after the divergence of this group from Basidiomycota. One copy of the gene was then lost from Saccharomycotina and Taphrinomycotina, and kept in Pezizomycotina.
In Pezizomycotina, one copy of the α subunit fused to a copy of the β subunit, generating a fusion protein while retaining the unfused copies as well. As this fusion gene involves two subunits of the same enzyme, it is possible that it provides a selective advantage by leading to greater catalytic activity or more efficient co-regulation of the expression of the two succinyl-CoA synthetase subunits.
Even though the presence of this fused gene could lead to a more efficient enzyme, either by greater catalytic activity or by more efficient co-regulation, Pezizomycotina genomes maintained two copies (one fused, one unfused) of each subunit of succinyl-CoA synthetase. This might indicate that this enzyme is needed in large quantities in Pezizomycotina or, more interestingly, it is possible that one of the copies of succinyl-CoA synthetase in Pezizomycotina has acquired an as yet unreported new function, as suggested by Flipphi et al. (2009) for other duplicated genes in the primary carbon metabolism of Aspergillus species.