Relationship between mitochondrial gene rearrangements and stability of the origin of light strand replication

Mitochondrial gene rearrangements are much more frequent in vertebrates than initially thought. It has been suggested that the origin of light strand replication could have an important role in the process of gene rearrangements, but this hypothesis has not previously been tested. We used amphibians to test the correlation between light-strand replication origin thermodynamic stability and the occurrence of gene rearrangements. The two variables were correlated in a non-phylogenetic approach, but when tested with a phylogenetically based comparative method the correlation was not significant, although species with unstable light-strand replication origins were much more likely to have undergone gene rearrangements. This indicates that within amphibians there are stable and unstable phylogenetic groups regarding mitochondrial gene order. The species analyzed showed variability in the thermodynamic stability of the secondary structure and in the length of its stem and loop, and several species did not present the 5'-GCCGG-3' motif reported to be necessary for efficient mitochondrial DNA replication. Future studies should focus on the role of the light-strand replication origin in mitochondrial DNA replication and gene rearrangement mechanisms.


Introduction
Mitochondrial (mt) gene order was thought to be highly conserved within vertebrates based on the gene order of the first genomes sequenced (Anderson et al., 1981, 1982; Bibb et al., 1981; Roe et al., 1985). With the increasing number of whole mitochondrial genomes now published it is clear that gene order stability within vertebrates is group specific: mammals and birds have relatively stable gene orders, while in amphibians, reptiles and fish rearrangements are more common (Pereira, 2000). Phylogenetic lineages with high levels of gene rearrangements also have higher levels of genetic variation at the nucleotide level (Xu et al., 2006). Thus for phylogenetic studies it is critical to know what could be leading to these rearrangements.
Several mechanisms have been described to explain gene order variation. The commonly invoked "tandem duplication-random loss model" proposes that gene rearrangements originate from the duplication of a region of the genome followed by the deletion of genes. The consequent pattern of gene loss will alter or restore the original gene arrangement (Moritz and Brown, 1986; Moritz et al., 1987; Pääbo et al., 1991; Macey et al., 1997, 1998; Arndt and Smith, 1998; Boore, 2000; Inoue et al., 2003; Mauro et al., 2005; Mueller and Boore, 2005). Alternative mechanisms have been used to explain some gene rearrangements, such as inversion (Smith et al., 1989), transposition (Macey et al., 1997) and intramolecular recombination (Lunt and Hyman, 1997). However, there is little support for these alternatives, and thus the mitochondrial replication mechanism is implicated in the majority of gene rearrangements (GRr).
The most widely accepted model of mitochondrial DNA (mtDNA) replication, the strand-displacement replication model (Clayton, 1982), proposes that the two strands of the mtDNA molecule have different initiation structures, the heavy-strand replication origin (O H) and the light-strand replication origin (O L). Despite the importance of these structures in mtDNA replication, many aspects of their role remain unknown.
Known O L regions form characteristic stem-loop structures (Clary and Wolstenholme, 1987) and are located in the WANCY transfer RNA (tRNA) genomic region between the tRNA Asn and tRNA Cys genes in most vertebrates, although in some groups, such as birds, crocodilians or some reptiles, no obvious O L exists in this region (Pereira, 2000; Macey et al., 1997, 2000). In general, most studies support the strand-displacement replication model and point to a possible involvement of the O L in processes of the mtDNA molecule, such as gene rearrangements (Macey et al., 1997; Mauro et al., 2005), mutation gradients (Faith and Pollock, 2003; Raina et al., 2005) and nucleotide asymmetric compositional bias (Asakawa et al., 1991; Perna and Kocher, 1995). Recent studies have shown that genic regions flanking replication origins are hot spots for gene rearrangements (Macey et al., 2005; Mauro et al., 2005). A link between the displacement of the O L and the occurrence of gene rearrangements was suggested, since alternative initiation sites could destabilize the replication mechanism and so result in such rearrangements (Macey et al., 1997). Additionally, it has been shown that some tRNAs in primate mitochondria can potentially function as an alternative O L (Seligmann et al., 2006), which may reconcile much of the controversy regarding the strand-displacement model (Brown and Clayton, 2006). The thermodynamic stability of the O L has also been correlated with developmental stability in reptiles (Seligmann and Krishnan, 2006). An unstable O L could increase the number of mutations occurring during replication by delaying the formation of the replication fork and thus increasing the time that the DNA remains single stranded, which is known to be correlated with mutation rate (Tanaka and Ozawa, 1994) and, more specifically, with non-synonymous substitution patterns (Broughton and Reneau, 2006).
Groups of organisms with variable gene order are good models for studying factors associated with gene rearrangements (Dowton and Campbell, 2001). Among vertebrate groups, amphibians have a high occurrence of gene rearrangements, making them an appropriate model to test hypotheses regarding a link between the thermodynamic stability of the O L and gene rearrangements (Mauro et al., 2005; Mueller and Boore, 2005; Fonseca et al., 2006).
In the study described in this paper we examined the thermodynamic stability of the O L region and investigated the possible association of such stability with the incidence of gene rearrangements, including partial genome duplications (PGD).

Methodology
As mentioned above, most vertebrates present a small DNA region between the tRNA Asn and tRNA Cys genes that forms a stem-and-loop structure if the DNA is single-stranded, and some experimental studies have shown this structure to be essential for the initiation of mtDNA light-strand replication in mammals (Brennicke and Clayton, 1981; Hixson et al., 1986). Since this structure is generally conserved among vertebrates it has been assumed to be functional across all taxa. We examined 73 species for which the complete mitochondrial genome was available from NCBI until the end of August 2006 (Table 1). All the amphibians examined had an identifiable stem-and-loop structure within the WANCY region and we assumed that it corresponded to the mammalian O L.

Fonseca and Harris 567
Table 1 - Amphibian taxa (n = 73), gene order, thermodynamic stability values (ΔG), length of the L-strand replication origin (O L) region (stem in base pairs (bp), loop in nucleotides (nu)) and the presence or absence of a 5'-GCCGG-3' motif in the O L region. The species were classified into three different groups: species with a genome length < 17 kb and a "typical" vertebrate gene order (VERT, state 0); species with a genome length > 17 kb and partial genome duplications (PGD, state 1); or species with gene rearrangements (GRr, state 2). Numbers in parentheses represent the number of internal mismatches (bp) for the stem or the number of nucleotide mismatches in the internal loop.
A single-stranded DNA molecule can fold onto itself so that complementary bases pair in a similar way to that observed for RNA. The formation of this hairpin lowers the Gibbs free energy (ΔG) of the molecule (Zuker, 2000). To assess thermodynamic stability (measured as minimum free energy, ΔG), the O L sequences were folded using the DNA version of the Mfold program (Zuker, 2003), which determines the maximum number of base pairs (bp) capable of forming a secondary structure and calculates the free energy of that particular structure from the sum of the thermodynamic parameters for several motifs, including Watson-Crick base pairs, mismatches, hairpins and internal loops (SantaLucia and Hicks, 2004).
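To illustrate the kind of self-complementarity that underlies a stem-and-loop structure, the following minimal sketch searches a sequence for its longest perfectly paired hairpin stem. It is not the Mfold algorithm and it computes no ΔG: mismatches, internal loops and thermodynamic parameters are ignored. The function name, the loop-length bounds and the test sequence are assumptions introduced only for this illustration.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def longest_hairpin(seq, min_loop=3, max_loop=15):
    """Return (stem_length, stem_start, loop_length) for the longest
    perfectly paired hairpin stem, or (0, None, None) if none is found.

    Illustrative only: no mismatches, no internal loops, no free energy.
    """
    seq = seq.upper()
    n = len(seq)
    best = (0, None, None)
    for loop_start in range(1, n):
        for loop_len in range(min_loop, max_loop + 1):
            left = loop_start - 1          # base just 5' of the loop
            right = loop_start + loop_len  # base just 3' of the loop
            stem = 0
            # Extend the stem outwards while the two bases are complementary.
            while (left >= 0 and right < n
                   and COMPLEMENT.get(seq[left]) == seq[right]):
                stem += 1
                left -= 1
                right += 1
            if stem > best[0]:
                best = (stem, left + 1, loop_len)
    return best


# Hypothetical sequence: a 6 bp stem (GCGCAT / ATGCGC) around an AAAA loop.
EXAMPLE_SEQ = "TT" + "GCGCAT" + "AAAA" + "ATGCGC" + "CC"
EXAMPLE_HAIRPIN = longest_hairpin(EXAMPLE_SEQ)
```

A dedicated folding program remains necessary to obtain ΔG values comparable to those in Table 1, because stability depends on the stacking context of each base pair and on loop penalties, not only on stem length.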
Vertebrate mtDNA has highly conserved features, which include gene content, the lack of introns and small, or no, intergenic spacers. However, the length of the vertebrate mtDNA molecule is not conserved and its variation is typically associated with duplications in the control region, varying from less than 100 bp up to 8.0 kbp (Moritz, 1991), which means that species with or without partial genome duplications cannot be clearly separated. Nevertheless, based on the observation of all vertebrate mitochondrial genomes available, we considered 17 kbp to be an adequate limit for distinguishing between normal and abnormal genome length, Moritz (1991) having adopted a similar approach. The species were therefore classified into three different groups (Fonseca et al., 2006) as follows: species with a genome length < 17 kbp and a "typical" vertebrate gene order (VERT, state 0); species with a genome length > 17 kbp and partial genome duplications (PGD, state 1); or species with gene rearrangements (GRr, state 2).
We first tested the correlation between O L thermodynamic stability and the occurrence of GRr and/or PGD without taking phylogeny into account. The species were listed in order of decreasing O L stability (ΔG) and the list was then divided into two groups, one containing the 36 most stable ΔG values and the other the 36 least stable. Since the initial list had 73 values, the 37th value was excluded so that an equal number of values could be compared. We then used the non-parametric Wilcoxon signed rank test to ascertain whether species with GRr and/or PGD were equally represented in the two groups.
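The ranking and splitting step can be sketched as follows. This is only an illustration under stated assumptions: the function names and the seven toy records are hypothetical and are not taken from Table 1, the most negative ΔG is treated as the most stable, and the significance test itself is not reproduced.

```python
def split_by_stability(records):
    """Split (name, delta_g, state) records into the most stable and the
    least stable halves; with an odd count the middle record is dropped."""
    ranked = sorted(records, key=lambda r: r[1])  # most negative dG first
    half = len(ranked) // 2
    most_stable = ranked[:half]
    least_stable = ranked[-half:] if half else []
    return most_stable, least_stable


def count_unstable_genomes(group):
    """Count records in state 1 (PGD) or state 2 (GRr)."""
    return sum(1 for _, _, state in group if state in (1, 2))


# Hypothetical toy records: (species label, dG, state 0/1/2).
TOY_RECORDS = [
    ("sp_a", -15.0, 0), ("sp_b", -14.0, 0), ("sp_c", -13.0, 1),
    ("sp_d", -12.0, 0), ("sp_e", -11.0, 2), ("sp_f", -10.0, 2),
    ("sp_g", -9.0, 1),
]

STABLE_HALF, UNSTABLE_HALF = split_by_stability(TOY_RECORDS)
```

With 73 records the same function returns two groups of 36 and leaves out the 37th value, as described above; the counts from `count_unstable_genomes` for each half are the quantities being compared.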
For the phylogenetic analyses an estimated phylogeny for the amphibians analyzed was determined (Figure 1) based on a combination of published trees of comparative anatomical character evidence, mitochondrial and nuclear DNA sequences (Mauro et al., 2004; Macey, 2005; Mueller and Boore, 2005; Frost et al., 2006; Gissi et al., 2006; Zhang et al., 2006). All branch lengths were considered equal since the actual length of the branches does not usually substantially affect the results of phylogenetic analyses (Martins and Garland, 1991; Walton, 1993; Díaz-Uriarte and Garland, 1996; Irschick et al., 1996).
To test whether each trait was significantly associated with its phylogenetic history we conducted the test for serial independence (TFSI) described by von Neumann et al. (1941) on the continuously valued character (the ΔG of the O L) and the runs test on the discretely valued character (the occurrence of GRr and/or PGD) (Sokal and Rohlf, 1995; Abouheif, 1999) using the phylogenetic independence program of Reeve and Abouheif (2003). Both TFSI and the runs test were conducted with 10000 random shuffles of the original data and 10000 random rotations of each shuffle.
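The idea behind a randomization test for serial independence can be sketched as below. This is a simplified illustration, not the phylogenetic independence program cited above: it takes the tip values in a single fixed order, omits the random rotations of tree nodes, and uses a ratio that equals von Neumann's statistic only up to a constant factor (which does not change the permutation p-value). Function names and the seed are assumptions.

```python
import random


def successive_difference_ratio(values):
    """Sum of squared successive differences divided by the sum of squared
    deviations from the mean. Small values indicate that neighbouring
    observations are similar (positive autocorrelation)."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i + 1] - values[i]) ** 2 for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den  # undefined if all values are identical


def serial_independence_p(values, n_shuffles=10000, seed=0):
    """One-tailed permutation p-value for positive autocorrelation:
    the share of shuffles whose ratio is at most the observed ratio."""
    rng = random.Random(seed)
    observed = successive_difference_ratio(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if successive_difference_ratio(shuffled) <= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_shuffles + 1)
```

Values that change smoothly along the tip order give a small ratio and a small p-value, which is the pattern expected when related species resemble each other.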
The tests were assumed, a priori, to be one-tailed because there was good reason to predict that the direction of the autocorrelation in the two traits would be positive (self-similarity due to phylogenetic descent).
Figure 1 - Phylogenetic relationships of amphibians and history of character evolution. The estimated phylogeny was based on a combination of published trees of comparative anatomical character evidence, mitochondrial and nuclear DNA sequences (Mauro et al., 2004; Macey, 2005; Mueller and Boore, 2005; Frost et al., 2006; Gissi et al., 2006; Zhang et al., 2006). The parsimony reconstruction method was used for the history of character evolution. The left tree shows the O L thermodynamic stability (ΔG), with decreasing O L stability being color-coded as follows: violet (most stable), blue, green, yellow and red (least stable). The right tree shows the gene rearrangements, species with gene rearrangements being shown in black, species with partial genome duplications and a genome length > 17 kb being shown in green, while species with "normal" vertebrate gene order are in white.
When traits are significantly correlated to phylogeny a phylogenetically based comparative method (PCM) must be used to transform the data and to ensure that all of the historical non-independence in the data is accounted for. The PCM used was the independent contrasts (IC) technique of Felsenstein (1985). Using the IC technique we analyzed trait 2 (the occurrence of GRr and/or PGD) in a continuously valued form because states 0, 1 and 2 represent an increasing instability of the mtDNA (state 0: conserved gene order and genome length; state 1: conserved gene order with partial genome duplication; state 2: rearranged gene order, with or without partial genome duplication). Finally, we tested for a relationship between trait 1 and trait 2 using the Y contrast vs. X contrast (positivized) test in the PDAP:PDTREE program Version 1.07(1) (Garland et al., 1999; Garland and Ives, 2000; Midford et al., 2005) implemented in the Mesquite Modular System for Evolutionary Analysis (Maddison and Maddison, 2006).
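As a rough illustration of how independent contrasts are obtained, the sketch below computes standardized contrasts on a fully bifurcating tree with all branch lengths equal, then fits a regression through the origin between the contrasts of two traits. It is not the PDAP:PDTREE implementation; the nested-tuple tree encoding, the function names and the four-taxon example are assumptions made for this sketch, and polytomies are not handled.

```python
import math


def independent_contrasts(tree, trait, branch_length=1.0):
    """Standardized contrasts for a bifurcating tree.

    tree  : nested 2-tuples with tip names (str) as leaves
    trait : dict mapping tip name -> trait value
    Every branch is given the same length (branch_length).
    """
    contrasts = []

    def visit(node):
        # Returns (estimated value at node, adjusted branch length above it).
        if isinstance(node, str):
            return trait[node], branch_length
        left, right = node
        x1, v1 = visit(left)
        x2, v2 = visit(right)
        # Contrast standardized by the square root of the summed branch lengths.
        contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
        # Weighted average of the daughters, weights = 1 / branch length.
        value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
        # The branch above an internal node is lengthened to reflect the
        # uncertainty of its estimated value.
        return value, branch_length + v1 * v2 / (v1 + v2)

    visit(tree)
    return contrasts


def slope_through_origin(x_contrasts, y_contrasts):
    """Least-squares slope of Y contrasts on X contrasts, forced through 0."""
    sxx = sum(x * x for x in x_contrasts)
    sxy = sum(x * y for x, y in zip(x_contrasts, y_contrasts))
    return sxy / sxx


# Hypothetical four-taxon example.
TOY_TREE = (("a", "b"), ("c", "d"))
TOY_X = {"a": 1.0, "b": 3.0, "c": 5.0, "d": 9.0}
TOY_Y = {k: 2.0 * v for k, v in TOY_X.items()}
```

Because both traits are traversed in the same order, the two lists of contrasts are paired node by node. Flipping the sign of a pair so that the X contrast is positive ("positivizing") changes neither the slope through the origin nor the corresponding correlation.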

Results and Discussion
The Wilcoxon signed rank test showed a strong correlation (p = 0.001) between O L ΔG (trait 1) and the occurrence of GRr and/or PGD (trait 2), suggesting that species with unstable O L ΔG values were much more likely to have undergone GRr and/or PGD. Despite this correlation, species share parts of their evolutionary history, so the values of the two traits cannot be considered independent data points (Felsenstein, 1985, 1988). We therefore tested the phylogenetic independence of the two traits and found that they were significantly autocorrelated with phylogeny (p = 0.0001 for both traits).
After applying the IC technique, however, the TFSI detected no significant phylogenetic autocorrelation of the traits (p = 0.474 for trait 1, p = 0.162 for trait 2), indicating that the traits were adequately standardized in terms of the branch lengths and phylogeny used. We then applied the Y contrast vs. X contrast (positivized) test to ascertain the relationship between trait 1 (O L ΔG) and trait 2 (occurrence of GRr and/or PGD), but no significant correlation (p = 0.77) was detected between the traits and the null hypothesis could therefore not be rejected. Because the non-phylogenetic correlation can be attributed to shared evolutionary history, the association between O L ΔG and GRr and/or PGD was no longer significant once phylogeny was taken into account, although species with an unstable O L ΔG were much more likely to have undergone gene rearrangements (Figure 1).
The amphibians investigated showed a high incidence of GRr and PGD (36 out of 73) when compared with other vertebrate groups such as birds and mammals. However, within amphibians there are stable and unstable groups, and this pattern is probably relatively conserved through the phylogeny.
In our study, the order Gymnophiona (caecilians) showed heterogeneity in the stability of O L ΔG, gene order and genome length. Although the genome rearrangements observed appeared to be independent, the hypothesis of an inherited propensity to undergo GRr and/or PGD could not be rejected. More genome sequences within Gymnophiona are needed to clarify whether all of this group, or part of it (e.g., the family Caeciliidae), inherited genome instability or linked mechanisms that could cause mtDNA gene rearrangements or partial genome duplications.
On the other hand, we found that the species of the superorder Batrachia (composed of two taxa: Caudata and Anura) presented heterogeneity in the stability of O L ΔG, gene order and genome length, but with a strong association with phylogenetic history. Within Batrachia, all anuran species used in this study presented genome instability in terms of GRr and PGD and had the least stable O L ΔG values. In fact, in Anura the two different states of mitochondrial instability (GRr and PGD) appeared to be linked with phylogenetic history, with Phthanobatrachia species (Figure 1, node 62) presenting several GRr and the remaining anuran species showing a partial genome duplication within the major non-coding region of the mtDNA. Furthermore, Phthanobatrachia presented a common (ancestral) gene rearrangement, with some species also presenting additional gene rearrangements. It is possible that all Neobatrachia (composed of two clades: Heleophrynidae and Phthanobatrachia) have GRr, although more data would be needed to confirm this. If all Anura inherited genome instability then this characteristic has been present in this order for more than 268 million years, since the late Carboniferous to early Permian (Zhang et al., 2006).
Within the order Caudata (salamanders) some groups presented a stable O L ΔG and no GRr or PGD, e.g., Cryptobranchoidei (Figure 1, node 10) and Treptobranchia (Figure 1, node 27), but the group Plethodontidae plus Rhyacotriton variegatus (Figure 1, node 34) showed independent GRr and PGD (Mueller and Boore, 2005), suggesting that this group is the most unstable within amphibians in terms of mitochondrial gene order. Plethodontidae is a good candidate for studying GRr and PGD and the associated mechanisms because of the variability and independence of the recorded GRr (Mueller and Boore, 2005). Additionally, we hypothesize that amphiuma salamanders may also show mitochondrial gene rearrangements since the family Amphiumidae is closely related to Plethodontidae and Rhyacotritonidae (Frost et al., 2006). The genus Andrias (Figure 1, node 11) showed lower O L ΔG stability than the sister family Hynobiidae (Figure 1, node 12) but no GRr or PGD. Nevertheless, we suggest that further complete genome sequences within the Cryptobranchidae are required in order to identify any new GRr or PGD.
The results of this analysis can therefore be summarized as follows:
(1) Gene rearrangements are relatively common in amphibians, and particularly in Anura, plethodontid salamanders and Gymnophiona (Figure 1).
(2) All species investigated by us had a recognizable O L structure (Table 1) and its thermodynamic stability varied from ΔG = -15.47 (the most stable) to ΔG = -8.69 (the least stable). This further supports the strand displacement model of mtDNA replication (the only model assuming the existence of the O L), since it is unparsimonious that all these species would have this peculiar structure located in the same place and with the same capability to form a hairpin if it did not have any functionality.
(3) There was a correlation between O L ΔG and the occurrence of gene rearrangements, but this may reflect phylogeny.
(4) The length of the O L structure varied from 10 to 16 base pairs for the stem and from 3 to 15 nucleotides for the loop.
(5) The 5'-GCCGG-3' motif, located just downstream from the O L, was present in 37 of the 73 genome sequences. This motif has been described as being needed for efficient mtDNA replication in mammals (Hixson et al., 1986) and, due to its presence in other vertebrate groups, it was expected to be an essential conserved sequence. Nonetheless, in our study the species that did not present exactly the 5'-GCCGG-3' motif showed similar sequences at the same location.
Future studies should focus on sequencing the O L and investigating its thermodynamic stability (ΔG) as an indicator of probable gene order when the sequence of the whole genome is unknown, in much the same way as base composition at redundant sites can be used (Fonseca et al., 2006). Further research is needed to clarify what is causing the majority of gene rearrangements (GRr) and whether O L instability or some other mechanism is responsible for the instability of the molecule. While the O L is typically found in the WANCY region, in some invertebrates it appears within protein coding genes (Mizi et al., 2005). Within many groups, such as birds, the position of the O L is unknown. Under the strand displacement model of replication, both the expected mutation rates and the base compositions vary relative to the position of the O L. Further research is needed to determine the structure and position of the O L in both vertebrates and invertebrates, and to identify other genome regions that could act as alternative O L. It has been described for arthropods that substitution rates vary considerably between species, and that species with a highly rearranged genome have a significantly higher rate of sequence evolution (Xu et al., 2006). It would be useful to do the same analysis for vertebrates and to test whether rearranged genomes also have higher substitution rates.