ANNONACEAE SUBSTITUTION RATES – A CODON MODEL PERSPECTIVE

The Annonaceae includes cultivated species of economic interest and represents an important source of information for better understanding the evolution of tropical rainforests. In phylogenetic analyses of DNA sequence data that are used to address evolutionary questions, it is imperative to use appropriate statistical models. Annonaceae are cases in point: Two sister clades, the subfamilies Annonoideae and Malmeoideae, contain the majority of Annonaceae species diversity. The Annonoideae generally show a greater degree of sequence divergence compared to the Malmeoideae, resulting in stark differences in branch lengths in phylogenetic trees. Uncertainty in how to interpret and analyse these differences has led to inconsistent results when estimating the ages of clades in Annonaceae using molecular dating techniques. We ask whether these differences may be attributed to inappropriate modelling assumptions in the phylogenetic analyses. Specifically, we test for (clade-specific) differences in rates of non-synonymous and synonymous substitutions. A high ratio of nonsynonymous to synonymous substitutions may lead to similarity of DNA sequences due to convergence instead of common ancestry, and as a result confound phylogenetic analyses. We use a dataset of three chloroplast genes (rbcL, matK, ndhF) for 129 species representative of the family. We find that differences in branch lengths between major clades are not attributable to different rates of nonsynonymous and synonymous substitutions. The differences in evolutionary rate between the major clades of Annonaceae pose a challenge for current molecular dating techniques that should be seen as a warning for the interpretation of such results in other organisms.


INTRODUCTION
A variety of properties of Annonaceae that are useful to humans have been acknowledged for centuries.Edible fruits, such as cherimoya (Annona cherimola Mill.) have been transported in pre-Columbian times across tropical America (POZORSKI and POZORSKI, 1997).Later, in colonial times tropical fruits were carried all across the globe.By the 17 th century, guanabana (Annonamuricata L.) was already well established as a crop in the Old World tropics(e.g.ALGRA and ALGRA, 1978).In the 19 th an 20 th century, an era of many natural history explorations in the tropics, scientist documented the use of Annonaceae and showed the large extent to which species of this family are used on a local scale as food, medicine, construction material (e.g.BAILLON, 1868;FRIES, 1959).
These properties are still employed today, and research to document the way people exploit plant species is still advancing (e.g.VAN ANDEL, MYREN and VAN ONSELEN, 2012).Likewise, in the best tradition of the naturalists of the 19 th and 20 th century, the numerous observations on edible, medicinal and utilitarian Annonaceae continue to be reported on labels of herbarium sheets or in revisions and monographs (e.g.COUVREUR, 2009;MAAS and WESTRA, 1992).Many applied studies have focused on the cultivation of Annonas species and hybrids (e.g.GEORGE and NISSEN, 1987), providing a wealth of observations on patterns of variation, crossability of species, and selection of traits .
Insofar as these studies dealt with historical aspects, time spans involved were not that long, i.e. centuries.The history of cultivated Annonaceae over evolutionary timescales could nevertheless provide valuable and practical information.With the availability of DNA sequence data and the analytical toolbox offered by phylogenetics and population genetics, it has become feasible to explicitly test hypotheses on the evolutionary origin of the cultivated species of Annonaceae, on the origin of phenotypic traits, on phylogenetic relatedness of congeneric species, and on centres of diversity and areas of origin of specific clades.The early 21 st century saw the first phylogenetic analyses of DNA sequence data, providing a test of previous classifications of the Annonaceae (DOYLE, BYGRAVE and LE THOMAS, 2000).Recently, the classification of the Annonaceae has been revised based on phylogenetic analyses of chloroplast DNA sequence data representing almost all genera (CHATROU et al., 2012).This provides a general framework that allows us to zoom in on any clade of choice, such as those including cultivated species.Phylogenetic analyses have focused on species relationships and the geographic origin of ylang-ylang (Cananga odorata (Lam.)Hook.f.& Thomson) and close relatives (SURVESWARAN et al., 2010).The phylogenetic tree of the Annona muricata L. and 21 congeners (including species of the former genus Rollinia; RAINER, 2007) has been inferred based on cpDNA and SSR flanking region sequences (CHATROU et al., 2009).Annona cherimola was not included in the latter analyses, which due to their limited sampling provide at best a coarse outline of the phylogenetic relationships in Annona.
The genus Annona belongs to the subfamily Annonoideae, which has also been referred to as the 'long branch clade ' (PIRIE et al., 2005;RICHARDSON et al., 2004).The Annonoideae represents ca 60 % of the species diversity of the family, and are sister to subfamily Malmeoideae (the 'short branch clade') that contains ca 35% of the species diversity.Monophyly of both clades is well supported by maximum parsimony, maximum likelihood and Bayesian analyses (CHATROU et al., 2012).A curious phenomenon apparent from these results is that the average branch length, representing the degree of chloroplast DNA divergence, from the long branch crown node to tips is longer than that of the short branch clade (hence the informal names for these clades).There is no clear explanation for the differences in branch lengths.If they are interpreted at face value as representing real differences in molecular evolutionary rate, it is unclear in what manner (gradual or abrupt) and at what point subsequent to the divergence of their last common ancestor (immediately or more recently) such rate shift might have taken place (PIRIE and DOYLE, 2012).Uncertainty on this point has led to inconsistent age estimates derived from dating analyses using methods that make different assumptions about patterns of rate variation (COUVREUR et al., 2011;PIRIE and DOYLE, 2012;RICHARDSON et al., 2004).Molecular dating analyses assuming autocorrelation of rates among lineages resulted in roughly equal ages for the crown nodes of short and long branch clades (66 Ma and 68 Ma respectively) (PIRIE and DOYLE, 2012).However, applying relaxed clock models instead assuming a lognormal distribution of rates, the estimates differed considerably (33 Ma and 66 Ma respectively) (COUVREUR et al., 2011).This discrepancy has implications for the inference of ANNONACEAE SUBSTITUTION RATES... how the pan-tropical distribution of Annonaceae originated, as well as representing a potential problem with this important tool -the time-calibrated phylogenetic tree -for evolutionary inference in general.
One aspect of the skewed branch length distribution that needs further scrutiny is whether the differences between the two subfamilies may be attributed to inappropriate modelling assumptions in the phylogenetic analyses.The differences in branch lengths, combined with the differences in clade sizes, might suggest a higher chloroplast DNA nucleotide substitution rate for the long branch clade.This phenomenon would be of particular interest as it has been demonstrated to be correlated to species richness in Proteaceae (DUCHENE and BROMHAM, 2013).Differences in substitution rate could be the result of overall increased rate, such as resulting from shorter generation time or inherited differences in fidelity of DNA replication (DRUMMOND et al., 2006), both of which are not obviously the case in Annonaceae.Alternatively, the occurrence of biparental inheritance of plastids would result in a doubled effective population size N e (as compared with uniparental transmission) and hence could affect substitution rates (CROSBY and SMITH, 2012;DUCHENE and BROMHAM, 2013).Such an explanation would however be highly speculative: biparental inheritance of plastidsis not common in angiosperms and we have no evidence for the phenomenon in Annonaceae.
It is important to understand how branch lengths are modelled in our phylogenetic trees.In contrast with nucleotide models, codon models can be used for the detection of positive selection, or the assessment of silent and non-silent substitution rate shifts (e.g.YANG and NIELSEN, 2000).Where such phenomena are apparent, they can provide more accurate branch length estimation, expressed as substitutions per codon.For instance, extensive variation in silent substitution rates in plant mitochondrial DNA was inferred using codon models (BAKKER, BREMAN and MERCKX, 2006;MOWER et al., 2007), and in some cases it was found that the rate-variation was due entirely to silent substitutions.When inferring correlations between molecular evolutionary parameters and gross morphological trends, Lartillot and Poujol, (2011) made extensive use of codon models, as nucleotide models would have obscured several of the patterns the authors describe.Estimation of branch lengths on a phylogenetic tree becomes critical when using the estimates for subsequent dating analysis, for instance involving some fossil calibration.Especially when dealing with large branch length differences on a phylogenetic tree, obtaining a good posterior distribution on rates can become flawed (DORNBURG et al., 2012;WERTHEIM, FOURMENT and KOSAKOVSKY POND, 2012), and are difficult to handle using current clock models.
Here we present new explorative analyses of chloroplast sequences of Annonaceae using codon models.We revisit published data and assessed whether that way more even branch length distributions between long and short branch clades can be obtained.

MATERIALS AND METHODS
A 129 sequence matrix comprising the rbcL, matK and ndhF genes, representing all main Annonaceae clades, was compiled from CHATROU et al., (2012), with some additional sequences that have been published in PIRIE et al., (2005), PIRIE et al., (2006), and WANG et al., (2012).The dataset included 66 species from the long branch clade (subfamily Annonoideae), 55 species from the short branch clade (Malmeoideae), 7 species from the basal grade of Annonaceae (subfamilies Ambavioideae and Anaxagoreoideae).Eupomatia bennetti F.Muell.(Eupomatiaceae) was assigned as outgroup.Sequences for all species were complete, i.e. there were no missing data.The dataset was subjected to RAxML (STAMATAKIS, 2006) analysis using the GTR-Γ/CAT model at the CIPRES Science Gateway (MILLER, PFEIFFER and SCHWARTZ, 2010), with a total of 100 bootstrap replicates performed.The dataset was not partitioned.The resulting 'most likely' phylogenetic tree was then used as input for subsequent codon modelling in PAML v. 4.5 (YANG, 2007), which was also used to produce d N and d S trees.The APE library (PARADIS, CLAUDE and STRIMMER, 2004) was applied in order to tabulate and compare branch lengths in this tree, in order to quantify overall branch length differences.

RESULTS AND DISCUSSION
RAxML analysis of the 129 species data matrix resulted in the tree that is given in Fig. 1.The Malmeoideae (mean branch length = 0.00241) and Annonoideae (mean branch length = 0.00724) clades appear to have a significantly different distribution of branch lengths (p < 0.0001; based on Student's t-test; data not shown).The phylogenetic matrix comprised 106 codons, representing all three genes and combined with the RAxML tree topology, the L. W. CHATROU et al.
F3x4 codon frequency model was applied in PAML under various branch and site model combinations (Table 1).First we assessed whether clock differences across our tree can be inferred using branch lengths from codon models.Using M=0 (all branches assumed to have the same d n /d s ratio ω) the LnL of observing the codon data was calculated both with and without incorporating a (strict) global clock parameter.Perhaps surprisingly, 'switching on' a global clock improved the LnL significantly, indicating that under a codon model, no substantial rate differences can be observed across the tree used.Subsequently, the analysis was repeated with the Malmeoideae clade being allowed to have its own rate ('foreground rate').This resulted in worse model fit, indicating that, given a codon model, a separate model for Malmeoideae is unsupported by the data.With regards silent versus non-silent substitution rate differences we then set-up an analysis allowing ω to vary between foreground and background clades (M=2).This resulted in slightly improved LnLs, indicating that there could be a difference in d n /d s ratio between Malmeoideae and the other main clades.However, upon visual inspection of the separate d n and d s trees (Fig. 2) there does not appear to be such a difference: the disparity in branch lengths is still apparent.It appears that the d n and d s change approximately in parallel across the tree.We note that PAML does not allow modelling both a clock and different ω per branch simultaneously.Therefore, we cannot conclude whether this would have improved the fit to the data further.Extending the PAML analyses to include additional foreground branches (i.e.Ambavioideae, Greenwayodendron) did not alter this pattern, indicating that the Malmeoideae can be considered not to differ significantly from the other clades, from the perspective of codon modelled branch lengths.
The d n and d s trees that we found mirror the branch length patterns that have been found in Annonaceae phylogenetics thus far, regardless of the phylogenetic inference method (CHATROU et al., 2012;COUVREUR et al., 2011;RICHARDSON et al., 2004).It allows eliminating a possible source of error in molecular dating analyses of the family.There are no underlying differences in synonymous and non-synonymous rates between the two major clades of the Annonaceae that could have caused the erroneous modelling of substitution rates in previous analyses.Strong convergence of chloroplast DNA sequences due to positive selection (as found in reptile mitochondrial DNA, for example; CASTOE et al., 2009), mimicking similarity by shared ancestry, can be ruled out as a possible confounding factor in Annonaceae phylogenetics.
One unusual result of the RAxML analyses is the sister group relationship between the two subfamilies Annonoideae and Ambavioideae (Figure 1).The node connecting these two clades receives 94% bootstrap support.This contradicts previous, consistent, results of maximum parsimony, maximum likelihood and Bayesian analyses of various datasets, including those of much larger datasets containing more species as well as more sequence data (CHATROU et al., 2012;COUVREUR et al., 2011).These inferred a more basal position of Ambavioideae, as sister to a clade comprising Annonoideae and Malmeoideae.Various explanations are possible for the aberrant position of the Ambavioideae found in the RaxML analyses we present here.One might relate it to the reduced taxon sampling employed, both overall and in the exclusion of particular key taxa.Meiocarpidium lepidotum is a taxon whose absence (due to the unavailability of an ndhF sequence) in these analyses could conceivably be important.Meiocarpidium, a monotypic genus from Africa, is sister to the other species of the Ambavioideae, and its presence would have broken up the relatively long branch subtending this subfamily.The sampling of other, more derived, species of Ambavioideae is reduced too compared to Chatrou et al. (2012).However, the Malmeoideae/Annonoideae sister group relationship has been inferred consistently in previous analyses (e.g.COUVREUR et al., 2008;PIRIE et al., 2006;RICHARDSON et al., 2004), irrespective of whether Meiocarpidium was included.The deviant phylogenetic position of the Ambavioideae could alternatively be explained by the particular reduced character sampling that was employed compared to more extensive phylogenetic analyses of the family -i.e., our focus on coding genes only, to the exclusion of more variable non-coding spacer and intron sequences.Signs of the impact of this character exclusion are low support values for nodes that previously were well supported, such as the crown node of the clade containing the tribes Malmeeae, Maasieae, Monocarpieae and Miliuseae, and the crown node of the Miliuseae.However, simple reduction in the number of characters seems an inadequate explanation for the high support for the Annonoideae/Ambavioideae sister-group relationship inferred here.This result may therefore point rather to systematic error associated with the use of coding data with an inappropriate (likely overly simple) substitution model, which in turn might imply that codon or partitioned substitution models may be more appropriate both in Annonaceae ANNONACEAE SUBSTITUTION RATES... and in the many studies that use (more or less exclusively) coding data, particularly rbcL (also employed here) because it is widely available and easily aligned.
Having ruled out the possible effect of differences in non-synonymous and synonymous rates on branch length differences in Annonaceae, the question remains what possible mechanisms can cause the disparity, especially between Malmeoideae and Annonoideae.The answer to that question would have important repercussions, for example for the modelling of future molecular dating analyses.Firstly, it should be noted that we, and all studies on phylogenetics and molecular dating of Annonaceae so far, have analysed chloroplast DNA sequences.It would be important to investigate Annonaceae phylogenetics from a nuclear perspective, to establish whether the lineage-specific effects that we have observed so far would be upheld.Smith and Donoghue (2008), in the most comprehensive study of angiosperm substitution rates so far, found lineage-specific differences by analysing nuclear, mitochondrial and chloroplast data simultaneously.Though not mentioned explicitly, this concatenated approach reflects the notion that lineage-specific differences in substitution rates are manifest in all three plant genomes.In the absence of data from other genomes, this can be considered the standing hypothesis for Annonaceae; it is however one that should be tested.
Correlations have been suggested between substitution rates and generation time (GAUT et al., 2011, and references therein), and height of plants (LANFEAR et al., 2013).None of these mechanisms seem to be applicable to Annonaceae, however. Barraclough and Savolainen, (2001) and some additional studies reviewed in Gaut et al., (2011), demonstrated the correlation between diversification rate and molecular substitution rate.In Annonaceae, Erkens, Chatrou and Couvreur (2012) found an increase in diversification rate in a large derived clade within the Annonoideae, and a decrease in the Miliuseae, nested in the Malmeoideae, but the relationship was weak.We expect that further analyses of phylogenetic patterns in Annonaceae, based on a more intensive taxon and character sampling and on further scrutinizing of model assumptions, will allow us to disentangle the effects of substitution rates and time on branch lengths, and will start to unveil the mechanisms behind the branch lengths differences.In the absence of direct fossil evidence this may be our best hope of discerning between relaxed clock models and inferring accurately the ages of clades within Annonoideae and particularly Malmeoideae.

CONCLUSIONS
We demonstrated that the two major clades of Annonaceae, the Annonoideae and the Malmeoideae, have significantly different branch length distributions.This difference, however, is not attributable to different rates of non-synonymous and synonymous substitutions.The differences in evolutionary rate between the major clades of Annonaceae pose a challenge for current molecular dating techniques, leading to conflicting results depending on the underlying assumptions of the methods.This issue is obvious in Annonaceae, but should also be seen as a warning for the interpretation of such results in other organisms.

FIGURE 2 -
FIGURE 2-RAxML topology with PAML codon branch lengths (substitutions per codon), for nonsynonymous (d n ) and synonymous substitutions (d s ) separately.Both trees have the same scale, i.e. branch lengths are proportional to the same substitutions / site ratio; ω Malm = 0.33 ω Rest = 0.36.

TABLE 1 -
Summary of the results of the PAML analyses.Malm = Malmeoideae, Amb = Ambavioideae, Gre = Greenwayodendron.* Relative rate, in substitutions per codon; background rates are set to 1.The best LnL value is indicated in bold.
FIGURE 1-Phylogram of the 129-species data set obtained from RAxML analysis.Black-filled circles indicate bootstrap support ≥ 95%.Grey-filled circles indicate bootstrap support between 85% and 94%.Bootstrap support values lower than 85% are not indicated.