An enhanced calibration of a recently released megatree for the analysis of phylogenetic diversity

Dated or calibrated phylogenetic trees, in which branch lengths correspond to evolutionary divergence times between nodes, are important requirements for computing measures of phylogenetic diversity or phylogenetic community structure. The increasing knowledge about the diversification and evolutionary divergence times of vascular plants requires a revision of the age estimates used for the calibration of phylogenetic trees by the bladj algorithm of the Phylocom 4.2 package. Comparing the recently released megatree R20120829.new with two calibrated vascular plant phylogenies provided in the literature, we found 242 corresponding nodes. We modified the megatree (R20120829mod.new), inserting names for all corresponding nodes. Furthermore, we provide files containing age estimates from both sources for the updated calibration of R20120829mod.new. Applying these files consistently in analyses of phylogenetic community structure or diversity serves to avoid erroneous measures and ecological misinterpretation.


Introduction
Dated or calibrated phylogenetic trees, in which branch lengths correspond to evolutionary divergence times, are important requirements for computing measures of phylogenetic diversity (Faith, 1992) or phylogenetic community structure (Webb et al., 2002; Gastauer and Meira-Neto, 2014). Correct calibration is important to avoid ecological misinterpretation: a correctly calibrated tree can indicate a phylogenetic community structure different from that indicated by a wrongly calibrated tree (Gastauer and Meira-Neto, 2013). Achieving an updated calibration, however, is still not an easy task. Furthermore, constantly increasing knowledge about the phylogenetic relationships among vascular plants, especially angiosperms (Smith et al., 2011), and about the diversification times of different clades (e.g., Magallón and Castillo, 2009; Lemaire et al., 2011; Magallón et al., 2013) requires regular revisions of the database used to achieve updated tree calibrations.
The rapid increase in information results in the periodic release of new, updated megatrees, i.e., phylogenetic hypotheses containing all euphyllophyte families (GitHub, 2014). For phylogenetic community analyses or the computation of phylogenetic diversity measures, these megatrees may be pruned down to a user-supplied list of species by the phylomatic function of the Phylocom 4.2 package (Webb and Donoghue, 2005). In this procedure, all species are inserted as terminals, and branches of the megatree without terminals are cut. Internal node names are maintained. For the calibration of such community trees, the bladj algorithm from the Phylocom package is run (Webb et al., 2008). This algorithm recognizes the names of internal nodes and dates them according to clade dating information provided in a separate ages file. Internal nodes without a match are dated by interpolation, i.e., they are spaced evenly between nodes with fixed divergence times.
The Phylocom package includes the wikstrom.ages file, which provides the minimum age estimates for 176 internal nodes within the angiosperms proposed by Wikström et al. (2001). Revised age estimates are available (e.g., Bell et al., 2010), but have not yet been incorporated into the Phylocom package.
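The interpolation of undated nodes described above can be illustrated with a minimal Python sketch. It is not the Phylocom implementation: the tree representation (a dictionary of child lists), the function name `interpolate_ages`, and the rule used when a node has several descending paths (the node takes the oldest of the evenly spaced candidate ages, so that it stays older than every dated descendant) are assumptions made for illustration only. Terminals are assumed to have age 0.

```python
def interpolate_ages(root, children, fixed_ages):
    """Date undated internal nodes by spacing them evenly between fixed nodes.

    root       -- name of the root node (must have a fixed age)
    children   -- dict mapping a node name to the list of its child names
    fixed_ages -- dict mapping named, dated nodes to their age (Myr)
    Terminals without a fixed age are treated as present-day (age 0).
    """
    if root not in fixed_ages:
        raise ValueError("the root needs a fixed age")
    ages = {}

    def constraints(node):
        # For every path leading down from `node`, return the age of the first
        # fixed node (or terminal) and the number of undated nodes on the way,
        # counting `node` itself.
        if node in fixed_ages:
            return [(fixed_ages[node], 0)]
        kids = children.get(node, [])
        if not kids:
            return [(0.0, 0)]
        found = []
        for kid in kids:
            found.extend((age, k + 1) for age, k in constraints(kid))
        return found

    def assign(node, parent_age):
        if node in fixed_ages:
            age = fixed_ages[node]
        elif not children.get(node):
            age = 0.0
        else:
            # Even spacing along each path; keep the oldest candidate so the
            # node remains older than all fixed descendants.
            age = max(parent_age - (parent_age - d) / (k + 1)
                      for d, k in constraints(node))
        ages[node] = age
        for kid in children.get(node, []):
            assign(kid, age)

    assign(root, fixed_ages[root])
    return ages


# Hypothetical example: two undated nodes between a 90 Myr root and a terminal.
tree = {"root": ["a"], "a": ["b"], "b": ["tip"]}
print(interpolate_ages("root", tree, {"root": 90.0}))
```

In this hypothetical chain, the two undated nodes are placed at one third and two thirds of the distance between the root and the terminal. The sketch recomputes path constraints at every node, which is adequate for small examples but not for large megatrees.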
Therefore, the aim of this note is to outline how community trees derived from the megatree R20120829.new, recently released for phylogenetic community analysis and the computation of phylogenetic diversity among vascular plants, can be calibrated with the most recent age estimates available.
To this end, we identified as many internal nodes as possible, distributed over the entire megatree, for which recent age estimates are available in the literature, and we provide these estimates.

Material and Methods
We compared the topology of R20120829.new with two comprehensive revisions of vascular plant diversification (Hedges and Kumar, 2009; Bell et al., 2010). Corresponding nodes, i.e., nodes with the same descending clades or taxa, were identified. If not already named by the authors of R20120829.new, these corresponding nodes were named within the plain text archive of the megatree.
Age estimates from both revisions were pooled to provide updated ages files for tree calibration. In several cases, two or three subsequent nodes within the same lineage were assigned identical ages in the literature. In these cases, the bladj algorithm calibrates the node that appears first in the ages file and ignores the age estimates for the other node(s), which are instead dated by interpolation between fixed divergence times. This distorts the tree and alters patterns of phylogenetic diversity, because the subsequent node(s) are always fixed to more recent divergence times than those given in the literature. To avoid this, we corrected the age estimate of the most recent node of the sequence by -0.1 Myr; in the case of three subsequent nodes with identical age estimates, the oldest one was additionally corrected by +0.1 Myr. With this procedure, the topology of R20120829.new is conserved.
To illustrate advances in tree calibration, age estimates of these corresponding nodes were used to calibrate a hypothetical community phylogeny that contains two species from each monophyletic family from R20120829.new. The resulting trees were visualized with FigTree v1.4.2.
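The identification of corresponding nodes can be sketched in Python as a comparison of the sets of terminal taxa descending from each internal node. This is an illustrative sketch only, not the procedure actually used for the megatree: it assumes that both trees have been reduced to the same level of terminal taxa (for example, families), that trees are stored as dictionaries of child lists, and all node and taxon names below are hypothetical.

```python
def clade_sets(children, root):
    """Map every internal node to the frozenset of terminal taxa below it."""
    result = {}

    def walk(node):
        kids = children.get(node, [])
        if not kids:
            return frozenset([node])  # a terminal taxon
        tips = frozenset().union(*(walk(kid) for kid in kids))
        result[node] = tips
        return tips

    walk(root)
    return result


def corresponding_nodes(tree_a, root_a, tree_b, root_b):
    """Pair internal nodes of tree A and tree B that share the same terminals."""
    by_clade_b = {tips: name for name, tips in clade_sets(tree_b, root_b).items()}
    return {name: by_clade_b[tips]
            for name, tips in clade_sets(tree_a, root_a).items()
            if tips in by_clade_b}


# Hypothetical trees: only the clade (taxon_A, taxon_B) occurs in both.
megatree = {"rootA": ["n1", "taxon_C"], "n1": ["taxon_A", "taxon_B"]}
dated_tree = {"rootB": ["m1", "taxon_D"], "m1": ["taxon_B", "taxon_A"]}
print(corresponding_nodes(megatree, "rootA", dated_tree, "rootB"))
```

Because the two sources differ in taxon sampling, an exact match of descendant sets is a strict criterion; in practice, matching may also require judging which clades are equivalent despite different sampling.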
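The correction of identical age estimates can be expressed as a short Python sketch. The function names and the node names in the example are hypothetical, and the output layout (one node name and one age per line, separated by a tab) is an assumption about the ages file format that should be checked against the Phylocom documentation.

```python
def break_age_ties(chain, ages, step=0.1):
    """Separate identical ages of two or three subsequent nodes of one lineage.

    chain -- node names ordered from the oldest (nearest the root) to the
             most recent (nearest the terminals)
    ages  -- dict mapping node names to age estimates (Myr)
    The most recent node is shifted by -step; for three nodes, the oldest
    one is additionally shifted by +step. The input dict is not modified.
    """
    if len(chain) not in (2, 3):
        raise ValueError("expected a sequence of two or three nodes")
    if len({ages[name] for name in chain}) != 1:
        raise ValueError("the nodes do not share one identical age estimate")
    corrected = dict(ages)
    corrected[chain[-1]] = round(ages[chain[-1]] - step, 6)
    if len(chain) == 3:
        corrected[chain[0]] = round(ages[chain[0]] + step, 6)
    return corrected


def format_ages_file(ages):
    """Return the ages as tab-separated 'name<TAB>age' lines (assumed layout)."""
    return "".join(f"{name}\t{age}\n" for name, age in ages.items())


# Hypothetical node names and ages, for illustration only.
estimates = {"clade_outer": 100.0, "clade_mid": 100.0, "clade_inner": 100.0}
fixed = break_age_ties(["clade_outer", "clade_mid", "clade_inner"], estimates)
print(format_ages_file(fixed))
```

After the correction, every node in the sequence has a distinct age that decreases from the root towards the terminals, so each estimate can be applied by the calibration step.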

Results and Discussion
Bell et al. (2010) revised the age estimates of Wikström et al. (2001) by dating an angiosperm phylogeny using a relaxed molecular clock calibrated by 36 fossils considered as minimum ages of the most recent common ancestor. They used two slightly different approaches to estimate node ages: one in which these minimum age constraints from fossil data were included as an exponential distribution (BEAST a analysis in the paper's terminology) and one in which a lognormal distribution was used (BEAST b). Accordingly, two sets of calibration points, labeled ages_exp and ages_logn, are available from the authors in the Supplementary Material*; either of them might be used to arrive at a revised calibration of the angiosperm clade in a user-specific community tree, provided the user is aware of the possible shortcomings and pitfalls of both approaches. Furthermore, age estimates for gymnosperms and ferns, among other clades, have been compiled by Hedges and Kumar (2009).
We identified 242 corresponding nodes (Table 1). Of these nodes, 26 were identified by comparing gymnosperm and fern clades from Hedges and Kumar (2009) with R20120829.new. The other nodes represented correspondences between Bell et al. (2010) and R20120829.new. Node names lacking in the plain text archive of the megatree were added. This modified R20120829mod.new is available in the Supplementary Material*.
Age estimates of all 242 nodes are available as ages files from the authors (Table 1). Four groups of subsequent nodes within the same lineages were estimated to have identical ages by Bell et al. (2010). As shown in Table 2, we corrected these age estimates by +0.1 or -0.1 Myr to avoid distortion of the community tree.
Calibrating the modified version R20120829mod.new with the provided ages files produces a phylogeny that differs substantially from the original file (R20120829.new) calibrated with the age estimates from Wikström et al. (2001), especially regarding gymnosperms and ferns (Figures 1 and 2), because these clades were previously not covered by the wikstrom.ages file. It is important to note that the differences between the trees result from varying divergence times, while tree topology is maintained. Such differences indicate that phylogenetic community analyses can generate different results depending on the chosen databases, which can lead to ecological misinterpretation if based on incorrect data (Gastauer and Meira-Neto, 2013).
* Supplementary Material: http://www.leep.ufv.br/pt-BR/noticia/pesquisadores-da-floresta-escola-lancam-artigocientifico-sobre-computacao-da-diversidade-filogenetica

Figure 1. Hypothetical trees of a community composed of two species from each monophyletic family from APG III (APG III, 2009), inserted in the original megatree R20120829.new calibrated by agesclIII (upper tree; see Gastauer and Meira-Neto, 2013 for details) and in R20120829mod.new calibrated by ages_exp (lower tree).

Figure 2. Hypothetical community trees containing two species from each monophyletic family from APG III (APG III, 2009), resulting from insertion in the original megatree R20120829.new calibrated by agesclIII (upper tree; see Gastauer and Meira-Neto, 2013 for details) and in R20120829mod.new calibrated by ages_logn (lower tree).
1 Exponential distribution of minimum age constraints from fossil data. 2 Lognormal distribution.

Table 2. Subsequent nodes in R20120829mod.new with identical age estimates according to the exponential distribution (BEAST a) or lognormal distribution (BEAST b) of minimum age constraints in Bell et al. (2010), together with the corrected diversification times in ages_exp and ages_logn (available from the Supplementary Material*).