A comparison of metrics for estimating phylogenetic signal under alternative evolutionary models

Several metrics have been developed for estimating phylogenetic signal in comparative data. These may be important both in guiding future studies on correlated evolution and for inferring broad-scale evolutionary and ecological processes (e.g., phylogenetic niche conservatism). Notwithstanding, the validity of some of these metrics is under debate, especially after the development of more sophisticated model-based approaches that estimate departure from particular evolutionary models (i.e., Brownian motion). Here, two of these model-based metrics (Blomberg’s K-statistics and Pagel’s λ) are compared with three statistical approaches [Moran’s I autocorrelation coefficient, coefficients of determination from the autoregressive method (ARM), and phylogenetic eigenvector regression (PVR)]. Based on simulations of a trait evolving under Brownian motion for a phylogeny with 209 species, we showed that all metrics are strongly, although non-linearly, correlated to each other. Our analyses revealed that statistical approaches provide valid results and may be still particularly useful when detailed phylogenies are unavailable or when trait variation among species is difficult to describe by more standard Brownian or O-U evolutionary models.


Introduction
Phylogenetic signal was defined by Blomberg and Garland (2002) "...as a tendency for related species to resemble each other more than they resemble species drawn at random from a tree". The lack of statistical independence among species implied by a phylogenetic signal precludes the use of traditional statistical tests in studies of correlated evolution (Felsenstein, 1985). However, it is important to note that the lack of independence should first be tested, and not simply assumed a priori, before using phylogenetically-based statistical methods (Gittleman and Kot, 1990). Thus, one initial motivation for estimating phylogenetic signal was to establish which (in any) correction must be made to take the phylogenetic relationships among species into account (see Martins and Garland 1991;Martins et al., 2002).
In addition to this methodological issue, there has been a growing interest on how phylogenetic signal can be used to infer broad-scale evolutionary and ecological processes (Martins, 2000;Diniz-Filho, 2001;Cooper et al. 2010;Hof et al., 2010; but also see Revell et al., 2008;Losos, 2008). For instance, when a trait is positively autocorrelated across the phylogeny, the most common explanation involves inheritance from a common ancestor or phylogenetic niche conservatism. Negative phylogenetic autocorrelations (i.e., when close relatives are more different in a given trait than randomly chosen pairs of taxa), although not so common, may arise due to recent events of evolutionary divergence induced by character displacement.
Currently, there are several ways to quantify phylogenetic signal in comparative data (see Blomberg and Garland, 2002). The earliest techniques were based on statistical methods (e.g., phylogenetic autocorrelation coefficients, phylogenetic correlograms and autoregressive models), that quantify the level of phylogenetic autocorrelation for a given trait of interest throughout the phylogeny (see Cheverud et al., 1985;Gittleman and Kot, 1990;Diniz-Filho et al., 1998), or under non-stationary processes . When more detailed and accurate phylogenies are available, it is also possible to ascertain the expected divergence between species by assuming a theoretical model of trait evolution, and thus derive modelbased metrics to compare expected and observed diver-gences (e.g., Pagel, 1999;Freckleton et al., 2002;Blomberg et al., 2003). In this case, Brownian motion is the most commonly used reference model for inferring changes in quantitative traits. Under Brownian motion, phenotypic divergence (covariance) among species increases linearly with time (Felsenstein, 1985(Felsenstein, , 1988Hansen and Martins, 1996), and this can be interpreted as resulting from a pure neutral evolutionary model or from rapid and independent responses of species traits to randomly changing environments (e.g., see Hansen et al., 2008).
In spite of the debate on the relative merits of statistical and model-based approaches (e.g., Martins et al., 2002), they usually produce similar results when evaluating evolutionary correlations between traits (see Martins et al., 2002;Diniz-Filho and Tôrres, 2002). Thus, it is likely that these approaches would also provide similar results when estimating phylogenetic signal (for a theoretical evaluation of the metrics underlying phylogenetic autocorrelation analysis see Pavoine et al., 2007;Diniz-Filho et al., 2012a). Here we use simulated data and show that, for a constant phylogeny, all metrics are comparable and provide similar results, despite their different statistical and conceptual backgrounds. Focus was not placed on the statistical performance of well-established methods (Martins 1996;Diniz-Filho et al., 2012a), but rather on the correlations among estimates across evolutionary models.

Simulations
We performed simulations for a trait with zero mean and unity variance evolving under Brownian motion. We also generated simulations for Ornstein-Uhlenbeck (O-U) processes with a-parameter equal to 2, 4, 6, 8 and 10, which tend to progressively eliminate phylogenetic signal (see Diniz-Filho, 2001). For each parameter, 200 simulations were performed using the routine PDSIMUL of "Phenotypic Diversity Analysis Program" (PDAP) (Garland et al., 1993).
In all simulations, the phylogenetic relationships among the 209 species were based on the terrestrial Carnivora supertree (Bininda-Emonds et al., 2008). This phylogeny was used only as a reference for Brownianmotion evolution and to represent a real topology and observed distribution of branch lengths. The phylogenetic distances among species were expressed as patristic distances, forming the matrix D, or conversely as phylogenetic covariances (matrix C), the proportion of shared branch length from root to tip between pairs of species (in this case with an ultrametric tree, the main diagonal was set to 1.0, so that this C matrix can also be viewed as a pairwise matrix of phylogenetic correlations between species -see Helmus et al., 2007). This C matrix is the expectation correlation among phenotypes under Brownian motion (Hansen and Martins, 1996).

Metrics for phylogenetic signal
Five methods for measuring phylogenetic signal were used for each simulation, which can be broadly divided into "statistical" and "model-based" approaches. A detailed description of these metrics can be found elsewhere (Cheverud et al., 1985;Kot, 1990, Diniz-Filho et al. 1998;Pagel, 1999;Freckleton et al., 2002;Blomberg et al., 2003). First, for statistical-based metrics, we used the Moran's I autocorrelation coefficient, which is given by where n is the number of species, y i and y j are the trait values in a vector Y for species i and j (i.e., in this case the values of species is simulated under a given process -see above), with average y, and w ij are the elements of the weighting matrix W with the pairwise phylogenetic relationship between species. Here the matrix W was replaced by the phylogenetic covariance (matrix C). We also estimated Moran's I in the first class of a phylogenetic correlogram, by using a binary weighting matrix W, with values of one indicating pairs of species that are separated by less than 12 million years (about 0.8 of phylogenetic correlation) (see Diniz-Filho, 2001; see also  for details of the phylogeny). The second metric was given by the coefficient of determination (R 2 ) estimated by the Cheverud et al. (1985) autoregressive model (ARM), which can be written as where r is an autoregressive coefficient estimated by maximum likelihood, W is the weighting matrix (as defined above, and equals to covariance among species V here in this application), and e are the model residuals.
The third metric was based on the coefficient of determination (R 2 ) derived from a multiple regression model called Phylogenetic eigenVector Regression (PVR; Diniz-Filho et al., 1998;see Desdevises et al., 2003;Kuhn et al., 2009;Staggemeier et al., 2010;Diniz-Filho et al., 2011 andSafi andPettorelli, 2010 for recent applications and expansions). The PVR model is given by Y = Xb + e, where b is a vector with regression coefficients, X is a matrix with k (k < n-1) selected eigenvectors extracted from the phylogenetic distance matrix (D), and e are the model residuals as before. The eigenvectors represent the patterns of phylogenetic relationships among the species at distinct hierarchical levels throughout the phylogeny, and PVR is part of a group of techniques that have been widely used to take autocorrelation into account in spatial, phylogenetic and temporal data (see Peres-Neto, 2006; Peres-Neto and 674 Diniz-Filho et al. Legendre, 2010). For PVR, we used k = 90 eigenvectors which explained 95% of the variability in the phylogenetic distance matrix (D) because of computational constraints. Diniz-Filho et al. (2012a) showed that, under Brownian motion, the amount of variation explained by PVR (the coefficient of determination R 2 ) must be correlated to the amount of variation explained by the eigenvectors. Because here we are using the same set of eigenvectors for all comparisons, the R 2 can be directly comparable among simulations and can be correlated with other metrics. Thus, the coefficients of determination (R 2 ) of both models (ARM and PVR) were used to quantify the amount of phylogenetic signal in the simulated data sets for the so-called "partition methods".
The other two metrics we used to quantify phylogenetic signal explicitly assume a Brownian motion model of trait evolution as a reference for estimating the phylogenetic signal. Thus, the first "model-based" statistic was the (Pagel, 1999) l, which is equal to 0 when the trait is evolving independently of the phylogeny, while a value of 1.0 indicates that the trait under study is evolving according to Brownian motion. Details on how to estimate Pagel's l can be found in Freckleton et al. (2002). Blomberg et al. (2003) K was the second model-based metric used in this study. Values of K-statistics lower than 1.0 indicate that related species resemble each other less than expected under the Brownian motion model of trait evolution, while values greater than 1.0 signify that more related species are more similar, for the trait under study, than predicted by this model. Because K has a right-skewed distribution (see results), we used a logarithmic transformation (base 10) to normalize this metric, so that the neutral expectation (i.e., evolution under Brownian motion) can be inferred when K = zero.
The final dataset used for analyses was then comprised by the 200 simulated values of the five metrics (Moran's I, R 2 ARM , R 2 PVR , l and K). A stochastic variation in the magnitude of phylogenetic signal is expected under Brownian motion, so that the convergence among the metrics can be evaluated in terms of the amount of phylogenetic structure observed in each simulation. We used Spearman's rank correlation to measure to level of association between pairs of metrics because it is robust against outliers and can also measure nonlinear relationships. We also calculated correlation among mean estimates of phylogenetic signal among the different methods when increasingly a-parameter along O-U simulations.
Moran's I coefficients, ARM's and PVR's R 2 were calculated with the use of a software written in the Basic and Delphi programming languages, available from the authors upon request. K and l were calculated by using the package picante (Kembel et al., 2010) and geiger  respectively, both implemented in the R software environment (R Development Core Team, 2009).

Results
Pagel's l was equal to 1.0 in 97% of the simulations performed, thus revealing high statistical power and ability to correctly detect Brownian motion. However, this metric was not sensitive to stochastic variation in species values generated under Brownian motion and disturbing the magnitude of phylogenetic signals, being not correlated with the other metrics.
Moran's I varied between 0.03 and 0.89, with a mean of 0.31 ± 0.19, with a slightly right-skewed distribution (median = 0.27) (Figure 1a). The R 2 from ARM also varied widely (Figure 1b), ranging from 0.07 to 0.89 (mean = 0.43 ± 0.17), whereas values of R 2 from PVR ( Figure 1c) were higher (mean = 0.94 ± 0.02) and varied much less (0.88 and 0.99) than those estimated from ARM. Comparatively, the distributions of both R 2 tended to be more symmetric ( Figure 1). K-statistics varied from 0.41 to 4.44, with a mean of 0.99 (as expected under Brownian motion), but the values were strongly right-skewed, and this remained even after logarithm transformation (Figure 1d).
Under Brownian motion, all metrics (Moran's I, R 2 from PVR and ARM and K-statistics) were highly correlated to each other, with Spearman correlation higher than 0.70 (Table 1). The lowest correlation (r S = 0.728) was between Moran's I and PVR's R 2 . Because Moran's I and ARM's and PVR's R 2 are all empirical (statistical) metrics and K-statistics was explicitly designed to work under Brownian motion, we show the relationships between Kstatistics and each one of these metrics in Figure 2. In general, as indicated by the Spearman's correlations, these empirical metrics were clearly associated to K, but the relationships were curvilinear.
Increasing the value of the a-parameter in the O-U process caused, as expected, a monotonic decrease in all metrics (all Spearman's correlation among metrics higher than 0.95), and thus the mean phylogenetic signals estimated using different methods are also strongly correlated across simulations, with decreasing signal when increasing O-U parameter (Table 2). However, the rate of decrease varied among them. For l, the best indicator of deviation from Brownian motion was not the mean, but the frequency with which l differed from 1.0. For PVR, even when signal was nearly absent, the mean R 2 was still very high (0.724 ± 0.048). However, these coefficients of determination were usually not significant due to the small degrees of freedom left after using 90 eigenvectors. Thus, the adjusted R 2 , ranging from 0.027 (a = 10) to 0.785 (a = 0), provided a more standardized metrics for the decrease in phylogenetic signal.

Discussion
Our analyses show that all metrics for estimating phylogenetic signal in quantitative traits are strongly, albeit Evolutionary models and estimation of phylogenetic signal non-linearly, associated to each other. Thus, criticisms to some of these metrics, including PVR and ARM (e.g., Rohlf, 2001), although referring to more detailed behavior of these methods under specific evolutionary processes, do not invalidate results of empirical analyses if phylogenetic structures are well defined and distance/weighting matrices reflect evolutionary dynamics and expectations for trait variation among taxa, as done here.
High Moran's I coefficients and R 2 values derived from PVR and ARM are consistent with the expected structure in data generated under Brownian motion process, in which closely related species will tend to be similar and this similarity decreases with phylogenetic distance (see Hansen and Martins, 1996). However, on average, both global Moran's I and R 2 from ARM were low, indicating that these metrics have a low statistical power in detecting signal, by considering the wide range of values observed under Brownian motion (which is expected to generate strong and linear relationship between species divergence and time, especially with large, i.e., n > 50, sample sizes). Note that we estimated a single (global) Moran's I statistics, instead of a phylogenetic correlogram composed by several Moran's I (see Gittleman and Kot, 1990;Diniz-Filho, 2001), so that it was directly comparable with the other metrics. Because of random process at larger phylogenetic distances, global Moran's I cannot be so high, even under Brownian motion. However, Moran's I in the first distance class was closer to 1.0 under Brownian motion and also decreased 676 Diniz-Filho et al.  with the increase of the a-parameter in the O-U process, although at a much lower rate (because, even under strong O-U, similarity between closely related taxa persists -see Hansen and Martins, 1996). Thus, using only first distance class is more effective in taking into account more complex, non-linear, models of evolution. Also, Moran's I coefficients tend to be more independent of errors in the phylogeny in general, being only affected by errors at a particular distance class (see ). On the other hand, coefficients of determination derived from PVR were usually very high and close to 1.0 under Brownian motion (see Diniz-Filho et al., 2012b). More importantly, they were highly correlated with K-statistics, so PVR can be considered a powerful way to estimate the relative amount of phylogenetic signal. Rohlf (2001) criticized PVR as producing a trivial result when all eigenvectors are used to fully represent the phylogeny. Indeed, when all eigenvectors are used, a trivial R 2 of 1.0 is obtained. Thus, although all eigenvectors are indeed necessary to fully summarize the original phylogenetic distance matrix (as in any Principal Coordinate Analysis), our results indicate that not all of them are necessary to correctly estimate phylogenetic signal. Also, it is important to remember that a phylogenetic distance matrix is not error-free and, therefore, some of the (residual) variation in this matrix, summarized by eigenvectors associated to very low (absolute) eigenvalues can be discarded (as in any multivariate analysis). In terms of scaling, we realize that using the adjusted R 2 offers a more appropriate range of values, being closer to zero under strong O-U processes that minimizes phylogenetic structure in data.
The main advantage of the model-based approaches, such as K-statistics or l, is that they provide a reference value for departure from Brownian motion, whereas the pure statistical methods only indicate whether a small or large amount of signal is present in data. Even so, the similarity among the metrics suggests that they all must be someway related to this evolutionary model and thus must be calibrated according to neutral expectations. Indeed, Diniz-Filho (2001) showed that, under Brownian motion, phylogenetic correlograms were better described by a linear model (i.e., large positive Moran's I coefficients in the first distance classes, followed by non-significant coefficients and afterward by negative and significant coefficients at high phylogenetic distances). Here there is also a conspicuous change in the correlogram profile and the elimination of the phylogenetic signal with the increase of the alpha parameter (which controls the strength of stabiliz-Evolutionary models and estimation of phylogenetic signal 677   (Helmus et al., 2007), which is actually the denominator of the K-statistics of Blomberg et al. (2003). For the phylogeny used here, the PSV was equal to 0.781, whose complement (1 -PSV) is then equal to 0.219 and thus close to median Moran's I (= 0.278) from the simulations. This value of PSV indicates that, due to the shape of the tree, a trait evolving under Brownian motion would not attain a strong phylogenetic signal, if measured by ratios of variances. Notice that K-statistics takes this possibility of different expectation due to shape of the phylogeny into account by dividing the observed by expected (under Brownian motion) ratio of variances.
For the PVR's R 2 the interpretation of Brownian motion expectation is even more obvious after the comparisons between metrics (i.e., PVR's R 2 and K), and follows the reasoning of Rohlf (2001)for criticizing the PVR. If all eigenvectors are needed to describe the full structure of the phylogeny, and if using them will produce (by definition) an R 2 equal to 1.0, then not using all of them would produce a drop in the R 2 proportional to the importance of the eigenvalues not used. Thus, if the eigenvectors used explain 95%, this value would be the expected R 2 if a trait is evolving linearly along the phylogeny. Indeed, the mean R 2 from PVR was equal to 0.94, and it is possible to observe in Figure 2c that this value corresponds to a K-statistics close to zero (see Diniz-Filho et al., 2012b).
Notice that all these comparisons between PVR, ARM and Moran's I with K-statistics were performed using simulated data on the same phylogeny, and thus the denominator of K-statistics is constant throughout the comparisons. Thus, the quantity being compared is actually the ratio between observed and phylogenetic mean squares, the "absolute" magnitude of the phylogenetic effect in data (see Blomberg et al., 2003). Further comparisons in which several phylogenies are used are still necessary, and in this case both the magnitude of the phylogenetic effect and the shape of the phylogeny (the denominator of K-statistics) would be taken into account.
Therefore, criticisms about these methods should be re-examined by considering the phylogenetic relationships used (e.g. Pavoine et al., 2007). Further analyses addressing more complex scenarios, including different tree shapes, sample sizes and evolutionary models, can validate their robustness and better establish their statistical power under more variable situations. Also, now that theoretical issues require the measurement of phylogenetic signal as deviations (both positive and negative) from neutral expectation of species divergence modeled by Brownian motion (e.g., Cooper et al., 2010), it should be important to derive neutral (not null) expectation for these statistics. However, our analyses reveal that old metrics based on autocorrelation analyses may still be valid and useful especially when detailed phylogenies are unavailable or when trait variation among species is difficult to describe by more standard Brownian or O-U evolutionary models.