The analysis of diversification and character evolution using phylogenetic data is attracting increasing interest from biologists. Recent statistical developments have produced a variety of tools for inferring macroevolutionary processes in a phylogenetic context. In a recent paper, Maddison (2006, Evolution 60: 1743–1746) pointed out that careless use of some of these tools could lead to misleading conclusions about diversification or character evolution, and thus to difficulties in distinguishing between the two phenomena. I here present guidelines for the analysis of macroevolutionary data that may help to avoid these problems. The proper use of recently developed statistical methods may help to untangle diversification and character change, and so allow us to address important evolutionary questions.

The last 20 years have witnessed a remarkable paradigm shift for evolutionists: variation in speciation and extinction rates (i.e., the tempo of evolution) and variation in rates of character change (the mode of evolution) are now ideally studied with molecular phylogenies and data collected on recent species. These issues, traditionally grouped under the term macroevolution, were the domain of paleontologists for several decades (Simpson 1953), whereas molecular data were used to address microevolutionary mechanisms (for an illustration of this changing paradigm, compare the two editions of the same book by Futuyma 1986, 1998).

In a recent paper, Maddison (2006) used data simulated under two scenarios to draw attention to some limits of this new paradigm. In the first scenario, clades were simulated starting from the root, and a character evolved along the tree: two states were allowed (0 and 1), with a constant probability of change between them. The speciation rate depended on the state of the character, so that species in state 1 split at a higher rate than those in state 0. Maddison showed that the estimated ratio of the character transition rates was correlated with the ratio of the speciation rates. In the second scenario, Maddison used a similar setting except that the speciation rate was constant, while the rate of character transition in one direction was four times higher than in the other. He then showed that the analysis of diversification led, more often than expected by chance, to the inference that the speciation rate differed between the states of the character.

Maddison (2006) concluded that, if simple data analyses are done, biased diversification with respect to a character may lead to wrong conclusions about the evolution of this character. Conversely, biased character evolution may lead to a spurious association of this character with biased diversification. In other words, this raises the question of whether we are able to untangle differential evolution of a character from differential diversification. The question may be framed in more specific terms from Maddison's simulation study: if a character state is observed to be relatively rare in a clade, can we distinguish whether this state is associated with a low speciation rate or whether evolution toward this state occurs at a low rate (or both)?

This issue is of great importance because, if we can answer positively, we could identify which effect contributes most to the generation of diversity. Heterogeneous diversification rates and heterogeneous rates of character change are likely linked to different evolutionary mechanisms. Character states with a high transition rate may be the result of counter-selection, developmental instability, or low plasticity of the character. On the other hand, a high speciation rate associated with a character state may be the result of its adaptive value, or of its association with the rate of reproductive isolation.

My aim in the present article is to show that appropriate methods of data analysis can separate the effects of biased character change and biased diversification. This suggests that heterogeneous character evolution and heterogeneous diversification rates can be untangled in future analyses of macroevolutionary data. I also make some recommendations for data analysis in macroevolutionary studies with recent species, and point to further needs for research in this area.

Methods

ANALYSIS OF CHARACTER CHANGE

Maddison (2006) showed that the maximum-likelihood estimates of the ratio of character transition rates were correlated with the actual ratio of speciation rates. One may be tempted to interpret a high ratio of character transition rates as evidence for a strong bias in character change, and thus conclude that the rates are actually different. However, Maddison did not address the issue of testing whether these rates are significantly different. Indeed, looking at parameter estimates is not the ideal way to assess the validity of a hypothesis. The results of Maddison's analyses show that these parameter estimates can be strongly biased when a wrong model is used, not that they are significantly different.

With simulated data, we know the model used to generate the data, but with real data we have to test whether a given model is appropriate. In a likelihood framework, the likelihood ratio test (LRT) is the canonical method of hypothesis testing (Lindsey 1996; Ewens and Grant 2005). To fix ideas, let us write explicitly the model used to simulate the data in Maddison's first scenario (referred to as model 1 in the text); its rate matrix (often denoted Q) is

$$
Q = \begin{array}{c|cc} & 0 & 1 \\ \hline 0 & -r & r \\ 1 & r & -r \end{array} \qquad (1)
$$

where the row labels denote the initial states of the character (denoted x in this article), the column labels the final ones, and r is the instantaneous rate of change, that is, the probability per unit of time that x changes state during a very short time interval (the diagonal elements are set so that each row sums to zero). It is difficult to interpret the parameter r biologically, but its virtue is that it is independent of time. To obtain transition probabilities, we have to fix a time interval, say t, and compute the matrix exponential $e^{tQ}$ (see below).
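The passage from the rate matrix to transition probabilities can be illustrated numerically. The following is a minimal Python sketch (the original analyses were done in R); it assumes SciPy's `scipy.linalg.expm` for the matrix exponential, and the function names are hypothetical. For the symmetric two-state model the off-diagonal entry of $e^{tQ}$ has the closed form $(1 - e^{-2rt})/2$, which serves as a check.

```python
import numpy as np
from scipy.linalg import expm


def rate_matrix_model1(r):
    """Q for model 1: symmetric change between states 0 and 1 at rate r."""
    return np.array([[-r, r],
                     [r, -r]])


def transition_probabilities(Q, t):
    """P(t) = exp(tQ); entry [i, j] is Pr(state j at time t | state i at time 0)."""
    return expm(t * Q)


r, t = 0.05, 0.001
P = transition_probabilities(rate_matrix_model1(r), t)
# Closed-form off-diagonal entry for the symmetric two-state model (a check).
closed_form = (1.0 - np.exp(-2.0 * r * t)) / 2.0
print(P)
print(closed_form)
```

Each row of `P` sums to one, and with these (illustrative) values the probability of change over t = 0.001 is close to 5 × 10⁻⁵.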

Maddison estimated the ratio of rates using the following two-parameter model (referred to as model 2):

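$$
Q = \begin{array}{c|cc} & 0 & 1 \\ \hline 0 & -r_1 & r_1 \\ 1 & r_2 & -r_2 \end{array} \qquad (2)
$$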

We know, in this particular situation, that this model has one superfluous parameter, because the data were simulated with r₂ = r₁. Models 2 and 1 can be compared with an LRT because they are nested: the latter is a particular case of the former (Pagel 1994; Nosil and Mooers 2005). Under the null hypothesis (model 1), the test statistic, twice the difference in log-likelihoods, approximately follows a χ² distribution with one degree of freedom (the difference in the number of parameters).
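The comparison of models 1 and 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the five-tip tree is hypothetical, the likelihood is computed with Felsenstein's pruning algorithm assuming equal prior probabilities for the root state, and the fits use SciPy's `minimize` (L-BFGS-B with arbitrary bounds). Model 2 is started from the model 1 estimate so that its log-likelihood cannot be lower, as expected for nested models.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize
from scipy.stats import chi2

# Hypothetical toy tree (not from the paper). A tip is (state, branch_length);
# an internal node is ([child, child], branch_length). Root branch length unused.
TREE = ([([(0, 1.0), (1, 1.0)], 0.5),
         ([(1, 0.7), ([(1, 0.3), (1, 0.3)], 0.4)], 0.8)], 0.0)


def partial_likelihoods(node, Q):
    """Felsenstein's pruning: Pr(tip states below node | state at node)."""
    content, _ = node
    if isinstance(content, int):              # tip with an observed state
        vec = np.zeros(2)
        vec[content] = 1.0
        return vec
    vec = np.ones(2)
    for child in content:
        P = expm(child[1] * Q)                # transition probabilities on the branch
        vec *= P @ partial_likelihoods(child, Q)
    return vec


def log_likelihood(r1, r2, tree=TREE):
    """Model 2 with r1 = rate 0 -> 1 and r2 = rate 1 -> 0; model 1 is r1 == r2."""
    Q = np.array([[-r1, r1], [r2, -r2]])
    return float(np.log(0.5 * partial_likelihoods(tree, Q).sum()))  # flat root prior


BOUNDS = (1e-8, 100.0)

# Model 1: one shared rate.
fit1 = minimize(lambda v: -log_likelihood(v[0], v[0]), x0=[0.5],
                method="L-BFGS-B", bounds=[BOUNDS])
ll1 = -fit1.fun
# Model 2: two rates, started from the model 1 estimate so that ll2 >= ll1.
fit2 = minimize(lambda v: -log_likelihood(v[0], v[1]), x0=[fit1.x[0]] * 2,
                method="L-BFGS-B", bounds=[BOUNDS] * 2)
ll2 = -fit2.fun


def likelihood_ratio_test(ll_null, ll_alt, df=1):
    """LRT statistic 2 * (ll_alt - ll_null) and its chi-squared p-value."""
    stat = 2.0 * (ll_alt - ll_null)
    return stat, chi2.sf(stat, df)


stat, p_value = likelihood_ratio_test(ll1, ll2)
print(fit1.x, fit2.x, stat, p_value)
```

With so few tips the test has little power; the point is only the sequence of steps (fit both nested models, then refer twice the log-likelihood difference to χ² with df = 1).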

In real situations we cannot be sure that either model 1 or model 2 generated the observed data. If neither of them is the true model, the tests of hypotheses could be biased in the same way as Maddison showed that parameter estimates are biased. It is thus necessary to assess the goodness of fit of a model in a general way. In the present context, the shape of the likelihood function indicates whether a model fits the data poorly or well: if a model fits the data poorly, the likelihood function is relatively flat. The shape of the likelihood function can be examined by plotting it over a range of parameter values, but an easier procedure is to look at the estimated standard errors of the parameters, which, under standard likelihood theory, are derived from the second derivatives of the log-likelihood function. The smaller these standard errors, the narrower the likelihood function. When no analytical expression of the second derivatives is available, which is precisely the case for the Markovian models considered here, they may be computed numerically during nonlinear optimization (Schnabel et al. 1985). The ratio of the parameter estimate, $\hat{r}$, to its standard error, $\mathrm{se}(\hat{r})$, can be used as an indication of the shape of the likelihood function. This ratio, $\hat{r}/\mathrm{se}(\hat{r})$, is in fact the Wald statistic, which, under the null hypothesis that r is equal to zero, follows a standard normal distribution (see Rao 1973).

The LRT and the Wald test can thus be used to test the same hypothesis, but the latter is rarely used because it has generally poorer statistical performance than the LRT, particularly for small sample sizes (Agresti 1990; McCulloch and Searle 2001). However, both tests are expected to give the same results for large sample sizes (Rao 1973; McCulloch and Searle 2001).
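The numerical standard error and the resulting Wald ratio can be sketched as below. This is a hypothetical helper, not the optimizer-based computation cited above: it approximates the second derivative of the log-likelihood by a central finite difference at the estimate. To make the result checkable, it is applied to a stand-in likelihood with a closed form (n exponential waiting times summing to S), for which $\hat{r} = n/S$ and $\mathrm{se}(\hat{r}) = \hat{r}/\sqrt{n}$.

```python
import math


def numerical_se(loglik, theta_hat, rel_step=1e-4):
    """Standard error from the observed information, -d2 logL / d theta2,
    approximated by a central finite difference at the MLE."""
    h = rel_step * max(abs(theta_hat), 1.0)
    d2 = (loglik(theta_hat + h) - 2.0 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2
    if d2 >= 0:
        raise ValueError("log-likelihood is not concave at theta_hat (flat or not a maximum)")
    return 1.0 / math.sqrt(-d2)


def wald_ratio(loglik, theta_hat):
    """Estimate divided by its standard error (Wald statistic for H0: theta = 0)."""
    return theta_hat / numerical_se(loglik, theta_hat)


# Stand-in likelihood with a closed form: n exponential waiting times summing to S.
n, S = 50, 100.0
loglik = lambda r: n * math.log(r) - r * S
r_hat = n / S                      # closed-form MLE
z = wald_ratio(loglik, r_hat)
print(r_hat, numerical_se(loglik, r_hat), z)   # z should be close to sqrt(50)
```

A flat log-likelihood gives a second derivative near zero, hence a large standard error and a small ratio, which is the situation the rule of thumb below is meant to flag.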

I propose the following rule of thumb: when this ratio is less than 2, it is likely that the model under consideration is not appropriate. A justification for this rule is that, under the assumption that the maximum-likelihood estimates are normally distributed (a standard assumption of the likelihood theory of estimation), a 95% confidence interval may be calculated as $\hat{r} \pm 1.96\,\mathrm{se}(\hat{r})$. Consequently, if $\hat{r}/\mathrm{se}(\hat{r}) < 1.96 \approx 2$, zero is included in this interval, suggesting a flat likelihood function. The method should be used as follows. First, perform the LRT comparing both models. If this test is significant, then examine the ratios of the rate estimates of model 2 to their standard errors: if one of them is less than 2, it is likely that the LRT is biased and that the null hypothesis is true.
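The two-step procedure can be written down as a small decision function. This is a sketch only: the function and class names are hypothetical, and the 5% significance level for the LRT is an assumption, since the text does not fix one.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    reject_null: bool
    reason: str


def apply_rule_of_thumb(lrt_p_value, wald_ratios, alpha=0.05, threshold=2.0):
    """Step 1: LRT of model 1 vs. model 2. Step 2: if significant, check the
    estimate/se ratio of each rate of model 2 against the threshold."""
    if lrt_p_value >= alpha:
        return Verdict(False, "LRT not significant: keep the one-rate model")
    weak = [z for z in wald_ratios if abs(z) < threshold]
    if weak:
        return Verdict(False, "LRT significant but an estimate/se ratio is below "
                              f"{threshold}: flat likelihood, LRT likely biased")
    return Verdict(True, "LRT significant and all rates well estimated")


print(apply_rule_of_thumb(0.01, [3.5, 1.2]))
```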

ANALYSIS OF DIVERSIFICATION

Maddison (2006) inferred differences in speciation rates by comparing the number of species between sister clades that differ with respect to the character, all species within each clade having the same character state. This method does not use all available information, as it considers only a subset of the nodes of the tree and ignores branch lengths. Several methods have been proposed to analyze diversification; they differ essentially in the type of data under consideration (see reviews in Sanderson and Donoghue 1996; Mooers and Heard 1997; Pagel 1999). For instance, some methods consider tree topology and balance (e.g., Aldous 2001), whereas others consider the distribution of branch lengths (e.g., Pybus and Harvey 2000). When both the topology and the branch lengths of the analyzed phylogeny are available, it is possible to refine the analyses and use more elaborate methods. Recent developments have been made in the inference of diversification, particularly the Yule model with covariates, which takes full account of all phylogenetic information (tree topology and branch lengths) to infer the effects of species traits on speciation rates (Paradis 2005). In this model, the speciation rate depends on a linear combination of variables measured on each species. This approach is similar to standard linear regression, where the mean of the response is given by a linear combination of variables. Consequently, a wide variety of models may be fitted to the same phylogenetic and species trait data. The species traits may be continuous and/or discrete.

Because the Yule model with covariates is fitted by maximum likelihood, different models can be compared with an LRT if they are nested. In the present context of testing the effect of x on the speciation rate (λ), the general model is

$$
\ln\left(\frac{\lambda}{1-\lambda}\right) = \beta x + \alpha,
$$

where α and β are parameters, so that the right-hand side of the above equation is equal to α if x = 0, or to α + β if x = 1 (see Paradis 2005 for details). The LRT comparing this model with the standard Yule model tests the hypothesis that x affects λ, that is, whether β is significantly different from zero. The Wald test may also be performed by computing $\hat{\beta}/\mathrm{se}(\hat{\beta})$. The same criterion as proposed above for the analysis of character change was applied here as well.
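The logic of testing β = 0 (i.e., λ₀ = λ₁) can be illustrated with a deliberately simplified stand-in, which is not the Yule model with covariates of Paradis (2005). The sketch assumes that the state of x is known and constant on every branch, so that, under a Yule process, the likelihood reduces to Poisson-process terms $\lambda^k e^{-\lambda T}$ for k speciation events over a total branch length T in each state. The function names and the input numbers are hypothetical.

```python
import math
from scipy.stats import chi2


def poisson_loglik(k, T, lam):
    """Log-likelihood (up to a constant) of k splits over total branch length T at rate lam."""
    return (k * math.log(lam) if k > 0 else 0.0) - lam * T


def two_rate_yule_lrt(k0, T0, k1, T1):
    """LRT of a single speciation rate against separate rates for x = 0 and x = 1.

    k0, k1: numbers of speciation events in lineages with x = 0 and x = 1;
    T0, T1: total branch length spent with x = 0 and x = 1.
    """
    lam0, lam1 = k0 / T0, k1 / T1          # MLEs under the two-rate model
    lam = (k0 + k1) / (T0 + T1)            # MLE under the one-rate (standard Yule) model
    ll_alt = poisson_loglik(k0, T0, lam0) + poisson_loglik(k1, T1, lam1)
    ll_null = poisson_loglik(k0 + k1, T0 + T1, lam)
    stat = 2.0 * (ll_alt - ll_null)
    return {"lambda0": lam0, "lambda1": lam1, "stat": stat,
            "p_value": chi2.sf(stat, df=1)}


# Hypothetical summaries of a tree: 5 splits over 100 time units in state 0,
# 50 splits over 100 time units in state 1.
print(two_rate_yule_lrt(5, 100.0, 50, 100.0))
```

When the two observed rates are equal the statistic is zero, and it grows as the rates diverge; the real method additionally has to handle how covariate values map onto the branches of the tree.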

DATA SIMULATION

I simulated data with known parameters to assess the statistical performance of the methods described above. The idea was to generate datasets in which most species are in the same state because of either different speciation rates, different rates of character change, or both. The data were then analyzed to assess whether the two processes can be untangled. The simulations were started from the root of the tree, that is, the initial bifurcation. At each time step, each species present in the clade had a given probability (λ) of splitting into two, and a probability (p) of changing the state of its character x. When a species split, the daughter species inherited its value of x. Both λ and p depend on x, and so are hereafter denoted λ₀, p₀, λ₁, and p₁, where the subscript indicates the value of x. Two of these parameters were fixed for all simulations: λ₀ = 10⁻⁴ and p₀ = 5 × 10⁻⁵. Three different combinations of parameters were used for λ₁ and p₁: (1) λ₁ = 5λ₀, p₁ = p₀; (2) λ₁ = λ₀, p₁ = p₀/10; and (3) λ₁ = 2.5λ₀, p₁ = p₀/4.
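The discrete-time process just described can be sketched as follows (the original simulations were programmed in R). This Python version is a simplification under stated assumptions: it tracks only the character states of the living species, not the topology or branch lengths that the analyses require, and within each time step it applies character change before speciation, an ordering the text does not specify. The names `simulate_states` and `SETTINGS` are hypothetical.

```python
import numpy as np


def simulate_states(lam, p, n_max=100, root_state=0, rng=None):
    """Discrete-time birth process with a binary character (states only, no topology).

    lam[s]: per-step probability that a species in state s splits;
    p[s]:   per-step probability that a species in state s changes state.
    Starts from the initial bifurcation (two species); stops once n_max species exist.
    """
    rng = np.random.default_rng() if rng is None else rng
    states = np.array([root_state, root_state])
    lam = np.asarray(lam, dtype=float)
    p = np.asarray(p, dtype=float)
    steps = 0
    while states.size < n_max:
        steps += 1
        # Character change: each species flips with probability p[state].
        flip = rng.random(states.size) < p[states]
        states = np.where(flip, 1 - states, states)
        # Speciation: a splitting species is replaced by two daughters with its state.
        split = rng.random(states.size) < lam[states]
        states = np.repeat(states, np.where(split, 2, 1))
    return states, steps


lam0, p0 = 1e-4, 5e-5
SETTINGS = {  # ((lambda0, lambda1), (p0, p1)) for the three settings in the text
    1: ((lam0, 5 * lam0), (p0, p0)),
    2: ((lam0, lam0), (p0, p0 / 10)),
    3: ((lam0, 2.5 * lam0), (p0, p0 / 4)),
}
rng = np.random.default_rng(1)
states, steps = simulate_states(*SETTINGS[1], rng=rng)
print(states.size, states.mean(), steps)
```

Because several species may split in the same step, the final number of species can slightly exceed 100.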

These settings correspond to three plausible biological scenarios leading to the abundance of a trait in a clade: (1) this trait is associated with a high speciation rate, (2) species without this trait tend to evolve toward acquiring it, and (3) a mixture of both processes. The parameter values were chosen so that ca. 90% of species had x = 1 at the end of the simulation. They were found analytically for setting (2), where speciation was homogeneous, using the matrix exponentiation explained above and the fact that the expected number of species after t time steps is $2e^{\lambda t}$ (Kendall 1948). It was obviously unnecessary to consider here the null setting λ₁ = λ₀ and p₁ = p₀, because it would yield 50% of species in each state and so pose little difficulty for data analysis. The first setting is similar to Maddison's (2006) first scenario, whereas the second setting is close to his second one; the difference is that he used a ratio of 4 instead of 10.
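The analytical reasoning for setting (2) can be sketched as follows. The sketch makes two simplifying assumptions: the time needed to reach 100 species is replaced by its expectation from $2e^{\lambda t}$, and the character is propagated with the power of the one-step transition matrix, the discrete-time counterpart of $e^{tQ}$. Because speciation is homogeneous in this setting, every lineage follows the same chain, so the expected proportion of species in state 1 is the probability of being in state 1 after t steps given the root state. The variable names are illustrative.

```python
import numpy as np

lam, p0 = 1e-4, 5e-5
p1 = p0 / 10                                   # setting (2): homogeneous speciation
n_target = 100

# Expected number of species after t steps is 2 * exp(lam * t) (Kendall 1948),
# so 100 species are expected after about t = ln(50) / lam steps.
t = int(round(np.log(n_target / 2) / lam))

# One-step transition matrix of the character (rows: current state).
M = np.array([[1 - p0, p0],
              [p1, 1 - p1]])
Mt = np.linalg.matrix_power(M, t)

# Expected proportion of species in state 1, for each state at the root.
prop_state1 = {root: Mt[root, 1] for root in (0, 1)}
stationary = p0 / (p0 + p1)                    # long-run proportion in state 1
print(t, prop_state1, stationary)
```

Under these assumptions the expected proportion lies near 80% for a root in state 0 and near 92% for a root in state 1, on either side of the long-run value p₀/(p₀ + p₁) ≈ 0.91.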

Each time step of the simulations was taken to equal 0.001 time units. Given a probability of change of 5 × 10⁻⁵ over t = 0.001, back-transformation with the matrix exponentiation and interpolation shows that the actual parameter of the first setting was r ≈ 0.05. All simulations were run until 100 species were present. This was replicated 100 times for each possible initial state at the root (0 or 1) and each combination of parameters. The trees and values of x at the end of the simulation were saved (the files are available from the author). All simulations were programmed in R version 2.4.0 (R Development Core Team 2006). The data were analyzed as if they were real data: the procedures described above were applied to all replicates using R's standard looping functions. Both LRTs and Wald tests were computed, as well as the proportion of cases in which the tests agreed in rejecting the null hypothesis. The analyses of character change and of diversification were done with the package APE (Paradis et al. 2004).
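The back-transformation from a per-step probability to the rate r of model 1 can be reproduced numerically. In this sketch, bracketing root-finding with SciPy's `brentq` is used in place of the interpolation mentioned above; the probability of change is obtained from the matrix exponential of Q as in model 1. For this symmetric model the result can also be checked against the closed form $r = -\ln(1 - 2p)/(2t)$.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import brentq

t_step = 0.001          # one simulation time step, in time units
p_step = 5e-5           # per-step probability of change (setting 1: p1 = p0)


def prob_change(r, t=t_step):
    """Pr(0 -> 1 within t) under model 1, from the matrix exponential of Q."""
    Q = np.array([[-r, r], [r, -r]])
    return expm(t * Q)[0, 1]


# Solve prob_change(r) = p_step for r by bracketing root-finding
# (used here in place of the interpolation mentioned in the text).
r = brentq(lambda r: prob_change(r) - p_step, 1e-6, 10.0)
print(r)   # close to 0.05
```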

Results

The state of the root slightly affected the results, because it slightly changed the proportion of species in state 1 at the end of the simulation: 83.8% and 91.6% for a root in state 0 or 1, respectively (data pooled over all simulations). This proportion had in fact a skewed distribution when calculated for each replicate; the corresponding medians were thus slightly larger: 85% (range: 28–99) and 92% (63–99). For simplicity, the results below are presented for the two series of simulations pooled, because they were overall consistent.