This tutorial will take you through the simple steps of simulating phylogenies under the Yule and birth-death models, then creating lineage-through-time (LTT) plots. The exercises presented here are based on tutorials written by Paul Lewis (http://hydrodictyon.eeb.uconn.edu/eebedia/index.php/Phylogenetics:_APE_Lab) and Liam Revell (http://www.phytools.org/mpma/Exercise_9.1/).

First load the phytools package in R. This package was written by Liam Revell and provides functions for performing a wide range of analyses in comparative phylogenetics. When you load this library, it will also load the dependent libraries ape and maps.

library(phytools)
## Loading required package: ape
## Loading required package: maps
## 
##  # ATTENTION: maps v3.0 has an updated 'world' map.        #
##  # Many country borders and names have changed since 1990. #
##  # Type '?world' or 'news(package="maps")'. See README_v3. #

We will also load another package that simulates birth-death phylgoenies called TreeSim, writen by Tanja Stadler. This package will automatically load the packages laser and geiger.

library(TreeSim)
## Loading required package: laser
## Loading required package: geiger

These exercises will take you through simulating trees under stochastic branching models.

First generate a tree under the Yule model (where extinction = 0) with 100 tips. To do this, we will use the sim.bd.taxa() function in TreeSim.

trees <- sim.bd.taxa(n=100,numbsim=1,lambda=0.03,mu=0,frac=1,complete=FALSE)

The sim.bd.taxa() function simulates trees under the birth-death process conditional on the number of sampled taxa. Here, we simulated a tree with 100 tips (n=100) under a pure-birth model without extinction (mu=0) and with complete taxon sampling (frac=1). The tree stored in the trees variable is the reconstructed tree, meaning that the extinct lingeages (found in the complete tree) are all pruned away. We specify this tree by setting complete=FALSE.

The sim.bd.taxa() function returns a vector of trees and we can generate many replicates by setting numbsim to a value greater than 1. We can store our single tree object in a separate vairable.

t <- trees[[1]]

Now let’s plot the tree to see what it looks like and make the edges red, for fun. And we will also make the tip labels pretty small using the cex=0.2 argument.

plot(t,edge.color="red",cex=0.2)

Note that your tree will not look identical to this one, unless by some chance the starting seed for your simulation was exactly the same as this one.

And we can visualize the LTT plot:

ltt_plot <- ltt(t,log=FALSE)

The plot above shows the unscaled lineages through time. We simulated a tree with a constant rate of speciation and no extinction or sampling. Thus, this is the same a exponential growth model.

When the lineages are plotted on the log scale, the number of species increases linearly over time. We can also add a line corresponding to linear growth.

ln_ltt_plot <- ltt(t,log=TRUE)
lines(c(0, max(nodeHeights(t))), c(log(2), log(length(t$tip.label))),lty = "dashed", lwd = 2, col = "red")

We can calculate the slope of the line:

(log(length(t$tip.label)) - log(2)) / (max(nodeHeights(t)))
## [1] 0.01936927

And this should be close to the speciation rate we simulated under (0.3).

The ape package provides a way to fit a Yule model to a phylogenetic tree using the yule() function:

yule(t)
## $lambda
## [1] 0.03254857
## 
## $se
## [1] 0.003287902
## 
## $loglik
## [1] -74.51794
## 
## attr(,"class")
## [1] "yule"

This prints the speciation rate (lambda) estimated under maximum likelihood, the standard error of lambda (se) and the maximum log-likelihood (loglik). “The maximum likelihood estimate of the speciation rate is obtained by the ratio of the number of speciation events on the cumulative number of species through time; these two quantities are obtained with the number of nodes in the tree, and the sum of the branch lengths, respectively.” (from the ape help: help(yule)).

Now let’s simulate a tree with extinction:

bd_trees <- sim.bd.taxa(n=100,numbsim=1,lambda=0.03,mu=0.025,frac=1,complete=FALSE)
bd_t <- bd_trees[[1]]
plot(bd_t,edge.color="blue",cex=0.2)

Now for the log-lineages through time plot:

bd_ln_ltt_plot <- ltt(bd_t,log=TRUE)
lines(c(0, max(nodeHeights(bd_t))), c(log(2), log(length(bd_t$tip.label))),lty = "dashed", lwd = 2, col = "red")

Like for the Yule tree, we can estimate the diversification parameters of the birth-death tree using ape and the birthdeath() function.

bd_stats <- birthdeath(bd_t)
bd_stats
## 
## Estimation of Speciation and Extinction Rates
##             with Birth-Death Models
## 
##      Phylogenetic tree: bd_t 
##         Number of tips: 100 
##               Deviance: 269.3902 
##         Log-likelihood: -134.6951 
##    Parameter estimates:
##       d / b = 0.825178   StdErr = 0.08051848 
##       b - d = 0.005445748   StdErr = 0.002177078 
##    (b: speciation rate, d: extinction rate)
##    Profile likelihood 95% confidence intervals:
##       d / b: [0.7566881, 0.8744123]
##       b - d: [0.004094419, 0.007163502]

The summary tells you what the estimates for diversification (b - d) and turnover (d / b) are. We can easily get the values for speciation and extinction from these. First, let’s check how they are stored using the str() function:

str(bd_stats)
## List of 6
##  $ tree: chr "bd_t"
##  $ N   : int 100
##  $ dev : num 269
##  $ para: Named num [1:2] 0.82518 0.00545
##   ..- attr(*, "names")= chr [1:2] "d/b" "b-d"
##  $ se  : Named num [1:2] 0.08052 0.00218
##   ..- attr(*, "names")= chr [1:2] "d/b" "b-d"
##  $ CI  : num [1:2, 1:2] 0.75669 0.00409 0.87441 0.00716
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "d/b" "b-d"
##   .. ..$ : chr [1:2] "lo" "up"
##  - attr(*, "class")= chr "birthdeath"

We can see that the diversification and turnover are stored as an array with the label para. We can now set these to some variables:

r <- bd_stats$para[[1]]
d <- bd_stats$para[[2]]
r
## [1] 0.825178
d
## [1] 0.005445748

And we can calculate lambda:

d/(1-r)
## [1] 0.03115025

And mu:

(r*d) / (1-r)
## [1] 0.0257045

We can also produce LTT plots for lists of trees. To do this we will use a different function for simulating the birth-death trees: pbtree(). The return value of this simulation function is of the type multiPhylo, which is easier to input to the ltt function.

yule_trees <- pbtree(b=0.3,d=0.0,n=100,nsim=100,scale=1)

This time we are going to scale all of the trees to the same height so that they can be visualized on a relative scale.

temp <- ltt(yule_trees, log = TRUE)
lines(c(0, 1), c(log(2), log(100)), lty = "dashed", lwd = 2, col = "red")