The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

pbd_ML demo

Richel Bilderbeek & Rampal S. Etienne

2017-05-04

This document gives a demonstration how to use the package to obtain a maximum-likelihood estimate of the protracted birth-death speciation model.

First thing is to load the PBD package itself:

rm(list = ls())
library(PBD)

We will also need ape for branching.times:

library(ape)

Here we simulate a tree with known parameters:

seed <- 43
set.seed(seed)
b_1 <- 0.3 # speciation-initiation rate of good species
la_1 <- 0.2 # speciation-completion rate
b_2 <- b_1 # the speciation-initiation rate of incipient species
mu_1 <- 0.1 #  extinction rate of good species
mu_2 <- mu_1 # extinction rate of incipient species 
pars <- c(b_1, la_1, b_2, mu_1, mu_2)
age <- 15 # the age for the simulation 
phylogenies <- pbd_sim(pars = pars, age = age)
plot(phylogenies$recontree)

plot(phylogenies$igtree.extant)
## no colors provided. using the following legend:
##       g       i 
## "black"   "red"

plot(phylogenies$tree)

names(phylogenies)
##  [1] "tree"           "stree_random"   "stree_oldest"   "stree_youngest"
##  [5] "L"              "sL_random"      "sL_oldest"      "sL_youngest"   
##  [9] "igtree.extinct" "igtree.extant"  "recontree"      "reconL"        
## [13] "L0"

Now we try to recover the parameters by maximum likelihood estimation:

brts <- branching.times(phylogenies$recontree)  # branching times
init_b <- 0.2  # speciation-initiation rate
init_mu_1 <- 0.05  # extinction rate of good species
init_la_1 <- 0.3 # speciation-completion rate
#init_mu_2 <- 0.05  # extinction rate of incipient species  # not used

# The initial values of the parameters that must be optimized
initparsopt <- c(init_b, init_mu_1, init_la_1)

# The extinction rates between incipient and good species are equal
exteq <- TRUE

# The first element of the branching times is the crown age (and not the stem age)
soc <- 2

# Conditioning on non-extinction of the phylogeny
# as I actively selected for a nice phylogeny
cond <- 1

# Give the likelihood of the phylogeny (instead of the likelihood of the branching times)
btorph <- 1

Maximum likelihood estimation can now be performed:

r <- pbd_ML(
  brts = brts,
  initparsopt = initparsopt, 
  exteq = exteq,
  soc = soc, 
  cond = cond,
  btorph = btorph,
  verbose = FALSE
)

The ML parameter estimates are:

knitr::kable(r)
b mu_1 lambda_1 mu_2 loglik df conv
0.2817166 0.1003351 0.20635 0.1003351 -37.1221 3 0

Comparing the known true value with the recovered values:

loglik_true <- PBD::pbd_loglik(pars, brts = brts)
## Parameters: 0.3, 0.2, 0.3, 0.1, 0.1, Loglikelihood: -37.324677
df <- as.data.frame(r)
df <- rbind(df, c(b_1, mu_1, la_1, mu_2, loglik_true, NA, NA))
row.names(df) <- c("ML", "true")
knitr::kable(df)
b mu_1 lambda_1 mu_2 loglik df conv
ML 0.2817166 0.1003351 0.20635 0.1003351 -37.12210 3 0
true 0.3000000 0.1000000 0.20000 0.1000000 -37.32468 NA NA

Ideally, all parameter columns should have the same values.

To test for the uncertainty of our ML estimate, we can do a parametric bootstrap.

The function pbd_bootstrap consists of a few steps:

  1. Do a ML estimate
  2. Run a simulation with those estimates
  3. Perform ML estimation on the simulated data
  4. Go to 2 depending on the setting of endmc.
endmc <- 10 # Sets the number of simulations for the bootstrap

b <- pbd_bootstrap(
  brts = brts,
  initparsopt = initparsopt, 
  exteq = exteq,
  soc = soc, 
  cond = cond,
  btorph = btorph,
  plotltt = FALSE,
  endmc = endmc,
  seed = seed
)
## Finding the maximum likelihood estimates ...
## 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -37.25841 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.281715, mu_1: 0.100335, lambda_1: 0.206352, mu_2: 0.100335 
##  Maximum loglikelihood: -37.122103 
##  The expected duration of speciation for these parameters is: 2.807343 
##  The median duration of speciation for these parameters is: 2.206200 
## Bootstrapping ...
## 
## Simulated data set 1 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -22.03909 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.375753, mu_1: 0.311977, lambda_1: 0.230678, mu_2: 0.311977 
##  Maximum loglikelihood: -21.596081 
##  The expected duration of speciation for these parameters is: 2.030928 
##  The median duration of speciation for these parameters is: 1.543204 
## Simulated data set 2 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -57.99061 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.312123, mu_1: 0.081059, lambda_1: 0.190419, mu_2: 0.081059 
##  Maximum loglikelihood: -57.720450 
##  The expected duration of speciation for these parameters is: 2.943368 
##  The median duration of speciation for these parameters is: 2.365253 
## Simulated data set 3 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -115.261 
## Optimizing the likelihood - this may take a while. 
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## 
##  Maximum likelihood parameter estimates: b: 751.298215, mu_1: 751.000348, lambda_1: 0.004574, mu_2: 751.000348 
##  Maximum loglikelihood: -111.629201 
##  The expected duration of speciation for these parameters is: 0.386462 
##  The median duration of speciation for these parameters is: 0.310353 
## Simulated data set 4 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -107.9402 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.271860, mu_1: 0.000000, lambda_1: 0.126555, mu_2: 0.000000 
##  Maximum loglikelihood: -106.870092 
##  The expected duration of speciation for these parameters is: 4.218409 
##  The median duration of speciation for these parameters is: 3.570808 
## Simulated data set 5 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -47.84195 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.709877, mu_1: 0.574456, lambda_1: 0.225125, mu_2: 0.574456 
##  Maximum loglikelihood: -46.601652 
##  The expected duration of speciation for these parameters is: 1.643556 
##  The median duration of speciation for these parameters is: 1.279479 
## Simulated data set 6 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -187.144 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 13.979806, mu_1: 13.679004, lambda_1: 0.020678, mu_2: 13.679004 
##  Maximum loglikelihood: -183.432908 
##  The expected duration of speciation for these parameters is: 1.413933 
##  The median duration of speciation for these parameters is: 1.171087 
## Simulated data set 7 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -41.38109 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.492110, mu_1: 0.309936, lambda_1: 0.059444, mu_2: 0.309936 
##  Maximum loglikelihood: -40.448749 
##  The expected duration of speciation for these parameters is: 4.546741 
##  The median duration of speciation for these parameters is: 3.828187 
## Simulated data set 8 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -87.55681 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.300024, mu_1: 0.000001, lambda_1: 0.153513, mu_2: 0.000001 
##  Maximum loglikelihood: -86.151794 
##  The expected duration of speciation for these parameters is: 3.610683 
##  The median duration of speciation for these parameters is: 3.031341 
## Simulated data set 9 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -100.522 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.209810, mu_1: 0.000000, lambda_1: 0.393479, mu_2: 0.000000 
##  Maximum loglikelihood: -99.916686 
##  The expected duration of speciation for these parameters is: 2.036930 
##  The median duration of speciation for these parameters is: 1.540704 
## Simulated data set 10 out of 10 
## You are optimizing b mu_1 lambda_1 
## You are fixing nothing 
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -61.11511 
## Optimizing the likelihood - this may take a while. 
## 
##  Maximum likelihood parameter estimates: b: 0.173103, mu_1: 0.000000, lambda_1: 0.890545, mu_2: 0.000000 
##  Maximum loglikelihood: -60.368468 
##  The expected duration of speciation for these parameters is: 1.026129 
##  The median duration of speciation for these parameters is: 0.738873
knitr::kable(b[[3]])
ntips b mu_1 lambda_1 mu_2 loglik df conv exp_durspec median_durspec
9 0.3757529 0.3119773 0.2306781 0.3119773 -21.59608 3 0 2.0309277 1.5432035
23 0.3121229 0.0810590 0.1904194 0.0810590 -57.72045 3 0 2.9433681 2.3652530
40 751.2982150 751.0003476 0.0045737 751.0003476 -111.62920 3 0 0.3864617 0.3103533
41 0.2718601 0.0000000 0.1265550 0.0000000 -106.87009 3 0 4.2184093 3.5708078
19 0.7098773 0.5744562 0.2251248 0.5744562 -46.60165 3 0 1.6435563 1.2794793
69 13.9798056 13.6790037 0.0206780 13.6790037 -183.43291 3 0 1.4139334 1.1710870
15 0.4921098 0.3099361 0.0594435 0.3099361 -40.44875 3 0 4.5467409 3.8281874
35 0.3000241 0.0000011 0.1535128 0.0000011 -86.15179 3 0 3.6106829 3.0313409
38 0.2098101 0.0000000 0.3934793 0.0000000 -99.91669 3 0 2.0369305 1.5407041
23 0.1731033 0.0000000 0.8905452 0.0000000 -60.36847 3 0 1.0261295 0.7388728

From the bootstrap analysis, we get

Putting this in a table:

dg <- rbind(df, 
  list(
    b = b[[1]]$b, 
    mu_1 = b[[1]]$mu_1, 
    lambda_1 = b[[1]]$lambda_1, 
    mu_2 = b[[1]]$mu_2,
    loglik = b[[1]]$loglik,
    df = b[[1]]$df,
    conv = b[[1]]$conv
  ),
  list(
    b = b[[3]]$b, 
    mu_1 = b[[3]]$mu_1, 
    lambda_1 = b[[3]]$lambda_1, 
    mu_2 = b[[3]]$mu_2,
    loglik = b[[3]]$loglik,
    df = b[[3]]$df,
    conv = b[[3]]$conv
  )
)
dg
##                b         mu_1    lambda_1         mu_2     loglik df conv
## ML     0.2817166 1.003351e-01 0.206350020 1.003351e-01  -37.12210  3    0
## true   0.3000000 1.000000e-01 0.200000000 1.000000e-01  -37.32468 NA   NA
## 3      0.2817148 1.003351e-01 0.206351748 1.003351e-01  -37.12210  3    0
## 4      0.3757529 3.119773e-01 0.230678073 3.119773e-01  -21.59608  3    0
## 5      0.3121229 8.105904e-02 0.190419381 8.105904e-02  -57.72045  3    0
## 6    751.2982150 7.510003e+02 0.004573677 7.510003e+02 -111.62920  3    0
## 7      0.2718601 3.331886e-08 0.126555041 3.331886e-08 -106.87009  3    0
## 8      0.7098773 5.744562e-01 0.225124828 5.744562e-01  -46.60165  3    0
## 9     13.9798056 1.367900e+01 0.020678046 1.367900e+01 -183.43291  3    0
## 10     0.4921098 3.099361e-01 0.059443526 3.099361e-01  -40.44875  3    0
## 11     0.3000241 1.094309e-06 0.153512850 1.094309e-06  -86.15179  3    0
## 12     0.2098101 4.833861e-08 0.393479254 4.833861e-08  -99.91669  3    0
## 13     0.1731033 9.079251e-10 0.890545200 9.079251e-10  -60.36847  3    0
row.names(dg) <- c("ML", "true", "ML2", paste("BS", 1:endmc, sep = ""))
knitr::kable(dg)
b mu_1 lambda_1 mu_2 loglik df conv
ML 0.2817166 0.1003351 0.2063500 0.1003351 -37.12210 3 0
true 0.3000000 0.1000000 0.2000000 0.1000000 -37.32468 NA NA
ML2 0.2817148 0.1003351 0.2063517 0.1003351 -37.12210 3 0
BS1 0.3757529 0.3119773 0.2306781 0.3119773 -21.59608 3 0
BS2 0.3121229 0.0810590 0.1904194 0.0810590 -57.72045 3 0
BS3 751.2982150 751.0003476 0.0045737 751.0003476 -111.62920 3 0
BS4 0.2718601 0.0000000 0.1265550 0.0000000 -106.87009 3 0
BS5 0.7098773 0.5744562 0.2251248 0.5744562 -46.60165 3 0
BS6 13.9798056 13.6790037 0.0206780 13.6790037 -183.43291 3 0
BS7 0.4921098 0.3099361 0.0594435 0.3099361 -40.44875 3 0
BS8 0.3000241 0.0000011 0.1535128 0.0000011 -86.15179 3 0
BS9 0.2098101 0.0000000 0.3934793 0.0000000 -99.91669 3 0
BS10 0.1731033 0.0000000 0.8905452 0.0000000 -60.36847 3 0

We expect rows ML and ML2 to be identical. Their values are indeed very similar.

We can calculate the loglikelihood for

ml_b <- b[[1]]$b
ml_mu_1 <- b[[1]]$mu_1
ml_la_1 <- b[[1]]$lambda_1
ml_mu_2 <- b[[1]]$mu_2
ml_pars1 <- c(ml_b, ml_mu_1, ml_la_1, ml_mu_2)
ml_pars2 <- c(cond, btorph, soc, 0, "lsoda")

l <- pbd_loglik(
  pars1 = ml_pars1,
  pars2 = ml_pars2,
  brts = brts
)
print(l)
## [1] -37.1221
# Create .md, .html, and .pdf files
setwd(paste(getwd(), "vignettes", sep = "/"))
knit("PBD_ML_demo.Rmd")
markdown::markdownToHTML('PBD_ML_demo.md', 'PBD_ML_demo.html', options=c("use_xhml"))
system("pandoc -s PBD_ML_demo.html -o PBD_ML_demo.pdf")

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.