The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Phylogenetic comparative methods (PCMs) propagate uncertainty imperfectly. The standard pipeline takes one tree, fits one model, and reports one set of confidence intervals. But the tree itself is typically a point estimate from a posterior, the trait data typically have missing values, and each of those is its own source of uncertainty.
This vignette walks through a pipeline that handles both:
The two packages compose: prepR4pcm produces multiPhylo,
pigauto consumes multiPhylo. Once the contract is set up,
swapping the backend (rtrees vs datelife vs fishtree) doesn’t change the
rest of the pipeline.
A model-ready dataset for fish, with:
m_per_tree upward for a real
analysis).Start with whatever names appear in your dataset. They never quite
match a tree’s tip labels — formatting drifts, synonyms creep in.
pr_get_tree() copes: it normalises every name (underscores,
whitespace, authority strings) and, for the fishtree
backend, resolves synonyms through Open Tree of Life’s Taxonomic Name
Resolution Service before querying the backend.
# Your trait data — typically loaded from disk. Names need not be
# tidy: `Esox_lucius` carries an underscore, which pr_get_tree()
# normalises away.
my_data <- data.frame(
species = c("Salmo salar", "Esox_lucius", "Oncorhynchus mykiss"),
body_size = c(98, NA, 50) # NA is fine; pigauto will impute it
)Every backend that can return multiple trees does so via
n_tree > 1. Here we ask fishtree for 50
stochastic resolutions (Chang, Rabosky, Alfaro 2019, Syst.
Biol.).
trees <- pr_get_tree(my_data,
species_col = "species",
source = "fishtree",
n_tree = 50)
class(trees$tree) # "multiPhylo"
length(trees$tree) # 50
trees$matched # the 3 species, in their original input form
trees$unmatched # character(0) — every species was placed
# Per-tree provenance: one list entry per returned tree, each
# recording source_index, citation, calibration_method, and n_tips.
length(trees$backend_meta$tree_provenance) # 50
trees$backend_meta$tree_provenance[[1]] # the provenance of tree 1pr_get_tree() records what happened to every name in
trees$mapping (one row per input species), so a name that
quietly fell out of the tree cannot go unnoticed.
For the universal-but-slower option, use
source = "datelife" — the chronogram-database backend that
returns one calibrated tree per source (Hedges, Bininda-Emonds, Upham,
etc.).
trees <- pr_get_tree(my_data,
species_col = "species",
source = "datelife",
n_tree = 50)
# trees$tree is multiPhylo; per-source citations are in
# trees$backend_meta$tree_provenance[[i]]$citationIf you already have a topology and want to date it (rather
than fetch a new one), use pr_date_tree():
my_topology <- ape::read.tree("my_topology.nex")
dated_trees <- pr_date_tree(my_topology, n_dated = 50)
class(dated_trees$tree) # "multiPhylo"
length(dated_trees$tree) # up to 50What n_dated = 50 actually returns: 50
chronograms that all share my_topology but have
different branch lengths. Each variant comes from a different
source paper in DateLife’s database (e.g. variant 1 dated against Hedges
et al. 2015, variant 2 against Bininda-Emonds et al. 2007, etc.). The
topology is fixed; the dating varies.
To get 50 different topologies, fetch them separately first
(e.g.
pr_get_tree(species, source = "rtrees", taxon = "mammal")
returns 100 mammal topologies sampled from VertLife / Upham et
al. 2019), then pass that multiPhylo to
pr_date_tree() to date each one. That gives you the
topology × source cross-product.
pr_get_tree() resolves names well enough to
retrieve a tree, but the modelling step needs a data frame
whose rows line up one-to-one with the tree’s tips.
reconcile_tree() matches your data against the retrieved
topology, and reconcile_apply() returns the aligned
data–tree pair. The 50 posterior trees share one tip set, so reconciling
against the first tree aligns the data to all of them.
rec <- reconcile_tree(my_data, trees$tree[[1]],
x_species = "species",
authority = NULL) # tree tips are plain binomials
rec # match summary: 1 exact, 2 normalized, 0 unresolved
aligned <- reconcile_apply(rec,
data = my_data,
tree = trees$tree[[1]],
species_col = "species",
drop_unresolved = TRUE)
aligned$data # one row per tree tip — the model-ready frame
#> species body_size
#> 1 Salmo salar 98
#> 2 Esox_lucius NA
#> 3 Oncorhynchus mykiss 50aligned$data carries one row per species in the tree,
with Esox_lucius keeping the NA that pigauto
will impute; aligned$tree is the matching pruned tree.
Reviewers ask. Save yourself the round-trip.
This is the contract.
pigauto::multi_impute_trees(df, trees, m_per_tree) takes
df (data with NAs) and trees
(multiPhylo) and returns m_per_tree
complete-data imputations per tree. The output knows which
imputation came from which tree, and pigauto’s pooling step later uses
that index.
library(pigauto)
# 5 imputations per tree × 50 trees = 250 complete datasets
mi <- multi_impute_trees(aligned$data,
trees = trees$tree,
m_per_tree = 5)
# Fit your model to each completed dataset. Replace the formula
# with whatever your hypothesis is.
fits <- with_imputations(mi, function(df) {
glmmTMB::glmmTMB(body_size ~ 1 + (1 | species),
data = df)
})
# Pool via Rubin's rules — the standard errors reflect BOTH the
# imputation uncertainty AND the tree-posterior uncertainty.
pooled <- pool_mi(fits,
coef_fun = function(fit) fixef(fit)$cond,
vcov_fun = function(fit) vcov(fit)$cond)
pooledThe reported intervals are wider than the equivalent single-tree-single-imputation pipeline — and correctly so. The narrower intervals were always overconfident.
Verified on a clean macOS R 4.4 install on 2026-05-01. See the Comparing tree backends vignette for the full status table.
| If your taxon is… | And you want… | Use | Status |
|---|---|---|---|
| Universal | A single best-guess tree | pr_get_tree(source = "rotl") |
✅ verified — returns the Open Tree of Life synthesis tree |
| Universal | A posterior of dated trees | pr_get_tree(source = "datelife", n_tree = 50) |
likely ✅ — datelife is in Enhances;
install separately with
pak::pak("phylotastic/datelife") |
| Birds | A single tree, current Clements | pr_get_tree(source = "clootl", n_tree = 1) |
✅ verified — works out of the box (uses the v1.6/2025 taxonomy
bundled with clootl) |
| Birds | A posterior of trees, current Clements | pr_get_tree(source = "clootl", n_tree = 50) |
⚠️ requires clootl::get_avesdata_repo(".") once before
this works; without it sampleTrees() errors with
AvesData repo not found. Capped at 100 upstream. |
| Fish | Time-calibrated, posterior | pr_get_tree(source = "fishtree", n_tree = 50) |
✅ verified — returns multiPhylo[50] |
| Bird, mammal, fish, amphibian, reptile, plant, shark, bee, butterfly | Taxon-specific mega-tree, possibly grafted | pr_get_tree(source = "rtrees", taxon = "<group>") |
✅ works; ⚠️ n_tree is informational
only — rtrees::get_tree() has no
n_tree argument, so the count is fixed by the chosen
mega-tree (taxon = "bird" → 100,
taxon = "mammal" → 100, taxon = "fish" → 1,
etc.). |
For the dating-only case (you already have a topology):
| If your topology is… | Use | Status |
|---|---|---|
| Universal taxa | pr_date_tree(my_tree, n_dated = 50) |
likely ✅ — untested in this run; same dependency story as
datelife |
pr_tree_compare() for quick pairwise Jaccard /
Robinson-Foulds / branch-length comparisons (see the comparing tree backends
vignette). For richer visual comparison use
phytools::cophyloplot() or
phangorn::densiTree().impute() / multi_impute(). This vignette skips
straight to the trees-aware variant because that’s the integration
point.For repeated runs of the same retrieval (e.g. while iterating on a
manuscript), pass cache = TRUE to
pr_get_tree(). The cache lives under
tools::R_user_dir() by default; redirect with
pr_tree_cache_dir() if you want it inside the project.
?pr_get_tree, ?pr_date_tree,
?pr_cite_tree, ?reconcile_augment — the four
entry points for prepR4pcm’s tree-handling.These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.