prepR4pcm can fetch a phylogenetic tree for your species
from five different backends — external packages, each wrapped
by pr_get_tree(), that supply a tree from a different
reference source. The five backends draw on different reference trees,
different taxonomies, and different calibration approaches, so the tree
you get depends on which backend you pick. This vignette answers how
much they disagree, and where.
This vignette walks through pr_tree_compare() and shows
how to use it to make a defensible backend choice for your dataset — or
to report which backend choice you made and what the alternatives would
have given.
Comparison requires multiple tree-retrieval backends installed. Each backend lives in a different package because each draws on a different reference tree or taxonomy:
- rotl (CRAN): client for the Open Tree of Life synthesis tree, the only backend with universal taxon coverage. Required for source = "rotl".
- fishtree (CRAN): wraps the Fish Tree of Life (Rabosky et al. 2018), a time-calibrated ray-finned fish phylogeny. Required for source = "fishtree".
- rtrees (github.com/daijiang/rtrees): grafts species onto taxon-specific mega-trees (mammals, birds, fish, amphibians, etc.). Required for source = "rtrees".
- clootl (github.com/eliotmiller/clootl): supplies a phylogeny in the current Clements bird taxonomy. Required for source = "clootl".
- datelife: synthesis chronograms assembled from many per-clade published trees. Required for source = "datelife".

Install whichever you need: CRAN packages first, then GitHub-only
backends via pak:
library(prepR4pcm)
# install.packages(c("rotl", "fishtree")) # CRAN
# pak::pak("daijiang/rtrees") # GitHub
# pak::pak("eliotmiller/clootl") # GitHub
# pak::pak("phylotastic/datelife") # GitHub (heavy)

Check what’s installed and reachable in your session:
pr_get_tree_status()
#>     source installed version needs_network reachable
#> 1     rotl      TRUE   3.1.1          TRUE        NA
#> 2   rtrees      TRUE   1.0.4         FALSE        NA
#> 3   clootl      TRUE   0.1.4         FALSE        NA
#> 4 fishtree      TRUE   0.3.4          TRUE        NA
#> 5 datelife     FALSE    <NA>         FALSE        NA
#>                        install_hint                 source_repo
#> 1          install.packages("rotl")                        CRAN
#> 2       pak::pak("daijiang/rtrees")     github::daijiang/rtrees
#> 3    pak::pak("eliotmiller/clootl")  github::eliotmiller/clootl
#> 4      install.packages("fishtree")                        CRAN
#> 5 pak::pak("phylotastic/datelife") github::phylotastic/datelife

Pick a small species list. Retrieve the same set from two or three backends. Compare.
species <- c("Salmo salar", "Esox lucius", "Oncorhynchus mykiss",
"Gadus morhua", "Thunnus thynnus")
# Three backends that all cover fish
res_rotl <- pr_get_tree(species, source = "rotl")
res_fishtree <- pr_get_tree(species, source = "fishtree")
res_rtrees <- pr_get_tree(species, source = "rtrees", taxon = "fish")

About the TNRS replaced ... warning.
pr_get_tree() runs the Open Tree of Life Taxonomic Name
Resolution Service (TNRS) on the input names before passing them to the
backend, so that minor spelling or capitalisation differences don’t
cause a name to fall out of the tree silently. When TNRS finds a
canonical form that differs from the input (e.g. an outdated binomial
replaced by the currently accepted name), the wrapper substitutes the canonical
form and emits a warning listing each replacement, so the substitution
is visible in the run log. The retrieval still succeeds; the warning is
informational.
The three results are independent. Now compare:
The print method shows up to three pairwise matrices, depending on what is computable for your set of trees. Reading them:
- NA off-diagonal means the shared-tip subtree has fewer than four taxa (Robinson-Foulds is undefined below four shared tips), or one of the input trees could not be pruned to the shared subtree without becoming degenerate.
- Diagonal values are always 0 (a tree against itself).
- When rotl trees (which carry unit-length placeholder branches) are part of the comparison, one of the matrices may collapse to NA and the print method drops it; that is why you sometimes see only two matrices rather than three.

Backends differ in coverage and in what n_tree > 1 does. Verified on a clean macOS R 4.4 install on 2026-05-01. Column meanings:
- Backend: the value to pass as source = "..." to pr_get_tree().
- Single tree: does pr_get_tree(..., n_tree = 1) (the default) return a single ape::phylo object cleanly?
- Multi-tree (n_tree > 1): does the backend support returning a posterior or sample of trees? Some backends ship a single mega-tree and ignore n_tree; others support a true posterior sample.

| Backend | Single tree | Multi-tree (n_tree > 1) | Notes |
|---|---|---|---|
| rotl | ✅ | n/a (synthesis is a single tree) | Returns lowercase tip labels (e.g. salmo salar) because Open Tree taxonomy names are case-folded; reconcile against the tree if your input mixes cases. |
| rtrees | ✅ | ✅ for taxa whose mega-tree is a posterior (e.g. taxon = "bird", "mammal" → 100 trees); returns 1 tree for taxa with a single mega-tree (e.g. taxon = "fish"). | n_tree is informational only: rtrees::get_tree() has no n_tree argument, so the count is fixed by the chosen mega-tree. |
| clootl | ✅ | ❌ unless the AvesData repo is installed (run clootl::get_avesdata_repo(".") once). n_tree = 1 calls clootl::extractTree() and works out of the box; n_tree > 1 calls clootl::sampleTrees(count = …) and needs AvesData. | The single-tree path uses the v1.6 / 2025 taxonomy bundled with clootl. Posterior sampling caps at 100 upstream. |
| fishtree | ✅ | ✅ n_tree > 1 switches to fishtree::fishtree_complete_phylogeny() and returns the requested count. | Time-calibrated. |
| datelife | likely ✅ | likely ✅ | Untested in this run because datelife is in Enhances (heavy Bioconductor / BOLD deps). Install with pak::pak("phylotastic/datelife"). |
| pr_date_tree() | likely ✅ | likely ✅ | Same dependency story as datelife. |

If you hit a broken row above on a fresh install,
please open an issue at itchyshin/prepR4pcm
with your pr_get_tree_status() output and the error.
Phylogenetic comparative methods (PGLS with Pagel’s λ, Brownian motion, OU, phylogenetic meta-analysis) need branch lengths that correspond to time — not just topology. Backends differ in what branch lengths they produce:
| Backend | Branch lengths produced | Real time? | Taxonomic scope |
|---|---|---|---|
| rotl (default) | None (synthesis topology only) | No | Universal |
| rotl + resolve_polytomies = TRUE, branch_lengths = "grafen" | Grafen ρ = 1 arbitrary depths | No (arbitrary) | Universal |
| fishtree | Yes: divergence times from Rabosky et al. 2018 | Yes | Ray-finned fish |
| clootl | Yes: Bird Tree consensus branches | Yes | Birds (current Clements) |
| rtrees (any taxon) | Yes: branches from megatrees posterior | Yes | Birds, mammals, fish, amphibians, reptiles, plants, sharks, bees, butterflies |
| datelife | Yes: SDM-summary chronograms or per-source candidates | Yes | Universal |
| pr_date_tree(tree) | Yes: calibrates your topology via DateLife | Yes | Universal |
| Manual ape::chronos() with user calibration points | Yes | Yes | Universal |

The decision tree, in practice:
- If a taxon-specific tree covers your group, use it: for ray-finned fish use fishtree; for birds, mammals, amphibians, squamates, sharks, or plants use rtrees with the matching taxon. These are pre-computed, pre-dated, and need no extra dependencies.
- If no taxon-specific tree exists, use datelife for a real time-calibrated solution. Install via pak::pak("phylotastic/datelife"). Heavy dependency tree (Bioconductor, BOLD); usually works on macOS / Linux with system libs, sometimes flaky on Windows.
- If you only need a correlation structure, use rotl + resolve_polytomies = TRUE, branch_lengths = "grafen" for Grafen pseudo-time. Defensible for phylogenetic meta-analysis where you mainly need a correlation structure, not real divergence times (this is the pattern Cinar et al. 2022 and Pottier et al. 2022 use). See the phylogenetic meta-analysis vignette for a worked example.
- Calibrate by hand with ape::chronos() if you have published divergence-time estimates for a small set of nodes.
- pr_date_tree(your_tree) wraps datelife::datelife_use() and has the same install requirement as the datelife backend.

If datelife is the option you want and it won’t install
on your system, the practical fallbacks (in roughly preferred order)
are: the dedicated taxon backend if one exists for your taxa; Grafen
pseudo-time for meta-analysis-style use; or hand calibration for small,
well-studied taxon sets.
A common question: “I want the VertLife / Upham et al. 2019 mammal posterior — which backend has it?”
Short answer: source = "rtrees" has
them. rtrees depends on the megatrees
package, which ships 100 randomly-sampled posterior trees from each of
the major VertLife datasets (mammals: Upham et al. 2019; amphibians:
Jetz & Pyron 2018; squamates: Tonini et al. 2016; sharks: Stein et
al. 2018; birds: Jetz et al. 2012). When you call
pr_get_tree(species, source = "rtrees", taxon = "mammal"),
the returned multiPhylo is exactly that 100-tree subset of
the VertLife posterior, grafted to your species set.
If you need the full 10,000-tree posterior (rather
than the 100-tree subset), you currently have to download the source
archive manually from vertlife.org (each archive is
0.5–2 GB). A future round may add a source = "vertlife"
backend that automates this caching step; for now, the 100-tree subset
via rtrees is what you get out of the box, which is
sufficient for the great majority of phylogenetic comparative
analyses.
The three backends use different conventions for tip labels:
- rotl appends an Open Tree OTT id to each tip (e.g. Salmo_salar_ott854188) and lower-cases the binomial.
- fishtree returns the canonical binomial with spaces (Salmo salar).
- rtrees returns the canonical binomial with underscores (Salmo_salar).

pr_tree_compare() matches tips by case- and
separator-folded binomial (it strips
_ott<digits>, converts underscores to spaces, and
lower-cases) before computing Jaccard, so you do not need to clean the
labels yourself for the comparison to work. The folded form is used only
for matching; the original tip labels are preserved in each tree.
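The folding rule described above can be sketched in base R. This is an illustration of the described behaviour, not the package's internal code: fold_label() and jaccard_tips() are hypothetical helper names, and every OTT id below except the one for Salmo salar is a made-up placeholder.

```r
# Hypothetical helper (not part of prepR4pcm): fold a tip label to a
# lower-case, space-separated binomial so labels from different backends match.
fold_label <- function(x) {
  x <- sub("_ott[0-9]+$", "", x)        # drop a trailing Open Tree id, if any
  x <- gsub("_", " ", x, fixed = TRUE)  # underscores to spaces
  tolower(trimws(x))                    # case-fold
}

# Jaccard similarity of two tip sets, computed on the folded labels
jaccard_tips <- function(a, b) {
  a <- unique(fold_label(a))
  b <- unique(fold_label(b))
  length(intersect(a, b)) / length(union(a, b))
}

# Illustrative labels only; the ott000001 / ott000002 ids are placeholders
rotl_tips     <- c("Salmo_salar_ott854188", "Esox_lucius_ott000001",
                   "Gadus_morhua_ott000002")
fishtree_tips <- c("Salmo salar", "Esox lucius", "Thunnus thynnus")

jaccard_tips(rotl_tips, fishtree_tips)  # 2 shared / 4 in the union
```

Folding only for the comparison, and leaving the original labels untouched, keeps the returned trees usable with whatever naming the downstream data already has.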
If you intend to use one of the returned trees downstream (e.g. feed
it to pr_phylo_cor() and then to
metafor::rma.mv()), you will usually want to strip those
suffixes yourself with gsub("_ott\\d+", "", tree$tip.label)
first. See the meta-analysis-with-rotl vignette for that
workflow.
The backends return the same species set (Jaccard ≈ 1) and broadly
similar topologies (low Robinson-Foulds distance). Pick whichever
backend best matches your taxonomic / temporal needs. Document the
choice. Call pr_cite_tree() to capture the citation:
The backends disagree on which species are valid. Inspect the unique-to lists:
cmp$unique_to$rotl # species rotl placed but the others didn't
cmp$unique_to$fishtree # species fishtree placed but the others didn't
cmp$shared_tips # species placed by all backends

Common causes:
Mitigation: re-run with tnrs = "always" to harmonise
names via Open Tree’s TNRS first. See ?pr_get_tree.
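To make the meaning of the shared and unique-to lists concrete, here is a minimal base-R sketch of the set logic, assuming the tip labels have already been folded to a common form. It does not call the package; the tips list and its contents are invented for illustration.

```r
# Illustrative folded tip sets for three backends (made-up coverage)
tips <- list(
  rotl     = c("salmo salar", "esox lucius", "gadus morhua"),
  fishtree = c("salmo salar", "esox lucius", "thunnus thynnus"),
  rtrees   = c("salmo salar", "esox lucius")
)

# Species placed by every backend
shared <- Reduce(intersect, tips)

# Species placed by one backend and by none of the others
unique_to <- setNames(lapply(names(tips), function(nm) {
  others <- unlist(tips[names(tips) != nm], use.names = FALSE)
  setdiff(tips[[nm]], others)
}), names(tips))

shared
unique_to
```

A species that appears in unique_to for one backend under one spelling and in another backend under a different spelling is a sign that name harmonisation, rather than real coverage, is driving the disagreement.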
The trees agree on which species exist but disagree on how they’re related. This is the genuinely interesting case.
- Prefer the taxon-specific backend where one exists: fishtree for ray-finned fish, clootl for birds, and datelife for groups that need time-calibrated branch lengths and have no taxon-specific tree.
- The companion package pigauto is built for this: hand it the multiPhylo of every backend’s tree, fit your model on each, and pool the results via Rubin’s rules. A single-tree analysis reports its confidence intervals as if the tree were known exactly. It is not, and a different backend would have given a different tree. Pooling across the backends’ trees widens the intervals to absorb that tree-choice uncertainty, so the reported intervals are not falsely narrow.
The trees agree on topology but not on branch-length scale. Usually that is because one tree is time-calibrated (branch lengths in millions of years) and the other is not (branch lengths in unit-length placeholders, or in Grafen-units, or in some other non-time scale):
- rotl returns the synthesis tree with no calibration.
- rtrees grafts onto a reference tree; calibration is whatever the reference uses.
- fishtree and clootl and datelife all return time-calibrated trees.

If your downstream model expects calibrated branches (most BM / OU /
lambda models do), use a calibrated backend. If the only calibrated
option for your taxon is datelife, see
?pr_date_tree for dating an existing topology.
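One quick sanity check on branch-length scale is whether all tips sit at the same distance from the root, as they should in a dated tree of extant species. The sketch below does this in base R on a hand-built, ape-style edge matrix; tip_depths() and looks_ultrametric() are hypothetical helpers and the three-tip tree is invented. Note that the check is necessary but not sufficient: Grafen branch lengths are also ultrametric without being real time.

```r
# Root-to-tip depths from an ape-style edge matrix (columns: parent, child).
# Assumes rows are ordered so each edge appears after the edge leading to
# its parent, and that tips are numbered 1..n_tip.
tip_depths <- function(edge, edge_length, n_tip) {
  depth <- numeric(max(edge))  # root starts at depth 0
  for (i in seq_len(nrow(edge))) {
    depth[edge[i, 2]] <- depth[edge[i, 1]] + edge_length[i]
  }
  depth[seq_len(n_tip)]
}

# TRUE when all tips are (numerically) equidistant from the root
looks_ultrametric <- function(depths, tol = 1e-8) {
  diff(range(depths)) <= tol * max(depths)
}

# Invented topology ((1,2),3): root = node 4, internal node = 5
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))

dated_depths <- tip_depths(edge, c(1, 1, 1, 2), n_tip = 3)  # dated-style lengths
unit_depths  <- tip_depths(edge, c(1, 1, 1, 1), n_tip = 3)  # unit placeholders

looks_ultrametric(dated_depths)
looks_ultrametric(unit_depths)
```

On real phylo objects, ape::is.ultrametric() performs the equivalent check directly.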
Each retrieval is slow. If you’re going to compare backends, set
cache = TRUE so re-running the vignette is instant:
old_cache <- getOption("prepR4pcm.cache_dir", NULL)
tmp_cache <- tempfile("prepR4pcm-cache-")
pr_tree_cache_dir(tmp_cache)
res_rotl <- pr_get_tree(species, source = "rotl",
cache = TRUE)
res_fishtree <- pr_get_tree(species, source = "fishtree",
cache = TRUE)
res_rtrees <- pr_get_tree(species, source = "rtrees",
taxon = "fish", cache = TRUE)
# See what's cached
pr_tree_cache_status()
# Wipe just the rotl entries (e.g. after an OTT version refresh)
pr_tree_cache_clear(source = "rotl", confirm = FALSE)
options(prepR4pcm.cache_dir = old_cache)
unlink(tmp_cache, recursive = TRUE)

By default the cache lives at tools::R_user_dir(). Pass
an explicit temporary or project cache directory if you want a separate
cache for a specific analysis:
What pr_tree_compare() doesn’t do (yet)

- Plotting: use ape::cophyloplot() (or phytools::cophylo()) or phangorn::densiTree() for visual comparison of two or many trees.
- Consensus trees: use ape::consensus(c(res_rotl$tree, res_fishtree$tree, res_rtrees$tree[[1]])).

- ?pr_tree_compare: full reference for the comparison function.
- ?pr_get_tree: the main function for fetching a tree.
- ?pr_get_tree_status: lists which backends are installed and reachable in your session.
- ?pr_tree_cache_dir: manage the on-disk cache of
retrieved trees.