rotl provides an interface to the Open Tree of Life (OTL) API and allows users to query the API, retrieve parts of the Tree of Life, and integrate these parts with other R packages.
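The OTL API provides several services, listed in the table below. Among them, the taxonomic name resolution service (TNRS) maps taxon names to the identifiers used by the Open Tree Taxonomy (the ott ids). In rotl, each of these services corresponds to functions that share a prefix: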
Service | rotl prefix |
---|---|
Tree of Life | tol_ |
Graph of Life | gol_ |
TNRS | tnrs_ |
Studies | studies_ |
rotl also provides a few other functions that can be used to extract relevant information from the objects returned by these functions.
The most common use for rotl is probably to start from a list of species and get the relevant parts of the tree for these species. This is a two-step process:
1. The species names are matched to their ott_id (the Open Tree Taxonomy identifiers) using the Taxonomic name resolution services (TNRS).
2. These ott_id are then used to retrieve the relevant parts of the Tree of Life.
Let’s start by doing a search on a diverse group of taxa: a tree frog (genus Hyla), a fish (genus Salmo), a sea urchin (genus Diadema), and a nautilus (genus Nautilus).
library(rotl)
taxa <- c("Hyla", "Salmo", "Diadema", "Nautilus")
resolved_names <- tnrs_match_names(taxa)
It’s always a good idea to check that the resolved names match what you intended:
search_string | unique_name | approximate_match | ott_id | is_synonym | is_deprecated | number_matches |
---|---|---|---|---|---|---|
hyla | Hyla | FALSE | 1062216 | FALSE | FALSE | 1 |
salmo | Salmo | FALSE | 982359 | FALSE | FALSE | 1 |
diadema | Diadema (genus in Holozoa) | FALSE | 631176 | FALSE | FALSE | 2 |
nautilus | Nautilus | FALSE | 616358 | FALSE | FALSE | 1 |
The column unique_name sometimes indicates the higher taxonomic level associated with the name. The column number_matches indicates the number of ott_id that correspond to a given name. In this example, our search on Diadema returns 2 matches, and the one returned by default is indeed the sea urchin that we want for our query. Diadema is also the genus name of a fungus. The argument context_name allows you to limit the taxonomic scope of your search, so to ensure that our search is limited to animal names, we could do:
resolved_names <- tnrs_match_names(taxa, context_name = "Animals")
If you are trying to build a tree with deeply divergent taxa and the argument context_name cannot resolve the ambiguity, see “How to change the ott ids assigned to my taxa?” in the FAQ below.
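Before requesting a tree, it can help to list the names that matched more than one taxon. The sketch below is only an illustration: it rebuilds a few columns of the table above by hand instead of calling the API, and it assumes that the object returned by tnrs_match_names() can be subset like an ordinary data frame.

```r
# Hand-built stand-in for the result of tnrs_match_names(); the values are
# copied from the table above so the example runs without network access.
resolved <- data.frame(
  search_string  = c("hyla", "salmo", "diadema", "nautilus"),
  unique_name    = c("Hyla", "Salmo", "Diadema (genus in Holozoa)", "Nautilus"),
  ott_id         = c(1062216, 982359, 631176, 616358),
  number_matches = c(1, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Names that matched more than one taxon deserve a manual check
ambiguous <- resolved[resolved$number_matches > 1, ]
ambiguous$search_string
```

Any name listed here is a candidate for a narrower context_name or for a closer look at its alternative matches.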
Now that we have the correct ott_id for our taxa, we can ask for the tree using the tol_induced_subtree() function. By default, the object returned by tol_induced_subtree() is a phylo object (from the ape package), so we can plot it directly.
my_tree <- tol_induced_subtree(ott_ids = resolved_names$ott_id)
plot(my_tree, no.margin=TRUE)
If you realize that tnrs_match_names() assigns the incorrect taxonomic group to your name (e.g., because of synonymy) and changing the context_name does not help, you can use the function inspect(). This function takes the object resulting from tnrs_match_names(), and either the row number, the taxon name (as used in your search, in lowercase), or the ott_id returned by the initial query.
To illustrate this, let’s re-use the previous query but this time pretending that we are interested in the fungus Diadema and not the sea urchin:
taxa <- c("Hyla", "Salmo", "Diadema", "Nautilus")
resolved_names <- tnrs_match_names(taxa)
resolved_names
## search_string unique_name approximate_match ott_id
## 1 hyla Hyla FALSE 1062216
## 2 salmo Salmo FALSE 982359
## 3 diadema Diadema (genus in Holozoa) FALSE 631176
## 4 nautilus Nautilus FALSE 616358
## is_synonym is_deprecated number_matches
## 1 FALSE FALSE 1
## 2 FALSE FALSE 1
## 3 FALSE FALSE 2
## 4 FALSE FALSE 1
inspect(resolved_names, taxon_name = "diadema")
## search_string unique_name approximate_match ott_id
## 1 diadema Diadema (genus in Holozoa) FALSE 631176
## 2 diadema Diadema (genus in Nucletmycea) FALSE 4930522
## is_synonym is_deprecated number_matches
## 1 FALSE FALSE 2
## 2 FALSE FALSE 2
In our case, we want the second row in this data frame to replace the information that initially matched for Diadema. We can now use the update() function to switch to the correct taxon (the fungus, not the sea urchin):
resolved_names <- update(resolved_names, taxon_name = "diadema",
new_row_number = 2)
## we could also have used the ott_id to replace this taxon:
## resolved_names <- update(resolved_names, taxon_name = "diadema",
## new_ott_id = 4930522)
And now our resolved_names data frame includes the taxon we want:
search_string | unique_name | approximate_match | ott_id | is_synonym | is_deprecated | number_matches |
---|---|---|---|---|---|---|
hyla | Hyla | FALSE | 1062216 | FALSE | FALSE | 1 |
salmo | Salmo | FALSE | 982359 | FALSE | FALSE | 1 |
diadema | Diadema (genus in Nucletmycea) | FALSE | 4930522 | FALSE | FALSE | 2 |
nautilus | Nautilus | FALSE | 616358 | FALSE | FALSE | 1 |
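When many names need correcting, the row to keep can be picked programmatically rather than by eye. The following is a minimal sketch under stated assumptions: the candidates data frame is typed in by hand from the inspect() output shown earlier (only three of its columns), and the variable names wanted_row and wanted_id are ours, not part of rotl.

```r
# Hand-built copy of the alternatives listed by inspect() for "diadema"
candidates <- data.frame(
  search_string = c("diadema", "diadema"),
  unique_name   = c("Diadema (genus in Holozoa)",
                    "Diadema (genus in Nucletmycea)"),
  ott_id        = c(631176, 4930522),
  stringsAsFactors = FALSE
)

# Find the row whose unique_name mentions the group we are after
wanted_row <- grep("Nucletmycea", candidates$unique_name, fixed = TRUE)
wanted_id  <- candidates$ott_id[wanted_row]

# With rotl loaded, either value could then be passed on, e.g.:
# resolved_names <- update(resolved_names, taxon_name = "diadema",
#                          new_row_number = wanted_row)
```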
The function taxonomy_taxon() takes ott_ids as arguments and returns taxonomic information about the taxa. This output can be passed to some helper functions to extract the relevant information. Let’s illustrate this with our Diadema example:
diadema_info <- taxonomy_taxon(631176)
tax_rank(diadema_info)
## 631176
## "genus"
synonyms(diadema_info)
## $`631176`
## [1] "Diamema" "Centrechinus (Diadema)"
## [3] "Cidaris (Diadema)" "Diadema"
## [5] "Centrechinus"
ott_taxon_name(diadema_info)
## 631176
## "Diadema"
In some cases, it might also be useful to investigate the taxonomic tree descending from an ott_id to check that it’s the correct taxon and to determine the species included in the Open Tree Taxonomy:
diadema_tax_tree <- taxonomy_subtree(631176)
diadema_tax_tree
## $tip_label
## [1] "Diadema_principeana_ott5725746"
## [2] "Diadema_vetus_ott5725747"
## [3] "Diadema_africana_ott5502180"
## [4] "Diadema_sp._CS-2014_ott5502179"
## [5] "Diadema_pseudodiadema_ott4950421"
## [6] "Diadema_lobatum_ott4950422"
## [7] "Diadema_ascensionis_ott4950423"
## [8] "Diadema_africanum_ott4147369"
## [9] "Diadema_antillarum_antillarum_ott4147370"
## [10] "Diadema_antillarum_scensionis_ott220009"
## [11] "Diadema_palmeri_ott836860"
## [12] "Diadema_sp._DSM6_ott771059"
## [13] "Diadema_mexicanum_ott639130"
## [14] "Diadema_setosum_ott631175"
## [15] "Diadema_sp._SETO15_ott587479"
## [16] "Diadema_sp._seto17_ott587478"
## [17] "Diadema_sp._DSM7_ott587487"
## [18] "Diadema_sp._DSM8_ott587486"
## [19] "Diadema_sp._seto9_ott587485"
## [20] "Diadema_sp._seto10_ott587484"
## [21] "Diadema_sp._DSM2_ott587483"
## [22] "Diadema_sp._DSM3_ott587482"
## [23] "Diadema_sp._DSM4_ott587481"
## [24] "Diadema_sp._dsm5_ott587480"
## [25] "Diadema_savignyi_ott395692"
## [26] "Diadema_paucispinum_ott312263"
## [27] "Diadema_sp._seto16_ott312262"
## [28] "Diadema_sp._DSM1_ott219999"
## [29] "Diadema_sp._DJN9_ott66626"
## [30] "Diadema_sp._seto19_ott66624"
## [31] "Diadema_sp._seto38_ott66625"
## [32] "Diadema_sp._seto18_ott66623"
## [33] "Diadema_sp._seto35_ott66618"
##
## $edge_label
## [1] "Diadema_antillarum_ott1022356" "Diadema_ott631176"
By default, this function returns all taxa (including the taxon itself and internal nodes) descending from this ott_id, but it is also possible to return a phylo object.
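The tip labels in the output above combine a taxon name and an ott id in a single string. If the two parts are needed separately, they can be split with base R. This is only a sketch: the three labels are copied from the output above, and the pattern assumes every label ends in "_ott" followed by digits, as they do in this example.

```r
# A few tip labels copied from the taxonomy_subtree() output above
tip_labels <- c("Diadema_setosum_ott631175",
                "Diadema_savignyi_ott395692",
                "Diadema_sp._DSM1_ott219999")

# The trailing digits are the ott id
ott_ids <- as.numeric(sub("^.*_ott([0-9]+)$", "\\1", tip_labels))

# Drop the "_ott<digits>" suffix, then turn underscores into spaces
taxon_names <- gsub("_", " ", sub("_ott[0-9]+$", "", tip_labels))

data.frame(taxon_name = taxon_names, ott_id = ott_ids)
```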
If you are looking to get the tree for a particular taxonomic group, you need to first identify it by its node id or ott id, and then use the tol_subtree() function:
mono_id <- tnrs_match_names("Monotremes")
mono_tree <- tol_subtree(ott_id = mono_id$ott_id[1])
plot(mono_tree)
The function studies_find_studies() allows the user to search for studies matching specific criteria. The function studies_properties() returns the list of properties that can be used in the search.
furry_studies <- studies_find_studies(property="ot:focalCladeOTTTaxonName", value="Mammalia")
furry_ids <- unlist(furry_studies$matched_studies)
Now that we know the study_id, we can ask for the metadata associated with this study:
furry_meta <- get_study_meta("pg_2550")
get_publication(furry_meta) ## The citation for the source of the study
## [1] "O'Leary, Maureen A., Marc Allard, Michael J. Novacek, Jin Meng, and John Gatesy. 2004. \"Building the mammalian sector of the tree of life: Combining different data and a discussion of divergence times for placental mammals.\" In: Cracraft J., & Donoghue M., eds. Assembling the Tree of Life. pp. 490-516. Oxford, United Kingdom, Oxford University Press."
## attr(,"DOI")
## [1] ""
get_tree_ids(furry_meta) ## This study has 10 trees associated with it
## [1] "tree5513" "tree5515" "tree5516" "tree5517" "tree5518" "tree5519"
## [7] "tree5520" "tree5521" "tree5522" "tree5523"
candidate_for_synth(furry_meta) ## None of these trees are yet included in the OTL
## NULL
Using get_study("pg_2550") would return a multiPhylo object (default) with all the trees associated with this particular study, while get_study_tree("pg_2550", "tree5513") would return one of these trees.
You may encounter the following error message:
Error in rncl(file = file, ...) : Taxon number 39 (coded by the token Pratia
angulata) has already been encountered in this tree. Duplication of taxa in a
tree is prohibited.
This message occurs because duplicate labels are not allowed in the NEXUS format, and this rule is strictly enforced by the part of the code used by rotl to import the trees into memory.
If you use a version of rotl more recent than 0.4.1, this should not happen by default for the function get_study_tree(). If it happens with another function, please let us know.
The easiest way to work around this is to save the tree in a file, and use ape to read it into memory:
get_study_tree(study_id="pg_710", tree_id="tree1277",
tip_label='ott_taxon_name', file = "/tmp/tree.tre",
file_format = "newick")
tr <- ape::read.tree(file = "/tmp/tree.tre")
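Once the tree is in memory, you may still want to know which labels are duplicated. The sketch below shows one way to find and disambiguate them in base R. It works on a hypothetical vector of labels made up for the example; in practice the vector could come from the tip.label element of the phylo object.

```r
# Hypothetical tip labels in which one name appears twice
labels <- c("Pratia angulata", "Pratia repens", "Pratia angulata")

# Labels that occur more than once
duplicated_labels <- unique(labels[duplicated(labels)])

# One way to make every label unique: append a numeric suffix to repeats
unique_labels <- make.unique(labels, sep = "_")
```

Whether to rename, drop, or merge duplicated tips depends on the study, so inspect them before choosing.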
Some taxonomic names that can be retrieved through the taxonomic name resolution service are not part of the Open Tree’s synthesis tree. These are usually traditional higher-level taxa that have been found to be paraphyletic.
For instance, if you wanted to fetch a tree relating the three birds that go into a Turducken, you might search for the turkey, “duck” and chicken genera:
turducken <- c("Meleagris", "Anas", "Gallus")
taxa <- tnrs_match_names(turducken, context_name = "Animals")
taxa
## search_string unique_name approximate_match ott_id
## 1 meleagris Meleagris (genus in Protostomia) FALSE 5665324
## 2 anas Anas FALSE 765185
## 3 gallus Gallus (genus in Protostomia) FALSE 5295932
## is_synonym is_deprecated number_matches
## 1 FALSE FALSE 2
## 2 FALSE FALSE 1
## 3 FALSE FALSE 2
Looks good, we have IDs for each genus. But if we try to get a subtree from the Open Tree, we get an error:
tr <- tol_induced_subtree(taxa$ott_id)
What’s going on? It turns out that Meleagris and Anas are not included in the synthetic tree (the first because of uncertainty about the rank for that name and the second because Anas as normally defined is not monophyletic).
The best way to avoid these problems is to specify complete species names. Species are the lowest level of classification in the Open Tree taxonomy, so they are guaranteed to be monophyletic:
turducken_spp <- c("Meleagris gallopavo", "Anas platyrhynchos", "Gallus gallus")
taxa <- tnrs_match_names(turducken_spp, context_name = "Animals")
tr <- tol_induced_subtree(taxa$ott_id)
plot(tr)
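If your list of names is long, a quick check can flag the entries that are not written as complete species names before you send them to the TNRS. This is a rough sketch, not part of rotl: the pattern only accepts a capitalized genus followed by a single lowercase epithet, so subspecies, hybrids, and names with hyphens would need a more permissive rule.

```r
# Mix of genus-only and complete species names from the examples above
query_names <- c("Meleagris", "Anas platyrhynchos", "Gallus gallus")

# TRUE when the name looks like "Genus epithet"
is_binomial <- grepl("^[A-Z][a-z]+ [a-z]+$", query_names)

# Names that should be completed before querying
query_names[!is_binomial]
```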