We try to make this package as easy and intuitive to use as possible, but it is still often easiest to start with our vignette. Particularly useful are the “Function Names” and “Function Directory” sections.
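The function names follow a consistent naming strategy, and mostly consist of 3 parts: the prefix BIEN_, the type of data being accessed (e.g. occurrence), and the way the data are queried (e.g. species).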
As a complete example, the function BIEN_occurrence_species returns occurrence records for a given species (or set of species).
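To make the convention concrete, a function name can be split on its underscores. This is only a base R sketch; the variable names and part labels are ours, and some longer names (e.g. BIEN_metadata_database_version) contain more than three pieces.

```r
# Split a BIEN function name into its underscore-separated parts
function_name <- "BIEN_occurrence_species"
name_parts <- strsplit(function_name, split = "_", fixed = TRUE)[[1]]

# Label the three parts (labels are illustrative only)
names(name_parts) <- c("prefix", "family", "query")
print(name_parts)
```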
Currently we have 9 function families in BIEN. These are sets of functions that access a given type of data.
BIEN_occurrence_...
BIEN_ranges_...
BIEN_plot_...
BIEN_trait_...
BIEN_taxonomy_...
BIEN_phylogeny_...
BIEN_stem_...
BIEN_list_...
BIEN_metadata_...
We’ll walk through each of the function families and take a look at some of the options available within each.
These functions begin with the prefix BIEN_occurrence_... and allow you to query occurrences by either taxonomy or geography. Functions include:
BIEN_occurrence_country: Returns all occurrence records within a given country.
BIEN_occurrence_state: Returns all occurrence records within a given state/province.
BIEN_occurrence_county: Returns all occurrence records within a given county/parish.
BIEN_occurrence_family: Returns all occurrence records for a specified family.
BIEN_occurrence_genus: Returns all occurrence records for a specified genus.
BIEN_occurrence_species: Returns all occurrence records for a specified species.
Each of these functions has a number of different arguments that modify your query, either refining your search criteria or returning more data for each record. These arguments include:
cultivated: If TRUE, records known to be cultivated will be returned.
new.world: If TRUE, records returned are limited to those in North and South America, where greater data cleaning and validation has been done. If FALSE, records will be limited to the Old World. If NULL (the default), global records will be returned.
all.taxonomy: If TRUE, the query will return additional taxonomic data, including the uncorrected taxonomic information for those records.
native.status: If TRUE, additional information will be returned regarding whether a species is native in a given region.
natives.only: If TRUE (the default), information for occurrences flagged as introduced will not be returned.
observation.type: If TRUE, the query will return whether each record is from a plot or a specimen. This may be useful if a user believes one type of information may be more accurate.
political.boundaries: If TRUE, the query will return information on which country, state, etc. an occurrence is found within.
collection.info: If TRUE, the query will return additional information about the collection and identification of that specimen.
Example 1: Occurrence records for a species
Okay, enough reading. Let’s get some data.
Let’s say we’re interested in the species Xanthium strumarium and we’d like some occurrence data. We’ll use the function BIEN_occurrence_species to grab the occurrence data.
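A minimal call, assuming the BIEN package is installed and a network connection is available, might look like this. The object name Xanthium_strumarium matches the one used in the plotting code later in this example.

```r
library(BIEN)

# Download occurrence records for one species using the default arguments
Xanthium_strumarium <- BIEN_occurrence_species(species = "Xanthium strumarium")
```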
## Getting page 1 of records
Take a moment to view the dataframe and look at its structure.
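Base R's str and head are one way to do this. The sketch assumes Xanthium_strumarium already holds the dataframe downloaded for this example.

```r
# Assumes Xanthium_strumarium was created with BIEN_occurrence_species()
str(Xanthium_strumarium)   # column names, types, and the first few values
head(Xanthium_strumarium)  # the first six records
```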
## 'data.frame': 2344 obs. of 10 variables:
## $ scrubbed_species_binomial : chr "Xanthium strumarium" "Xanthium strumarium" "Xanthium strumarium" "Xanthium strumarium" ...
## $ latitude : num 39 32.6 33.7 25.3 15.8 ...
## $ longitude : num -77.2 -110.8 -110.6 -105.7 -96 ...
## $ date_collected : Date, format: "1903-08-21" NA ...
## $ datasource : chr "GBIF" "ARIZ" "GBIF" "GBIF" ...
## $ dataset : chr "US" NA "ASU" "IBUNAM" ...
## $ dataowner : chr "US" NA "ASU" "IBUNAM" ...
## $ custodial_institution_codes: chr "US" NA "ASU" "IBUNAM" ...
## $ collection_code : chr "Botany" NA "Plants" "MEXU" ...
## $ datasource_id : num 4885 341 4193 2859 2859 ...
## scrubbed_species_binomial latitude longitude date_collected datasource
## 1 Xanthium strumarium 38.96940 -77.17670 1903-08-21 GBIF
## 2 Xanthium strumarium 32.61090 -110.77090 <NA> ARIZ
## 3 Xanthium strumarium 33.69000 -110.57000 1968-09-27 GBIF
## 4 Xanthium strumarium 25.33028 -105.71972 1989-09-10 GBIF
## 5 Xanthium strumarium 15.83722 -95.97417 1999-03-09 GBIF
## 6 Xanthium strumarium 38.39890 -76.99140 1945-08-26 GBIF
## dataset dataowner custodial_institution_codes collection_code datasource_id
## 1 US US US Botany 4885
## 2 <NA> <NA> <NA> <NA> 341
## 3 ASU ASU ASU Plants 4193
## 4 IBUNAM IBUNAM IBUNAM MEXU 2859
## 5 IBUNAM IBUNAM IBUNAM MEXU 2859
## 6 US US US Botany 4885
The default data that is returned consists of the latitude, longitude and date collected, along with a set of attribution data. The meaning of some of these columns is obvious (e.g. latitude, longitude); others may be less clear. The meanings of these columns and the information within them are explained in more detail in our data dictionary, which can be accessed using the function BIEN_metadata_data_dictionaries.
If we want more information on these occurrences, we just need to change the arguments:
Xanthium_strumarium_full <- BIEN_occurrence_species(species = "Xanthium strumarium",
                                                    cultivated = TRUE,
                                                    all.taxonomy = TRUE,
                                                    native.status = TRUE,
                                                    observation.type = TRUE,
                                                    political.boundaries = TRUE)
We now have considerably more information.
If we want to take a quick look at where those occurrences are we could use:
# Load the maps package, which provides map()
library(maps)
# Make a quick map to plot our points on
map('world', fill = TRUE, col = "grey", bg = "light blue")
# Plot the points from the full query in red
points(cbind(Xanthium_strumarium_full$longitude,
             Xanthium_strumarium_full$latitude),
       col = "red",
       pch = 20,
       cex = 1)
# Plot the points from the default query in blue
points(cbind(Xanthium_strumarium$longitude,
             Xanthium_strumarium$latitude),
       col = "blue",
       pch = 20,
       cex = 1)
Example 2: Occurrence records for a country
Since we may be interested in a particular geographic area, rather than a particular set of species, there are also options to easily extract data by political region.
We’ll choose a relatively small region, the Bahamas, for our demonstration.
Bahamas <- BIEN_occurrence_country(country = "Bahamas")
#Let's see how many species we have
length(unique(Bahamas$scrubbed_species_binomial))
#About 400 species with valid occurrence records.
#Now, let's take a look at where those occurrences are:
map(regions = "Bahamas",
    fill = TRUE,
    col = "grey",
    bg = "light blue")
points(cbind(Bahamas$longitude, Bahamas$latitude),
       col = "blue",
       pch = 20,
       cex = 1)
#Looks like some islands are considerably better sampled than others.
These functions begin with the prefix BIEN_ranges_... and return (unsurprisingly) species ranges. Most of these functions work by saving the downloaded ranges to a specified directory in shapefile format, rather than by loading them into the R environment.
Functions include:
BIEN_ranges_species: Downloads range maps for given species and saves them to a specified directory.
BIEN_ranges_genus: Saves range maps for all species within a genus to a specified directory.
BIEN_ranges_load_species: This function returns the ranges for a set of species as an sf object.
BIEN_ranges_sf: This function returns all ranges that intersect a user-specified sf object.
The range functions have different arguments than we have seen so far, including:
directory: This is where the function will save the shapefiles you download.
matched: If TRUE, the function will return a dataframe listing which species ranges were downloaded and which weren’t.
match_names_only: If TRUE, the function will check whether a map is available for each species without actually downloading it.
include.gid: If TRUE, the function will append a unique gid number to each range map’s filename. This argument is designed to allow forward compatibility when BIEN contains multiple range maps for each species.
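As a rough sketch of how these arguments combine, the calls below first check availability and then download to a temporary folder. The directory name range_dir and the object names are our own choices, and the calls assume a working network connection.

```r
library(BIEN)

# Hypothetical temporary directory for the downloaded shapefiles
range_dir <- file.path(tempdir(), "BIEN_ranges")
dir.create(range_dir, showWarnings = FALSE)

# Check whether a range map is available without downloading it
available <- BIEN_ranges_species(species = "Xanthium strumarium",
                                 directory = range_dir,
                                 match_names_only = TRUE)

# Download the shapefile(s), with the gid appended to each filename,
# and report which species were matched
matched_species <- BIEN_ranges_species(species = "Xanthium strumarium",
                                       directory = range_dir,
                                       matched = TRUE,
                                       include.gid = TRUE)

# See which files were written
list.files(range_dir)
```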
Example 3: Range maps and occurrence points
If we have a species we’re interested in, and would like to load the range map into the environment, we can use the function BIEN_ranges_load_species. Let’s try this for Xanthium strumarium.
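One way to write this call is shown here, assuming BIEN is installed and the database is reachable. The object name Xanthium_strumarium_range is the one the plotting code in this example expects.

```r
library(BIEN)

# Load the range map for one species directly into the R session
Xanthium_strumarium_range <- BIEN_ranges_load_species(species = "Xanthium strumarium")
```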
## Getting page 1 of records
The range map is now in our global environment as an sf object. Let’s plot the map and see what it looks like.
#First, let's add a base map so that our range has some context:
map('world', fill = TRUE,
    col = "grey",
    bg = "light blue",
    xlim = c(-180, -20),
    ylim = c(-60, 80))
#Now, we can add the range map:
plot(Xanthium_strumarium_range[1],
     col = "green",
     add = TRUE)
Now, let’s add those occurrence points from earlier to this map:
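A sketch of that step, reusing the points call from Example 1. It assumes the range map is still the active plot and that Xanthium_strumarium still holds the occurrence dataframe downloaded earlier.

```r
# Assumes an open map plot and the Xanthium_strumarium dataframe from Example 1
points(cbind(Xanthium_strumarium$longitude,
             Xanthium_strumarium$latitude),
       col = "blue",
       pch = 20,
       cex = 1)
```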
These functions begin with the prefix BIEN_plot_... and return ecological plot data. Functions include:
BIEN_plot_list_sampling_protocols: Returns the different plot sampling protocols found in the BIEN database.
BIEN_plot_list_datasource: Returns the different datasources that are available in the BIEN database.
BIEN_plot_sampling_protocol: Downloads data for a specified sampling protocol.
BIEN_plot_datasource: Downloads data for a specific datasource.
BIEN_plot_country
BIEN_plot_state
BIEN_plot_dataset: Downloads data for a given dataset (which is nested within a datasource).
BIEN_plot_name: Downloads data for a specific plot name (these are nested within a given dataset).
Again we have some of the same arguments available for these queries that we saw for the occurrence functions. We also have the new argument all.metadata, which causes the functions to return more metadata for each plot.
Example 4: Plot data by plot name
Let’s take a look at the data for an individual plot.
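A minimal version of this query, assuming network access, uses the plot name "LUQUILLO" that also appears in the fuller query later in this example:

```r
library(BIEN)

# Download the default set of columns for a single plot
LUQUILLO <- BIEN_plot_name(plot.name = "LUQUILLO")
head(LUQUILLO)
```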
We can see that this is a 0.1 hectare transect where stems >= 2.5 cm diameter at breast height were included. If we’d like more detail, we can use additional arguments:
LUQUILLO_full <- BIEN_plot_name(plot.name = "LUQUILLO",
                                cultivated = TRUE,
                                all.taxonomy = TRUE,
                                native.status = TRUE,
                                political.boundaries = TRUE,
                                all.metadata = TRUE)
The dataframe LUQUILLO_full contains more useful information, including metadata on which taxa were included, which growth forms were included and information on whether species are known to be native or introduced.
These functions begin with the prefix BIEN_trait_... and access the BIEN trait database. Note that the spelling of the trait names must be precise, so we recommend using the function BIEN_trait_list first. Trait names are standardized to follow https://www.top-thesaurus.org/ where available. Trait units have been standardized for each trait.
Functions include:
BIEN_trait_list: Start with this. It returns a dataframe of the traits available.
BIEN_trait_family: Returns a dataframe of all trait data for a given family (or families).
BIEN_trait_genus
BIEN_trait_species
BIEN_trait_trait: Downloads all records of a specified trait (or traits).
BIEN_trait_mean: Estimates species mean trait values using genus or family level means where species-level data is absent.
BIEN_trait_traitbyfamily: Downloads data for a given family (or families) and trait(s).
BIEN_trait_traitbygenus
BIEN_trait_traitbyspecies
Example 5: Accessing trait data
If you’re interested in accessing all traits for a taxon, say the genus Salix, just go ahead and use the corresponding function:
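For the genus-level case, the call could look like the following. The object name Salix_traits is our choice, and the query assumes the BIEN database is reachable.

```r
library(BIEN)

# Download every trait record available for the genus Salix
Salix_traits <- BIEN_trait_genus(genus = "Salix")
head(Salix_traits)
```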
If instead we’re interested in a particular trait, the first step is to check if that trait is present and verify the spelling using the function BIEN_trait_list.
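A short sketch of that check follows. The name trait_list is illustrative, and the whole dataframe is printed so the spellings can be read off directly.

```r
library(BIEN)

# List every trait currently available in the BIEN trait database
trait_list <- BIEN_trait_list()
print(trait_list)
```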
If we’re interested in leaf area, we see that this is indeed called “leaf area” in the database. Now that we know the proper spelling, we can use the function BIEN_trait_trait to download all observations of that trait.
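With the spelling confirmed, the download might be written as below. The object name leaf_area is ours, and the query may take a while because it returns every record of the trait.

```r
library(BIEN)

# Download all observations of one trait, using the exact database spelling
leaf_area <- BIEN_trait_trait(trait = "leaf area")
head(leaf_area)
```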
Note that the units have been standardized and that there is a full set of attribution data for each trait.
While there are existing packages that query taxonomic data (e.g. those included in the excellent taxize package), the BIEN taxonomy functions access the taxonomic information that underlies the BIEN database, ensuring consistency.
BIEN_taxonomy_family: Downloads all taxonomic information for a given family.
BIEN_taxonomy_genus
BIEN_taxonomy_species
Example 6: Taxonomic data
Let’s say we’re interested in the genus Asclepias, and we’d like to get an idea of how many species there are in this genus and what higher taxa it falls within.
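A possible form of this query is shown here, assuming network access. The object name Asclepias_taxonomy matches the one used in the filtering code further down.

```r
library(BIEN)

# Download the taxonomic information BIEN holds for one genus
Asclepias_taxonomy <- BIEN_taxonomy_genus(genus = "Asclepias")
head(Asclepias_taxonomy)
```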
## Getting page 1 of records
#We see that the genus Asclepias falls within the family Apocynaceae and the order Gentianales.
#You'll also notice that a given species may appear more than once (due to multiple circumscriptions, some of which may be illegitimate).
#If we'd just like to know all the species that aren't illegitimate:
Asclepias_species <- unique(Asclepias_taxonomy$scrubbed_species_binomial[Asclepias_taxonomy$scrubbed_taxonomic_status %in% c("accepted", "no opinion")])
The BIEN database currently contains 101 phylogenies for New World plants. This includes 100 replicated phylogenies that include a large fraction of New World plant species (“complete phylogenies”) and 1 phylogeny containing only those New World plant species for which molecular data were available (“conservative phylogeny”). Currently, there are only 2 functions available:
BIEN_phylogeny_complete: This function will return a specified number of the replicated “complete” phylogenies. Note that each phylogeny is several Mb in size, so downloading many may take a while on slow connections.
BIEN_phylogeny_conservative: This function returns the conservative phylogeny.
Arguments: The function BIEN_phylogeny_complete has a few arguments that are worth explaining:
n_phylogenies: This is the number of replicated phylogenies that you want to download (between 1 and 100).
seed: This argument sets the seed for the random number generator before randomly drawing the phylogenies to be downloaded. This is useful for replicating analyses.
replicates: This argument allows you to specify which of the 100 phylogenies to download, rather than having them selected randomly.
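The two ways of choosing replicates could be sketched as follows. The object names and the particular seed and replicate numbers are arbitrary, and each call downloads several Mb per phylogeny.

```r
library(BIEN)

# Two randomly drawn replicates; fixing the seed makes the draw repeatable
random_phylos <- BIEN_phylogeny_complete(n_phylogenies = 2, seed = 1)

# Two specific replicates, chosen by their index among the 100
chosen_phylos <- BIEN_phylogeny_complete(replicates = c(1, 2))
```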
Example 7: Phylogenies
Let’s say we want to download the conservative phylogeny.
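A sketch of the download, followed by a quick visual check with the ape package. The object names are ours, and hiding the tip labels is just a way to keep a large tree readable.

```r
library(BIEN)
library(ape)  # provides plot.phylo() for drawing phylogenies

# Download the single conservative phylogeny
phylo <- BIEN_phylogeny_conservative()

# Draw the tree without tip labels to confirm it looks reasonable
plot.phylo(x = phylo, show.tip.label = FALSE)

# The species included are stored as the tip labels
phylo_species <- phylo$tip.label
```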
The BIEN database contains stem data associated with many of the plots. This is typically either diameter at breast height or diameter at ground height. At present, the following stem functions are available (expect more in the future):
BIEN_stem_species: This function downloads all of the stem data for a given species (or set of species).
BIEN_stem_genus
BIEN_stem_family
BIEN_stem_datasource: This function downloads all of the stem data for a given datasource.
Arguments: The arguments for these functions are the same as those we have seen in the occurrence and plot functions.
Example 8: Stem data
If we’d like stem data for the species Cupressus arizonica, we can use the function BIEN_stem_species.
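A minimal call under the usual assumptions (BIEN installed, database reachable) is given here; the object name Cupressus_arizonica_stems is our own.

```r
library(BIEN)

# Download all stem measurements recorded for one species
Cupressus_arizonica_stems <- BIEN_stem_species(species = "Cupressus arizonica")
head(Cupressus_arizonica_stems)
```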
These functions begin with the prefix BIEN_list_... and allow you to quickly get a list of all the species in a geographic unit.
Functions include:
BIEN_list_country: Returns all species found within a country.
BIEN_list_state: Returns all species found within a given state/province or other 2nd level political division.
BIEN_list_county: Returns all species found within a given county/parish or other 3rd level political division.
Some of the same arguments we saw in the occurrence functions appear here as well, including cultivated and new.world.
Example 9: Species list for a country
Let’s return to our previous example. What if we just need a list of the species in the Bahamas, rather than the specific details of each occurrence record? We can instead use the function BIEN_list_country to download a list of species, which should be much faster than using BIEN_occurrence_country to get a species list.
Bahamas_species_list <- BIEN_list_country(country = "Bahamas")
#Notice that we find many more species listed than we found occurrence records for. What happened? There are many records coming from the Bahamas that lack coordinates. These records are used in the "_list_" functions, but not the occurrence functions.
If we wanted to retrieve the results for multiple countries at once, that is simple as well. We just need to supply a vector of countries.
country_vector <- c("Haiti","Dominican Republic")
Haiti_DR <- BIEN_list_country(country = country_vector)
We can also use political division codes (from geonames.org) instead of writing out the full country names.
#To see all of the political division names, and associated codes, we can use this function:
political_names <- BIEN_metadata_list_political_names()
## Getting page 1 of records
## Getting page 2 of records
## Getting page 3 of records
## Getting page 4 of records
## Getting page 5 of records
## country country_iso state_province state_province_ascii state_code
## 40001 Kazakhstan KZ South Kazakhstan South Kazakhstan 10
## 40002 Kazakhstan KZ South Kazakhstan South Kazakhstan 10
## 40003 Kazakhstan KZ South Kazakhstan South Kazakhstan 10
## 40004 Kazakhstan KZ South Kazakhstan South Kazakhstan 10
## 40005 Kazakhstan KZ South Kazakhstan South Kazakhstan 10
## 40006 Kazakhstan KZ South Kazakhstan South Kazakhstan 10
## county_parish county_parish_ascii county_code
## 40001 Shymkent Qalasy Shymkent Qalasy 1537272
## 40002 Türkistan Qalasy Turkistan Qalasy 1537276
## 40003 Shardara Audany Shardara Audany 1524885
## 40004 Töle Bi Audany Tole Bi Audany 1521378
## 40005 Qazyqurt Audany Qazyqurt Audany 1521354
## 40006 Sozaq Audany Sozaq Audany 1518678
#In addition to the standardized country, state (state_province_ascii) and county (county_parish_ascii) names, we have the associated codes that can be used in BIEN functions.
#Note that 'state' refers to any primary political division (e.g. province), and 'county' refers to any secondary political division (e.g. parish).
#Looking at the political_names dataframe, we see that the Dominican Republic has country code "DO", and Haiti has country code "HT"
Haiti_DR_from_codes <- BIEN_list_country(country.code = c("HT","DO"))
## Getting page 1 of records
## Getting page 2 of records
The BIEN metadata functions start with the prefix BIEN_metadata_... and provide useful metadata for the BIEN database.
BIEN_metadata_database_version: Returns the current version number of the BIEN database and the release date.
BIEN_metadata_match_data: Rudimentary function to check for changed records between old and current queries.
BIEN_metadata_citation: Function to generate bibtex citations for use in reference managers.
BIEN_metadata_list_political_names: Returns a dataframe containing political division names and associated codes.
Example 10: Metadata
To check what the current version of the BIEN database is (which we recommend reporting when using BIEN data):
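The check itself is a single call. The sketch stores the result in db_version_info (a name of our choosing) so that it can be reported later.

```r
library(BIEN)

# Retrieve the current database version and release date
db_version_info <- BIEN_metadata_database_version()
print(db_version_info)
```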
## Getting page 1 of records
## db_version db_release_date
## 1 4.2.8 2023-06-27
Example 11: Citations
One of the more innovative features of the BIEN package is that it will generate custom attribution data for you based on what data you downloaded through the package.
Let’s say we’re interested in Selaginella selaginoides, and we’d like to download some occurrence data:
Selaginella_selaginoides_occurrences <- BIEN_occurrence_species("Selaginella selaginoides", new.world = NULL)
If we plan on using those data in a publication, we’ll need proper attribution. We can use BIEN_metadata_citation to do this for us:
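In its simplest form the call only needs the occurrence dataframe. This sketch assumes Selaginella_selaginoides_occurrences was downloaded as in the previous step.

```r
library(BIEN)

# Build citation information from the records that were actually downloaded
# (assumes Selaginella_selaginoides_occurrences already exists)
citation_info <- BIEN_metadata_citation(dataframe = Selaginella_selaginoides_occurrences)
```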
citation_info is a list that contains 3 elements:
1. A bit of general information on how to use the list.
2. A set of bibtex formatted references.
3. Acknowledgement text.
To make things even easier on ourselves, we can use some of the additional functionality of the BIEN_metadata_citation function:
temp_dir <- file.path(tempdir(), "BIEN_temp") #Set a temporary working directory
dir.create(temp_dir, showWarnings = FALSE) #Make sure the directory exists before writing to it
citation_info <- BIEN_metadata_citation(dataframe = Selaginella_selaginoides_occurrences,
                                        bibtex_file = file.path(temp_dir, "selaginella_selaginoides.bib"),
                                        acknowledgement_file = file.path(temp_dir, "selaginella_selaginoides.txt"))
Now, we have a bibtex file, selaginella_selaginoides.bib, that can be loaded into a reference manager (e.g. Endnote, Paperpile, etc.), and a text file, selaginella_selaginoides.txt, containing text that can be pasted into the acknowledgements section of a publication.
What if we also have some trait data? No problem there, the code handles that as well:
#First, let's get some trait data:
selaginella_selaginoides_traits <- BIEN_trait_species(species = "Selaginella selaginoides")
#Now, we just need to modify our previous bit of code to include the trait data as well:
temp_dir <- file.path(tempdir(), "BIEN_temp")
citation_info <- BIEN_metadata_citation(dataframe = Selaginella_selaginoides_occurrences,
                                        trait.dataframe = selaginella_selaginoides_traits,
                                        bibtex_file = file.path(temp_dir, "selaginella_selaginoides.bib"),
                                        acknowledgement_file = file.path(temp_dir, "selaginella_selaginoides.txt"))
The updated citation information will now contain references for both trait and occurrence records.
Example 12: Putting it all together
Coming soon!