Type: | Package |
Title: | Download and Extract Data from US EPA's ECOTOX Database |
Version: | 1.2.1 |
Author: | Pepijn de Vries |
Maintainer: | Pepijn de Vries <pepijn.devries@outlook.com> |
Description: | The US EPA ECOTOX database is a freely available database with a treasure of aquatic and terrestrial ecotoxicological data. As the online search interface doesn't come with an API, this package provides the means to easily access and search the database in R. To this end, all raw tables are downloaded from the EPA website and stored in a local SQLite database <doi:10.1016/j.chemosphere.2024.143078>. |
Depends: | R (≥ 4.1.0), RSQLite (≥ 2.3.4) |
Imports: | crayon (≥ 1.5.2), dbplyr (≥ 2.4.0), dplyr (≥ 1.1.4), httr2 (≥ 1.0.0), jsonlite (≥ 1.8.8), lifecycle (≥ 1.0.4), purrr (≥ 1.0.2), rappdirs (≥ 0.3.3), readr (≥ 2.1.4), readxl (≥ 1.4.3), rlang (≥ 1.1.2), rvest (≥ 1.0.3), stringr (≥ 1.5.1), tibble (≥ 3.2.1), tidyr (≥ 1.3.0), tidyselect (≥ 1.2.0), units (≥ 0.8.5), utils |
Suggests: | DBI, htmltools, kableExtra, knitr, rmarkdown, standartox, testthat (≥ 3.0.0), webchem |
URL: | https://github.com/pepijn-devries/ECOTOXr, https://pepijn-devries.github.io/ECOTOXr/, https://doi.org/10.1016/j.chemosphere.2024.143078 |
BugReports: | https://github.com/pepijn-devries/ECOTOXr/issues |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Config/testthat/edition: | 3 |
VignetteBuilder: | knitr |
Collate: | 'ECOTOXr-package.r' 'cas_handlers.r' 'database_access.r' 'helpers.r' 'imports.r' 'init.r' 'online.r' 'process_date.r' 'process_unit.r' 'process_numeric.r' 'wrappers.r' |
NeedsCompilation: | no |
Packaged: | 2025-04-08 07:29:48 UTC; vries171 |
Repository: | CRAN |
Date/Publication: | 2025-04-08 10:20:07 UTC |
Package description
Description
Everything you need to know when you start using the ECOTOXr package.
Details
The ECOTOXr package provides the means to efficiently search, extract and analyse US EPA ECOTOX data, with a focus on reproducible results. Although the package creator/maintainer is confident in the quality of this software, it is the end user's sole responsibility to assure the quality of his or her work while using this software. As per the provided license terms, the package maintainer is not liable for any damage resulting from its usage. That being said, below we present some tips for generating reproducible results with this package.
How do I get started?
Installing this package is only the first step to get things started. You need to perform the following steps in order to use the package to its full capacity.
First, download a copy of the complete EPA database. This can be done by calling download_ecotox_data(). This may not work on all machines, as R does not always accept the SSL certificate of the EPA website. In those cases the zipped archive with the database files can be downloaded manually with a different (more forgiving) browser. The files from the zip archive can be extracted to a location of choice. Alternatively, the user could try download_ecotox_data(verify_ssl = FALSE) when the download URL is trusted.
Next, an SQLite database needs to be built from the downloaded files. This is done automatically when you used download_ecotox_data() in the previous step. When you have downloaded the files manually, you can call build_ecotox_sqlite() to build the database locally.
When the previous steps have been performed successfully, you can search the database by calling search_ecotox(). You can also use dbConnectEcotox() to open a connection to the database. You can query the database using this connection and any of the methods provided by the DBI or RSQLite packages.
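The steps above can be sketched as follows. This is a minimal, hedged example: its body only does something when ECOTOXr is installed, the download is only attempted in an interactive session, and the layout of the search list (a named list with `terms` and `method`) is an assumption that should be checked against `?search_ecotox` before relying on it.

```r
## Is the package installed, and has a local database been built already?
has_pkg <- requireNamespace("ECOTOXr", quietly = TRUE)
has_db  <- has_pkg && isTRUE(as.logical(ECOTOXr::check_ecotox_availability()))

## Steps 1 and 2: download the raw tables and build the SQLite database.
## Only attempted interactively, as the download is large and asks questions.
if (has_pkg && !has_db && interactive()) {
  ECOTOXr::download_ecotox_data()
  has_db <- isTRUE(as.logical(ECOTOXr::check_ecotox_availability()))
}

## Step 3: search the local database
if (has_db) {
  ## assumed search layout, see ?search_ecotox
  hits <- ECOTOXr::search_ecotox(
    list(latin_name = list(terms = "Daphnia magna", method = "exact"))
  )
  print(head(hits))
}
```

The `requireNamespace()` guard keeps the script harmless on machines where the package or the database is missing.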
How do I obtain reproducible results?
Each individual user is responsible for evaluating the reproducibility of his or her work. Although this package offers instruments to achieve reproducibility, it is not guaranteed. In order to increase the chances of generating reproducible results, one should adhere at least to the following rules:
Always use an official release from CRAN, and cite the version used in your analyses (citation("ECOTOXr")). Different versions may produce different end results (although we will strive for backward compatibility).
Make sure you are working with a clean (unaltered) version of the database. When in doubt, download and build a fresh copy of the database (download_ecotox_data()). Also cite the (release) version of the downloaded database (cite_ecotox()), and the operating system on which the local database was built (get_ecotox_info()). Or, just make sure that you never modify the database (e.g., write data to it, delete data from it, etc.).
In order to avoid platform dependencies it is advised to only include non-accented alpha-numerical characters in search terms. See also search_ecotox and build_ecotox_sqlite.
When trying to reproduce database extractions from earlier database releases, filter out additions after that specific release. This can be done by adding the output fields 'tests.modified_date', 'tests.created_date' and 'tests.published_date' to your search and comparing those with the release date of the database you are trying to reproduce results from.
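A minimal base R sketch of such a filter is given below. The data frame, its values and the release date are hypothetical; only the three date column names come from the text above. The `%m/%d/%Y` date format is an assumption, and in practice `as_date_ecotox()` may be the more forgiving parser. Dropping every record with any date after the release is one conservative reading of "filter out additions".

```r
## Hypothetical search result (mock values, for illustration only)
result <- data.frame(
  test_id              = 1:3,
  tests.created_date   = c("01/15/2019", "06/02/2021", "03/10/2020"),
  tests.modified_date  = c("02/01/2019", "06/02/2021", "11/20/2022"),
  tests.published_date = c("03/12/2019", "09/15/2021", "06/14/2020"),
  stringsAsFactors = FALSE
)

## Assumed release date of the database you want to reproduce
release_date <- as.Date("2020-12-31")

date_cols <- c("tests.created_date", "tests.modified_date",
               "tests.published_date")

## Parse each date column (assumed format: month/day/year)
parsed <- lapply(result[date_cols], as.Date, format = "%m/%d/%Y")

## Keep records for which none of the dates is later than the release
keep <- Reduce(`&`, lapply(parsed, function(d) !is.na(d) & d <= release_date))
filtered <- result[keep, ]
filtered
```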
Why isn't the database included in the package?
This package doesn't come bundled with a copy of the database which needs to be downloaded the first time the package is used. Why is this? There are several reasons:
The database is maintained and updated by the US EPA. This process is and should be outside the sphere of influence of the package maintainer.
Packages on CRAN are not allowed to contain large amounts of data. Publication on CRAN is key to control the quality of this package and therefore outweighs the convenience of having the data bundled with the package.
The user has full control over the release version of the database that is being used.
Why does this package promote using a local copy of the ECOTOX database?
Although this package offers experimental features for searching online, there are several reasons why we opted for creating a local copy:
The user would be restricted to the search options provided on the website (ECOTOX).
The online database doesn't come with an API that would allow for a convenient interface. This is why the online search features implemented in this package are experimental.
The user is not limited by an internet connection and its bandwidth.
Not all database fields, and only a limited number of records, can be retrieved from the online interface.
Author(s)
Maintainer: Pepijn de Vries pepijn.devries@outlook.com (ORCID) [data contributor]
References
Official US EPA ECOTOX website: https://cfpub.epa.gov/ecotox/
Olker, J.H., Elonen, C.M., Pilli, A., Anderson, A., Kinziger, B., Erickson, S., Skopinski, M., Pomplun, A., LaLone, C.A., Russom, C.L. and Hoff, D. (2022), The ECOTOXicology Knowledgebase: A Curated Database of Ecologically Relevant Toxicity Tests to Support Environmental Research and Risk Assessment. Environ Toxicol Chem, 41: 1520-1539.
See Also
Useful links:
Report bugs at https://github.com/pepijn-devries/ECOTOXr/issues
Values represented by ECOTOX character to dates
Description
Similar to as.Date(), but it also performs some text sanitising before coercing text to dates.
Usage
as_date_ecotox(x, dd = 1L, mm = 1L, nr = 1L, ..., warn = TRUE)
Arguments
x |
A vector of |
dd |
Replacement values for unspecified days in a date. Defaults to |
mm |
Replacement values for unspecified months in a date. Defaults to |
nr |
Replacement values for generically unspecified values in a date.
Defaults to |
... |
Passed to |
warn |
If set to |
Details
The following steps are performed (in the order as listed) to sanitise text before coercing it to dates:
Trim whitespaces
Replace hyphens with forward slashes
Replace double forward slashes, forward slashes followed by a zero and spaces, with a single forward slash
Replace "mm" or "dd" (case insensitive) with the value specified as argument. Add a forward slash to it when missing.
Treat "na", "nr", "xx" and "00" (case insensitive) as unreported values when followed by a forward slash. Replace them with the nr argument.
Remove alphabetical characters when directly followed by a numerical character.
Replace literal month names with their numerical calendar value (1-12).
When the date consists of one value, assume it is a calendar year and add dd and mm as day and month value.
If a date consists of two numbers, assume it is month, followed by year. In that case insert the dd value for the day.
It is your own responsibility to check if the sanitising steps are appropriate for your analyses.
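To make a few of the listed steps concrete, here is a base R sketch. The helper `sanitise_date_sketch()` is hypothetical and is not the package's implementation; it only mimics trimming, hyphen replacement, the "dd"/"mm" placeholders and the lone-year rule.

```r
## Hypothetical helper (not part of ECOTOXr), mimicking some listed steps
sanitise_date_sketch <- function(x, dd = 1L, mm = 1L) {
  x <- trimws(x)                                  # trim whitespace
  x <- gsub("-", "/", x, fixed = TRUE)            # hyphens become slashes
  x <- gsub("dd", as.character(dd), x, ignore.case = TRUE)  # unspecified day
  x <- gsub("mm", as.character(mm), x, ignore.case = TRUE)  # unspecified month
  ## a lone four-digit value is assumed to be a calendar year
  year_only <- grepl("^[0-9]{4}$", x)
  x[year_only] <- paste(mm, dd, x[year_only], sep = "/")
  as.Date(x, format = "%m/%d/%Y")
}

sketch_dates <- sanitise_date_sketch(
  c("5-19-1987 ", "5/dd/2021", "1985", "mm/19/1999")
)
sketch_dates
```

For real data, `as_date_ecotox()` handles many more notations than this sketch does.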
Value
A vector of Date class objects with the same length as x.
Author(s)
Pepijn de Vries
See Also
Other ecotox-sanitisers:
as_numeric_ecotox(), as_unit_ecotox(), mixed_to_single_unit(), process_ecotox_dates(), process_ecotox_numerics(), process_ecotox_units()
Examples
## a vector of commonly used notations in the database to represent
## dates. Most frequent format is %m/%d/%Y
char_date <- c("5-19-1987 ", "5/dd/2021", "3/19/yyyy", "1985", "mm/19/1999",
"October 2004", "nr/nr/2015")
as_date_ecotox(char_date)
## Set unspecified days to 15:
as_date_ecotox(char_date, dd = 15L)
## Unspecified days should result in NA:
as_date_ecotox(char_date, dd = -1L)
## Set unspecified months to 6:
as_date_ecotox(char_date, mm = 6L)
## Set generically unspecified value to 6:
as_date_ecotox(char_date, nr = 6L)
Values represented by ECOTOX character to numeric
Description
Similar to as.numeric(), but it also performs some text sanitising before coercing text to numerics.
Usage
as_numeric_ecotox(x, range_fun = NULL, ..., warn = TRUE)
Arguments
x |
A vector of |
range_fun |
Function to summarise range values. If |
... |
Arguments passed to |
warn |
If set to |
Details
The following steps are performed to sanitise text before coercing it to numerics:
Notes labelled with "x" or "*" are removed.
Operators (">", ">=", "<", "<=", "~", "=", "ca", "er") are removed.
Text between brackets ("()") is removed (including the brackets).
Commas are considered to be a thousand separator when they are located at any fourth character (from the right) and are removed. Commas at any other location are assumed to be decimal separators and are replaced by a period.
If there is a hyphen present (not preceded by an "e" or "E") it probably represents a range of values. When range_fun is NULL it will result in NA. Otherwise, the numbers are split at the hyphen and aggregated with range_fun.
It is your own responsibility to check if the sanitising steps are appropriate for your analyses.
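The comma and range rules are the least obvious of these steps, so a base R sketch may help. The helper `sanitise_numeric_sketch()` is hypothetical and does not reproduce the package's code; it only approximates the listed rules with regular expressions.

```r
## Hypothetical helper (not part of ECOTOXr), approximating the listed steps
sanitise_numeric_sketch <- function(x, range_fun = NULL) {
  x <- trimws(x)
  x <- gsub("\\([^)]*\\)", "", x)          # drop bracketed text
  x <- gsub("[x*]", "", x)                 # drop note labels
  x <- gsub("^(>=|<=|>|<|~|=)", "", x)     # drop leading operators
  ## a comma followed by exactly three digits is a thousand separator
  x <- gsub(",(?=[0-9]{3}(\\D|$))", "", x, perl = TRUE)
  x <- gsub(",", ".", x, fixed = TRUE)     # any other comma is a decimal mark
  ## a hyphen directly after a digit (so not after "e"/"E") marks a range
  is_range <- grepl("[0-9.]\\s*-", x)
  out <- rep(NA_real_, length(x))
  out[!is_range] <- suppressWarnings(as.numeric(x[!is_range]))
  if (!is.null(range_fun)) {
    out[is_range] <- vapply(
      strsplit(x[is_range], "-", fixed = TRUE),
      function(parts) as.numeric(range_fun(as.numeric(parts))),
      numeric(1)
    )
  }
  out
}

sketch_in <- c("10", "~5", "9.2*", "2,33", "2,333", "2.1(1.0 - 3.2)",
               "1-5", "1e-3")
sanitise_numeric_sketch(sketch_in)                     # ranges become NA
sanitise_numeric_sketch(sketch_in, range_fun = median) # ranges are summarised
```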
Value
A vector of numeric values with the same length as x.
Author(s)
Pepijn de Vries
See Also
Other ecotox-sanitisers:
as_date_ecotox(), as_unit_ecotox(), mixed_to_single_unit(), process_ecotox_dates(), process_ecotox_numerics(), process_ecotox_units()
Examples
## a vector of commonly used notations in the database to represent
## numeric values
char_num <- c("10", " 2", "3 ", "~5", "9.2*", "2,33",
"2,333", "2.1(1.0 - 3.2)", "1-5", "1e-3")
## Text fields reported as ranges are returned as `NA`:
as_numeric_ecotox(char_num, warn = FALSE)
## Text fields reported as ranges are processed with `range_fun`
as_numeric_ecotox(char_num, range_fun = median)
Text from the ECOTOX database to mixed_units
Description
Convert text to units after sanitising.
Usage
as_unit_ecotox(
x,
type = c("concentration", "duration", "length", "media", "application", "size",
"weight", "unknown"),
...,
warn = TRUE
)
Arguments
x |
A vector of |
type |
The type of unit that can help the sanitation process. See the 'usage'
section for available options. These options are linked to the different unit tables
in the database (see |
... |
Ignored. |
warn |
If set to |
Details
The following steps are performed (in the order as listed) to sanitise text before coercing it to units:
The following is removed:
Leading/trailing white spaces
Square brackets and commas
A list of common prefixes
Double spaces are replaced by single spaces
Brackets around multiply symbol
The following is corrected/adjusted:
'for' is interpreted as multiplication
Scientific notation of numbers is standardised where possible.
A list of ambiguous patterns is replaced with more explicit strings. For instance, 'deg' is replaced with 'degree'.
The following miscellaneous corrections are made:
A list of 'known' annotations are removed from the units
A list of elements known to represent counts are renamed 'counts'.
Percentages are renamed as explicit concentration in mass per volume or volume per volume units where possible
'CI' is renamed 'Curies'.
'M' is renamed 'mol/L'.
Units expressed as 'parts per ...' are explicitly renamed to mass over volume, or volume over volume where possible
Type specific sanitation steps
Concentration units:
'K' is renamed 'Karmen'
'dpm' is renamed 'counts/min' (i.e., disintegrations per minute)
Media units:
'K' is renamed 'Kelvin'
'C' is renamed 'Celsius'
Some final miscellaneous adjustments:
Scientific notation in numbers is not supported by the units package. Numbers are formatted in decimal notation where possible.
Spaces are removed if preceded by numeric and followed by alphabetical character
All equivalents of ambiguous synonyms for time units are explicitly renamed to their respective unit (e.g., 'dph' (days post hatching) -> 'day')
unreported/missing units are renamed 'unit'
It is your own responsibility to check if the sanitising steps are appropriate for your analyses.
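A handful of these steps can be imitated with plain string operations, as in the sketch below. The helper `sanitise_unit_sketch()` is hypothetical: it covers only five of the listed rules, returns character strings rather than units objects, and is not how the package itself is implemented.

```r
## Hypothetical helper (not part of ECOTOXr), covering a few listed steps
sanitise_unit_sketch <- function(x) {
  x <- trimws(x)                            # leading/trailing white space
  x <- gsub("[][,]", "", x)                 # square brackets and commas
  x <- gsub(" {2,}", " ", x)                # repeated spaces become one
  x <- gsub("\\bdeg\\b", "degree", x)       # ambiguous pattern made explicit
  x[x %in% "M"] <- "mol/L"                  # molar concentration
  x[is.na(x) | x == ""] <- "unit"           # unreported/missing units
  x
}

sketch_units <- sanitise_unit_sketch(c(" mg/L ", "[ug]/kg", "M", "deg  C", ""))
sketch_units
```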
Value
A vector of ?units::unit class objects with the same length as x.
Author(s)
Pepijn de Vries
See Also
Other ecotox-sanitisers:
as_date_ecotox(), as_numeric_ecotox(), mixed_to_single_unit(), process_ecotox_dates(), process_ecotox_numerics(), process_ecotox_units()
Examples
## Try parsing a random set of units from the database:
c("ppm-d", "ml/2.5 cm eu", "fl oz/10 gal/1k sqft", "kg/100 L",
"mopm", "ng/kg", "ug", "AI ng/g", "PH", "pm", "uM/cm3", "1e-4 mM",
"degree", "fs", "mg/TI", "RR", "ug/g org/d", "1e+4 IU/TI", "pg/mg TE",
"pmol/mg", "1e-9/l", "no >15 cm", "umol/mg pro", "cc/org/wk", "PIg/L",
"ug/100 ul/org", "ae mg/kg diet/d", "umol/mg/h", "cmol/kg d soil",
"ug/L diet", "kg/100 kg sd", "1e+6 cells", "ul diet", "S", "mmol/h/g TI",
"g/70 d", "vg", "ng/200 mg diet", "uS/cm2", "AI ml/ha", "AI pt/acre",
"mg P/h/g TI", "no/m", "kg/ton sd", "ug/g wet wt", "AI mg/2 L diet",
"nmol/TI", "umol/g wet wt", "PSU", "Wijs number") |>
as_unit_ecotox(warn = FALSE)
## Adding the type of measurement can affect interpretation:
as_unit_ecotox(c("C", "K"), type = "concentration")
as_unit_ecotox(c("C", "K"), type = "media")
Build an SQLite database from zip archived tables downloaded from EPA website
Description
This function is called automatically after download_ecotox_data(). The database files can also be downloaded manually from the EPA website, from which a local database can be built using this function.
Usage
build_ecotox_sqlite(source, destination = get_ecotox_path(), write_log = TRUE)
Arguments
source |
A |
destination |
A |
write_log |
A |
Details
Raw data downloaded from the EPA website is in itself not very efficient to work with in R. The files are large and would put a large strain on R when loading completely into the system's memory. Instead use this function to build an SQLite database from the tables. That way, the data can be queried without having to load it all into memory.
EPA provides the raw table from the ECOTOX database as text files with pipe-characters ('|') as table column separators. Although not documented, the tables appear not to contain comment or quotation characters. There are records containing the reserved pipe-character that will confuse the table parser. For these records, the pipe-character is replaced with a dash character ('-').
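To illustrate the file layout described above, the sketch below writes a small mock table and reads it back with base R, with quotation and comment characters switched off. The file contents are hypothetical, and this is not the parser used by the package; it merely shows why a stray pipe character inside a field is a problem.

```r
## Mock table in the pipe-delimited layout (hypothetical data)
mock_file <- tempfile(fileext = ".txt")
writeLines(c("test_id|cas_number|remark",
             "1|71432|it's \"quoted\" # not a comment",
             "2|58082|plain"), mock_file)

## No quote or comment characters: every line is split on '|' only
mock_table <- read.table(mock_file, sep = "|", header = TRUE,
                         quote = "", comment.char = "",
                         stringsAsFactors = FALSE)

## Each line should contain the same number of fields; a field holding an
## extra '|' would show up here as a line with too many fields
n_fields <- count.fields(mock_file, sep = "|", quote = "", comment.char = "")
mock_table
n_fields
```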
In addition, while reading the tables as text files, this package attempts to decode the text as UTF8. Unfortunately, this process appears to be platform-dependent, and may therefore result in different end-results on different platforms. This problem only seems to occur for characters that are listed as 'control characters' under UTF8. This will have consequences for reproducibility, but only if you build search queries that look for such special characters. It is therefore advised to stick to common (non-accented) alpha-numerical characters in your searches, for the sake of reproducibility.
Use suppressMessages() to suppress the progress report.
Value
Returns NULL invisibly.
Author(s)
Pepijn de Vries
See Also
Other database-build-functions:
check_ecotox_build(), check_ecotox_version(), download_ecotox_data(), get_ecotox_url()
Examples
source_path <- tempfile()
dir.create(source_path)
## This is a small mockup file resembling the larger zip
## files that can be downloaded with `download_ecotox_data()`:
source_file <- system.file("ecotox-test.zip", package = "ECOTOXr")
unzip(source_file, exdir = source_path)
build_ecotox_sqlite(source_path, tempdir())
Functions for handling chemical abstract service (CAS) registry numbers
Description
Functions for handling chemical abstract service (CAS) registry numbers
Usage
cas(length = 0L)
is.cas(x)
as.cas(x)
## S3 method for class 'cas'
x[[i]]
## S3 method for class 'cas'
x[i]
## S3 replacement method for class 'cas'
x[[i]] <- value
## S3 replacement method for class 'cas'
x[i] <- value
## S3 method for class 'cas'
format(x, hyphenate = TRUE, ...)
## S3 method for class 'cas'
as.character(x, ...)
show.cas(x, ...)
## S3 method for class 'cas'
print(x, ...)
## S3 method for class 'cas'
as.list(x, ...)
## S3 method for class 'cas'
as.double(x, ...)
## S3 method for class 'cas'
as.integer(x, ...)
## S3 method for class 'cas'
c(...)
## S3 method for class 'cas'
as.data.frame(...)
Arguments
length |
A non-negative |
x |
Object from which data needs to be extracted or replaced, or needs to be coerced into a specific
format. For nearly all of the functions documented here, this needs to be an object of the S3 class 'cas',
which can be created with |
i |
Index specifying element(s) to extract or replace. See also |
value |
A replacement value, can be anything that can be converted into an S3 cas-class object with |
hyphenate |
A |
... |
Arguments passed to other functions |
Details
In the database, CAS registry numbers are stored as text (type character). As CAS numbers can consist of a maximum of 10 digits (plus two hyphens), each CAS number can consume up to 12 bytes of memory or disk space. By storing the data numerically, only 5 bytes are required. These functions provide the means to handle CAS registry numbers and coerce from and to different formats and types.
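The last digit of a CAS registry number is a check digit: each preceding digit is multiplied by its position counted from the right, and the sum modulo 10 must equal the check digit. The base R sketch below applies that rule; `cas_check_digit()` is a hypothetical helper, not a function of this package. It also shows why nine body digits always fit in R's 32 bit integers whereas the full ten digits may not.

```r
## Hypothetical helper (not part of ECOTOXr): check digit for the body of a
## CAS number, i.e. all digits except the last one, without hyphens
cas_check_digit <- function(body) {
  digits <- rev(as.integer(strsplit(body, "")[[1]]))  # rightmost digit first
  sum(digits * seq_along(digits)) %% 10L
}

benzene_check <- cas_check_digit("7143")       # benzene is 71-43-2
huge_check    <- cas_check_digit("999999999")  # 9999999-99-5

## Nine body digits fit in a 32 bit integer, ten digits may not
body_fits <- 999999999  <= .Machine$integer.max
full_fits <- 9999999995 <= .Machine$integer.max
```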
Value
Functions cas, c and as.cas return S3 class 'cas' objects. Coercion functions (starting with 'as') return the object as specified by their respective function names (i.e., integer, double, character, list and data.frame). The show.cas and print functions also return formatted characters. The function is.cas will return a single logical value, indicating whether x is a valid S3 cas-class object. The square brackets return the selected index/indices, or the vector of cas objects where the selected elements are replaced by value.
Author(s)
Pepijn de Vries
Examples
## This will generate a vector of cas objects containing 10
## fictive (0-00-0), but valid registry numbers:
cas(10)
## This is a cas-object:
is.cas(cas(0L))
## This is not a cas-object:
is.cas(0L)
## Three different ways of creating a cas object from
## Benzene's CAS registry number (the result is the same)
as.cas("71-43-2")
as.cas("71432")
as.cas(71432L)
## This is one way of creating a vector with multiple CAS registry numbers:
cas_data <- as.cas(c("64175", "71432", "58082"))
## This is how you select a specific element(s) from the vector:
cas_data[2:3]
cas_data[[2]]
## You can also replace specific elements in the vector:
cas_data[1] <- "7440-23-5"
cas_data[[2]] <- "129-00-0"
## You can format CAS numbers with or without hyphens:
format(cas_data, TRUE)
format(cas_data, FALSE)
## The same can be achieved using as.character
as.character(cas_data, TRUE)
as.character(cas_data, FALSE)
## There are also show and print methods available:
show(cas_data)
print(cas_data)
## Numeric values can be obtained from CAS using as.numeric, as.double or as.integer
as.numeric(cas_data)
## Be careful, however. Some CAS numbers cannot be represented by R's 32 bit integers
## and will produce NA's. This will work OK:
huge_cas <- as.cas("9999999-99-5")
## Not run:
## This will not:
as.integer(huge_cas)
## End(Not run)
## The trick applied by this package is that the final
## validation digit is stored separately as attribute:
unclass(huge_cas)
## This is how cas objects can be concatenated:
cas_data <- c(huge_cas, cas_data)
## This will create a data.frame
as.data.frame(cas_data)
## This will create a list:
as.list(cas_data)
Check whether an ECOTOX database exists locally
Description
Tests whether a local copy of the US EPA ECOTOX database exists in get_ecotox_path().
Usage
check_ecotox_availability(target = get_ecotox_path())
Arguments
target |
A |
Details
When arguments are omitted, this function will look in the default directory (get_ecotox_path()). However, it is possible to build a database file elsewhere if necessary.
Value
Returns a logical value indicating whether a copy of the database exists. It also returns a files attribute that lists which copies of the database are found.
Author(s)
Pepijn de Vries
See Also
Other database-access-functions:
check_ecotox_build(), check_ecotox_version(), cite_ecotox(), dbConnectEcotox(), get_ecotox_info(), get_ecotox_sqlite_file(), list_ecotox_fields()
Examples
check_ecotox_availability()
Check the locally built database for validity
Description
Performs some simple tests to check whether the locally built database is not corrupted.
Usage
check_ecotox_build(path = get_ecotox_path(), version, ...)
Arguments
path |
A |
version |
A |
... |
Arguments that are passed to |
Details
For now this function tests if all expected tables are present in the locally built database. Note that in later releases of the database some tables were added. Therefore, for older builds this function might return FALSE whereas the database is actually just fine (just outdated).
Furthermore, this function tests if all tables contain one or more records. Obviously, this is no guarantee that the database is valid, but it is a start.
More tests may be added in future releases.
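The two kinds of test described above (are the expected tables present, and does each table hold at least one record) can be sketched with DBI on a throwaway in-memory database. Everything here is illustrative: `check_tables_sketch()` is a hypothetical helper, the mock tables are tiny stand-ins, the list of expected tables is an arbitrary subset, and none of this is the package's own code.

```r
## Hypothetical helper (not part of ECOTOXr)
check_tables_sketch <- function(con, expected) {
  present <- DBI::dbListTables(con)
  ## count the records in every table that is present
  n_rows <- vapply(present, function(tb) {
    sql <- paste0("SELECT COUNT(*) AS n FROM ",
                  DBI::dbQuoteIdentifier(con, tb))
    as.numeric(DBI::dbGetQuery(con, sql)$n)
  }, numeric(1))
  list(missing = setdiff(expected, present),
       empty   = names(n_rows)[n_rows == 0])
}

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "tests", data.frame(test_id = 1:2))
  DBI::dbWriteTable(con, "results", data.frame(result_id = integer(0)))
  ## "species" is deliberately absent from the mock database
  sketch_check <- check_tables_sketch(con, c("tests", "results", "species"))
  print(sketch_check)
  DBI::dbDisconnect(con)
}
```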
Value
Returns an indicative logical value whether the database is not corrupted. TRUE indicates the database is most likely OK. FALSE indicates that something might be wrong. Additional messages (when FALSE) are included as attributes containing hints on the outcome of the tests. See also the 'details' section.
Author(s)
Pepijn de Vries
See Also
Other database-access-functions:
check_ecotox_availability(), check_ecotox_version(), cite_ecotox(), dbConnectEcotox(), get_ecotox_info(), get_ecotox_sqlite_file(), list_ecotox_fields()
Other database-build-functions:
build_ecotox_sqlite(), check_ecotox_version(), download_ecotox_data(), get_ecotox_url()
Examples
if (check_ecotox_availability()) {
check_ecotox_build()
}
Check if the locally built database is up to date
Description
Checks the version of the database available online from the EPA against the specified version (latest by default) of the database built locally. Returns TRUE when they are the same.
Usage
check_ecotox_version(path = get_ecotox_path(), version, verbose = TRUE, ...)
Arguments
path |
When you have a copy of the database somewhere other than the default
directory ( |
version |
A |
verbose |
A |
... |
Arguments passed to |
Value
Returns a logical value invisibly indicating whether the locally built database is up to date with the latest release by the EPA.
Author(s)
Pepijn de Vries
See Also
Other database-access-functions:
check_ecotox_availability(), check_ecotox_build(), cite_ecotox(), dbConnectEcotox(), get_ecotox_info(), get_ecotox_sqlite_file(), list_ecotox_fields()
Other database-build-functions:
build_ecotox_sqlite(), check_ecotox_build(), download_ecotox_data(), get_ecotox_url()
Examples
if (check_ecotox_availability()) {
check_ecotox_version()
}
Cite the downloaded copy of the ECOTOX database
Description
Cite the downloaded copy of the ECOTOX database and this package (citation("ECOTOXr")) for reproducible results.
Usage
cite_ecotox(path = get_ecotox_path(), version)
Arguments
path |
A |
version |
A |
Details
When you download a copy of the EPA ECOTOX database using download_ecotox_data(), a BibTeX file is stored that registers the database release version and the access (= download) date. Use this function to obtain a citation to that specific download.
In order for others to reproduce your results, it is key to cite the data source as accurately as possible.
Value
Returns a vector of bibentry() objects, containing a reference to the downloaded database and this package.
Author(s)
Pepijn de Vries
See Also
Other database-access-functions:
check_ecotox_availability(), check_ecotox_build(), check_ecotox_version(), dbConnectEcotox(), get_ecotox_info(), get_ecotox_sqlite_file(), list_ecotox_fields()
Examples
## In order to cite downloaded database and this package:
cite_ecotox() |> suppressWarnings()
Open or close a connection to the local ECOTOX database
Description
Wrappers for dbConnect() and dbDisconnect() methods.
Usage
dbConnectEcotox(path = get_ecotox_path(), version, ...)
dbDisconnectEcotox(conn, ...)
Arguments
path |
A |
version |
A |
... |
Arguments that are passed to |
conn |
An open connection to the ECOTOX database that needs to be closed. |
Details
Open or close a connection to the local ECOTOX database. These functions are only required when you want to send custom queries to the database. For most searches the search_ecotox() function will be adequate.
Value
A database connection in the form of a DBI::DBIConnection-class() object. The object is tagged with: a time stamp; the package version used; and the file path of the SQLite database used in the connection. These tags are added as attributes to the object.
Author(s)
Pepijn de Vries
See Also
Other database-access-functions:
check_ecotox_availability(), check_ecotox_build(), check_ecotox_version(), cite_ecotox(), get_ecotox_info(), get_ecotox_sqlite_file(), list_ecotox_fields()
Examples
## This will only work when a copy of the database exists:
if (check_ecotox_availability()) {
con <- dbConnectEcotox()
## check if the connection works by listing the tables in the database:
dbListTables(con)
## Let's be a good boy/girl and close the connection to the database when we're done:
dbDisconnectEcotox(con)
}
Download and extract ECOTOX database files and compose database
Description
In order for this package to fully function, a local copy of the ECOTOX database needs to be built. This function will download the required data and build the database.
Usage
download_ecotox_data(
target = get_ecotox_path(),
write_log = TRUE,
ask = TRUE,
verify_ssl = getOption("ECOTOXr_verify_ssl"),
...
)
Arguments
target |
Target directory where the files will be downloaded and the database compiled. Default is
|
write_log |
A |
ask |
There are several steps in which files are (potentially) overwritten or deleted. In those cases
the user is asked on the command line what to do in those cases. Set this parameter to |
verify_ssl |
When set to |
... |
Arguments passed on to |
Details
This function will attempt to find the latest download URL for the ECOTOX database from the EPA website (see get_ecotox_url()). When found, it will attempt to download the zipped archive containing all required data. This data is then extracted and a local copy of the database is built.
Use suppressMessages() to suppress the progress report.
Value
Returns NULL invisibly.
Known issues
On some machines this function fails to connect to the database download URL from the EPA website due to missing SSL certificates. Unfortunately, there is no easy fix for this in this package. A workaround is to download and unzip the file manually using a different machine or browser that is less strict with SSL certificates; get_ecotox_url() can be used to obtain the download URL for this purpose. You can then call build_ecotox_sqlite() and point the source location to the manually extracted zip archive. Alternatively, one could try to call download_ecotox_data() with verify_ssl = FALSE; but only do so when you trust the download URL from get_ecotox_url().
Author(s)
Pepijn de Vries
See Also
Other database-build-functions:
build_ecotox_sqlite(), check_ecotox_build(), check_ecotox_version(), get_ecotox_url()
Other online-functions:
get_ecotox_url(), websearch_ecotox()
Examples
## Not run:
## This will download and build the database in your temp dir:
if (interactive()) {
download_ecotox_data(tempdir())
}
## End(Not run)
Get information on the local ECOTOX database when available
Description
Get information on how and when the local ECOTOX database was built.
Usage
get_ecotox_info(path = get_ecotox_path(), version)
Arguments
path |
A |
version |
A |
Details
Get information on how and when the local ECOTOX database was built. This information is retrieved from the log-file that is (optionally) stored with the local database when calling download_ecotox_data() or build_ecotox_sqlite().
Value
Returns a vector of characters, containing information on the selected local ECOTOX database.
Author(s)
Pepijn de Vries
See Also
Other database-access-functions:
check_ecotox_availability(), check_ecotox_build(), check_ecotox_version(), cite_ecotox(), dbConnectEcotox(), get_ecotox_sqlite_file(), list_ecotox_fields()
Examples
if (check_ecotox_availability()) {
## Show info on the current database (only works when one is downloaded and built):
get_ecotox_info()
}
The local path to the ECOTOX database (directory or sqlite file)
Description
Obtain the local path to where the ECOTOX database is (or will be) placed.
Usage
get_ecotox_sqlite_file(path = get_ecotox_path(), version)
get_ecotox_path()
Arguments
path |
When you have a copy of the database somewhere other than the default
directory ( |
version |
A |
Details
It can be useful to know where the database is located on your disk. This function returns the location as provided by rappdirs::app_dir(), or as specified by you using options(ECOTOXr_path = "mypath").
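As a small illustration of the option mentioned above, the sketch below points `ECOTOXr_path` to a hypothetical folder in the temporary directory and restores the previous setting afterwards. The call to `get_ecotox_path()` is guarded, so the snippet also runs where the package is not installed.

```r
## Point the package to a custom (hypothetical) database directory
old_opts <- options(ECOTOXr_path = file.path(tempdir(), "ecotox_db"))
custom_path <- getOption("ECOTOXr_path")

## With the package installed, this should now report the custom directory
if (requireNamespace("ECOTOXr", quietly = TRUE)) {
  print(ECOTOXr::get_ecotox_path())
}

## Restore the previous setting
options(old_opts)
```

Setting the option at the top of an analysis script keeps the database location explicit, which helps when documenting which copy of the database was used.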
Value
Returns a character string of the path. get_ecotox_path will return the default directory of the database. get_ecotox_sqlite_file will return the path to the sqlite file when it exists.
Author(s)
Pepijn de Vries
See Also
Other database-access-functions:
check_ecotox_availability(), check_ecotox_build(), check_ecotox_version(), cite_ecotox(), dbConnectEcotox(), get_ecotox_info(), list_ecotox_fields()
Examples
get_ecotox_path()
if (check_ecotox_availability()) {
## This will only work if a local database exists:
get_ecotox_sqlite_file()
}
Get ECOTOX download URL from EPA website
Description
This function downloads the webpage at https://cfpub.epa.gov/ecotox/index.cfm. It then searches for the download link for the complete ECOTOX database and extracts its URL.
Usage
get_ecotox_url(verify_ssl = getOption("ECOTOXr_verify_ssl"), ...)
Arguments
verify_ssl |
When set to |
... |
arguments passed on to |
Details
This function is called by download_ecotox_data(), which tries to download the file from the resulting URL. On some machines this fails due to issues with the SSL certificate. The user can try to download the file by using this URL in a different browser (or on a different machine). Alternatively, the user could try to use download_ecotox_data(verify_ssl = FALSE) when the download URL is trusted.
Value
Returns a character string containing the download URL of the latest version of the EPA ECOTOX database.
Author(s)
Pepijn de Vries
See Also
Other database-build-functions:
build_ecotox_sqlite(), check_ecotox_build(), check_ecotox_version(), download_ecotox_data()
Other online-functions:
download_ecotox_data(), websearch_ecotox()
Examples
if (interactive()) {
  get_ecotox_url()
}
List the field names that are available from the ECOTOX database
Description
List the field names (table headers) that are available from the ECOTOX database
Usage
list_ecotox_fields(
which = c("default", "extended", "full", "all"),
include_table = TRUE
)
Arguments
which |
A |
include_table |
A |
Details
This can be useful when specifying a search_ecotox() call, to identify which fields
are available from the database for searching and output.
Note that when requesting 'all' fields, you will get all fields available from the
latest EPA release of the ECOTOX database. Not all of these fields are necessarily
available in your local build of the database.
Value
Returns a vector of type character containing the field names from the ECOTOX database.
Author(s)
Pepijn de Vries
See Also
Other database-access-functions: check_ecotox_availability(), check_ecotox_build(), check_ecotox_version(), cite_ecotox(), dbConnectEcotox(), get_ecotox_info(), get_ecotox_sqlite_file()
Examples
## Fields that are included in search results by default:
list_ecotox_fields("default")
## All fields that are available from the ECOTOX database:
list_ecotox_fields("all")
## All except fields from the tables 'chemical_carriers', 'media_characteristics',
## 'doses', 'dose_responses', 'dose_response_details', 'dose_response_links' and
## 'dose_stat_method_codes' that are available from the ECOTOX database:
list_ecotox_fields("full")
Convert mixed units to a specific unit
Description
Converts a list of mixed units to a specific unit, using the units package.
Usage
mixed_to_single_unit(x, target_unit)
Arguments
x |
A mixed units object ( |
target_unit |
A |
Value
Returns a units object (?units::units). Values with units that cannot be converted to the target_unit are returned as NA.
Author(s)
Pepijn de Vries
See Also
Other ecotox-sanitisers: as_date_ecotox(), as_numeric_ecotox(), as_unit_ecotox(), process_ecotox_dates(), process_ecotox_numerics(), process_ecotox_units()
Examples
mishmash <- as_unit_ecotox(c("mg/L", "ppt w/v", "% w/v", "mmol/L"))
## Note that 'mmol/L' cannot be converted to 'ug/L'
## without a molar mass. It is returned as `NA`
mixed_to_single_unit(mishmash, "ug/L")
mishmash <- as_unit_ecotox(c("h", "sec", "mi", "dph"))
mixed_to_single_unit(mishmash, "h")
Process ECOTOX search results by converting character to dates where relevant
Description
The function search_ecotox() returns fields from the ECOTOX database as is. Fields that represent dates are usually formatted as "%m/%d/%Y". Unfortunately, this format is not used consistently throughout the database. process_ecotox_dates() takes a data.frame returned by search_ecotox(), locates date columns that are represented by text, sanitises the text and converts it to Date objects. It sanitises the date fields as much as possible and will correct most dates. Dates without a specified calendar year, date ranges, and dates in an illegal format (even after sanitation) are returned as NA.
Usage
process_ecotox_dates(x, .fns = as_date_ecotox, ..., .names = NULL)
Arguments
x |
A |
.fns |
Function to convert |
... |
Arguments passed to |
.names |
A 'glue' specification used to rename the date columns. By default
it is |
Value
Returns a data.frame in which the columns containing date information are converted from the character format from the database to actual date objects ("POSIXlt" and "POSIXct").
Author(s)
Pepijn de Vries
See Also
Other ecotox-sanitisers: as_date_ecotox(), as_numeric_ecotox(), as_unit_ecotox(), mixed_to_single_unit(), process_ecotox_numerics(), process_ecotox_units()
Examples
if (check_ecotox_availability()) {
  df <- search_ecotox(
    list(
      latin_name = list(
        terms = c("Skeletonema", "Daphnia"),
        method = "contains"
      ),
      chemical_name = list(
        terms = "benzene",
        method = "exact"
      )
    ), list_ecotox_fields("full"))
  df_dat <-
    process_ecotox_dates(df, warn = FALSE)
}
Process ECOTOX search results by converting character to numeric where relevant
Description
The function search_ecotox() returns fields from the ECOTOX database as is. Many numeric values are stored in the database as text. It is not uncommon that these text fields cannot be converted directly and need some sanitising first. process_ecotox_numerics() takes a data.frame returned by search_ecotox(), locates numeric columns that are represented by text, sanitises the text and converts it to numerics.
Usage
process_ecotox_numerics(
x,
.fns = as_numeric_ecotox,
...,
add_units = FALSE,
.names = NULL
)
Arguments
x |
A |
.fns |
Function to convert |
... |
Arguments passed to |
add_units |
A |
.names |
A 'glue' specification used to rename the numeric columns. By default
it is |
Value
Returns a data.frame in which the columns containing numeric information are converted from the character format from the database to actual numerics.
Author(s)
Pepijn de Vries
See Also
Other ecotox-sanitisers: as_date_ecotox(), as_numeric_ecotox(), as_unit_ecotox(), mixed_to_single_unit(), process_ecotox_dates(), process_ecotox_units()
Examples
if (check_ecotox_availability()) {
  df <- search_ecotox(
    list(
      latin_name = list(
        terms = c("Skeletonema", "Daphnia"),
        method = "contains"
      ),
      chemical_name = list(
        terms = "benzene",
        method = "exact"
      )
    ), list_ecotox_fields("full"))
  df_num <-
    process_ecotox_numerics(df, add_units = TRUE, warn = FALSE)
}
Process ECOTOX search results by converting character to units where relevant
Description
The function search_ecotox() returns fields from the ECOTOX database as is. Fields that represent units are not standardised in the database, so unit notation is not used consistently throughout it. process_ecotox_units() takes a data.frame returned by search_ecotox(), locates unit columns that are represented by text, sanitises the text and converts it to units::mixed_units() objects. It sanitises the unit fields as much as possible. Units that cannot be interpreted are returned as an arbitrary unit.
Usage
process_ecotox_units(x, .fns = as_unit_ecotox, ..., .names = NULL)
Arguments
x |
A |
.fns |
Function to convert |
... |
Arguments passed to |
.names |
A 'glue' specification used to rename the unit columns. By default
it is |
Value
Returns a data.frame in which the columns containing unit information are converted from the character format from the database to actual unit objects (?units::units).
Author(s)
Pepijn de Vries
See Also
Other ecotox-sanitisers: as_date_ecotox(), as_numeric_ecotox(), as_unit_ecotox(), mixed_to_single_unit(), process_ecotox_dates(), process_ecotox_numerics()
Examples
if (check_ecotox_availability()) {
  df <- search_ecotox(
    list(
      latin_name = list(
        terms = c("Skeletonema", "Daphnia"),
        method = "contains"
      ),
      chemical_name = list(
        terms = "benzene",
        method = "exact"
      )
    ), list_ecotox_fields("full"))
  df_unit <-
    process_ecotox_units(df, warn = FALSE)
}
Search and retrieve toxicity records from the database
Description
Create (and execute) an SQL search query based on basic search terms and
options. This allows you to search the database, without having to understand SQL.
Usage
search_ecotox(
search,
output_fields = list_ecotox_fields("default"),
group_by_results = TRUE,
compute = FALSE,
as_data_frame = TRUE,
...
)
search_ecotox_lazy(
search,
output_fields = list_ecotox_fields("default"),
compute = FALSE,
...
)
search_query_ecotox(search, output_fields = list_ecotox_fields("default"), ...)
Arguments
search |
A named Each element in that list should contain another list with at least one element named 'terms'. This should
contain a Search terms for a specific field (table header) will be combined with 'or', meaning that any record that matches any of the terms is returned. For instance, when 'latin_name' 'Daphnia magna' and 'Skeletonema costatum' are searched, results for both species are returned. Search terms across fields (table headers) are combined with 'and', which will narrow the search. For instance, if 'chemical_name' 'benzene' is searched in combination with 'latin_name' 'Daphnia magna', only tests where Daphnia magna is exposed to benzene are returned. When the search behaviour described above is not desirable, the user can either adjust the query manually, or use this function to perform several separate searches and combine the results afterwards. Beware that some field names are ambiguous and occur in multiple tables (like |
output_fields |
A |
group_by_results |
Ecological test results are generally the most informative element in the ECOTOX database. Therefore, this search function returns a table with unique results in each row. However, some tables in the database (such as 'chemical_carriers' and 'dose_responses') have a one-to-many relationship with test results. This means that multiple chemical carriers can be linked to a single test result; similarly, multiple doses can be linked to a single test result. By default the search results are grouped by test results. As a result, not all doses or chemical carriers may
be displayed in the output. Set the |
compute |
The ECOTOXr package tries to construct database queries as lazily as possible, meaning that R
moves as much of the heavy lifting as possible to the database. When your search becomes complicated (e.g., when
including many output fields), you may run into trouble and hit the SQL parser limits. In those cases you can set
this parameter to |
as_data_frame |
|
... |
Arguments passed to |
Details
The ECOTOX database is stored locally as an SQLite file, which can be queried with SQL. These functions
allow you to automatically generate an SQL query and send it to the database, without having to understand
SQL. The function search_query_ecotox generates and returns the SQL query (which can be edited by
hand if desired). You can also call search_ecotox directly; it will first generate the query,
then send it to the database and retrieve the result.
Although the generated query is not optimized for speed, it should be able to process most common searches
within an acceptable time. The time required for retrieving data from a search query depends on the complexity
of the query, the size of the query and the speed of your machine. Most queries should be completed within
seconds (or several minutes at most) on modern machines. If your search requires optimisation for speed,
you could try reordering the search fields. You can also edit the query generated with search_query_ecotox
by hand and run it with DBI::dbGetQuery().
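The hand-editing route could look roughly like the sketch below. It assumes that a local database has been built and that the DBI package is installed; the search term is only an illustration. The connection is opened and closed with the package's dbConnectEcotox() and dbDisconnectEcotox(), and the block is skipped entirely when any of these prerequisites is missing.

```r
result <- NULL
if (requireNamespace("ECOTOXr", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE) &&
    ECOTOXr::check_ecotox_availability()) {
  search <- list(latin_name = list(terms = "Daphnia magna", method = "exact"))
  ## Generate the SQL without executing it
  query <- ECOTOXr::search_query_ecotox(search)
  ## as.character() drops the tags attached as attributes, leaving plain SQL
  ## text; edit `sql` by hand at this point if desired
  sql <- as.character(query)
  con <- ECOTOXr::dbConnectEcotox()
  result <- DBI::dbGetQuery(con, sql)
  ECOTOXr::dbDisconnectEcotox(con)
}
```

Results retrieved this way come straight from DBI, so they are not expected to carry the tags that search_ecotox adds to its output.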
Note that this package is actively maintained and this function may be revised in future versions.
To create reproducible results, the user must always work with an official release from
CRAN and document the package and database versions used to generate specific results (see also cite_ecotox()).
Value
In case of search_query_ecotox, a character string containing an SQL query is returned. This query is built based on the provided search terms and options.
In case of search_ecotox, a data.frame is returned based on the search query built with search_query_ecotox. The data.frame is unmodified as returned by SQLite, meaning that all fields are returned as characters (even where the field types are 'date' or 'numeric'). Therefore, retrieved search results may need some post-processing with process_ecotox_numerics() or as_numeric_ecotox().
The results are tagged with: a time stamp; the package version used; and the file path of the SQLite database used in the search (when applicable). These tags are added as attributes to the output table or query.
Author(s)
Pepijn de Vries
See Also
Other search-functions:
websearch_ecotox()
Examples
## let's find the ids of all ecotox tests on species
## where Latin names contain either of 2 specific genus names and
## where they were exposed to the chemical benzene
if (check_ecotox_availability()) {
  search <-
    list(
      latin_name = list(
        terms = c("Skeletonema", "Daphnia"),
        method = "contains"
      ),
      chemical_name = list(
        terms = "benzene",
        method = "exact"
      )
    )
  ## rows in result each represent a unique test id from the database
  result <- search_ecotox(search)
  query <- search_query_ecotox(search)
  cat(query)
} else {
  print("Sorry, you need to use 'download_ecotox_data()' first in order for this to work.")
}
Search and retrieve substance information from https://comptox.epa.gov/dashboard
Description
Search https://comptox.epa.gov/dashboard for substances and their chemico-physical properties
and meta-information.
Usage
websearch_comptox(
searchItems,
identifierTypes = c("chemical_name", "CASRN", "INCHIKEY", "dtxsid"),
inputType = c("IDENTIFIER", "DTXCID", "INCHIKEY_SKELETON", "MSREADY_FORMULA",
"EXACT_FORMULA", "MASS"),
downloadItems = c("DTXCID", "CASRN", "INCHIKEY", "IUPAC_NAME", "SMILES",
"INCHI_STRING", "MS_READY_SMILES", "QSAR_READY_SMILES", "MOLECULAR_FORMULA",
"AVERAGE_MASS", "MONOISOTOPIC_MASS", "QC_LEVEL", "SAFETY_DATA", "EXPOCAST",
"DATA_SOURCES", "TOXVAL_DATA", "NUMBER_OF_PUBMED_ARTICLES", "PUBCHEM_DATA_SOURCES",
"CPDAT_COUNT", "IRIS_LINK", "PPRTV_LINK", "WIKIPEDIA_ARTICLE", "QC_NOTES",
"ABSTRACT_SHIFTER", "TOXPRINT_FINGERPRINT", "ACTOR_REPORT", "SYNONYM_IDENTIFIER",
"RELATED_RELATIONSHIP", "ASSOCIATED_TOXCAST_ASSAYS",
"TOXVAL_DETAILS",
"CHEMICAL_PROPERTIES_DETAILS", "BIOCONCENTRATION_FACTOR_TEST_PRED",
"BOILING_POINT_DEGC_TEST_PRED", "48HR_DAPHNIA_LC50_MOL/L_TEST_PRED",
"DENSITY_G/CM^3_TEST_PRED", "DEVTOX_TEST_PRED",
"96HR_FATHEAD_MINNOW_MOL/L_TEST_PRED", "FLASH_POINT_DEGC_TEST_PRED",
"MELTING_POINT_DEGC_TEST_PRED", "AMES_MUTAGENICITY_TEST_PRED",
"ORAL_RAT_LD50_MOL/KG_TEST_PRED", "SURFACE_TENSION_DYN/CM_TEST_PRED",
"THERMAL_CONDUCTIVITY_MW/(M*K)_TEST_PRED",
"TETRAHYMENA_PYRIFORMIS_IGC50_MOL/L_TEST_PRED", "VISCOSITY_CP_CP_TEST_PRED",
"VAPOR_PRESSURE_MMHG_TEST_PRED", "WATER_SOLUBILITY_MOL/L_TEST_PRED",
"ATMOSPHERIC_HYDROXYLATION_RATE_(AOH)_CM3/MOLECULE*SEC_OPERA_PRED",
"BIOCONCENTRATION_FACTOR_OPERA_PRED",
"BIODEGRADATION_HALF_LIFE_DAYS_DAYS_OPERA_PRED", "BOILING_POINT_DEGC_OPERA_PRED",
"HENRYS_LAW_ATM-M3/MOLE_OPERA_PRED", "OPERA_KM_DAYS_OPERA_PRED",
"OCTANOL_AIR_PARTITION_COEFF_LOGKOA_OPERA_PRED",
"SOIL_ADSORPTION_COEFFICIENT_KOC_L/KG_OPERA_PRED",
"OCTANOL_WATER_PARTITION_LOGP_OPERA_PRED", "MELTING_POINT_DEGC_OPERA_PRED",
"OPERA_PKAA_OPERA_PRED", "OPERA_PKAB_OPERA_PRED", "VAPOR_PRESSURE_MMHG_OPERA_PRED",
"WATER_SOLUBILITY_MOL/L_OPERA_PRED",
"EXPOCAST_MEDIAN_EXPOSURE_PREDICTION_MG/KG-BW/DAY", "NHANES",
"TOXCAST_NUMBER_OF_ASSAYS/TOTAL", "TOXCAST_PERCENT_ACTIVE"),
massError = 0,
timeout = 300,
verify_ssl = getOption("ECOTOXr_verify_ssl"),
...
)
Arguments
searchItems |
A |
identifierTypes |
Substance identifiers for searching CompTox. Only used when |
inputType |
Type of input used for searching CompTox. See usage section for valid entries. |
downloadItems |
Output fields of CompTox data for requested substances |
massError |
Error tolerance when searching for substances based on their monoisotopic mass. Only used for |
timeout |
Time in seconds (default is 300 secs), that the routine will wait for the download link to get ready.
It will throw an error if it takes longer than the specified |
verify_ssl |
When set to |
... |
Arguments passed on to |
Details
The CompTox Chemicals Dashboard is a freely accessible online U.S. EPA database. It contains information on physico-chemical properties, environmental fate and transport, exposure, usage, in vivo toxicity, and in vitro bioassay of a wide range of substances.
The function described here to search and retrieve records from the online database is experimental. This is because this feature is not formally supported by the EPA, and it may break in future incarnations of the online database. The function forms an interface between R and the CompTox website and is therefore limited by the restrictions documented there.
Value
Returns a named list of dplyr::tibbles containing the search results for the requested output tables and fields.
Results are unpolished and returned 'as is' by EPA's web service.
Author(s)
Pepijn de Vries
References
Official US EPA CompTox website: https://comptox.epa.gov/dashboard/
Williams, A.J., Grulke, C.M., Edwards, J., McEachran, A.D., Mansouri, K., Baker, N.C., Patlewicz, G., Shah, I., Wambaugh, J.F., Judson, R.S. & Richard, A.M. (2017), The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform, 9(61). doi:10.1186/s13321-017-0247-6
Examples
if (interactive()) {
  ## search for substance name 'benzene' and CAS registration number 108-88-3
  ## on https://comptox.epa.gov/dashboard:
  comptox_results <- websearch_comptox(c("benzene", "108-88-3"))
  ## search for substances with monoisotopic mass of 100+/-5:
  comptox_results2 <- websearch_comptox("100", inputType = "MASS", massError = 5)
}
Search and retrieve toxicity records from the online database
Description
Functions to search and retrieve records from the online database at
https://cfpub.epa.gov/ecotox/search.cfm.
Usage
websearch_ecotox(
fields = list_ecotox_web_fields(),
habitat = c("aquire", "terrestrial"),
verify_ssl = getOption("ECOTOXr_verify_ssl"),
...
)
list_ecotox_web_fields(...)
Arguments
fields |
A named |
habitat |
Use |
verify_ssl |
When set to |
... |
In case of In case of |
Details
The functions described here to search and retrieve records from the online database are experimental. This is because this feature is not formally supported by the EPA, and it may break in future iterations of the online database. The functions form an interface between R and the ECOTOX website and are therefore limited by its restrictions as described in the package documentation: ECOTOXr. The functions should therefore be used with caution.
Value
Returns a named list of dplyr::tibbles with search results. Results are unpolished and returned 'as is' by EPA's web service.
list_ecotox_web_fields() returns a named list with fields that can be used in a web search of EPA's ECOTOX database, using websearch_ecotox().
Note
IMPORTANT: when you plan to perform multiple adjacent searches (for instance in a loop), please insert a call to Sys.sleep() between them.
This avoids overloading the server and getting your IP address banned from it.
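One way to follow this advice is sketched below. The helper throttled_lapply() is hypothetical and not part of ECOTOXr; it simply pauses between consecutive calls. The pause lengths are assumptions, as no required waiting time is stated here.

```r
## Hypothetical helper (not part of ECOTOXr): apply `fun` to each item,
## sleeping `pause` seconds between consecutive calls.
throttled_lapply <- function(items, fun, pause = 5, ...) {
  results <- vector("list", length(items))
  for (i in seq_along(items)) {
    if (i > 1) Sys.sleep(pause)  # no pause needed before the first call
    ## wrap in list() so that a NULL result does not drop the element
    results[i] <- list(fun(items[[i]], ...))
  }
  results
}

## Usage sketch; only runs interactively and when ECOTOXr is installed.
## The chemical names and the 10 second pause are illustrative assumptions.
if (interactive() && requireNamespace("ECOTOXr", quietly = TRUE)) {
  chemicals <- c("benzene", "toluene")
  web_results <- throttled_lapply(chemicals, function(chem) {
    fields <- ECOTOXr::list_ecotox_web_fields(
      txAdvancedChemicalEntries = chem,
      RBCHEMSEARCHTYPE = "EXACT")
    ECOTOXr::websearch_ecotox(fields)
  }, pause = 10)
}
```

Keeping the pause inside the helper means it cannot be forgotten when the loop body changes.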
Author(s)
Pepijn de Vries
See Also
Other online-functions: download_ecotox_data(), get_ecotox_url()
Other search-functions: search_ecotox()
Examples
if (interactive()) {
  search_fields <-
    list_ecotox_web_fields(
      txAdvancedSpecEntries = "daphnia magna",
      RBSPECSEARCHTYPE = "EXACT",
      txAdvancedChemicalEntries = "benzene",
      RBCHEMSEARCHTYPE = "EXACT")
  search_results <- websearch_ecotox(search_fields)
}