The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Please send questions to wambaugh.john@epa.gov
Chemical risk assessment depends on knowledge of inherent chemical hazard, the dose-response relationship, and the extent of exposure that occurs (NASEM 2017). High throughput screening (HTS) for in vitro bioactivity allows characterization of potential hazard for thousands of chemicals for which no other testing has occurred (Judson et al., 2009).
Toxicokinetics (TK) describes the Absorption, Distribution, Metabolism, and Excretion (ADME) of a chemical by the body. TK relates external exposures to internal tissue concentrations of chemical and therefore informs the chemical dose-response relationship (Coecke et al., 2013). TK requires chemical-specific information that is traditionally obtained in vivo. However, most chemicals do not have TK data, so a new approach methodology (NAM) uses in vitro TK methods adapted from the pharmaceutical industry to provide the necessary chemical-specific information (Wambaugh et al., 2019, Chang et al., 2022). High(er) throughput toxicokinetics (HTTK) involves the combination of chemical-specific in vitro TK data and chemical-independent (“generic”) TK models to predict TK (Breen et al., 2021).
The primary goal of HTTK is to provide a human dose context for bioactive in vitro concentrations from HTS (that is, in vitro-in vivo extrapolation, or IVIVE) (Wetmore, 2015). IVIVE is Utilization of in vitro experimental data to predict phenomena in vivo. In pharmaceutical development, HTTK methods allow IVIVE to estimate therapeutic doses for clinical studies – predicted concentrations are typically on the order of values measured in clinical trials (Wang, 2010). High throughput screening + IVIVE can predict a daily chemical dose rate (mg/kg bw/day) that might be adverse (Paul Friedman et al., 2019).
A secondary goal is to provide open source data and models for evaluation and use by the scientific community (Pearce et al., 2017).
The following material is from a lecture taught as part of the course “Computational Methods in Chemical Risk Assessment” at Rutgers University in September, 2019.
R package knitr generates html and PDF documents from this RMarkdown file, Each bit of code that follows is known as a “chunk”. We start by telling knitr how we want our chunks to look.
::opts_chunk$set(collapse = TRUE, comment = '#>')
knitroptions(rmarkdown.html_vignette.check_title = FALSE)
It is a bad idea to let variables and other information from previous R sessions float around, so we first remove everything in the R memory.
rm(list=ls())
This vignette was created using httk v2.2.2 in 2022. Although we attempt to maintain backward compatibility, if you encounter issues with the latest release of httk and cannot easily address the changes, historical versions of httk are available from: https://cran.r-project.org/src/contrib/Archive/httk/
If you are familiar with computer files and R you might want to skip directly to “Step 1: Loading In Vitro Bioactivity Data”.
R (or any computer program) needs to know where to look to find particular information (such as data generated by a panel of in vitro assays). We describe a chunk of information as a “file”. Files range in size and can be small (even 0 bytes) or very large (several GB). A file my physically be located on a disk drive inside your computing device or, increasingly, it might be located somewhere in a cloud storage network. Regardless, most computers will treat a file as if it has a specific location. Each file has a name, although multiple files can have the same name as long as those files are stored in different “folders”.
Files for all the operating systems with R are organized using a folder (also called a “directory”) and sub-folder structure. A sub-folder is a folder within another folder. They can be well organized:
/animals/cats/tabbies/
Or not:
/animals/gradschoolfiles/photosfromchildhood
The sequence of folders and sub-folders leading to a particular file is referred to as a “path”. For information on how folder structures work, please check out these sites:
Within the command shell you use the “cd” command to “change directories”.
One complication to watch out for is the direction of the symbol that separates folder names in the path. On Windows the path is separated by “back-slashes”: \
\animals\cats\black
On Unix-based operating systems (including Linux and MacOS) the path is separated by “forward slashes”: /
/animals/cats/black
R always uses Unix forward slashes, even on a Windows machine, so if you know that the path is:
c:\users\USERNAME\Downloads\
You need to tell R that you want:
c:/users/USERNAME/Downloads/
Finally, the bit at the front of the path (such as “C:\” or “L:\” tells the computer which “drive” to look in to find your folders. Often “C drive” (c:\) is physically inside your machine while other letters might indicate network (or cloud) drives.
Whether you are on Linux or Windows you have a home directory. Often you will only be able to write and edit files within your home directory and its sub-directories. However, you may be able to read and copy from most other folders.
On Windows your home directory will be a sub-directory of “C:\users\”. For example, “C:\users\jwambaug”.
You will want to create a directory for your R packages locally on your machine:
C:\users\USERNAME\AppData\Loca\R
Many web browsers put their files in a default directory such as
C:\users\USERNAME\Downloads
On Windows the path “c:\” refers to the physical hard drive inside your computer. Therefore the only copy of “c:\users\USERNAME\” is on your computer and if that drive or the whole computer is damaged or destroyed you have lost those files. For this reason we call this “temporary” space. You don’t want to keep the only copies of your files there.
install.packages("httk")
install.packages("readxl")
install.packages("ggplot2")
If you are using the RMarkdown version of this vignette (extension, .RMD) you will be able to see that several chunks of code in this vignette have the statement “eval = execute.vignette”. The next chunk of code, by default, sets execute.vignette = FALSE. This means that the code is included (and necessary) but was not run when the vignette was built. We do this because some steps require extensive computing time and the checks on CRAN limit how long we can spend building the package. If you want this vignette to work, you must run all code, either by cutting and pasting it into R. Or, if viewing the .RMD file, you can either change execute.vignette to TRUE or press “play” (the green arrow) on each chunk in RStudio.
# Set whether or not the following chunks will be executed (run):
<- FALSE execute.vignette
library(httk)
library(readxl)
library(ggplot2)
Portions of this vignette use Monte Carlo sampling to simulate variability and propagate uncertainty. The more samples that are used, the more stable the results are (that is, the less likely they are to change if a different random sequence is used). However, the more samples that are used, the longer it takes to run. So, to speed up how fast these examples run, we specify here that we only want to use 25 samples, even though the actual httk default is 1000. Increase this number to get more stable (and accurate) results:
<- 25 NSAMP
The ToxCast and Tox21 research programs employ batteries of high-throughput assays to assess chemical bioactivity in vitro. Not every chemical is tested through every assay (Richard et al., 2016). Most assays are conducted in concentration response, and each corresponding assay endpoint is analyzed statistically to determine if there is a concentration-dependent response or “hit” using the ToxCast Pipeline (Filer et al., 2017). Most assay endpoint-chemical combinations are nonresponsive. Here, only the hits are treated as potential indicators of bioactivity.
In vitro bioactivity does not necessarily indicate adversity or hazard. Biological models can be used to make predictions of toxicity based on in vitro data (for example, Sipes et al., 2011, Browne et al., 2015 and Kleinstreuer et al., 2017). However, here we make a more conservative, precautionary assumption that concentrations too low to cause in vitro bioactivity are more likely to be safe (Paul Friedman et al., 2019). Among all of the assay hits for each chemical, we choose to use the lower 10th percentile of the \(\mu M\) potencies (50% active concentration or \(AC_{50}\)). We are assuming that the 10th percentile represents a low concentration for activating multiple assays (assuming 10 or more bioactive assays were observed for a given chemical). However, this bioactivity does not necessarily have a direct toxicological interpretation.
The main page for the data is here: https://www.epa.gov/comptox-tools/exploring-toxcast-data
Most useful to us is a single file containing all the hits across all chemicals and assays: https://clowder.edap-cluster.com/datasets/6364026ee4b04f6bb1409eda?space=62bb560ee4b07abf29f88fef
As of November, 2022 the most recent version was 3.5 and was available as an .Rdata file (invitrodb_3_5_mc5.Rdata), which we can download and place in our working directory (FOLDERPATH).
Don’t forget to use “setwd(FOLDERPATH)” to change to your R working folder:
setwd("FOLDERPATH")
Then load the file:
load("invitrodb_3_5_mc5.Rdata")
We now have a file “mc5” in our woking environment. It’s absolutely huge, so lets shrink it down to contain only the chemicals we care about.
This list could be chemicals with HTTK data (note, this step cuts things down to chemicals with human HTTK data, you could change species to rat and get something different).
<- subset(mc5, dsstox_substance_id %in% get_cheminfo(
toxcast.httk info="DTXSID",
suppress.messages=TRUE))
We might have chemicals we are interested in for one reason or another – you could type any chemical ID’s you want into the following my.chems vector, or even load it from a file.
Here we’ll pick 50 chemicals at random from among the ToxCast chemicals:
set.seed(1234)
<- sample(mc5$dsstox_substance_id,50)
my.chems <- as.data.frame(mc5[mc5$dsstox_substance_id %in% my.chems,]) example.toxcast
Unfortunately for this vignette there are too many ToxCast data to fit into a 5mb R package. So we will subset to just the selected chemicals and distribute only those data. In addition, out of 78 columns in the data, we will keep only eight. Download the full data following the instructions above.
<- example.toxcast[, c("chnm",
example.toxcast "dsstox_substance_id",
"spid",
"hitc",
"modl",
"aeid",
"modl_ga",
"modl_ac10",
"modl_acc")]
# reduce precision to decrease space:
<- c("modl_ga", "modl_ac10", "modl_acc")
cols for (this.col in cols)
= signif(example.toxcast[, this.col], 3)
example.toxcast[, this.col] save(example.toxcast,file="introtoivive-toxcast.Rdata",version=2)
In the mc5 table, “aenm” gives the assay name while “dsstox_substance_id” is the chemical identity from the CompTox Chemicals Dashboard. Since both names and CAS-RN can be subject to variations, we track chemical identities with the DSSTox Substance ID’s or “DTXSIDs” (Grulke et al., 2019).
The column “hitc” is 1 (that is, “TRUE”) if there was a statistically-signficant systematic concentration-response observed (this is a “hit”). “hitc” is 0 (FALSE) if there was no hit. The column “mc6_flags” indicates features of the hit that might be of interest for further examination but are not necessarily disqualifying.
The column “modl” indicates the type of concentration-response modl that was selected (based on Akaike Information Criterion) to best describe the data. The possibilities are constant (“cnst” – no concentation response), Hill equation (“hill”), or gain-loss model (“gnls” – the product of an increasing and decreasing Hill equations). “gnls” is often interpreted as being caused by cytotoxicity.
The column “modl_ga” indicates the log10 uM concentration at which the chemical caused 50% activity. This is also called the chemical-assay “potency” or “\(AC_{50}\)”.
Each chemical might have multiple assays for which there is a hit. Each chemical may have been run multiple times (experimental replicates), so that there may be multiple samples from the same chemical for the same assay. Different samples are indicated by the column “spid” (sample id).
<- httk::example.toxcast
example.toxcast ::kable(head(example.toxcast), caption = "ToxCast In Vitro Bioactivity Data",
knitrfloating.environment="sidewaystable")
chnm | dsstox_substance_id | spid | hitc | modl | aeid | modl_ga | modl_ac10 | modl_acc |
---|---|---|---|---|---|---|---|---|
Methyl perfluoro(3-(1-ethenyloxypropan-2-yloxy)propanoate) | DTXSID8044969 | 01504168 | 0 | cnst | 63 | NA | NA | NA |
Methyl perfluoro(3-(1-ethenyloxypropan-2-yloxy)propanoate) | DTXSID8044969 | 01504168 | 1 | hill | 64 | 1.390 | -0.282 | 1.86 |
Methyl perfluoro(3-(1-ethenyloxypropan-2-yloxy)propanoate) | DTXSID8044969 | 01504168 | 0 | cnst | 65 | NA | NA | NA |
Methyl perfluoro(3-(1-ethenyloxypropan-2-yloxy)propanoate) | DTXSID8044969 | 01504168 | 0 | hill | 66 | 1.460 | -0.526 | NA |
Methyl perfluoro(3-(1-ethenyloxypropan-2-yloxy)propanoate) | DTXSID8044969 | 01504168 | 0 | gnls | 67 | 0.293 | -1.670 | NA |
Methyl perfluoro(3-(1-ethenyloxypropan-2-yloxy)propanoate) | DTXSID8044969 | 01504168 | 0 | cnst | 68 | NA | NA | NA |
Then we can build a table summarizing the in vitro data. For a “hit”, the “AC50” indicates the 50% activation concentration from the best curve-fit, “AC10” indicates the concentration estimated to cause 10%, and “ACC” is the “activity concentration at cutoff” are which is the concentration that first causes statistically significant activity. See Filer et al. (2017) figure three for additional discussion.
<- NULL
toxcast.table for (this.id in unique(example.toxcast$dsstox_substance_id))
{<- subset(example.toxcast, dsstox_substance_id == this.id)
this.subset <- subset(this.subset, hitc==1)
these.hits if (dim(these.hits)[1]>0){
<- data.frame(Compound=as.character(this.subset[1,"chnm"]),
this.row DTXSID=this.id,
Total.Assays = dim(this.subset)[1],
Unique.Assays = length(unique(this.subset$aeid)),
Total.Hits = dim(these.hits)[1],
Unique.Hits = length(unique(these.hits$aeid)),
Low.AC50 = signif(min(these.hits$modl_ga),3),
Low.AC10 = signif(min(these.hits$modl_ac10),3),
Low.ACC = signif(min(these.hits$modl_acc),3),
Q10.AC50 = signif(quantile(these.hits$modl_ga,probs=0.1),3),
Q10.AC10 = signif(quantile(these.hits$modl_ac10,probs=0.1),3),
Q10.ACC = signif(quantile(these.hits$modl_acc,probs=0.1),3),
Med.AC50 = signif(median(these.hits$modl_ga),3),
Med.AC10 = signif(median(these.hits$modl_ac10),3),
Med.ACC = signif(median(these.hits$modl_acc),3),
Q90.AC50 = signif(quantile(these.hits$modl_ga,probs=0.9),3),
Q90.AC10 = signif(quantile(these.hits$modl_ac10,probs=0.9),3),
Q90.ACC = signif(quantile(these.hits$modl_acc,probs=0.9),3)
)<- rbind(toxcast.table, this.row)
toxcast.table
}
}rownames(toxcast.table) <- seq(1,dim(toxcast.table)[1])
::kable(head(toxcast.table[,1:6]), caption = "Summarized ToxCast Data",
knitrfloating.environment="sidewaystable")
Compound | DTXSID | Total.Assays | Unique.Assays | Total.Hits | Unique.Hits |
---|---|---|---|---|---|
Methyl perfluoro(3-(1-ethenyloxypropan-2-yloxy)propanoate) | DTXSID8044969 | 1087 | 467 | 80 | 49 |
Nonafluoropentanamide | DTXSID60400587 | 792 | 496 | 7 | 6 |
Bisphenol A | DTXSID7020182 | 4304 | 1414 | 814 | 375 |
Tris(2-ethylhexyl) trimellitate | DTXSID9026265 | 1241 | 941 | 46 | 40 |
1-Octen-3-ol | DTXSID3035214 | 427 | 427 | 19 | 19 |
Thalidomide | DTXSID9022524 | 1161 | 922 | 6 | 6 |
For each chemical in your subset of ToxCast, add the HTTK plasma steady-state concentration if it is available. calc_mc_css() gives us the predicted steady-state plasma concentration resulting from an ongoing 1 mg/kg/day exposure.
for (this.id in unique(toxcast.table$DTXSID))
{# get_cheminfo() gives a list of all the CAS numbers for which HTTK will work:
if (this.id %in% get_cheminfo(info="dtxsid", suppress.messages=TRUE))
{# Set a random number generator seed so that the Monte Carlo will always give
# the same random sequence:
set.seed(12345)
$DTXSID==this.id,"Css"] <-
toxcast.table[toxcast.tablecalc_mc_css(dtxsid=this.id,
output.units="uM",
samples=NSAMP,
suppress.messages=TRUE)
$DTXSID==this.id,"Css.Type"] <- "in vitro"
toxcast.table[toxcast.table
} }
Let’s look at just the relevant columns from our table. Any Css values of “NA” (not a number) indicate that we don’t currently have in vitro TK data for those chemicals:
::kable(toxcast.table[1:10,c("Compound","Q10.AC50","Css","Css.Type")],
knitrcaption = "Summarized ToxCast Data",
floating.environment="sidewaystable")
For chemicals where no in vitro toxicokinetic data are available, we can sometimes load predictions from in silico models. We do not provide in silico models for \(f_{up}\) and \(Cl_{int}\) themselves (in many cases those models are much larger in terms of file size than the whole httk package). However we do have tables of pre-made predictions form a variety of models. For example, you can load the table of quantitative structure-property relationship (QSPR) model predictions from the Sipes et al. (2017) supplemental material:
load_sipes2017()
Now go through the list of chemicals again but only calculate if we didn’t already have a value (NOTE: load_sipes2017 will not overwrite the in vitro data by default, so you’ll still get the same answer if you use the same random number seed, but you would also be wasting time):
for (this.id in unique(toxcast.table$DTXSID))
{if (this.id %in% get_cheminfo(info="dtxsid", suppress.messages=TRUE) &
is.na(toxcast.table[toxcast.table$DTXSID==this.id,"Css"]))
{# Set a random number generator seed so that the Monte Carlo will always give
# the same random sequence:
set.seed(12345)
$DTXSID==this.id,"Css"] <-
toxcast.table[toxcast.tablecalc_mc_css(dtxsid=this.id,
output.units="uM",
samples=NSAMP,
suppress.messages=TRUE)
$DTXSID==this.id,"Css.Type"] <- "in silico"
toxcast.table[toxcast.table
} }
Take another look at our table – many of the gaps are now filled in. Any Css values of “NA” (not a number) indicate that we don’t currently have in vitro TK data or in silico predictions from Sipes et al. (2017) for those chemicals (note that there are also predictions available from Pradeep et al. (2020) (load_pradeep2020) and Dawson et al. (2021) (load_dawson2021):
::kable(toxcast.table[1:10,c("Compound","Q10.AC50","Css","Css.Type")],
knitrcaption = "Summarized ToxCast Data",
floating.environment="sidewaystable")
Most IVIVE calculations in httk take advantage of the fact that, to date, the toxicokinetic models included in httk are linear in dose. What this means is that your dose rate is \(X~mg/kg/day\) and the \(C_{ss}\) for \(1~mg/kg/day\) is \(Y~uM\), then the \(C_{ss}\) for \(X~mg/kg/day\) is \(X*Y~uM\), that is: \[C_{ss}\left(X~mg/kg/day\right) = X * C_{ss}\left(1~mg/kg/day\right)\]. Using toxicokinetics to predict tissue concentrations resulting from a known dose is known as “forward dosimetry”. Using toxicokinetics to estimate the dose that might have led to a known tissue cocentration is known as “reverse dosimetry” (Clewell, et al., 2008). httk allows the use of reverse dosimetry to calculate the equivalent steady-state dose to produce a plasma concentration equal to a bioactive concentration identified in vitro (Breen et al., 2021). For example, we can start from the tenth percentile ToxCast \(AC_{50}\) and find the administered equivalent dose that would cause that concentration in the plasma. We do this by dividing the in vitro concentration by the \(C_{ss}\) from 1 mg/kg/day, ending up with a dose rate with units of mg/kd/day that will product the in vitro concentration: \[AED = \frac{AC_{50}}{C_{ss}\left(1~mg/kg/day\right)}~mg/kg/day\] where b0oth \(AC_{50}\) and \(C_{ss}\) must have the same units. Do not forget that the ToxCast concentrations are given on the log-10 scale:
$EquivDose <- signif(10^toxcast.table$Q10.AC50 / toxcast.table$Css,
toxcast.table3)
Look at the table again:
::kable(toxcast.table[1:10,c("Compound","Q10.AC50","Css","EquivDose")],
knitrcaption = "Summarized ToxCast Data",
floating.environment="sidewaystable")
The U.S. National Academy of Sciences, Engineering, and Medicine
advises calculating chemical risk posed to the public health on the
basis of hazard, the dose-response relationship, and exposure (NRC, 1983). We can use in
vitro bioactivity as a surrogate for hazard and
httk as a surrogate for the dose-reponse relationship.
However, to prioritize chemicals on the basis of putative risk we need
some estimate of human exposure. Because most chemicals lack measured
data characterizing population intake rates (mg/kg body weight/ day) (Egeghy,
2012), a consensus prediction from EPA’s Systematic Empirical
Evaluation of Models (SEEM) is often the only option available for
exposures. The SEEM consensus prediction is composed of multiple
exposure predictors aggregated on the basis of likely exposure pathways
(Ring, 2018). The
individual exposure predictors are weighted based upon their ability to
predict intake rates that could be inferred from biomonitoring data (Ring, 2017; Wambaugh, 2014).
The SEEM exposure forecasts are uncertain, and the upper boundary of the
95% credible interval is used here. Although a 95% credible interval is
calculated, this reflects uncertainty but not variability – the (Ring, 2018) model
predicts population median exposure rates for the U.S. population.
There are three ways to access the SEEM predictions. If you can access the journal article online, we can load the SEEM predictions from Supplemental Table 1 of the Supporting Information for Ring et al. (2018). Don’t forget that you have to “unzip” a file that ends in “.zip” and place the “.txt” file in your working directory with your other files.
<- read.csv("SupTable-all.chem.preds-2018-11-28.txt",stringsAsFactors=F) SEEM
If you want, you can also download the predictions for specific chemicals from the Comptox Chemicals Dashboard. Be sure to use Batch Search and select the option for “NHANES/Predicted Exposure”. You can download the file in a “comma separated values” or “.CSV” format, in which case you use “read.csv”. Or you can download an Excel “.xls” file which requires load the package readXL.
<- read_excel("CCD-Batch-Search_DATE.xlsx",sheet=2) SEEM
Perhaps most easily we can grab SEEM predictions already in RData format from GitHub Download the file Ring2018Preds.RData and load it:
load("Ring2018Preds.RData")
<- Ring2018.preds SEEM
Again we do not have the space to distribute all the SEEM predictions within this R package, but we can give you our example chemicals:
<- as.data.frame(subset(SEEM,
example.seem %in% toxcast.table$DTXSID))
dsstox_substance_id for (this.col in 4:36)
<- signif(example.seem[,this.col],4)
example.seem[,this.col] save(example.seem,file="introtoivive-seem.Rdata",version=2)
Merge the consensus model predictions with your table:
<- httk::example.seem
example.seem <- merge(toxcast.table,
toxcast.table c(
example.seem[,"dsstox_substance_id",
"seem3",
"seem3.l95",
"seem3.u95",
"Pathway",
"AD")],
by.x="DTXSID",
by.y="dsstox_substance_id",
all.x = TRUE)
Now we can calculate a bioactivity:exposure ratio (BER), which is a risk-like metric since it takes hazard, dose-response, and exposure into account (Wetmore, 2015):
$BER <- signif(toxcast.table$EquivDose/toxcast.table$seem3.u95, 3) toxcast.table
Reorder your table by BER:
<- toxcast.table[order(toxcast.table$BER),]
toxcast.table ::kable(toxcast.table[1:10,c("Compound","EquivDose","seem3.u95","Pathway","BER")],
knitrcaption = "Bioactivity:Exposure Ratios",
floating.environment="sidewaystable")
Convert the chemical names into factors for plotting:
$Compound <- factor(toxcast.table$Compound,
toxcast.tablelevels=toxcast.table$Compound)
Then we can create a plot where the BER for each chemical is shown by the margin between putative hazard (in red) and estimated exposure (in blue). The wider the margin, the lesser the risk. Chemicals are arranged from left-to-right by increasing margin (that is, decreasing risk).
<- ggplot(data=toxcast.table) +
BER.plot geom_segment(aes(x=Compound,
xend=Compound,
y=(10^Q10.AC50)/Css,
yend=(10^Q90.AC50)/Css),
size=1,
color="red",
alpha=0.5) +
geom_segment(aes(x=Compound,
xend=Compound,
y=seem3.l95,
yend=seem3.u95),
size=1,
color="blue",
alpha=0.5) +
geom_point(aes(x=Compound,y=(10^Med.AC50)/Css),shape=3,color="red") +
geom_point(aes(x=Compound,y=seem3),shape=3,color="blue") +
theme_bw() +
theme(axis.title.x = element_text(size=8)) +
theme(axis.title.y = element_text(size=8)) +
scale_y_log10() +
theme(axis.text.x = element_text(
face = "bold",
hjust = 1,
vjust = 1,
size = 4,
angle = 45)) +
ylab("Bioactive Dose & Exposure\nmg/kg BW/day") +
xlab("Chemicals Ranked By BER")
print(BER.plot)
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.