In trial simulation mode, we simulate patient data and then analyze it with a (usually Bayesian) model. In trial execution mode, we begin with pre-existing patient data and perform only the downstream analysis. This is useful for simulating entire clinical programs where the same patients move from trial to trial.
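As a rough sketch of the difference, using the run_engine_dichot() call pattern that appears later in this vignette (the directory, seed, and file names here are illustrative):
# Trial simulation mode: the engine simulates patients and analyzes them.
run_engine_dichot(param_files, n_sims = 1L, seed = 1L)
# Trial execution mode: the engine reads pre-existing patient data
# through execdata and runs only the analysis.
run_engine_dichot(
  param_files,
  n_sims = 1L,
  seed = 1L,
  mode = "r",
  execdata = "patients00001.csv",
  final = TRUE
)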
The workflow is very similar to the parallel computing section of the vignette. First, we use our FACTS file and run_flfll() to generate a directory of param files. The example below uses a tempfile() to store the param files (i.e. output_path). However, for distributed computing on traditional HPC clusters, output_path should be a directory path that all nodes can access.
library(rfacts)
facts_file <- get_facts_file_example("dichot.facts") # could be any FACTS file
# On traditional HPC clusters, this should be a shared directory
# instead of a temp directory:
tmp <- fs::dir_create(tempfile())
all_param_files <- file.path(tmp, "param_files")
# Set n_weeks_files to 0 so we only read the weeks files generated by
# trial execution mode.
run_flfll(facts_file, all_param_files, n_weeks_files = 0L)
Since we are supplying our own data, VSR scenarios lose their meaning, and we only need a single VSR scenario. So we pick one; any will do.
param_files <- get_param_dirs(all_param_files)[1]
basename(param_files)
Next, we write a function to perform a single simulation. It simulates a single set of patients, does some custom data processing on the patients files, runs trial execution mode on those patients files, and returns the aggregated weeks files as an in-memory data frame. In functions like this, be sure to set a unique seed for each simulation iteration.
run_once <- function(index, param_files) {
  out <- tempfile()
  dir_copy(param_files, out) # Requires the fs package.
  # Unique seed per iteration so each run simulates different patients.
  run_engine_dichot(out, n_sims = 1L, seed = index)
  pats <- read_patients(out) # Read and aggregate all the patients files.
  # Here, do some custom data processing on the whole pats data frame...
  # Write the processed patient data to the original patients files.
  overwrite_csv_files(pats)
  run_engine_dichot(
    out,
    n_sims = 1L,
    seed = index,
    mode = "r",
    execdata = "patients00001.csv", # Custom / modified patients files.
    final = TRUE
  )
  read_weeks(out)
}
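Before scaling up, it can help to try the function once locally as a smoke test (the index value here is arbitrary):
# One local run to confirm the workflow end to end.
local_weeks <- run_once(index = 1L, param_files = param_files)
head(local_weeks)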
The data frame below is an aggregate of all the weeks00000.csv files from trial execution mode.
library(dplyr)
library(fs)
# Ignore the facts_sim column since all weeks files were indexed 00000.
# For data post-processing, use the facts_id column instead.
lapply(seq_len(2), run_once, param_files = param_files) %>%
  bind_rows()
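A minimal post-processing sketch, assuming the aggregated result above is stored in a data frame called weeks (a hypothetical name; the summary itself is illustrative):
weeks %>%
  group_by(facts_id) %>% # facts_id distinguishes simulations; facts_sim is always 0.
  summarize(n_rows = n()) # Illustrative per-simulation summary.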
Thanks to clustermq, it is straightforward to run simulations in parallel on a cluster. First, configure clustermq with a template file and global options. Here, we demonstrate using an SGE cluster.
# Configure clustermq to use our grid and your template file.
# If you are using a scheduler like SGE, you need to write a template file
# like clustermq.tmpl. To learn how, visit
# https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
options(clustermq.scheduler = "sge", clustermq.template = "clustermq.tmpl")
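If you want to test the workflow without a scheduler, one option (assuming your clustermq version supports the multiprocess backend) is to run the workers locally instead:
# Local workers instead of an SGE grid; no template file needed.
options(clustermq.scheduler = "multiprocess")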
Then, run the simulations.
library(clustermq)
weeks <- Q(
  fun = run_once,
  index = seq_len(1e3), # Run 1000 simulations, one per value of index.
  const = list(param_files = param_files),
  pkgs = c("fs", "rfacts"),
  n_jobs = 1e2 # Use 100 clustermq workers.
) %>%
  bind_rows()
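As a quick sanity check (illustrative, not part of the original workflow), the combined data frame should contain one distinct facts_id per iteration:
length(unique(weeks$facts_id)) # Expect one facts_id per call to run_once().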