tq_apply() provides a simplified workflow for running
parallel tasks on HPC clusters. It combines multiple steps (project
creation, resource assignment, task addition, and worker scheduling)
into a single function call, similar to base R’s lapply()
or sapply().
This is the easiest way to get started with taskqueue if
you:
Before using taskqueue, ensure you have:
PostgreSQL installed and configured (see PostgreSQL Setup vignette)
SSH access configured for remote resources (see SSH Setup vignette)
Database initialized:
A resource already defined:
The simplest use of tq_apply() requires just a few
arguments:
library(taskqueue)

# Define your function
my_simulation <- function(i) {
  # Your computation here
  result <- i^2
  Sys.sleep(1)  # Simulate some work
  return(result)
}

# Run 100 tasks in parallel
tq_apply(
  n = 100,
  fun = my_simulation,
  project = "my_project",
  resource = "hpc"
)

This will set up the project, resource, tasks, and workers, then run my_simulation(1), my_simulation(2), …, my_simulation(100) in parallel.

- n: Number of tasks to run (integer)
- fun: The function to execute for each task
- project: Project name (string)
- resource: Resource name (string, must already exist)
- memory: Memory per task in GB (default: 10)
- hour: Maximum runtime in hours (default: 24)
- account: Account name for cluster billing (optional)
- working_dir: Working directory on cluster (default: getwd())
- ...: Additional arguments passed to your function

You can pass additional arguments to your function using ...:
my_function <- function(i, multiplier, offset = 0) {
  result <- i * multiplier + offset
  return(result)
}

tq_apply(
  n = 50,
  fun = my_function,
  project = "test_args",
  resource = "hpc",
  multiplier = 10,  # Passed to my_function
  offset = 5        # Passed to my_function
)

Each task will call:
- Task 1: my_function(1, multiplier = 10, offset = 5)
- Task 2: my_function(2, multiplier = 10, offset = 5)
- And so on…
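
Since tq_apply() is described as similar to lapply(), the forwarding of extra arguments can be previewed locally with base R before anything is submitted. The sketch below is only a local analogy using lapply(); it does not touch the cluster or the database, and the name local_results is purely illustrative.

```r
# Same function as in the example above
my_function <- function(i, multiplier, offset = 0) {
  i * multiplier + offset
}

# lapply() hands the named extra arguments to every call,
# which mirrors how `...` is described for tq_apply()
local_results <- lapply(seq_len(3), my_function, multiplier = 10, offset = 5)

# One element per task index
print(unlist(local_results))
```

If the local values look right for the first few indices, the same named arguments can be given to tq_apply().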
Here’s a practical example running a Monte Carlo simulation:
library(taskqueue)

# Define simulation function
run_monte_carlo <- function(task_id, n_samples = 10000, seed_base = 12345) {
  # Set unique seed for each task
  set.seed(seed_base + task_id)

  # Run simulation
  samples <- rnorm(n_samples)
  result <- list(
    task_id = task_id,
    mean = mean(samples),
    sd = sd(samples),
    quantiles = quantile(samples, probs = c(0.025, 0.5, 0.975))
  )

  # Save results
  out_file <- sprintf("results/simulation_%04d.Rds", task_id)
  dir.create("results", showWarnings = FALSE)
  saveRDS(result, out_file)
  return(invisible(NULL))
}

# Run 1000 simulations in parallel
tq_apply(
  n = 1000,
  fun = run_monte_carlo,
  project = "monte_carlo_study",
  resource = "hpc",
  memory = 8,          # 8 GB per task
  hour = 2,            # 2 hour time limit
  working_dir = "/home/user/monte_carlo",
  n_samples = 50000,   # Argument for run_monte_carlo
  seed_base = 99999    # Argument for run_monte_carlo
)

After calling tq_apply(), monitor your tasks:
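
The package's own status commands are not reproduced here. Independently of them, when every task writes one output file (as run_monte_carlo() does), progress can be gauged roughly by looking for files that are still missing. The helper missing_tasks() below is hypothetical, not part of taskqueue, and assumes the results/simulation_%04d.Rds naming from the example above.

```r
# Hypothetical helper (not a taskqueue function):
# which task ids have no output file yet?
missing_tasks <- function(n, dir = "results",
                          pattern = "simulation_%04d.Rds") {
  # sprintf() is vectorised, so this builds all n expected file names
  expected <- file.path(dir, sprintf(pattern, seq_len(n)))
  which(!file.exists(expected))
}

todo <- missing_tasks(1000)
cat(length(todo), "of 1000 tasks have not written output yet\n")
```

This only checks the file system, so it should be run from the working directory the tasks write to.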
After all tasks complete, collect your results:
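
A minimal sketch of one way to do this with base R, assuming the tasks wrote files such as results/simulation_0001.Rds that each hold a list with task_id, mean, and sd, as in run_monte_carlo() above. The helper collect_results() is hypothetical and not part of taskqueue; it ignores the quantiles element for brevity.

```r
# Hypothetical helper: gather per-task .Rds files into one data frame
collect_results <- function(dir = "results") {
  files <- list.files(dir, pattern = "^simulation_[0-9]+\\.Rds$",
                      full.names = TRUE)

  # Read each file and keep the scalar summaries as one row
  rows <- lapply(files, function(f) {
    res <- readRDS(f)
    data.frame(task_id = res$task_id, mean = res$mean, sd = res$sd)
  })

  # No files found: return an empty data frame with the same columns
  if (length(rows) == 0) {
    return(data.frame(task_id = integer(), mean = numeric(), sd = numeric()))
  }
  do.call(rbind, rows)
}

all_results <- collect_results("results")
print(head(all_results))
```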
Your function should save results to the file system:
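
As a sketch of this pattern, the hypothetical function below computes a placeholder value and persists it with saveRDS(), much like run_monte_carlo() above. Writing to a temporary name and renaming afterwards is an added precaution so that other code is less likely to read a half-written file; it is an assumption of this sketch, not something taskqueue requires.

```r
# Hypothetical task function: compute something, then persist it to disk
save_task_result <- function(task_id, out_dir = "results") {
  result <- list(task_id = task_id, value = task_id^2)  # placeholder work

  # Create the output folder if needed; several tasks may try this at once,
  # so the "already exists" warning is suppressed
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # Write under a temporary name first, then rename to the final name
  out_file <- file.path(out_dir, sprintf("task_%04d.Rds", task_id))
  tmp_file <- paste0(out_file, ".tmp")
  saveRDS(result, tmp_file)
  file.rename(tmp_file, out_file)

  invisible(out_file)
}

# Local demonstration in a temporary folder
demo_file <- save_task_result(1, out_dir = file.path(tempdir(), "results"))
print(readRDS(demo_file))
```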
Check if output already exists to avoid re-running completed tasks:
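
One possible shape for such a check, using only base R. The function run_once() and its file naming are hypothetical; the point is the early return when the output file is already present.

```r
# Hypothetical task function that skips work already done
run_once <- function(task_id, out_dir = "results") {
  out_file <- file.path(out_dir, sprintf("task_%04d.Rds", task_id))

  # Output already present: nothing to do
  if (file.exists(out_file)) {
    return(invisible(FALSE))
  }

  result <- task_id^2  # placeholder for the real computation
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, out_file)
  invisible(TRUE)
}

# Local demonstration in a fresh temporary folder
demo_dir <- file.path(tempdir(), "run_once_demo")
unlink(demo_dir, recursive = TRUE)
first  <- run_once(1, demo_dir)  # computes and saves
second <- run_once(1, demo_dir)  # finds the file and returns early
```

The return value (TRUE when work was done, FALSE when skipped) is a convention of this sketch only.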
Ensure your working directory on the cluster is correct:
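
Relative paths such as "results/..." resolve against the directory the R process is running in, which for a task is presumably the working_dir given to tq_apply(). A small guard at the top of the task function can fail early with a clear message when expected files or folders are absent. check_working_dir() below is a hypothetical helper, not part of taskqueue.

```r
# Hypothetical guard to call at the top of a task function
check_working_dir <- function(required = character()) {
  wd <- getwd()

  # Entries (files or folders) that should exist relative to wd
  absent <- required[!file.exists(file.path(wd, required))]
  if (length(absent) > 0) {
    stop("Working directory ", wd, " is missing: ",
         paste(absent, collapse = ", "))
  }
  invisible(wd)
}

# With no required entries this only reports the current directory
message("Tasks would run in: ", check_working_dir())
```

Inside a task function, a call such as check_working_dir("data") would stop the task if no data entry exists in the working directory; the name "data" is only an example.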
tq_apply() simplifies the workflow by combining these
steps:
Manual approach:
# Multiple steps
project_add("test", memory = 10)
project_resource_add("test", "hpc", working_dir = "/path", hours = 24)
task_add("test", num = 100, clean = TRUE)
project_reset("test")
worker_slurm("test", "hpc", fun = my_function)

With tq_apply():
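
The one-call equivalent is not reproduced here. Based on the parameter list earlier in this document, it would plausibly look like the sketch below. Mapping the manual values (memory = 10, hours = 24, working_dir = "/path", 100 tasks) onto the memory, hour, working_dir, and n arguments is an assumption drawn from that list, and my_function is a placeholder. The call is wrapped in a hypothetical function, submit_test(), so that nothing is submitted until it is called explicitly.

```r
# Placeholder task function for illustration
my_function <- function(i) i^2

# Wrapped in a function so that defining it submits nothing
submit_test <- function() {
  taskqueue::tq_apply(
    n = 100,               # same count as task_add(num = 100)
    fun = my_function,
    project = "test",
    resource = "hpc",
    memory = 10,           # assumed to match project_add(memory = 10)
    hour = 24,             # assumed to match hours = 24
    working_dir = "/path"
  )
}
```

Calling submit_test() only makes sense on a machine where taskqueue, its database, and the "hpc" resource are already set up.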
Tasks fail immediately:
- Check the log folder specified in your resource configuration
- Verify your function works locally first
- Ensure the working directory exists on the cluster
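
For the local check, a minimal sketch is to run the function on a handful of indices and capture any error messages instead of stopping at the first failure. smoke_test() and flaky() are hypothetical names used only for illustration.

```r
# Hypothetical helper: run a task function on a few indices and
# report "ok" or the error message for each one
smoke_test <- function(fun, ids = 1:3, ...) {
  outcome <- vapply(ids, function(i) {
    tryCatch({
      fun(i, ...)
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
  names(outcome) <- ids
  outcome
}

# Example with a function that fails for one index
flaky <- function(i) if (i == 2) stop("bad input") else i^2
print(smoke_test(flaky))
```

Extra arguments after ids are forwarded to the function, so the same named arguments intended for tq_apply() can be tried locally.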
Tasks remain in “idle” status:
- Check that the project is started: project_start("my_project")
- Verify the resource is correctly configured
- Check SLURM queue: squeue -u $USER

“Resource not found” error:
- The resource must be created before using tq_apply()
- Use resource_list() to see available resources
- Create resource with resource_add()

Use tq_apply() when:
- You have a simple parallel task
- You want to quickly run many iterations of a function
- You don’t need fine-grained control over project settings

Use the manual workflow when:
- You need to manage multiple projects simultaneously
- You want to reuse a project for different task sets
- You need more control over resource scheduling
- You’re running different types of tasks in the same project