High-dimensional survival datasets can be computationally demanding.
bigPLScox implements algorithms that scale to large
numbers of predictors and observations via component-based models,
sparse penalties, and stochastic gradient descent routines. This
vignette demonstrates how to benchmark the package against baseline
approaches using the bench package.
We focus on simulated data to illustrate reproducible comparisons
between the classical coxgpls() solver, its big-memory
counterparts, and the survival::coxph() implementation.
The examples below require a recent version of bench together with survival.
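Assuming those packages are installed, a minimal setup chunk loads everything used below (a sketch; the package names follow this vignette):

```r
# Packages used throughout the benchmarks.
# bench and survival are assumed to be installed from CRAN.
library(bigPLScox)
library(bench)
library(survival)
```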
The helper dataCox() simulates survival outcomes with
right censoring. We work with a moderately sized problem here, but
larger values for n and p can be used to
stress test performance.
set.seed(2024)
sim_design <- dataCox(
n = 2000,
lambda = 2,
rho = 1.5,
x = matrix(rnorm(2000 * 50), ncol = 50),
beta = c(1, 3, rep(0, 48)),
cens.rate = 5
)
cox_data <- list(
x = as.matrix(sim_design[, -(1:3)]),
time = sim_design$time,
status = sim_design$status
)
X_big <- bigmemory::as.big.matrix(cox_data$x)
We compare the classical Cox proportional hazards model with
coxgpls() and the two big_pls_cox() solvers.
The bench::mark() helper executes the estimators multiple
times and records timing statistics alongside memory usage
information.
bench_res <- bench::mark(
coxgpls = coxgpls(
cox_data$x,
cox_data$time,
cox_data$status,
ncomp = 5,
ind.block.x = c(3, 10)
),
big_pls = big_pls_cox(X_big, cox_data$time, cox_data$status, ncomp = 5),
big_pls_gd = big_pls_cox_gd(X_big, cox_data$time, cox_data$status, ncomp = 5, max_iter = 100),
survival = coxph(Surv(cox_data$time, cox_data$status) ~ cox_data$x, ties = "breslow"),
iterations = 100,
check = FALSE
)
bench_res
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 coxgpls 17.64ms 18.02ms 55.3 30.95MB 383.
#> 2 big_pls 7.72ms 7.88ms 127. 3.54MB 12.5
#> 3 big_pls_gd 11.67ms 11.71ms 85.3 2.75MB 7.42
#> 4 survival 26.94ms 27.32ms 36.4 13.39MB 23.3
bench_summary <- bench_res[, c("expression", "median", "itr/sec")]
bench_summary
#> # A tibble: 4 × 3
#> expression median `itr/sec`
#> <bch:expr> <bch:tm> <dbl>
#> 1 coxgpls 18.02ms 55.3
#> 2 big_pls 7.88ms 127.
#> 3 big_pls_gd 11.71ms 85.3
#> 4 survival 27.32ms 36.4
The resulting tibble reports elapsed time, memory allocations, and
garbage collection statistics for each estimator. The
itr/sec column is often the most useful indicator when
comparing multiple implementations. The bench_summary
object summarises the median runtime and iterations per second.
bench provides ggplot2-based helpers to visualise the
distributions of elapsed time and memory usage.
Additional geometries, such as ridge plots, are available via
autoplot(bench_res, type = "ridge").
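As a sketch (assuming ggplot2 is installed), the default plot and the ridge variant can be produced as follows:

```r
# ggplot2 is required for the autoplot() method that bench provides.
library(ggplot2)

autoplot(bench_res)                 # default view of the timing distributions
autoplot(bench_res, type = "ridge") # ridge (density) plot per expression
```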
Use the function write.csv() to store the benchmarking
table as part of a reproducible pipeline. For larger studies consider
varying the number of latent components, sparsity constraints, or the
dataset dimensions.
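For example, the summary table can be flattened to plain numeric columns before writing it to disk (a sketch; the output filename is arbitrary):

```r
# bench stores timings as <bch:tm> objects; convert them to seconds
# so the exported CSV contains plain numeric columns.
bench_export <- data.frame(
  expression  = as.character(bench_summary$expression),
  median_sec  = as.numeric(bench_summary$median),
  itr_per_sec = bench_summary$`itr/sec`
)
write.csv(bench_export, "benchmark-results.csv", row.names = FALSE)
```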
The package also ships with standalone scripts under
inst/benchmarks/ that mirror this vignette while exposing
additional configuration points. Run them from the repository root
as:
Rscript inst/benchmarks/cox-benchmark.R
Rscript inst/benchmarks/benchmark_bigPLScox.R
Rscript inst/benchmarks/cox_pls_benchmark.R
Each script accepts environment variables to adjust the problem size
and stores results under inst/benchmarks/results/ with
time-stamped filenames.