bigPLScox

bigPLScox, PLS models and their extension for big data in R

Frédéric Bertrand and Myriam Maumy-Bertrand

bigPLScox provides Partial Least Squares (PLS) methods tailored for Cox proportional hazards models with large, high-dimensional feature matrices. The package works directly with bigmemory objects, enabling native C++ accelerators and iterative algorithms to run without loading the full dataset into memory. In addition to the classical coxgpls() solver, the package contains accelerated variants, cross-validation helpers, and model diagnostics.

GPU support is not available in the current release; ongoing development focuses on improving the multi-core CPU back-end instead.

Additional articles are available in the vignettes/ directory.

Standalone benchmarking scripts that complement the vignette live under inst/benchmarks/.

The documentation website and examples are maintained by Frédéric Bertrand and Myriam Maumy.

Conference highlight. Maumy, M. and Bertrand, F. (2023). “PLS models and their extension for big data”. Conference presentation at the Joint Statistical Meetings (JSM 2023), Toronto, Ontario, Canada, Aug 5–10, 2023.

Conference highlight. Maumy, M. and Bertrand, F. (2023). “bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data”. Poster at BioC2023: The Bioconductor Annual Conference, Dana-Farber Cancer Institute, Boston, MA, USA, Aug 2–4, 2023. doi:10.7490/f1000research.1119546.1.

Key features

Installation

You can install the released version of bigPLScox from CRAN with:

install.packages("bigPLScox")

You can install the development version of bigPLScox from GitHub with:

# install.packages("devtools")
devtools::install_github("fbertran/bigPLScox")

Learning materials

Release highlights

The full changelog lives in NEWS.md.

Quick start

The following example demonstrates the typical workflow on the first 80 observations of the allelotyping dataset bundled with the package. Chunks are evaluated by default when the README is rendered locally; set knitr::opts_chunk$set(eval = FALSE) to skip evaluation for faster builds.

library(bigPLScox)
data(micro.censure)
data(Xmicro.censure_compl_imp)
# Keep the first 80 observations as the training set.
Y_train <- micro.censure$survyear[1:80]
status_train <- micro.censure$DC[1:80]
# Coerce the predictors to a numeric matrix; without this step the fit
# stops in colMeans() with "'x' must be numeric".
X_train <- apply(as.matrix(Xmicro.censure_compl_imp), 2, as.numeric)[1:80, ]

Fit a Cox-PLS model with six components and inspect the fit summary:

set.seed(123)
cox_pls_fit <- coxgpls(
  Xplan = X_train,
  time = Y_train,
  status = status_train,
  ncomp = 6,
  ind.block.x = c(3, 10, 20)
)
cox_pls_fit

Visualise deviance residuals to assess the baseline model fit and verify the agreement between the R and C++ engines:

residuals_overview <- computeDR(Y_train, status_train, plot = TRUE)

head(residuals_overview)
#>          1          2          3          4          5          6 
#> -1.4843296 -0.5469540 -0.2314550 -0.3400301 -0.9763372 -0.3866766

cpp_residuals <- computeDR(
  Y_train,
  status_train,
  engine = "cpp",
  eta = predict(cox_pls_fit, type = "lp")
)
# all.equal() returns a character vector on mismatch, so wrap it in isTRUE().
stopifnot(isTRUE(all.equal(residuals_overview, cpp_residuals, tolerance = 1e-7)))
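
As a point of comparison, deviance residuals for a baseline (null) Cox model can also be obtained with the survival package alone. The sketch below assumes, without confirmation from this README, that the baseline residuals returned by computeDR() are of this kind. It uses simulated data and hypothetical variable names (sim_time, sim_status), and checks the result against the usual formula d_i = sign(M_i) * sqrt(-2 * (M_i + delta_i * log(delta_i - M_i))), where M_i is the martingale residual and delta_i the event indicator.

```r
library(survival)

set.seed(2023)
n <- 60
sim_time <- rexp(n, rate = 0.1)          # simulated follow-up times (illustrative)
sim_status <- rbinom(n, 1, prob = 0.7)   # 1 = event, 0 = censored (illustrative)

# Null Cox model: no covariates, so the linear predictor is zero for everyone.
null_fit <- coxph(Surv(sim_time, sim_status) ~ 1)

dev_res  <- residuals(null_fit, type = "deviance")
mart_res <- residuals(null_fit, type = "martingale")

# Recompute the deviance residuals from the martingale residuals.
# The log term is defined as 0 for censored observations.
log_term <- ifelse(sim_status == 0, 0, sim_status * log(sim_status - mart_res))
dev_manual <- sign(mart_res) * sqrt(-2 * (mart_res + log_term))

# Both routes should agree up to floating-point error.
stopifnot(isTRUE(all.equal(unname(dev_res), unname(dev_manual))))
```

Deviance residuals are a transformation of martingale residuals that is more symmetric around zero, which is why they are often easier to inspect visually.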

Cross-validate the number of components and re-fit using the deviance residual solver for comparison:

set.seed(123)
cv_results <- cv.coxgpls(
  list(x = X_train, time = Y_train, status = status_train),
  nt = 6,
  ind.block.x = c(3, 10, 20)
)
cv_results$opt_nt
cox_pls_dr <- coxgplsDR(
  Xplan = X_train,
  time = Y_train,
  status = status_train,
  ncomp = cv_results$opt_nt,
  ind.block.x = c(3, 10, 20)
)
cox_pls_dr

Explore alternative estimators such as coxgplsDR() for deviance-residual fitting or coxsgpls() for sparse component selection. Refer to the package reference for the full list of available models and helper functions.
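
For predictor matrices that are too large to hold in RAM, the package description refers to bigmemory objects. The sketch below uses only functions from the bigmemory package to create a file-backed matrix, fill it block by block, and re-attach it from its descriptor file. The dimensions, file names, and block size are hypothetical, and the step of passing the resulting object to a bigPLScox solver is deliberately left out because this README does not show that interface.

```r
library(bigmemory)

set.seed(42)
n <- 200   # hypothetical number of observations
p <- 50    # hypothetical number of predictors

# Use a fresh directory so the backing files never clash with earlier runs.
backing_dir <- tempfile("bigmat_")
dir.create(backing_dir)

# Allocate the matrix on disk rather than in RAM.
X_big <- filebacked.big.matrix(
  nrow = n, ncol = p, type = "double",
  backingfile = "X_demo.bin",
  backingpath = backing_dir,
  descriptorfile = "X_demo.desc"
)

# Fill the matrix in blocks of rows, so only one block is in memory at a time.
block_size <- 50
for (start in seq(1, n, by = block_size)) {
  rows <- start:min(start + block_size - 1, n)
  X_big[rows, ] <- matrix(rnorm(length(rows) * p), nrow = length(rows))
}

# A later session can re-attach the same data from the descriptor file.
X_again <- attach.big.matrix(file.path(backing_dir, "X_demo.desc"))
dim(X_again)
```

Filling the matrix by row blocks keeps peak memory use proportional to the block size rather than to the full dataset.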

Benchmarking

We provide reproducible benchmarks that compare coxgpls() and the big-memory solvers against survival::coxph(). Start with the Benchmarking bigPLScox vignette for an interactive tour.

For command-line experiments, execute the scripts in inst/benchmarks/ after installing the optional dependencies listed under Suggests in the DESCRIPTION file. Each script accepts environment variables (for example, bigPLScox.benchmark.n, bigPLScox.benchmark.p, and bigPLScox.benchmark.ncomp) to control the simulation size.

Rscript inst/benchmarks/cox-benchmark.R
Rscript inst/benchmarks/cox_pls_benchmark.R
Rscript inst/benchmarks/benchmark_bigPLScox.R

Results are stored under inst/benchmarks/results/ with time-stamped filenames for traceability.
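
The environment variables named above can also be set from within an R session before a script is launched. The sketch below is illustrative only: the values are hypothetical, read_setting() is a hypothetical helper, and it is an assumption that the benchmark scripts read their settings with Sys.getenv().

```r
# Hypothetical simulation sizes; environment variables are always strings.
Sys.setenv(
  bigPLScox.benchmark.n = "500",
  bigPLScox.benchmark.p = "50",
  bigPLScox.benchmark.ncomp = "3"
)

# Hypothetical helper: read an integer setting, falling back to a default
# when the variable is unset or empty.
read_setting <- function(name, default) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) default else as.integer(value)
}

n_obs  <- read_setting("bigPLScox.benchmark.n", 1000L)
n_pred <- read_setting("bigPLScox.benchmark.p", 100L)
n_comp <- read_setting("bigPLScox.benchmark.ncomp", 5L)

# Child processes inherit the variables, so a script started from this
# session would see them (left commented out here):
# system2("Rscript", "inst/benchmarks/cox-benchmark.R")
```

Setting the variables in the calling session keeps the benchmark scripts themselves unchanged between runs of different sizes.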

Vignettes and documentation

Four vignettes ship with the package:

  1. Getting started with bigPLScox – an end-to-end introduction covering data preparation, fitting, and validation workflows.
  2. Overview of bigPLScox – a high-level description of the modelling functions and their typical use cases.
  3. Big-memory workflows with bigPLScox – instructions for working with bigmemory matrices and the streaming solvers.
  4. Benchmarking bigPLScox – guidance for evaluating performance against baseline Cox implementations using the bench package.

The full reference documentation and pkgdown website are available at https://fbertran.github.io/bigPLScox/.

Bug reports and feature requests

Bug reports and feature requests can be filed on the issue tracker. Please make sure that new code comes with unit tests or reproducible examples when applicable.
