bigPLScox provides Partial Least Squares (PLS) methods
tailored for Cox proportional hazards models with large,
high-dimensional feature matrices. The package works directly with bigmemory
objects, enabling native C++ accelerators and iterative algorithms to
run without loading the full dataset into memory. In addition to the
classical coxgpls() solver, the package contains
accelerated variants, cross-validation helpers, and model
diagnostics.
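As a rough illustration of what working with bigmemory objects looks like, the sketch below creates a file-backed big.matrix and summarises it in column blocks, so only one block is held in RAM at a time. It uses the bigmemory package only and calls no bigPLScox function; the matrix size, file names, and block size are arbitrary assumptions chosen for the demonstration.

```r
library(bigmemory)

set.seed(1)

# Hypothetical backing files in a temporary directory (names are illustrative).
backing_dir <- tempdir()
stem <- basename(tempfile("X_demo_"))

# Allocate a 100 x 20 double matrix whose data live on disk, not in RAM.
X_big <- filebacked.big.matrix(
  nrow = 100, ncol = 20, type = "double",
  backingfile = paste0(stem, ".bin"),
  descriptorfile = paste0(stem, ".desc"),
  backingpath = backing_dir
)

# Fill the matrix one column at a time.
for (j in seq_len(ncol(X_big))) {
  X_big[, j] <- rnorm(nrow(X_big))
}

# Compute column means block by block; each X_big[, cols] extraction
# materialises only that block as an ordinary R matrix.
block_size <- 5
col_means <- numeric(ncol(X_big))
for (start in seq(1, ncol(X_big), by = block_size)) {
  cols <- start:min(start + block_size - 1, ncol(X_big))
  col_means[cols] <- colMeans(X_big[, cols, drop = FALSE])
}

is.big.matrix(X_big)
length(col_means)
```

The same block-wise access pattern is what lets iterative solvers touch a large predictor matrix without reading it in full.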
- The classical coxgpls() solver with support for grouped predictors.
- Sparse variants (coxsgpls(), coxspls_sgpls()).
- Deviance-residual estimators (coxgplsDR(), coxsgplsDR()) for robust fits.
- Cross-validation helpers (cv.coxgpls(), cv.coxsgpls(), …) to select the number of latent components.
- Diagnostics such as computeDR() for quick residual exploration.
- C++ deviance residuals via computeDR(engine = "cpp") for both in-memory and big-memory workflows.
- Support for big.matrix objects while leveraging foreach parallelism.
- Big-memory solvers big_pls_cox() and big_pls_cox_gd().

GPU support is not available in the current release; ongoing development focuses on improving the multi-core CPU back-end instead.
Additional articles are available in the vignettes/
directory, including one covering bigmemory matrices and parallel back-ends.
Standalone benchmarking scripts that complement the vignette live
under inst/benchmarks/.
The documentation website and examples are maintained by Frédéric Bertrand and Myriam Maumy.
Conference highlight. Maumy, M. and Bertrand, F. (2023). “PLS models and their extension for big data”. Conference presentation at the Joint Statistical Meetings (JSM 2023), Toronto, Ontario, Canada, Aug 5–10, 2023.
Conference highlight. Maumy, M. and Bertrand, F. (2023). “bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data”. Poster at BioC2023: The Bioconductor Annual Conference, Dana-Farber Cancer Institute, Boston, MA, USA, Aug 2–4, 2023. doi:10.7490/f1000research.1119546.1.
- Solvers (coxgpls(), coxgplsDR()) that operate on big matrices stored on disk.
- Deviance residual computation with computeDR().
- Benchmarking scripts under inst/benchmarks/ to quantify runtime trade-offs between the available solvers.
- A vignette (vignettes/bigPLScox.Rmd) showing a complete modelling workflow.

You can install the released version of bigPLScox from CRAN with:
install.packages("bigPLScox")

You can install the development version of bigPLScox from GitHub with:

# install.packages("devtools")
devtools::install_github("fbertran/bigPLScox")

- See vignette("getting-started", package = "bigPLScox") for a worked example.
- See vignette("bigPLScox", package = "bigPLScox") for big-memory workflows and streaming solvers.
- Run the scripts in inst/benchmarks/ to compare solver performance on simulated data.

The full changelog lives in NEWS.md. Recent releases include:
- big_pls_cox()/big_pls_cox_gd(), and new component selection utilities.

The following example demonstrates the typical workflow on a subset
of the allelotyping dataset bundled with the package. Chunks are
evaluated by default when the README is rendered locally; set
knitr::opts_chunk$set(eval = FALSE) to skip evaluation for
faster builds.
library(bigPLScox)
data(micro.censure)
data(Xmicro.censure_compl_imp)
Y_train <- micro.censure$survyear[1:80]
status_train <- micro.censure$DC[1:80]
# Coerce the predictors to a numeric matrix first. Passing the raw object fails
# with "Error in colMeans(x, na.rm = TRUE): 'x' must be numeric", and every
# later step then fails because the fitted objects do not exist.
X_train <- apply(as.matrix(Xmicro.censure_compl_imp), MARGIN = 2, FUN = as.numeric)[1:80, ]

Fit a Cox-PLS model with six components and inspect the fit summary:

set.seed(123)
cox_pls_fit <- coxgpls(
  Xplan = X_train,
  time = Y_train,
  status = status_train,
  ncomp = 6,
  ind.block.x = c(3, 10, 20)
)
cox_pls_fit

Visualise deviance residuals to assess the baseline model fit and verify the agreement between the R and C++ engines:

residuals_overview <- computeDR(Y_train, status_train, plot = TRUE)
head(residuals_overview)
#> 1 2 3 4 5 6
#> -1.4843296 -0.5469540 -0.2314550 -0.3400301 -0.9763372 -0.3866766
cpp_residuals <- computeDR(
  Y_train,
  status_train,
  engine = "cpp",
  eta = predict(cox_pls_fit, type = "lp")
)
stopifnot(all.equal(residuals_overview, cpp_residuals, tolerance = 1e-7))

Cross-validate the number of components and re-fit using the deviance residual solver for comparison:

set.seed(123)
cv_results <- cv.coxgpls(
  list(x = X_train, time = Y_train, status = status_train),
  nt = 6,
  ind.block.x = c(3, 10, 20)
)
cv_results$opt_nt
cox_pls_dr <- coxgplsDR(
  Xplan = X_train,
  time = Y_train,
  status = status_train,
  ncomp = cv_results$opt_nt,
  ind.block.x = c(3, 10, 20)
)
cox_pls_dr

Explore alternative estimators such as coxgplsDR() for
deviance-residual fitting or coxsgpls() for sparse
component selection. Refer to the package reference for the full list of
available models and helper functions.
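For orientation, deviance residuals in a Cox model are conventionally derived from the martingale residuals M_i and the event indicator δ_i as d_i = sign(M_i) * sqrt(-2 * (M_i + δ_i * log(δ_i - M_i))). The sketch below checks that formula against the survival package on simulated data. It does not call bigPLScox; whether computeDR() follows exactly this definition is an assumption, and the simulated data and variable names are purely illustrative.

```r
library(survival)

set.seed(2024)
n <- 60
x <- rnorm(n)                            # one illustrative covariate
time <- rexp(n, rate = exp(0.5 * x))     # simulated follow-up times
status <- rbinom(n, size = 1, prob = 0.7)  # 1 = event, 0 = censored

fit <- coxph(Surv(time, status) ~ x)

# Martingale residuals: observed events minus expected cumulative hazard.
m <- residuals(fit, type = "martingale")

# Deviance residuals computed by hand from the standard formula.
# The log term only contributes for observed events (status == 1).
log_term <- ifelse(status == 0, 0, status * log(status - m))
d_manual <- sign(m) * sqrt(-2 * (m + log_term))

# Reference values from survival.
d_reference <- residuals(fit, type = "deviance")

agreement <- all.equal(unname(d_manual), unname(d_reference))
agreement
```

The transformation makes the heavily skewed martingale residuals more symmetric around zero, which is why they are convenient for quick visual checks.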
We provide reproducible benchmarks that compare
coxgpls() and the big-memory solvers against
survival::coxph(). Start with the Benchmarking
bigPLScox vignette for an interactive tour.
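To give a sense of what the survival::coxph() reference point involves, here is a minimal timing sketch on simulated data. It is not one of the package's benchmark scripts: the sample size, number of predictors, coefficients, and censoring rate are all assumptions made for illustration, and only survival and base R are used.

```r
library(survival)

set.seed(42)
n <- 200   # illustrative sample size
p <- 5     # illustrative number of predictors

X <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
beta <- c(0.5, -0.5, 0, 0, 0)

# Exponential event times driven by the linear predictor, with
# independent exponential censoring.
event_time <- rexp(n, rate = exp(drop(X %*% beta)))
censor_time <- rexp(n, rate = 0.2)

sim <- data.frame(
  time = pmin(event_time, censor_time),
  status = as.integer(event_time <= censor_time),
  X
)

# Time the baseline Cox fit on all predictors.
timing <- system.time(
  baseline_fit <- coxph(Surv(time, status) ~ ., data = sim)
)

timing[["elapsed"]]
coef(baseline_fit)
```

A single system.time() call is noisy; repeating the fit several times and reporting the median gives a steadier comparison.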
For command-line experiments, execute the scripts in
inst/benchmarks/ after installing the optional dependencies
listed under Suggests in the DESCRIPTION file.
Each script accepts environment variables (for example,
bigPLScox.benchmark.n, bigPLScox.benchmark.p,
and bigPLScox.benchmark.ncomp) to control the simulation
size.
Rscript inst/benchmarks/cox-benchmark.R
Rscript inst/benchmarks/cox_pls_benchmark.R
Rscript inst/benchmarks/benchmark_bigPLScox.R

Results are stored under inst/benchmarks/results/ with
time-stamped filenames for traceability.
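The environment variables mentioned above can be set from R before launching a script, since child processes inherit the session environment. The sketch below shows one way a script could read such settings with fallbacks. The helper read_int_env() and the default values are hypothetical and are not taken from the package's scripts; only the variable names come from this README.

```r
# Hypothetical helper: read an integer setting from the environment,
# falling back to a default when the variable is unset or not a number.
read_int_env <- function(name, default) {
  raw <- Sys.getenv(name, unset = NA_character_)
  if (is.na(raw) || !nzchar(raw)) {
    return(default)
  }
  value <- suppressWarnings(as.integer(raw))
  if (is.na(value)) default else value
}

# Set two of the variables and leave the third unset.
Sys.setenv(bigPLScox.benchmark.n = "500", bigPLScox.benchmark.p = "50")
Sys.unsetenv("bigPLScox.benchmark.ncomp")

# The defaults (1000, 100, 5) are assumptions for this example only.
n <- read_int_env("bigPLScox.benchmark.n", 1000L)
p <- read_int_env("bigPLScox.benchmark.p", 100L)
ncomp <- read_int_env("bigPLScox.benchmark.ncomp", 5L)

c(n = n, p = p, ncomp = ncomp)
```

After the Sys.setenv() call, a script started from the same session, for example with system2("Rscript", "inst/benchmarks/cox-benchmark.R"), sees the same values.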
Four vignettes ship with the package, including one covering
bigmemory matrices and the streaming solvers.
The full reference documentation and pkgdown website are available at https://fbertran.github.io/bigPLScox/.
Bug reports and feature requests can be filed on the issue tracker. Please make sure that new code comes with unit tests or reproducible examples when applicable.