We choose between XtX (SIMPLS), XXt (wide-kernel PLS), and NIPALS based on \((n, p)\) and a RAM budget.
# In pls_fit(), after argument parsing:
if (identical(algo_in, "auto")) {
  algo_in <- .choose_algorithm_auto(backend, X, y, ncomp)
}

# Memory budget in bytes (default 8 GB, overridable via an option).
.mem_bytes <- function() {
  gb <- getOption("bigPLSR.mem_budget_gb", 8)
  as.numeric(gb) * (1024^3)
}

# Dimensions of either an in-memory matrix or a bigmemory big.matrix.
.dims_of <- function(X) {
  if (inherits(X, "big.matrix")) c(nrow(X), ncol(X)) else c(NROW(X), NCOL(X))
}

.choose_algorithm_auto <- function(backend, X, y, ncomp) {
  is_big_local <- inherits(X, "big.matrix") || inherits(X, "big.matrix.descriptor")
  dims <- .dims_of(X)
  n <- as.integer(dims[1]); p <- as.integer(dims[2])
  B <- .mem_bytes()
  bytes <- 8                                        # size of one double
  need_XtX <- bytes * as.double(p) * as.double(p)   # bytes for the p x p cross-product
  need_XXt <- bytes * as.double(n) * as.double(n)   # bytes for the n x n cross-product
  can_XtX   <- need_XtX <= B
  can_XXt   <- need_XXt <= B
  shape_XtX <- (p <= 4L * n)
  shape_XXt <- (n <= 4L * p)
  if (can_XtX && shape_XtX) {
    "simpls"
  } else if (can_XXt && shape_XXt) {
    "widekernelpls"
  } else {
    "nipals"
  }
}

Users can override the memory budget:
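For example (the 16 GB value is purely illustrative):

options(bigPLSR.mem_budget_gb = 16)   # let the heuristic assume a 16 GB budget
getOption("bigPLSR.mem_budget_gb")    # 16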
bigPLSR::pls_fit() can automatically choose an algorithm based on problem shape and a user-configurable memory budget:

- SIMPLS when the p × p cross-product (XtX) fits in memory.
- Wide-kernel PLS when XtX does not fit but XXt (n × n) does.
- NIPALS when neither XtX nor XXt comfortably fits.

This selection only applies when algorithm = "auto" (the default). Any explicit algorithm = value overrides the decision.
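For instance, the streaming route can be forced regardless of problem size (a minimal sketch; X and y stand for any predictor matrix and response, as in the examples below):

fit <- pls_fit(X, y, ncomp = 5, algorithm = "nipals")  # bypasses the auto heuristic
fit$algorithm                                          # "nipals"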
The choice hinges on which cross-product matrix fits within the budget (p × p or n × n). XtX is efficient when p is moderate; using XXt is efficient for "wide" problems (p ≫ n) but still bounded by n^2 memory. NIPALS streams over the data (including big.matrix input) with fixed working memory; it is the safe fallback when memory is tight.

Let the memory budget be B bytes (defaults to 8 GB, configurable via options(bigPLSR.mem_budget_gb = ...)). With doubles (8 bytes), we estimate the size of each symmetric matrix as:

need_XtX = 8 * p^2
need_XXt = 8 * n^2

Then:
if (can_XtX && shape_XtX) {
  algo_in <- "simpls"          # XtX route
} else if (can_XXt && shape_XXt) {
  algo_in <- "widekernelpls"   # XXt (a.k.a. "kernel") route
} else {
  algo_in <- "nipals"          # streaming fallback
}
Setting the budget does not change R’s actual memory limit; it only controls the algorithm selection.
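As a quick numeric check of these thresholds (the dimensions below are made up for illustration):

p <- 5e4; n <- 1e4
8 * p^2 / 1024^3   # ~18.6 GB: XtX exceeds the default 8 GB budget
8 * n^2 / 1024^3   # ~0.75 GB: XXt fits, so "widekernelpls" would be selected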
For tight numerical parity in tests, pin the algorithm explicitly rather than relying on "auto". Tall case:
library(bigPLSR)
n <- 2e3; p <- 5e2
X <- matrix(rnorm(n*p), n, p)
y <- X[,1] - 0.5*X[,2] + rnorm(n)
# Auto will likely pick SIMPLS (XtX) here
fit <- pls_fit(X, y, ncomp = 10, algorithm = "auto")
fit$algorithm  # "simpls"

Wide case:
n <- 200; p <- 4000
X <- matrix(rnorm(n*p), n, p)
y <- rnorm(n)
# If budget is small, auto picks kernel (XXt) or NIPALS
options(bigPLSR.mem_budget_gb = 2) # small budget
fit <- pls_fit(X, y, ncomp = 5, algorithm = "auto")
fit$algorithm  # "widekernelpls" or "nipals", depending on n^2 vs the budget

Big-matrix streaming:
For column blocks \(J\), \[ K \approx \sum_{J} X_{[:,J]} X_{[:,J]}^\top,\quad (Kv) \leftarrow (Kv) + X_{[:,J]} \big(X_{[:,J]}^\top v\big). \]
For row blocks \(B\), accumulate \(X^\top v = \sum_{B} X_B^\top v_B\) in a first pass, then \[ (Kv)_B \leftarrow X_B \big(X^\top v\big) \] block by block in a second pass.
Center on the fly: \(H K H v = K v - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top K v - \tfrac{1}{n}K\mathbf{1}\mathbf{1}^\top v + \tfrac{1}{n^2}\mathbf{1}\mathbf{1}^\top K \mathbf{1}\,\mathbf{1}^\top v\). Maintain the needed aggregated vectors once per pass.
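A minimal sketch of the column-block pass, using an ordinary matrix as a stand-in for a big.matrix; stream_Kv(), center_Kv(), and the block size are illustrative names and choices, not package API:

# Hypothetical helper: Kv = X X^T v accumulated over column blocks J.
stream_Kv <- function(X, v, block_size = 512L) {
  n <- nrow(X); p <- ncol(X)
  Kv <- numeric(n)
  for (start in seq(1L, p, by = block_size)) {
    cols <- start:min(start + block_size - 1L, p)
    Xj <- X[, cols, drop = FALSE]              # one column block in memory at a time
    Kv <- Kv + drop(Xj %*% crossprod(Xj, v))   # (Kv) += X[,J] (X[,J]^T v)
  }
  Kv
}

# Centering on the fly: H K H v from Kv, K1 (= K %*% 1), and v, per the identity above.
center_Kv <- function(Kv, Kone, v) {
  Kv - mean(Kv) - Kone * mean(v) + mean(Kone) * mean(v)
}

# Check against an explicitly centered kernel on a small problem.
set.seed(1)
X <- matrix(rnorm(200 * 50), 200, 50)
v <- rnorm(200)
Kone <- stream_Kv(X, rep(1, nrow(X)))
out  <- center_Kv(stream_Kv(X, v), Kone, v)
Xc   <- scale(X, center = TRUE, scale = FALSE)
max(abs(out - drop(Xc %*% crossprod(Xc, v))))  # ~0, agrees to machine precision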