This vignette illustrates how to evaluate partial least squares (PLS)
models with repeated cross-validation and information criteria using the
new parallel helpers available in bigPLSR.
We generate a small synthetic data set so the examples run quickly even when the vignette is built during package installation.
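The simulation code itself is not reproduced in this excerpt, so the `X` and `y` objects used below are assumed to already exist. A minimal sketch of one such synthetic data set (the dimensions, seed, and coefficients here are illustrative assumptions, not the vignette's actual values) could look like:

```r
## Illustrative only: sizes and coefficients are assumptions for this sketch.
set.seed(1)
n <- 100                                   # observations
p <- 10                                    # predictors
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta <- rnorm(p)                           # latent regression coefficients
y <- as.numeric(X %*% beta + rnorm(n, sd = 0.5))
```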
The pls_cross_validate() function now accepts a
parallel argument. Setting parallel = "future"
evaluates the folds concurrently by relying on the future
ecosystem. You are free to configure any execution plan you like before
calling the helper. Below we keep the sequential default to avoid
introducing run-time dependencies during the build process.
cv_res <- pls_cross_validate(X, y, ncomp = 4, folds = 6,
                             metrics = c("rmse", "r2"),
                             parallel = "none")
head(cv_res$details)
#>   fold ncomp metric     value
#> 1    1     1   rmse 0.4673779
#> 2    1     1     r2 0.8877468
#> 3    1     2   rmse 0.4176394
#> 4    1     2     r2 0.9103676
#> 5    1     3   rmse 0.3397565
#> 6    1     3     r2 0.9406804

Aggregating the metrics provides a quick overview of the predictive performance per number of components:
cv_res$summary
#>   ncomp metric     value
#> 1     1     r2 0.8263996
#> 2     2     r2 0.8928828
#> 3     3     r2 0.9039359
#> 4     4     r2 0.9039186
#> 5     1   rmse 0.5430639
#> 6     2   rmse 0.4294906
#> 7     3   rmse 0.4038882
#> 8     4   rmse 0.4038991

The cross-validation table is convenient for downstream selection. For example, we can pick the component count that minimises the RMSE:
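Assuming `cv_res$summary` is the long-format data frame printed above (columns `ncomp`, `metric`, `value`), base R suffices for the selection:

```r
rmse_tbl   <- subset(cv_res$summary, metric == "rmse")
best_ncomp <- rmse_tbl$ncomp[which.min(rmse_tbl$value)]
best_ncomp
#> [1] 3
```

Here three components give the lowest aggregated RMSE (0.4038882), with the four-component model essentially tied.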
Information criteria complement cross-validation by trading off
goodness of fit with model complexity. The helper
pls_information_criteria() computes the RSS, RMSE, AIC and
BIC across components:
fit <- pls_fit(X, y, ncomp = 4, scores = "r")
ic_tbl <- pls_information_criteria(fit, X, y)
ic_tbl
#>   ncomp      rss      rmse       aic       bic
#> 1     1 28.81873 0.4900572 -167.1760 -161.6010
#> 2     2 18.63632 0.3940846 -217.4856 -209.1231
#> 3     3 17.49284 0.3818032 -223.0840 -211.9340
#> 4     4 17.39255 0.3807071 -221.7740 -207.8365

For convenience the wrapper pls_select_components()
selects the best components according to the requested criteria:
pls_select_components(fit, X, y, criteria = c("aic", "bic"))
#> $table
#>   ncomp      rss      rmse       aic       bic
#> 1     1 28.81873 0.4900572 -167.1760 -161.6010
#> 2     2 18.63632 0.3940846 -217.4856 -209.1231
#> 3     3 17.49284 0.3818032 -223.0840 -211.9340
#> 4     4 17.39255 0.3807071 -221.7740 -207.8365
#>
#> $best
#> $best$aic
#> [1] 3
#>
#> $best$bic
#> [1] 3

If you wish to parallelise cross-validation with future, configure a plan before calling the helper. The example below assumes a multi-core machine and is therefore not run during vignette building:
future::plan(future::multisession, workers = 2)
cv_parallel <- pls_cross_validate(X, y, ncomp = 4, folds = 6,
                                  metrics = c("rmse", "mae"),
                                  parallel = "future",
                                  future_seed = TRUE)
future::plan(future::sequential)

The future_seed argument provides parallel-safe random number streams, so the fold resampling is reproducible even when multiple workers are used.
The refreshed cross-validation workflow exposes a consistent interface for sequential and parallel execution, while the information-criteria helpers offer another perspective on component selection. The combination lets you systematically tune your PLS models for both accuracy and parsimony.