SelectBoost.beta re-uses the correlated-resampling
machinery introduced by the original SelectBoost package and combines it
with Beta-regression selectors. This vignette summarises the main
routines and presents pseudo-code for their internal logic. The goal is
to make it easy to re-implement or extend the algorithms in other
contexts.
The following helpers expose the canonical SelectBoost stages.

sb_normalize() centres and \(\ell_2\)-normalises the design matrix columns.

sb_compute_corr() computes a correlation (or user-supplied association) matrix from the normalised design.

sb_group_variables() converts the correlation matrix into groups of highly associated predictors for a given threshold \(c_0\).

sb_resample_groups() regenerates correlated predictors for each group by drawing from a multivariate normal approximation and re-normalising. When all groups are singletons it now warns and simply returns repeated copies of the normalised design.

sb_apply_selector_manual() applies a selector to each resampled design and collects the resulting coefficient vectors. Set keep_template = TRUE (the default) to retain the base fit as column sim0 without recomputing it on the first resample.

sb_selection_frequency() converts the matrix of coefficients into selection frequencies while respecting the selector’s coefficient convention.

The manual SelectBoost workflow follows the same steps regardless of the base selector. Pseudo-code for producing selection frequencies at a single threshold is given below.
Procedure ManualSelectBoost(X, Y, selector, c0, B):
1. X_norm <- sb_normalize(X)
2. Corr <- sb_compute_corr(X_norm)
3. Groups <- sb_group_variables(Corr, c0)
4. Resamples <- sb_resample_groups(X_norm, Groups, B)
5. CoefMatrix <- sb_apply_selector_manual(X_norm, Resamples, Y, selector)
6. Frequencies <- sb_selection_frequency(CoefMatrix, version = "glmnet")
7. Return Frequencies
In practice sb_resample_groups() leaves singleton groups
untouched. Only groups with two or more predictors receive correlated
draws.
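The normalisation, grouping and resampling stages can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package code: normalize_cols(), group_by_threshold() and resample_groups() are hypothetical stand-ins for the sb_*() helpers, the toy data are simulated, and groups are assumed to be either identical or disjoint (which holds for this example).

```r
# Hypothetical base-R stand-ins; these are NOT the SelectBoost.beta functions.
set.seed(1)
n <- 50
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),   # x1 and x2 are strongly correlated
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))                 # x3 is independent of both

# Centre each column and scale it to unit Euclidean norm.
normalize_cols <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))
  sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")
}

# Group j holds every predictor whose absolute correlation with j is >= c0.
group_by_threshold <- function(corr, c0) {
  lapply(seq_len(ncol(corr)), function(j) unname(which(abs(corr[, j]) >= c0)))
}

# Redraw each non-singleton group from a multivariate normal whose covariance
# is the within-group correlation, then re-normalise. Singletons are skipped.
resample_groups <- function(X_norm, groups, B) {
  n <- nrow(X_norm)
  lapply(seq_len(B), function(b) {
    Xb <- X_norm
    for (g in unique(groups)) {
      if (length(g) < 2) next
      Sigma <- crossprod(X_norm[, g])
      R <- chol(Sigma + 1e-8 * diag(length(g)))  # small ridge for stability
      Xb[, g] <- matrix(rnorm(n * length(g)), n) %*% R
    }
    normalize_cols(Xb)
  })
}

X_norm <- normalize_cols(X)
corr <- crossprod(X_norm)   # equals cor(X) for centred, unit-norm columns
groups <- group_by_threshold(corr, c0 = 0.8)
resamples <- resample_groups(X_norm, groups, B = 5)
```

In this sketch the x3 column of every resample matches the normalised original, while x1 and x2 are redrawn jointly so that their mutual correlation is approximately preserved.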
sb_beta() extends the manual workflow by iterating over
a grid of correlation thresholds. The following pseudo-code matches the
behaviour of the exported function.
Algorithm sb_beta(X, Y, selector, B, step.num, steps.seq, version, squeeze):
1. If squeeze, transform Y into the open unit interval.
2. X_norm <- sb_normalize(X)
3. Corr <- sb_compute_corr(X_norm)
4. Grid <- {1} ∪ .sb_c0_sequence(Corr, step.num, steps.seq) ∪ {0}
5. For each c0 in Grid:
   a. Groups <- sb_group_variables(Corr, c0)
   b. If every group has size 1:
        i. CoefMatrix <- selector(X_norm, Y)
      Else:
        i. Resamples <- sb_resample_groups(X_norm, Groups, B)
        ii. For each design b = 1, ..., B in Resamples:
              - CoefMatrix[, b] <- selector(design, Y)
   c. Freq[c0, ] <- sb_selection_frequency(CoefMatrix, version)
6. Attach attributes (B, selector, c0 sequence) and return Freq
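To see why the grid is bracketed by 1 and 0, the sketch below tracks the largest group size as the threshold decreases. The interior grid points are an assumption made for illustration (quantiles of the off-diagonal absolute correlations); the internal .sb_c0_sequence() helper may construct its sequence differently.

```r
# Illustrative only: the interior grid points below are an assumed choice.
set.seed(2)
n <- 40
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.2),
           x2 = z + rnorm(n, sd = 0.2),
           x3 = rnorm(n),
           x4 = rnorm(n))
corr <- cor(X)

# Assumed grid: 1, a few quantiles of the off-diagonal |correlations|, then 0.
abs_off <- abs(corr[upper.tri(corr)])
inner <- sort(unique(quantile(abs_off, probs = c(0.75, 0.5, 0.25),
                              names = FALSE)),
              decreasing = TRUE)
grid <- c(1, inner, 0)

# Largest group size at each threshold. Each predictor always belongs to its
# own group, so the diagonal is forced to TRUE.
max_group_size <- sapply(grid, function(c0) {
  linked <- abs(corr) >= c0
  diag(linked) <- TRUE
  max(colSums(linked))
})
names(max_group_size) <- format(grid, digits = 3)
print(max_group_size)
```

At c0 = 1 every group is a singleton, so the selector runs once on the normalised design; at c0 = 0 all predictors fall into a single group and every column is resampled.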
The selector argument can be any function returning a numeric vector
of coefficients with optional names. When
version = "glmnet", the first entry is interpreted as the
intercept and excluded from the selection frequencies.
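The selector contract and the intercept-first convention can be illustrated with a toy example. Everything here is a hypothetical stand-in: toy_selector() is a thresholded least-squares fit on the logit scale rather than a Beta-regression selector, the perturbed designs merely imitate resamples, and selection_frequency() only mimics the behaviour described for version = "glmnet".

```r
# Toy illustration; none of these functions belong to SelectBoost.beta.
set.seed(3)
n <- 60
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
Y <- plogis(0.5 + 1.5 * X[, "x1"] + rnorm(n, sd = 0.3))  # responses in (0, 1)

# A selector returns a named numeric vector with the intercept first.
# Here: least squares on logit(Y), then small slopes are set to zero.
toy_selector <- function(X, Y, cutoff = 0.5) {
  beta <- lm.fit(cbind(1, X), qlogis(Y))$coefficients
  names(beta) <- c("(Intercept)", colnames(X))
  small <- abs(beta) < cutoff
  small[1] <- FALSE            # never zero out the intercept
  beta[small] <- 0
  beta
}

# One coefficient vector per perturbed design (a stand-in for resamples).
B <- 20
coef_matrix <- sapply(seq_len(B), function(b) {
  X_b <- X + matrix(rnorm(n * 3, sd = 0.1), n, 3)
  toy_selector(X_b, Y)
})

# Rows are coefficients, columns are resamples. With the intercept-first
# convention the first row is dropped before counting non-zero entries.
selection_frequency <- function(coef_matrix, intercept_first = TRUE) {
  if (intercept_first) coef_matrix <- coef_matrix[-1, , drop = FALSE]
  rowMeans(coef_matrix != 0)
}

freq <- selection_frequency(coef_matrix)
print(freq)
```

A selector that does not return an intercept would need the first row kept, which is why the coefficient convention has to be passed along with the coefficient matrix.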
The squeezing step applies the usual SelectBoost transformation that
maps all responses into the open interval (0, 1). Keep it enabled unless
the outcome has already been pre-processed; otherwise responses equal to
zero or one will cause the selectors to abort.
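As an illustration of what such a transformation can look like, the sketch below uses the compression \((y (n - 1) + 0.5) / n\) that is common in the Beta-regression literature. Whether sb_beta() uses exactly this formula is an assumption; squeeze_unit() is a hypothetical helper name.

```r
# Assumed form of the squeeze: (y * (n - 1) + 0.5) / n, with n = length(y).
# The package may use a different compression.
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y <- c(0, 0.2, 0.5, 1)      # contains boundary values 0 and 1
y_sq <- squeeze_unit(y)

# Boundary values break the logit link; squeezed values do not.
print(qlogis(y))            # -Inf and Inf at the boundaries
print(qlogis(y_sq))         # all finite
```

The transformation shrinks the data slightly towards 0.5, so applying it a second time to an already pre-processed outcome would distort the responses more than necessary.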
The modular helpers are designed to be recomposed. For example, it is
possible to plug in a custom grouping routine before calling
sb_resample_groups() or to supply a selector that
implements cross-validation or penalisation strategies. Because each
helper only relies on basic R primitives, the pseudo-code above
translates readily into other languages.
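As one example of recomposition, a custom grouping routine could be built on hierarchical clustering. The sketch below is hypothetical: hclust_groups() is not part of the package, and it assumes that groups are represented as a list with one integer index vector per predictor, which should be checked against the format that sb_resample_groups() actually expects.

```r
# Hypothetical custom grouping routine based on stats::hclust().
set.seed(4)
n <- 50
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.1),
           x2 = z + rnorm(n, sd = 0.1),
           x3 = rnorm(n))
corr <- cor(X)

# Complete-linkage clustering on the dissimilarity 1 - |corr|, cut at height
# 1 - c0, so that all members of a cluster are correlated at least c0 in
# absolute value. Returns one index vector per predictor (assumed format).
hclust_groups <- function(corr, c0) {
  tree <- hclust(as.dist(1 - abs(corr)), method = "complete")
  membership <- cutree(tree, h = 1 - c0)
  lapply(membership, function(k) unname(which(membership == k)))
}

groups <- hclust_groups(corr, c0 = 0.8)
str(groups)
```

Unlike simple thresholding of the correlation matrix, clustering always yields groups that are either identical or disjoint, which can simplify joint resampling.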
The SelectBoost4Beta concepts described here were showcased by Frédéric Bertrand and Myriam Maumy in 2023 at:
These communications detailed how correlation-aware resampling strengthens variable selection performance for Beta regression under strong predictor dependencies.