
SelectBoost.beta algorithms

Frédéric Bertrand

Cedric, Cnam, Paris
frederic.bertrand@lecnam.net

2025-11-04

Motivation

SelectBoost.beta re-uses the correlated-resampling machinery introduced by the original SelectBoost package and combines it with Beta-regression selectors. This vignette summarises the main routines and presents pseudo-code for their internal logic. The goal is to make it easy to re-implement or extend the algorithms in other contexts.

Building blocks

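The following helpers expose the canonical SelectBoost stages:

- sb_normalize(): normalises the design matrix X;
- sb_compute_corr(): computes the correlation matrix of the normalised predictors;
- sb_group_variables(): builds groups of correlated predictors at a threshold c0;
- sb_resample_groups(): generates B correlated resamples of the design, group by group;
- sb_apply_selector_manual(): applies a selector to every resampled design and collects the coefficients;
- sb_selection_frequency(): turns the coefficient matrix into selection frequencies.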
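
The first two stages can be sketched in a few lines of dependency-free Python. This sketch assumes that sb_normalize() centres each column and scales it to unit Euclidean norm (as the original SelectBoost normalisation does, to our understanding) and that sb_compute_corr() returns Pearson correlations; the names normalize and correlation are hypothetical, and the design is stored as a list of columns rather than an R matrix.

```python
import math

def normalize(columns):
    """Centre each column and scale it to unit Euclidean norm
    (assumed behaviour of sb_normalize(); constant columns stay at zero)."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        centred = [v - mean for v in col]
        norm = math.sqrt(sum(v * v for v in centred))
        out.append([v / norm for v in centred] if norm > 0 else centred)
    return out

def correlation(columns_norm):
    """Pearson correlation matrix. For centred, unit-norm columns this
    reduces to plain inner products between columns."""
    p = len(columns_norm)
    return [[sum(a * b for a, b in zip(columns_norm[i], columns_norm[j]))
             for j in range(p)] for i in range(p)]

X = [[1.0, 2.0, 3.0, 4.0],   # x1
     [2.0, 4.1, 5.9, 8.2],   # x2, nearly proportional to x1
     [4.0, 1.0, 3.0, 2.0]]   # x3
Xn = normalize(X)
C = correlation(Xn)
print([round(v, 3) for v in C[0]])  # -> [1.0, 0.999, -0.4]
```

Working on unit-norm columns keeps the correlation step to a single inner product per pair, which is why the normalisation comes first.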

Pseudo-code: manual workflow

The manual SelectBoost workflow follows the same steps regardless of the base selector. Pseudo-code for producing selection frequencies at a single threshold is given below.

Procedure ManualSelectBoost(X, Y, selector, c0, B):
  1. X_norm <- sb_normalize(X)
  2. Corr <- sb_compute_corr(X_norm)
  3. Groups <- sb_group_variables(Corr, c0)
  4. Resamples <- sb_resample_groups(X_norm, Groups, B)
  5. CoefMatrix <- sb_apply_selector_manual(X_norm, Resamples, Y, selector)
  6. Frequencies <- sb_selection_frequency(CoefMatrix, version = "glmnet")
  7. Return Frequencies
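
The grouping step (step 3) can be sketched as follows. We assume, mirroring the original SelectBoost package as we understand it, that every predictor j gets its own group made of all predictors whose absolute correlation with j is at least c0, j itself included; group_variables is a hypothetical name, not the package API.

```python
def group_variables(corr, c0):
    """For each predictor j, list the indices k with |corr[j][k]| >= c0.
    Groups overlap: each predictor has its own neighbourhood."""
    p = len(corr)
    return [[k for k in range(p) if abs(corr[j][k]) >= c0] for j in range(p)]

corr = [[1.0, 0.9, -0.2],
        [0.9, 1.0, 0.1],
        [-0.2, 0.1, 1.0]]
print(group_variables(corr, 0.5))  # -> [[0, 1], [0, 1], [2]]
```

In this sketch, c0 = 1 leaves every predictor alone in its group (unless two columns are exactly collinear), while c0 = 0 puts all predictors in every group.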

In practice, sb_resample_groups() leaves singleton groups untouched: only groups with two or more predictors receive correlated draws.
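
The following Python sketch illustrates the singleton rule. Its perturbation step is deliberately simple and swapped in for illustration only: each grouped column is regressed on the sign-aligned mean of its group mates and rebuilt from the fitted values plus permuted residuals. SelectBoost.beta uses its own correlated draws, so resample_groups and its internals are hypothetical; only the treatment of singletons reflects the text above.

```python
import math
import random

def _center_scale(col):
    """Centre a column and scale it to unit Euclidean norm."""
    mean = sum(col) / len(col)
    c = [v - mean for v in col]
    norm = math.sqrt(sum(v * v for v in c))
    return [v / norm for v in c] if norm > 0 else c

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def resample_groups(columns, groups, B, seed=0):
    """Return B perturbed copies of a normalised design (list of columns).

    Singleton groups are copied unchanged. A column whose group has two or
    more members is replaced by fitted values plus permuted residuals from a
    regression on its group mates (a stand-in, NOT SelectBoost.beta's draw).
    """
    rng = random.Random(seed)
    n = len(columns[0])
    designs = []
    for _ in range(B):
        new_cols = []
        for j, col in enumerate(columns):
            mates = [k for k in groups[j] if k != j]
            if not mates:  # singleton group: leave the column untouched
                new_cols.append(list(col))
                continue
            # Average the group mates after flipping negatively correlated ones.
            signs = [1.0 if _dot(col, columns[k]) >= 0 else -1.0 for k in mates]
            z = [sum(s * columns[k][i] for s, k in zip(signs, mates)) / len(mates)
                 for i in range(n)]
            # Columns are centred, so the regression needs no intercept.
            zz = _dot(z, z)
            slope = _dot(col, z) / zz if zz > 0 else 0.0
            fitted = [slope * v for v in z]
            resid = [c - f for c, f in zip(col, fitted)]
            rng.shuffle(resid)  # break the residual pairing, keep the fitted part
            new_cols.append(_center_scale([f + r for f, r in zip(fitted, resid)]))
        designs.append(new_cols)
    return designs

X = [[-1.5, -0.5, 0.5, 1.5], [-1.4, -0.6, 0.4, 1.6], [1.5, -1.5, 0.5, -0.5]]
Xn = [_center_scale(c) for c in X]
draws = resample_groups(Xn, [[0, 1], [0, 1], [2]], B=3)
print(len(draws), draws[0][2] == Xn[2])  # -> 3 True
```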

Pseudo-code: correlation grid driver

sb_beta() extends the manual workflow by iterating over a grid of correlation thresholds. The following pseudo-code matches the behaviour of the exported function.

Algorithm sb_beta(X, Y, selector, B, step.num, steps.seq, version, squeeze):
  1. If squeeze, transform Y into the open unit interval.
  2. X_norm <- sb_normalize(X)
  3. Corr <- sb_compute_corr(X_norm)
  4. Grid <- {1} ∪ .sb_c0_sequence(Corr, step.num, steps.seq) ∪ {0}
  5. For each c0 in Grid:
       a. Groups <- sb_group_variables(Corr, c0)
       b. If every group has size 1:
            i. CoefMatrix <- selector(X_norm, Y)
          Else:
            i. Resamples <- sb_resample_groups(X_norm, Groups, B)
           ii. For b = 1, ..., B:
                   - CoefMatrix[, b] <- selector(Resamples[[b]], Y)
       c. Freq[c0, ] <- sb_selection_frequency(CoefMatrix, version)
  6. Attach attributes (B, selector, c0 sequence) and return Freq
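
Step 4 builds the threshold grid. The rule used by .sb_c0_sequence() is not spelled out here, so the sketch below assumes one plausible choice: interior thresholds are empirical quantiles (R's default type 7 interpolation) of the off-diagonal absolute correlations, taken at the probabilities in steps_seq or, if none are given, at step_num equally spaced interior probabilities. c0_grid and its arguments are hypothetical.

```python
import math

def _quantile(sorted_vals, prob):
    """Linear-interpolation quantile, equivalent to R's default (type 7).
    prob must lie in [0, 1]."""
    h = (len(sorted_vals) - 1) * prob
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])

def c0_grid(corr, step_num=10, steps_seq=None):
    """Return a decreasing threshold grid of the form {1} U interior U {0}."""
    p = len(corr)
    # Absolute correlations above the diagonal (each pair counted once).
    abs_corr = sorted(abs(corr[i][j]) for i in range(p) for j in range(i + 1, p))
    if not abs_corr:  # a single predictor: nothing to interpolate
        return [1.0, 0.0]
    if steps_seq is None:
        steps_seq = [k / (step_num + 1) for k in range(1, step_num + 1)]
    interior = {_quantile(abs_corr, q) for q in steps_seq}
    # Keep strictly interior values so that 1 and 0 are not duplicated.
    interior = sorted((v for v in interior if 0.0 < v < 1.0), reverse=True)
    return [1.0] + interior + [0.0]

corr = [[1.0, 0.9, -0.2],
        [0.9, 1.0, 0.1],
        [-0.2, 0.1, 1.0]]
print(c0_grid(corr, step_num=1))  # -> [1.0, 0.2, 0.0]
```

Sorting the grid in decreasing order means the driver starts from the most permissive setting (every predictor alone) and moves towards larger groups.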

The selector argument can be any function returning a numeric vector of coefficients with optional names. When version = "glmnet", the first entry is interpreted as the intercept and excluded from the selection frequencies.
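
A minimal sketch of the frequency computation, assuming a coefficient counts as selected when it is non-zero and that versions other than "glmnet" keep every entry. Each inner list plays the role of one column of CoefMatrix, i.e. the output of one selector call; selection_frequency is a hypothetical name.

```python
def selection_frequency(coef_vectors, version="glmnet"):
    """Share of resamples in which each coefficient is non-zero.
    With version == "glmnet" the first entry (intercept) is dropped."""
    if not coef_vectors:
        raise ValueError("need at least one coefficient vector")
    start = 1 if version == "glmnet" else 0
    B = len(coef_vectors)
    p = len(coef_vectors[0])
    return [sum(1 for v in coef_vectors if v[j] != 0.0) / B
            for j in range(start, p)]

coefs = [[0.3, 1.2, 0.0, -0.4],   # intercept first, then three predictors
         [0.1, 0.9, 0.0, 0.0],
         [0.2, 1.1, 0.5, 0.0],
         [0.4, 0.0, 0.0, -0.1]]
print(selection_frequency(coefs))  # -> [0.75, 0.25, 0.5]
```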

The squeezing step applies the usual SelectBoost transformation, which pushes all responses strictly inside (0, 1). Keep it enabled unless the outcome has already been pre-processed: with squeeze disabled, any response equal to 0 or 1 causes the selectors to abort.
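
The sketch below shows what such a squeeze looks like. It uses the compression y' = (y(n - 1) + 0.5)/n proposed by Smithson and Verkuilen (2006), which is common in Beta regression; we have not confirmed that this is the exact formula applied by sb_beta(), so treat squeeze as a hypothetical helper.

```python
def squeeze(y):
    """Map responses in [0, 1] into the open interval (0, 1)
    using y' = (y * (n - 1) + 0.5) / n."""
    n = len(y)
    if any(v < 0 or v > 1 for v in y):
        raise ValueError("responses must lie in [0, 1]")
    return [(v * (n - 1) + 0.5) / n for v in y]

print(squeeze([0.0, 0.25, 0.5, 1.0]))  # -> [0.125, 0.3125, 0.5, 0.875]
```

The shrinkage towards 0.5 fades as n grows, so for large samples the squeezed responses stay close to the originals.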

Extending the algorithms

The modular helpers are designed to be recombined. For example, you can plug in a custom grouping routine before calling sb_resample_groups(), or supply a selector that implements cross-validation or a penalisation strategy. Because each helper relies only on basic R primitives, the pseudo-code above translates readily into other languages.
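
As an illustration of such recombination, the Python sketch below defines two hypothetical drop-in pieces. topk_groups replaces threshold grouping by "each predictor plus its k most correlated partners", and screening_selector follows the selector contract described earlier (intercept first, then one coefficient per predictor, zero when not selected). The selector is marginal screening on the logit scale, not a Beta-regression fit, and assumes centred, unit-norm columns and responses strictly inside (0, 1).

```python
import math

def topk_groups(corr, k):
    """Custom grouping: each predictor is grouped with itself and its k most
    correlated partners (by absolute correlation), whatever their level."""
    p = len(corr)
    groups = []
    for j in range(p):
        others = sorted((i for i in range(p) if i != j),
                        key=lambda i: -abs(corr[j][i]))
        groups.append(sorted([j] + others[:k]))
    return groups

def screening_selector(columns, y, threshold=0.5):
    """Toy selector returning [intercept, coef_1, ..., coef_p].
    Each coefficient is the univariate slope of centred logit(y) on a
    centred, unit-norm column; slopes below threshold are set to 0."""
    z = [math.log(v / (1.0 - v)) for v in y]
    mean_z = sum(z) / len(z)
    zc = [v - mean_z for v in z]
    coefs = []
    for col in columns:
        slope = sum(a * b for a, b in zip(col, zc))  # unit norm: no division
        coefs.append(slope if abs(slope) >= threshold else 0.0)
    return [mean_z] + coefs

corr = [[1.0, 0.9, -0.2],
        [0.9, 1.0, 0.1],
        [-0.2, 0.1, 1.0]]
print(topk_groups(corr, 1))  # -> [[0, 1], [0, 1], [0, 2]]

cols = [[-0.5, -0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
print(screening_selector(cols, [0.2, 0.3, 0.7, 0.8], threshold=1.0))
```

Unlike threshold grouping, the top-k rule always gives every predictor at least one partner (here x3 is paired with x1 despite |r| = 0.2), so every column would be resampled.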

Conference communications

The SelectBoost4Beta concepts described here were showcased by Frédéric Bertrand and Myriam Maumy in 2023 at:

These communications detailed how correlation-aware resampling strengthens variable selection performance for Beta regression under strong predictor dependencies.
