The new sb_beta() helper glues the beta-regression
selectors provided by this package to a SelectBoost-style
correlated-resampling loop implemented directly in
SelectBoost.beta. It takes care of squeezing the response
inside the open unit interval (unless squeeze = FALSE) and
tagging the output with the selector that was used.
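As an illustration of what squeezing a response into the open unit interval can look like, here is a minimal base R sketch. It uses a common compression, (y * (n - 1) + 0.5) / n; the exact transformation applied by sb_beta() is not stated here, so this is an assumption, and squeeze_unit() is a hypothetical helper name rather than a package function.

```r
# Hypothetical helper: compress values in [0, 1] into the open interval (0, 1).
# This is one common choice, not necessarily the one used by sb_beta().
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y <- c(0, 0.25, 1)          # boundary values 0 and 1 are not valid for beta regression
y_sq <- squeeze_unit(y)
range(y_sq)                 # strictly inside (0, 1)
```

The compression shrinks as n grows, so large samples are left almost unchanged.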
This vignette walks through two complementary perspectives:
1. A manual run of the resampling loop with betareg_step_aic() to highlight where correlated resampling happens.
2. A call to sb_beta() to obtain the same result with a single function call.
Throughout the examples we rely on the built-in simulator to generate correlated design matrices with a handful of truly associated predictors.
sim <- simulation_DATA.beta(
n = 150, p = 6, s = 3, beta_size = c(1, -0.8, 0.6),
corr = "ar1", rho = 0.25,
mechanism = "jitter"
)
str(sim$X)
#> num [1:150, 1:6] -0.123 1.635 1.428 -0.508 -0.243 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : chr [1:6] "x1" "x2" "x3" "x4" ...
summary(sim$Y)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.04934 0.25725 0.49003 0.48969 0.70233 0.99998
The classic SelectBoost algorithm first normalises the design matrix,
computes pairwise correlations, groups variables above a chosen
threshold and finally resamples the predictors before applying the
selector. All of those stages are available directly in
SelectBoost.beta.
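For reference, the first two stages can be sketched in base R. The sketch assumes that normalising means centring each column and scaling it to unit L2 norm; normalize_l2() is a hypothetical helper, and the package functions may differ in detail.

```r
set.seed(42)
X <- matrix(rnorm(50 * 3), nrow = 50,
            dimnames = list(NULL, c("x1", "x2", "x3")))

# Hypothetical helper: centre each column, then scale it to unit L2 norm
normalize_l2 <- function(X) {
  Xc <- sweep(X, 2, colMeans(X), "-")
  sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")
}

X_norm <- normalize_l2(X)
colSums(X_norm^2)                        # every column has squared norm 1

# With this scaling, the cross-product is the Pearson correlation matrix
all.equal(crossprod(X_norm), cor(X))
```

Under this normalisation the pairwise correlations reduce to a single matrix product, which is why the two stages pair naturally.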
# Normalise the predictors (centre + L2 scale)
X_norm <- sb_normalize(sim$X)
# Compute correlations
corr_mat <- sb_compute_corr(X_norm)
# Group variables whose absolute correlation exceeds 0.6
raw_groups <- sb_group_variables(corr_mat, c0 = 0.6)
# Draw eight correlated replicas for the grouped variables
X_draws <- sb_resample_groups(X_norm, raw_groups, B = 8, seed = 11)
#> Warning: All groups are singletons; correlated resampling degenerates to
#> repeated `X_norm`.
dim(X_draws[[1]])
#> [1] 150 6
Each element of X_draws stores a correlated copy of the
normalised design. Feeding these matrices to
sb_apply_selector_manual() together with a beta-regression
selector yields coefficient estimates for every resampled data set.
coef_path <- sb_apply_selector_manual(
X_norm, X_draws, sim$Y, selector = betareg_step_aic
)
dim(coef_path)
#> [1] 8 9
coef_path[, 1:3]
#> sim0 sim1 sim2
#> (Intercept) -0.03588528 -0.03588528 -0.03588528
#> x1 11.34931343 11.34931343 11.34931343
#> x2 -8.95724666 -8.95724666 -8.95724666
#> x3 7.17554325 7.17554325 7.17554325
#> x4 0.00000000 0.00000000 0.00000000
#> x5 0.87055660 0.87055660 0.87055660
#> x6 0.00000000 0.00000000 0.00000000
#> phi|(Intercept) 2.95165950 2.95165950 2.95165950
The leading column sim0 records the coefficients fitted
on the original normalised design, providing a convenient baseline
against which the resampled paths can be compared.
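The identical sim0, sim1 and sim2 columns are consistent with the earlier warning that all groups are singletons. A minimal base R sketch of why, assuming that corr = "ar1" denotes the usual AR(1) structure with correlation rho^|i - j| and that grouping collects the variables whose absolute correlation reaches c0. Here group_by_threshold() is a hypothetical helper, not a package function, and sample correlations will deviate somewhat from these population values.

```r
# Population AR(1) correlation matrix (assumed meaning of corr = "ar1")
p <- 6
rho <- 0.25
R <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
dimnames(R) <- list(paste0("x", seq_len(p)), paste0("x", seq_len(p)))

# Hypothetical helper: the group of variable j holds every variable whose
# absolute correlation with j is at least c0 (j itself always qualifies)
group_by_threshold <- function(R, c0) {
  lapply(seq_len(ncol(R)), function(j) which(abs(R[, j]) >= c0))
}

groups <- group_by_threshold(R, c0 = 0.6)
max(abs(R[upper.tri(R)]))    # 0.25, well below the 0.6 threshold
all(lengths(groups) == 1L)   # TRUE: every group is a singleton
```

With no variable paired to another, there is nothing to resample jointly, so each draw reproduces the normalised design and the selector returns the same coefficients every time.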
Finally, the sb_selection_frequency() helper counts how
often each variable appears with a non-zero coefficient across the
replicates. Because betareg_step_aic() returns a
glmnet-style coefficient vector (intercept plus
predictors), we set version = "glmnet" when computing the
selection frequencies.
sel_freq <- sb_selection_frequency(coef_path, version = "glmnet")
sel_freq
#> x1 x2 x3 x4 x5
#> 1 1 1 0 1
#> x6 phi|(Intercept)
#> 0 1
This manual exercise confirms that the correlated resampling loop
from the original SelectBoost package plugs seamlessly into the beta
selectors shipped in SelectBoost.beta.
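The counting step can be mimicked in base R. The sketch below assumes a coefficient matrix with one row per term and one column per replicate, and takes the selection frequency to be the share of replicates in which a term has a non-zero coefficient, with the mean intercept excluded. selection_frequency() is a hypothetical stand-in, not the package implementation.

```r
# Toy coefficient matrix: rows are terms, columns are replicates
coef_toy <- cbind(
  sim1 = c(0.1, 1.2, 0.0, -0.7),
  sim2 = c(0.2, 1.1, 0.0,  0.0),
  sim3 = c(0.1, 1.3, 0.4, -0.6),
  sim4 = c(0.0, 1.0, 0.0, -0.8)
)
rownames(coef_toy) <- c("(Intercept)", "x1", "x2", "x3")

# Hypothetical helper: proportion of replicates with a non-zero coefficient
selection_frequency <- function(coefs) {
  keep <- rownames(coefs) != "(Intercept)"
  rowMeans(coefs[keep, , drop = FALSE] != 0)
}

freq <- selection_frequency(coef_toy)
freq
#>   x1   x2   x3
#> 1.00 0.25 0.75
```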
sb_beta()
The sb_beta() wrapper performs the same steps internally
while exposing the arguments most relevant to beta regression. By
default it uses betareg_step_aic() as the base selector,
but any of the exported functions ("betareg_step_bic",
betareg_glmnet, etc.) can be passed either by name or as a
function.
sb <- sb_beta(
sim$X, sim$Y,
B = 60,
step.num = 0.5,
steps.seq = c(0.9, 0.7, 0.5)
)
class(sb)
#> [1] "sb_beta" "matrix" "array"
attr(sb, "selector")
#> [1] "betareg_step_aic"
rownames(sb)
#> [1] "c0 = 1.000" "c0 = 0.900" "c0 = 0.700" "c0 = 0.500" "c0 = 0.000"
round(sb, 3)
#> SelectBoost beta selection frequencies
#> Selector: betareg_step_aic
#> Resamples per threshold: 60
#> Interval mode: none
#> c0 grid: 1.0, 0.9, 0.7, 0.5, 0.0
#> Inner thresholds: 0.9, 0.7, 0.5
#> x1 x2 x3 x4 x5 x6 phi|(Intercept)
#> c0 = 1.000 1.0 1.000 1.0 0.00 1.0 0.0 1
#> c0 = 0.900 1.0 1.000 1.0 0.00 1.0 0.0 1
#> c0 = 0.700 1.0 1.000 1.0 0.00 1.0 0.0 1
#> c0 = 0.500 1.0 1.000 1.0 0.00 1.0 0.0 1
#> c0 = 0.000 0.2 0.167 0.2 0.25 0.2 0.2 1
#> attr(,"c0.seq")
#> [1] 1.0 0.9 0.7 0.5 0.0
#> attr(,"steps.seq")
#> [1] 0.9 0.7 0.5
#> attr(,"B")
#> [1] 60
#> attr(,"selector")
#> [1] "betareg_step_aic"
#> attr(,"resample_diagnostics")
#> attr(,"resample_diagnostics")$`c0 = 1.000`
#> [1] group size regenerated
#> [4] cached mean_abs_corr_orig mean_abs_corr_surrogate
#> [7] mean_abs_corr_cross
#> <0 rows> (or 0-length row.names)
#>
#> attr(,"resample_diagnostics")$`c0 = 0.900`
#> [1] group size regenerated
#> [4] cached mean_abs_corr_orig mean_abs_corr_surrogate
#> [7] mean_abs_corr_cross
#> <0 rows> (or 0-length row.names)
#>
#> attr(,"resample_diagnostics")$`c0 = 0.700`
#> [1] group size regenerated
#> [4] cached mean_abs_corr_orig mean_abs_corr_surrogate
#> [7] mean_abs_corr_cross
#> <0 rows> (or 0-length row.names)
#>
#> attr(,"resample_diagnostics")$`c0 = 0.500`
#> [1] group size regenerated
#> [4] cached mean_abs_corr_orig mean_abs_corr_surrogate
#> [7] mean_abs_corr_cross
#> <0 rows> (or 0-length row.names)
#>
#> attr(,"resample_diagnostics")$`c0 = 0.000`
#> group size regenerated cached mean_abs_corr_orig
#> 1 x1,x2,x3,x4,x5,x6 6 60 FALSE 0.1189262
#> mean_abs_corr_surrogate mean_abs_corr_cross
#> 1 0.1375874 0.06768812
#>
#> attr(,"interval")
#> [1] "none"
The resulting matrix comes with several attributes that document how
the frequencies were generated. attr(sb, "c0.seq") returns
the correlation threshold grid, attr(sb, "B") stores the
number of correlated resamples per threshold,
attr(sb, "interval") highlights whether interval sampling
was activated, and attr(sb, "resample_diagnostics") keeps
summary statistics on the cached surrogate draws. These metadata mirror
the legacy SelectBoost beta implementation and are now documented in
?sb_beta().
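To show how such a matrix and its attributes can be used downstream, the base R sketch below rebuilds the predictor columns of the printed frequencies by hand, so it runs without the package. The 0.9 cut-off is a hypothetical choice for illustration, not a package default.

```r
# Selection frequencies copied from the printed output above
# (the phi|(Intercept) column is omitted for brevity)
freq <- rbind(
  "c0 = 1.000" = c(1.0, 1.000, 1.0, 0.00, 1.0, 0.0),
  "c0 = 0.900" = c(1.0, 1.000, 1.0, 0.00, 1.0, 0.0),
  "c0 = 0.700" = c(1.0, 1.000, 1.0, 0.00, 1.0, 0.0),
  "c0 = 0.500" = c(1.0, 1.000, 1.0, 0.00, 1.0, 0.0),
  "c0 = 0.000" = c(0.2, 0.167, 0.2, 0.25, 0.2, 0.2)
)
colnames(freq) <- paste0("x", 1:6)
attr(freq, "c0.seq") <- c(1, 0.9, 0.7, 0.5, 0)
attr(freq, "B") <- 60

# Metadata is read back with attr()
attr(freq, "c0.seq")

# Hypothetical rule: keep variables selected in at least 90% of the
# resamples at the c0 = 0.5 threshold
stable <- colnames(freq)[freq["c0 = 0.500", ] >= 0.9]
stable
#> [1] "x1" "x2" "x3" "x5"
```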
Changing the selector is simply a matter of passing a different
routine. The call below switches to the elastic-net selector
betareg_glmnet() and asks sb_beta() to pass choose = "bic"
through to it.
sb_enet <- sb_beta(
sim$X, sim$Y,
selector = betareg_glmnet,
B = 60,
step.num = 0.5,
version = "glmnet",
choose = "bic",
prestandardize = TRUE
)
attr(sb_enet, "selector")
#> [1] "betareg_glmnet"
colMeans(sb_enet)
#> x1 x2 x3 x4 x5 x6
#> 0.35000000 0.33888889 0.33333333 0.34444444 0.34444444 0.01666667
Because the wrapper always builds on the same correlated resamples,
results are directly comparable across selectors as long as they adopt
the glmnet-style coefficient convention. This makes it
straightforward to compare several beta selectors under the exact
same resampled design matrices. Stability analyses for interval
responses follow the same pattern, using the convenience wrapper
sb_beta_interval() or the lower-level fastboost_interval().
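The idea of comparing selectors on a shared set of resampled designs can be sketched in base R. Everything below is a toy stand-in: small Gaussian perturbations replace correlated resampling, and sel_cor() and sel_lm() are hypothetical selectors, not beta-regression routines from the package.

```r
set.seed(1)
n <- 100
p <- 4
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- plogis(1.5 * X[, 1] - X[, 2] + rnorm(n, sd = 0.5))  # response in (0, 1)

# One shared list of perturbed designs, generated once and reused
draws <- lapply(1:20, function(b) X + matrix(rnorm(n * p, sd = 0.1), n, p))

# Two toy selectors, each returning a logical "selected" vector
sel_cor <- function(X, y) abs(cor(X, y))[, 1] > 0.3
sel_lm <- function(X, y) {
  summary(lm(qlogis(y) ~ X))$coefficients[-1, 4] < 0.05
}

# Selection frequency per variable (rows) and selector (columns)
freq <- sapply(list(cor = sel_cor, lm = sel_lm), function(sel) {
  rowMeans(sapply(draws, sel, y = y))
})
freq
```

Because both selectors see the same draws, differences between the two columns reflect the selectors rather than resampling noise.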
The SelectBoost4Beta workflow and its correlated resampling foundations were presented by Frédéric Bertrand and Myriam Maumy in 2023 at two conferences:
Both communications emphasised how leveraging correlation-aware resampling improves the recall and precision of variable selection in high-dimensional Beta regression settings.