| Type: | Package |
| Title: | Stability-Selection via Correlated Resampling for Beta-Regression Models |
| Version: | 0.4.5 |
| Date: | 2025-11-04 |
| Depends: | R (≥ 4.0) |
| Imports: | betareg, gamlss, gamlss.dist, glmnet, MASS, Rcpp, rlang, stats, withr |
| Suggests: | future, future.apply, gamlss.lasso, ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0) |
| LinkingTo: | Rcpp, RcppArmadillo |
| Author: | Frederic Bertrand |
| Maintainer: | Frederic Bertrand <frederic.bertrand@lecnam.net> |
| Description: | Adds variable-selection functions for Beta regression models (both mean and phi submodels) so they can be used within the 'SelectBoost' algorithm. Includes stepwise AIC, BIC, and corrected AIC on betareg() fits, 'gamlss'-based LASSO/Elastic-Net, a pure 'glmnet' iterative re-weighted least squares-based selector with an optional standardization speedup, and 'C++' helpers for iterative re-weighted least squares working steps and precision updates. Also provides a fastboost_interval() variant for interval responses, comparison helpers, and a flexible simulator simulation_DATA.beta() for interval-valued data. For more details see Bertrand and Maumy (2023) <doi:10.7490/f1000research.1119552.1>. |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| Classification/MSC: | 62H11, 62J12, 62J99 |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.3.3 |
| URL: | https://fbertran.github.io/SelectBoost.beta/, https://github.com/fbertran/SelectBoost.beta/ |
| BugReports: | https://github.com/fbertran/SelectBoost.beta/issues/ |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | yes |
| Packaged: | 2025-11-04 09:49:20 UTC; bertran7 |
| Repository: | CRAN |
| Date/Publication: | 2025-11-11 09:20:09 UTC |
SelectBoost.beta: Stability-Selection via Correlated Resampling for Beta-Regression Models
Description
Adds variable-selection functions for Beta regression models (both mean and phi submodels) so they can be used within the 'SelectBoost' algorithm. Includes stepwise AIC, BIC, and corrected AIC on betareg() fits, 'gamlss'-based LASSO/Elastic-Net, a pure 'glmnet' iterative re-weighted least squares-based selector with an optional standardization speedup, and 'C++' helpers for iterative re-weighted least squares working steps and precision updates. Also provides a fastboost_interval() variant for interval responses, comparison helpers, and a flexible simulator simulation_DATA.beta() for interval-valued data. For more details see Bertrand and Maumy (2023) doi:10.7490/f1000research.1119552.1.
Author(s)
This package was written by Frédéric Bertrand. Maintainer: Frédéric Bertrand <frederic.bertrand@lecnam.net>
References
Bertrand, F. and Maumy, M. (2023). Improving variable selection in Beta regression models using correlated resampling. Presented at Joint Statistical Meetings 2023, Toronto, Canada.
Bertrand, F. and Maumy, M. (2023). SelectBoost4Beta: Improving variable selection in Beta regression models. Presented at BioC2023, Boston, USA.
See Also
Useful links:
Report bugs at https://github.com/fbertran/SelectBoost.beta/issues/
Examples
set.seed(1)
n <- 150; p <- 6
X <- matrix(rnorm(n*p), n, p); colnames(X) <- paste0("x",1:p)
eta <- 0.4 + X[,1] - 0.7*X[,3]
mu <- plogis(eta)
Y <- rbeta(n, mu*25, (1-mu)*25)
betareg_step_aic(X, Y) # typically returns nonzero coefficients for (Intercept), x1, and x3
betareg_step_bic(X, Y)
betareg_step_aicc(X, Y)
Beta regression Elastic-Net via GAMLSS (gamlss.lasso)
Description
Uses gamlss.lasso::gnet() to fit ENet on the mean submodel of
gamlss(dist = BE). The routine assumes complete cases and does not expose
offsets or precision-model terms.
Usage
betareg_enet_gamlss(
X,
Y,
method = c("IC", "CV"),
ICpen = c("BIC", "AIC", "HQC"),
alpha = 1,
trace = FALSE
)
Arguments
X |
Numeric matrix (n × p) of mean-submodel predictors. |
Y |
Numeric response in (0,1). Values are squeezed to (0,1) internally. |
method |
One of "IC" or "CV" (see Usage). |
ICpen |
Penalty for method = "IC": one of "BIC", "AIC", or "HQC". |
alpha |
Elastic-net mixing (1 = LASSO, 0 = ridge). |
trace |
Logical; print stepwise trace. |
Value
Named numeric vector of coefficients as in betareg_lasso_gamlss().
See Also
gamlss.lasso::gnet(), gamlss::gamlss(), gamlss.dist::BE()
Pure glmnet IRLS selector for Beta regression
Description
Runs an IRLS loop with Beta working responses/weights and calls
glmnet on the weighted least-squares surrogate. Supports BIC/AIC/CV
model choice and an optional prestandardize speedup. The helper uses only
the mean submodel, requires complete cases, and does not expose offset terms.
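The working responses and weights mentioned above can be sketched in base R. This is a minimal illustration, assuming a logit link, a fixed precision phi, and the usual Fisher-scoring quantities for Beta regression; plain weighted least squares (stats::lm.wfit) stands in for the penalised glmnet call, so it is not the package's implementation.

```r
# Fisher-scoring (IRLS) loop for the mean submodel of a Beta regression.
# Assumptions: logit link, fixed precision phi, unpenalised weighted least squares.
set.seed(1)
n <- 200
X <- cbind(1, matrix(rnorm(n * 2), n, 2))    # intercept + 2 predictors
beta_true <- c(0.3, 1, -0.5)
phi <- 20
mu_true <- plogis(drop(X %*% beta_true))
y <- rbeta(n, mu_true * phi, (1 - mu_true) * phi)

beta <- rep(0, ncol(X))
for (iter in 1:50) {
  eta <- drop(X %*% beta)
  mu <- plogis(eta)
  dmu <- mu * (1 - mu)                                    # d mu / d eta (logit link)
  ystar <- qlogis(y)                                      # log(y / (1 - y))
  mustar <- digamma(mu * phi) - digamma((1 - mu) * phi)   # E[ystar]
  v <- trigamma(mu * phi) + trigamma((1 - mu) * phi)      # Var[ystar]
  w <- phi * v * dmu^2                                    # working weights
  z <- eta + (ystar - mustar) / (phi * v * dmu)           # working response
  beta_new <- lm.wfit(X, z, w)$coefficients               # weighted LS surrogate
  converged <- sqrt(sum((beta_new - beta)^2)) < 1e-8      # Euclidean change in beta
  beta <- beta_new
  if (converged) break
}
round(unname(beta), 3)
```

In the selector itself the weighted least-squares step would be replaced by a glmnet fit with the same working response and weights, which is what introduces the sparsity.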
Usage
betareg_glmnet(
X,
Y,
alpha = 1,
choose = c("bic", "aic", "cv"),
nfolds = 5,
n_iter = 6,
tol = 1e-05,
standardize = TRUE,
lambda = NULL,
phi_init = 20,
update_phi = TRUE,
phi_maxit = 5,
prestandardize = FALSE,
trace = FALSE
)
Arguments
X |
Numeric matrix (n × p) of mean-submodel predictors. |
Y |
Numeric response in (0,1). Values are squeezed to (0,1) internally. |
alpha |
Elastic-net mixing parameter. |
choose |
One of "bic", "aic", or "cv". |
nfolds |
Folds for CV when choose = "cv". |
n_iter |
Max IRLS iterations; |
tol |
Convergence tolerance for the IRLS parameter change (Euclidean norm
of the difference in |
standardize |
Forwarded to |
lambda |
Optional fixed lambda; if |
phi_init |
Initial precision (phi). |
update_phi |
Logical; update phi inside the IRLS loop. |
phi_maxit |
Newton steps for phi update. |
prestandardize |
If |
trace |
Logical; print IRLS progress. |
Value
Named numeric vector (Intercept) + colnames(X) with zeros for
unselected variables.
See Also
glmnet::glmnet(), glmnet::cv.glmnet()
Examples
set.seed(1); X <- matrix(rnorm(500), 100, 5); Y <- plogis(X[,1]-0.5*X[,3])
Y <- rbeta(100, Y*40, (1-Y)*40)
betareg_glmnet(X, Y, alpha = 1, choose = "bic", prestandardize = TRUE)
Beta regression LASSO via GAMLSS
Description
Uses gamlss::ri() (L1 penalty) in a gamlss(dist = BE) mean submodel to
select variables. The helper works on complete cases of X/Y, targets the
mean component, and does not yet expose offset handling.
Usage
betareg_lasso_gamlss(
X,
Y,
method = c("ML", "GAIC"),
k = 2,
degf = NULL,
lambda = NULL,
trace = FALSE
)
Arguments
X |
Numeric matrix (n × p) of mean-submodel predictors. |
Y |
Numeric response in (0,1). Values are squeezed to (0,1) internally. |
method |
One of "ML" or "GAIC". |
k |
Penalty multiplier for GAIC when method = "GAIC". |
degf |
Optional degrees of freedom for the L1 term. |
lambda |
Optional penalty strength. |
trace |
Logical; print stepwise trace. |
Value
Named numeric vector of coefficients (Intercept) + colnames(X),
with 0 for unselected variables.
See Also
gamlss::gamlss(), gamlss::ri(), gamlss.dist::BE()
Examples
set.seed(1); X <- matrix(rnorm(300), 100, 3); Y <- plogis(X[,1]); Y <- rbeta(100, Y*30, (1-Y)*30)
betareg_lasso_gamlss(X, Y, method = "GAIC", k = 2)
Stepwise Beta regression by AIC
Description
Fits a Beta regression with optional joint selection of the mean and
precision (phi) submodels using betareg::betareg(). The routine performs
greedy forward/backward search using the requested information criterion and
returns coefficients aligned with the supplied design matrix. The selectors
currently target the mean submodel only, require complete cases, and do not
expose offsets. Observation weights are passed through to betareg() when
provided.
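The greedy forward/backward search can be illustrated with a small base R sketch. It assumes a Gaussian lm() fit scored by AIC as a stand-in for betareg(), and the names crit and greedy_step are hypothetical; only the search logic (add or drop one variable, accept the best move if it improves the criterion by more than epsilon) is meant to mirror the description above.

```r
set.seed(1)
n <- 150
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 0.4 + X[, 1] - 0.7 * X[, 3] + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, X)

# Criterion for a set of active predictors (stand-in: Gaussian lm + AIC).
crit <- function(active) {
  rhs <- if (length(active)) paste(active, collapse = " + ") else "1"
  AIC(lm(as.formula(paste("y ~", rhs)), data = dat))
}

# Hypothetical greedy search: start empty, try every single add/drop move.
greedy_step <- function(candidates, epsilon = 1e-8) {
  active <- character(0)
  best <- crit(active)
  repeat {
    moves <- c(
      lapply(setdiff(candidates, active), function(v) c(active, v)),  # forward
      lapply(active, function(v) setdiff(active, v))                  # backward
    )
    if (!length(moves)) break
    scores <- vapply(moves, crit, numeric(1))
    if (best - min(scores) <= epsilon) break   # no move improves enough
    best <- min(scores)
    active <- moves[[which.min(scores)]]
  }
  active
}

selected <- greedy_step(colnames(X))
selected
```

Because the criterion strictly decreases at every accepted move, the loop terminates without an explicit step limit; a max_steps cap as in the Usage below would simply bound the number of iterations.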
Usage
betareg_step_aic(
X,
Y,
direction = "both",
link = "logit",
link.phi = "log",
type = "ML",
trace = FALSE,
max_steps = NULL,
epsilon = 1e-08,
X_phi = NULL,
direction_phi = c("none", "both", "forward", "backward"),
weights = NULL
)
Arguments
X |
Numeric matrix (n × p) of mean-submodel predictors. |
Y |
Numeric response in (0,1). Values are squeezed to (0,1) internally. |
direction |
Stepwise direction for the mean submodel: |
link |
Link for the mean submodel (passed to |
link.phi |
Link for precision parameter. Default |
type |
Likelihood type for |
trace |
Logical; print stepwise trace. |
max_steps |
Integer; maximum number of greedy steps (default |
epsilon |
Numeric; minimum improvement required to accept a move
(default |
X_phi |
Optional matrix of candidate predictors for the precision (phi)
submodel. When |
direction_phi |
Stepwise direction for the precision submodel.
Defaults to |
weights |
Optional non-negative observation weights passed to
|
Value
Named numeric vector of length p_mean + p_phi + 1 containing the
intercept, mean coefficients, phi-intercept (prefixed by "phi|"), and
phi coefficients (also prefixed by "phi|"). Non-selected variables have
coefficient 0.
See Also
Examples
set.seed(1)
X <- matrix(rnorm(200), 100, 2);
Y <- plogis(0.5 + X[,1]-X[,2]);
betareg_step_aic(X, Y)
Y <- rbeta(100, Y*20, (1-Y)*20)
betareg_step_aic(X, Y)
Stepwise Beta regression by AICc (finite-sample corrected AIC)
Description
Greedy forward/backward search minimizing AICc computed on betareg fits with
optional precision-submodel selection and observation weights.
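For reference, the usual finite-sample correction is AICc = AIC + 2k(k + 1)/(n - k - 1), where k is the number of estimated parameters and n the number of observations. A minimal sketch follows; the helper name aicc is hypothetical, and how the package counts k (for example, whether precision parameters are included) is an assumption not confirmed here.

```r
# Hypothetical helper: corrected AIC from a log-likelihood value.
aicc <- function(loglik, k, n) {
  stopifnot(n - k - 1 > 0)                    # correction undefined otherwise
  -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
}

aicc(loglik = -10, k = 3, n = 20)   # 20 + 6 + 24/16 = 27.5
```

The correction term vanishes as n grows, so AICc and AIC lead to the same choices in large samples.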
Usage
betareg_step_aicc(
X,
Y,
direction = "both",
link = "logit",
link.phi = "log",
type = "ML",
trace = FALSE,
max_steps = NULL,
epsilon = 1e-08,
X_phi = NULL,
direction_phi = c("none", "both", "forward", "backward"),
weights = NULL
)
Arguments
X |
Numeric matrix (n × p) of mean-submodel predictors. |
Y |
Numeric response in (0,1). Values are squeezed to (0,1) internally. |
direction |
Stepwise direction for the mean submodel: |
link |
Link for the mean submodel (passed to |
link.phi |
Link for precision parameter. Default |
type |
Likelihood type for |
trace |
Logical; print stepwise trace. |
max_steps |
Maximum number of greedy steps (default |
epsilon |
Minimal AICc improvement to accept a move. |
X_phi |
Optional matrix of candidate predictors for the precision (phi)
submodel. When |
direction_phi |
Stepwise direction for the precision submodel.
Defaults to |
weights |
Optional non-negative observation weights passed to
|
Value
See betareg_step_aic().
Examples
set.seed(1);
X <- matrix(rnorm(400), 100, 4);
Y <- plogis(X[,1]+0.5*X[,2])
betareg_step_aicc(X, Y)
Y <- rbeta(100, Y*25, (1-Y)*25);
betareg_step_aicc(X, Y)
Stepwise Beta regression by BIC
Description
Stepwise Beta regression by BIC
Usage
betareg_step_bic(
X,
Y,
direction = "both",
link = "logit",
link.phi = "log",
type = "ML",
trace = FALSE,
max_steps = NULL,
epsilon = 1e-08,
X_phi = NULL,
direction_phi = c("none", "both", "forward", "backward"),
weights = NULL
)
Arguments
X |
Numeric matrix (n × p) of mean-submodel predictors. |
Y |
Numeric response in (0,1). Values are squeezed to (0,1) internally. |
direction |
Stepwise direction for the mean submodel: |
link |
Link for the mean submodel (passed to |
link.phi |
Link for precision parameter. Default |
type |
Likelihood type for |
trace |
Logical; print stepwise trace. |
max_steps |
Integer; maximum number of greedy steps (default |
epsilon |
Numeric; minimum improvement required to accept a move
(default |
X_phi |
Optional matrix of candidate predictors for the precision (phi)
submodel. When |
direction_phi |
Stepwise direction for the precision submodel.
Defaults to |
weights |
Optional non-negative observation weights passed to
|
Value
See betareg_step_aic().
Examples
set.seed(1); X <- matrix(rnorm(300), 100, 3);
Y <- plogis(X[,1]);
betareg_step_bic(X, Y)
Y <- rbeta(100, Y*30, (1-Y)*30)
betareg_step_bic(X, Y)
Bootstrap selection frequencies across selectors
Description
Bootstraps the dataset B times and records how often each variable is
selected by each selector. Observations containing NA in either X or Y
are removed prior to resampling. Column names are abbreviated internally and
mapped back to the originals in the output just like in
compare_selectors_single().
Usage
compare_selectors_bootstrap(X, Y, B = 50, include_enet = TRUE, seed = NULL)
Arguments
X |
Numeric matrix (n × p) of mean-submodel predictors. |
Y |
Numeric response in (0,1). Values are squeezed to (0,1) internally. |
B |
Number of bootstrap replications. |
include_enet |
Logical; include ENet if |
seed |
Optional RNG seed. |
Value
Long data frame with columns selector, variable, freq in [0,1],
n_success, and n_fail. The freq column reports the share of bootstrap
replicates where a variable was selected by the corresponding selector.
Values near 1 signal high stability whereas small values indicate weak
evidence. n_success counts the successful fits contributing to the
frequency estimate (excluding failed replicates), while n_fail records the
number of unsuccessful fits. A "failures" attribute attached to the
returned data frame lists the replicate indices and messages for any
encountered errors.
Examples
set.seed(1)
X <- matrix(rnorm(300), 100, 3); Y <- plogis(X[, 1])
Y <- rbeta(100, Y * 30, (1 - Y) * 30)
freq <- compare_selectors_bootstrap(X, Y, B = 10, include_enet = FALSE)
head(freq)
subset(freq, freq > 0.8)
# Increase B until the reported frequencies stabilise. For example,
freq_big <- compare_selectors_bootstrap(X, Y, B = 200, include_enet = FALSE)
stats::aggregate(freq ~ selector, freq_big, summary)
Run all selectors once on a dataset
Description
Convenience wrapper that runs AIC/BIC/AICc stepwise, GAMLSS LASSO (and ENet
when available), and the pure glmnet IRLS selector, then collates coefficients
into a long table for comparison. Observations containing NA in either X
or Y are removed prior to fitting. Column names are temporarily shortened
to satisfy selector requirements and avoid clashes; the outputs remap them to
the original labels before returning so the reported variables always match
the input design.
Usage
compare_selectors_single(X, Y, include_enet = TRUE)
Arguments
X |
Numeric matrix (n × p) of mean-submodel predictors. |
Y |
Numeric response in (0,1). Values are squeezed to (0,1) internally. |
include_enet |
Logical; include ENet if |
Value
A list with:
- coefs
Named coefficient vectors for each selector.
- table
Long data frame with columns
selector, variable, coef, selected.
Examples
set.seed(1)
X <- matrix(rnorm(300), 100, 3); Y <- plogis(X[, 1])
Y <- rbeta(100, Y * 30, (1 - Y) * 30)
single <- compare_selectors_single(X, Y, include_enet = FALSE)
head(single$table)
Merge single-run results and bootstrap frequencies
Description
Merge single-run results and bootstrap frequencies
Usage
compare_table(single_tab, freq_tab = NULL)
Arguments
single_tab |
Data frame returned in |
freq_tab |
Optional frequency table from |
Value
Merged data frame.
Examples
single_tab <- data.frame(
selector = rep(c("AIC", "BIC"), each = 3),
variable = rep(paste0("x", 1:3), times = 2),
coef = c(0.5, 0, -0.2, 0.6, 0.1, -0.3)
)
single_tab$selected <- single_tab$coef != 0
freq_tab <- data.frame(
selector = rep(c("AIC", "BIC"), each = 3),
variable = rep(paste0("x", 1:3), times = 2),
freq = c(0.9, 0.15, 0.4, 0.85, 0.3, 0.25)
)
compare_table(single_tab, freq_tab)
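A sketch of what such a merge could look like in base R, assuming the two tables are joined on selector and variable and that all single-run rows are kept. This is an assumption for illustration; compare_table() may key, order, or name the result differently.

```r
single_tab <- data.frame(
  selector = rep(c("AIC", "BIC"), each = 2),
  variable = rep(c("x1", "x2"), times = 2),
  coef = c(0.5, 0, 0.6, 0.1)
)
freq_tab <- data.frame(
  selector = rep(c("AIC", "BIC"), each = 2),
  variable = rep(c("x1", "x2"), times = 2),
  freq = c(0.9, 0.15, 0.85, 0.3)
)

# Left join on (selector, variable); rows without a frequency would get NA.
merged <- merge(single_tab, freq_tab, by = c("selector", "variable"), all.x = TRUE)
merged
```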
Interval-response stability selection (fastboost variant)
Description
Repeats selection on interval-valued responses by sampling a pseudo-response
from each interval (uniformly or midpoint), tallying variable selection
frequencies across B replicates.
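The pseudo-response draw can be sketched as follows. The helper name sample_interval is hypothetical and the code only illustrates the two sampling modes described above; it is not taken from the package source.

```r
# Hypothetical helper: draw one pseudo-response per interval [y_low, y_high].
sample_interval <- function(y_low, y_high, mode = c("uniform", "midpoint")) {
  mode <- match.arg(mode)
  stopifnot(length(y_low) == length(y_high), all(y_low <= y_high))
  if (mode == "midpoint") {
    (y_low + y_high) / 2                               # deterministic centre
  } else {
    runif(length(y_low), min = y_low, max = y_high)    # one uniform draw per row
  }
}

set.seed(1)
y_low <- c(0.10, 0.40, 0.70)
y_high <- c(0.20, 0.50, 0.90)
y_mid <- sample_interval(y_low, y_high, "midpoint")
y_unif <- sample_interval(y_low, y_high, "uniform")
```

With "midpoint" every replicate sees the same response, so any variation across the B replicates has to come from elsewhere; "uniform" injects the interval uncertainty into each replicate.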
Usage
fastboost_interval(
X,
Y_low,
Y_high,
func,
B = 100,
sample = c("uniform", "midpoint"),
version = "glmnet",
use.parallel = FALSE,
seed = NULL,
...
)
Arguments
X |
Numeric matrix (n × p). |
Y_low, Y_high |
Interval bounds in [0,1]. Rows with missing bounds are dropped. |
func |
Function |
B |
Number of interval resamples. |
sample |
Sampling scheme for the pseudo-response: "uniform" or "midpoint". |
version |
Ignored (reserved for future). |
use.parallel |
Use |
seed |
Optional RNG seed. Scoped via |
... |
Extra args forwarded to |
Value
A list with:
- betas
B × (p+1) matrix of coefficients over replicates.
- freq
Named vector of selection frequencies for each predictor.
Examples
# suppose you have interval data (Y_low, Y_high)
set.seed(1)
n <- 120; p <- 6
X <- matrix(rnorm(n*p), n, p); colnames(X) <- paste0("x",1:p)
mu <- plogis(X[,1] - 0.5*X[,2]); Y <- rbeta(n, mu*25, (1-mu)*25)
Y_low <- pmax(0, Y - 0.05); Y_high <- pmin(1, Y + 0.05)
fb <- fastboost_interval(X, Y_low, Y_high,
func = function(X,y) betareg_glmnet(X,y, choose="bic", prestandardize=TRUE),
B = 40)
sort(fb$freq, decreasing = TRUE)
Side-by-side coefficient heatmap
Description
Visual comparison of coefficients returned by each selector. Uses ggplot2 when it is installed and falls back to a base R image otherwise.
Usage
plot_compare_coeff(single_tab)
Arguments
single_tab |
Data frame as returned by |
Value
A ggplot object when ggplot2 is available; otherwise draws a base R image.
Examples
demo_tab <- data.frame(
selector = rep(c("AIC", "BIC"), each = 3),
variable = rep(paste0("x", 1:3), times = 2),
coef = c(0.6, 0, -0.2, 0.55, 0.05, -0.3)
)
demo_tab$selected <- demo_tab$coef != 0
plot_compare_coeff(demo_tab)
Side-by-side selection-frequency heatmap
Description
Visual comparison of bootstrap selection frequencies by selector. Uses ggplot2 when it is installed and falls back to a base R image otherwise.
Usage
plot_compare_freq(freq_tab)
Arguments
freq_tab |
Data frame as returned by |
Value
A ggplot object when ggplot2 is available; otherwise draws a base R image.
Examples
freq_tab <- data.frame(
selector = rep(c("AIC", "BIC"), each = 3),
variable = rep(paste0("x", 1:3), times = 2),
freq = c(0.85, 0.2, 0.45, 0.75, 0.35, 0.3)
)
plot_compare_freq(freq_tab)
Apply a selector to a collection of resampled designs
Description
Apply a selector to a collection of resampled designs
Usage
sb_apply_selector_manual(
X_norm,
resamples,
Y,
selector,
...,
keep_template = TRUE
)
Arguments
X_norm |
Normalised design matrix. |
resamples |
List of matrices returned by |
Y |
Numeric response. |
selector |
Variable-selection routine; function or character string. If it is a function, the selector name should be added as the fun.name attribute. |
... |
Extra arguments passed to the selector. |
keep_template |
Logical; when |
Value
A numeric matrix of coefficients with one column per resample.
SelectBoost for beta-regression models
Description
sb_beta() orchestrates all SelectBoost stages—normalisation, correlation
analysis, grouping, correlated resampling, and stability tallying—while using
the beta-regression selectors provided by this package. It can operate on
point-valued or interval-valued responses and automatically squeezes the
outcome into (0, 1) unless instructed otherwise.
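The exact squeezing rule is not spelled out here. As a minimal sketch, one widely used choice is the Smithson and Verkuilen compression (y(n - 1) + 0.5)/n; the helper name squeeze_unit is hypothetical and the package may apply a different transformation.

```r
# Hypothetical helper: compress values from [0, 1] into the open interval (0, 1).
# Assumption: the (y * (n - 1) + 0.5) / n rule; the package's own rule may differ.
squeeze_unit <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y <- c(0, 0.25, 1)
y_sq <- squeeze_unit(y)   # boundary values 0 and 1 move strictly inside (0, 1)
y_sq
```

A transformation of this kind matters because the Beta likelihood is undefined at exactly 0 or 1.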
Usage
sb_beta(
X,
Y = NULL,
selector = betareg_step_aic,
corrfunc = "cor",
B = 100,
step.num = 0.1,
steps.seq = NULL,
version = c("glmnet", "lars"),
squeeze = TRUE,
use.parallel = FALSE,
seed = NULL,
verbose = FALSE,
threshold = 1e-04,
interval = c("none", "uniform", "midpoint"),
Y_low = NULL,
Y_high = NULL,
...
)
Arguments
X |
Numeric design matrix. Coerced with |
Y |
Numeric response vector. Values are squeezed to the open unit
interval with the standard SelectBoost transformation unless |
selector |
Selection routine. Defaults to |
corrfunc |
Correlation function passed to |
B |
Number of replicates to generate. |
step.num |
Step length for the automatically generated |
steps.seq |
Optional user-supplied grid of absolute correlation thresholds. |
version |
Either "glmnet" or "lars". |
squeeze |
Logical; ensure the response lies in |
use.parallel |
Logical; enable parallel resampling and selector fits when supported by the current R session. |
seed |
Optional integer seed for reproducibility. The seed is scoped via
|
verbose |
Logical; emit progress messages. |
threshold |
Numeric tolerance for considering a coefficient selected. |
interval |
Interval-resampling mode: "none", "uniform", or "midpoint". |
Y_low, Y_high |
Interval bounds in |
... |
Additional arguments forwarded to |
Details
The returned object carries a rich set of attributes:
- "c0.seq" – the grid of absolute-correlation thresholds explored during resampling.
- "steps.seq" – the raw sequence (if any) used to construct the grid.
- "selector" – the selector identifier (function name or expression).
- "B" – number of resampled designs passed to the selector.
- "interval" – the interval sampling mode ("none", "uniform", or "midpoint").
- "resample_diagnostics" – per-threshold data frames with summary statistics on the cached correlated draws.
These attributes mirror the historical SelectBoost beta implementation so the object can be consumed by existing plotting and reporting utilities.
Value
Matrix of selection frequencies with one row per c0 level and class
"sb_beta". See Details for the recorded attributes.
Examples
set.seed(42)
sim <- simulation_DATA.beta(n = 80, p = 4, s = 2)
# increase B for real applications
res <- sb_beta(sim$X, sim$Y, B = 5)
res
SelectBoost workflow for interval responses
Description
sb_beta_interval() forwards to sb_beta() while activating interval sampling
so that beta-regression SelectBoost runs can ingest lower/upper response
bounds directly. It mirrors fastboost_interval() but reuses the correlated
resampling pipeline of sb_beta().
Usage
sb_beta_interval(
X,
Y_low,
Y_high,
selector = betareg_step_aic,
sample = c("uniform", "midpoint"),
Y = NULL,
...
)
Arguments
X |
Numeric design matrix. Coerced with |
Y_low, Y_high |
Interval bounds in |
selector |
Selection routine. Defaults to |
sample |
Interval sampling scheme passed to the |
Y |
Optional point-valued response. Supply it when you wish to keep the
observed mean response but still resample within |
... |
Additional arguments forwarded to |
Value
See sb_beta(). The returned object carries the same
"sb_beta"-class attributes describing the correlation thresholds,
resampling diagnostics, selector, and number of replicates.
Examples
set.seed(1)
sim <- simulation_DATA.beta(n = 120, p = 5, s = 2)
y_low <- pmax(sim$Y - 0.05, 0)
y_high <- pmin(sim$Y + 0.05, 1)
interval_fit <- sb_beta_interval(
sim$X,
Y_low = y_low,
Y_high = y_high,
B = 5,
step.num = 0.4
)
attr(interval_fit, "interval")
User-friendly methods for sb_beta() results
Description
These S3 helpers make it easier to inspect and visualise the
correlation-threshold grid returned by sb_beta(). They surface the stored
attributes, reshape the selection frequencies into tidy summaries, and produce
quick ggplot2 visualisations for interactive use.
Usage
## S3 method for class 'sb_beta'
print(x, digits = 3, ...)
## S3 method for class 'sb_beta'
summary(object, ...)
## S3 method for class 'summary.sb_beta'
print(x, digits = 3, n = 10, ...)
autoplot.sb_beta(object, variables = NULL, ...)
Arguments
x, object |
An object of class |
digits |
Number of decimal places to display when printing. |
... |
Additional arguments passed on to lower-level methods. |
n |
Number of rows to show from the summary table when printing. |
variables |
Optional character vector of variables to retain in the plotted output. |
Value
summary.sb_beta() returns an object of class summary.sb_beta
containing a tidy data frame of selection frequencies. The plotting and
printing methods are invoked for their side effects and return the input
object invisibly.
Examples
set.seed(42)
sim <- simulation_DATA.beta(n = 50, p = 4, s = 2)
fit <- sb_beta(sim$X, sim$Y, B = 5, step.num = 0.5)
print(fit)
summary(fit)
if (requireNamespace("ggplot2", quietly = TRUE)) {
autoplot.sb_beta(fit)
}
Core helpers for SelectBoost-style beta regression
Description
These helpers expose the individual stages of the SelectBoost workflow so
that beta-regression selectors can be combined with correlation-aware
resampling directly from SelectBoost.beta. They normalise the design matrix,
derive correlation structures, form groups of correlated predictors, generate
Gaussian surrogates that mimic the observed dependency structure, and apply a
user-provided selector on each resampled design.
Usage
sb_normalize(X, center = NULL, scale = NULL, eps = 1e-08)
sb_compute_corr(X, corrfunc = "cor")
sb_group_variables(corr_mat, c0)
Arguments
X |
Numeric matrix of predictors. |
center |
Optional centering vector recycled to the number of columns.
Defaults to the column means of |
scale |
Optional scaling vector recycled to the number of columns.
Defaults to the column-wise |
eps |
Small positive constant used when normalising columns. |
corrfunc |
Function or character string used to compute pairwise
associations. Defaults to |
corr_mat |
Numeric matrix of associations. |
c0 |
Threshold applied to the absolute correlations. |
Value
sb_normalize() returns a centred, ℓ2-scaled copy of X.
sb_compute_corr() returns the association matrix.
sb_group_variables() returns a list of integer vectors, one per
variable, describing the correlated group it belongs to.
Examples
sb_normalize(matrix(rnorm(20), 5))
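The three stages can be sketched in base R to show what each one computes. This is an illustration under stated assumptions (Pearson correlation, and groups defined as all variables whose absolute correlation with a given variable is at least c0); the helpers themselves may treat ties, eps, or the comparison operator differently.

```r
set.seed(1)
n <- 100
x1 <- rnorm(n)
X <- cbind(x1 = x1, x2 = x1 + 0.1 * rnorm(n), x3 = rnorm(n))

# Normalise: centre each column, then scale it to unit Euclidean norm.
Xc <- sweep(X, 2, colMeans(X))
X_norm <- sweep(Xc, 2, sqrt(colSums(Xc^2)), "/")

# For centred unit-norm columns the cross-product is the Pearson correlation.
corr_mat <- crossprod(X_norm)

# Group j: all variables whose absolute correlation with j reaches c0 (assumed rule).
c0 <- 0.8
groups <- lapply(seq_len(ncol(X)), function(j) which(abs(corr_mat[j, ]) >= c0))
groups
```

Here x1 and x2 end up in each other's group, while x3 forms a singleton group containing only itself.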
Generate correlated design replicates for a set of groups
Description
Generate correlated design replicates for a set of groups
Usage
sb_resample_groups(
X_norm,
groups,
B = 100,
jitter = 1e-06,
seed = NULL,
use.parallel = FALSE,
cache = NULL
)
Arguments
X_norm |
Normalised design matrix. |
groups |
Correlation structure. Either a list as returned by
|
B |
Number of replicates to generate. |
jitter |
Numeric value added to covariance diagonals for stability. |
seed |
Optional integer seed for reproducibility. The seed is scoped via
|
use.parallel |
Logical; when |
cache |
Optional environment or named list used to cache previously generated surrogates. Passing the same cache across calls reuses draws for identical groups. |
Details
When every group has size one (no correlated variables) the function
simply returns B copies of X_norm. A warning is issued in that situation
so downstream code can avoid mistaking the replicated designs for genuinely
resampled surrogates. The covariance matrices underpinning each correlated
draw are cached in the supplied cache environment; reusing the environment
across calls lets sb_resample_groups() skip redundant covariance
decompositions for identical groups and speeds up iterative workflows.
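One plausible reading of a correlated Gaussian draw for a single group is sketched below. It assumes the surrogate is sampled from a zero-mean Gaussian whose covariance is the group's correlation matrix plus a small diagonal jitter, and that the draw is then re-centred and rescaled; draw_group is a hypothetical name and the package's actual sampler and caching are not reproduced.

```r
set.seed(1)
n <- 200
x1 <- rnorm(n)
G <- cbind(x1, x1 + 0.3 * rnorm(n))          # one correlated group of two columns
Sigma <- cor(G) + diag(1e-6, ncol(G))        # jitter on the diagonal for stability

# Hypothetical helper: one Gaussian surrogate with covariance Sigma.
draw_group <- function(n, Sigma) {
  Z <- matrix(rnorm(n * ncol(Sigma)), n)     # independent standard normals
  S <- Z %*% chol(Sigma)                     # rows now have covariance Sigma
  S <- sweep(S, 2, colMeans(S))              # re-centre
  sweep(S, 2, sqrt(colSums(S^2)), "/")       # rescale to unit norm
}

replicates <- replicate(5, draw_group(n, Sigma), simplify = FALSE)
surrogate_cor <- sapply(replicates, function(S) cor(S)[1, 2])
surrogate_cor
```

The Cholesky factor depends only on Sigma, which is why caching it per group, as described above, avoids repeating the decomposition across calls.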
Value
An object of class sb_resamples, i.e. a list of length B whose
elements are resampled design matrices. The object exposes per-group
diagnostics in its "diagnostics" attribute and returns the cache via the
"cache" attribute for reuse.
Compute selection frequencies from coefficient paths
Description
Compute selection frequencies from coefficient paths
Usage
sb_selection_frequency(
coef_matrix,
version = c("glmnet", "lars"),
threshold = 1e-04
)
Arguments
coef_matrix |
Matrix produced by |
version |
Either "glmnet" or "lars". |
threshold |
Coefficients with absolute value below this threshold are treated as zero. |
Value
Numeric vector of selection frequencies.
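The thresholding rule can be illustrated on a toy coefficient matrix with one row per variable and one column per resample. This sketch only shows the counting step; how the function treats the intercept row or the version argument is not covered and should not be inferred from it.

```r
# Toy coefficient matrix: rows are variables, columns are resamples.
coef_matrix <- cbind(
  c(0.8, 0,    -0.3),
  c(0.7, 1e-5,  0),
  c(0.9, 0,    -0.2),
  c(0.6, 0,     0)
)
rownames(coef_matrix) <- c("x1", "x2", "x3")

threshold <- 1e-4
# Coefficients with absolute value below the threshold count as zero.
freq <- rowMeans(abs(coef_matrix) >= threshold)
freq   # x1 = 1, x2 = 0, x3 = 0.5
```

The tiny coefficient 1e-5 for x2 falls below the threshold, so numerical noise from a selector does not count as a selection.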
Simulate interval Beta-regression data (flexible)
Description
Simulate interval Beta-regression data (flexible)
Usage
simulation_DATA.beta(
n,
p,
s = min(5L, p),
beta_size = 1,
a0 = 0,
X_dist = c("gaussian", "t", "bernoulli"),
corr = c("indep", "ar1", "block"),
rho = 0,
block_size = 5L,
df = 5,
prob = 0.5,
active_idx = NULL,
phi = 20,
mechanism = c("jitter", "quantile", "mixed"),
mix_prob = 0.5,
delta = 0.05,
delta_low = NULL,
delta_high = NULL,
alpha = 0.1,
alpha_low = NULL,
alpha_high = NULL,
na_rate = 0,
na_side = c("random", "left", "right"),
centerX = FALSE,
scaleX = FALSE,
seed = NULL
)
Arguments
n, p |
Sample size and number of predictors. |
s |
Number of active (nonzero) coefficients. |
beta_size |
Scalar (alternating ±) or numeric vector of length greater than or equal to s. |
a0 |
Intercept (logit scale). |
X_dist |
Distribution for X: "gaussian", "t", or "bernoulli". |
corr |
Correlation structure: "indep", "ar1", or "block". |
rho |
AR(1) correlation or within-block correlation. |
block_size |
Block size when corr = "block". |
df |
Degrees of freedom for X_dist = "t". |
prob |
Success prob for X_dist = "bernoulli". |
active_idx |
Optional integer vector of active feature indices (length s). If NULL, uses 1:s. |
phi |
Precision parameter: scalar, length-n vector, or function |
mechanism |
Interval mechanism per row: "jitter", "quantile", or "mixed". |
mix_prob |
Probability of jitter when mechanism = "mixed". |
delta |
Symmetric jitter half-width (scalar / vector / function). |
delta_low, delta_high |
Asymmetric jitter widths (override delta). |
alpha |
Miscoverage for quantile intervals (scalar / vector / function). |
alpha_low, alpha_high |
Asymmetric miscoverage (override alpha). |
na_rate |
Fraction of rows with a missing bound (default 0). |
na_side |
Which bound to drop: "random", "left", or "right". |
centerX, scaleX |
Whether to center/scale X before returning. |
seed |
RNG seed. |
Value
list with X, Y, Y_low, Y_high, mu, beta, a0, phi, info, settings.
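The data-generating idea can be sketched in base R. The mean-precision parameterisation Y ~ Beta(mu * phi, (1 - mu) * phi) matches the examples elsewhere in this manual; the interval constructions below (symmetric clipping for "jitter", equal-tailed Beta quantiles for "quantile") are assumptions made for illustration and may differ from what the simulator does.

```r
set.seed(1)
n <- 100; p <- 4
X <- matrix(rnorm(n * p), n, p)
beta <- c(1, -1, 0, 0)                  # s = 2 active coefficients, alternating sign
a0 <- 0                                 # intercept on the logit scale
phi <- 20                               # precision
mu <- plogis(a0 + drop(X %*% beta))
Y <- rbeta(n, mu * phi, (1 - mu) * phi)

# "jitter" (assumed form): symmetric half-width delta around Y, clipped to [0, 1].
delta <- 0.05
jit_low <- pmax(Y - delta, 0)
jit_high <- pmin(Y + delta, 1)

# "quantile" (assumed form): equal-tailed Beta quantiles with miscoverage alpha.
alpha <- 0.1
q_low <- qbeta(alpha / 2, mu * phi, (1 - mu) * phi)
q_high <- qbeta(1 - alpha / 2, mu * phi, (1 - mu) * phi)

mean(Y >= q_low & Y <= q_high)          # empirical coverage, nominally 1 - alpha
```

Jitter intervals always contain the observed Y, whereas quantile intervals contain it only with probability about 1 - alpha, which is one reason a "mixed" mechanism can be useful for stress-testing selectors.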