The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
combss solves the best subset selection problem in
generalised linear models by reformulating it as a continuous
optimisation over the hypercube \([0,
1]^p\) and applying a Frank-Wolfe homotopy algorithm. The inner
ridge problem is solved with glmnet. Three families are
supported:
family = "gaussian" (alias "linear") —
linear regressionfamily = "binomial" — binary logistic regressionfamily = "multinomial" — multinomial logistic
regressionA simulated dataset with \(n = 300\) observations and \(p = 30\) predictors, of which only the first five are truly active.
set.seed(102)
n <- 300; p <- 30
beta <- c(3, 2, 1.5, 1, 0.5, rep(0, p - 5))
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% beta + rnorm(n) * 0.5)
itr <- 1:200; iva <- 201:300
fit <- combss(x[itr, ], y[itr],
x_val = x[iva, ], y_val = y[iva],
family = "gaussian", q = 10)
fit
#> COMBSS fit
#> family: gaussian
#> n, p: 200, 30
#> q: 10
#> lam_ridge: 0
#> best k: 5
#> mse: 0.22222
#> subset: 1, 2, 3, 4, 5fit$subset_list contains the selected feature set for
each k = 1, ..., q:
fit$subset_list[1:8]
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2
#>
#> [[3]]
#> [1] 1 2 3
#>
#> [[4]]
#> [1] 1 2 3 4
#>
#> [[5]]
#> [1] 1 2 3 4 5
#>
#> [[6]]
#> [1] 1 2 3 4 5 6
#>
#> [[7]]
#> [1] 1 2 3 4 5 6 21
#>
#> [[8]]
#> [1] 1 2 3 4 5 6 10 21fit$mse_path gives the validation MSE at each
k, and fit$best_k the size with smallest
MSE.
plot(seq_along(fit$mse_path), fit$mse_path, type = "b",
xlab = "k", ylab = "Validation MSE")
abline(v = fit$best_k, lty = 2)ymulti <- cut(as.numeric(x %*% beta),
breaks = c(-Inf, -1, 1, Inf),
labels = c("a", "b", "c"))
fit_mn <- combss(x[itr, ], ymulti[itr],
x_val = x[iva, ], y_val = ymulti[iva],
family = "multinomial", q = 10)
fit_mn
#> COMBSS fit
#> family: multinomial
#> n, p: 200, 30
#> q: 10
#> lam_ridge: 0
#> best k: 6
#> accuracy: 0.87
#> subset: 1, 2, 3, 4, 5, 12For each candidate ridge penalty lam_ridge,
combss_cv() runs COMBSS and evaluates LOOCV error on the
chosen subset.
predict() refits on the chosen subset of the original
training data and predicts on newx.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.