scR

scR estimates empirical sample complexity bounds for supervised learning tasks. The core workflow is:

  1. estimate resampled generalization curves with estimate_accuracy();
  2. fit/extrapolate those curves with interpolate_scb(); and
  3. summarize or plot the estimated sample complexity bound.

Basic use

library(scR)

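# Fit a logistic regression and add the "svrclass" class to the fitted object.
mylogit <- function(formula, data) {
  structure(
    glm(formula = formula, data = data, family = binomial(link = "logit")),
    class = c("svrclass", "glm")
  )
}

# Convert predicted probabilities to 0/1 class labels using a 0.5 threshold.
mypred <- function(m, newdata) {
  p <- predict.glm(m, newdata, type = "response")
  factor(ifelse(p > 0.5, 1, 0), levels = c("0", "1"))
}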

# In applied work, pass your observed data instead of generating synthetic data.
dat <- gendata(mylogit, dim = 3, maxn = 250, predictfn = mypred)

results <- estimate_accuracy(
  y ~ .,
  mylogit,
  data = dat,
  predictfn = mypred,
  nsample = 10,
  steps = 25,
  parallel = FALSE,
  backend = "sequential"
)

scbhat <- interpolate_scb(
  list(results),
  epsilon = 0.05,
  delta = 0.05,
  maxN = nrow(dat)
)

summary(scbhat)
plot(scbhat, list(results), plot_type = "Delta")
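
To give some intuition for the epsilon and delta arguments above, here is a conceptual sketch in base R. It uses the standard reading of a sample complexity bound: the smallest training-set size at which the share of resamples with error above epsilon falls to delta or below. The error matrix, `n_grid`, `delta_hat`, and `n_hat` are hypothetical names and hand-written toy values, and this is not a description of how scR computes or interpolates its curves.

```r
# Conceptual illustration only: toy numbers, not scR internals.
n_grid <- c(50, 100, 200)  # hypothetical training-set sizes

# Hypothetical out-of-sample error rates: one row per resample,
# one column per training-set size in n_grid.
err <- cbind(
  c(0.12, 0.08, 0.04, 0.10),  # n = 50
  c(0.06, 0.04, 0.03, 0.07),  # n = 100
  c(0.03, 0.04, 0.02, 0.04)   # n = 200
)

epsilon <- 0.05  # largest error rate considered acceptable
delta   <- 0.25  # largest tolerated share of resamples exceeding epsilon

# Share of resamples whose error exceeds epsilon at each sample size.
delta_hat <- colMeans(err > epsilon)

# Smallest sample size on the grid that meets the (epsilon, delta) target.
n_hat <- n_grid[which(delta_hat <= delta)[1]]

print(delta_hat)
print(n_hat)
```

With real data the grid of sample sizes is rarely dense enough to read the bound off directly, which is why the workflow above fits and extrapolates the curves instead.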

Optional monotone Gaussian process extrapolation

The package also includes the monotone-integrated Gaussian process extrapolator used in the paper appendix. This is an optional nonparametric robustness check. It requires a working CmdStan installation plus the cmdstanr and posterior packages. These are not hard dependencies of scR, so the core package can be installed and checked without a Stan toolchain.
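
Because these dependencies are optional, it can help to check for them before calling the GP extrapolator. The following is a minimal sketch, not part of scR: `gp_pkgs`, `pkgs_ok`, `cmdstan_ok`, and `gp_ready` are hypothetical names, and the check assumes that `cmdstanr::cmdstan_version(error_on_NA = FALSE)` returns NULL when no CmdStan installation is found.

```r
# Check the optional GP dependencies without attaching them.
gp_pkgs <- c("cmdstanr", "posterior")
pkgs_ok <- vapply(gp_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Only query CmdStan if cmdstanr itself is installed (&& short-circuits).
cmdstan_ok <- pkgs_ok[["cmdstanr"]] &&
  !is.null(cmdstanr::cmdstan_version(error_on_NA = FALSE))

gp_ready <- all(pkgs_ok) && cmdstan_ok

if (!gp_ready) {
  message("Optional Stan toolchain not available; skipping GP extrapolation.")
}
```

A guard like `if (gp_ready) { ... }` around the GP call keeps scripts usable on machines without a Stan toolchain.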

# Requires cmdstanr, posterior, and CmdStan.
gp_delta <- interpolate_scb_gp(
  results,
  epsilon = 0.05,
  delta = 0.05,
  maxN = nrow(dat),
  curve = "delta",
  M_grid = 80
)

summary(gp_delta)
plot(gp_delta, plot_type = "Delta")

The GP implementation uses the paper’s monotone-integrated construction: a Gaussian process is placed on an unconstrained latent field, a softplus transform produces a nonnegative derivative, the derivative is integrated on a fixed grid, and the resulting latent curve is mapped to either the delta or epsilon mean curve.
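
The first three steps of that construction can be sketched in base R. This is an illustration under stated assumptions, not the package's Stan model: the squared-exponential kernel, its hyperparameters, the 80-point grid on [0, 1], and the trapezoid rule are all choices made here for the example, and the final mapping to the delta or epsilon mean curve is omitted.

```r
set.seed(42)

# Fixed grid on which the latent derivative is evaluated (illustrative size).
grid <- seq(0, 1, length.out = 80)

# Squared-exponential covariance with hypothetical hyperparameters.
sq_exp <- function(x, sigma = 1, ell = 0.2) {
  sigma^2 * exp(-0.5 * outer(x, x, "-")^2 / ell^2)
}
K <- sq_exp(grid)

# One draw of the unconstrained latent field f ~ GP(0, K), using an
# eigendecomposition so that tiny negative eigenvalues do not break sampling.
e <- eigen(K, symmetric = TRUE)
f <- drop(e$vectors %*% (sqrt(pmax(e$values, 0)) * rnorm(length(grid))))

# Softplus maps the latent field to a nonnegative derivative
# (numerically stable form of log(1 + exp(z))).
softplus <- function(z) log1p(exp(-abs(z))) + pmax(z, 0)
deriv <- softplus(f)

# Trapezoid-rule integration on the grid yields a nondecreasing curve.
h <- diff(grid)
curve_latent <- c(0, cumsum(h * (head(deriv, -1) + tail(deriv, -1)) / 2))

stopifnot(all(deriv >= 0), all(diff(curve_latent) >= 0))
```

Monotonicity holds by construction: whatever the latent draw looks like, the softplus output is nonnegative, so its running integral cannot decrease.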
