The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Getting Started with predictset

What is conformal prediction?

Machine learning models produce point predictions - a single number for regression, a single class for classification. But in practice, you need to know how uncertain that prediction is.

Conformal prediction wraps any model in a layer of calibrated uncertainty quantification. Given a target coverage level (say 90%), it produces:

The key property: this guarantee holds in finite samples, without distributional assumptions. The only requirement is that calibration and test data are exchangeable (roughly: drawn from the same distribution).

predictset implements the main conformal methods from the recent literature in a lightweight, model-agnostic R package with only two dependencies (cli and stats).

Quick start: regression

Split conformal prediction with a linear model using formula shorthand:

library(predictset)

set.seed(42)
n <- 500
x <- matrix(rnorm(n * 5), ncol = 5)
y <- x[, 1] * 2 + x[, 2] + rnorm(n)
x_new <- matrix(rnorm(100 * 5), ncol = 5)

result <- conformal_split(x, y, model = y ~ ., x_new = x_new, alpha = 0.10)
print(result)
#> 
#> ── Conformal Prediction Intervals (Split Conformal) ────────────────────────────
#> • Coverage target: "90%"
#> • Training: 250 | Calibration: 250 | Predictions: 100
#> • Conformal quantile: 1.876
#> • Median interval width: 3.7519

The alpha parameter controls the miscoverage rate: alpha = 0.10 targets 90% coverage.

Model interface

There are three ways to specify a model:

1. Formula shorthand (fits lm internally):

result <- conformal_split(x, y, model = y ~ ., x_new = x_new)

2. Fitted model (auto-detected for lm, glm, ranger):

fit <- lm(y ~ ., data = data.frame(y = y, x))
result <- conformal_split(x, y, model = fit, x_new = x_new)

3. Custom model via make_model() (works with anything):

rf <- make_model(
  train_fun = function(x, y) {
    ranger::ranger(y ~ ., data = data.frame(y = y, x))
  },
  predict_fun = function(object, x_new) {
    predict(object, data = as.data.frame(x_new))$predictions
  },
  type = "regression"
)

result <- conformal_split(x, y, model = rf, x_new = x_new)

Classification with Adaptive Prediction Sets

APS produces prediction sets that adapt to the difficulty of each observation. Easy cases get small sets (often a single class), while ambiguous cases get larger sets.

set.seed(42)
n <- 600
x <- matrix(rnorm(n * 4), ncol = 4)
y <- factor(ifelse(x[, 1] > 0.5, "A", ifelse(x[, 2] > 0, "B", "C")))
x_new <- matrix(rnorm(100 * 4), ncol = 4)

clf <- make_model(
  train_fun = function(x, y) {
    ranger::ranger(y ~ ., data = data.frame(y = y, x), probability = TRUE)
  },
  predict_fun = function(object, x_new) {
    predict(object, data = as.data.frame(x_new))$predictions
  },
  type = "classification"
)

result <- conformal_aps(x, y, model = clf, x_new = x_new, alpha = 0.10)
print(result)

# Most predictions are a single class; ambiguous ones include 2-3
table(set_size(result))

Mondrian conformal: group-conditional coverage

Standard conformal guarantees marginal coverage (across all test points), but coverage can vary across subgroups. Mondrian conformal computes a separate quantile for each group, guaranteeing coverage within each subgroup. This is critical for fairness and regulatory compliance.

set.seed(42)
n <- 600
x <- matrix(rnorm(n * 3), ncol = 3)
groups <- factor(ifelse(x[, 1] > 0, "high", "low"))
y <- x[, 1] * 2 + ifelse(groups == "high", 3, 0.5) * rnorm(n)
x_new <- matrix(rnorm(200 * 3), ncol = 3)
groups_new <- factor(ifelse(x_new[, 1] > 0, "high", "low"))

result <- conformal_mondrian(x, y, model = y ~ ., x_new = x_new,
                              groups = groups, groups_new = groups_new)

# Check per-group coverage
y_new <- x_new[, 1] * 2 + ifelse(groups_new == "high", 3, 0.5) * rnorm(200)
coverage_by_group(result, y_new, groups_new)
#>   group  coverage   n target
#> 1  high 0.8672566 113    0.9
#> 2   low 0.8850575  87    0.9

Diagnostics

After producing predictions, use diagnostic functions to evaluate calibration and efficiency:

# Empirical coverage (should be close to 1 - alpha)
coverage(result, y_test)

# Average interval width (regression) - narrower is better
mean(interval_width(result))

# Prediction set sizes (classification) - smaller is better
table(set_size(result))

# Coverage within subgroups (fairness check)
coverage_by_group(result, y_test, groups = groups_test)

# Coverage by prediction quantile bin
coverage_by_bin(result, y_test, bins = 5)

# Conformal p-values for outlier detection
pvals <- conformal_pvalue(result$scores, new_scores)

Comparing methods

Benchmark multiple conformal methods side-by-side with conformal_compare():

set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] * 2 + rnorm(n)
x_new <- matrix(rnorm(200 * 3), ncol = 3)
y_new <- x_new[, 1] * 2 + rnorm(200)

comp <- conformal_compare(x, y, model = y ~ ., x_new = x_new, y_new = y_new,
                           methods = c("split", "cv", "jackknife"))
print(comp)
#> 
#> ── Conformal Method Comparison ─────────────────────────────────────────────────
#> 
#> • split: coverage = 0.89, mean width = 3.388, time = 0s
#> • cv: coverage = 0.89, mean width = 3.308, time = 0.019s
#> • jackknife: coverage = 0.89, mean width = 3.335, time = 15.386s

Choosing a method

Scenario Recommended method Why
Large dataset, speed matters conformal_split() Single model fit, fastest
Small dataset, need tight intervals conformal_cv() or conformal_jackknife()* Uses all data for training and calibration
Heteroscedastic data conformal_cqr() or conformal_split(..., score_type = "normalized") Adaptive interval widths
Multi-class classification conformal_aps() Adaptive set sizes, well-calibrated
Classification with many classes conformal_raps() Regularized APS, smaller sets
Coverage must hold per subgroup conformal_mondrian() / conformal_mondrian_class() Group-conditional guarantees
Covariate shift between train/test conformal_weighted() Importance-weighted calibration
Sequential/online prediction conformal_aci()** Adapts to distribution drift

*Jackknife+ and CV+ have a theoretical coverage guarantee of 1-2α (Barber et al. 2021), weaker than split conformal’s 1-α. In practice, coverage is typically near 1-α. **ACI provides asymptotic (not finite-sample) coverage guarantees.

Further reading

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.