R interface to the weightederm Python package: changepoint regression via Weighted Empirical Risk Minimization (WERM).

weightederm detects changepoints in ordered regression data by minimizing a weighted empirical risk. The method is described in the paper Inferring Change Points in Regression via Sample Weighting by Gabriel Arpino and Ramji Venkataramanan.
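A base-R sketch can make the changepoint-regression problem concrete. It shows the classical, unweighted single-changepoint least-squares search: try every split, fit ordinary least squares on each side, and keep the split with the smallest total residual sum of squares. This is a baseline for illustration and is not the WERM estimator. `brute_force_single_cp` and `min_seg` are hypothetical names that do not come from weightederm.

```r
# Classical single-changepoint least-squares search (NOT the WERM estimator).
# For every split t, fit OLS separately on rows 1..t and (t+1)..n and keep the
# split with the smallest total residual sum of squares.
# `min_seg` is a hypothetical minimum segment length; it must exceed ncol(X)
# so that each segment's OLS fit is well posed.
brute_force_single_cp <- function(X, y, min_seg = ncol(X) + 1L) {
  n <- nrow(X)
  stopifnot(length(y) == n, 2L * min_seg <= n)
  candidates <- seq(min_seg, n - min_seg)
  rss <- vapply(candidates, function(t) {
    left  <- seq_len(t)
    right <- (t + 1L):n
    sum(qr.resid(qr(X[left, , drop = FALSE]), y[left])^2) +
      sum(qr.resid(qr(X[right, , drop = FALSE]), y[right])^2)
  }, numeric(1))
  candidates[which.min(rss)]  # last row of the left segment (1-indexed)
}

# Toy data: the coefficients change after row 60
set.seed(0)
n <- 120L; p <- 5L; true_cp <- 60L
X <- matrix(rnorm(n * p), n, p)
beta_left  <- c( 2, 0, 0, -1.5, 0)
beta_right <- c(-1, 0, 0,  2.5, 0)
y <- c(X[1:true_cp, ] %*% beta_left,
       X[(true_cp + 1):n, ] %*% beta_right) + rnorm(n, sd = 0.2)
brute_force_single_cp(X, y)
```

Exhaustive search like this costs one pair of OLS fits per candidate split. That cost grows quickly with more changepoints, which is why a greedy-plus-refinement search is an attractive default.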
Six estimators are available:

| Loss | Fixed num_chgpts | Unknown num_chgpts via CV |
|---|---|---|
| Least squares | werm_least_squares() | werm_least_squares_cv() |
| Huber | werm_huber() | werm_huber_cv() |
| Logistic | werm_logistic() | werm_logistic_cv() |
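The three losses in the table can be written as pointwise functions of a residual or a linear score. The sketch below uses the textbook definitions. The Huber threshold 1.35 mirrors the epsilon default listed under the arguments. weightederm may scale or weight these losses differently; for example, Huber implementations often rescale residuals by a scale estimate first. The function names are hypothetical.

```r
# Textbook pointwise losses (illustrative; not taken from weightederm).
# r   = residual y - x'beta  (least squares, Huber)
# eta = linear score x'beta  (logistic, with 0/1 label y)
squared_loss <- function(r) 0.5 * r^2

huber_loss <- function(r, epsilon = 1.35) {
  # quadratic near zero, linear beyond |r| = epsilon (bounded outlier influence)
  ifelse(abs(r) <= epsilon, 0.5 * r^2, epsilon * abs(r) - 0.5 * epsilon^2)
}

logistic_loss <- function(y, eta) {
  # log(1 + exp(-eta)) written in an overflow-safe form
  softplus_neg <- pmax(-eta, 0) + log1p(exp(-abs(eta)))
  softplus_neg + (1 - y) * eta  # negative Bernoulli log-likelihood
}

huber_loss(c(0.5, 1, 3))  # returns c(0.125, 0.5, 3.13875)
```

The Huber loss matches the squared loss up to epsilon and grows only linearly after that. This is why the Huber estimators tolerate heavy-tailed noise better than least squares.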
This package calls Python via reticulate. You need the weightederm Python package:

pip install git+https://github.com/gabrielarpino/weightederm.git

Then install the R package:

install.packages("weightederm")

Alternatively, install it from GitHub with pak:

# install.packages("pak")
pak::pak("gabrielarpino/weightederm-r")

Or with remotes:

# install.packages("remotes")
remotes::install_github("gabrielarpino/weightederm-r")

By default, reticulate auto-discovers Python. If it picks the wrong interpreter, call weightederm_configure_python() before your first model fit:
# Option 1 — point to a specific Python binary
weightederm_configure_python(python = "/path/to/your/venv/bin/python")
# Option 2 — point to the Python binary inside a conda environment
weightederm_configure_python("/path/to/miniconda/envs/my_conda_env/bin/python")

You can also set the RETICULATE_PYTHON environment variable in your .Renviron file so you do not have to call this every session:
RETICULATE_PYTHON=/path/to/your/venv/bin/python
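You may want to confirm that reticulate sees the intended interpreter and can import the Python module before fitting. The following is a minimal sketch that assumes only the reticulate functions py_module_available() and py_config(). check_weightederm_python is a hypothetical helper name and is not part of weightederm.

```r
# Hypothetical helper: report whether the 'weightederm' Python module is
# importable and, if so, which Python reticulate is using.
# Run it AFTER any weightederm_configure_python() call: once reticulate has
# initialized Python, it generally stays bound to that interpreter for the session.
check_weightederm_python <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    message("The reticulate package is not installed.")
    return(invisible(FALSE))
  }
  ok <- reticulate::py_module_available("weightederm")
  if (ok) {
    print(reticulate::py_config())  # shows the interpreter path and version
  } else {
    message("Python module 'weightederm' not found; install it with pip.")
  }
  invisible(ok)
}

# check_weightederm_python()
```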
Fit a model with one known changepoint using least squares:

library(weightederm)
set.seed(0)
n <- 120L; p <- 20L; true_cp <- 60L
X <- matrix(rnorm(n * p), n, p)
beta_left <- rep(0, p); beta_left[c(1, 4)] <- c( 2.0, -1.5)
beta_right <- rep(0, p); beta_right[c(1, 4)] <- c(-1.0, 2.5)
y <- numeric(n)
y[1:true_cp] <- X[1:true_cp, ] %*% beta_left + rnorm(true_cp, sd = 0.2)
y[(true_cp+1):n] <- X[(true_cp+1):n, ] %*% beta_right + rnorm(n - true_cp, sd = 0.2)
fit <- werm_least_squares(X, y, num_chgpts = 1L, delta = 5L,
search_method = "efficient", fit_intercept = FALSE)
fit
#> WERM Changepoint Estimator (werm_least_squares)
#> Changepoints (1-indexed): 60
#> num_chgpts: 1 | num_signals: 2 | n_features_in: 20
#> Objective: ...

When the number of changepoints is unknown, the CV estimators select it by cross-validation:

set.seed(1)
n <- 180L; p <- 10L
X <- matrix(rnorm(n * p), n, p)
beta_1 <- rep(0, p); beta_1[1] <- 3.5
beta_2 <- rep(0, p); beta_2[1:2] <- c(-3.0, 3.0)
beta_3 <- rep(0, p); beta_3[1:3] <- c(2.5, -2.5, 2.5)
y <- numeric(n)
y[1:60] <- X[1:60, ] %*% beta_1 + rnorm(60, sd = 0.05)
y[61:120] <- X[61:120, ] %*% beta_2 + rnorm(60, sd = 0.05)
y[121:180] <- X[121:180, ] %*% beta_3 + rnorm(60, sd = 0.05)
fit <- werm_least_squares_cv(X, y, max_num_chgpts = 2L, delta = 5L, cv = 3L,
search_method = "efficient", fit_intercept = FALSE)
fit$best_num_chgpts # 2
fit$changepoints # near c(60, 120)
fit$cv_results       # data.frame of num_chgpts vs mean_test_score

For a binary response, use the logistic loss:

set.seed(2)
n <- 160L; p <- 12L; true_cp <- 80L
X <- matrix(rnorm(n * p), n, p)
beta_left <- rep(0, p); beta_left[c(1, 3)] <- c( 2.5, -2.0)
beta_right <- rep(0, p); beta_right[c(1, 3)] <- c(-2.5, 2.0)
eta <- numeric(n)
eta[1:true_cp] <- X[1:true_cp, ] %*% beta_left
eta[(true_cp+1):n] <- X[(true_cp+1):n, ] %*% beta_right
y <- rbinom(n, 1L, 1 / (1 + exp(-eta)))
fit <- werm_logistic(X, y, num_chgpts = 1L, delta = 5L,
search_method = "efficient", fit_intercept = FALSE,
max_iter = 300L)
fit$changepoints # near 80
fit$classes # c("0", "1")
# Predict probabilities on new data
X_new <- matrix(rnorm(10 * p), 10, p)
predict(fit, X_new, type = "prob") # 10 × 2 matrix
predict(fit, X_new, type = "class")  # character vector of labels

Main arguments:

| Argument | Description | Default |
|---|---|---|
| num_chgpts | Number of changepoints (fixed estimators) | required |
| max_num_chgpts | Largest number of changepoints considered in the CV search (CV estimators) | required |
| delta | Minimum gap between changepoints during search | 1L |
| search_method | "efficient" (greedy + local refinement) or "brute_force" | "efficient" |
| fit_intercept | Include a per-segment intercept | TRUE |
| cv | Number of interleaved CV folds (CV estimators) | 5L |
| penalty | "none", "l1", or "l2" | "none" ("l2" for logistic) |
| alpha | Penalty strength | 0.0 |
| epsilon | Huber transition parameter (Huber estimators) | 1.35 |
| max_iter | Maximum optimizer iterations (Huber / logistic) | 100L |
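For ordered data, "interleaved" folds usually means round-robin assignment, where observation i goes to fold ((i - 1) mod cv) + 1. The base-R sketch below shows that scheme. Treating it as what weightederm does internally is an assumption, and interleaved_folds is a hypothetical helper.

```r
# Hypothetical round-robin ("interleaved") fold assignment for ordered rows.
# Every fold spans the whole time range instead of one contiguous block.
interleaved_folds <- function(n, cv = 5L) {
  ((seq_len(n) - 1L) %% cv) + 1L
}

folds <- interleaved_folds(12L, cv = 3L)
folds                        # 1 2 3 1 2 3 1 2 3 1 2 3
split(seq_len(12L), folds)   # held-out row indices for each fold
```

Contiguous folds could hold out an entire segment, leaving the training data no observations from that regime. Interleaving keeps every segment represented in every fold.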
Every werm_*() call returns a named list with class
c("werm_<type>", "werm_fit"):
| Element | Description |
|---|---|
| changepoints | Integer vector, 1-indexed (R convention) |
| num_chgpts | Number of detected changepoints |
| num_signals | Number of segments (num_chgpts + 1) |
| objective | Minimized WERM objective value |
| signal_coefs | (num_signals × p) Stage-1 WERM coefficient matrix |
| signal_intercepts | Numeric vector or NULL |
| last_segment_coef | Coefficients used by predict() |
| last_segment_intercept | Numeric or NULL |
| n_features_in | Number of input features |
| classes | (logistic only) Character vector of length 2 |
CV estimators additionally expose best_num_chgpts,
best_score, cv_results (a data frame),
num_chgpts_grid, segment_bounds,
segment_coefs, and segment_intercepts.
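The 1-indexed changepoints can be turned into per-segment row ranges. The hypothetical helper below assumes that a changepoint t is the last row of the segment to its left. That matches the least-squares example, where the coefficients change after row 60 and the fit reports 60. It is not a weightederm function, and its output format is not claimed to match segment_bounds.

```r
# Hypothetical helper (not part of weightederm): convert sorted 1-indexed
# changepoints into a data frame of segment start/end rows for n observations.
# Assumes changepoint t is the last row of the segment to its left.
segments_from_changepoints <- function(changepoints, n) {
  cp <- sort(as.integer(changepoints))
  stopifnot(all(cp >= 1L & cp < n))
  data.frame(
    segment = seq_len(length(cp) + 1L),
    start   = c(1L, cp + 1L),
    end     = c(cp, as.integer(n))
  )
}

segments_from_changepoints(c(60L, 120L), n = 180L)
```

With a fitted model, a call such as segments_from_changepoints(fit$changepoints, nrow(X)) would give the row ranges under this assumed convention.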
The usual generic functions work on a fitted model:

print(fit)            # compact summary
summary(fit) # print + coefficient table
coef(fit) # last_segment_coef vector
predict(fit, X_new)   # numeric predictions (regression) or class labels / probabilities (logistic)

License: Apache License 2.0, the same license as the underlying Python package.