A grammar of machine learning workflows for R.
Split, fit, evaluate, assess — four verbs that encode the workflow
from Hastie, Tibshirani & Friedman (The Elements of Statistical
Learning, Ch. 7). The evaluate/assess boundary makes data leakage
inexpressible: ml_evaluate() runs on validation data and
can be called freely; ml_assess() runs on held-out test
data and locks after one use.
```r
# Install from GitHub (current)
remotes::install_github("epagogy/ml", subdir = "r")

# install.packages("ml")
# CRAN submission is under review — the line above will work once accepted.
```

Requires R >= 4.1.0. Optional backends: ‘xgboost’, ‘ranger’, ‘glmnet’, ‘kknn’, ‘e1071’, ‘naivebayes’, ‘rpart’.
```r
library(ml)

s <- ml_split(iris, "Species", seed = 42)
model <- ml_fit(s$train, "Species", seed = 42)
ml_evaluate(model, s$valid)      # check performance, tweak, repeat

final <- ml_fit(s$dev, "Species", seed = 42)
ml_assess(final, test = s$test)  # final exam — second call errors
```

`s$dev` is train + valid combined, used for the final refit before assessment. This three-way split (train 60 / valid 20 / test 20) with a `$dev` convenience accessor follows the textbook protocol exactly.
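As a quick sanity check, the split proportions can be inspected directly. This is a sketch that assumes each component of the split object is a plain data frame:

```r
library(ml)

s <- ml_split(iris, "Species", seed = 42)

# Stratified 60/20/20 over the 150 rows of iris
nrow(s$train)  # should be ~90  (60% of 150)
nrow(s$valid)  # should be ~30  (20%)
nrow(s$test)   # should be ~30  (20%)
nrow(s$dev)    # train + valid, so ~120
```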
| Function | Role |
|---|---|
| `ml_split()` | Stratified three-way split → `$train`, `$valid`, `$test`, `$dev` |
| `ml_fit()` | Train a model (per-fold preprocessing, deterministic seeding) |
| `ml_evaluate()` | Validation metrics — repeat freely |
| `ml_assess()` | Test metrics — once, final, locks after use |
These four are the grammar. Everything else extends it:

| Function | Role |
|---|---|
| `ml_screen()` | Algorithm leaderboard |
| `ml_tune()` | Hyperparameter search |
| `ml_stack()` | OOF ensemble stacking |
| `ml_predict()` | Class labels or probabilities |
| `ml_explain()` | Feature importance |
| `ml_compare()` | Side-by-side model comparison |
| `ml_validate()` | Pass/fail deployment gate |
| `ml_drift()` | Distribution shift detection (KS, chi-squared) |
| `ml_calibrate()` | Probability calibration (Platt, isotonic) |
| `ml_profile()` | Dataset summary |
| `ml_save()` / `ml_load()` | Serialize to `.mlr` |
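A typical extended session chains these helpers around the core verbs. This is a sketch: only the function names and the arguments shown in the quick start come from the package's documented surface, and any further arguments are assumptions:

```r
library(ml)

s <- ml_split(iris, "Species", seed = 42)

# Rank candidate algorithms before committing to one
board <- ml_screen(s$train, "Species", seed = 42)

# Fit, inspect, persist, and reuse a model
model <- ml_fit(s$train, "Species", seed = 42)
ml_explain(model)                      # feature importance
ml_save(model, "species_model.mlr")    # serialize to .mlr
model2 <- ml_load("species_model.mlr")
ml_predict(model2, s$valid)            # class labels or probabilities
```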
13 algorithm families. `engine = "auto"` uses the Rust backend when available; `engine = "r"` forces the R package backend.
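For example (a sketch; the `engine` argument is taken from the description above, and its exact placement in the call is an assumption):

```r
library(ml)

s <- ml_split(iris, "Species", seed = 42)

# Portable default: uses Rust when present, falls back to R packages otherwise
m_auto <- ml_fit(s$train, "Species", engine = "auto", seed = 42)

# Force the R package backends (ranger, rpart, ...) regardless of Rust
m_r <- ml_fit(s$train, "Species", engine = "r", seed = 42)
```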
| Algorithm | String | Clf | Reg | Backend |
|---|---|---|---|---|
| Logistic | "logistic" | Y | | nnet |
| Decision Tree | "decision_tree" | Y | Y | rpart |
| Random Forest | "random_forest" | Y | Y | ranger |
| Extra Trees | "extra_trees" | Y | Y | Rust |
| Gradient Boosting | "gradient_boosting" | Y | Y | Rust |
| XGBoost | "xgboost" | Y | Y | xgboost |
| Ridge | "linear" | | Y | glmnet |
| Elastic Net | "elastic_net" | | Y | glmnet |
| SVM | "svm" | Y | Y | e1071 |
| KNN | "knn" | Y | Y | kknn |
| Naive Bayes | "naive_bayes" | Y | | naivebayes |
| AdaBoost | "adaboost" | Y | | Rust |
| Hist. Gradient Boosting | "histgradient" | Y | Y | Rust |
Seeds. `seed = NULL` auto-generates a seed and stores it on the result for reproducibility. `seed = 42` gives full deterministic control.
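Under these rules, two fits with the same explicit seed reproduce each other, and an auto-generated seed can be replayed from the stored result. A sketch; the `$seed` field name used to read the stored seed back is an assumption:

```r
library(ml)

s <- ml_split(iris, "Species", seed = 42)

# Explicit seed: rerunning the fit reproduces the same model
m1 <- ml_fit(s$train, "Species", seed = 42)
m2 <- ml_fit(s$train, "Species", seed = 42)

# Auto-generated seed: stored on the result so the run can be replayed
m3 <- ml_fit(s$train, "Species")                  # seed = NULL default
m4 <- ml_fit(s$train, "Species", seed = m3$seed)  # assumed accessor
```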
Per-fold preprocessing. Scaling and encoding fit on training folds only, never on validation or test. No information leaks across the split boundary.
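The principle is the same one you would apply by hand in base R: estimate preprocessing statistics on the training rows only, then apply them unchanged to validation and test rows. A self-contained base R illustration of the idea (not the package's internals):

```r
# Per-fold scaling by hand: fit the scaler on training rows only
set.seed(1)
idx   <- sample(nrow(iris), 0.8 * nrow(iris))
train <- iris[idx, ]
valid <- iris[-idx, ]

mu   <- mean(train$Sepal.Length)
sdev <- sd(train$Sepal.Length)

# Apply the *training* statistics to both folds; never refit on valid/test
train$Sepal.Length <- (train$Sepal.Length - mu) / sdev
valid$Sepal.Length <- (valid$Sepal.Length - mu) / sdev
```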
Error messages. Wrong column name? `ml_fit()` tells you what columns exist. Wrong algorithm string? It lists the valid ones. Errors aim to point you at the fix.
Roth, S. (2026). A Grammar of Machine Learning Workflows.
doi:10.5281/zenodo.19023838
MIT. Simon Roth, 2026.