
Getting Started with ml

library(ml)

Overview

The ml package implements the split-fit-evaluate-assess workflow from Hastie, Tibshirani, and Friedman (2009), Chapter 7. The key idea: keep a held-out test set sacred until you are done experimenting, then assess once.

Formula interfaces are not supported. Pass the data frame and target column name as a string: ml_fit(data, "target", seed = 42).
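Putting the pieces together before walking through them one at a time, a complete pass might look like the sketch below (it uses only the functions introduced in the steps that follow; as emphasized throughout, ml_assess should be called exactly once):

s       <- ml_split(iris, "Species", seed = 42)                           # three-way split
model   <- ml_fit(s$train, "Species", algorithm = "logistic", seed = 42)  # fit on train
metrics <- ml_evaluate(model, s$valid)                                    # iterate here freely
verdict <- ml_assess(model, test = s$test)                                # final, one-time check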

Step 1: Profile your data

Before modeling, understand what you have:

prof <- ml_profile(iris, "Species")
prof

Step 2: Split into train/valid/test

Three-way split (60/20/20). Stratified by default for classification.

s <- ml_split(iris, "Species", seed = 42)
s

Access partitions with $train, $valid, $test. The $dev element combines train and valid for final retraining.
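A typical use of $dev is refitting the chosen model on train + valid once model selection is finished, just before the one-time test assessment (a sketch; it assumes ml_fit accepts any of the split's data frames, which all share the same columns):

# After selecting a model on s$valid, refit on the combined train + valid data
final   <- ml_fit(s$dev, "Species", algorithm = "logistic", seed = 42)
verdict <- ml_assess(final, test = s$test)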

Step 3: Screen algorithms

Find candidates quickly before tuning:

lb <- ml_screen(s, "Species", seed = 42)
lb

Step 4: Fit and evaluate

Iterate freely on the validation set:

model <- ml_fit(s$train, "Species", algorithm = "logistic", seed = 42)
model

metrics <- ml_evaluate(model, s$valid)
metrics

Step 5: Explain feature importance

exp <- ml_explain(model)
exp

Step 6: Validate against rules

Gate your model before final assessment:

gate <- ml_validate(model,
                    test  = s$test,
                    rules = list(accuracy = ">0.70"))
gate
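A gate is most useful when it conditions the final assessment. The sketch below assumes the object returned by ml_validate exposes a logical pass/fail field; gate$passed is a hypothetical accessor, so check str(gate) for the actual name in your version:

# NOTE: `gate$passed` is a hypothetical field name used for illustration.
if (isTRUE(gate$passed)) {
  verdict <- ml_assess(model, test = s$test)   # one-time final assessment
} else {
  message("Gate failed; keep iterating on the validation set.")
}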

Step 7: Assess on test data (once)

The final exam. Call this only when done experimenting.

verdict <- ml_assess(model, test = s$test)
verdict

Step 8: Save and load

path <- file.path(tempdir(), "iris_model.mlr")
ml_save(model, path)
loaded <- ml_load(path)
predict(loaded, s$valid)[1:5]

Module-style interface

All functions are also available via the ml$verb() pattern, which mirrors Python’s import ml; ml.fit(...):

# Identical results — pick the style you prefer
m2 <- ml$fit(s$train, "Species", algorithm = "logistic", seed = 42)
identical(predict(model, s$valid), predict(m2, s$valid))

Regression example

The same workflow applies to regression:

s2   <- ml_split(mtcars, "mpg", seed = 42)
m_rf <- ml_fit(s2$train, "mpg", seed = 42)
ml_evaluate(m_rf, s2$valid)

Available algorithms

ml_algorithms()
Algorithm          Classification  Regression  Package
“logistic”         yes                         base R (‘nnet’)
“xgboost”          yes             yes         ‘xgboost’
“random_forest”    yes             yes         ‘ranger’
“linear” (Ridge)                   yes         ‘glmnet’
“elastic_net”                      yes         ‘glmnet’
“svm”              yes             yes         ‘e1071’
“knn”              yes             yes         ‘kknn’
“naive_bayes”      yes                         ‘naivebayes’

LightGBM support is planned for v1.1.
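Any algorithm in the table can be selected by name through the algorithm argument shown earlier (a sketch using a name from the table; the corresponding backend package must be installed):

# Gradient-boosted trees via the 'xgboost' backend
m_xgb <- ml_fit(s$train, "Species", algorithm = "xgboost", seed = 42)
ml_evaluate(m_xgb, s$valid)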
