{SLmetrics} is a lightweight R package written in C++ and {Rcpp} for memory-efficient and lightning-fast machine learning performance evaluation; it’s like using a supercharged {yardstick} but without the risk of soft to super-hard deprecations. {SLmetrics} covers both regression and classification metrics and provides (almost) the same array of metrics as {scikit-learn} and {PyTorch}, all without {reticulate} and the Python compile-run-(crash)-debug cycle.
Depending on the mood and the alignment of the planets, {SLmetrics} stands for Supervised Learning metrics or Statistical Learning metrics. If {SLmetrics} catches on, the latter will be the core philosophy and the package will include unsupervised learning metrics. If not, it will remain a {pkg} for Supervised Learning metrics and a sandbox for me to develop my C++ skills.
Below you’ll find instructions to install {SLmetrics} and get started with your first metric, the Root Mean Squared Error (RMSE).
## install latest CRAN build
install.packages("SLmetrics")
Below is a minimal example demonstrating how to compute both unweighted and weighted RMSE.
library(SLmetrics)

## observed values, predictions
## and observation weights
actual    <- c(10.2, 12.5, 14.1)
predicted <- c(9.8, 11.5, 14.2)
weights   <- c(0.2, 0.5, 0.3)

cat(
  "Root Mean Squared Error", rmse(
    actual    = actual,
    predicted = predicted
  ),
  "Root Mean Squared Error (weighted)", weighted.rmse(
    actual    = actual,
    predicted = predicted,
    w         = weights
  ),
  sep = "\n"
)
#> Root Mean Squared Error
#> 0.6244998
#> Root Mean Squared Error (weighted)
#> 0.7314369
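To see where these two numbers come from, the sketch below recomputes them in base R, without {SLmetrics}. It assumes the usual definitions: RMSE as the square root of the mean squared error, and a weighted RMSE that divides the weighted sum of squared errors by the sum of the weights. Because the weights here sum to 1, this example alone cannot tell whether weighted.rmse() normalises by the weights; treat that part as an assumption.

```r
## base-R cross-check of the RMSE example
## (assumes the standard unweighted and weighted RMSE definitions)
actual    <- c(10.2, 12.5, 14.1)
predicted <- c(9.8, 11.5, 14.2)
weights   <- c(0.2, 0.5, 0.3)

## squared errors: 0.16, 1.00, 0.01
squared_error <- (actual - predicted)^2

## unweighted RMSE: square root of the mean squared error
rmse_manual <- sqrt(mean(squared_error))

## weighted RMSE: weighted mean of the squared errors, then square root
weighted_rmse_manual <- sqrt(sum(weights * squared_error) / sum(weights))

print(c(rmse = rmse_manual, weighted_rmse = weighted_rmse_manual))
```

Both values agree with the {SLmetrics} output above up to floating-point rounding.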
That’s all! Now you can explore the rest of this README for in-depth usage, performance comparisons, and more details about {SLmetrics}.
Machine learning can be a complicated task; the steps from feature engineering to model deployment require carefully measured actions and decisions. One low-hanging fruit to simplify this process is performance evaluation.
At its core, performance evaluation is essentially just comparing two vectors — a programmatically and, at times, mathematically trivial step in the machine learning pipeline, but one that can become complicated due to:
{SLmetrics} solves these issues by being built on C++ and {Rcpp}.

Performance evaluation should be plug-and-play and “just work” out of the box; there’s no need to worry about quasiquotation, dependencies, deprecations, or variations of the same functions relative to their arguments when using {SLmetrics}.
One, obviously, can’t build an R package on C++ and {Rcpp} without a proper pissing contest at the urinals. Below is a comparison of execution time and memory efficiency in two simple cases that any {pkg} should be able to handle gracefully: computing a 2 x 2 confusion matrix and computing the RMSE¹.
As shown in the chart, {SLmetrics} maintains consistently low(er) execution times across different sample sizes.
Below are the results for garbage collections and total memory allocations when computing a 2 x 2 confusion matrix (N = 1e7) and the RMSE (N = 1e7)². Notice that {SLmetrics} requires no GC calls for these operations.
Package | Iterations | Garbage Collections [gc()] | gc() per second | Memory Allocation (MB)
---|---|---|---|---
{SLmetrics} | 100 | 0 | 0.00 | 0
{yardstick} | 100 | 190 | 4.44 | 381
{MLmetrics} | 100 | 186 | 4.50 | 381
{mlr3measures} | 100 | 371 | 3.93 | 916

2 x 2 Confusion Matrix (N = 1e7)
Package | Iterations | Garbage Collections [gc()] | gc() per second | Memory Allocation (MB)
---|---|---|---|---
{SLmetrics} | 100 | 0 | 0.00 | 0
{yardstick} | 100 | 149 | 4.30 | 420
{MLmetrics} | 100 | 15 | 2.00 | 76
{mlr3measures} | 100 | 12 | 1.29 | 76

RMSE (N = 1e7)
In both tasks, {SLmetrics} remains extremely memory-efficient, even at large sample sizes.
[!IMPORTANT]
From the {bench} documentation: “Total amount of memory allocated by R while running the expression. Memory allocated outside the R heap, e.g. by malloc() or new directly is not tracked, take care to avoid misinterpreting the results if running code that may do this.”
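For context on the allocation columns: a pure-R implementation of, say, RMSE creates intermediate vectors as long as its inputs, and those live on the R heap, where {bench} does count them. The base-R sketch below shows the size of one such temporary for N = 1e7 doubles. It only illustrates why R-level code allocates; it is not a reproduction of the benchmark and says nothing about how any of the compared packages are implemented internally.

```r
## size of a single double-precision temporary at N = 1e7
n <- 1e7
actual    <- rep(1, n)
predicted <- rep(0.5, n)

## `actual - predicted` materialises a new length-n vector on the R heap
residual <- actual - predicted
temp_mb  <- as.numeric(object.size(residual)) / 1024^2

## 8 bytes per double plus a small header: a little over 76 MB
cat(sprintf("one length-%d double vector: %.1f MB\n", as.integer(n), temp_mb))

## the usual vectorised RMSE in R; `residual^2` allocates one more temporary
rmse_r <- sqrt(mean(residual^2))
```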
In its simplest form, {SLmetrics} functions work directly with pairs of <numeric> vectors (for regression) or <factor> vectors (for classification). Below we demonstrate this on two well-known datasets, mtcars (regression) and iris (classification).
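Because classification metrics compare two <factor> vectors, it helps if both share the same levels in the same order, as the iris example does with levels = c(1,0). The sketch below uses base R's table(), not {SLmetrics}, to show why: with explicit shared levels, a class that is never predicted still gets its own all-zero column, so the confusion matrix keeps its 2 x 2 shape. The toy vectors are made up for illustration.

```r
## shared, explicitly ordered levels for both vectors
class_levels <- c("Virginica", "Others")

actual    <- factor(c("Virginica", "Others", "Others", "Virginica"), levels = class_levels)
predicted <- factor(c("Others", "Others", "Others", "Others"), levels = class_levels)

## rows = actual, columns = predicted; "Virginica" is never predicted,
## but its column is kept (filled with zeros) because the level exists
confusion <- table(actual = actual, predicted = predicted)
print(confusion)
```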
We first fit a linear model to predict mpg in the mtcars dataset, then compute the in-sample RMSE:
## Evaluate a linear model on mpg (mtcars)
model <- lm(mpg ~ ., data = mtcars)
rmse(mtcars$mpg, fitted(model))
#> [1] 2.146905
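As a sanity check, the same in-sample RMSE can be computed in base R from the model residuals, since for lm() the residuals are exactly mpg minus the fitted values. This sketch does not use {SLmetrics}; it should agree with the rmse() output above.

```r
## in-sample RMSE of the mtcars model, base R only
model <- lm(mpg ~ ., data = mtcars)

## residuals(model) equals mtcars$mpg - fitted(model)
rmse_base <- sqrt(mean(residuals(model)^2))
print(rmse_base)
```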
Now we recode the iris dataset into a binary problem (“virginica” vs. “others”) and fit a logistic regression. Then we generate predicted classes, compute the confusion matrix and summarize it.
## 1) recode iris
## to binary problem
iris$species_num <- as.numeric(
  iris$Species == "virginica"
)

## 2) fit the logistic
## regression
model <- glm(
  formula = species_num ~ Sepal.Length + Sepal.Width,
  data    = iris,
  family  = binomial(
    link = "logit"
  )
)

## 3) generate predicted
## classes
predicted <- factor(
  as.numeric(
    predict(model, type = "response") > 0.5
  ),
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

## 4) generate actual
## values as factor
actual <- factor(
  x      = iris$species_num,
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

## 5) generate
## confusion matrix
summary(
  confusion_matrix <- cmatrix(
    actual    = actual,
    predicted = predicted
  )
)
#> Confusion Matrix (2 x 2)
#> ================================================================================
#>           Virginica Others
#> Virginica        35     15
#> Others           14     86
#> ================================================================================
#> Overall Statistics (micro average)
#>  - Accuracy:          0.81
#>  - Balanced Accuracy: 0.78
#>  - Sensitivity:       0.81
#>  - Specificity:       0.81
#>  - Precision:         0.81
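The summary statistics follow directly from the four counts. The base-R sketch below takes the counts printed above, reading rows as actual classes and columns as predicted ones (the Virginica row sums to 50, the number of virginica flowers in iris), and recomputes each statistic. For a binary problem, the micro-averaged sensitivity, specificity and precision all pool to the same ratio as accuracy, which is consistent with all of them printing as 0.81. This is a hand calculation, not the {SLmetrics} code path.

```r
## confusion matrix copied from the output above
## (rows = actual, columns = predicted)
cm <- matrix(
  c(35, 15,
    14, 86),
  nrow     = 2,
  byrow    = TRUE,
  dimnames = list(
    actual    = c("Virginica", "Others"),
    predicted = c("Virginica", "Others")
  )
)

## overall accuracy: correct predictions over all predictions
accuracy <- sum(diag(cm)) / sum(cm)

## per-class recall (sensitivity), then its unweighted mean
recall_per_class  <- diag(cm) / rowSums(cm)
balanced_accuracy <- mean(recall_per_class)

## micro averages pool TP / FP / FN / TN over both classes
tp <- diag(cm)
fp <- colSums(cm) - tp
fn <- rowSums(cm) - tp
tn <- sum(cm) - tp - fp - fn

micro_sensitivity <- sum(tp) / sum(tp + fn)
micro_specificity <- sum(tn) / sum(tn + fp)
micro_precision   <- sum(tp) / sum(tp + fp)

print(round(c(
  accuracy          = accuracy,
  balanced_accuracy = balanced_accuracy,
  sensitivity       = micro_sensitivity,
  specificity       = micro_specificity,
  precision         = micro_precision
), 2))
```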
[!IMPORTANT]
OpenMP support in {SLmetrics} is experimental. Use it with caution, as performance gains and stability may vary based on your system configuration and workload.
You can control OpenMP usage within {SLmetrics} using the openmp.on() and openmp.off() functions. Below are examples demonstrating how to enable and disable OpenMP:
## enable OpenMP
SLmetrics::openmp.on()
#> OpenMP enabled!

## disable OpenMP
SLmetrics::openmp.off()
#> OpenMP disabled!
To illustrate the impact of OpenMP on performance, consider the following benchmarks for calculating entropy on a 1,000,000 x 200 matrix over 100 iterations³.
Iterations | Runtime (sec) | Garbage Collections [gc()] | gc() per second | Memory Allocation (MB)
---|---|---|---|---
100 | 0.86 | 0 | 0 | 0

1e6 x 200 matrix without OpenMP
Iterations | Runtime (sec) | Garbage Collections [gc()] | gc() per second | Memory Allocation (MB)
---|---|---|---|---
100 | 0.15 | 0 | 0 | 0

1e6 x 200 matrix with OpenMP
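The benchmark does not spell out exactly which entropy is computed. Assuming the common case of a matrix whose rows are class-probability distributions (for example, a classifier's predicted probabilities), row-wise Shannon entropy is H = -Σ p log p per row. The minimal base-R sketch below computes that on a small hypothetical matrix; it is an illustration of the quantity, not the {SLmetrics} implementation.

```r
## hypothetical 3 x 2 probability matrix: each row sums to 1
prob <- matrix(
  c(0.5, 0.5,
    0.9, 0.1,
    1.0, 0.0),
  ncol  = 2,
  byrow = TRUE
)

## row-wise Shannon entropy in nats, H = -sum(p * log(p)),
## using the convention 0 * log(0) = 0
row_entropy <- function(p) {
  terms <- ifelse(p > 0, p * log(p), 0)
  -rowSums(terms)
}

entropy <- row_entropy(prob)
print(entropy)
```

Each row is independent of the others, which is why this kind of computation parallelises naturally across rows.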
## install github release
pak::pak(
  pkg = "serkor1/SLmetrics@*release",
  ask = FALSE
)

## install nightly build
pak::pak(
  pkg = "serkor1/SLmetrics",
  ask = FALSE
)
Please note that the {SLmetrics} project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.