Welcome to leakR, an R package designed to help researchers, data scientists, and machine learning practitioners rigorously detect and diagnose data leakage in their workflows.
Data leakage is a pervasive yet often overlooked issue that undermines the integrity and reproducibility of predictive models by allowing unintended information to “leak” between training and testing phases. leakR provides a modular, extensible toolkit for detecting the most common and impactful forms of leakage, starting with tabular data contamination, target leakage, and temporal misalignments, while laying the foundation for a universal leakage detection framework across diverse data domains.
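To make two of these leakage types concrete, here is a minimal base-R sketch of what train/test contamination and target leakage can look like in a toy dataset. It is an illustration only and is not how leakR implements its detectors: the data, the column names (`x`, `y`, `leaky`), the helper `row_key()`, and the 0.95 correlation cut-off are all assumptions made up for this example.

```r
# Illustrative sketch in base R; NOT leakR's implementation.
set.seed(42)

# Toy data: `y` is the target, `x` is an ordinary feature, and `leaky`
# is (hypothetically) derived from the target itself.
n <- 100
df <- data.frame(x = rnorm(n), y = rnorm(n))
df$leaky <- df$y + rnorm(n, sd = 0.01)

# Split the rows, then deliberately copy 5 training rows into the test set.
train <- df[1:70, ]
test  <- rbind(df[71:100, ], train[1:5, ])

# Train/test contamination: count test rows that also appear in training.
# row_key() is a hypothetical helper that collapses each row into one string.
row_key <- function(d) do.call(paste, c(d, sep = "\r"))
n_shared <- sum(row_key(test) %in% row_key(train))

# Target leakage: flag features almost perfectly correlated with the target.
# The 0.95 threshold is an arbitrary choice for this example.
features   <- setdiff(names(train), "y")
target_cor <- sapply(train[features], function(col) abs(cor(col, train$y)))
suspicious <- names(target_cor)[target_cor > 0.95]

print(n_shared)    # number of test rows duplicated from the training set
print(suspicious)  # features flagged as possible target leakage
```

Checks like these catch only the simplest cases: exact duplicate rows and strong linear relationships with the target. Near-duplicates, non-linear relationships, and temporal misalignment need more careful handling.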
install.packages("leakr")

For the latest features and bug fixes:
# Install devtools if you don't have it
install.packages("devtools")
# Install leakR from GitHub
devtools::install_github("cherylisabella/leakR")

library(leakr)
# Basic audit of your dataset
report <- leakr_audit(iris, target = "Species")
# View summary of issues found
leakr_summarise(report)
# Generate diagnostic visualizations
leakr_plot(report)
# Access detailed results
print(report)

| Function | Purpose |
|---|---|
| leakr_audit() | Main auditing function - detects leakage across your dataset |
| leakr_summarise() | Generate human-readable summaries of detected issues |
| leakr_plot() | Create diagnostic visualizations highlighting problems |
| leakr_from_caret() | Import and audit caret workflow objects |
| leakr_from_tidymodels() | Import and audit tidymodels workflow objects |
| leakr_from_mlr3() | Import and audit mlr3 workflow objects |

Get started with the comprehensive vignettes:
# Getting started guide
vignette("getting-started", package = "leakr")
# Advanced detection techniques
vignette("advanced-detection", package = "leakr")
# Framework integration examples
vignette("framework-integration", package = "leakr")

If you use leakR in your research, please cite:
@Manual{leakr2025,
title = {leakR: Data Leakage Detection Tools for Machine Learning},
author = {Cheryl Isabella Lim},
year = {2025},
note = {R package version 0.1.0},
url = {https://github.com/cherylisabella/leakR},
}
This project is licensed under the MIT License - see the LICENSE file for details.
leakR is currently under development. Feedback and contributions from the community are welcome!