
leakR

Welcome to leakR, an R package designed to help researchers, data scientists, and machine learning practitioners rigorously detect and diagnose data leakage in their workflows.

Data leakage is a pervasive yet often overlooked issue that undermines the integrity and reproducibility of predictive models by allowing unintended information to “leak” between training and testing phases. leakR provides a modular, extensible toolkit for detecting the most common and impactful forms of leakage, starting with tabular data contamination, target leakage, and temporal misalignments, while laying the foundation for a universal leakage detection framework across diverse data domains.
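To make the three leakage types named above concrete, here is a minimal base R sketch on toy data. It does not use leakR and is not a description of how leakR detects leakage; the column names (`id`, `leaky`, `date`), the overlapping split, and the 0.99 correlation threshold are all assumptions chosen for illustration.

```r
# Illustrative only: base R checks on toy data, not leakR's implementation.
set.seed(42)

# Toy data. `leaky` is a hypothetical feature derived directly from the target.
n <- 100
df <- data.frame(
  id   = seq_len(n),
  x    = rnorm(n),
  date = as.Date("2024-01-01") + seq_len(n) - 1
)
df$y     <- 2 * df$x + rnorm(n)
df$leaky <- df$y + rnorm(n, sd = 0.01)

# A deliberately flawed split: rows 61 to 70 appear in both sets.
train <- df[1:70, ]
test  <- df[61:100, ]

# 1. Contamination: observations shared by the training and test sets.
shared_ids <- intersect(train$id, test$id)

# 2. Target leakage: features almost perfectly correlated with the target.
#    The 0.99 cutoff is an arbitrary assumption for this toy example.
features   <- c("x", "leaky")
target_cor <- sapply(features, function(f) abs(cor(train[[f]], train$y)))
suspect    <- names(target_cor)[target_cor > 0.99]

# 3. Temporal misalignment: test rows dated no later than the newest training row.
n_early_test <- sum(test$date <= max(train$date))

length(shared_ids)  # number of overlapping rows
suspect             # features flagged as possible target leaks
n_early_test        # test rows that are not strictly in the "future"
```

On real data, simple checks like these miss near-duplicates, nonlinear relationships, and leakage introduced during preprocessing, which is where a dedicated audit becomes useful.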

Installation

From CRAN

install.packages("leakr")

From GitHub (Development Version)

For the latest features and bug fixes:

# Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install leakR from GitHub
devtools::install_github("cherylisabella/leakR")

Quick Start

library(leakr)

# Basic audit of your dataset
report <- leakr_audit(iris, target = "Species")

# View summary of issues found
leakr_summarise(report)

# Generate diagnostic visualizations
leakr_plot(report)

# Access detailed results
print(report)

Main Functions

Function                  Purpose
leakr_audit()             Main auditing function; detects leakage across your dataset
leakr_summarise()         Generate human-readable summaries of detected issues
leakr_plot()              Create diagnostic visualizations highlighting problems
leakr_from_caret()        Import and audit caret workflow objects
leakr_from_tidymodels()   Import and audit tidymodels workflow objects
leakr_from_mlr3()         Import and audit mlr3 workflow objects

Learn More

Get started with the comprehensive vignettes:

# Getting started guide
vignette("getting-started", package = "leakr")

# Advanced detection techniques
vignette("advanced-detection", package = "leakr") 

# Framework integration examples
vignette("framework-integration", package = "leakr")

Why leakR?

What leakR Detects

Key Features

Development Roadmap

Citation

If you use leakR in your research, please cite:

@Manual{leakr2025,
  title = {leakR: Data Leakage Detection Tools for Machine Learning},
  author = {Cheryl Isabella Lim},
  year = {2025},
  note = {R package version 0.1.0},
  url = {https://github.com/cherylisabella/leakR},
}

License

This project is licensed under the MIT License; see the LICENSE file for details.

leakR is currently under development. Feedback and contributions are welcome from the community!
