Automated data quality checks for recurring dataset deliveries.
For each new file arrival, dqcheckr runs a battery of
quality checks, compares the file to the previous delivery, writes a
self-contained HTML report, and records summary statistics in a local
SQLite database so that quality trends can be tracked over time.
Supports CSV and fixed-width formats. Custom organisation-specific
checks can be supplied as plain R files.
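The idea of comparing a new file with the previous delivery can be sketched in base R. This is only an illustration of the concept, not dqcheckr's implementation: the function `compare_deliveries` and the fields it returns are hypothetical names chosen for this example.

```r
# Illustrative only: compare a new delivery against the previous one.
# `compare_deliveries` and its output fields are hypothetical names,
# not part of dqcheckr.
compare_deliveries <- function(previous, current) {
  stopifnot(is.data.frame(previous), is.data.frame(current))
  prev_rows <- nrow(previous)
  curr_rows <- nrow(current)
  list(
    # Percentage change in row count relative to the previous delivery
    row_change_pct = if (prev_rows > 0) {
      100 * (curr_rows - prev_rows) / prev_rows
    } else {
      NA_real_
    },
    # Columns that appeared or disappeared between deliveries
    added_columns   = setdiff(names(current), names(previous)),
    removed_columns = setdiff(names(previous), names(current))
  )
}

# Two small made-up deliveries
previous <- data.frame(id = 1:4, balance = c(10, 20, 30, 40))
current  <- data.frame(id = 1:5, balance = c(10, 20, 30, 40, 50),
                       region = c("N", "S", "E", "W", "N"))

delta <- compare_deliveries(previous, current)
str(delta)
```

A real check would likely also compare column types and summary statistics; the sketch stops at row counts and column names to stay short.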
This is a CLI/API package with no UI. If you’d rather configure and run checks without writing R code, see dqcheckrGUI, a Shiny front-end built on top of this package.
install.packages("dqcheckr")
# or, the development version from GitHub
devtools::install_github("mickmioduszewski/dqcheckr")

A data officer runs a single command for each arriving dataset:
library(dqcheckr)
run_dq_check("customer_accounts", config_dir = "path/to/configs")

This prints a one-line console summary, writes an HTML report, and
returns list(status, report_path, snapshot_id) invisibly.
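Because the result is returned invisibly, it is not printed at the console and has to be assigned to a variable before it can be inspected. The sketch below assumes the list elements are named `status`, `report_path` and `snapshot_id`, as the signature above suggests. It uses a stand-in list with placeholder values so that it runs without the package or any configuration files; `describe_run` is a hypothetical helper, not a dqcheckr function.

```r
# In a real session the result would come from the package, e.g.:
# res <- run_dq_check("customer_accounts", config_dir = "path/to/configs")

# Stand-in for that return value; every value below is a placeholder,
# not real dqcheckr output.
res <- list(
  status      = "PASS",                            # placeholder status
  report_path = "reports/customer_accounts.html",  # placeholder path
  snapshot_id = 1L                                 # placeholder id
)

# Hypothetical helper: format a one-line log entry from the result.
describe_run <- function(res) {
  stopifnot(all(c("status", "report_path", "snapshot_id") %in% names(res)))
  sprintf("status=%s snapshot=%s report=%s",
          res$status, res$snapshot_id, res$report_path)
}

cat(describe_run(res), "\n")
```

Capturing the result this way makes it straightforward to log each run or to open the report from a script.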
Two YAML files control every run: a global dqcheckr.yml
(default thresholds shared across datasets) and a per-dataset
<dataset_name>.yml (file location, expected columns,
column-level rules and overrides).
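The general pattern of shared defaults plus per-dataset overrides can be sketched with base R lists. The threshold names below are hypothetical and are not taken from dqcheckr's configuration schema, and the package may resolve its settings differently; see the vignette for the real keys.

```r
# Hypothetical global defaults, standing in for values from dqcheckr.yml.
global_defaults <- list(
  max_missing_pct    = 5,
  max_row_change_pct = 10
)

# Hypothetical per-dataset overrides, standing in for <dataset_name>.yml.
# Only the settings that differ from the defaults are listed.
dataset_overrides <- list(
  max_missing_pct = 1
)

# modifyList() keeps every default and replaces the ones the dataset overrides.
effective <- utils::modifyList(global_defaults, dataset_overrides)
str(effective)
```

Keeping only the differences in the per-dataset settings means a change to a shared default applies to every dataset that has not overridden it.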
See vignette("dqcheckr") for a full walkthrough of
configuration and the available checks, or the package
documentation site.
License: MIT