
detector

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

detector makes detecting data containing Personally Identifiable Information (PII) quick, easy, and scalable. It provides high-level functions that can take vectors and data.frames and return important summary statistics in a convenient data.frame. Once complete, detector will be able to detect the following types of PII:

State of the Union

Complete!

Needs more work…

Haven’t even started :(

Installation

You can install:
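
A minimal sketch of installing the released version, assuming the package is available from CRAN under the name `detector`:

```r
# Install detector from CRAN only if it is not already present.
# Assumes a configured CRAN mirror and network access.
if (!requireNamespace("detector", quietly = TRUE)) {
  install.packages("detector")
}
```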

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

API

Generate data containing fake PII

library(dplyr, warn.conflicts = FALSE)
library(generator)
n <- 6
ashley_madison <- 
  data.frame(name = r_full_names(n), 
             email = r_email_addresses(n), 
             phone_number = r_phone_numbers(n, use_hyphens = TRUE, 
                                            use_spaces = TRUE), 
             stringsAsFactors = FALSE)
ashley_madison %>% 
  knitr::kable(format = "markdown")

| name                 | email | phone_number   |
|:---------------------|:------|:---------------|
| Leonardo Rodriguez   |       | 254- 851- 6814 |
| Dee Rice             |       | 597- 978- 5193 |
| Conception Marquardt |       | 184- 962- 8153 |
| Collette Nitzsche    |       | 475- 723- 2947 |
| Norman Pfannerstill  |       | 153- 674- 4219 |
| Katelin Gislason     |       | 831- 847- 1568 |

Detect data containing PII

library(detector)
ashley_madison %>% 
  detect %>% 
  knitr::kable(format = "markdown")
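
| column_name  | has_email_addresses | has_phone_numbers | has_national_identification_numbers |
|:-------------|:--------------------|:------------------|:------------------------------------|
| name         | FALSE               | FALSE             | FALSE                               |
| email        | TRUE                | FALSE             | FALSE                               |
| phone_number | FALSE               | TRUE              | FALSE                               |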
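
To make the shape of this result concrete, here is a rough base-R sketch of column-wise pattern matching that yields a similar summary. It is not detector's implementation: `has_match`, `sketch_detect`, both regular expressions, and the `toy` data (including the example.com addresses) are hypothetical and deliberately simplistic.

```r
# Hypothetical helpers for illustration only; these are not detector's own code.
# Simplified patterns: real e-mail and phone formats are far more varied.
email_pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
phone_pattern <- "[0-9]{3}[- ]*[0-9]{3}[- ]*[0-9]{4}"

# TRUE if at least one element of x matches the pattern
has_match <- function(x, pattern) {
  any(grepl(pattern, as.character(x)))
}

# One row per column of df, one logical flag per pattern
sketch_detect <- function(df) {
  data.frame(
    column_name = names(df),
    has_email_addresses = vapply(df, has_match, logical(1),
                                 pattern = email_pattern),
    has_phone_numbers = vapply(df, has_match, logical(1),
                               pattern = phone_pattern),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy input; the e-mail addresses are made up for this example
toy <- data.frame(
  name = c("Dee Rice", "Norman Pfannerstill"),
  email = c("dee@example.com", "norman@example.org"),
  phone_number = c("597- 978- 5193", "153- 674- 4219"),
  stringsAsFactors = FALSE
)

result <- sketch_detect(toy)
print(result)
```

A column is flagged as soon as any single value matches, which is a design choice: it favours catching PII over avoiding false positives.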
