Out of the box, deident features a set of
transformations to aid in the de-identification of data sets. Each
transformation is implemented via R6Class and extends
BaseDeident. User-defined transformations can be
implemented in a similar manner.
To demonstrate the different transformations we supply a toy data set,
df, comprising 26 observations of three variables:
X if B <= 13,
Y if B > 13
Apply a cached random replacement cipher. Repeated occurrences of the same key receive the same replacement.
Implemented deident options:
deident(df, "psudonymize", A)
deident(df, "Pseudonymizer", A)
deident(df, Pseudonymizer, A)
deident(df, Pseudonymizer$new(), A)
psu <- Pseudonymizer$new()
deident(df, psu, A)
By default, Pseudonymizer replaces values in variables
with a random alphanumeric string of 5 characters. This can be changed
by calling set_method on an instantiated Pseudonymizer
with the desired function:
psu <- Pseudonymizer$new()
new_method <- function(key, ...){
  # Ignore the key and return a random string of 12 lowercase letters
  paste(sample(letters, 12, replace = TRUE), collapse = "")
}
psu$set_method(new_method)
deident(df, psu, A)
#> DeidentList
#> 1 step(s) implemented
#> Step 1 : 'Pseudonymizer' on variable(s) A
#> For data:
#> columns: A, B, C
The first argument to the method receives the key to be transformed.
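The caching behaviour described above can be illustrated with a small base R sketch. This is not the package's implementation: make_cipher and its n_chars argument are hypothetical names used only for illustration, and the sketch makes no attempt to prevent two different keys from drawing the same random string.

```r
# Minimal sketch of a cached random replacement cipher (base R only).
# make_cipher() and n_chars are hypothetical names, not part of deident.
make_cipher <- function(n_chars = 5) {
  cache <- new.env(parent = emptyenv())  # key -> replacement lookup
  alphabet <- c(letters, LETTERS, 0:9)
  function(keys) {
    vapply(as.character(keys), function(key) {
      if (!exists(key, envir = cache, inherits = FALSE)) {
        # First time this key is seen: draw a random string and store it
        replacement <- paste(sample(alphabet, n_chars, replace = TRUE),
                             collapse = "")
        assign(key, replacement, envir = cache)
      }
      get(key, envir = cache, inherits = FALSE)
    }, character(1), USE.NAMES = FALSE)
  }
}

cipher <- make_cipher()
out <- cipher(c("a", "b", "a"))
out[1] == out[3]  # TRUE: the repeated key maps to the same replacement
```

Because the lookup lives in the closure, calling cipher again on another column reuses the same mapping, which is the property that makes a cached cipher useful across several variables.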
Apply cryptographic hashing to a variable.
Implemented deident options:
deident(df, "encrypt", A)
deident(df, "Encrypter", A)
deident(df, Encrypter, A)
deident(df, Encrypter$new(), A)
encrypt <- Encrypter$new()
deident(df, encrypt, A)
At initialization, Encrypter can be given
hash_key and seed values to control the
hashing. Users are advised to set these values and
not to disclose them.
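To see why a private key matters, the following sketch hashes values with a keyed SHA-256 (HMAC). It assumes the digest package is installed and uses its hmac function; keyed_hash is a hypothetical helper, and nothing here is claimed to match how Encrypter works internally or how it uses seed.

```r
library(digest)  # assumption: the digest package is installed

# Hypothetical helper: keyed SHA-256 hash (HMAC) of each value.
keyed_hash <- function(values, key) {
  vapply(as.character(values),
         function(v) hmac(key = key, object = v, algo = "sha256"),
         character(1), USE.NAMES = FALSE)
}

h1 <- keyed_hash(c("alice", "bob", "alice"), key = "secret-1")
h2 <- keyed_hash(c("alice", "bob", "alice"), key = "secret-2")

h1[1] == h1[3]  # TRUE: same value and same key give the same hash
h1[1] == h2[1]  # FALSE: a different key gives a different hash
```

Without the key, someone who knows the set of possible values cannot simply hash them all and match the results, which is the reason for keeping the key private.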
Apply Gaussian white noise to a numeric variable.
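As a rough illustration of the idea, zero-mean Gaussian noise can be added to a numeric vector in base R as follows. The helper add_noise and its sd default are assumptions made for this sketch and are not taken from the package.

```r
# Hypothetical helper: add zero-mean Gaussian noise to a numeric vector.
add_noise <- function(x, sd = 0.1) {
  stopifnot(is.numeric(x))
  x + rnorm(length(x), mean = 0, sd = sd)
}

set.seed(1)  # only to make the sketch reproducible
b <- 1:26
noisy <- add_noise(b, sd = 0.5)
head(cbind(original = b, noisy = noisy))
```

A larger sd hides individual values more thoroughly but distorts summary statistics more, so the choice depends on how the perturbed data will be used.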
Implemented deident options:
Aggregate categorical values dependent on a user supplied list. The
list must be supplied to Blur at initialization.
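One way to picture list-based aggregation is a lookup from each original value to the name of the group that contains it. The sketch below uses base R only; blur_categories, the example groups list, and the choice to leave unlisted values unchanged are all assumptions for illustration rather than the package's documented behaviour.

```r
# Hypothetical helper: map each value to the name of the group containing it.
blur_categories <- function(x, groups) {
  # Build a named lookup vector: original value -> group name
  lookup <- setNames(rep(names(groups), lengths(groups)),
                     unlist(groups, use.names = FALSE))
  out <- unname(lookup[as.character(x)])
  # Values not covered by the list are left unchanged (a choice made here)
  ifelse(is.na(out), as.character(x), out)
}

groups <- list(vowel = c("a", "e", "i", "o", "u"))
blur_categories(c("a", "b", "e"), groups)
#> [1] "vowel" "b"     "vowel"
```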
Implemented deident options:
Aggregate numeric values dependent on a user supplied vector of
breaks (cuts). If no vector is supplied, NumericBlurer
defaults to a binary classification about 0.
Implemented deident options:
deident(df, "numeric_blur", B)
deident(df, "NumericBlurer", B)
deident(df, NumericBlurer, B)
deident(df, NumericBlurer$new(), B)
numeric_blur <- NumericBlurer$new()
deident(df, numeric_blur, B)
At initialization, NumericBlurer takes an argument
cuts to define the limits of each interval.
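The effect of a vector of cuts can be approximated with base R's cut. The helper blur_numeric below is hypothetical, and details such as which interval contains a value that falls exactly on a cut are properties of this sketch (right-closed intervals, the default in cut), not a statement about NumericBlurer.

```r
# Hypothetical helper: bin numeric values at user-supplied cut points.
blur_numeric <- function(x, cuts = 0) {
  # Pad with -Inf and Inf so that every value falls into some interval
  breaks <- c(-Inf, sort(cuts), Inf)
  as.character(cut(x, breaks = breaks))
}

# Default: two classes, split about 0
blur_numeric(c(-2, 5, 20))

# Two cut points give three intervals
table(blur_numeric(1:26, cuts = c(10, 20)))
```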
Apply Shuffler to a data set having first grouped the
data on column(s). The grouping needs to be defined at
initialization.
Implemented deident options:
At initialization, GroupedShuffler takes an argument
limit such that, if any aggregated subgroup has fewer than
limit observations, all values are dropped.
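A grouped shuffle with a minimum group size might look like the following base R sketch. The helper grouped_shuffle and the toy data frame are hypothetical, and the sketch interprets "dropped" as removing the rows of any subgroup smaller than limit, which is one plausible reading rather than confirmed package behaviour.

```r
# Hypothetical helper: shuffle `column` within each level of `group`,
# removing groups with fewer than `limit` rows.
grouped_shuffle <- function(data, column, group, limit = 0) {
  # Size of the group each row belongs to
  sizes <- ave(seq_len(nrow(data)), data[[group]], FUN = length)
  kept <- data[sizes >= limit, , drop = FALSE]
  # Permute by index so that single-element groups are handled safely
  kept[[column]] <- ave(kept[[column]], kept[[group]],
                        FUN = function(v) v[sample.int(length(v))])
  kept
}

toy <- data.frame(B = 1:7, C = c("X", "X", "X", "Y", "Y", "Y", "Z"))
res <- grouped_shuffle(toy, column = "B", group = "C", limit = 2)
res  # group "Z" has one row and is removed; B is shuffled within X and Y
```

Shuffling within groups keeps each group's distribution of values intact while breaking the link between a value and its row, and the size limit avoids tiny groups where a shuffle would hide very little.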