Initial release: a comprehensive framework for measuring disclosure
risk and data utility of anonymized and synthetic data. All measures
share a consistent S3 API (print(), summary(),
plot()) and feed a multivariate Risk-Utility (R-U) map.
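As a rough illustration of what a shared S3 interface looks like, the following base-R sketch defines print() and summary() methods for a toy result object. The class name `toy_measure` and the constructor `new_toy_measure()` are hypothetical and are not part of the package; the sketch only shows the dispatch pattern, not the package's actual object layout.

```r
# Hypothetical constructor: wraps a single numeric score in an S3 object.
new_toy_measure <- function(value, name) {
  structure(list(value = value, name = name), class = "toy_measure")
}

# print() method: one-line display, returns the object invisibly.
print.toy_measure <- function(x, ...) {
  cat(sprintf("%s: %.3f\n", x$name, x$value))
  invisible(x)
}

# summary() method: returns a one-row data frame that is easy to stack.
summary.toy_measure <- function(object, ...) {
  data.frame(measure = object$name, value = object$value,
             stringsAsFactors = FALSE)
}

m <- new_toy_measure(0.25, "toy risk")
print(m)
s <- summary(m)
```

Because every result carries a class, generic calls such as print(m) or summary(m) work the same way regardless of which measure produced the object.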

dcap() (reports both the raw mean CAP and the differential CAP = mean
CAP minus baseline), tcap(), weap(), disco().

rapid() (Risk of Attribute Prediction-Induced Disclosure; random-forest
default, also lm/cart/gbm/logit) with confint(), permutation test,
threshold selection, synthesizer cross-validation, and six plot types.

dcr(), nndr(), ims(), repu(), including the DCR-Delusion caveat and
null-distribution diagnostics.

domias(), nnaa(), mia_classifier().

kanonymity(), ldiversity()
(distinct/entropy/recursive), tcloseness() (EMD),
suda(), individual_risk(),
population_uniqueness() (Pitman/Zayatz/SNB),
epsilon_identifiability(), delta_presence(),
hitting_rate(), singling_out(),
linkability(), attacker_risk()
(prosecutor/journalist/marketer), drisk().

recordLinkage() with deterministic, probabilistic (Fellegi-Sunter),
PRAM, predictive, random-forest, RBRL, robust-Mahalanobis, and embedding
(autoencoder) methods; independent, bijective (Hungarian / GDBRL), and
optimal-transport (Sinkhorn) matching; blocking and per-record
accessors. All eight methods share a single re-identification-risk
definition: the probability of identifying the true match within the
attacker’s candidate set. For the random-forest and embedding methods,
the nearest-neighbour similarity (their former risk value) is now
retained in an nn_similarity diagnostic column. na_anon
(ignore/match/mismatch) is honored consistently across all methods
(PRAM no longer reports an artificial zero risk for records with a
missing key). New options: compute_baseline = TRUE reports the
no-perturbation reference risk (with risk_reduction), and
expected_risk = TRUE reports a perturbation-aware expected PRAM risk
over the transition distribution. User-supplied m_probs/u_probs are
validated and clamped to the open interval (0, 1).

disclosure_report() produces a comprehensive multi-metric report.

propscore(),
pMSE(), specks().

gower(), mqs(), ci_overlap(), ci_proximity().

compare_wasserstein(), compare_ks_test(), compare_chisq_gof(),
compare_pca(), compare_embedding(), compare_correlation_matrices(),
hellinger(), energy_distance(), mmd(), copula_fidelity(),
tail_fidelity(), contingency_fidelity().

tstr() (train on synthetic, test on real),
compare_feature_importance(), compare_model_performance(),
regression_fidelity(), subgroup_utility().

KLDiv(), JSDiv(), CrossEntropy(), entropy and mutual-information
helpers, privacy_score().

rumap(): normalized multivariate R-U evaluation with Pareto-frontier
identification, internal-consistency metrics, and seven visualizations
(scatter, heatmap, dot plot, parallel coordinates, radial, PCA biplot,
blockwise PCA).

synth_pair() container plus from_synthpop() and from_simPop()
converters; most measures dispatch on synth_pair objects as well as
plain data frames.
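The differential CAP reported by dcap() is described above as mean CAP minus baseline. The base-R sketch below works through that arithmetic on toy data. It rests on assumptions that the notes do not spell out: CAP for an original record is taken as the share of synthetic records with the same key that also carry the same target value, the baseline is the share of original records with that target value, and records with no key match are dropped. All object names are hypothetical, and this is not the package's implementation.

```r
# Toy data: 'key' is the quasi-identifier, 'target' the sensitive attribute.
orig <- data.frame(key = c("a", "a", "b", "b"),
                   target = c("x", "x", "y", "x"),
                   stringsAsFactors = FALSE)
synth <- data.frame(key = c("a", "a", "b", "b"),
                    target = c("x", "y", "y", "y"),
                    stringsAsFactors = FALSE)

# CAP for original record i (assumed definition, see lead-in).
cap_one <- function(i) {
  same_key <- synth$key == orig$key[i]
  if (!any(same_key)) return(NA_real_)  # no key match in the synthetic data
  mean(synth$target[same_key] == orig$target[i])
}

cap <- vapply(seq_len(nrow(orig)), cap_one, numeric(1))
mean_cap <- mean(cap, na.rm = TRUE)

# Baseline (assumed): chance of a correct guess from the marginal
# distribution of the target in the original data.
marginal <- prop.table(table(orig$target))
baseline <- mean(as.numeric(marginal[as.character(orig$target)]))

# Differential CAP = mean CAP minus baseline, as stated in the notes.
differential_cap <- mean_cap - baseline
print(c(mean_cap = mean_cap, baseline = baseline,
        differential_cap = differential_cap))
```

A negative differential CAP, as in this toy case, means the synthetic keys help an attacker less than guessing from the marginal distribution alone.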
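The notes state that user-supplied m_probs/u_probs are validated and clamped to the open interval (0, 1). One reason such clamping is useful is that Fellegi-Sunter weights are log ratios and become infinite when a probability is exactly 0 or 1. The sketch below illustrates this in base R; the helper `clamp_open_unit()`, its range check, and the tolerance `eps` are assumptions made for the example, not the package's documented behaviour.

```r
# Hypothetical helper: pull probabilities strictly inside (0, 1).
# The tolerance eps is an assumption, not a documented package default.
clamp_open_unit <- function(p, eps = 1e-6) {
  stopifnot(is.numeric(p), !anyNA(p), all(p >= 0 & p <= 1))
  pmin(pmax(p, eps), 1 - eps)
}

# P(field agrees | true match) and P(field agrees | non-match).
m_probs <- clamp_open_unit(c(name = 1.0, zip = 0.9))
u_probs <- clamp_open_unit(c(name = 0.0, zip = 0.1))

# Fellegi-Sunter log2 weights for agreement and disagreement per field.
agree_w    <- log2(m_probs / u_probs)
disagree_w <- log2((1 - m_probs) / (1 - u_probs))

print(rbind(agree = agree_w, disagree = disagree_w))
```

Without the clamp, the `name` field would produce log2(1 / 0) and log2(0 / 1), that is, infinite weights that swamp every other field.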
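The Pareto-frontier identification mentioned for rumap() can be pictured with a two-dimensional toy case. The sketch below assumes lower risk and higher utility are both preferred and marks a candidate as dominated when another candidate is at least as good on both axes and strictly better on one. The data and names are made up, and rumap() itself works on normalized multivariate scores, so this is only the basic idea.

```r
# Toy risk/utility scores for four hypothetical anonymization methods.
ru <- data.frame(
  method  = c("A", "B", "C", "D"),
  risk    = c(0.10, 0.20, 0.30, 0.25),
  utility = c(0.50, 0.80, 0.70, 0.90),
  stringsAsFactors = FALSE
)

# Candidate i is dominated if some other candidate is no worse on both
# axes and strictly better on at least one.
is_dominated <- function(i) {
  no_worse <- ru$risk <= ru$risk[i] & ru$utility >= ru$utility[i]
  better   <- ru$risk <  ru$risk[i] | ru$utility >  ru$utility[i]
  any(no_worse & better)
}

dominated <- vapply(seq_len(nrow(ru)), is_dominated, logical(1))
frontier  <- ru$method[!dominated]
print(frontier)
```

Here method C drops out because B offers both lower risk and higher utility; A, B, and D each represent a different trade-off and stay on the frontier.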