This vignette demonstrates how to prepare and validate data before
running multiple imputation with {rbmi}.
The {rbmiUtils} package provides three key functions for
this workflow:
- validate_data(): Pre-flight validation to catch common data issues
- prepare_data_ice(): Build intercurrent event data from flag columns
- summarise_missingness(): Understand missing data patterns
Using these functions helps ensure your imputation will run successfully and gives you insight into the structure of your missing data.
We’ll create a small example dataset to demonstrate the functions:
set.seed(42)
dat <- data.frame(
  USUBJID = factor(rep(paste0("SUBJ-", 1:20), each = 4)),
  AVISIT = factor(
    rep(c("Week 4", "Week 8", "Week 12", "Week 16"), 20),
    levels = c("Week 4", "Week 8", "Week 12", "Week 16")
  ),
  TRT = factor(rep(c("Placebo", "Drug A"), each = 40)),
  BASE = rep(round(rnorm(20, 50, 10), 1), each = 4),
  STRATA = factor(rep(sample(c("Low", "High"), 20, replace = TRUE), each = 4))
)
# Generate CHG with some missing values
dat$CHG <- round(rnorm(80, mean = -2, sd = 3), 1)
# Create missing data patterns:
# - Subjects 3, 8: monotone dropout at Week 12
# - Subject 15: intermittent missing at Week 8
# - Subject 18: monotone dropout at Week 16
dat$CHG[dat$USUBJID == "SUBJ-3" & dat$AVISIT %in% c("Week 12", "Week 16")] <- NA
dat$CHG[dat$USUBJID == "SUBJ-8" & dat$AVISIT %in% c("Week 12", "Week 16")] <- NA
dat$CHG[dat$USUBJID == "SUBJ-15" & dat$AVISIT == "Week 8"] <- NA
dat$CHG[dat$USUBJID == "SUBJ-18" & dat$AVISIT == "Week 16"] <- NA
# Add discontinuation flag
dat$DISCFL <- ifelse(
  dat$USUBJID %in% c("SUBJ-3", "SUBJ-8") & dat$AVISIT == "Week 12",
  "Y",
  ifelse(
    dat$USUBJID == "SUBJ-18" & dat$AVISIT == "Week 16",
    "Y",
    "N"
  )
)
head(dat, 12)
#>    USUBJID  AVISIT     TRT BASE STRATA  CHG DISCFL
#> 1   SUBJ-1  Week 4 Placebo 63.7    Low -0.6      N
#> 2   SUBJ-1  Week 8 Placebo 63.7    Low  0.1      N
#> 3   SUBJ-1 Week 12 Placebo 63.7    Low  1.1      N
#> 4   SUBJ-1 Week 16 Placebo 63.7    Low -3.8      N
#> 5   SUBJ-2  Week 4 Placebo 44.4    Low -0.5      N
#> 6   SUBJ-2  Week 8 Placebo 44.4    Low -7.2      N
#> 7   SUBJ-2 Week 12 Placebo 44.4    Low -4.4      N
#> 8   SUBJ-2 Week 16 Placebo 44.4    Low -4.6      N
#> 9   SUBJ-3  Week 4 Placebo 53.6   High -9.2      N
#> 10  SUBJ-3  Week 8 Placebo 53.6   High -1.9      N
#> 11  SUBJ-3 Week 12 Placebo 53.6   High   NA      Y
#> 12  SUBJ-3 Week 16 Placebo 53.6   High   NA      N
The validate_data() function performs comprehensive checks on your data before imputation:
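The vars object passed to validate_data() and the other functions in this vignette is not defined in the surviving text. A minimal sketch of how it might be built with rbmi::set_vars(), assuming the column roles implied by the example data. The covariate choice (BASE only) and the STRATEGY column name are assumptions, and the plain-list fallback exists only so the snippet runs without {rbmi} installed:

```r
# Assumed mapping of example-data columns to rbmi variable roles.
# BASE as the only covariate and "STRATEGY" as the strategy column are
# guesses based on the outputs shown in this vignette.
var_names <- list(
  subjid = "USUBJID",
  visit = "AVISIT",
  outcome = "CHG",
  group = "TRT",
  covariates = "BASE",
  strategy = "STRATEGY"
)

if (requireNamespace("rbmi", quietly = TRUE)) {
  # Real variable specification, usable by rbmi and rbmiUtils functions
  vars <- do.call(rbmi::set_vars, var_names)
} else {
  # Stand-in only: a plain list will not work with validate_data()
  vars <- var_names
}

vars$outcome
```

With {rbmi} and {rbmiUtils} loaded, the check itself is then validate_data(dat, vars), as in the workflow later in this vignette.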
The function checks:
- If data_ice is provided: valid subjects, visits, and strategies
Here’s an example of how validation catches issues:
# Create problematic data
bad_dat <- dat
bad_dat$TRT <- as.character(bad_dat$TRT) # Should be factor
bad_dat$BASE[1] <- NA # Covariate with missing value
# This will report all issues at once
tryCatch(
  validate_data(bad_dat, vars),
  error = function(e) cat(e$message)
)
#> Warning: 1 column is character instead of factor.
#> ℹ Column: TRT.
#> ℹ `rbmi::draws()` will auto-coerce, but explicit conversion gives you control
#> over level ordering.
#> ℹ Example: `data$TRT <- factor(data$TRT)`
#> Data validation failed.
Before imputation, it’s important to understand your missing data patterns:
miss <- summarise_missingness(dat, vars)
print(miss$by_visit)
#> # A tibble: 8 × 5
#>   visit   group       n n_miss pct_miss
#>   <fct>   <fct>   <int>  <int>    <dbl>
#> 1 Week 4  Drug A     10      0        0
#> 2 Week 4  Placebo    10      0        0
#> 3 Week 8  Drug A     10      1       10
#> 4 Week 8  Placebo    10      0        0
#> 5 Week 12 Drug A     10      0        0
#> 6 Week 12 Placebo    10      2       20
#> 7 Week 16 Drug A     10      1       10
#> 8 Week 16 Placebo    10      2       20
print(miss$patterns)
#> # A tibble: 20 × 4
#>    USUBJID TRT     pattern      dropout_visit
#>    <fct>   <chr>   <chr>        <chr>
#>  1 SUBJ-1  Placebo complete     <NA>
#>  2 SUBJ-2  Placebo complete     <NA>
#>  3 SUBJ-3  Placebo monotone     Week 12
#>  4 SUBJ-4  Placebo complete     <NA>
#>  5 SUBJ-5  Placebo complete     <NA>
#>  6 SUBJ-6  Placebo complete     <NA>
#>  7 SUBJ-7  Placebo complete     <NA>
#>  8 SUBJ-8  Placebo monotone     Week 12
#>  9 SUBJ-9  Placebo complete     <NA>
#> 10 SUBJ-10 Placebo complete     <NA>
#> 11 SUBJ-11 Drug A  complete     <NA>
#> 12 SUBJ-12 Drug A  complete     <NA>
#> 13 SUBJ-13 Drug A  complete     <NA>
#> 14 SUBJ-14 Drug A  complete     <NA>
#> 15 SUBJ-15 Drug A  intermittent <NA>
#> 16 SUBJ-16 Drug A  complete     <NA>
#> 17 SUBJ-17 Drug A  complete     <NA>
#> 18 SUBJ-18 Drug A  monotone     Week 16
#> 19 SUBJ-19 Drug A  complete     <NA>
#> 20 SUBJ-20 Drug A  complete     <NA>
print(miss$summary)
#> # A tibble: 2 × 5
#>   group   n_subjects n_complete n_monotone n_intermittent
#>   <chr>        <int>      <int>      <int>          <int>
#> 1 Drug A          10          8          1              1
#> 2 Placebo         10          8          2              0
The three pattern types are complete, monotone, and intermittent.
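The pattern labels in the output follow common missing-data terminology, which is assumed here rather than taken from the package source: complete means no outcome is missing, monotone means that once an outcome is missing every later visit is missing too (dropout), and intermittent means a missing outcome is followed by at least one observed visit. A minimal base-R sketch of such a classification, using a hypothetical helper classify_pattern() that is not part of {rbmiUtils}:

```r
# Hypothetical helper, not the {rbmiUtils} implementation: classify one
# subject's outcome vector (ordered by visit) into a missingness pattern.
classify_pattern <- function(y) {
  miss <- is.na(y)
  if (!any(miss)) {
    return("complete")
  }
  first_miss <- which(miss)[1]
  # Monotone: every visit from the first missing one onwards is missing
  if (all(miss[first_miss:length(miss)])) "monotone" else "intermittent"
}

classify_pattern(c(-0.6, 0.1, 1.1, -3.8))  # "complete"
classify_pattern(c(-9.2, -1.9, NA, NA))    # "monotone"
classify_pattern(c(-1.0, NA, -2.5, -3.0))  # "intermittent"
```

How the package labels a subject who has both an intermittent gap and a later dropout is not stated in this vignette; the sketch would call that subject intermittent.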
When subjects discontinue treatment, you may want to apply
reference-based imputation strategies (see the {rbmi}
documentation for details on intercurrent event handling). The
prepare_data_ice() function builds the required
data_ice data.frame from a discontinuation flag:
data_ice <- prepare_data_ice(
  data = dat,
  vars = vars,
  ice_col = "DISCFL",
  strategy = "JR" # Jump to Reference
)
print(data_ice)
#>   USUBJID  AVISIT STRATEGY
#> 1 SUBJ-18 Week 16       JR
#> 2  SUBJ-3 Week 12       JR
#> 3  SUBJ-8 Week 12       JR
The function:
- Treats rows where the flag column is "Y", TRUE, or 1 as intercurrent events
Available strategies are:
- "MAR": Missing at Random
- "CR": Copy Reference
- "JR": Jump to Reference
- "CIR": Copy Increments in Reference
- "LMCF": Last Mean Carried Forward
Here’s how these functions fit into a typical {rbmi} workflow:
library(rbmi)
library(rbmiUtils)
# 1. Validate data
validate_data(dat, vars)
# 2. Understand missing patterns
miss <- summarise_missingness(dat, vars)
print(miss$summary)
# 3. Prepare ICE data if needed
data_ice <- prepare_data_ice(dat, vars, ice_col = "DISCFL", strategy = "JR")
# 4. Define method
method <- method_bayes(
  n_samples = 100,
  control = control_bayes(warmup = 200, thin = 2)
)
# 5. Run imputation
draws_obj <- draws(
  data = dat,
  vars = vars,
  data_ice = data_ice,
  method = method
)
# 6. Continue with impute() and analyse()
The data preparation functions in {rbmiUtils} help you:
- Call validate_data() before running time-consuming imputations
- Use summarise_missingness() to characterize missing data patterns
- Use prepare_data_ice() to build data_ice from flag columns
These utilities complement the core {rbmi} package and support reproducible, well-documented analysis workflows.
After data preparation, see vignette('pipeline') for the
complete analysis workflow from imputation through to regulatory
tables.