| Type: | Package |
| Title: | Visual Diagnostics for Multiple Imputation |
| Version: | 0.9.5 |
| Description: | A comprehensive suite of static and interactive visual diagnostics for assessing the quality of multiply-imputed data obtained from packages such as 'mixgb' and 'mice'. The package supports inspection of distributional characteristics, diagnostics based on masking observed values and comparing them with re-imputed values, and convergence diagnostics. |
| URL: | https://agnesdeng.github.io/vismi/ |
| BugReports: | https://github.com/agnesdeng/vismi/issues |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| Language: | en-GB |
| LazyData: | true |
| Depends: | R (≥ 4.3.0) |
| Imports: | cli, data.table, dplyr, GGally, ggplot2 (≥ 4.0.1), ggtext, gridExtra, ggridges, mixgb (≥ 2.2.3), patchwork, plotly, purrr, rlang, stats, scales, tidyr, trelliscopejs |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-30 21:43:16 UTC; agnes |
| Author: | Yongshi Deng |
| Maintainer: | Yongshi Deng <agnes.yongshideng@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-03 14:10:02 UTC |
vismi: Visual Diagnostics for Multiple Imputation
Description
A comprehensive suite of static and interactive visual diagnostics for assessing the quality of multiply-imputed data obtained from packages such as 'mixgb' and 'mice'. The package supports inspection of distributional characteristics, diagnostics based on masking observed values and comparing them with re-imputed values, and convergence diagnostics.
Author(s)
Maintainer: Yongshi Deng <agnes.yongshideng@gmail.com> (ORCID)
Other contributors:
Thomas Lumley <t.lumley@auckland.ac.nz> [thesis advisor]
References
Yongshi Deng, Thomas Lumley. (2026), vismi: Visual Diagnostics for Multiple Imputation, R package version 0.9.5
See Also
Useful links:
- https://agnesdeng.github.io/vismi/
- Report bugs at https://github.com/agnesdeng/vismi/issues
Precomputed mixgb imputed datasets for 'newborn'
Description
A small precomputed list object containing 5 imputed datasets generated by 'mixgb::mixgb()' on the 'newborn' example data. This dataset is included so that users can run plotting examples without installing 'mixgb'.
Usage
data(imp_newborn)
Format
A list of 5 data.frames (each a completed dataset) created by 'mixgb::mixgb()' in development.
Source
Generated during package development with 'mixgb::mixgb()'.
Precomputed mixgb imputed datasets for 'nhanes3'
Description
A small precomputed list object containing 5 imputed datasets generated by 'mixgb::mixgb()' on the 'nhanes3' example data. This dataset is included so that users can run plotting examples without installing 'mixgb'.
Usage
data(imp_nhanes3)
Format
A list of 5 data.frames (each a completed dataset) created by 'mixgb::mixgb()' in development.
Source
Generated during package development with 'mixgb::mixgb()'.
NHANES III (1988-1994) newborn data
Description
This dataset is extracted from the NHANES III (1988-1994) for the age class Newborn (under 1 year). Please note that this example dataset only contains selected variables and is for demonstration purposes only.
Usage
data(newborn)
Format
A data frame of 2107 rows and 16 variables, adapted from the NHANES III dataset. Nine variables contain missing values. Variable names and factor levels have been renamed for clarity and easier interpretation.
- household_size
Household size. An integer variable ranging from 1 to 10. The original variable name in the NHANES III dataset is HSHSIZER.
- age_months
Age at interview (screener), in months. An integer variable ranging from 2 to 11. The original variable name in the NHANES III dataset is HSAGEIR.
- sex
Sex of the subject. A factor variable with levels Male and Female. The original variable name in the NHANES III dataset is HSSEX.
- race
Race of the subject. A factor variable with levels White, Black, and Other. The original variable name in the NHANES III dataset is DMARACER.
- ethnicity
Ethnicity of the subject. A factor variable with levels Mexican-American, Other Hispanic, and Not Hispanic. The original variable name in the NHANES III dataset is DMAETHNR.
- race_ethinicity
Combined race–ethnicity classification. A factor variable with levels Non-Hispanic White, Non-Hispanic Black, Mexican-American, and Other. The original variable name in the NHANES III dataset is DMARETHN.
- head_circumference_cm
Head circumference, in centimetres. Numeric. The original variable name in the NHANES III dataset is BMPHEAD.
- recumbent_length_cm
Recumbent length, in centimetres. Numeric. The original variable name in the NHANES III dataset is BMPRECUM.
- first_subscapular_skinfold_mm
First subscapular skinfold thickness, in millimetres. Numeric. The original variable name in the NHANES III dataset is BMPSB1.
- second_subscapular_skinfold_mm
Second subscapular skinfold thickness, in millimetres. Numeric. The original variable name in the NHANES III dataset is BMPSB2.
- first_triceps_skinfold_mm
First triceps skinfold thickness, in millimetres. Numeric. The original variable name in the NHANES III dataset is BMPTR1.
- second_triceps_skinfold_mm
Second triceps skinfold thickness, in millimetres. Numeric. The original variable name in the NHANES III dataset is BMPTR2.
- weight_kg
Body weight, in kilograms. Numeric. The original variable name in the NHANES III dataset is BMPWT.
- poverty_income_ratio
Poverty income ratio. Numeric. The original variable name in the NHANES III dataset is DMPPIR.
- smoke
Whether anyone living in the household smokes cigarettes inside the home. A factor variable with levels Yes and No. The original variable name in the NHANES III dataset is HFF1.
- health
General health status of the subject. An ordered factor with levels Excellent, Very Good, Good, Fair, and Poor. The original variable name in the NHANES III dataset is HYD1.
Source
https://wwwn.cdc.gov/nchs/nhanes/nhanes3/datafiles.aspx
References
U.S. Department of Health and Human Services (DHHS). National Center for Health Statistics. Third National Health and Nutrition Examination Survey (NHANES III, 1988-1994): Multiply Imputed Data Set. CD-ROM, Series 11, No. 7A. Hyattsville, MD: Centers for Disease Control and Prevention, 2001. Includes access software: Adobe Systems, Inc. Acrobat Reader version 4.
A small subset of the NHANES III (1988-1994) newborn data
Description
This dataset is a small subset of newborn. It is for demonstration purposes only. More information on the NHANES III data can be found at https://wwwn.cdc.gov/Nchs/Data/Nhanes3/7a/doc/mimodels.pdf
Usage
data(nhanes3)
Format
A data frame of 500 rows and 6 variables. Three variables have missing values.
- age_months
Age at interview (screener), in months. An integer variable ranging from 2 to 11. The original variable name in the NHANES III dataset is HSAGEIR.
- sex
Sex of the subject. A factor variable with levels Male and Female. The original variable name in the NHANES III dataset is HSSEX.
- ethnicity
Ethnicity of the subject. A factor variable with levels Mexican-American, Other Hispanic, and Not Hispanic. The original variable name in the NHANES III dataset is DMAETHNR.
- head_circumference_cm
Head circumference, in centimetres. Numeric. The original variable name in the NHANES III dataset is BMPHEAD.
- recumbent_length_cm
Recumbent length, in centimetres. Numeric. The original variable name in the NHANES III dataset is BMPRECUM.
- weight_kg
Body weight, in kilograms. Numeric. The original variable name in the NHANES III dataset is BMPWT.
Source
https://wwwn.cdc.gov/nchs/nhanes/nhanes3/datafiles.aspx
References
U.S. Department of Health and Human Services (DHHS). National Center for Health Statistics. Third National Health and Nutrition Examination Survey (NHANES III, 1988-1994): Multiply Imputed Data Set. CD-ROM, Series 11, No. 7A. Hyattsville, MD: Centers for Disease Control and Prevention, 2001. Includes access software: Adobe Systems, Inc. Acrobat Reader version 4.
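Before imputing, it can help to check which variables actually contain missing values. The following is a minimal base R sketch; `missing_summary()` is a hypothetical helper written for this illustration and is not part of 'vismi'. It is demonstrated on a toy data frame, and the final guarded lines apply it to `nhanes3` only if the package is installed.

```r
# Hypothetical helper (not part of vismi): per-variable missingness summary.
missing_summary <- function(df) {
  n_miss <- colSums(is.na(df))
  data.frame(
    variable     = names(n_miss),
    n_missing    = as.integer(n_miss),
    prop_missing = as.numeric(n_miss) / nrow(df),
    row.names    = NULL
  )
}

# Toy data: 'a' has one missing value, 'b' has two, 'c' has none.
toy <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, NA), c = 1:4)
summary_toy <- missing_summary(toy)
print(summary_toy)

# Apply the same helper to the bundled data when vismi is available.
if (requireNamespace("vismi", quietly = TRUE)) {
  print(missing_summary(vismi::nhanes3))
}
```

Variables with `n_missing` greater than zero are the ones for which imputations are generated and for which the diagnostics are informative.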
Overimpute main function
Description
Main overimputation function. It masks an extra proportion of the observed values and re-imputes them with the chosen imputation method.
Usage
overimp(
data,
m = 5,
p = 0.2,
test_ratio = 0,
method = "mixgb",
seed = NULL,
...
)
Arguments
data |
A data frame with missing values. |
m |
The number of imputations. |
p |
The proportion of observed values to mask as extra missing values. |
test_ratio |
The proportion of the data held out as a test set. Default is 0, meaning no test set. |
method |
One of "mixgb" or "mice". More methods may be added in the future. |
seed |
Random seed. |
... |
Other arguments to be passed into the overimp function. |
Value
An overimp object containing the imputed training data, the imputed test data (if applicable), and the parameters required for plotting.
Examples
obj <- overimp(data = nhanes3, m = 3, p = 0.2, test_ratio = 0.2, method = "mixgb")
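The masking step behind overimputation can be illustrated in base R. This is only a sketch of the general idea (hide a proportion `p` of the observed cells, then compare the re-imputed values with the hidden originals); it is not the code used inside `overimp()`, and `mask_observed()` is a hypothetical helper introduced here.

```r
# Hypothetical helper (not part of vismi): mask a proportion p of the
# observed cells in every column and record which cells were masked.
mask_observed <- function(data, p = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  masked <- data
  mask <- matrix(FALSE, nrow = nrow(data), ncol = ncol(data),
                 dimnames = list(NULL, names(data)))
  for (j in seq_along(data)) {
    obs_idx <- which(!is.na(data[[j]]))
    n_mask <- floor(p * length(obs_idx))
    if (n_mask > 0) {
      # Sample positions within obs_idx so that a length-one obs_idx
      # is not misread by sample() as a range 1:obs_idx.
      drop_idx <- obs_idx[sample.int(length(obs_idx), n_mask)]
      masked[[j]][drop_idx] <- NA
      mask[drop_idx, j] <- TRUE
    }
  }
  list(masked = masked, mask = mask)
}

toy <- data.frame(
  x = c(1, 2, NA, 4, 5, 6, 7, 8, 9, 10),
  y = c(NA, "a", "b", "a", "b", "a", "b", "a", "b", "a")
)
out <- mask_observed(toy, p = 0.2, seed = 1)

# Number of extra missing values introduced per column.
extra_missing <- colSums(is.na(out$masked)) - colSums(is.na(toy))
print(extra_missing)
```

After imputing `out$masked`, the cells flagged in `out$mask` have known true values in `toy`, which is what makes a comparison between true and re-imputed values possible.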
print method for vismi objects
Description
Print method for vismi objects.
Usage
## S3 method for class 'vismi'
print(x, ...)
Arguments
x |
An object of class 'vismi' created by the |
... |
Additional arguments (not used). |
Value
A vismi object, returned invisibly.
Trelliscope Visualisation of Distributional Characteristics
Description
Generates a Trelliscope display for distributional characteristics across all variables.
Usage
trellis_vismi(
data,
imp_list,
m = NULL,
imp_idx = NULL,
integerAsFactor = FALSE,
title = "auto",
subtitle = "auto",
color_pal = NULL,
marginal_x = "box+rug",
nrow = 2,
ncol = 4,
path = NULL,
verbose = FALSE,
...
)
Arguments
data |
A data frame containing the original data with missing values. |
imp_list |
A list of imputed data frames. |
m |
An integer specifying the number of imputed datasets to plot. It should be smaller than |
imp_idx |
A vector of integers specifying the indices of imputed datasets to plot. Default is NULL (plot all). |
integerAsFactor |
A logical value indicating whether to treat integer variables as factors (TRUE) or numeric (FALSE). Default is FALSE. |
title |
A string specifying the title of the plot. Default is "auto" (automatic title based on |
subtitle |
A string specifying the subtitle of the plot. Default is "auto" (automatic subtitle based on |
color_pal |
A named vector of colors for different imputation sets. If NULL (default), a default color palette is used. |
marginal_x |
A character string specifying the type of marginal plot to add for the x variable in 2D plots. Options are "hist", "box", "rug", "box+rug" (default), or NULL (no marginal plot) when interactive = TRUE. Options are "box", "rug", "box+rug" (default), or NULL (no marginal plot) when interactive = FALSE. |
nrow |
Number of rows in the Trelliscope display. Default is 2. |
ncol |
Number of columns in the Trelliscope display. Default is 4. |
path |
Optional path to save the Trelliscope display. If NULL, the display will not be saved to disk. |
verbose |
A logical value indicating whether to print extra information. Default is FALSE. |
... |
Additional arguments passed to the underlying plotting functions, such as point_size, alpha, nbins, width, and boxpoints. |
Value
A Trelliscope display object visualising distributional characteristics for all variables.
Examples
trellis_vismi(data = nhanes3, imp_list = imp_nhanes3, marginal_x = "box")
Trelliscope Visualisation of Convergence Diagnostics
Description
Generates a Trelliscope display for convergence diagnostics across all variables.
Usage
trellis_vismi_converge(
obj,
tick_vals = NULL,
color_pal = NULL,
title = "auto",
subtitle = "auto",
nrow = 2,
ncol = 4,
path = NULL,
verbose = FALSE,
...
)
Arguments
obj |
An object of class 'mixgb' or 'mids' containing the intermediate imputed results for each iteration. |
tick_vals |
A numeric vector specifying the tick values for the x-axis (iterations). If NULL, default tick values will be used. |
color_pal |
A vector of colors to use for the imputation lines. If NULL, default colors will be used. |
title |
A string specifying the title of the plot. If NULL, no title is shown. If "auto", a title will be generated based on the input. Default is "auto". |
subtitle |
A string specifying the subtitle of the plot. If NULL, no subtitle is shown. If "auto", a subtitle will be generated based on the input. Default is "auto". |
nrow |
Number of rows in the Trelliscope display. Default is 2. |
ncol |
Number of columns in the Trelliscope display. Default is 4. |
path |
Optional path to save the Trelliscope display. If NULL, the display will not be saved to disk. |
verbose |
A logical value indicating whether to print extra information. Default is FALSE. |
... |
Additional arguments to customize the Trelliscope display. |
Value
A Trelliscope display object visualising convergence diagnostics for all variables.
Examples
library(mixgb)
set.seed(2026)
mixgb_obj <- mixgb(data = nhanes3, m = 3, maxit = 4, pmm.type = "auto", save.models = TRUE)
trellis_vismi_converge(obj = mixgb_obj)
Trelliscope Visualisation of Overimputation Diagnostics
Description
Generates a Trelliscope display for overimputation diagnostics across all variables.
Usage
trellis_vismi_overimp(
obj,
m = NULL,
imp_idx = NULL,
integerAsFactor = FALSE,
title = "auto",
subtitle = "auto",
num_plot = "cv",
fac_plot = "cv",
train_color_pal = NULL,
test_color_pal = NULL,
stack_y = FALSE,
diag_color = "white",
seed = 2025,
nrow = 2,
ncol = 4,
path = NULL,
verbose = FALSE,
...
)
Arguments
obj |
An object of class 'overimp' containing imputed datasets and parameters. |
m |
A single positive integer specifying the number of imputed datasets to plot. It should be smaller than the total number of imputed datasets in the object. Default is NULL (plot all). |
imp_idx |
A vector of integers specifying the indices of imputed datasets to plot. Default is NULL (plot all). |
integerAsFactor |
A logical indicating whether integer variables should be treated as factors. Default is FALSE (treated as numeric). |
title |
A string specifying the title of the plot. Default is "auto" (automatic title). If NULL, no title is shown. |
subtitle |
A string specifying the subtitle of the plot. Default is "auto" (automatic subtitle). If NULL, no subtitle is shown. |
num_plot |
A character string specifying the type of plot for numeric variables. Options are "cv" (cross-validation), "ridge", or "density". Default is "cv". |
fac_plot |
A character string specifying the type of plot for categorical variables. Options are "cv" (cross-validation), "bar", or "dodge". Default is "cv". |
train_color_pal |
A vector of colors for the training data. If NULL, default colors will be used. |
test_color_pal |
A vector of colors for the test data. If NULL, default colors will be used. |
stack_y |
A logical indicating whether to stack y-values in the plots. Default is FALSE. |
diag_color |
A color specification for the diagonal line in the plots. Default is "white". |
seed |
An integer seed for reproducibility. Default is 2025. |
nrow |
Number of rows in the Trelliscope display. Default is 2. |
ncol |
Number of columns in the Trelliscope display. Default is 4. |
path |
Optional path to save the Trelliscope display. If NULL, the display will not be saved to disk. |
verbose |
A logical value indicating whether to print extra information. Default is FALSE. |
... |
Additional arguments to customize the plots, such as point_size, xlim, ylim. |
Value
A Trelliscope display object visualising overimputation diagnostics for all variables.
Examples
obj <- overimp(data = nhanes3, m = 3, p = 0.2, test_ratio = 0, method = "mixgb")
trellis_vismi_overimp(obj = obj, stack_y = TRUE)
Visualise Multiple Imputations Through Distributional Characteristics
Description
This function provides visual diagnostic tools for assessing multiply imputed datasets created with 'mixgb' or other imputers through inspecting the distributional characteristics of imputed variables. It supports 1D, 2D, and 3D visualisations for numeric and categorical variables using either interactive or static plots.
Usage
vismi(
data,
imp_list,
x = NULL,
y = NULL,
z = NULL,
m = NULL,
imp_idx = NULL,
interactive = FALSE,
integerAsFactor = FALSE,
title = "auto",
subtitle = "auto",
color_pal = NULL,
marginal_x = "box+rug",
marginal_y = NULL,
verbose = FALSE,
...
)
Arguments
data |
A data frame containing the original data with missing values. |
imp_list |
A list of imputed data frames. |
x |
A character string specifying the name of the variable to plot on the x axis. Default is NULL. |
y |
A character string specifying the name of the variable to plot on the y axis. Default is NULL. |
z |
A character string specifying the name of the variable to plot on the z axis. Default is NULL. |
m |
An integer specifying the number of imputed datasets used for visualisation. It should be smaller than |
imp_idx |
A vector of integers specifying the indices of imputed datasets to plot. Default is NULL (plot all). |
interactive |
A logical value indicating whether to create an interactive plotly plot (TRUE) or a static ggplot2 plot (FALSE). Default is FALSE. |
integerAsFactor |
A logical value indicating whether to treat integer variables as factors (TRUE) or numeric (FALSE). Default is FALSE. |
title |
A string specifying the title of the plot. Default is "auto" (automatic title based on |
subtitle |
A string specifying the subtitle of the plot. Default is "auto" (automatic subtitle based on |
color_pal |
A named vector of colors for different imputation sets. If NULL (default), a default color palette is used. |
marginal_x |
A character string specifying the type of marginal plot to add for the x variable in 2D plots. Options are "hist", "box", "rug", "box+rug" (default), or NULL when interactive = TRUE. Options are "box", "rug", "box+rug" (default), or NULL when interactive = FALSE. |
marginal_y |
A character string specifying the type of marginal plot to add for the y variable in 2D plots. Options are "hist", "box", "rug", "box+rug", or NULL (default, no marginal plot) when interactive = TRUE. Options are "box", "rug", "box+rug", or NULL (default, no marginal plot) when interactive = FALSE. |
verbose |
A logical value indicating whether to print extra information. Default is FALSE. |
... |
Additional arguments passed to the underlying plotting functions, such as point_size, alpha, nbins, width, and boxpoints. |
Value
A plotly or ggplot2 object visualising the multiply-imputed data.
Examples
vismi(data = nhanes3, imp_list = imp_nhanes3, x = "weight_kg", y = "head_circumference_cm", z = "sex")
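The number of variables supplied decides whether a 1D, 2D, or 3D display is drawn. The sketch below shows a few further calls that use only the arguments documented above. It has not been checked against every release, and the choice of variables and of `imp_idx = c(1, 3)` is purely illustrative. The calls are guarded so the snippet does nothing when 'vismi' is not installed.

```r
# Collect the plots in a list so they can be inspected or printed later.
plots <- list()

if (requireNamespace("vismi", quietly = TRUE)) {
  dat  <- vismi::nhanes3      # original data with missing values
  imps <- vismi::imp_nhanes3  # precomputed list of 5 imputed datasets

  # 1D: a single numeric variable, static ggplot2 output (the default).
  plots$one_d <- vismi::vismi(data = dat, imp_list = imps,
                              x = "head_circumference_cm")

  # 2D: two numeric variables, restricted to imputed datasets 1 and 3,
  # with a box plot as the marginal plot for x.
  plots$two_d <- vismi::vismi(data = dat, imp_list = imps,
                              x = "weight_kg", y = "head_circumference_cm",
                              imp_idx = c(1, 3), marginal_x = "box")

  # Same 2D plot as an interactive plotly object.
  plots$two_d_interactive <- vismi::vismi(data = dat, imp_list = imps,
                                          x = "weight_kg",
                                          y = "head_circumference_cm",
                                          interactive = TRUE)
}
```

Printing an element, for example `plots$one_d`, renders it in the active graphics device or viewer.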
Visualise convergence diagnostics
Description
This function generates convergence diagnostic plots showing the mean and standard deviation (SD) of imputed values for a specified variable across iterations.
Usage
vismi_converge(
obj,
x,
xlim = NULL,
mean_lim = NULL,
sd_lim = NULL,
title = "auto",
subtitle = "auto",
tick_vals = NULL,
color_pal = NULL,
linewidth = 0.8,
...
)
Arguments
obj |
A 'mixgb' object returned by |
x |
The name of the variable to plot convergence for. |
xlim |
Optional numeric vector of length 2 specifying the x-axis limits for iterations. |
mean_lim |
Optional numeric vector of length 2 specifying the y-axis limits for mean values of the variable. |
sd_lim |
Optional numeric vector of length 2 specifying the y-axis limits for standard deviation values of the variable. |
title |
A string specifying the title of the plot. If NULL, no title is shown. If "auto", a title will be generated based on the input. Default is "auto". |
subtitle |
A string specifying the subtitle of the plot. If NULL, no subtitle is shown. If "auto", a subtitle will be generated based on the input. Default is "auto". |
tick_vals |
Optional numeric vector specifying x-axis tick values for iterations. |
color_pal |
A vector of m color codes (e.g., hex codes). If NULL, default colors will be used. |
linewidth |
The line width for the plot lines. Default is 0.8. |
... |
Additional arguments. |
Value
Two side-by-side ggplot2 plots showing the mean and standard deviation (SD) of imputed values for a specified variable across iterations.
Examples
library(mixgb)
set.seed(2026)
mixgb_obj <- mixgb(data = nhanes3, m = 3, maxit = 4, pmm.type = "auto", save.models = TRUE)
vismi_converge(obj = mixgb_obj, x = "recumbent_length_cm")
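To make the plotted quantities concrete, the sketch below computes the per-iteration mean and SD of imputed values for each of the m imputations and draws them with base graphics. The data are simulated, and the array layout (missing cells by imputations by iterations) is an assumption made for this illustration only; it does not describe how 'mixgb' or 'mice' store their intermediate results.

```r
set.seed(2026)
n_missing <- 50  # number of imputed cells for the variable
m <- 3           # number of imputations
maxit <- 4       # number of iterations

# Simulated stand-in for the stored imputations (assumed layout).
imputed <- array(
  rnorm(n_missing * m * maxit, mean = 60, sd = 5),
  dim = c(n_missing, m, maxit)
)

# Rows index iterations, columns index imputations.
chain_mean <- apply(imputed, c(3, 2), mean)
chain_sd   <- apply(imputed, c(3, 2), sd)

# Two side-by-side panels, one line per imputation.
op <- par(mfrow = c(1, 2))
matplot(seq_len(maxit), chain_mean, type = "l", lty = 1,
        xlab = "Iteration", ylab = "Mean of imputed values")
matplot(seq_len(maxit), chain_sd, type = "l", lty = 1,
        xlab = "Iteration", ylab = "SD of imputed values")
par(op)
```

Lines that intermingle freely without a persistent trend are the usual informal sign that the imputation chains have stabilised.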
Visualise Multiple Imputation Through Overimputation
Description
This function provides overimputation diagnostics for assessing imputations generated by 'mice', 'mixgb' or other imputers. It supports evaluation on both training and test data.
Usage
vismi_overimp(
obj,
x = NULL,
y = NULL,
z = NULL,
m = NULL,
imp_idx = NULL,
integerAsFactor = FALSE,
title = "auto",
subtitle = "auto",
num_plot = "cv",
fac_plot = "cv",
train_color_pal = NULL,
test_color_pal = NULL,
stack_y = FALSE,
diag_color = NULL,
seed = 2025,
...
)
Arguments
obj |
Overimputation object of class 'overimp' created by the |
x |
A character string specifying the name of the variable to plot on the x axis. Default is NULL. |
y |
A character string specifying the name of the variable to plot on the y axis. Default is NULL. |
z |
A character string specifying the name of the variable to plot on the z axis. Default is NULL. |
m |
A single positive integer specifying the number of imputed datasets to plot. It should be smaller than the total number of imputed datasets in the object. |
imp_idx |
A vector of integers specifying the indices of imputed datasets to plot. |
integerAsFactor |
A logical indicating whether integer variables should be treated as factors. Default is FALSE (treated as numeric). |
title |
A string specifying the title of the plot. Default is "auto" (automatic title based on |
subtitle |
A string specifying the subtitle of the plot. Default is "auto" (automatic subtitle based on |
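num_plot |
A character string specifying the type of plot for numeric variables. Options are "cv" (cross-validation), "ridge", or "density". Default is "cv". |
fac_plot |
A character string specifying the type of plot for categorical variables. Options are "cv" (cross-validation), "bar", or "dodge". Default is "cv". |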
train_color_pal |
A vector of colors for the training data. If NULL, default colors will be used. |
test_color_pal |
A vector of colors for the test data. If NULL, default colors will be used. |
stack_y |
A logical indicating whether to stack y values in certain plots. Default is FALSE. |
diag_color |
A character string specifying the color of the diagonal line in scatter plots. Default is NULL. |
seed |
An integer specifying the random seed for reproducibility. Default is 2025. |
... |
Additional arguments to customize the plots, such as position, point_size, linewidth, alpha, xlim, ylim, boxpoints, width. |
Value
An overimp_plot object displaying the overimputation plots for the training data and, if test_ratio > 0 was set in the overimp() function, the test data.
Examples
obj <- overimp(data = nhanes3, m = 3, p = 0.2, test_ratio = 0.2, method = "mixgb")
vismi_overimp(obj = obj, x = "head_circumference_cm", num_plot = "cv")