Introduction to bayesDiagnostics

bayesDiagnostics: Comprehensive Bayesian Model Diagnostics

The bayesDiagnostics package provides a comprehensive suite of diagnostic tools for Bayesian models fitted with brms, rstan, and other popular Bayesian modeling packages. This vignette demonstrates all major functions organized by diagnostic category.

library(bayesDiagnostics)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.23.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(ggplot2)

Overview of Package Functions

The package contains 18 main functions organized into 5 categories:

  1. MCMC Convergence Diagnostics (3 functions)
  2. Posterior Predictive Checks (5 functions)
  3. Prior Specification & Sensitivity (3 functions)
  4. Model Comparison (3 functions)
  5. Utilities & Reporting (4 functions)

Part 1: MCMC Convergence Diagnostics

1.1 Quick MCMC Diagnostics Summary

The mcmc_diagnostics_summary() function provides a comprehensive overview of MCMC convergence, including R-hat, ESS, and NUTS-specific diagnostics.

# Fit a simple model
data <- data.frame(
  y = rnorm(100, mean = 5, sd = 2),
  x1 = rnorm(100),
  x2 = rnorm(100)
)

fit <- brm(y ~ x1 + x2, data = data, chains = 4, iter = 2000, refresh = 0)

# Run comprehensive diagnostics
diag <- mcmc_diagnostics_summary(fit, rhat_threshold = 1.01, ess_threshold = 400)
print(diag)

# Check results
diag$converged  # TRUE/FALSE
diag$summary_table  # Full diagnostic table
diag$divergences  # Number of divergent transitions

What it checks:
  • R-hat values (should be < 1.01)
  • Bulk ESS and Tail ESS (should be > 400)
  • Divergent transitions (should be 0)
  • Tree depth saturation (NUTS-specific)
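
The same numbers can be cross-checked against lower-level tooling. A minimal sketch, assuming the fit object from above, the posterior package (a brms dependency), and nuts_params() as re-exported by brms for brmsfit objects:

library(posterior)

# Cross-check R-hat / ESS from the raw draws
summ <- summarise_draws(as_draws_df(fit))
summ[, c("variable", "rhat", "ess_bulk", "ess_tail")]

# Count divergent transitions and inspect tree depth from the NUTS sampler parameters
np <- nuts_params(fit)
sum(subset(np, Parameter == "divergent__")$Value)    # should be 0
max(subset(np, Parameter == "treedepth__")$Value)    # compare against max_treedepth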


1.2 Effective Sample Size Diagnostics

The effective_sample_size_diagnostics() function provides detailed ESS analysis with visual diagnostics.

# Detailed ESS analysis
ess_diag <- effective_sample_size_diagnostics(
  model = fit,
  parameters = c("b_x1", "b_x2", "sigma"),
  min_ess = 400,
  by_chain = TRUE,
  plot = TRUE
)

print(ess_diag)
plot(ess_diag)

# Access specific components
ess_diag$ess_summary        # Summary statistics
ess_diag$bulk_ess           # Bulk ESS per parameter
ess_diag$tail_ess           # Tail ESS per parameter
ess_diag$by_chain_ess       # Per-chain ESS breakdown
ess_diag$problematic_params # Parameters with low ESS
ess_diag$recommendations    # Actionable suggestions

Interpretation:
  • Bulk ESS: Measures precision of posterior mean/median estimates
  • Tail ESS: Measures precision of credible interval bounds
  • Per-chain ESS: Identifies which specific chains have issues


1.3 Hierarchical Model Convergence

For hierarchical/multilevel models, use hierarchical_convergence() to check group-level and population-level parameters separately.

# Fit hierarchical model
data_hier <- data.frame(
  y = rnorm(200),
  x = rnorm(200),
  group = rep(1:10, each = 20)
)

fit_hier <- brm(
  y ~ x + (1 + x | group),
  data = data_hier,
  chains = 4,
  refresh = 0
)

# Check hierarchical convergence
hier_conv <- hierarchical_convergence(
  model = fit_hier,
  group_vars = "group",
  plot = TRUE
)

print(hier_conv)
plot(hier_conv)

What it provides:
  • Group-level vs. population-level convergence breakdown
  • Variance component diagnostics
  • Shrinkage assessment
  • Visual comparison of group-level parameters
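
Because brms follows a predictable naming scheme (b_, sd_, cor_, and sigma for population-level and variance parameters; r_<group>[level,term] for group-level effects), the convergence summaries can also be split by hand. A small sketch, assuming fit_hier from above:

library(posterior)
summ_hier <- summarise_draws(as_draws_df(fit_hier))

# Population-level, variance, and correlation parameters
summ_hier[grepl("^(b_|sd_|cor_|sigma)", summ_hier$variable),
          c("variable", "rhat", "ess_bulk", "ess_tail")]

# Group-level effects, named r_group[level,term] by brms
summ_hier[grepl("^r_group\\[", summ_hier$variable),
          c("variable", "rhat", "ess_bulk", "ess_tail")]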


Part 2: Posterior Predictive Checks (PPC)

2.1 Comprehensive Posterior Predictive Checks

The posterior_predictive_check() function is the workhorse for model validation.

# Basic PPC with multiple test statistics
ppc_result <- posterior_predictive_check(
  model = fit,
  observed_data = data$y,
  n_samples = 1000,
  test_statistics = c("mean", "sd", "median", "min", "max", "skewness", "kurtosis"),
  plot = TRUE
)

print(ppc_result)
plot(ppc_result)

# Access results
ppc_result$observed_stats      # Observed test statistics
ppc_result$replicated_stats    # Posterior predictive statistics
ppc_result$p_values            # Bayesian p-values (should be ~0.5)

Interpretation Guide:
  • p-value ≈ 0.5: Good fit (model captures the statistic well)
  • p-value < 0.05 or > 0.95: Model misspecification
  • p-value near 0: Model underestimates the statistic
  • p-value near 1: Model overestimates the statistic
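
Under the hood, a Bayesian p-value for a test statistic T is simply the proportion of posterior predictive replicates with T(y_rep) >= T(y). A minimal manual version for the standard deviation, assuming fit and data from above:

yrep  <- posterior_predict(fit, ndraws = 1000)   # draws x observations
T_rep <- apply(yrep, 1, sd)                      # statistic for each replicated data set
T_obs <- sd(data$y)                              # observed statistic
mean(T_rep >= T_obs)                             # Bayesian p-value; ~0.5 indicates a good fit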


2.2 Automated PPC

For quick diagnostic checks across multiple statistics, use automated_ppc():

# Automated checks with flagging
auto_ppc <- automated_ppc(
  model = fit,
  observed_data = data$y,
  n_samples = 1000,
  p_value_threshold = 0.05
)

print(auto_ppc)

# Check for issues
auto_ppc$diagnostics  # Table with all statistics and flags
auto_ppc$flags        # Text warnings for problematic statistics

When to use:
  • Quick model validation
  • Screening for gross misspecification
  • Automated reporting workflows


2.3 Graphical PPC

Create publication-quality PPC visualizations:

# Density overlay
p1 <- graphical_ppc(fit, data$y, type = "density", n_draws = 50)
print(p1)

# Prediction intervals
p2 <- graphical_ppc(fit, data$y, type = "intervals", n_draws = 50)
print(p2)

# Ribbon plot (useful for ordered/time-series data)
p3 <- graphical_ppc(fit, data$y, type = "ribbon", n_draws = 50)
print(p3)

2.4 LOO Cross-Validation PPC

The ppc_crossvalidation() function performs Leave-One-Out Probability Integral Transform (LOO-PIT) checks:

# LOO-PIT check
loo_ppc <- ppc_crossvalidation(
  model = fit,
  observed_y = data$y,
  n_draws = NULL  # Use all draws for accuracy
)

print(loo_ppc)
plot(loo_ppc)

# Inspect results
loo_ppc$pit_values   # Should be ~Uniform(0,1) if well-calibrated
loo_ppc$loo_result   # LOO object
loo_ppc$weighted     # Whether weighted PIT was used

Interpretation:
  • Uniform PIT distribution: Model is well-calibrated
  • U-shaped: Over-dispersed predictions
  • Inverse-U: Under-dispersed predictions
  • Skewed: Systematic bias in predictions
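
A quick way to eyeball calibration is to plot the PIT values directly. This sketch only assumes the pit_values component documented above:

# Roughly flat histogram expected for a well-calibrated model
hist(loo_ppc$pit_values, breaks = 20, freq = FALSE,
     main = "LOO-PIT values", xlab = "PIT")
abline(h = 1, lty = 2)   # density of the Uniform(0, 1) reference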


2.5 Custom Bayesian P-Values

For custom test statistics, use the flexible bayesian_p_values() utility:

# Generate posterior predictive samples
yrep <- posterior_predict(fit, ndraws = 1000)

# Define custom test statistic
custom_stat <- function(x) max(x) - min(x)  # Range

# Calculate Bayesian p-value
result <- bayesian_p_values(
  yrep = yrep,
  y = data$y,
  statistic = custom_stat
)

result$observed_value     # Observed range
result$replicated_values  # Posterior predictive ranges
result$p_value           # P(replicated ≥ observed)

Part 3: Prior Specification & Sensitivity Analysis

3.1 Prior Elicitation Helper

The prior_elicitation_helper() translates expert knowledge into statistical priors:

# Define expert beliefs
expert_input <- list(
  parameter_name = "treatment_effect",
  plausible_range = c(-0.5, 1.5),      # Plausible values
  most_likely_value = 0.3,             # Best guess
  confidence = 0.8                      # How confident (0-1)
)

# Get prior recommendations
prior_rec <- prior_elicitation_helper(
  expert_beliefs = expert_input,
  parameter_type = "continuous",
  method = "quantile",
  data_sample = rnorm(100, 0.3, 0.5),  # Optional: existing data
  visualize = TRUE
)

print(prior_rec)

# Use recommended prior
prior_rec$recommended_prior   # brms::prior object
prior_rec$alternatives        # Alternative priors for sensitivity
prior_rec$sensitivity_note    # Guidance

Parameter types:
  • "continuous": Normal, Student-t, log-normal
  • "discrete": Poisson, negative binomial
  • "proportion": Beta distribution
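
To make the quantile-matching idea concrete, here is one simple rule for a normal prior: centre it on the plausible range and choose the standard deviation so that the range carries the stated confidence. This only illustrates the logic; prior_elicitation_helper() may reconcile the best guess and the range differently:

# Expert input from above: plausible range (-0.5, 1.5), confidence 0.8
lower <- -0.5; upper <- 1.5; conf <- 0.8
centre   <- (lower + upper) / 2                     # 0.5 (close to the best guess of 0.3)
sd_prior <- (upper - lower) / (2 * qnorm(0.5 + conf / 2))
c(mean = centre, sd = sd_prior)                     # roughly normal(0.5, 0.78)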


3.2 Prior Sensitivity Analysis

The prior_sensitivity() function assesses robustness to prior choice:

# Define alternative priors
prior_grid <- list(
  weak = set_prior("normal(0, 10)", class = "b"),
  moderate = set_prior("normal(0, 2)", class = "b"),
  strong = set_prior("normal(0, 0.5)", class = "b")
)

# Run sensitivity analysis
sens_result <- prior_sensitivity(
  model = fit,
  parameters = c("b_x1", "b_x2"),
  prior_grid = prior_grid,
  comparison_metric = "KL",  # or "Wasserstein", "overlap"
  plot = TRUE,
  n_draws = 2000
)

print(sens_result)
plot(sens_result)

# Check sensitivity
sens_result$sensitivity_metrics  # How much posteriors differ

Metrics:
  • KL divergence: Information-theoretic difference between posteriors
  • Wasserstein distance: Earth mover's distance (L1 distance between the CDFs)
  • Overlap coefficient: Proportion of overlapping posterior density

Interpretation:
  • Low sensitivity: Conclusions robust to prior choice ✓
  • High sensitivity: Data is weak; the prior strongly influences results
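
For intuition, the overlap coefficient can be computed by hand from two sets of posterior draws of the same parameter (e.g. obtained under the weak and strong priors). A minimal sketch; the exact computation inside prior_sensitivity() may differ:

overlap_coefficient <- function(draws_a, draws_b, n_grid = 512) {
  rng  <- range(c(draws_a, draws_b))
  step <- diff(rng) / (n_grid - 1)
  da <- density(draws_a, from = rng[1], to = rng[2], n = n_grid)$y
  db <- density(draws_b, from = rng[1], to = rng[2], n = n_grid)$y
  sum(pmin(da, db)) * step   # integral of the pointwise minimum of the two densities
}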


3.3 Prior Robustness Analysis

For comprehensive multi-dimensional sensitivity assessment:

robust_result <- prior_robustness(
  model = fit,
  prior_specifications = prior_grid,
  parameters = c("b_x1", "b_x2"),
  perturbation_direction = "expand",  # or "contract", "shift"
  dimensions = c(0.5, 1, 2, 4),       # Scaling factors
  comparison_metric = "KL",
  plot = TRUE
)

print(robust_result)

# Check robustness
robust_result$robustness_index       # Composite score (higher = more robust)
robust_result$concerning_parameters  # Parameters with low robustness
robust_result$recommendations        # What to do

Part 4: Model Comparison

4.1 Comprehensive Model Comparison Suite

The model_comparison_suite() compares multiple models using information criteria:

# Fit competing models
fit1 <- brm(y ~ x1, data = data, refresh = 0)
fit2 <- brm(y ~ x1 + x2, data = data, refresh = 0)
fit3 <- brm(y ~ x1 * x2, data = data, refresh = 0)

# Compare models
comparison <- model_comparison_suite(
  Model_1 = fit1,
  Model_2 = fit2,
  Model_3 = fit3,
  criterion = "all",  # LOO, WAIC, Bayes R²
  plot = TRUE,
  detailed = TRUE
)

print(comparison)
plot(comparison)

# Results
comparison$comparison_table   # All metrics
comparison$ic_differences     # ΔLOO, ΔWAIC, model weights
comparison$plots              # Visualization

Metrics explained:
  • LOO-ELPD: Leave-one-out expected log predictive density (higher is better)
  • WAIC: Widely Applicable Information Criterion (lower is better)
  • Bayes R²: Bayesian coefficient of determination
  • Model weights: Probability each model is best
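
As a cross-check, the underlying LOO comparison can be run directly with brms and the loo package, assuming fit1, fit2, and fit3 from above:

loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)
loo_compare(loo1, loo2, loo3)               # ELPD differences and their standard errors
loo_model_weights(list(loo1, loo2, loo3))   # model weights (stacking by default)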


4.2 Bayes Factor Comparison

For direct model comparison via Bayes Factors:

# Compare two models
bf_result <- bayes_factor_comparison(
  Model_A = fit1,
  Model_B = fit2,
  method = "bridge_sampling",  # or "waic"
  repetitions = 5,
  silent = TRUE
)

print(bf_result)

# Interpretation
bf_result$bayes_factor     # BF of Model_A relative to Model_B
bf_result$log_bf           # Log Bayes Factor
bf_result$interpretation   # Text interpretation

# For 3+ models: pairwise comparisons
bf_multi <- bayes_factor_comparison(fit1, fit2, fit3, method = "bridge_sampling")
bf_multi$pairwise_comparisons

Bayes Factor interpretation (a common rule of thumb; cf. Kass & Raftery, 1995):
  • BF < 1: Evidence for Model 2
  • 1-3: Weak evidence for Model 1
  • 3-10: Moderate evidence
  • 10-30: Strong evidence
  • > 30: Very strong evidence
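
For reference, brms itself provides a bridge-sampling Bayes factor. Note that this requires models fitted with all parameters saved, so the sketch below refits the two models rather than reusing fit1 and fit2:

fit1_bs <- brm(y ~ x1,      data = data, save_pars = save_pars(all = TRUE), refresh = 0)
fit2_bs <- brm(y ~ x1 + x2, data = data, save_pars = save_pars(all = TRUE), refresh = 0)
bayes_factor(fit2_bs, fit1_bs)   # BF of the additive model over the simpler one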


4.3 Predictive Performance Evaluation

The predictive_performance() function evaluates out-of-sample predictions:

# In-sample performance
perf_in <- predictive_performance(
  model = fit,
  newdata = NULL,           # NULL = use training data
  observed_y = data$y,
  metrics = "all",          # RMSE, MAE, Coverage, CRPS
  credible_level = 0.95,
  n_draws = NULL
)

print(perf_in)
plot(perf_in)

# Out-of-sample performance
test_data <- data.frame(x1 = rnorm(50), x2 = rnorm(50), y = rnorm(50))
perf_out <- predictive_performance(
  model = fit,
  newdata = test_data,
  observed_y = test_data$y,
  metrics = "all"
)

# Compare metrics
perf_in$point_metrics      # RMSE, MAE, Correlation
perf_in$interval_metrics   # Coverage, Interval Width
perf_in$proper_scores      # CRPS (lower is better)
perf_in$prediction_summary # Per-observation predictions

Key metrics:
  • RMSE/MAE: Prediction error (lower is better)
  • Coverage: % of observations within credible intervals (should ≈ the credible level, e.g. 0.95)
  • CRPS: Continuous Ranked Probability Score (proper scoring rule; lower is better)
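
As a sanity check, the point and interval metrics can be reproduced by hand from posterior_predict() output, assuming fit and test_data from above:

yrep_test <- posterior_predict(fit, newdata = test_data)   # draws x observations
pred_mean <- colMeans(yrep_test)
rmse <- sqrt(mean((pred_mean - test_data$y)^2))
mae  <- mean(abs(pred_mean - test_data$y))
ci   <- apply(yrep_test, 2, quantile, probs = c(0.025, 0.975))
coverage <- mean(test_data$y >= ci[1, ] & test_data$y <= ci[2, ])
c(RMSE = rmse, MAE = mae, Coverage = coverage)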


Part 5: Utilities & Comprehensive Reporting

5.1 Extract Posterior (Unified Interface)

The extract_posterior_unified() provides consistent posterior extraction across packages:

# Extract as data frame
draws_df <- extract_posterior_unified(
  model = fit,
  parameters = c("b_x1", "b_x2", "sigma"),
  format = "draws_df",
  ndraws = 1000,
  include_warmup = FALSE,
  chains = NULL  # All chains
)

# Extract as matrix
draws_matrix <- extract_posterior_unified(fit, format = "draws_matrix")

# Extract as array (iterations × chains × parameters)
draws_array <- extract_posterior_unified(fit, format = "draws_array")

# Extract as named list
draws_list <- extract_posterior_unified(fit, format = "list")

Supported model types:
  • brmsfit (brms)
  • stanfit (rstan)
  • stanreg (rstanarm)
  • CmdStanMCMC (cmdstanr)
  • mcmc.list (coda)
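
These formats correspond to the posterior package's draws formats, so results can be cross-checked against it directly (shown here for a brmsfit):

library(posterior)
draws <- as_draws_df(fit)                   # analogous to format = "draws_df"
head(draws[, c("b_x1", "b_x2", "sigma")])
dim(as_draws_array(fit))                    # iterations x chains x variables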


5.2 Diagnostic Report Generation

The diagnostic_report() function creates comprehensive HTML/PDF reports:

# Generate comprehensive report
diagnostic_report(
  model = fit,
  output_file = "bayesian_model_diagnostics.pdf",
  output_format = "pdf",  # or "html", "docx"
  include_sections = c(
    "model_summary",
    "convergence",
    "posterior_summary",
    "recommendations"
  ),
  rhat_threshold = 1.01,
  ess_threshold = 0.1,
  open_report = TRUE  # Open automatically
)

Sections included:
  1. Model Summary: Formula, family, data info
  2. Convergence Diagnostics: R-hat, ESS, divergences
  3. Posterior Summary: Parameter estimates with credible intervals
  4. Recommendations: Actionable suggestions for model improvement


Complete Workflow Example

Here’s a complete Bayesian workflow using all major functions:

# ===== STEP 1: FIT MODEL =====
library(bayesDiagnostics)
library(brms)

# Simulate data
set.seed(123)
N <- 200
data <- data.frame(
  x1 = rnorm(N),
  x2 = rnorm(N)
)
data$y <- 3 + 2 * data$x1 - 0.5 * data$x2 + rnorm(N, sd = 1.5)

# Fit Bayesian model
fit <- brm(
  y ~ x1 + x2,
  data = data,
  prior = c(
    prior(normal(0, 5), class = "b"),
    prior(student_t(3, 0, 2.5), class = "sigma")
  ),
  chains = 4,
  iter = 2000,
  warmup = 1000,
  refresh = 0
)

# ===== STEP 2: CONVERGENCE DIAGNOSTICS =====
# Quick check
diag <- mcmc_diagnostics_summary(fit)
print(diag)

# Detailed ESS analysis
ess_diag <- effective_sample_size_diagnostics(fit, plot = TRUE)
plot(ess_diag)

# ===== STEP 3: POSTERIOR PREDICTIVE CHECKS =====
# Comprehensive PPC
ppc <- posterior_predictive_check(fit, observed_data = data$y, plot = TRUE)
print(ppc)

# Automated screening
auto_ppc <- automated_ppc(fit, data$y)
print(auto_ppc)

# LOO cross-validation
loo_ppc <- ppc_crossvalidation(fit, data$y)
plot(loo_ppc)

# ===== STEP 4: PRIOR SENSITIVITY =====
# Define alternative priors
prior_grid <- list(
  weak = set_prior("normal(0, 10)", class = "b"),
  moderate = set_prior("normal(0, 5)", class = "b"),
  strong = set_prior("normal(0, 1)", class = "b")
)

# Test sensitivity
sens <- prior_sensitivity(
  fit,
  parameters = c("b_x1", "b_x2"),
  prior_grid = prior_grid,
  plot = TRUE
)
print(sens)

# ===== STEP 5: MODEL COMPARISON =====
# Fit alternative models
fit_x1 <- brm(y ~ x1, data = data, refresh = 0)
fit_x1x2 <- fit
fit_interaction <- brm(y ~ x1 * x2, data = data, refresh = 0)

# Compare
comp <- model_comparison_suite(
  Linear = fit_x1,
  Additive = fit_x1x2,
  Interaction = fit_interaction,
  criterion = "all",
  plot = TRUE
)
print(comp)

# ===== STEP 6: PREDICTIVE PERFORMANCE =====
# Hold-out validation
test_idx <- sample(1:N, 40)
test_data <- data[test_idx, ]
train_data <- data[-test_idx, ]

fit_train <- update(fit, newdata = train_data, refresh = 0)

perf <- predictive_performance(
  fit_train,
  newdata = test_data,
  observed_y = test_data$y,
  metrics = "all"
)
print(perf)
plot(perf)

# ===== STEP 7: GENERATE REPORT =====
diagnostic_report(
  fit,
  output_file = "full_diagnostics.html",
  output_format = "html",
  include_sections = c("model_summary", "convergence", 
                       "posterior_summary", "recommendations")
)

Interpretation Guidelines

Convergence (MCMC)

  • R-hat < 1.01: Chains have converged
  • ESS > 400: Sufficient effective samples
  • No divergences: Sampler is behaving well
  • R-hat > 1.01: Run longer chains or reparameterize
  • Divergences > 0: Increase adapt_delta or reparameterize (see the sketch below)
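
For example, in brms the sampler settings are adjusted through the control argument; a minimal sketch assuming the fit from earlier:

# Re-fit with a stricter adaptation target and a deeper maximum tree depth
fit_strict <- update(
  fit,
  control = list(adapt_delta = 0.99, max_treedepth = 15),
  refresh = 0
)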

Posterior Predictive Checks

  • p-values ≈ 0.5: Model captures test statistic
  • LOO-PIT ~Uniform: Well-calibrated predictions
  • Extreme p-values: Model misspecification

Prior Sensitivity

  • Low KL/Wasserstein: Robust to prior choice
  • High sensitivity: Need more data or informative priors

Model Comparison

  • ΔLOO < 4: Models are similar
  • ΔLOO > 10: Strong preference for best model
  • Coverage ≈ 0.95: Credible intervals well-calibrated

Summary Table: Functions Covered in This Vignette

Function                              Category     Purpose
mcmc_diagnostics_summary()            Convergence  Quick MCMC diagnostic overview
effective_sample_size_diagnostics()   Convergence  Detailed ESS analysis with plots
hierarchical_convergence()            Convergence  Hierarchical model convergence
posterior_predictive_check()          PPC          Comprehensive PPC with test statistics
automated_ppc()                       PPC          Automated screening across statistics
graphical_ppc()                       PPC          Publication-quality PPC plots
ppc_crossvalidation()                 PPC          LOO-PIT cross-validation checks
bayesian_p_values()                   PPC          Custom test statistic p-values
prior_elicitation_helper()            Prior        Translate expert knowledge to priors
prior_sensitivity()                   Prior        Assess robustness to prior choice
prior_robustness()                    Prior        Multi-dimensional sensitivity analysis
model_comparison_suite()              Comparison   Compare models via LOO/WAIC/R²
bayes_factor_comparison()             Comparison   Bayes Factor model comparison
predictive_performance()              Comparison   Evaluate predictive accuracy
extract_posterior_unified()           Utility      Unified posterior extraction
diagnostic_report()                   Utility      Generate comprehensive reports

Further Reading

Books

  • Gelman et al. (2013). Bayesian Data Analysis, 3rd ed.
  • McElreath (2020). Statistical Rethinking, 2nd ed.
  • Kruschke (2015). Doing Bayesian Data Analysis, 2nd ed.

Papers

  • Vehtari et al. (2021). “Rank-Normalization, Folding, and Localization: An Improved R-hat for Assessing Convergence of MCMC.” Bayesian Analysis.
  • Gabry et al. (2019). “Visualization in Bayesian workflow.” Journal of the Royal Statistical Society, Series A.
  • Vehtari et al. (2017). “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC.” Statistics and Computing.

Packages

  • brms: Bayesian Regression Models using Stan
  • bayesplot: Plotting for Bayesian models
  • posterior: Tools for working with posterior draws
  • loo: Efficient leave-one-out cross-validation

Getting Help

# Function documentation
?mcmc_diagnostics_summary
?posterior_predictive_check
?prior_sensitivity

# Package vignettes
browseVignettes("bayesDiagnostics")

# Report issues
# https://github.com/yourusername/bayesDiagnostics/issues

End of Vignette
