Model Evaluation and Comparison

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Introduction

Evaluating clinical prediction models requires assessing discrimination, calibration, and clinical utility. This vignette demonstrates a comprehensive evaluation workflow:

Model Comparison — Compare multiple models using discrimination and calibration metrics
Time-Dependent ROC — Evaluate survival prediction models at specific time points
C-Index — Global discrimination measure for survival models
Variable Importance — Identify key predictive features

Preparing Data

We’ll use the NCCTG Lung Cancer dataset and fit three logistic regression models of increasing complexity:

data(cancer, package = "survival")
cancer$dead <- cancer$status == 2
cancer$event <- ifelse(cancer$status == 2, 1, 0)

cancer_clean <- cancer %>%
  mutate(sex = factor(sex, labels = c("Male", "Female"))) %>%
  na.omit()

# Fit three models with increasing complexity
cancer_clean$pred_model1 <- predict(
  glm(dead ~ age + sex + ph.karno, data = cancer_clean, family = binomial),
  type = "response"
)
cancer_clean$pred_model2 <- predict(
  glm(dead ~ age + sex + ph.karno + pat.karno, data = cancer_clean, family = binomial),
  type = "response"
)
cancer_clean$pred_model3 <- predict(
  glm(dead ~ age + sex + ph.karno + wt.loss, data = cancer_clean, family = binomial),
  type = "response"
)

knitr::kable(head(cancer_clean[, c("dead", "pred_model1", "pred_model2", "pred_model3")]),
  caption = "Model Predictions vs Actual Outcomes (First 6 Rows)"
)

Model Predictions vs Actual Outcomes (First 6 Rows)
	dead	pred_model1	pred_model2	pred_model3
2	TRUE	0.7868731	0.7661243	0.7858287
4	TRUE	0.7436147	0.8215218	0.7434448
6	FALSE	0.9260244	0.8954047	0.9273628
7	TRUE	0.7052554	0.7566565	0.7049340
8	TRUE	0.7704063	0.7279677	0.7724185
9	TRUE	0.8207969	0.8021612	0.8203195

Model Comparison

classif_model_compare() provides a comprehensive comparison using multiple metrics (AUC, Accuracy, Sensitivity, Specificity, PPV, NPV, PRAUC, Brier Score) and generates ROC, PR, calibration, and DCA plots:

model_comparison <- classif_model_compare(
  data = cancer_clean,
  target_var = "dead",
  model_names = c("pred_model1", "pred_model2", "pred_model3"),
  save_output = FALSE
)

knitr::kable(model_comparison$metric_table, caption = "Model Performance Comparison")

Model Performance Comparison
	Model	AUC	PRAUC	Accuracy	Sensitivity	Specificity	Pos Pred Value	Neg Pred Value	F1	Kappa	Brier	cutoff	Youden	HosLem
3	pred_model3	0.684 (0.594, 0.774)	0.820	0.581	0.50	0.787	0.857	0.381	0.632	0.217	0.185	0.777	0.287	0.985
1	pred_model1	0.684 (0.594, 0.774)	0.817	0.557	0.45	0.830	0.871	0.371	0.593	0.203	0.185	0.786	0.280	0.967
2	pred_model2	0.684 (0.590, 0.777)	0.821	0.605	0.55	0.745	0.846	0.393	0.667	0.232	0.182	0.765	0.295	0.353

Visualization

model_comparison$roc_plot

model_comparison$pr_plot

model_comparison$calibration_plot

model_comparison$dca_plot

Time-Dependent ROC Analysis

For survival outcomes, time-dependent ROC accounts for the timing of events. First, we fit a Cox proportional hazards model and obtain risk scores:

# Fit Cox model for survival prediction
cox_model <- coxph(Surv(time, event) ~ age + sex + ph.karno, data = cancer_clean)
cancer_clean$risk_score <- predict(cox_model, type = "risk")

Now we can evaluate the model using time-dependent ROC at specific time points:

time_roc_result <- time_roc_plot(
  data = cancer_clean,
  event_var = "event",
  time_var = "time",
  marker_var = "risk_score",
  times = c(200, 365),
  time_unit = "days",
  save_plot = FALSE
)

time_roc_result
#> $time_roc
#> Time-dependent-Roc curve estimated using IPCW  (n=167, without competing risks). 
#>       Cases Survivors Censored AUC (%)
#> t=200    48       111        8   67.32
#> t=365    87        49       31   60.85
#> 
#> Method used for estimating IPCW:marginal 
#> 
#> Total computation time : 0.01  secs.
#> 
#> $plot

C-Index

The C-index (concordance index) measures discrimination for survival models. Interpretation: 0.5 = no discrimination, 0.7-0.8 = acceptable, >0.8 = excellent.

c_index <- calc_cindex(cancer_clean, "time", "event", "risk_score")
c_index
#>   C Index 
#> 0.6243374

Variable Importance

The variable importance list could be obtained from the different models like random forest, gradient boosting machine, etc. Here, we assume we have the variable importance scores from some random model.

importance_scores <- c(
  Age = 0.25, Sex = 0.15, Karnofsky = 0.30,
  WeightLoss = 0.12, Calories = 0.08, PatKarnofsky = 0.10
)

importance_plot(x = importance_scores, x_lab = "Variable Importance")

Show only top variables with custom styling:

importance_plot(
  x = importance_scores,
  x_lab = "Relative Importance",
  top_n = 4,
  color = "steelblue"
)

Summary

Key Functions

classif_model_compare(): Comprehensive model comparison with ROC, PR, calibration, and DCA plots
time_roc_plot(): Time-dependent ROC analysis for survival outcomes
calc_cindex(): Calculate concordance index for survival models
importance_plot(): Visualize variable importance

Evaluation Checklist

Discrimination: AUC (binary) or C-index (survival), sensitivity/specificity at clinical thresholds
Calibration: Calibration plots, Brier score
Clinical utility: Decision curve analysis, net benefit at relevant thresholds
External validation: Performance in independent populations

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.