This vignette demonstrates the causaldef framework on two famous datasets in causal inference: 1. Lalonde's job training data: validating the deficiency proxy against an experimental benchmark. 2. Right Heart Catheterization (RHC): handling high-dimensional confounding and uncertainty quantification.
library(causaldef)
library(stats)
if (!exists("deparse1", envir = baseenv())) {
  deparse1 <- function(expr, collapse = " ", width.cutoff = 500L, ...) {
    paste(deparse(expr, width.cutoff, ...), collapse = collapse)
  }
}
The Lalonde dataset allows us to verify our methods because it contains both an experimental control group and observational comparison groups.
We define the “True” experiment (\(E_{target}\)) using the randomized data, and the “Observational” experiment (\(E_{obs}\)) using the CPS controls.
data("nsw_benchmark")
head(nsw_benchmark)
#> treat age education black hispanic married nodegree re74 re75 re78
#> 1 1 37 11 1 0 1 1 0 0 9930.0459
#> 2 1 22 9 0 1 0 1 0 0 3595.8940
#> 3 1 30 12 1 0 0 0 0 0 24909.4492
#> 4 1 27 11 1 0 0 1 0 0 7506.1460
#> 5 1 33 8 1 0 0 1 0 0 289.7899
#> 6 1 22 9 1 0 0 1 0 0 4056.4939
#> sample_id
#> 1 nsw_treated
#> 2 nsw_treated
#> 3 nsw_treated
#> 4 nsw_treated
#> 5 nsw_treated
#> 6 nsw_treated
# 1. The Experimental Benchmark (Gold Standard)
nsw_exp <- subset(nsw_benchmark, sample_id %in% c("nsw_treated", "nsw_control"))
# True Experimental Estimate (Unadjusted difference in means is valid due to randomization)
true_att <- mean(nsw_exp$re78[nsw_exp$treat == 1]) - mean(nsw_exp$re78[nsw_exp$treat == 0])
cat("True Experimental ATT:", round(true_att, 2), "\n")
#> True Experimental ATT: 1794.34
# 2. The Observational Challenge (NSW Treated + CPS Control)
nsw_obs <- subset(nsw_benchmark, sample_id %in% c("nsw_treated", "cps_control"))
# Naive Observational Estimate (Unadjusted)
naive_est <- mean(nsw_obs$re78[nsw_obs$treat == 1]) - mean(nsw_obs$re78[nsw_obs$treat == 0])
cat("Naive Observational Estimate:", round(naive_est, 2), "\n")
#> Naive Observational Estimate: -8497.52
cat("Bias:", round(naive_est - true_att, 2), "\n")
#> Bias: -10291.86
The massive bias (the naive estimate is large and negative while the true effect is positive) confirms the difficulty of this problem.
We now calculate a deficiency proxy \(\delta(E_{obs}, E_{target})\) using the available covariates.
covariates <- c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "re75")
# Specification
spec_lalonde <- causal_spec(
  data = nsw_obs,
  treatment = "treat",
  outcome = "re78",
  covariates = covariates
)
#> Warning: 14510 observations have extreme propensity scores
#> ✔ Created causal specification: n=16177, 8 covariate(s)
# Estimate deficiency using Propensity Score Weighting (IPTW)
res_lalonde <- estimate_deficiency(
  spec_lalonde,
  methods = c("unadjusted", "iptw"),
  n_boot = 50 # Kept low for vignette speed
)
#> ℹ Estimating deficiency: unadjusted
#> ℹ Estimating deficiency: iptw
print(res_lalonde)
#>
#> -- Deficiency Proxy Estimates (PS-TV) ------
#>
#> Method Delta SE CI Quality
#> unadjusted 0.7985 0.0266 [0.7408, 0.844] Insufficient (Red)
#> iptw 0.0270 0.0046 [0.0213, 0.0382] Excellent (Green)
#> Note: delta is a propensity-score TV proxy (overlap/balance diagnostic).
#>
#> Best method: iptw (delta = 0.027 )
plot(res_lalonde)
The proxy \(\delta\) summarizes the distance between the reweighted observational distribution and the target randomized experiment. A lower \(\delta\) indicates that the reweighting more closely reconstructed the experimental conditions under the PS-TV diagnostic.
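The package's internal PS-TV computation is not shown in this vignette; the following is an illustrative sketch (our own construction, not causaldef's code) of one such propensity-score total-variation diagnostic. It bins propensity scores and compares the treated histogram against an optionally reweighted control histogram:

```r
# Sketch of a PS-TV-style proxy (illustrative, not the package internals):
# bin propensity scores and take the total-variation distance between the
# treated and (optionally weighted) control histograms.
ps_tv_proxy <- function(ps, treat, w = rep(1, length(ps)), bins = 20) {
  breaks <- seq(0, 1, length.out = bins + 1)
  bin1 <- cut(ps[treat == 1], breaks, include.lowest = TRUE)
  bin0 <- cut(ps[treat == 0], breaks, include.lowest = TRUE)
  p1 <- tapply(w[treat == 1], bin1, sum)
  p0 <- tapply(w[treat == 0], bin0, sum)
  p1[is.na(p1)] <- 0; p0[is.na(p0)] <- 0
  0.5 * sum(abs(p1 / sum(p1) - p0 / sum(p0))) # TV distance in [0, 1]
}

# Toy example on simulated data
set.seed(1)
x <- rnorm(1000)
treat <- rbinom(1000, 1, plogis(x))
ps <- fitted(glm(treat ~ x, family = binomial))
w <- ifelse(treat == 1, 1, ps / (1 - ps)) # ATT-style weights for controls
ps_tv_proxy(ps, treat)    # unadjusted: substantial imbalance
ps_tv_proxy(ps, treat, w) # reweighted: much smaller
```

As in the Lalonde output above, the unadjusted proxy is large and the IPTW-reweighted proxy shrinks toward zero when overlap is adequate.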
This dataset involves high-dimensional confounding (50+ covariates). We use it to demonstrate the Confounding Frontier and Policy Regret Bounds.
data("rhc")
# Convert treatment to binary (0/1) for causaldef.
# 'swang1' is assumed to be the treatment column ("RHC" vs "No RHC");
# check its factor levels before relying on the 0/1 coding below.
if (is.factor(rhc$swang1)) {
  rhc$treat_bin <- as.numeric(rhc$swang1) - 1 # assumes "No RHC" is the first level
} else {
  rhc$treat_bin <- rhc$swang1
}
# Outcome: we use dth30 (binary 30-day death indicator) directly;
# 30-day survival is simply its inverse.
if (is.factor(rhc$dth30)) {
  rhc$outcome_bin <- as.numeric(rhc$dth30) - 1 # assumes "No" is the first level
} else {
  rhc$outcome_bin <- rhc$dth30
}
# Select a subset of covariates for demonstration (to keep it fast)
# Real analysis would use all 50+.
rhc_covars <- c("age", "sex", "race", "aps1", "cat1")
# Note: 'aps1' is APACHE III score
spec_rhc <- causal_spec(
  data = rhc,
  treatment = "treat_bin",
  outcome = "outcome_bin",
  covariates = rhc_covars
)
#> ✔ Created causal specification: n=5735, 5 covariate(s)
res_rhc <- estimate_deficiency(spec_rhc, methods = "iptw", n_boot = 0)
#> ℹ Estimating deficiency: iptw
print(res_rhc)
#>
#> -- Deficiency Proxy Estimates (PS-TV) ------
#>
#> Method Delta SE Quality
#> iptw 0.0184 - Excellent (Green)
#> Note: delta is a propensity-score TV proxy (overlap/balance diagnostic).
#>
#> Best method: iptw (delta = 0.0184 )
The “Safety Floor” tells us the minimum regret we risk by making a decision based on this imperfect observational evidence.
# Utility: Let's say preventing death has utility 1, death has utility 0.
# The outcome is death (1) or survival (0).
# We want to minimize outcome (death).
# This is equivalent to utility range [0, 1].
bounds_rhc <- policy_regret_bound(res_rhc, utility_range = c(0, 1), method = "iptw")
#> ℹ Transfer penalty: 0.0184 (delta = 0.0184)
print(bounds_rhc)
#>
#> -- Policy Regret Bounds -------------------------------------------------
#>
#> * Deficiency delta: 0.0184
#> * Delta mode: point
#> * Delta method: iptw
#> * Delta selection: pre-specified method
#> * Utility range: [0, 1]
#> * Transfer penalty: 0.0184 (additive regret upper bound)
#> * Minimax floor: 0.0092 (worst-case lower bound)
#>
#> Note: this is a plug-in bound using a deficiency proxy rather than an identified exact deficiency.
#>
#> Interpretation: Transfer penalty is 1.8 % of utility range given delta
The result typically shows a low safety floor (e.g., < 0.05), suggesting that the observational findings are actionable unless the decision hinges on a very small utility difference.
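As a sanity check, the printed numbers are consistent with a simple plug-in rule (our reading of the output, not a quote of the package's internals): the transfer penalty is \(\delta\) times the utility range, and the minimax floor is half of that.

```r
# Reproduce the printed bounds by hand (assumed plug-in form, inferred
# from the output above: penalty = delta * range, floor = penalty / 2).
delta <- 0.0184
u_range <- c(0, 1)
penalty <- delta * diff(u_range) # 0.0184, the transfer penalty
floor_minimax <- penalty / 2     # 0.0092, the minimax floor
```

Since the utility range here is 1, the penalty equals \(\delta\) directly; a wider utility range would scale both bounds proportionally.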
Sensitivity analysis: If we missed a confounder \(U\) correlated with treatment by \(\alpha\) and outcome by \(\gamma\), how much would our deficiency increase?
frontier <- confounding_frontier(spec_rhc, grid_size = 30)
#> ℹ Computing benchmarks for observed covariates...
#> ✔ Computed confounding frontier: 30x30 grid
plot(frontier)
The plot shows the “safe” region (low confounding) versus the “unsafe” region. If we suspect unmeasured confounders (like specific genetic factors) have strength \(|\alpha \gamma| > 0.1\), the yellow/red zones indicate high deficiency.
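To turn the visual rule of thumb into code, a small helper (hypothetical, not part of causaldef) can flag whether an assumed confounder falls in the unsafe region under the \(|\alpha \gamma| > 0.1\) threshold above:

```r
# Hypothetical helper (not a causaldef function): flag an assumed
# unmeasured confounder as "unsafe" when |alpha * gamma| exceeds the
# threshold read off the frontier plot.
is_unsafe <- function(alpha, gamma, threshold = 0.1) {
  abs(alpha * gamma) > threshold
}
is_unsafe(0.2, 0.3) # FALSE: product 0.06 is below the threshold
is_unsafe(0.4, 0.4) # TRUE: product 0.16 exceeds it
```

The benchmarks computed from observed covariates (printed during `confounding_frontier`) give plausible reference values for \(\alpha\) and \(\gamma\).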