Transportability and Policy Learning

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

This vignette demonstrates two advanced applications of the causaldef framework: 1. Transportability: Generalizing experimental results to a new target population. 2. Policy Learning Bounds: Quantifying the limits of decision-making under confounding.

We utilize classical datasets (Lalonde NSW and Right Heart Catheterization) to illustrate these concepts.

1. Transportability: Lalonde’s Job Training

A common challenge in causal inference is external validity: Can we apply the results of a Randomized Controlled Trial (RCT) to a diferent target population?

We use the Lalonde dataset to simulate a transportability problem. * Source Population (\(S=1\)): The NSW experimental participants (typically disjoint from the general population). * Target Population (\(S=0\)): The CPS comparison group (representative of the broader population).

data("nsw_benchmark")

# Define Source: Experimental Sample
source_data <- subset(nsw_benchmark, sample_id %in% c("nsw_treated", "nsw_control"))

# Define Target: CPS Control Group (Broader population)
target_data <- subset(nsw_benchmark, sample_id == "cps_control")

# Covariates available for transport
transport_vars <- c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "re75")

# Comparison of demographics
print(summary(source_data[, c("age", "education", "re74")]))
#>       age          education         re74        
#>  Min.   :17.00   Min.   : 3.0   Min.   :    0.0  
#>  1st Qu.:20.00   1st Qu.: 9.0   1st Qu.:    0.0  
#>  Median :24.00   Median :10.0   Median :    0.0  
#>  Mean   :25.37   Mean   :10.2   Mean   : 2102.3  
#>  3rd Qu.:28.00   3rd Qu.:11.0   3rd Qu.:  824.4  
#>  Max.   :55.00   Max.   :16.0   Max.   :39570.7
print(summary(target_data[, c("age", "education", "re74")]))
#>       age          education          re74      
#>  Min.   :16.00   Min.   : 0.00   Min.   :    0  
#>  1st Qu.:24.00   1st Qu.:11.00   1st Qu.: 4403  
#>  Median :31.00   Median :12.00   Median :15124  
#>  Mean   :33.23   Mean   :12.03   Mean   :14017  
#>  3rd Qu.:42.00   3rd Qu.:13.00   3rd Qu.:23584  
#>  Max.   :55.00   Max.   :18.00   Max.   :25862

The target population (CPS) is significantly wealthier (re74 mean is much higher) and slightly older. We want to know: What would be the effect of job training if applied to the CPS population?

Transport Deficiency

We calculate the Transport Deficiency \(\delta(E_S, E_T)\). This measures how much information is lost due to the distributional shift between Source and Target.

# Create causal specification for the SOURCE
source_spec <- causal_spec(
  data = source_data,
  treatment = "treat",
  outcome = "re78",
  covariates = transport_vars
)
#> ✔ Created causal specification: n=445, 8 covariate(s)

# Compute Transport Deficiency
trans_def <- transport_deficiency(
  source_spec,
  target_data = target_data,
  transport_vars = transport_vars,
  method = "iptw",
  n_boot = 50 # Low for vignette speed
)
#> ✖ Transport proxy: 0.995 (ESS: 13.3%)
#> ℹ Severe distribution shift; transport may be unreliable

print(trans_def)
#> 
#> ── Transport Diagnostic Analysis ───────────────────────────────────────────────
#> Method: "iptw"
#> Source n: 445 | Target n: 15992
#> 
#> ── Transport Diagnostic ──
#> 
#>   transport proxy:       0.9954
#>   Standard error:        0.0250
#>   95% CI:               [1.1735, 1.2776]
#>   Effective sample size: 59.0 (13.3% of source)
#>   Note: this is a heuristic transport-risk proxy, not an exact transport deficiency.
#> ── Covariate Shift ──
#>            variable shift_metric severity
#> age             age   0.84595761   severe
#> education education   0.76555133   severe
#> black         black   2.36239417   severe
#> hispanic   hispanic   0.05755963      low
#> married     married   1.30665468   severe
#> nodegree   nodegree   1.11659926   severe
#> re74           re74   1.53592792   severe
#> re75           re75   1.77276154   severe
#> ── Effect Estimates ──
#>   ATE (source):     1676.3426
#>   ATE (transport):  778.8984
plot(trans_def, type = "shift")

Interpretation: * Covariate Shift: The plot shows which variables differ most (likely re74 and re75). * Transported ATE: The estimated effect in the target population. * Deficiency: A low delta implies we can reliably transport the result. A high delta warns that the populations are too distinct (lack of overlap or extreme weights).

2. Policy Learning Bounds: RHC

In Policy Learning, we seek an optimal treatment rule \(\pi(X)\) to maximize utility. However, with observational data, our estimate of a policy’s value is biased by confounding.

We use the Right Heart Catheterization (RHC) dataset to evaluate a risk-based policy. * Decision: Treat with RHC? * Outcome: 30-day Mortality (lower is better). * Policy: “Treat only high-risk patients” (e.g., APACHE score > 50).

data("rhc")

# Preprocessing
if (is.factor(rhc$swang1)) rhc$treat <- as.numeric(rhc$swang1) - 1 else rhc$treat <- rhc$swang1
if (is.factor(rhc$dth30)) rhc$outcome <- as.numeric(rhc$dth30) - 1 else rhc$outcome <- rhc$dth30

# Variables for adjustment
covariates <- c("age", "sex", "race", "aps1", "cat1") 

spec_rhc <- causal_spec(
  data = rhc,
  treatment = "treat",
  outcome = "outcome",
  covariates = covariates
)
#> ✔ Created causal specification: n=5735, 5 covariate(s)

Policy Evaluation

We compare two policies: 1. Treat All: Everyone gets RHC. 2. Risk-Based: Treat only if APACHE III score (aps1) > 50.

We estimate the observational value of these policies using IPW.

# Estimate propensity scores for adjustment
ps_model <- glm(treat ~ age + sex + race + aps1 + cat1, data = rhc, family = binomial)
rhc$ps <- predict(ps_model, type = "response")

# Define policies
policy_all <- rep(1, nrow(rhc))
policy_risk <- ifelse(rhc$aps1 > 50, 1, 0)

# Estimate Value (Inverse Propensity Weighted)
# Value = Mean of Y under policy. We want to MINIMIZE mortality.
# Equivalent to Maximizing Survival (1 - Y).
# Let's compute expected mortality.

get_policy_value <- function(policy, treat, outcome, ps) {
  # IPW estimator for policy value
  # Weight = I(A = \pi(X)) / P(A|X)
  w <- (treat == policy) / ifelse(policy == 1, ps, 1 - ps)
  mean(w * outcome) # Expected Mortality
}

val_all <- get_policy_value(policy_all, rhc$treat, rhc$outcome, rhc$ps)
val_risk <- get_policy_value(policy_risk, rhc$treat, rhc$outcome, rhc$ps)

cat("Estimated Mortality (Treat All):", round(val_all, 3), "\n")
#> Estimated Mortality (Treat All): 0.352
cat("Estimated Mortality (Risk-Based):", round(val_risk, 3), "\n")
#> Estimated Mortality (Risk-Based): 0.358

The Safety Floor

Even if the Risk-Based policy looks better, can we trust it? The Safety Floor tells us the worst-case error in our value estimate due to unmeasured confounding.

# 1. Estimate Deficiency of the dataset
defom <- estimate_deficiency(spec_rhc, methods = "iptw", n_boot = 0)
#> ℹ Estimating deficiency: iptw
delta <- defom$estimates["iptw"]

# 2. Compute Safety Floor
# Utility range is [0, 1] (Mortality 0 or 1)
bounds <- policy_regret_bound(defom, utility_range = c(0, 1), method = "iptw")
#> ℹ Transfer penalty: 0.0184 (delta = 0.0184)

print(bounds)
#> 
#> -- Policy Regret Bounds -------------------------------------------------
#> 
#> * Deficiency delta: 0.0184 
#> * Delta mode: point 
#> * Delta method: iptw 
#> * Delta selection: pre-specified method 
#> * Utility range: [0, 1]
#> * Transfer penalty: 0.0184 (additive regret upper bound)
#> * Minimax floor: 0.0092 (worst-case lower bound)
#> 
#> Note: this is a plug-in bound using a deficiency proxy rather than an identified exact deficiency.
#> 
#> Interpretation: Transfer penalty is 1.8 % of utility range given delta

Conclusion: * The Safety Floor represents the irreducible uncertainty. * If the difference between Treat All and Risk-Based is smaller than the safety floor, we cannot be confident the new policy is actually superior to the baseline, regardless of sample size. * This illustrates the fundamental limit of offline policy learning from observational data.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.