The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Introduction to CausalSpline: Nonlinear Causal Dose-Response Estimation

Your Name

2026-03-21

Why nonlinear causal effects?

Most causal inference software assumes the treatment effect enters linearly:

\[Y = \beta_0 + \beta_1 T + \gamma X + \varepsilon\]

But many real-world relationships are genuinely nonlinear:

CausalSpline replaces the linear \(\beta_1 T\) with a spline \(f(T)\):

\[Y = \beta_0 + f(T) + \gamma X + \varepsilon, \quad E[Y(t)] = \beta_0 + f(t)\]

under standard unconfoundedness and positivity assumptions.


Installation

# From CRAN (once published)
install.packages("CausalSpline")

# Development version from GitHub
remotes::install_github("yourgithub/CausalSpline")

Quick start

library(CausalSpline)

1. Simulate data with a threshold effect

set.seed(42)
dat <- simulate_dose_response(n = 600, dgp = "threshold", confounding = 0.6)
head(dat)
#>          Y        T         X1         X2 X3 true_effect
#> 1 4.676083 5.251499  1.3709584 -0.2484829  0    4.502997
#> 2 3.505122 4.493596 -0.5646982  0.4223204  0    2.987192
#> 3 2.676237 4.328187  0.3631284  0.9876533  1    2.656375
#> 4 5.245936 5.958644  0.6328626  0.8355682  1    5.917287
#> 5 3.958747 4.996158  0.4042683 -0.6605219  1    3.992315
#> 6 3.086915 4.992211 -0.1061245  1.5640695  1    3.984422

The true dose-response is flat below \(T = 3\), then rises linearly:

plot(dat$T, dat$Y, pch = 16, col = rgb(0, 0, 0, 0.2),
     xlab = "Treatment T", ylab = "Outcome Y",
     main = "Observed data (confounded)")
lines(sort(dat$T), dat$true_effect[order(dat$T)],
      col = "red", lwd = 2)
legend("topleft", legend = "True f(T)", col = "red", lwd = 2)
True vs observed relationship
True vs observed relationship

2. Fit with IPW

fit_ipw <- causal_spline(
  Y ~ T | X1 + X2 + X3,
  data       = dat,
  method     = "ipw",
  df_exposure = 5,
  eval_grid  = 100
)
summary(fit_ipw)
#> === CausalSpline Summary ===
#> 
#> Call:
#>   causal_spline(formula = Y ~ T | X1 + X2 + X3, data = dat, method = "ipw", 
#>     df_exposure = 5, eval_grid = 100)
#> 
#> Estimation method  : IPW 
#> Spline type        : ns 
#> df (exposure)      : 5 
#> Interior knots     : 4.278 4.887 5.349 5.965 
#> 
#> Treatment variable:
#>   n = 600 
#>   Mean = 5.133   SD = 0.994 
#>   Range = [ 1.928 , 8.586 ]
#> 
#> IPW diagnostics:
#>   ESS = 406 / n = 600 ( 67.7 %)
#>   Weight range = [ 0.119 , 4.978 ]
#> 
#> Dose-response curve (selected percentiles):
#>      t estimate     se   lower   upper
#>  1.928  -1.0895 0.6925 -2.4468  0.2678
#>  3.072   0.5492 0.1771  0.2021  0.8963
#>  4.147   2.3213 0.1231  2.0800  2.5626
#>  5.223   4.7107 0.0928  4.5289  4.8926
#>  6.366   6.7306 0.1498  6.4370  7.0243
#>  7.510   8.9276 0.5554  7.8390 10.0162
#>  8.586  11.2355 1.5384  8.2204 14.2507
# Build true curve data frame for overlay
truth_df <- data.frame(
  t           = dat$T,
  true_effect = dat$true_effect
)
plot(fit_ipw, truth = truth_df)
IPW estimated dose-response with 95% CI
IPW estimated dose-response with 95% CI

3. Fit with G-computation

fit_gc <- causal_spline(
  Y ~ T | X1 + X2 + X3,
  data        = dat,
  method      = "gcomp",
  df_exposure = 5
)
plot(fit_gc, truth = truth_df,
     title = "G-computation — Threshold DGP")

4. Check overlap (positivity)

ov <- check_overlap(dat$T, fit_ipw$weights, plot = TRUE)
cat("ESS:", round(ov$ess), "/ n =", nrow(dat), "\n")
#> ESS: 406 / n = 600
ov$plot


Comparing DGPs

dgps <- c("threshold", "diminishing", "nonmonotone", "sinusoidal")
plots <- lapply(dgps, function(d) {
  dat_d <- simulate_dose_response(500, dgp = d, seed = 1)
  fit_d <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat_d,
                          method = "ipw", df_exposure = 5,
                          verbose = FALSE)
  truth_d <- data.frame(t = dat_d$T, true_effect = dat_d$true_effect)
  plot(fit_d, truth = truth_d,
       title = paste("DGP:", d),
       rug = FALSE)
})

# Combine with patchwork (if available) or print individually
for (p in plots) print(p)


Choosing degrees of freedom

The df_exposure argument controls spline flexibility. Too few df = underfitting; too many = high variance. As a guide:

Shape Recommended df
Linear / simple trend 3
One bend / threshold 4–5
Inverted-U / hump 5–6
Oscillatory 7–10

You can use AIC/BIC on the outcome model or cross-validation for selection.


Methods summary

Argument Consistent if …
method = "ipw" GPS model correctly specified
method = "gcomp" Outcome model correctly specified
method = "dr" At least one of the two models is correct

References

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.