The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
bayesmsm
for longitudinal data
without right-censoringThe bayesmsm
package enables easy implementation of
the Bayesian marginal structural models (BMSMs) for longitudinal data.
The methodology of BMSMs can be divided into 2 estimation steps:
For Step 1, we estimate treatment weights \(w_{ij}\) using posterior samples of the \(\alpha\) and \(\beta\) via fitting a series of logistic regressions in a Bayesian framework. The package is able to handle longitudinal data without and with right-censoring. For Step 2, \(P_n(v_{ij})\) is estimated via non-parametric Bayesian bootstrap with \(Dir(1,...,1)\) sampling weights.
The main functions in this package include:
bayesweight
: Calculates Bayesian weights for
subject-specific treatment effects.bayesweight_cen
: Calculates Bayesian weights for
subject-specific treatment effects with right-censored data.bayesmsm
: Estimates marginal structural models using
the calculated Bayesian weights.plot_ATE
: Plots the estimated Average Treatment Effect
(ATE).plot_APO
: Plots the estimated Average Potential Outcome
(APO).plot_est_box
: Plots the distribution of estimated
treatment effects.summary_bayesmsm
: Summarizes the model results from
bayesmsm
.simData
: Generates synthetic longitudinal data with
optional right-censoringInstallation
devtools
package to install it directly from GitHub.In this vignette, we demonstrate how to simulate a clean longitudinal
dataset without censoring using simData()
. We generate data
for 200 individuals observed over 2
visits. At each visit, two normally distributed covariates
(L1_j
, L2_j
) and a binary treatment assignment
(A_j
) are created. No censoring is applied, so all
covariates, treatments, and the final outcome Y are fully
observed. The outcome Y is continuous, generated from a linear
model on the covariate and treatment history.
Variable | Description |
---|---|
L1_1 , L2_1 |
Baseline covariates (continuous) |
L1_2 , L2_2 |
Time-varying covariates at visit 2 |
A1 , A2 |
Binary treatments at visits 1 and 2 |
Y |
Continuous end-of-study outcome |
# 1) Define coefficient lists for 2 visits
amodel <- list(
# Visit 1: logit P(A1=1) = -0.3 + 0.4*L1_1 - 0.2*L2_1
c("(Intercept)" = -0.3, "L1_1" = 0.4, "L2_1" = -0.2),
# Visit 2: logit P(A2=1) = -0.1 + 0.3*L1_2 - 0.1*L2_2 + 0.5*A_prev
c("(Intercept)" = -0.1, "L1_2" = 0.3, "L2_2" = -0.1, "A_prev" = 0.5)
)
# 2) Define binary outcome model: logistic on treatments and last covariates
ymodel <- c(
"(Intercept)" = -0.8,
"A1" = 0.2,
"A2" = 0.4,
"L1_2" = 0.3,
"L2_2" = -0.3
)
# 3) Load package and simulate data without censoring
simdat <- simData(
n = 100,
n_visits = 2,
covariate_counts = c(2, 2),
amodel = amodel,
ymodel = ymodel,
y_type = "binary",
right_censor = FALSE,
seed = 123
)
# 4) Inspect first rows
head(simdat)
#> L1_1 L2_1 A1 L1_2 L2_2 A2 Y
#> 1 -0.56047565 -0.71040656 1 -0.37560287 1.0149432 0 0
#> 2 -0.23017749 0.25688371 0 -0.56187636 -1.9927485 1 0
#> 3 1.55870831 -0.24669188 0 -0.34391723 -0.4272793 1 0
#> 4 0.07050839 -0.34754260 1 0.09049665 0.1166373 1 1
#> 5 0.12928774 -0.95161857 0 1.59850877 -0.8932076 0 1
#> 6 1.71506499 -0.04502772 1 -0.08856511 0.3339029 1 0
\[ ATE = E(Y \mid A_1 = 1, A_2 = 1) - E(Y \mid A_1 = 0, A_2 = 0) \]
bayesweight
bayesweight
to
run JAGS and calculate the weights.
n.chains = 1
.
Parallel MCMC requires at least 2 chains because computing is running on
1 core per chain, and we recommend using at most 2 chains less than the
number of available cores on your computer.trtmodel.list
: A list of formulas corresponding to each
time point with the time-specific treatment variable on the left hand
side and pre-treatment covariates to be balanced on the right hand side.
Interactions and functions of covariates are allowed.data
: The dataset containing all the variables
specified in trtmodel.list.n.chains
: Number of MCMC chains to run. For
non-parallel execution, this should be set to 1. For parallel execution,
it requires at least 2 chains.n.iter
: Total number of iterations for each chain
(including burn-in).n.burnin
: Number of iterations to discard at the
beginning of the simulation (burn-in).n.thin
: Thinning rate for the MCMC sampler.seed
: Seed to ensure reproducibility.parallel
: Logical flag indicating whether to run the
MCMC chains in parallel. Default is TRUE.weights <- bayesweight(
trtmodel.list = list(
A1 ~ L1_1 + L2_1,
A2 ~ L1_2 + L2_2 + A1),
data = simdat,
n.chains = 1,
n.iter = 200,
n.burnin = 100,
n.thin = 1,
seed = 890123,
parallel = FALSE)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 400
#> Unobserved stochastic nodes: 10
#> Total graph size: 1824
#>
#> Initializing model
summary(weights)
#> Length Class Mode
#> weights 100 -none- numeric
#> model_string 1 -none- character
weights
: The calculated weights for subject-specific
treatment effects, computed by taking the average of the weights across
all MCMC iterations.model_string
: A character of the JAGS model based on
input of the argument trtmodel.list.model_string
from the above function output:cat(weights$model_string)
#> model{
#> #N = nobs
#> for(i in 1:N){
#>
#> # visit 1;
#> # marginal treatment assignment model, visit 1;
#> A1s[i] ~ dbern(pA1s[i])
#> pA1s[i] <- ilogit(bs10)
#>
#> # conditional treatment assignment model, visit 1;
#> A1[i] ~ dbern(pA1[i])
#> pA1[i] <- ilogit(b10 + b11*L1_1[i] + b12*L2_1[i])
#>
#> # visit 2;
#> # marginal treatment assignment model, visit 2;
#> A2s[i] ~ dbern(pA2s[i])
#> pA2s[i] <- ilogit(bs20 + bs21*A1s[i])
#>
#> # conditional treatment assignment model, visit 2;
#> A2[i] ~ dbern(pA2[i])
#> pA2[i] <- ilogit(b20 + b21*L1_2[i] + b22*L2_2[i] + b23*A1[i])
#>
#> # export quantity in full posterior specification;
#> w[i] <- (pA1s[i]*pA2s[i])/(pA1[i]*pA2[i])
#> }
#>
#> #prior;
#> bs10~dnorm(0,.01)
#> b10~dnorm(0,.01)
#> b11~dnorm(0,.01)
#> b12~dnorm(0,.01)
#> bs20~dnorm(0,.01)
#> bs21~dnorm(0,.01)
#> b20~dnorm(0,.01)
#> b21~dnorm(0,.01)
#> b22~dnorm(0,.01)
#> b23~dnorm(0,.01)
#> }
bayesmsm
The function bayesmsm
estimates causal effect of
time-varying treatments. It uses subject-specific treatment assignment
weights weights calculated using bayesweight
, and
performs Bayesian non-parametric bootstrap to estimate the causal
parameters.
ymodel
: A formula representing the outcome model, which
can include interactions and functions of covariates.nvisit
: Specifies the number of visits or time points
considered in the model.reference
: The baseline or reference intervention
across all visits, typically represented by a vector of zeros indicating
no treatment (default is a vector of all zeros).comparator
: The comparison intervention across all
visits, typically represented by a vector of ones indicating full
treatment (default is a vector of all ones).treatment_effect_type
: A character string specifying
the type of treatment effect to estimate. Options are “sq” for
sequential treatment effects, which estimates effects for specific
treatment sequences across visits, and “cum” for cumulative treatment
effects, which assumes a single cumulative treatment variable
representing the total exposure. The default is “sq”.family
: Specifies the outcome distribution family; use
“gaussian” for continuous outcomes or “binomial” for binary outcomes
(default is “gaussian”).data
: The dataset containing all variables required for
the model.wmean
: A vector of treatment assignment weights.
Default is a vector of ones, implying equal weighting.nboot
: The number of bootstrap iterations to perform
for estimating the uncertainty around the causal estimates.optim_method
: The optimization method used to find the
best parameters in the model (default is ‘BFGS’).seed
: A seed value to ensure reproducibility of
results.parallel
: A logical flag indicating whether to perform
computations in parallel (default is TRUE).ncore
: The number of cores to use for parallel
computation (default is 4).model <- bayesmsm(ymodel = Y ~ A1 + A2,
nvisit = 2,
reference = c(rep(0,2)),
comparator = c(rep(1,2)),
family = "binomial",
data = simdat,
wmean = weights$weights,
nboot = 50,
optim_method = "BFGS",
parallel = TRUE,
seed = 890123,
ncore = 2)
str(model)
#> List of 12
#> $ RD_mean : num 0.147
#> $ RR_mean : num 2.05
#> $ OR_mean : num 2.71
#> $ RD_sd : num 0.113
#> $ RR_sd : num 0.907
#> $ OR_sd : num 1.56
#> $ RD_quantile: Named num [1:2] -0.0657 0.3452
#> ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#> $ RR_quantile: Named num [1:2] 0.793 3.932
#> ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#> $ OR_quantile: Named num [1:2] 0.724 5.735
#> ..- attr(*, "names")= chr [1:2] "2.5%" "97.5%"
#> $ bootdata :'data.frame': 50 obs. of 5 variables:
#> ..$ effect_reference : num [1:50] -1.357 -0.801 -1.43 -0.751 -1.352 ...
#> ..$ effect_comparator: num [1:50] -1.073 -0.99 -0.805 -1.107 -0.216 ...
#> ..$ RD : num [1:50] 0.0501 -0.0389 0.1159 -0.0721 0.2408 ...
#> ..$ RR : num [1:50] 1.245 0.874 1.6 0.775 2.172 ...
#> ..$ OR : num [1:50] 1.329 0.828 1.868 0.701 3.117 ...
#> $ reference : num [1:2] 0 0
#> $ comparator : num [1:2] 1 1
mean
, sd
, quantile
: the mean,
standard deviation and 95% credible interval of the estimated causal
effect (ATE). From the above results, the mean of ATE is approximately
-3.161, which indicates that the expected outcome for always treated
patients is, on average, 3.161 units less than that for never treated
patients.bootdata
: a data frame containing the bootstrap samples
for the reference effect, comparator effect, and average treatment
effect (ATE).reference
, comparator
: the reference level
and comparator level the user chooses to compare. Here the reference
level is never treated (0,0), and the comparator level is always treated
(1,1).bayesmsm
summary_bayesmsm
function automatically generates a
summary table of the model output from the function
bayesmsm
.summary_bayesmsm(model)
#> mean sd 2.5% 97.5%
#> Reference -1.5636662 0.4170985 -2.20796693 -0.80418106
#> Comparator -0.7422943 0.3787915 -1.49598754 -0.09395954
#> RD 0.1467202 0.1129061 -0.06574406 0.34517250
#> RR 2.0546106 0.9072415 0.79269967 3.93180935
#> OR 2.7059804 1.5647939 0.72376031 5.73496587
plot_ATE
,
plot_APO
, plot_est_box
The bayesmsm
package also provides several other
functions for visualizing the above results: plot_ATE
,
plot_APO
, and plot_est_box
. These functions
help the user better interpret the estimated causal effects.
plot_ATE
function generates a plot of the estimated
ATE with its 95% credible interval.plot_APO
function plots the estimated
APO for both the reference and comparator level effects.plot_est_box
function generates an error bar plot
of the estimated treatment effects (APO and ATE) from the bootstrap
samples.These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.