Title: | Bayesian Additive Regression Trees for Confounder Selection |
Version: | 1.3.0 |
Description: | Fit Bayesian Regression Additive Trees (BART) models to select true confounders from a large set of potential confounders and to estimate average treatment effect. For more information, see Kim et al. (2023) <doi:10.1111/biom.13833>. |
License: | GPL (≥ 3) |
URL: | https://github.com/yooyh/bartcs |
BugReports: | https://github.com/yooyh/bartcs/issues |
Depends: | R (≥ 3.4.0) |
Imports: | coda (≥ 0.4.0), ggcharts, ggplot2, invgamma, MCMCpack, Rcpp, rlang, rootSolve, stats |
Suggests: | knitr, microbenchmark, rmarkdown |
LinkingTo: | Rcpp |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
Packaged: | 2025-04-08 00:45:44 UTC; rstudio |
Author: | Yeonghoon Yoo [aut, cre] |
Maintainer: | Yeonghoon Yoo <yooyh.stat@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-08 02:30:02 UTC |
bartcs: Bayesian Additive Regression Trees for Confounder Selection
Description
Fit Bayesian Regression Additive Trees (BART) models to select true confounders from a large set of potential confounders and to estimate average treatment effect. For more information, see Kim et al. (2023) doi:10.1111/biom.13833.
Details
Functions in bartcs
serve one of three purposes.
Functions for fitting:
separate_bart()
andsingle_bart()
.Functions for summary:
summary()
andplot()
.Utility function for OpenMP:
count_omp_thread()
.
The code of BART model are based on the 'BART' package by
Sparapani et al. (2021) under the GPL license, with modifications.
The modifications from the BART
package include (but are not limited to):
Add CHANGE step.
Add Single and Separate Model.
Add causal effect estimation.
Add confounder selection.
Author(s)
Maintainer: Yeonghoon Yoo yooyh.stat@gmail.com
References
Sparapani R, Spanbauer C, McCulloch R (2021). “Nonparametric Machine Learning and Efficient Computation with Bayesian Additive Regression Trees: The BART R Package.” Journal of Statistical Software, 97(1), 1–66. doi:10.18637/jss.v097.i01
Kim, C., Tec, M., & Zigler, C. M. (2023). Bayesian Nonparametric Adjustment of Confounding, Biometrics doi:10.1111/biom.13833
See Also
Useful links:
Fit BART models to select confounders and estimate treatment effect
Description
Fit Bayesian Regression Additive Trees (BART) models
to select relevant confounders among a large set of potential confounders
and to estimate average treatment effect E[Y(1) - Y(0)]
.
Usage
separate_bart(
Y, trt, X,
trt_treated = 1,
trt_control = 0,
num_tree = 50,
num_chain = 4,
num_burn_in = 100,
num_thin = 1,
num_post_sample = 100,
step_prob = c(0.28, 0.28, 0.44),
alpha = 0.95,
beta = 2,
nu = 3,
q = 0.95,
dir_alpha = 5,
parallel = FALSE,
verbose = TRUE
)
single_bart(
Y, trt, X,
trt_treated = 1,
trt_control = 0,
num_tree = 50,
num_chain = 4,
num_burn_in = 100,
num_thin = 1,
num_post_sample = 100,
step_prob = c(0.28, 0.28, 0.44),
alpha = 0.95,
beta = 2,
nu = 3,
q = 0.95,
dir_alpha = 5,
parallel = FALSE,
verbose = TRUE
)
Arguments
Y |
A vector of outcome values. |
trt |
A vector of treatment values. Binary treatment works for both model and continuous treatment works for single_bart(). For binary treatment, use 1 to indicate the treated group and 0 for the control group. |
X |
A matrix of potential confounders. |
trt_treated |
Value of |
trt_control |
Value of |
num_tree |
Number of trees in BART model. The default value is set to 100. |
num_chain |
Number of MCMC chains.
Need to set |
num_burn_in |
Number of MCMC samples to be discarded per chain as initial burn-in periods. The default value is set to 100. |
num_thin |
Number of thinning per chain.
One in every |
num_post_sample |
Final number of posterior samples per chain.
Number of MCMC iterations per chain is
|
step_prob |
A vector of tree alteration probabilities (GROW, PRUNE, CHANGE).
Each alteration is proposed to change the tree structure. |
alpha , beta |
Hyperparameters for tree regularization prior.
A terminal node of depth |
nu , q |
Values to calibrate hyperparameter of sigma prior. |
dir_alpha |
Hyperparameter of Dirichlet prior for selection probabilities. The default value is 5. |
parallel |
If |
verbose |
If |
Details
separate_bart()
and single_bart()
fit an exposure model and outcome model(s)
for estimating treatment effect with adjustment of confounders
in the presence of a large set of potential confounders (Kim et al. 2023).
The exposure model E[A|X]
and the outcome model(s) E[Y|A,X]
are
linked together with a common Dirichlet prior that accrues
posterior selection probabilities to the corresponding confounders (X
)
on the basis of association with both the exposure (A
) and the
outcome (Y
).
There is a distinction between fitting separate outcome models for the treated and control groups and fitting a single outcome model for both groups.
-
separate_bart()
specifies two "separate" outcome models for two binary treatment levels. Thus, it fits three models: one exposure model and two separate outcome models forA = 0, 1
. -
single_bart()
specifies one "single" outcome model. Thus, it fits two models: one exposure model and one outcome model for the entire sample.
All inferences are made with outcome model(s).
Value
A bartcs
object. A list object contains the following components.
mcmc_list |
A |
-
SATE
Posterior sample of average treatment effectE[Y(1) - Y(0)]
. -
Y1
Posterior sample of potential outcomeE[Y(1)]
. -
Y0
Posterior sample of potential outcomeE[Y(0)]
. -
dir_alpha
Posterior sample ofdir_alpha.
-
sigma2_out
Posterior sample ofsigma2
in the outcome model.
var_prob |
Aggregated posterior inclusion probability of each variable. |
var_count |
Number of selection of each variable in each MCMC iteration.
Its dimension is |
chains |
A list of results from each MCMC chain. Each chain contains
every posterior samples in |
model |
|
label |
Column names of |
params |
Parameters used in the model. |
References
Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266-298. doi:10.1214/09-AOAS285
Kim, C., Tec, M., & Zigler, C. M. (2023). Bayesian Nonparametric Adjustment of Confounding, Biometrics doi:10.1111/biom.13833
Examples
data(ihdp, package = "bartcs")
single_bart(
Y = ihdp$y_factual,
trt = ihdp$treatment,
X = ihdp[, 6:30],
num_tree = 10,
num_chain = 2,
num_post_sample = 20,
num_burn_in = 10,
verbose = FALSE
)
separate_bart(
Y = ihdp$y_factual,
trt = ihdp$treatment,
X = ihdp[, 6:30],
num_tree = 10,
num_chain = 2,
num_post_sample = 20,
num_burn_in = 10,
verbose = FALSE
)
Count the number of OpenMP threads for parallel computation
Description
count_omp_thread()
counts the number of OpenMP threads for
parallel computation.
If it returns 1, OpenMP is not viable.
Usage
count_omp_thread()
Value
Number of OpenMP thread(s).
Examples
count_omp_thread()
Infant Health and Development Program Data
Description
Infant Health and Development Program (IHDP) is a randomized experiment from 1985 to 1988 which studied the effect of home visits on cognitive test scores for infants.
Usage
ihdp
Format
- treatment
Given treatment.
- y_factual
Observed outcome.
- y_cfactual
Potential outcome given the opposite treatment.
- mu0
Control conditional means.
- mu1
Treated conditional means.
- X1 ~ X6
Confounders with continuous values.
- X7 ~ X25
Confounders with binary values.
Details
This dataset was first used by Hill (2011), then used by other researchers (Shalit et al. 2017, Louizos et al. 2017).
Source
Our version of dataset is the dataset used by Louizos et al. (2017). This is the first realization of 10 generated datasets and you can find other realizations from https://github.com/AMLab-Amsterdam/CEVAE.
References
Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1), 217-240. doi:10.1198/jcgs.2010.08162
Louizos, C., Shalit, U., Mooij, J. M., Sontag, D., Zemel, R., & Welling, M. (2017). Causal effect inference with deep latent-variable models. Advances in neural information processing systems, 30. doi:10.48550/arXiv.1705.08821 https://github.com/AMLab-Amsterdam/CEVAE
Shalit, U., Johansson, F. D., & Sontag, D. (2017, July). Estimating individual treatment effect: generalization bounds and algorithms. In International Conference on Machine Learning (pp. 3076-3085). PMLR. doi:10.48550/arXiv.1606.03976
Draw plot for bartcs
object
Description
Two options are available: posterior inclusion probability (PIP) plot and trace plot.
Usage
## S3 method for class 'bartcs'
plot(x, method = NULL, parameter = NULL, ...)
Arguments
x |
A |
method |
" |
parameter |
Parameter for traceplot. |
... |
Additional arguments for PIP plot.
Check |
Details
PIP plot
When a posterior sample is sampled during training,
separate_bart()
or single_bart()
also counts
which variables are included in the model and
compute PIP for each variable.
For bartcs
object x
,
this is stored in x$var_count
and x$var_prob
respectively.
plot(method = "pip")
uses this information and
draws plot using ggcharts::bar_chart()
.
Traceplot
Parameters are recorded for each MCMC iterations.
Parameters include "SATE
", "Y1
", "Y0
", "dir_alpha
",
and either "sigma2_out
" from single_bart()
or "sigma2_out1
" and "sigma2_out0
" from
separate_bart()
.
Vertical line indicates burn-in.
Value
A ggplot
object of either PIP plot or trace plot.
Examples
data(ihdp, package = "bartcs")
x <- single_bart(
Y = ihdp$y_factual,
trt = ihdp$treatment,
X = ihdp[, 6:30],
num_tree = 10,
num_chain = 2,
num_post_sample = 20,
num_burn_in = 10,
verbose = FALSE
)
# PIP plot
plot(x, method = "pip")
plot(x, method = "pip", top_n = 10)
plot(x, method = "pip", threshold = 0.5)
# Check `?ggcharts::bar_chart` for other possible arguments.
# trace plot
plot(x, method = "trace")
plot(x, method = "trace", "Y1")
plot(x, method = "trace", "dir_alpha")
Summary for bartcs
object
Description
Provide summary for bartcs
object.
Usage
## S3 method for class 'bartcs'
summary(object, ...)
Arguments
object |
A |
... |
Additional arguments. Not yet supported. |
Details
summary()
provides 95% posterior credible interval for both
aggregated outcome and individual outcomes from each MCMC chain.
Value
Provide list with the following components
model |
|
trt_value |
Treatment values for each treatment group:
|
tree_params |
Parameters for the tree structure. |
chain_params |
Parameters for MCMC chains. |
outcome |
Summary of outcomes from the model. This includes both aggregated outcome and individual outcomes from each MCMC chain. |
Examples
data(ihdp, package = "bartcs")
x <- single_bart(
Y = ihdp$y_factual,
trt = ihdp$treatment,
X = ihdp[, 6:30],
num_tree = 10,
num_chain = 2,
num_post_sample = 20,
num_burn_in = 10,
verbose = FALSE
)
summary(x)
Synthetic dataset for simulation
Description
Create synthetic dataset for simulation.
Usage
synthetic_data(N = 300, P = 100, seed = 42, binary_trt = TRUE)
Arguments
N |
Number of observations for dataset. The default value is set to 300. |
P |
Number of potential confounders for dataset. Need to set X > 7 for data generation. The default value is set to 100. |
seed |
Seed value for simulation. The default value is set to 42. |
binary_trt |
Whether the treatment is binary. The default value is set to TRUE. |
Details
synthetic_data()
generates synthetic dataset for Scenario 1 from
Kim et al. (2023). Among possible confounders, X1 - X5 are true confounders.
Value
Provide list with the following components
Y |
A vector of outcome values. |
Trt |
A vector of binary treatment values. |
X |
A matrix of potential confounders. |
References
Kim, C., Tec, M., & Zigler, C. M. (2023). Bayesian Nonparametric Adjustment of Confounding, Biometrics doi:10.1111/biom.13833
Examples
synthetic_data()