The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
learn_DAG()
using get_ functionsThis is the third of a series of three vignettes for the R package
BCDAG
. In this vignette, we show how to use the output of
learn_DAG()
for posterior inference on DAGs, DAG
parameters, and causal effect estimation. Specifically, we introduce the
functions of the get_
family. Remember that the output of
learn_DAG()
consists of an MCMC sample from the marginal
posterior distribution of DAG structures (collapse = TRUE
)
and the joint posterior of DAGs and DAG parameters
(collapse = FALSE
); see also the corresponding
vignette
To start with, we simulate a dataset X
from a randomly
generated Gaussian DAG model as shown in an other
vignette
## Generate data
set.seed(1)
q <- 8
w <- 0.2
DAG <- rDAG(q,w)
a <- q
U <- diag(1,q)
outDL <- rDAGWishart(n=1, DAG, a, U)
L <- outDL$L; D <- outDL$D
Omega <- L %*% solve(D) %*% t(L)
Sigma <- solve(Omega)
n <- 1000
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)
Next, we use learn_DAG()
to approximate the joint
posterior distribution over DAG structures and DAG parameters:
## Run MCMC
out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = FALSE)
get_diagnostics()
Before using the MCMC output for posterior inference, it is common
practice to perform some convergence checks. Function
get_diagnostics()
provides graphical diagnostics of
convergence for the MCMC output of learn_DAG()
. These are
based on: the number of edges in the DAGs; the posterior probability of
edge inclusion for each possible edge \(u
\rightarrow v\), both monitored across MCMC iterations. Input of
the function is an object of class bcdag
and the output
consists of:
a traceplot and running-mean plot of the number of edges in the DAGs (graph size);
a collection of traceplots of the posterior probabilities of edge inclusion computed across MCMC iterations.
For each pair of distinct nodes \((u,v)\), its posterior probability of inclusion at time \(s\) \((s = 1,\dots, S)\) is estimated as the proportion of DAGs visited by the MCMC up to time \(s\) which contain the directed edge \(u \rightarrow v\). Output is organized in \(q\) plots (one for each node \(v = 1, \dots, q\)), each summarizing the posterior probabilities of edges \(u \rightarrow v\), \(u = 1,\dots, q\).
We now show how to perform posterior inference of DAGs from the MCMC
output. To summarize the output of learn_DAG()
,
print()
, summary()
and
plot()
methods for objects of class bcdag
are
available. When print()
is applied to a bcdag
object, a printed message appears. This summarizes the type of
bcdag
object (see details on bcdag
object
types provided in the previous vignette) and the input
arguments of learn_DAG()
that generated the output. When
summary()
is used, also the posterior probabilities of edge
inclusion are reported. Finally, plot()
returns a graphical
representation of the Median Probability DAG, an heatmap with the
probabilities of edge inclusion and an histogram of the sizes of graph
visited by the Markov chain.
print(out)
#> A complete bcdag object containing 5000 draws from the joint posterior over DAGs, L and D. (Burnin = 1000 ).
#>
#> Prior hyperparameters:
#> w = 0.2
#> a = 8
#> U =
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 0 0 0 0 0 0 0
#> [2,] 0 1 0 0 0 0 0 0
#> [3,] 0 0 1 0 0 0 0 0
#> [4,] 0 0 0 1 0 0 0 0
#> [5,] 0 0 0 0 1 0 0 0
#> [6,] 0 0 0 0 0 1 0 0
#> [7,] 0 0 0 0 0 0 1 0
#> [8,] 0 0 0 0 0 0 0 1
summary(out)
#> A complete bcdag object containing 5000 draws from the joint posterior over DAGs, L and D. (Burnin = 1000 ).
#>
#> Prior hyperparameters:
#> w = 0.2
#> a = 8
#> U =
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 0 0 0 0 0 0 0
#> [2,] 0 1 0 0 0 0 0 0
#> [3,] 0 0 1 0 0 0 0 0
#> [4,] 0 0 0 1 0 0 0 0
#> [5,] 0 0 0 0 1 0 0 0
#> [6,] 0 0 0 0 0 1 0 0
#> [7,] 0 0 0 0 0 0 1 0
#> [8,] 0 0 0 0 0 0 0 1
#>
#> Posterior probabilities of edge inclusion:
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.000 0.048 0.060 0.000 0.000 0.018 0.016 0.000
#> [2,] 0.026 0.000 0.009 0.006 0.002 0.227 0.064 0.000
#> [3,] 0.047 0.006 0.000 0.010 0.000 0.004 0.016 0.416
#> [4,] 0.000 0.004 0.000 0.000 0.000 0.020 0.533 0.000
#> [5,] 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
#> [6,] 0.002 0.186 0.008 0.006 0.000 0.000 0.019 0.000
#> [7,] 0.004 0.049 0.011 0.467 0.000 0.037 0.000 0.009
#> [8,] 1.000 0.000 0.584 0.000 0.000 0.005 0.006 0.000
Function get_edgeprobs()
computes and returns the
collection of posterior probabilities of edge inclusion, arranged as a
\((q,q)\) matrix, with \((u,v)\)-element referring to edge \(u\rightarrow v\):
get_edgeprobs(out)
#> 1 2 3 4 5 6 7 8
#> 1 0.0000 0.0480 0.0596 0.0000 0.0000 0.0182 0.0158 0.0000
#> 2 0.0262 0.0000 0.0092 0.0064 0.0018 0.2270 0.0638 0.0000
#> 3 0.0474 0.0064 0.0000 0.0096 0.0000 0.0042 0.0162 0.4162
#> 4 0.0000 0.0040 0.0000 0.0000 0.0000 0.0204 0.5328 0.0000
#> 5 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
#> 6 0.0020 0.1864 0.0084 0.0064 0.0000 0.0000 0.0186 0.0000
#> 7 0.0044 0.0494 0.0106 0.4672 0.0000 0.0372 0.0000 0.0092
#> 8 1.0000 0.0000 0.5838 0.0000 0.0000 0.0054 0.0060 0.0000
The MPM model returned in the output of summary()
can be
used as a single DAG-model estimate and is obtained by including all
edges whose posterior probability exceeds the threshold \(0.5\). Function get_MPMdag()
applies to an object of class bcdag
and returns the \((q,q)\) adjacency matrix of the MPM:
MPMdag <- get_MPMdag(out)
MPMdag
#> 1 2 3 4 5 6 7 8
#> 1 0 0 0 0 0 0 0 0
#> 2 0 0 0 0 0 0 0 0
#> 3 0 0 0 0 0 0 0 0
#> 4 0 0 0 0 0 0 1 0
#> 5 1 0 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0 0
#> 7 0 0 0 0 0 0 0 0
#> 8 1 0 1 0 0 0 0 0
As an alternative, the Maximum A Posterior DAG estimate (MAP) can be
considered. This corresponds to the DAG with the highest MCMC frequency
of visits and can be recovered through the function
get_MAPdag()
:
MAPdag <- get_MAPdag(out)
MAPdag
#> 1 2 3 4 5 6 7 8
#> 1 0 0 0 0 0 0 0 0
#> 2 0 0 0 0 0 0 0 0
#> 3 0 0 0 0 0 0 0 0
#> 4 0 0 0 0 0 0 0 0
#> 5 1 0 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0 0
#> 7 0 0 0 1 0 0 0 0
#> 8 1 0 1 0 0 0 0 0
par(mfrow = c(1,3))
Rgraphviz::plot(as_graphNEL(DAG), main = "True DAG")
Rgraphviz::plot(as_graphNEL(MPMdag), main = "MPM DAG")
Rgraphviz::plot(as_graphNEL(MAPdag), main = "MAP DAG")
In this example, the MPM and MAP estimates differ by a single an edge between nodes \(4\) and \(7\) which is reversed among the two graphs. However, it can be shown that the two DAG estimates are Markov equivalent, meaning that they encode the same conditional independencies between variables. In a Gaussian setting, Markov equivalent DAGs cannot be distinguished with observational data as they represent the same statistical model. Therefore, there is no difference in choosing the MPM or the MAP estimate to infer the structure of dependencies between variables.
In addition, if compared with the true graph, the DAG estimate provided by MPM [CORRETTO?] differs by a single edge between nodes \(7\) and \(1\) which is missing from MPM. Interestingly, one can see that the regression coefficient associated with \(u\rightarrow v\), \(\boldsymbol L_{7,1}\), is relatively “small”, implying that the strength of the dependence between the two nodes is “weak”:
round(L, 3)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.000 0 0.000 0.00 0 0 0 0
#> [2,] 0.000 1 0.000 0.00 0 0 0 0
#> [3,] 0.000 0 1.000 0.00 0 0 0 0
#> [4,] 0.000 0 0.000 1.00 0 0 0 0
#> [5,] -0.018 0 0.000 0.00 1 0 0 0
#> [6,] 0.000 0 0.000 0.00 0 1 0 0
#> [7,] -0.006 0 0.000 -1.89 0 0 1 0
#> [8,] 0.373 0 0.474 0.00 0 0 0 1
In this last section, we introduce functions
causaleffect()
and get_causaleffect()
, which
allow to compute and estimate causal effects between variables.
Specifically, we consider the causal effect on a response variable of
interest consequent to a joint intervention on a given set of variables;
see also Nandy et al. (2017) and Castelletti & Mascaro (2021) for
formal definitions.
For a given DAG, it is possible to identify and estimate the causal
effect on a node \(Y\) consequent to a
hypothetical hard intervention on node \(j\) using the rules of the
do-calculus (Pearl, 2000). A simple implementation of this set
of rules and an estimation method for the causally sufficient case and
for Gaussian data is provided by function causaleffect()
.
The function takes as input a numerical vector representing the labels
of the intervened nodes (also called intervention target
),
a numerical value indicating the response
variable and the
DAG model parameters L
and D
; see also
Castelletti & Mascaro (2021) or our previous
vignette for a detailed model description.
For a given response variable \(Y \in\{1, \ldots, q\}\), and intervention target \(I \subseteq \{1,\dots,q\}\) the total joint effect of an intervention \(\operatorname{do}\left\{X_{j}=\tilde{x}_{j}\right\}_{j \in I}\) on \(Y\) is \[ \theta_{Y}^{I}:=\left(\theta_{h, Y}^{I}\right)_{h \in I}, \] where for each \(h \in I\) \[ \theta_{h, Y}^{I}:=\frac{\partial}{\partial x_{h}} \mathbb{E}\left(Y \mid \operatorname{do}\left\{X_{j}=\tilde{x}_{j}\right\}_{j \in I}\right) \]
See also Castelletti & Mascaro (2021) and Nandy et al. (2017) for more details.
To better understand the difference between single and joint interventions, consider as an example the total causal effect on node \(Y=1\) (i.e. variable \(X_1\)) of a joint intervention on nodes \(4\) and \(5\), \(I=\{4,5\}\) (i.e. variables \(X_4, X_5\)) and the total causal effects of two separate interventions on nodes \(4\) and \(5\) under the causal model represented by the DAG generated before:
These are given by:
As it can be observed, the total causal effect of intervening on variable \(X_4\) is null both in a single intervention on \(4\) and in a joint intervention on \(\{X_4, X_5\}\), while intervening on \(X_5\) produces the same positive total causal effect in both cases. The total causal effects produced are thus exactly the same for both variables in the two cases. However, if we slightly modify the DAG by adding an edge from node \(4\) to node \(5\), so that:
DAG2 <- DAG
DAG2[4,5] <- 1
par(mfrow = c(1,2))
Rgraphviz::plot(as_graphNEL(DAG), main = "True DAG")
Rgraphviz::plot(as_graphNEL(DAG2), main = "Modified DAG")
and modify L
accordingly:
The comparison of single and joint total causal effects now produces different results:
As it can be observed, this time a single intervention on \(X_4\) produces a negative causal effect on \(X_1\), while jointly intervening on \(X_4\) and \(X_5\) makes the total causal effect of \(X_4\) on \(X_1\) null. The effect of \(X_4\) on \(X_1\) was in fact mediated by \(X_5\): intervening simultaneously also on \(X_5\) erases the effect of \(X_4\) on \(X_5\) and, in turn, of \(X_4\) on \(X_1\). See also Castelletti & Mascaro (2021) or Nandy et al. (2017) for a more detailed description.
The identification and estimation of causal effects requires the
specification of a DAG. When the DAG is unknown, function
get_causaleffect()
can be used. It applies to objects of
class bcdag
; the latter corresponds to the output of
learn_DAG()
and consists of a sample of size \(S\) from the posterior of DAGs and DAG
parameters. In addition get_causaleffect()
takes as input a
numerical vector representing the labels of the intervened nodes (the
intervention target
) and a numerical value indicating the
response
variable. Output of the function is an object of
class bcdagCE
, containing a sample of size \(S\) from the posterior distribution of the
causal effect coefficients associated with the intervention
targets
and some useful summaries of the posterior
distributions such as the mean and the quantiles:
effects_out <- get_causaleffect(out, targets = c(4,5), response = 1)
head(effects_out$causaleffects)
#> h = 4 h = 5
#> [1,] 0 0.01940036
#> [2,] 0 0.01435828
#> [3,] 0 0.01852334
#> [4,] 0 0.01807825
#> [5,] 0 0.01956232
#> [6,] 0 0.01922416
print()
, summary()
and plot()
methods are available for objects of class bcdagCE
.
print()
returns the input used and the prior
hyperparameters. summary()
returns the estimated mean,
quantiles of the estimated causal effects, as well as the probabilities
of it being greater, lower or equal to zero. plots()
produces a boxplot and a histogram of the causal effect of each target
node.
print(effects_out)
#> A complete bcdagCE object containing 5000 draws from the posterior distribution of causal effects of variables 4, 5 on 1
#>
#> Prior hyperparameters:
#> w = 0.2
#> a = 8
#> U =
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 0 0 0 0 0 0 0
#> [2,] 0 1 0 0 0 0 0 0
#> [3,] 0 0 1 0 0 0 0 0
#> [4,] 0 0 0 1 0 0 0 0
#> [5,] 0 0 0 0 1 0 0 0
#> [6,] 0 0 0 0 0 1 0 0
#> [7,] 0 0 0 0 0 0 1 0
#> [8,] 0 0 0 0 0 0 0 1
summary(effects_out)
#> A complete bcdagCE object containing 5000 draws from the posterior distribution of causal effects of variables 4, 5 on 1
#>
#> Prior hyperparameters:
#> w = 0.2
#> a = 8
#> U =
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 0 0 0 0 0 0 0
#> [2,] 0 1 0 0 0 0 0 0
#> [3,] 0 0 1 0 0 0 0 0
#> [4,] 0 0 0 1 0 0 0 0
#> [5,] 0 0 0 0 1 0 0 0
#> [6,] 0 0 0 0 0 1 0 0
#> [7,] 0 0 0 0 0 0 1 0
#> [8,] 0 0 0 0 0 0 0 1
#>
#> Posterior means of causal effects:
#> h = 4 h = 5
#> -5.474438e-06 1.793047e-02
#>
#> Posterior quantiles of causal effects:
#> 2.5% 25% 50% 75% 97.5%
#> h = 4 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> h = 5 0.01502222 0.01691248 0.01791702 0.01895081 0.02081435
#>
#> Posterior probability of causal effects being greater, equal or smaller than 0:
#> <0 =0 >0
#> h = 4 0.0024 0.9962 0.0014
#> h = 5 0.0000 0.0000 1.0000
Also notice that, if the BCDAG
object input of
get_causaleffect()
is of type collapsed
or
compressed and collapsed
, then
get_causaleffect()
requires drawing from the posterior
distribution of parameters \((\boldsymbol L,
\boldsymbol D)\) before estimating the required causal
effects:
coll_out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = TRUE)
Castelletti, F, Mascaro, A (2021). “Structural learning and estimation of joint causal effects among network-dependent variables”. Statistical Methods & Applications, 30(5), 1289-1314.
Castelletti F, Mascaro A (2022). “BCDAG: An R package for Bayesian structural and Causal learning of Gaussian DAGs.” arXiv pre-print.
Nandy, P, Maathuis, MH., Richardson, TS (2017). “Estimating the effect of joint interventions from observational data in sparse high-dimensional settings”. The Annals of Statistics, 45(2), 647-674.
Pearl J (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge. ISBN 0-521-77362-8.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.