The learn_DAG() function

This is the second of a series of three vignettes for the R package BCDAG. In this vignette we focus on the function learn_DAG(), which implements a Markov Chain Monte Carlo (MCMC) algorithm to sample from the joint posterior of DAG structures and DAG parameters under the Gaussian assumption.
The underlying Bayesian Gaussian DAG-model can be summarized as follows: \[\begin{eqnarray} X_1, \dots, X_q \,|\,\boldsymbol L, \boldsymbol D, \mathcal{D} &\sim& \mathcal{N}_q\left(\boldsymbol 0, (\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top)^{-1}\right)\\ (\boldsymbol L, \boldsymbol D)\,|\,\mathcal{D} &\sim& \text{DAG-Wishart}(\boldsymbol{a}_{c}^{\mathcal{D}}, \boldsymbol U) \\ p(\mathcal{D}) &\propto& w^{|\mathcal{S}_\mathcal{D}|}(1-w)^{\frac{q(q-1)}{2} - {|\mathcal{S}_\mathcal{D}|}} \end{eqnarray}\]
In particular \(\mathcal{D}=(V,E)\) denotes a DAG structure with set of nodes \(V=\{1,\dots,q\}\) and set of edges \(E\subseteq V \times V\). Moreover, \((\boldsymbol L, \boldsymbol D)\) are model parameters providing the decomposition of the precision (inverse-covariance) matrix \(\boldsymbol{\Omega} = \boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top\); specifically, \(\boldsymbol{L}\) is a \((q, q)\) matrix of coefficients such that for each \((u, v)\)-element \(\boldsymbol{L}_{uv}\) with \(u \ne v\), \(\boldsymbol{L}_{uv} \ne 0\) if and only if \((u, v) \in E\), while \(\boldsymbol{L}_{uu} = 1\) for each \(u = 1,\dots, q\); also, \(\boldsymbol{D}\) is a \((q, q)\) diagonal matrix with \((u, u)\)-element \(\boldsymbol{D}_{uu}\).
The latter decomposition follows from the equivalent Structural Equation Model (SEM) representation of a Gaussian DAG-model; see also Castelletti & Mascaro (2021).
Conditionally on \(\mathcal{D}\), a prior on \((\boldsymbol{L}, \boldsymbol{D})\) is assigned through a compatible DAG-Wishart distribution with rate hyperparameter \(\boldsymbol{U}\), a \((q,q)\) s.p.d. matrix, and shape hyperparameter \(\boldsymbol{a}^{\mathcal{D}}_{c}\), a \((q,1)\) vector; see also Cao et al. (2019) and Peluso & Consonni (2020).
Finally, a prior on DAG \(\mathcal{D}\) is assigned through a Binomial distribution on the number of edges in the graph; in \(p(\mathcal{D})\), \(w \in (0,1)\) is the prior probability of edge inclusion, while \(|\mathcal{S}_{\mathcal{D}}|\) denotes the number of edges in \(\mathcal{D}\); see again Castelletti & Mascaro (2021) for further details.
The target of the MCMC scheme is therefore the joint posterior of \((\boldsymbol{L},\boldsymbol{D},\mathcal{D})\), \[\begin{equation} p(\boldsymbol L, \boldsymbol D, \mathcal{D}\,|\, \boldsymbol X) \propto p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})\,p(\boldsymbol{L},\boldsymbol{D}\,|\,\mathcal{D}) \,p(\mathcal{D}), \end{equation}\] where \(\boldsymbol{X}\) denotes an \((n,q)\) data matrix obtained through \(n\) i.i.d. draws from the Gaussian DAG-model and \(p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})\) is the likelihood function. See also Castelletti & Mascaro (2022+) for full details.
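As a quick illustration of the DAG prior above, the following sketch evaluates \(\log p(\mathcal{D})\) up to an additive constant for a DAG encoded by its \((q,q)\) adjacency matrix; logprior_DAG is a hypothetical helper written here for illustration, not a function exported by BCDAG:

# Hypothetical helper (not part of BCDAG): log p(D) up to an additive constant,
# with A the 0/1 adjacency matrix of D and w the prior probability of edge inclusion
logprior_DAG <- function(A, w) {
  q <- ncol(A)
  nedges <- sum(A)  # |S_D|, the number of edges in D
  nedges * log(w) + (q * (q - 1) / 2 - nedges) * log(1 - w)
}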
We first randomly generate a DAG \(\mathcal{D}\), the DAG parameters \((\boldsymbol{L},\boldsymbol{D})\) and then \(n=1000\) i.i.d. observations from a Gaussian DAG-model as follows:
library(BCDAG)

set.seed(1)
q <- 8                                  # number of nodes
w <- 0.2                                # prior probability of edge inclusion
DAG <- rDAG(q, w)                       # generate a DAG structure
a <- q                                  # shape hyperparameter of the DAG-Wishart prior
U <- diag(1, q)                         # rate hyperparameter of the DAG-Wishart prior
outDL <- rDAGWishart(n = 1, DAG, a, U)  # sample (L, D) given the DAG
L <- outDL$L; D <- outDL$D
Omega <- L %*% solve(D) %*% t(L)        # precision matrix
Sigma <- solve(Omega)                   # covariance matrix
n <- 1000
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)
See also our vignette about data generation from Gaussian DAG-models.
learn_DAG()

Function learn_DAG() implements an MCMC algorithm to sample from the joint posterior of DAGs and DAG parameters. This is based on a Partial Analytic Structure (PAS) algorithm (Godsill, 2001) which, at each iteration:

1. updates the DAG through a Metropolis-Hastings step, where a new candidate DAG is drawn from a suitable proposal distribution and accepted with probability given by the MH acceptance rate (see A note on fast = TRUE below);
2. samples the DAG parameters \((\boldsymbol{L}, \boldsymbol{D})\) from their posterior distribution, conditionally on the accepted DAG.

We implement it as follows:
out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = FALSE)
Inputs of learn_DAG() correspond to three different sets of arguments:

- S, burn and data are standard inputs required by any MCMC algorithm. In particular, S defines the desired length of the chain, which is obtained by discarding the first burn draws (the total number of sampled draws is therefore S + burn); data is instead the \((n,q)\) data matrix \(\boldsymbol{X}\);
- a, U and w are the hyperparameters of the priors on DAGs (w) and DAG parameters (a, U); see also Equation [REF]. The same arguments appear in functions rDAG() and rDAGWishart(), which were introduced in our vignette [ADD REF TO THE VIGNETTE];
- fast, save.memory and collapse are Boolean arguments which allow one to: implement an approximate proposal distribution within the MCMC scheme (fast = TRUE); change the array structure of the stored MCMC output into strings in order to save memory (save.memory = TRUE); implement an MCMC for DAG structure learning only, without sampling from the posterior of the parameters (collapse = TRUE). See also A note on fast = TRUE below and Castelletti & Mascaro (2022+) for full details.

The output of learn_DAG() is an object of class bcdag. bcdag objects include the output of the MCMC algorithm together with a collection of meta-data representing the input arguments of learn_DAG(); these are stored in the attributes of the object:
str(out)
#> List of 3
#> $ Graphs: num [1:8, 1:8, 1:5000] 0 0 0 0 1 0 0 1 0 0 ...
#> $ L : num [1:8, 1:8, 1:5000] 1 0 0 0 -0.0194 ...
#> $ D : num [1:8, 1:8, 1:5000] 0.164 0 0 0 0 ...
#> - attr(*, "class")= chr "bcdag"
#> - attr(*, "type")= chr "complete"
#> - attr(*, "input")=List of 10
#> ..$ S : num 5000
#> ..$ burn : num 1000
#> ..$ data : num [1:1000, 1:8] -0.299 -0.351 -0.631 0.219 -0.351 ...
#> ..$ a : num 8
#> ..$ U : num [1:8, 1:8] 1 0 0 0 0 0 0 0 0 1 ...
#> ..$ w : num 0.2
#> ..$ fast : logi FALSE
#> ..$ save.memory: logi FALSE
#> ..$ collapse : logi FALSE
#> ..$ verbose : logi TRUE
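Because the inputs of learn_DAG() are stored in the attributes of the object, they can be recovered directly; for instance:

attr(out, "type")
#> [1] "complete"
attr(out, "input")$S
#> [1] 5000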
Attribute type refers to the type of output produced by learn_DAG(), whose structure depends on the choice of the arguments save.memory and collapse. When both are set equal to FALSE, as in the previous example, the output of learn_DAG() is a complete bcdag object, collecting three \((q,q,S)\) arrays with the DAG structures (in the form of \(q \times q\) adjacency matrices) and the DAG parameters \(\boldsymbol{L}\) and \(\boldsymbol{D}\) (both as \(q \times q\) matrices) sampled by the MCMC:
out$Graphs[,,1]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 1 0 0
#> [3,] 0 0 0 0 0 0 0 1
#> [4,] 0 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 1 0 1 0 0 0 0
#> [8,] 1 0 0 0 0 0 0 0
round(out$L[,,1],2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.00 0.00 0 0.00 0 0.00 0 0.00
#> [2,] 0.00 1.00 0 0.00 0 0.07 0 0.00
#> [3,] 0.00 0.00 1 0.00 0 0.00 0 0.94
#> [4,] 0.00 0.00 0 1.00 0 0.00 0 0.00
#> [5,] -0.02 0.00 0 0.00 1 0.00 0 0.00
#> [6,] 0.00 0.00 0 0.00 0 1.00 0 0.00
#> [7,] 0.00 -0.02 0 -1.97 0 0.00 1 0.00
#> [8,] 0.40 0.00 0 0.00 0 0.00 0 1.00
round(out$D[,,1],2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.16 0.00 0.00 0.00 0.00 0.0 0.00 0.00
#> [2,] 0.00 0.57 0.00 0.00 0.00 0.0 0.00 0.00
#> [3,] 0.00 0.00 0.74 0.00 0.00 0.0 0.00 0.00
#> [4,] 0.00 0.00 0.00 0.94 0.00 0.0 0.00 0.00
#> [5,] 0.00 0.00 0.00 0.00 77.94 0.0 0.00 0.00
#> [6,] 0.00 0.00 0.00 0.00 0.00 0.8 0.00 0.00
#> [7,] 0.00 0.00 0.00 0.00 0.00 0.0 0.31 0.00
#> [8,] 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.96
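Because a complete bcdag object stores plain \((q,q,S)\) arrays, posterior summaries can be computed directly with base R; for instance, a quick sketch of the posterior mean of \(\boldsymbol{L}\), averaging over the \(S\) draws:

round(apply(out$L, c(1, 2), mean), 2)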
When collapse = TRUE and save.memory = FALSE, the output of learn_DAG() is a collapsed bcdag object, consisting of a \((q,q,S)\) array with the adjacency matrices of the DAGs sampled by the MCMC:
collapsed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = TRUE)
collapsed_out$Graphs[,,1]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 1 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 0 0 0 0 1
#> [8,] 1 0 1 0 0 0 0 0
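The array of sampled adjacency matrices can be summarized in the same way; for instance, averaging over the \(S\) draws gives a quick estimate of the posterior probabilities of edge inclusion:

round(apply(collapsed_out$Graphs, c(1, 2), mean), 2)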
When save.memory = TRUE and collapse = FALSE, the output is a compressed bcdag object, collecting samples from the joint posterior on DAGs and DAG parameters in the form of vectors of strings:
compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = TRUE, collapse = FALSE)
In this case, we can access the MCMC draws as follows:
compressed_out$Graphs[1]
#> [1] "0;0;0;0;1;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0"
compressed_out$L[1]
#> [1] "1;0;0;0;-0.020714882228288;0;0;0.389080449183579;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0.46449274423891;0;0;0;1;0;0;-1.76444008583527;-0.0201974091006512;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1"
compressed_out$D[1]
#> [1] "0.154822760388321;0;0;0;0;0;0;0;0;0.544421200660814;0;0;0;0;0;0;0;0;0.395261505201722;0;0;0;0;0;0;0;0;1.04755025995277;0;0;0;0;0;0;0;0;67.9926231564593;0;0;0;0;0;0;0;0;0.83164913327451;0;0;0;0;0;0;0;0;0.303043124888535;0;0;0;0;0;0;0;0;1.76018923453861"
In addition, we provide bd_decode, an internal function that can be used to visualize the previous objects as matrices:
BCDAG:::bd_decode(compressed_out$Graphs[1])
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 1 0 0 0 0
#> [8,] 1 0 1 1 0 0 0 0
round(BCDAG:::bd_decode(compressed_out$L[1]),2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.00 0 0.00 0.00 0 0 0 0
#> [2,] 0.00 1 0.00 0.00 0 0 0 0
#> [3,] 0.00 0 1.00 0.00 0 0 0 0
#> [4,] 0.00 0 0.00 1.00 0 0 0 0
#> [5,] -0.02 0 0.00 0.00 1 0 0 0
#> [6,] 0.00 0 0.00 0.00 0 1 0 0
#> [7,] 0.00 0 0.00 -1.76 0 0 1 0
#> [8,] 0.39 0 0.46 -0.02 0 0 0 1
round(BCDAG:::bd_decode(compressed_out$D[1]),2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.15 0.00 0.0 0.00 0.00 0.00 0.0 0.00
#> [2,] 0.00 0.54 0.0 0.00 0.00 0.00 0.0 0.00
#> [3,] 0.00 0.00 0.4 0.00 0.00 0.00 0.0 0.00
#> [4,] 0.00 0.00 0.0 1.05 0.00 0.00 0.0 0.00
#> [5,] 0.00 0.00 0.0 0.00 67.99 0.00 0.0 0.00
#> [6,] 0.00 0.00 0.0 0.00 0.00 0.83 0.0 0.00
#> [7,] 0.00 0.00 0.0 0.00 0.00 0.00 0.3 0.00
#> [8,] 0.00 0.00 0.0 0.00 0.00 0.00 0.0 1.76
Finally, if save.memory = TRUE and collapse = TRUE, the output of learn_DAG() is a compressed and collapsed bcdag object, collecting only the sampled DAGs represented as vectors of strings.
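For instance (the object name collapsed_compressed_out is illustrative; the call mirrors those above):

collapsed_compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
                                      a, U, w,
                                      fast = FALSE, save.memory = TRUE, collapse = TRUE)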
A note on fast = TRUE
Step 1. of the MCMC scheme implemented by learn_DAG() updates DAG \(\mathcal{D}\) by randomly drawing a new candidate DAG \(\mathcal{D}'\) from a proposal distribution and then accepting it with probability given by the Metropolis-Hastings (MH) acceptance rate; see also Castelletti & Mascaro (2021). For a given DAG \(\mathcal{D}\), the proposal distribution \(q(\mathcal{D}'\,|\,\mathcal{D})\) is built over the set \(\mathcal{O}_{\mathcal{D}}\) of direct successor DAGs, i.e. those that can be reached from \(\mathcal{D}\) by inserting, deleting or reversing a single edge in \(\mathcal{D}\). A DAG \(\mathcal{D}'\) is then proposed uniformly from the set \(\mathcal{O}_{\mathcal{D}}\), so that \(q(\mathcal{D}'\,|\,\mathcal{D})=1/|\mathcal{O}_{\mathcal{D}}|\). Moreover, the MH rate requires evaluating the ratio of proposals \(q(\mathcal{D}'\,|\,\mathcal{D})/q(\mathcal{D}\,|\,\mathcal{D}') = |\mathcal{O}_{\mathcal{D}'}|/|\mathcal{O}_{\mathcal{D}}|\), and accordingly the construction of both \(\mathcal{O}_{\mathcal{D}}\) and \(\mathcal{O}_{\mathcal{D}'}\).
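To make the construction of \(\mathcal{O}_{\mathcal{D}}\) concrete, the following base-R sketch (purely illustrative, not the BCDAG internals; is_dag and n_successors are hypothetical helpers) counts the direct successors of a DAG by attempting every single-edge insertion, deletion and reversal and keeping the acyclic results:

# Hypothetical helper: acyclicity check by repeatedly removing source nodes,
# where A[u, v] = 1 encodes the edge u -> v
is_dag <- function(A) {
  while (nrow(A) > 0) {
    roots <- which(colSums(A) == 0)  # nodes with no incoming edges
    if (length(roots) == 0) return(FALSE)
    A <- A[-roots, -roots, drop = FALSE]
  }
  TRUE
}

# Hypothetical helper: |O_D|, the number of direct successors of D
n_successors <- function(A) {
  q <- nrow(A); count <- 0
  for (u in 1:q) for (v in 1:q) {
    if (u == v) next
    if (A[u, v] == 0 && A[v, u] == 0) {   # insertion of u -> v
      B <- A; B[u, v] <- 1
      if (is_dag(B)) count <- count + 1
    } else if (A[u, v] == 1) {
      count <- count + 1                  # deletion of u -> v (always acyclic)
      B <- A; B[u, v] <- 0; B[v, u] <- 1  # reversal of u -> v
      if (is_dag(B)) count <- count + 1
    }
  }
  count
}

n_successors(DAG)  # |O_D| for the DAG generated at the beginning of the vignette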
If fast = FALSE, the proposal ratio is computed exactly; this requires the enumeration of \(\mathcal{O}_\mathcal{D}\) and \(\mathcal{O}_{\mathcal{D}'}\), which may become computationally expensive, especially when \(q\) is large. However, the ratio approaches \(1\) as the number of variables \(q\) increases: option fast = TRUE approximates the ratio with \(1\) and therefore avoids the construction of \(\mathcal{O}_\mathcal{D}\) and \(\mathcal{O}_{\mathcal{D}'}\). A comparison between fast = FALSE and fast = TRUE in the execution of learn_DAG() produces the following results in terms of computational time:
# No approximation
time_nofast <- system.time(out_nofast <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = FALSE))
# Approximation
time_fast <- system.time(out_fast <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = TRUE, save.memory = FALSE, collapse = FALSE))
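The elapsed times can then be compared (actual values will depend on the machine):

time_nofast["elapsed"]
time_fast["elapsed"]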
Finally, the corresponding estimated posterior probabilities of edge inclusion are the following:
round(get_edgeprobs(out_nofast), 2)
#> 1 2 3 4 5 6 7 8
#> 1 0.00 0.06 0.07 0.00 0 0.00 0.00 0.00
#> 2 0.05 0.00 0.02 0.03 0 0.21 0.04 0.00
#> 3 0.09 0.02 0.00 0.00 0 0.01 0.02 0.52
#> 4 0.00 0.06 0.00 0.00 0 0.01 0.52 0.00
#> 5 1.00 0.00 0.00 0.00 0 0.00 0.01 0.00
#> 6 0.03 0.26 0.00 0.00 0 0.00 0.02 0.00
#> 7 0.03 0.01 0.00 0.48 0 0.03 0.00 0.02
#> 8 1.00 0.00 0.48 0.01 0 0.00 0.03 0.00
round(get_edgeprobs(out_fast), 2)
#> 1 2 3 4 5 6 7 8
#> 1 0.00 0.04 0.01 0.02 0.00 0.04 0.01 0.00
#> 2 0.05 0.00 0.00 0.03 0.00 0.20 0.01 0.00
#> 3 0.05 0.00 0.00 0.00 0.01 0.01 0.06 0.55
#> 4 0.02 0.03 0.00 0.00 0.00 0.02 0.55 0.01
#> 5 1.00 0.00 0.01 0.00 0.00 0.00 0.01 0.00
#> 6 0.06 0.24 0.01 0.00 0.01 0.00 0.03 0.01
#> 7 0.00 0.02 0.03 0.45 0.00 0.01 0.00 0.00
#> 8 1.00 0.00 0.45 0.02 0.00 0.04 0.00 0.00
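As a quick check that the approximation leaves the posterior summaries essentially unchanged, the two matrices of edge probabilities can be compared directly:

max(abs(get_edgeprobs(out_nofast) - get_edgeprobs(out_fast)))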
References

Ben-David E, Li T, Massam H, Rajaratnam B (2015). “High dimensional Bayesian inference for Gaussian directed acyclic graph models.” arXiv pre-print.
Cao X, Khare K, Ghosh M (2019). “Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models.” The Annals of Statistics, 47(1), 319–348.
Castelletti F, Consonni G (2021). “Bayesian causal inference in probit graphical models.” Bayesian Analysis, in press.
Castelletti F, Mascaro A (2021). “Structural learning and estimation of joint causal effects among network-dependent variables.” Statistical Methods & Applications, 30, 1289–1314.
Castelletti F, Mascaro A (2022). “BCDAG: An R package for Bayesian structural and causal learning of Gaussian DAGs.” arXiv pre-print.
Godsill SJ (2001). “On the relationship between Markov chain Monte Carlo methods for model uncertainty.” Journal of Computational and Graphical Statistics, 10(2), 230–248.
Peluso S, Consonni G (2020). “Compatible priors for model selection of high-dimensional Gaussian DAGs.” Electronic Journal of Statistics, 14(2), 4110–4132.