The learn_DAG() function

This is the second of a series of three vignettes for the R package BCDAG. In this vignette we focus on the function learn_DAG(), which implements a Markov Chain Monte Carlo (MCMC) algorithm to sample from the joint posterior of DAG structures and DAG parameters under the Gaussian assumption.
The underlying Bayesian Gaussian DAG-model can be summarized as follows: \[\begin{eqnarray} X_1, \dots, X_q \,|\,\boldsymbol L, \boldsymbol D, \mathcal{D} &\sim& \mathcal{N}_q\left(\boldsymbol 0, (\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top)^{-1}\right)\\ (\boldsymbol L, \boldsymbol D)\,|\,\mathcal{D} &\sim& \text{DAG-Wishart}(\boldsymbol{a}_{c}^{\mathcal{D}}, \boldsymbol U) \\ p(\mathcal{D}) &\propto& w^{|\mathcal{S}_\mathcal{D}|}(1-w)^{\frac{q(q-1)}{2} - {|\mathcal{S}_\mathcal{D}|}} \end{eqnarray}\]
In particular \(\mathcal{D}=(V,E)\) denotes a DAG structure with set of nodes \(V=\{1,\dots,q\}\) and set of edges \(E\subseteq V \times V\). Moreover, \((\boldsymbol L, \boldsymbol D)\) are model parameters providing the decomposition of the precision (inverse-covariance) matrix \(\boldsymbol{\Omega} = \boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top\); specifically, \(\boldsymbol{L}\) is a \((q, q)\) matrix of coefficients such that for each \((u, v)\)-element \(\boldsymbol{L}_{uv}\) with \(u \ne v\), \(\boldsymbol{L}_{uv} \ne 0\) if and only if \((u, v) \in E\), while \(\boldsymbol{L}_{uu} = 1\) for each \(u = 1,\dots, q\); also, \(\boldsymbol{D}\) is a \((q, q)\) diagonal matrix with \((u, u)\)-element \(\boldsymbol{D}_{uu}\).
The latter decomposition follows from the equivalent Structural Equation Model (SEM) representation of a Gaussian DAG-model; see also Castelletti & Mascaro (2021).
Conditionally on \(\mathcal{D}\), a prior on \((\boldsymbol{L}, \boldsymbol{D})\) is assigned through a compatible DAG-Wishart distribution with rate hyperparameter \(\boldsymbol{U}\), a \((q,q)\) s.p.d. matrix, and shape hyperparameter \(\boldsymbol{a}^{\mathcal{D}}_{c}\), a \((q,1)\) vector; see also Cao et al. (2019) and Peluso & Consonni (2020).
Finally, a prior on DAG \(\mathcal{D}\) is assigned through a Binomial distribution on the number of edges in the graph; in \(p(\mathcal{D})\), \(w \in (0,1)\) is the prior probability of edge inclusion, while \(|\mathcal{S}_{\mathcal{D}}|\) denotes the number of edges in \(\mathcal{D}\); see again Castelletti & Mascaro (2021) for further details.
The target of the MCMC scheme is therefore the joint posterior of \((\boldsymbol{L},\boldsymbol{D},\mathcal{D})\), \[\begin{equation} p(\boldsymbol L, \boldsymbol D, \mathcal{D}\,|\, \boldsymbol X) \propto p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})\,p(\boldsymbol{L},\boldsymbol{D}\,|\,\mathcal{D}) \,p(\mathcal{D}), \end{equation}\] where \(\boldsymbol{X}\) denotes an \((n,q)\) data matrix obtained through \(n\) i.i.d. draws from the Gaussian DAG-model and \(p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})\) is the likelihood function. See also Castelletti & Mascaro (2022+) for full details.
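As a quick illustration of the DAG prior above, the following sketch evaluates \(\log p(\mathcal{D})\) up to an additive constant for a DAG encoded by its \((q,q)\) adjacency matrix; logprior_DAG is a hypothetical helper written here for illustration, not a function exported by BCDAG:

# Hypothetical helper (not part of BCDAG): log p(D) up to an additive constant,
# with A the 0/1 adjacency matrix of D and w the prior probability of edge inclusion
logprior_DAG <- function(A, w) {
  q <- ncol(A)
  nedges <- sum(A)  # |S_D|, the number of edges in D
  nedges * log(w) + (q * (q - 1) / 2 - nedges) * log(1 - w)
}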
We first randomly generate a DAG \(\mathcal{D}\), the DAG parameters \((\boldsymbol{L},\boldsymbol{D})\) and then \(n=1000\) i.i.d. observations from a Gaussian DAG-model as follows:
library(BCDAG)

set.seed(1)
q <- 8                                  # number of nodes
w <- 0.2                                # prior probability of edge inclusion
DAG <- rDAG(q, w)                       # generate a DAG structure
a <- q                                  # shape hyperparameter of the DAG-Wishart prior
U <- diag(1, q)                         # rate hyperparameter of the DAG-Wishart prior
outDL <- rDAGWishart(n = 1, DAG, a, U)  # sample (L, D) given the DAG
L <- outDL$L; D <- outDL$D
Omega <- L %*% solve(D) %*% t(L)        # precision matrix
Sigma <- solve(Omega)                   # covariance matrix
n <- 1000
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)
See also our vignette about data generation from Gaussian DAG-models.
learn_DAG()

Function learn_DAG() implements an MCMC algorithm to sample from the joint posterior of DAGs and DAG parameters. This is based on a Partial Analytic Structure (PAS) algorithm (Godsill, 2001) which, at each iteration:

1. updates the DAG through a Metropolis-Hastings step, where a new candidate DAG is drawn from a suitable proposal distribution and accepted with probability given by the MH acceptance rate (see A note on fast = TRUE below);
2. samples the DAG parameters \((\boldsymbol{L}, \boldsymbol{D})\) from their posterior distribution, conditionally on the accepted DAG.

We implement it as follows:
out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = FALSE)
Inputs of learn_DAG() correspond to three different sets of arguments:

- S, burn and data are standard inputs required by any MCMC algorithm. In particular, S defines the desired length of the chain, which is obtained by discarding the first burn draws (the total number of sampled draws is therefore S + burn); data is instead the \((n,q)\) data matrix \(\boldsymbol{X}\);
- a, U and w are the hyperparameters of the priors on DAGs (w) and DAG parameters (a, U); see also Equation [REF]. The same arguments appear in functions rDAG() and rDAGWishart(), which were introduced in our vignette [ADD REF TO THE VIGNETTE];
- fast, save.memory and collapse are Boolean arguments which allow one to: implement an approximate proposal distribution within the MCMC scheme (fast = TRUE); change the array structure of the stored MCMC output into strings in order to save memory (save.memory = TRUE); implement an MCMC for DAG structure learning only, without sampling from the posterior of the parameters (collapse = TRUE). See also A note on fast = TRUE below and Castelletti & Mascaro (2022+) for full details.

The output of learn_DAG() is an object of class bcdag. bcdag objects include the output of the MCMC algorithm together with a collection of meta-data representing the input arguments of learn_DAG(); these are stored in the attributes of the object:
str(out)
#> List of 3
#> $ Graphs: num [1:8, 1:8, 1:5000] 0 0 0 0 1 0 0 1 0 0 ...
#> $ L : num [1:8, 1:8, 1:5000] 1 0 0 0 -0.0194 ...
#> $ D : num [1:8, 1:8, 1:5000] 0.164 0 0 0 0 ...
#> - attr(*, "class")= chr "bcdag"
#> - attr(*, "type")= chr "complete"
#> - attr(*, "input")=List of 10
#> ..$ S : num 5000
#> ..$ burn : num 1000
#> ..$ data : num [1:1000, 1:8] -0.299 -0.351 -0.631 0.219 -0.351 ...
#> ..$ a : num 8
#> ..$ U : num [1:8, 1:8] 1 0 0 0 0 0 0 0 0 1 ...
#> ..$ w : num 0.2
#> ..$ fast : logi FALSE
#> ..$ save.memory: logi FALSE
#> ..$ collapse : logi FALSE
#> ..$ verbose : logi TRUE
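Because the inputs of learn_DAG() are stored in the attributes of the object, they can be recovered directly; for instance:

attr(out, "type")
#> [1] "complete"
attr(out, "input")$S
#> [1] 5000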
Attribute type refers to the type of output produced by learn_DAG(), whose structure depends on the choice of the arguments save.memory and collapse. When both are set equal to FALSE, as in the previous example, the output of learn_DAG() is a complete bcdag object, collecting three \((q,q,S)\) arrays with the DAG structures (in the form of \(q \times q\) adjacency matrices) and the DAG parameters \(\boldsymbol{L}\) and \(\boldsymbol{D}\) (both as \(q \times q\) matrices) sampled by the MCMC:
out$Graphs[,,1]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 1 0 0
#> [3,] 0 0 0 0 0 0 0 1
#> [4,] 0 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 1 0 1 0 0 0 0
#> [8,] 1 0 0 0 0 0 0 0
round(out$L[,,1],2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.00 0.00 0 0.00 0 0.00 0 0.00
#> [2,] 0.00 1.00 0 0.00 0 0.07 0 0.00
#> [3,] 0.00 0.00 1 0.00 0 0.00 0 0.94
#> [4,] 0.00 0.00 0 1.00 0 0.00 0 0.00
#> [5,] -0.02 0.00 0 0.00 1 0.00 0 0.00
#> [6,] 0.00 0.00 0 0.00 0 1.00 0 0.00
#> [7,] 0.00 -0.02 0 -1.97 0 0.00 1 0.00
#> [8,] 0.40 0.00 0 0.00 0 0.00 0 1.00
round(out$D[,,1],2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.16 0.00 0.00 0.00 0.00 0.0 0.00 0.00
#> [2,] 0.00 0.57 0.00 0.00 0.00 0.0 0.00 0.00
#> [3,] 0.00 0.00 0.74 0.00 0.00 0.0 0.00 0.00
#> [4,] 0.00 0.00 0.00 0.94 0.00 0.0 0.00 0.00
#> [5,] 0.00 0.00 0.00 0.00 77.94 0.0 0.00 0.00
#> [6,] 0.00 0.00 0.00 0.00 0.00 0.8 0.00 0.00
#> [7,] 0.00 0.00 0.00 0.00 0.00 0.0 0.31 0.00
#> [8,] 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.96
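Because a complete bcdag object stores plain \((q,q,S)\) arrays, posterior summaries can be computed directly with base R; for instance, a quick sketch of the posterior mean of \(\boldsymbol{L}\), averaging over the \(S\) draws:

round(apply(out$L, c(1, 2), mean), 2)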
When collapse = TRUE and save.memory = FALSE, the output of learn_DAG() is a collapsed bcdag object, consisting of a \((q,q,S)\) array with the adjacency matrices of the DAGs sampled by the MCMC:
collapsed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = TRUE)
collapsed_out$Graphs[,,1]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 1 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 0 0 0 0 1
#> [8,] 1 0 1 0 0 0 0 0
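The array of sampled adjacency matrices can be summarized in the same way; for instance, averaging over the \(S\) draws gives a quick estimate of the posterior probabilities of edge inclusion:

round(apply(collapsed_out$Graphs, c(1, 2), mean), 2)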
When save.memory = TRUE and collapse = FALSE, the output is a compressed bcdag object, collecting samples from the joint posterior on DAGs and DAG parameters in the form of vectors of strings:
compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = TRUE, collapse = FALSE)
In this case, we can access the MCMC draws as follows:
compressed_out$Graphs[1]
#> [1] "0;0;0;0;1;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0"
compressed_out$L[1]
#> [1] "1;0;0;0;-0.020714882228288;0;0;0.389080449183579;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0.46449274423891;0;0;0;1;0;0;-1.76444008583527;-0.0201974091006512;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1"
compressed_out$D[1]
#> [1] "0.154822760388321;0;0;0;0;0;0;0;0;0.544421200660814;0;0;0;0;0;0;0;0;0.395261505201722;0;0;0;0;0;0;0;0;1.04755025995277;0;0;0;0;0;0;0;0;67.9926231564593;0;0;0;0;0;0;0;0;0.83164913327451;0;0;0;0;0;0;0;0;0.303043124888535;0;0;0;0;0;0;0;0;1.76018923453861"
In addition, we provide bd_decode, an internal function that can be used to visualize the previous objects as matrices:
BCDAG:::bd_decode(compressed_out$Graphs[1])
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 1 0 0 0 0
#> [8,] 1 0 1 1 0 0 0 0
round(BCDAG:::bd_decode(compressed_out$L[1]),2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.00 0 0.00 0.00 0 0 0 0
#> [2,] 0.00 1 0.00 0.00 0 0 0 0
#> [3,] 0.00 0 1.00 0.00 0 0 0 0
#> [4,] 0.00 0 0.00 1.00 0 0 0 0
#> [5,] -0.02 0 0.00 0.00 1 0 0 0
#> [6,] 0.00 0 0.00 0.00 0 1 0 0
#> [7,] 0.00 0 0.00 -1.76 0 0 1 0
#> [8,] 0.39 0 0.46 -0.02 0 0 0 1
round(BCDAG:::bd_decode(compressed_out$D[1]),2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.15 0.00 0.0 0.00 0.00 0.00 0.0 0.00
#> [2,] 0.00 0.54 0.0 0.00 0.00 0.00 0.0 0.00
#> [3,] 0.00 0.00 0.4 0.00 0.00 0.00 0.0 0.00
#> [4,] 0.00 0.00 0.0 1.05 0.00 0.00 0.0 0.00
#> [5,] 0.00 0.00 0.0 0.00 67.99 0.00 0.0 0.00
#> [6,] 0.00 0.00 0.0 0.00 0.00 0.83 0.0 0.00
#> [7,] 0.00 0.00 0.0 0.00 0.00 0.00 0.3 0.00
#> [8,] 0.00 0.00 0.0 0.00 0.00 0.00 0.0 1.76
Finally, if save.memory = TRUE and collapse = TRUE, the output of learn_DAG() is a compressed and collapsed bcdag object, collecting only the sampled DAGs represented as vectors of strings.
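For instance (the object name collapsed_compressed_out is illustrative; the call mirrors those above):

collapsed_compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
                                      a, U, w,
                                      fast = FALSE, save.memory = TRUE, collapse = TRUE)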
A note on fast = TRUE
Step 1. of the MCMC scheme implemented by learn_DAG() updates DAG \(\mathcal{D}\) by randomly drawing a new candidate DAG \(\mathcal{D}'\) from a proposal distribution and then accepting it with probability given by the Metropolis-Hastings (MH) acceptance rate; see also Castelletti & Mascaro (2021). For a given DAG \(\mathcal{D}\), the proposal distribution \(q(\mathcal{D}'\,|\,\mathcal{D})\) is built over the set \(\mathcal{O}_{\mathcal{D}}\) of direct successor DAGs, i.e. those that can be reached from \(\mathcal{D}\) by inserting, deleting or reversing a single edge in \(\mathcal{D}\). A DAG \(\mathcal{D}'\) is then proposed uniformly from the set \(\mathcal{O}_{\mathcal{D}}\), so that \(q(\mathcal{D}'\,|\,\mathcal{D})=1/|\mathcal{O}_{\mathcal{D}}|\). Moreover, the MH rate requires evaluating the ratio of proposals \(q(\mathcal{D}'\,|\,\mathcal{D})/q(\mathcal{D}\,|\,\mathcal{D}') = |\mathcal{O}_{\mathcal{D}'}|/|\mathcal{O}_{\mathcal{D}}|\), and accordingly the construction of both \(\mathcal{O}_{\mathcal{D}}\) and \(\mathcal{O}_{\mathcal{D}'}\).
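To make the construction of \(\mathcal{O}_{\mathcal{D}}\) concrete, the following base-R sketch (purely illustrative, not the BCDAG internals; is_dag and n_successors are hypothetical helpers) counts the direct successors of a DAG by attempting every single-edge insertion, deletion and reversal and keeping the acyclic results:

# Hypothetical helper: acyclicity check by repeatedly removing source nodes,
# where A[u, v] = 1 encodes the edge u -> v
is_dag <- function(A) {
  while (nrow(A) > 0) {
    roots <- which(colSums(A) == 0)  # nodes with no incoming edges
    if (length(roots) == 0) return(FALSE)
    A <- A[-roots, -roots, drop = FALSE]
  }
  TRUE
}

# Hypothetical helper: |O_D|, the number of direct successors of D
n_successors <- function(A) {
  q <- nrow(A); count <- 0
  for (u in 1:q) for (v in 1:q) {
    if (u == v) next
    if (A[u, v] == 0 && A[v, u] == 0) {   # insertion of u -> v
      B <- A; B[u, v] <- 1
      if (is_dag(B)) count <- count + 1
    } else if (A[u, v] == 1) {
      count <- count + 1                  # deletion of u -> v (always acyclic)
      B <- A; B[u, v] <- 0; B[v, u] <- 1  # reversal of u -> v
      if (is_dag(B)) count <- count + 1
    }
  }
  count
}

n_successors(DAG)  # |O_D| for the DAG generated at the beginning of the vignette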
If fast = FALSE, the proposal ratio is computed exactly; this requires the enumeration of \(\mathcal{O}_\mathcal{D}\) and \(\mathcal{O}_{\mathcal{D}'}\), which may become computationally expensive, especially when \(q\) is large. However, the ratio approaches \(1\) as the number of variables \(q\) increases: option fast = TRUE approximates the ratio with \(1\) and therefore avoids the construction of \(\mathcal{O}_\mathcal{D}\) and \(\mathcal{O}_{\mathcal{D}'}\). A comparison between fast = FALSE and fast = TRUE in the execution of learn_DAG() produces the following results in terms of computational time:
# No approximation
time_nofast <- system.time(out_nofast <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = FALSE))
# Approximation
time_fast <- system.time(out_fast <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = TRUE, save.memory = FALSE, collapse = FALSE))
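The elapsed times can then be compared (actual values will depend on the machine):

time_nofast["elapsed"]
time_fast["elapsed"]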
Finally, the corresponding estimated posterior probabilities of edge inclusion are the following:
round(get_edgeprobs(out_nofast), 2)
#> 1 2 3 4 5 6 7 8
#> 1 0.00 0.06 0.07 0.00 0 0.00 0.00 0.00
#> 2 0.05 0.00 0.02 0.03 0 0.21 0.04 0.00
#> 3 0.09 0.02 0.00 0.00 0 0.01 0.02 0.52
#> 4 0.00 0.06 0.00 0.00 0 0.01 0.52 0.00
#> 5 1.00 0.00 0.00 0.00 0 0.00 0.01 0.00
#> 6 0.03 0.26 0.00 0.00 0 0.00 0.02 0.00
#> 7 0.03 0.01 0.00 0.48 0 0.03 0.00 0.02
#> 8 1.00 0.00 0.48 0.01 0 0.00 0.03 0.00
round(get_edgeprobs(out_fast), 2)
#> 1 2 3 4 5 6 7 8
#> 1 0.00 0.04 0.01 0.02 0.00 0.04 0.01 0.00
#> 2 0.05 0.00 0.00 0.03 0.00 0.20 0.01 0.00
#> 3 0.05 0.00 0.00 0.00 0.01 0.01 0.06 0.55
#> 4 0.02 0.03 0.00 0.00 0.00 0.02 0.55 0.01
#> 5 1.00 0.00 0.01 0.00 0.00 0.00 0.01 0.00
#> 6 0.06 0.24 0.01 0.00 0.01 0.00 0.03 0.01
#> 7 0.00 0.02 0.03 0.45 0.00 0.01 0.00 0.00
#> 8 1.00 0.00 0.45 0.02 0.00 0.04 0.00 0.00
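As a quick check that the approximation leaves the posterior summaries essentially unchanged, the two matrices of edge probabilities can be compared directly:

max(abs(get_edgeprobs(out_nofast) - get_edgeprobs(out_fast)))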
References

Ben-David E, Li T, Massam H, Rajaratnam B (2015). “High dimensional Bayesian inference for Gaussian directed acyclic graph models.” arXiv pre-print.
Cao X, Khare K, Ghosh M (2019). “Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models.” The Annals of Statistics, 47(1), 319–348.
Castelletti F, Consonni G (2021). “Bayesian causal inference in probit graphical models.” Bayesian Analysis, in press.
Castelletti F, Mascaro A (2021). “Structural learning and estimation of joint causal effects among network-dependent variables.” Statistical Methods & Applications, 30, 1289–1314.
Castelletti F, Mascaro A (2022). “BCDAG: An R package for Bayesian structural and causal learning of Gaussian DAGs.” arXiv pre-print.
Godsill SJ (2001). “On the relationship between Markov chain Monte Carlo methods for model uncertainty.” Journal of Computational and Graphical Statistics, 10(2), 230–248.
Peluso S, Consonni G (2020). “Compatible priors for model selection of high-dimensional Gaussian DAGs.” Electronic Journal of Statistics, 14(2), 4110–4132.