The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
After fitting bpgmm, posterior samples can be inspected
through chain-level summaries, trace plots, and co-clustering
probabilities. These diagnostics complement the model-selection
example.
The chains below are deliberately short so that the vignette builds quickly. Applied analyses should use longer chains and examine stability across independent runs.
library(bpgmm)
set.seed(2029)
X <- cbind(
matrix(rnorm(10, mean = -2.0, sd = 0.25), nrow = 2),
matrix(rnorm(10, mean = 0.0, sd = 0.25), nrow = 2),
matrix(rnorm(10, mean = 2.0, sd = 0.25), nrow = 2)
)
known_labels <- rep(1:3, each = 5)cluster_cols <- c("#0072B2", "#D55E00", "#009E73", "#CC79A7")
plot(
X[1, ], X[2, ],
col = cluster_cols[known_labels],
pch = 19,
xlab = "Variable 1",
ylab = "Variable 2",
main = "Diagnostic example",
asp = 1
)pgmm_rjmcmc_chains() runs independent chains. The
vignette uses cores = 1 for CRAN portability, but users can
increase cores locally.
For chain \(c\), simple empirical summaries are
\[ \widehat{p}_c(m = r \mid X) = \frac{1}{S_c}\sum_{s=1}^{S_c} I(m_c^{(s)} = r), \qquad \widehat{p}_c(v = a \mid X) = \frac{1}{S_c}\sum_{s=1}^{S_c} I(v_c^{(s)} = a). \]
Large disagreements across chains are a warning sign, especially when the chains use different random seeds.
chain_summaries <- lapply(fits, summarize_pgmm_rjmcmc, true_cluster = known_labels)
data.frame(
chain = names(chain_summaries),
ari = vapply(chain_summaries, function(x) x$ari, numeric(1)),
modal_clusters = vapply(chain_summaries, function(x) {
as.integer(names(which.max(x$n_clusters)))
}, integer(1))
)
#> chain ari modal_clusters
#> chain_1 chain_1 1.00 3
#> chain_2 chain_2 0.16 3The posterior samples store the active cluster indicators and covariance constraint at each saved iteration. These traces are first diagnostics.
cluster_count_trace <- function(fit) {
vapply(fit$active_cluster_samples, sum, numeric(1))
}
constraint_trace <- function(fit) {
vapply(fit$constraint_samples, constraint_to_model, character(1))
}
cluster_traces <- lapply(fits, cluster_count_trace)
constraint_traces <- lapply(fits, constraint_trace)
cluster_traces
#> $chain_1
#> [1] 3 3 3 3
#>
#> $chain_2
#> [1] 3 3 3 3
constraint_traces
#> $chain_1
#> [1] "UUU" "CUU" "CUC" "CCC"
#>
#> $chain_2
#> [1] "UUU" "UCU" "UCC" "UCC"old_par <- par(mar = c(4, 4, 3, 1))
plot(
cluster_traces[[1]],
type = "b",
pch = 19,
ylim = range(unlist(cluster_traces)),
col = "#0072B2",
xlab = "Saved iteration",
ylab = "Active clusters",
main = "Cluster-count trace"
)
lines(cluster_traces[[2]], type = "b", pch = 19, col = "#D55E00")
legend("topright", legend = names(fits), col = c("#0072B2", "#D55E00"), lty = 1, pch = 19, bty = "n")A co-clustering matrix estimates how often two observations are assigned to the same cluster across posterior samples. It reports pairwise uncertainty rather than only a single modal allocation.
\[ C_{ij} = \Pr(z_i = z_j \mid X) \approx \frac{1}{S}\sum_{s=1}^{S} I\{z_i^{(s)} = z_j^{(s)}\}. \]
co_clustering_matrix <- function(fit) {
n <- length(fit$allocation_samples[[1]])
out <- matrix(0, n, n)
for (allocation in fit$allocation_samples) {
out <- out + outer(allocation, allocation, "==")
}
out / length(fit$allocation_samples)
}
co_mat <- co_clustering_matrix(fits[[1]])
round(co_mat[1:6, 1:6], 2)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1.00 1.00 1.00 1.00 1.00 0.25
#> [2,] 1.00 1.00 1.00 1.00 1.00 0.25
#> [3,] 1.00 1.00 1.00 1.00 1.00 0.25
#> [4,] 1.00 1.00 1.00 1.00 1.00 0.25
#> [5,] 1.00 1.00 1.00 1.00 1.00 0.25
#> [6,] 0.25 0.25 0.25 0.25 0.25 1.00image(
seq_len(nrow(co_mat)),
seq_len(ncol(co_mat)),
co_mat[nrow(co_mat):1, ],
col = hcl.colors(20, "YlGnBu", rev = TRUE),
xlab = "Observation",
ylab = "Observation",
main = "Posterior co-clustering"
)Warning signs include:
m_range;v_step = 1;q_new, or
m_range.These diagnostics do not replace a long MCMC analysis, but they give a concise description of the posterior output.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.