This vignette introduces the FBMS package and shows how to perform Flexible Bayesian Model Selection (BMS) and Bayesian Model Averaging (BMA) for linear, generalized, nonlinear, fractional-polynomial, mixed-effect, and logic-regression models. More details are provided in the paper "FBMS: An R Package for Flexible Bayesian Model Selection and Model Averaging", available on arXiv; more explicit examples accompanying the paper can be found on GitHub: https://github.com/jonlachmann/GMJMCMC/tree/data-inputs/tests_current
We load technical markdown settings and a custom function, printn(), for printing only what is needed.
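The setup chunk itself is not shown here; a minimal sketch of such a helper, assuming it simply prints its argument followed by a blank line, could be:

printn <- function(x) {
  print(x)   # print the object
  cat("\n")  # blank line for readability
}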
\[ P(\Delta|D) = \sum_{m\in\Omega} P(m|D)\, \int_{\Theta_m} P(\Delta|m,\theta,D)\, P(\theta|m,D)\, d\theta. \]
Reference: [@hubin2020flexible]
\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y \mid \mu_i,\phi), \quad i = 1,\dots,n,\\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{q} \gamma_j \beta_j F_j(\mathbf{x}_i, \boldsymbol{\alpha}_j) + \sum_{k=1}^{r} \delta_{ik}. \end{aligned} \]
Depending on the set of allowed nonlinear functions \(\mathbb{G}\), the model class covers neural networks (sigmoid), decision trees (thresholds), MARS (hinges), fractional polynomials, logic regression, etc.
FBMS provides the following nonlinear transformations:
Name | Function | Name | Function |
---|---|---|---|
sigmoid | \(1 / (1 + \exp(-x))\) | sqroot | \(|x|^{1/2}\) |
relu | \(\max(x, 0)\) | troot | \(|x|^{1/3}\) |
nrelu | \(\max(-x, 0)\) | sin_deg | \(\sin(x/180 \cdot \pi)\) |
hs | \(x > 0\) | cos_deg | \(\cos(x/180 \cdot \pi)\) |
nhs | \(x < 0\) | exp_dbl | \(\exp(-|x|)\) |
gelu | \(x \Phi(x)\) | gauss | \(\exp(-x^2)\) |
ngelu | \(-x \Phi(-x)\) | erf | \(2 \Phi(\sqrt{2}x) - 1\) |
pm2 | \(x^{-2}\) | p0pm2 | \(p0(x) \cdot x^{-2}\) |
pm1 | \(\text{sign}(x) |x|^{-1}\) | p0pm05 | \(p0(x) \cdot |x|^{-0.5}\) |
pm05 | \(|x|^{-0.5}\) | p0p0 | \(p0(x)^2\) |
p05 | \(|x|^{0.5}\) | p0p05 | \(p0(x) \cdot |x|^{0.5}\) |
p2 | \(x^2\) | p0p1 | \(p0(x) \cdot x\) |
p3 | \(x^3\) | p0p2 | \(p0(x) \cdot x^2\) |
| | p0p3 | \(p0(x) \cdot x^3\) |
Custom functions can be added to \(\mathbb{G}\).
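For example, a cube transform can be defined as an ordinary R function and then included in the dictionary by name (the spam example below does exactly this):

to3 <- function(x) x^3                            # custom transform
transforms <- c("sigmoid", "p0", "troot", "to3")  # include it by name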
Let \(M = (\gamma_1, \dots, \gamma_q)\) (for linear models \(q=p\)).
Penalizing complexity priors
\[ P(M) \propto \mathbb{I}(|\boldsymbol\gamma_{1:q}| \le Q) \prod_{j=1}^q r^{\gamma_j c(F_j(\mathbf{x}, \boldsymbol{\alpha}_j))}, \quad 0 < r < 1, \]
\(c(F_j(\cdot))\): complexity measure (linear models: \(c(x)=1\); BGNLM counts algebraic operators).
The following parameter changes the prior penalization:
# model prior complexity penalty
model_prior = list(r = 0.005)  # default is 1/n
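For intuition, a minimal sketch (toy complexity values, not FBMS code) of how this penalty down-weights complex features:

r <- 0.005
c_simple  <- 1   # complexity of a plain linear term
c_complex <- 4   # complexity of a deeper nonlinear feature
# prior odds of a model containing the complex feature vs. the simple one (all else equal)
r^c_complex / r^c_simple   # equals r^3 = 1.25e-07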
Mixtures of g-priors
\[ \begin{aligned} P(\beta_0, \phi \mid M) &\propto \phi^{-1}, \\ P(\boldsymbol{\beta} \mid g) &\sim \mathbb{N}_{|M|}\!\left(\mathbf{0},\, g \cdot \phi\, J_n(\boldsymbol{\beta})^{-1}\right), \\ \frac{1}{1+g} &\sim \text{tCCH}\!\left(\frac{a}{2}, \frac{b}{2}, \rho, \frac{s}{2}, v, \kappa\right). \end{aligned} \]
Jeffreys prior
\[ P(\phi \mid M) = \phi^{-1}, \quad P(\beta_0, \boldsymbol{\beta} \mid M) = |J_n(\beta_0, \boldsymbol{\beta})|^{1/2}. \]
All priors in the table below are supported (the default is the g-prior).
Prior (Alias) | \(a\) | \(b\) | \(\rho\) | \(s\) | \(v\) | \(k\) | Families |
---|---|---|---|---|---|---|---|
Default: | | | | | | | |
g-prior | \(g\) (default: \(\max(n, p^2)\)) | | | | | | GLM |
tCCH-Related Priors: | | | | | | | |
CH | \(a\) | \(b\) | 0 | \(s\) | 1 | 1 | GLM |
uniform | 2 | 2 | 0 | 0 | 1 | 1 | GLM |
Jeffreys | 0 | 2 | 0 | 0 | 1 | 1 | GLM |
beta.prime | \(1/2\) | \(n - p_M - 1.5\) | 0 | 0 | 1 | 1 | GLM |
benchmark | 0.02 | \(0.02 \max(n, p^2)\) | 0 | 0 | 1 | 1 | GLM |
TG | \(2a\) | 2 | 0 | \(2s\) | 1 | 1 | GLM |
ZS-adapted | 1 | 2 | 0 | \(n + 3\) | 1 | 1 | GLM |
robust | 1 | 2 | 1.5 | 0 | \((n+1)/(p_M+1)\) | 1 | GLM |
hyper-g-n | 1 | 2 | 1.5 | 0 | 1 | \(1/n\) | GLM |
intrinsic | 1 | 1 | 1 | 0 | \((n + p_M + 1)/(p_M + 1)\) | \((n + p_M + 1)/n\) | GLM |
tCCH | \(a\) | \(b\) | \(\rho\) | \(s\) | \(v\) | \(k\) | GLM |
Other Priors: | | | | | | | |
EB-local | \(a\) | | | | | | GLM |
EB-global | \(a\) | | | | | | G |
JZS | \(a\) | | | | | | G |
ZS-null | \(a\) | | | | | | G |
ZS-full | \(a\) | | | | | | G |
hyper-g | \(a\) | | | | | | GLM |
hyper-g-laplace | \(a\) | | | | | | G |
AIC | None | | | | | | GLM |
BIC | None | | | | | | GLM |
Jeffreys-BIC | Var | | | | | | GLM |
Here \(p_M\) is the number of predictors excluding the intercept. “G” denotes Gaussian-only; “GLM” additionally includes binomial, Poisson, and gamma.
How to switch priors in code (be explicit):
# g-prior with g = 1000
beta_prior = list(type = "g-prior", alpha = 1000)

# Robust prior (tuned via the table parameters)
beta_prior = list(type = "robust")

# Jeffreys-BIC
beta_prior = list(type = "Jeffreys-BIC")

# Generic tCCH (provide all hyperparameters explicitly)
beta_prior = list(type = "tCCH", a = 2, b = 2, rho = 0, s = 0, v = 1, k = 1)
Marginal likelihood \[ P(D|M) = \int_{\Theta_M} P(D|\theta_M, M) \, P(\theta_M|M) \, d\theta_M. \]
Posterior \[ P(M|D) = \frac{P(D|M) P(M)}{\sum_{M' \in \Omega} P(D|M') P(M')}. \]
Approximation over discovered models \(\Omega^*\) \[ P(M|D) \approx \frac{P(D|M) P(M)}{\sum_{M' \in \Omega^*} P(D|M') P(M')}, \quad M \in \Omega^*. \]
Marginal inclusion probability \[ P(\gamma_j=1|D) \approx \sum_{M \in \Omega^*: \gamma_j=1} P(M|D). \]
Run multiple chains and aggregate unique models \(\Omega^*\):
\[ \widehat{P}(\Delta|D) = \sum_{M \in \Omega^*} P(\Delta|M,D) \, \widehat{P}(M|D). \]
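As a small illustration of this renormalization and of computing marginal inclusion probabilities (toy log marginal likelihoods and priors, not FBMS internals):

# three discovered models with log evidences and model priors
log_evidence <- c(-10.2, -11.0, -13.5)
log_prior    <- log(c(0.5, 0.3, 0.2))
log_post     <- log_evidence + log_prior
post <- exp(log_post - max(log_post))  # subtract the max to avoid underflow
post <- post / sum(post)               # renormalize over the discovered models
post
# marginal inclusion probability of a feature present in models 1 and 3
gamma_j <- c(TRUE, FALSE, TRUE)
sum(post[gamma_j])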
# Parameters for parallel runs are set to a single thread and a single core to comply
# with CRAN requirements (please tune for your machine if you have more capacity)
runs  <- 1  # set to 1 for simplicity; use 16 or more in practice
cores <- 1  # set to 1 for simplicity; use 8 or more cores in practice
Data: \(n=939\) exoplanets; variables include semimajoraxis, period, hoststar_mass, etc.
Target relationship: \({a \approx K_2 \left(P^2 M_h\right)^{1/3}}\).
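In Solar units (period \(P\) in years, host-star mass \(M_h\) in solar masses, \(a\) in AU) the constant is 1, so a quick sanity check for an Earth-like planet is:

P  <- 1   # orbital period in years
Mh <- 1   # host-star mass in solar masses
(P^2 * Mh)^(1/3)  # semi-major axis in AU; equals 1 for Earth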
We first run a single-chain GMJMCMC
# Load example data
data <- FBMS::exoplanet

# Choose a small but expressive transform set for a quick demo
transforms <- c("sigmoid", "sin_deg", "exp_dbl", "p0", "troot", "p3")
# ---- fbms() call (simple GMJMCMC) ----
# Key parameters (explicit):
# - formula : semimajoraxis ~ 1 + . # response and all predictors
# - data : data # dataset
# - beta_prior : list(type = "g-prior") # parameter prior
# - model_prior : list(r = 1/dim(data)[1]) # model prior
# - method : "gmjmcmc" # exploration strategy
# - transforms : transforms # nonlinear feature dictionary
# - P : population size per generation (search breadth)
result_single <- fbms(
  formula = semimajoraxis ~ 1 + .,
  data = data,
  beta_prior = list(type = "g-prior", alpha = dim(data)[1]),
  model_prior = list(r = 1/dim(data)[1]),
  method = "gmjmcmc",
  transforms = transforms,
  P = 10
)
# Summarize
printn(summary(result_single))
##
## Best population: 7 log marginal posterior: 3001.64
##
## feats.strings marg.probs
## 1 troot((period*sigmoid(mass))) 1.00000000
## 2 ((period*hoststar_mass)*period) 1.00000000
## 3 p3((period*hoststar_mass)) 1.00000000
## 4 (troot(period)*troot(period)) 1.00000000
## 5 (period*hoststar_mass) 1.00000000
## 6 (sigmoid(mass)*(period*hoststar_mass)) 1.00000000
## 7 hoststar_mass 0.91282456
## 8 hoststar_temperature 0.01243301
and then a parallel GMJMCMC
# ---- fbms() call (parallel GMJMCMC) ----
# Key parameters (explicit):
# - formula : semimajoraxis ~ 1 + . # response and all predictors
# - data : data # dataset
# - beta_prior : list(type = "g-prior") # parameter prior
# - model_prior : list(r = 1/dim(data)[1]) # model prior
# - method : "gmjmcmc" # exploration strategy
# - transforms : transforms # nonlinear feature dictionary
# - runs, cores : parallelization controls
# - P : population size per generation (search breadth)
result_parallel <- fbms(
  formula = semimajoraxis ~ 1 + .,
  data = data,
  beta_prior = list(type = "g-prior", alpha = dim(data)[1]),
  model_prior = list(r = 1/dim(data)[1]),
  method = "gmjmcmc.parallel",
  transforms = transforms,
  runs = runs,   # runs = 1 in this vignette; increase for convergence
  cores = cores, # cores = 1 in this vignette; increase for convergence
  P = 10
)
# Summarize
printn(summary(result_parallel))
##
## Best population: 10 thread: 1 log marginal posterior: 2144.064
##
## feats.strings marg.probs
## 1 period 1.0000000
## 2 (period*mass) 1.0000000
## 3 ((period*mass)*(period*mass)) 1.0000000
## 4 sigmoid(period) 1.0000000
## 5 p0(period) 1.0000000
## 6 ((period*mass)*(radius*(period*period))) 0.9999902
## 7 (radius*(period*period)) 0.9999844
## 8 (hoststar_temperature*p0(((period*mass)*(period*mass)))) 0.9945983
## 9 eccentricity 0.9554446
Plot output
plot(result_parallel)
Convergence plots
diagn_plot(result_parallel)
Reference: [@hubin2018mode]
\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y\mid \mu_i, \phi), \quad i=1,\dots,n, \\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{p} \gamma_j \beta_j x_{ij}. \end{aligned} \]
We simulate data with a known sparse truth and run MJMCMC with an explicit g-prior.
library(mvtnorm)

n <- 100  # sample size
p <- 20   # number of covariates
k <- 5    # size of true submodel

correct.model <- 1:k
beta.k <- (1:5)/5

beta <- rep(0, p)
beta[correct.model] <- beta.k

set.seed(123)
x <- rmvnorm(n, rep(0, p))
y <- x %*% beta + rnorm(n)

# Standardize
y <- scale(y)
X <- scale(x) / sqrt(n)

df <- as.data.frame(cbind(y, X))
colnames(df) <- c("Y", paste0("X", seq_len(ncol(df) - 1)))
printn(correct.model)
## [1] 1 2 3 4 5
printn(beta.k)
## [1] 0.2 0.4 0.6 0.8 1.0
Run MJMCMC with a g-prior (g = 100)
# ---- fbms() call (MJMCMC) ----
# Explicit prior choice:
# beta_prior = list(type = "g-prior", alpha = 100)
# To switch to another prior, e.g. robust:
# beta_prior = list(type = "robust")
result.lin <- fbms(
  formula = Y ~ 1 + .,
  data = df,
  method = "mjmcmc",
  N = 5000,  # number of iterations
  beta_prior = list(type = "g-prior", alpha = 100)
)
Plot results
plot(result.lin)
Summarize with posterior effects
# 'effects' specifies quantiles for posterior modes of effects across models
printn(summary(result.lin, effects = c(0.5, 0.025, 0.975)))
##
## Best log marginal posterior: 56.2259
##
## $PIP
## feats.strings marg.probs
## 1 X4 1.00000000
## 2 X3 1.00000000
## 3 X5 1.00000000
## 4 X2 0.95262397
## 5 X1 0.16880319
## 6 X18 0.10464730
## 7 X20 0.10105529
## 8 X15 0.08767336
## 9 X10 0.08257391
## 10 X16 0.08200567
## 11 X13 0.07741378
## 12 X12 0.07474969
## 13 X11 0.07090813
## 14 X19 0.06903813
## 15 X7 0.06899870
## 16 X14 0.06586870
## 17 X9 0.05690408
## 18 X8 0.05592665
## 19 X6 0.05154572
## 20 X17 0.04739590
##
## $EFF
## Covariate quant_0.5 quant_0.025 quant_0.975
## 1 intercept 0 0 0
## 2 X1 0 0 0.6606
## 3 X2 1.7024 0 1.7383
## 4 X3 4.8118 4.7487 4.8867
## 5 X4 4.9803 4.9086 5.0868
## 6 X5 4.7744 4.3659 4.8573
## 7 X6 0 -0.0694 0
## 8 X7 0 -0.1382 0
## 9 X8 0 0 0.072
## 10 X9 0 -0.1875 0
## 11 X10 0 0 0.5609
## 12 X11 0 0 0.444
## 13 X12 0 -0.2502 0
## 14 X13 0 -0.3655 0
## 15 X14 0 0 0.2716
## 16 X15 0 0 0.5654
## 17 X16 0 0 0.5074
## 18 X17 0 0 0.2266
## 19 X18 0 0 0.4858
## 20 X19 0 -0.2733 0
## 21 X20 0 -0.5012 0
Run parallel MJMCMC
# ---- fbms() call (parallel MJMCMC) ----
# Explicit prior choice:
# beta_prior = list(type = "g-prior", alpha = 100)
# To switch to another prior, e.g. robust:
# beta_prior = list(type = "robust")
# method = mjmcmc.parallel
# runs, cores : parallelization controls
result.lin.par <- fbms(
  formula = Y ~ 1 + .,
  data = df,
  method = "mjmcmc.parallel",
  N = 5000,  # number of iterations
  beta_prior = list(type = "g-prior", alpha = 100),
  runs = runs,
  cores = cores
)

printn(summary(result.lin.par, effects = c(0.5, 0.025, 0.975)))
##
## Best log marginal posterior: 56.2259
##
## $PIP
## feats.strings marg.probs
## 1 X4 1.00000000
## 2 X3 1.00000000
## 3 X5 1.00000000
## 4 X2 0.95518016
## 5 X1 0.17518018
## 6 X10 0.10751407
## 7 X20 0.10044924
## 8 X18 0.09363710
## 9 X16 0.09089584
## 10 X15 0.08729800
## 11 X13 0.08418363
## 12 X19 0.07302402
## 13 X14 0.07262783
## 14 X11 0.06881624
## 15 X17 0.06104115
## 16 X9 0.05570830
## 17 X12 0.05215211
## 18 X6 0.04938128
## 19 X7 0.04711947
## 20 X8 0.04447923
##
## $EFF
## Covariate quant_0.5 quant_0.025 quant_0.975
## 1 intercept 0 0 0
## 2 X1 0 0 0.6606
## 3 X2 1.7027 0 1.7394
## 4 X3 4.8118 4.7487 4.8885
## 5 X4 4.9803 4.9043 5.0888
## 6 X5 4.7744 4.3659 4.8576
## 7 X6 0 -0.0694 0
## 8 X7 0 -0.1346 0
## 9 X8 0 0 0.072
## 10 X9 0 -0.1875 0
## 11 X10 0 0 0.5625
## 12 X11 0 0 0.4382
## 13 X12 0 -0.2501 0
## 14 X13 0 -0.3742 0
## 15 X14 0 0 0.2739
## 16 X15 0 0 0.5567
## 17 X16 0 0 0.5074
## 18 X17 0 0 0.2266
## 19 X18 0 0 0.4718
## 20 X19 0 -0.2733 0
## 21 X20 0 -0.511 0
Reference: [@hubin2023fractional]
We augment the linear example to follow an FP truth and fit with GMJMCMC.
\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y|\mu_i,\phi),\\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{p} \sum_{k \in K} \gamma_{jk}\, \beta_{jk}\, \rho_k(x_{ij}), \quad \text{with } K = \mathbf{F}_0 \cup \mathbf{F}_1 \cup \mathbf{F}_2, \end{aligned} \]
# Create an FP-style response with known structure; covariates are from the previous example
df$Y <- p05(df$X1) + df$X1 + pm05(df$X3) + p0pm05(df$X3) + df$X4 +
  pm1(df$X5) + p0(df$X6) + df$X8 + df$X10 + rnorm(nrow(df))

# Allow common FP transforms
transforms <- c(
  "p0", "p2", "p3", "p05", "pm05", "pm1", "pm2", "p0p0",
  "p0p05", "p0p1", "p0p2", "p0p3", "p0p05", "p0pm05", "p0pm1", "p0pm2"
)

# Generation probabilities: here only modifications and mutations
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(0, 1, 0, 1)

# Feature-generation parameters
params <- gen.params.gmjmcmc(ncol(df) - 1)
params$feat$D <- 1  # max depth-1 features
Run GMJMCMC (single-thread)
result <- fbms(
  formula = Y ~ 1 + .,
  data = df,
  method = "gmjmcmc",
  transforms = transforms,
  beta_prior = list(type = "Jeffreys-BIC"),
  probs = probs,
  params = params,
  P = 10
)
printn(summary(result))
##
## Best population: 10 log marginal posterior: -206.4266
##
## feats.strings marg.probs
## 1 pm05(X3) 1.0000000000
## 2 p0p0(X3) 1.0000000000
## 3 pm1(X5) 1.0000000000
## 4 p0pm05(X6) 0.9999959549
## 5 p0pm1(X3) 0.9359835862
## 6 X17 0.8981659238
## 7 X15 0.8795347468
## 8 X10 0.8584612830
## 9 X9 0.8507841966
## 10 p0pm2(X13) 0.7855348631
## 11 X20 0.1825368449
## 12 X4 0.0362927730
## 13 X7 0.0166240434
## 14 X13 0.0079441065
## 15 X2 0.0002276410
## 16 p0p3(X20) 0.0001784105
Parallel GMJMCMC
result_parallel <- fbms(
  formula = Y ~ 1 + .,
  data = df,
  method = "gmjmcmc.parallel",
  transforms = transforms,
  beta_prior = list(type = "Jeffreys-BIC"),
  probs = probs,
  params = params,
  P = 10,
  runs = runs,
  cores = cores
)
printn(summary(result_parallel))
##
## Best population: 10 thread: 1 log marginal posterior: -506.0695
##
## feats.strings marg.probs
## 1 p0pm05(X3) 1.000000000
## 2 X5 1.000000000
## 3 pm2(X5) 1.000000000
## 4 X3 0.398001125
## 5 p0p0(X3) 0.001258547
Dataset: \(n=659\) children; response \(y\): standardized height; covariates: c.bf, c.age, m.ht, m.bmi, reg. Random intercept for dr.
We specify a custom estimator that fits a mixed model (via lme4) and plug it into fbms() with family = "custom". Extra parameters of the estimator are passed through mlpost_params = list(dr = dr, r = r).
# Custom approximate log marginal likelihood for a mixed model using the Laplace approximation
mixed.model.loglik.lme4 <- function(y, x, model, complex, mlpost_params) {
  if (sum(model) > 1) {
    x.model <- x[, model]
    data <- data.frame(y, x = x.model[, -1], dr = mlpost_params$dr)
    mm <- lmer(as.formula(paste0("y ~ 1 +",
                                 paste0(names(data)[2:(ncol(data) - 1)], collapse = "+"),
                                 " + (1 | dr)")), data = data, REML = FALSE)
  } else {
    data <- data.frame(y, dr = mlpost_params$dr)
    mm <- lmer(y ~ 1 + (1 | dr), data = data, REML = FALSE)
  }
  # log marginal likelihood (Laplace approximation) + log model prior
  mloglik <- as.numeric(logLik(mm)) - 0.5 * log(length(y)) * (ncol(data) - 2)
  if (length(mlpost_params$r) == 0) mlpost_params$r <- 1 / nrow(x)
  lp <- log_prior(mlpost_params, complex)
  list(crit = mloglik + lp, coefs = fixef(mm))
}
library(lme4)
## Loading required package: Matrix

data(Zambia, package = "cAIC4")
df <- as.data.frame(sapply(Zambia[1:5], scale))

transforms <- c("p0", "p2", "p3", "p05", "pm05", "pm1", "pm2",
                "p0p0", "p0p05", "p0p1", "p0p2", "p0p3",
                "p0p05", "p0pm05", "p0pm1", "p0pm2")

probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 0, 1)  # include modifications and interactions

params <- gen.params.gmjmcmc(ncol(df) - 1)
params$feat$D <- 1
params$feat$pop.max <- 10

result2a <- fbms(
  formula = z ~ 1 + .,
  data = df,
  method = "gmjmcmc.parallel",
  transforms = transforms,
  probs = probs,
  params = params,
  P = 10,
  N = 100,
  runs = runs,
  cores = cores,
  family = "custom",
  loglik.pi = mixed.model.loglik.lme4,
  model_prior = list(r = 1 / nrow(df)),            # model_prior is passed to mlpost_params
  extra_params = list(dr = droplevels(Zambia$dr))  # extra_params are passed to mlpost_params
)
printn(summary(result2a, tol = 0.05, labels = names(df)[-1]))
##
## Best population: 6 thread: 1 log marginal posterior: -876.5476
##
## feats.strings marg.probs
## 1 c.bf 1.0000000
## 2 c.age 1.0000000
## 3 m.bmi 1.0000000
## 4 m.ht 1.0000000
## 5 (c.age*c.age) 0.9999999
Reference: [@hubin2020logic]
\[ \mathsf{h}(\mu_i) = \beta_0 + \sum_{j=1}^{q} \gamma_j \beta_j L_{ji}, \quad L_{ji} \text{ are logic trees (e.g., } (x_{i1}\wedge x_{i2}) \vee x_{i3}^c ). \]
We generate Boolean covariates and a known logic signal, define a custom estimator with a logic prior, and fit via GMJMCMC.
n <- 2000
p <- 50

set.seed(1)
X2 <- as.data.frame(matrix(rbinom(n * p, size = 1, prob = runif(n * p, 0, 1)), n, p))
y2.Mean <- 1 + 7*(X2$V4*X2$V17*X2$V30*X2$V10) + 9*(X2$V7*X2$V20*X2$V12) +
  3.5*(X2$V9*X2$V2) + 1.5*(X2$V37)
Y2 <- rnorm(n, mean = y2.Mean, sd = 1)
df <- data.frame(Y2, X2)

# Train/test split
df.training <- df[1:(n/2), ]
df.test <- df[(n/2 + 1):n, ]
df.test$Mean <- y2.Mean[(n/2 + 1):n]
Custom estimator with logic regression priors
# fastglm() is provided by the 'fastglm' package
estimate.logic.lm <- function(y, x, model, complex, mlpost_params) {
  suppressWarnings({
    mod <- fastglm(as.matrix(x[, model]), y, family = gaussian())
  })
  mloglik <- -(mod$aic + (log(length(y)) - 2) * (mod$rank)) / 2
  wj <- complex$width
  lp <- sum(log(factorial(wj))) - sum(wj * log(4 * mlpost_params$p) - log(4))
  logpost <- mloglik + lp
  if (logpost == -Inf) logpost <- -10000
  list(crit = logpost, coefs = mod$coefficients)
}
Run GMJMCMC
set.seed(5001)

# Only the "not" operator; "or" is implied by De Morgan via "and" + "not"
transforms <- c("not")
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 0, 1)  # no projections

params <- gen.params.gmjmcmc(p)
params$feat$pop.max <- 50
params$feat$L <- 15

result <- fbms(
  formula = Y2 ~ 1 + .,
  data = df.training,
  method = "gmjmcmc",
  transforms = transforms,
  N = 500,
  P = 10,
  family = "custom",
  loglik.pi = estimate.logic.lm,
  probs = probs,
  params = params,
  model_prior = list(p = p)
)
printn(summary(result))
##
## Best population: 9 log marginal posterior: -1912.786
##
## feats.strings marg.probs
## 1 ((V4*V10)*V17) 1.0000000000
## 2 (V9*V2) 1.0000000000
## 3 V37 1.0000000000
## 4 ((V20*V7)*V12) 1.0000000000
## 5 (V33*(V30*(V4*V17))) 0.9999999179
## 6 V30 0.9999969055
## 7 V17 0.0137256351
## 8 V32 0.0025183771
## 9 V35 0.0020386609
## 10 V12 0.0013621295
## 11 V45 0.0012730261
## 12 V18 0.0012697929
## 13 V16 0.0010825818
## 14 V41 0.0010734783
## 15 V44 0.0009967761
## 16 V31 0.0009461623
## 17 V8 0.0008938150
## 18 V13 0.0007713390
## 19 V3 0.0006926010
## 20 V24 0.0006547320
## 21 V21 0.0006439233
## 22 V49 0.0006393027
## 23 V9 0.0006283463
## 24 V36 0.0006209956
## 25 V15 0.0006131099
# Extract models
mpm <- get.mpm.model(result, y = df.training$Y2, x = df.training[,-1],
                     family = "custom", loglik.pi = estimate.logic.lm, params = list(p = 50))
printn(mpm$coefs)
## (Intercept) ((V4*V10)*V17) (V9*V2)
## 0.6578016 2.7049544 3.5767605
## V30 V37 (V33*(V30*(V4*V17)))
## 0.6051133 1.3582049 2.1279042
## ((V20*V7)*V12)
## 9.1213744
mpm2 <- get.mpm.model(result, y = df.training$Y2, x = df.training[,-1])
printn(mpm2$coefs)
## (Intercept) ((V4*V10)*V17) (V9*V2)
## 0.6578016 2.7049544 3.5767605
## V30 V37 (V33*(V30*(V4*V17)))
## 0.6051133 1.3582049 2.1279042
## ((V20*V7)*V12)
## 9.1213744
mbest <- get.best.model(result)
printn(mbest$coefs)
## Intercept ((V4*V10)*V17) (V9*V2)
## 0.6578016 2.7049544 3.5767605
## V30 V37 (V33*(V30*(V4*V17)))
## 0.6051133 1.3582049 2.1279042
## ((V20*V7)*V12)
## 9.1213744
Prediction
# The correct link for Gaussian responses is the identity
pred      <- predict(result, x = df.test[,-1], link = function(x) x)
pred_mpm  <- predict(mpm,    x = df.test[,-1], link = function(x) x)
pred_best <- predict(mbest,  x = df.test[,-1], link = function(x) x)
# RMSEs
printn(sqrt(mean((pred$aggr$mean - df.test$Y2)^2)))
## [1] 1.528321
printn(sqrt(mean((pred_mpm - df.test$Y2)^2)))
## [1] 1.528355
printn(sqrt(mean((pred_best - df.test$Y2)^2)))
## [1] 1.528355
printn(sqrt(mean((df.test$Mean - df.test$Y2)^2)))
## [1] 1.028222
# Errors to the true mean (oracle)
printn(sqrt(mean((pred$aggr$mean - df.test$Mean)^2)))
## [1] 1.076062
printn(sqrt(mean((pred_best - df.test$Mean)^2)))
## [1] 1.076113
printn(sqrt(mean((pred_mpm - df.test$Mean)^2)))
## [1] 1.076113
# Quick diagnostic plot
plot(pred$aggr$mean, df.test$Y2,
xlab = "Predicted (BMA)", ylab = "Observed")
points(pred$aggr$mean, df.test$Mean, col = 2)
points(pred_best, df.test$Mean, col = 3)
points(pred_mpm, df.test$Mean, col = 4)
We now turn to the spam data. We fit a binomial BGNLM and compare BMA, best-model, and MPM predictions. Important: specify the correct link in predict() (here the logistic link).
library(kernlab)
data("spam")

df <- spam[, c(58, 1:57)]
n <- nrow(df)
p <- ncol(df) - 1

colnames(df) <- c("y", paste0("x", 1:p))
df$y <- as.numeric(df$y == "spam")

to3 <- function(x) x^3
transforms <- c("sigmoid", "sin_deg", "exp_dbl", "p0", "troot", "to3")
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 1, 1)

params <- gen.params.gmjmcmc(p)
params$feat$check.col <- FALSE
set.seed(6001)
result <- fbms(
  formula = y ~ 1 + .,
  data = df,
  method = "gmjmcmc",
  family = "binomial",
  beta_prior = list(type = "Jeffreys-BIC"),
  transforms = transforms,
  probs = probs,
  params = params
)
printn(summary(result))
##
## Best population: 2 log marginal posterior: -1048.305
##
## feats.strings marg.probs
## 1 x5 1.000000000
## 2 x8 1.000000000
## 3 x55 1.000000000
## 4 sigmoid(x21) 1.000000000
## 5 x16 1.000000000
## 6 x20 1.000000000
## 7 x21 1.000000000
## 8 x23 1.000000000
## 9 x25 1.000000000
## 10 x27 1.000000000
## 11 sigmoid(x7) 1.000000000
## 12 p0(x17) 1.000000000
## 13 to3(x25) 1.000000000
## 14 x41 1.000000000
## 15 x42 1.000000000
## 16 x44 1.000000000
## 17 x12 1.000000000
## 18 x6 1.000000000
## 19 sigmoid(x52) 1.000000000
## 20 x53 1.000000000
## 21 x52 1.000000000
## 22 x45 1.000000000
## 23 (x50*x25) 1.000000000
## 24 p0(x19) 1.000000000
## 25 p0(x22) 0.999999632
## 26 sigmoid(1+1*x16+1*x55) 0.999999632
## 27 (x24*x3) 0.998106918
## 28 x2 0.998091170
## 29 x28 0.998090532
## 30 troot(x27) 0.998090188
## 31 x39 0.998090175
## 32 x32 0.998090164
## 33 x46 0.994872577
## 34 x19 0.994843934
## 35 troot(x8) 0.807086602
## 36 x24 0.807017530
## 37 x49 0.806845676
## 38 x29 0.036497149
## 39 x40 0.017892855
## 40 x10 0.014132910
## 41 sigmoid(x46) 0.003217954
Prediction accuracy
# Logistic (inverse-logit) link
invlogit <- function(x) 1 / (1 + exp(-x))

# Model averaging
pred <- predict(result, x = df[,-1], link = invlogit)
printn(mean(round(pred$aggr$mean) == df$y))
## [1] 0.9397957
# Best model
bm <- get.best.model(result)
preds <- predict(bm, df[,-1], link = invlogit)
printn(mean(round(preds) == df$y))
## [1] 0.9402304
# Median probability model
mpm <- get.mpm.model(result, family = "binomial", y = df$y, x = df[,-1])
preds_mpm <- predict(mpm, df[,-1], link = invlogit)
printn(mean(round(preds_mpm) == df$y))
## [1] 0.9402304