runMCMCbtadjust Presentation

Frédéric Gosselin

2023-06-16

Introduction

This file presents the function runMCMC_btadjust in the runMCMCbtadjust package. The aim of this function is to run a Markov Chain Monte Carlo (MCMC) for a specified Bayesian model while automatically adapting the burn-in and thinning parameters to meet pre-specified targets in terms of MCMC convergence and number of effective values of the MCMC output - where the term “number of effective values” refers to the sample size adjusted for autocorrelation. This is done in a single call to the function, which repeatedly runs the MCMC until the criteria for convergence and number of effective values are met. The result is an MCMC output that is past the transient phase of the MCMC (convergence) and that contains a pre-specified number of nearly independent draws from the posterior distribution (number of effective values).

This function has four main advantages: (i) it saves the analyst’s programming time, since he/she does not have to repeatedly diagnose and re-run MCMCs until the desired levels of convergence and number of effective values are reached; (ii) it provides a minimal, standardized quality control of MCMC outputs by ensuring that pre-specified levels of convergence and number of quasi-independent values are met; (iii) it may save computing time compared to cases where the MCMC has to be restarted from the beginning because it has not converged or has not reached the specified number of effective values (as e.g. with the runMCMC function in NIMBLE); and (iv) it can be used with different MCMC R languages.

Indeed, runMCMC_btadjust relies on other Bayesian packages to fit the MCMC. At present, only the JAGS, NIMBLE and greta languages can be used, as these are the main Bayesian languages in R known to the package author that allow continuing an already fitted MCMC - which is required for numerical efficiency. We will show here how to fit and compare a very simple model under these three languages, using the possibilities offered by runMCMC_btadjust. Our model is one of the simplest statistical models we could think of: inspired by @Kery_2010, we model the weights of 1,000 peregrine falcons (Falco peregrinus), simulated from a Gaussian distribution with mean 600 grams and standard deviation 30 grams:

set.seed(1)
y1000<-rnorm(n=1000,mean=600,sd=30)

This document is meant to provide simple examples with the three languages - JAGS, NIMBLE and greta. Yet only the languages that are available on the system will be developed; and if NIMBLE is not available, no example will be developed at all, given that NIMBLE is the reference example herein.

NIMBLE

We start with fitting the example with NIMBLE (cf. https://r-nimble.org/).


library(runMCMCbtadjust)
library(nimble)

As NIMBLE distinguishes data that have random distributions from other data (treated as constants), we specify two distinct lists to contain them:


ModelData <-list(mass = y1000)
ModelConsts <- list(nobs = length(y1000))

We then write our Bayesian code within R with the nimbleCode function in the nimble package:

 ModelCode<-nimbleCode(
  {
    # Priors
    population.mean ~ dunif(0,5000)
    population.sd ~ dunif(0,100)
    
    # Normal distribution parameterized by precision = 1/variance in Nimble
    population.variance <- population.sd * population.sd
    precision <- 1 / population.variance
  
    # Likelihood
    for(i in 1:nobs){
      mass[i] ~ dnorm(population.mean, precision)
    }
  })

Our - optional - next step is to specify starting values for the model’s parameters. This is done by first writing a function that is called repeatedly, once per chain. We also - optionally - indicate the names of the parameters to be saved and diagnosed in a vector called params:

ModelInits <- function()
{list (population.mean = rnorm(1,600,90), population.sd = runif(1, 1, 30))}
  
Nchains <- 3

set.seed(1)
Inits<-lapply(1:Nchains,function(x){ModelInits()})

#specifying the names of parameters to analyse and save:
params <- c("population.mean", "population.sd") 

We are now ready to launch runMCMC_btadjust: since we use NIMBLE, we must specify the arguments code, data and constants (see below) as well as MCMC_language="Nimble". We first run it on one chain (argument Nchains=1), using in the control list the argument neff.method="Coda" to calculate the number of effective values with the Coda method and convtype="Geweke" to diagnose convergence with the Geweke method, with a pre-specified maximum - over analyzed parameters - convergence diagnostic of 1.05 (conv.max=1.05) and a minimum - over analyzed parameters - number of effective values of 1,000 (neff.min=1000). Other arguments that are the same for all MCMC languages include params (names of the parameters to diagnose and save), inits (initial values), niter.min and niter.max (minimum and maximum numbers of iterations), and nburnin.min, nburnin.max, thin.min, thin.max (minimum and maximum values for respectively the burn-in and thinning parameters):

out.mcmc.Coda.Geweke<-runMCMC_btadjust(code=ModelCode, constants = ModelConsts, data = ModelData, MCMC_language="Nimble",
    Nchains=1, params=params, inits=Inits[1],
    niter.min=1000, niter.max=300000,
    nburnin.min=100, nburnin.max=200000, 
    thin.min=1, thin.max=1000,
    conv.max=1.05, neff.min=1000,
    control=list(neff.method="Coda", convtype="Geweke"),
    control.MCMC=list(showCompilerOutput=FALSE))
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#>   - population.mean
#>   - population.sd
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "Case of niter update: Non convergence"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  4.837"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  1.32"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  1.363"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|

We then run the MCMC with Nchains MCMC chains, the -default- Gelman-Rubin diagnostic of convergence and the -default- rstan method for calculating the number of effective values:

out.mcmc<-runMCMC_btadjust(code=ModelCode, constants = ModelConsts, data = ModelData, MCMC_language="Nimble",
    Nchains=Nchains, params=params, inits=Inits,
    niter.min=1000, niter.max=300000,
    nburnin.min=100, nburnin.max=200000, 
    thin.min=1, thin.max=1000,
    conv.max=1.05, neff.min=1000,
    control.MCMC=list(showCompilerOutput=FALSE))
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#>   - population.mean
#>   - population.sd
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#>   - population.mean
#>   - population.sd
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#>   - population.mean
#>   - population.sd
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "Case of niter update: Non convergence"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  6.643"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  1.46"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  1.18"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  1.252"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|

We compare the characteristics of the two MCMCs in terms of burn-in, thinning parameter and total number of iterations, as well as in terms of time (both total time and CPU time).
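As a minimal sketch of how such a comparison could be assembled: runMCMC_btadjust attaches its fitting summary (burn-in, thin, number of iterations, durations) as attributes of the returned object. The attribute name final.params used below is an assumption - check names(attributes(out.mcmc)) for the names actually available in your package version.

#a sketch, not from the original vignette: the attribute name "final.params" is assumed
names(attributes(out.mcmc))

#assuming "final.params" holds converged, burnin, thin, niter.tot and the duration components:
comparison <- cbind(Nimble.Coda.Geweke = unlist(attributes(out.mcmc.Coda.Geweke)$final.params),
                    Nimble.default = unlist(attributes(out.mcmc)$final.params))
round(comparison, 4)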

Table: Comparison of the efficiency of first two NIMBLE models:

| | Nimble.Coda.Geweke | Nimble.default |
|---|---|---|
| converged | 1.0000 | 1.0000 |
| burnin | 5010.0000 | 586.0000 |
| thin | 10.0000 | 17.0000 |
| niter.tot | 16465.0000 | 7106.0000 |
| duration | 37.7800 | 81.1279 |
| duration.MCMC.preparation | 29.2399 | 72.2723 |
| duration.MCMC.transient | 0.7892 | 0.1729 |
| duration.MCMC.asymptotic | 1.8019 | 1.9162 |
| duration.btadjust | 5.9491 | 6.7665 |
| CPUduration | 10.6100 | 21.7400 |
| CPUduration.MCMC.preparation | 7.0000 | 18.3200 |
| CPUduration.MCMC.transient | 0.7797 | 0.1730 |
| CPUduration.MCMC.asymptotic | 1.7803 | 1.9170 |
| CPUduration.btadjust | 1.0500 | 1.3300 |

We observe that the Coda.Geweke setting takes much less time than the default setting (rows duration and CPUduration in the previous table), mainly because of the time needed to prepare the MCMC (rows duration.MCMC.preparation and CPUduration.MCMC.preparation): NIMBLE takes quite a lot of time to prepare each MCMC chain, and there are 3 chains to prepare in the default setting compared to only 1 with Geweke.

We also notice that the Coda.Geweke setting uses more time (duration.MCMC.transient and CPUduration.MCMC.transient) and more iterations (burnin) to converge, while the default setting spends more time in the asymptotic phase (duration.MCMC.asymptotic and CPUduration.MCMC.asymptotic), linked to a greater thinning parameter (thin). The transient part should be linked to the different behaviors of the Geweke and Gelman-Rubin convergence diagnostics, while the differences in thinning parameters might be linked to the different methods of calculating the number of effective values. We therefore run a third MCMC on one chain with the Geweke diagnostic but the default, rstan method for the number of effective values.


out.mcmc.Geweke<-runMCMC_btadjust(code=ModelCode, constants = ModelConsts, data = ModelData, MCMC_language="Nimble",
    Nchains=1, params=params, inits=Inits[1],
    niter.min=1000, niter.max=300000,
    nburnin.min=100, nburnin.max=200000, 
    thin.min=1, thin.max=1000,
    conv.max=1.05, neff.min=1000,
    control=list(convtype="Geweke"),
    control.MCMC=list(showCompilerOutput=FALSE))
#> ===== Monitors =====
#> thin = 1: population.mean, population.sd
#> ===== Samplers =====
#> RW sampler (2)
#>   - population.mean
#>   - population.sd
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "Case of niter update: Non convergence"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  4.852"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|
#> [1] "raw multiplier of thin:  1.498"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> |-------------|-------------|-------------|-------------|
#> |-------------------------------------------------------|

We now compare the characteristics of the three NIMBLE MCMCs.

Table: Comparison of the efficiency of the three NIMBLE models:

| | Nimble.Coda.Geweke | Nimble.Geweke | Nimble.default |
|---|---|---|---|
| converged | 1.0000 | 1.0000 | 1.0000 |
| burnin | 5010.0000 | 1050.0000 | 586.0000 |
| thin | 10.0000 | 8.0000 | 17.0000 |
| niter.tot | 16465.0000 | 10041.0000 | 7106.0000 |
| duration | 37.7800 | 26.5011 | 81.1279 |
| duration.MCMC.preparation | 29.2399 | 20.8804 | 72.2723 |
| duration.MCMC.transient | 0.7892 | 0.1005 | 0.1729 |
| duration.MCMC.asymptotic | 1.8019 | 0.8601 | 1.9162 |
| duration.btadjust | 5.9491 | 4.6601 | 6.7665 |
| CPUduration | 10.6100 | 6.0700 | 21.7400 |
| CPUduration.MCMC.preparation | 7.0000 | 4.6700 | 18.3200 |
| CPUduration.MCMC.transient | 0.7797 | 0.1004 | 0.1730 |
| CPUduration.MCMC.asymptotic | 1.7803 | 0.8596 | 1.9170 |
| CPUduration.btadjust | 1.0500 | 0.4400 | 1.3300 |

Results do not corroborate the above expectations: first, the Geweke model does not converge within the same number of iterations as the Coda.Geweke one (row burnin), which is surprising since both use the same method for diagnosing convergence and the same seed. Second, the thinning parameter was not increased when changing from Coda.Geweke to Geweke as expected above, but actually decreased (thin).

We now turn to the comparison of the statistical parameter outputs. We use two-sample Kolmogorov-Smirnov tests to compare each parameter between pairs of MCMC methods:
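As a minimal sketch - assuming the objects returned by runMCMC_btadjust are coda mcmc.list objects, so that as.matrix() stacks the chains into a single matrix of draws - such pairwise tests can be run with the base ks.test function (the helper name ks.pvalue is only illustrative):

library(coda)

#two-sample Kolmogorov-Smirnov test on one parameter, pooling all chains of each output
ks.pvalue <- function(mcmc1, mcmc2, param) {
  ks.test(as.matrix(mcmc1)[, param], as.matrix(mcmc2)[, param])$p.value
}

#e.g. default vs. Geweke for both saved parameters:
sapply(params, function(p) ks.pvalue(out.mcmc, out.mcmc.Geweke, p))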

Table: P-values of paired Kolmogorov-Smirnov tests of output parameters between the first three NIMBLE models:

| | mean | sd |
|---|---|---|
| Default vs. Geweke | 0.86129 | 0.12546 |
| Coda.Geweke vs. Geweke | 0.57106 | 0.89642 |
| Default vs. Coda.Geweke | 0.47597 | 0.07599 |

The p-values associated with the KS tests are not very small - only one out of six is near 0.05 - indicating that the MCMC outputs can be considered as draws from the same distributions.

These are summarized in the next tables.
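For each model, these tables report the posterior mean, standard deviation, naive SE and time-series SE of the saved parameters - the statistics produced by coda's summary method. A minimal sketch of how they can be obtained, again assuming the returned objects are coda mcmc.list objects:

library(coda)

#posterior summaries of the three NIMBLE runs (Mean, SD, Naive SE, Time-series SE)
summary(out.mcmc.Coda.Geweke)$statistics
summary(out.mcmc.Geweke)$statistics
summary(out.mcmc)$statistics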

Table: Summary of the statistical parameters of the Nimble Coda.Geweke model:

| | Mean | SD | Naive SE | Time-series SE |
|---|---|---|---|---|
| population.mean | 599.636 | 0.989 | 0.029 | 0.029 |
| population.sd | 31.067 | 0.682 | 0.020 | 0.016 |

Table: Summary of the statistical parameters of the Nimble Geweke model:

| | Mean | SD | Naive SE | Time-series SE |
|---|---|---|---|---|
| population.mean | 599.655 | 0.951 | 0.028 | 0.029 |
| population.sd | 31.076 | 0.664 | 0.020 | 0.020 |

Table: Summary of the statistical parameters of the Nimble default model:

| | Mean | SD | Naive SE | Time-series SE |
|---|---|---|---|---|
| population.mean | 599.682 | 0.967 | 0.029 | 0.029 |
| population.sd | 31.127 | 0.713 | 0.021 | 0.020 |

We notice that parameter values are very close, that naive standard errors (SEs) are very close to time-series SEs - which is linked to the automatic tuning of the thinning parameter, producing output samples that are nearly independent - and that differences between mean estimators are within a few time-series SEs - which we interpret as mostly due to the control of convergence.

JAGS

We now turn to analyzing the same data with the same statistical model using JAGS through runMCMC_btadjust. We rely on the data simulated above. With JAGS, all the data go into a single list:


ModelData <-list(mass = y1000, nobs = length(y1000))

We then propose to use JAGS with the model specified from within R - which we find more convenient. We therefore write the model within R as a character string:


modeltotransfer<-"model {

    # Priors
    population.mean ~ dunif(0,5000)
    population.sd ~ dunif(0,100)

    # Normal distribution parameterized by precision = 1/variance in Jags
    population.variance <- population.sd * population.sd
    precision <- 1 / population.variance

    # Likelihood
    for(i in 1:nobs){
      mass[i] ~ dnorm(population.mean, precision)
    }
  }"

The other objects useful or required for running runMCMC_btadjust with JAGS are similar to what is required with NIMBLE (Inits, Nchains, params) and are not repeated here.

We then launch runMCMC_btadjust with MCMC_language="Jags", specifying arguments code and data which are required in this case:


set.seed(1)
out.mcmc.Jags<-runMCMC_btadjust(code=modeltotransfer,  data = ModelData, MCMC_language="Jags", 
    Nchains=Nchains, params=params, inits=Inits,
    niter.min=1000,niter.max=300000,
    nburnin.min=100,nburnin.max=200000,
    thin.min=1,thin.max=1000,
		conv.max=1.05,neff.min=1000)
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 1000
#>    Unobserved stochastic nodes: 2
#>    Total graph size: 1009
#> 
#> Initializing model
#> 
#> [1] "###################################################################################"

Note that if we had written the JAGS code in a text file named "ModelJags.txt", we would just have replaced in the command above code=modeltotransfer by code="ModelJags.txt".
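A minimal sketch of this file-based alternative (the file name "ModelJags.txt" and the object name out.mcmc.Jags.file are only illustrative):

#write the model string to a text file and pass the file name instead of the string
writeLines(modeltotransfer, "ModelJags.txt")

out.mcmc.Jags.file<-runMCMC_btadjust(code="ModelJags.txt", data = ModelData, MCMC_language="Jags",
    Nchains=Nchains, params=params, inits=Inits,
    niter.min=1000, niter.max=300000,
    nburnin.min=100, nburnin.max=200000,
    thin.min=1, thin.max=1000,
    conv.max=1.05, neff.min=1000)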

Table: Summary of the statistical parameters of the Jags model:

| | Mean | SD | Naive SE | Time-series SE |
|---|---|---|---|---|
| population.mean | 599.640 | 1.009 | 0.019 | 0.023 |
| population.sd | 31.081 | 0.681 | 0.013 | 0.016 |

Results seem in line with those of NIMBLE. We check this using paired Kolmogorov-Smirnov tests against the NIMBLE models:

Table: P-values of paired Kolmogorov-Smirnov tests of output parameters of the Jags model with the three NIMBLE models:

| | mean | sd |
|---|---|---|
| Nimble.Geweke vs. Jags | 0.6526 | 0.3490 |
| Nimble.Coda.Geweke vs. Jags | 0.4099 | 0.4237 |
| Nimble.Default vs. Jags | 0.4286 | 0.2620 |

These results confirm that the JAGS output cannot be considered as stemming from a different probability distribution than the NIMBLE outputs.

We finally compare the efficiency of the JAGS and default NIMBLE MCMCs:

Table: Comparison of the efficiency of the default NIMBLE model and the Jags model:

| | Nimble.default | Jags |
|---|---|---|
| converged | 1.0000 | 1.0000 |
| burnin | 586.0000 | 101.0000 |
| thin | 17.0000 | 1.0000 |
| niter.tot | 7106.0000 | 1000.0000 |
| duration | 81.1279 | 5.9794 |
| duration.MCMC.preparation | 72.2723 | 2.9442 |
| duration.MCMC.transient | 0.1729 | 1.2223 |
| duration.MCMC.asymptotic | 1.9162 | 0.9980 |
| duration.btadjust | 6.7665 | 0.8148 |
| CPUduration | 21.7400 | 4.5100 |
| CPUduration.MCMC.preparation | 18.3200 | 2.2500 |
| CPUduration.MCMC.transient | 0.1730 | 1.2166 |
| CPUduration.MCMC.asymptotic | 1.9170 | 0.9934 |
| CPUduration.btadjust | 1.3300 | 0.0500 |

The conclusion is that JAGS is much faster than NIMBLE on this example, due to much less time devoted to MCMC preparation - as well as to burn-in/thinning adjustment. Actually, there is no adjustment at all with JAGS (niter.tot equals the initial number of iterations). Yet NIMBLE is quicker at updating the MCMC per iteration: it took NIMBLE less time than JAGS for the transient phase (respectively, less than twice the time for the asymptotic phase) although it used more than seven (resp. seventeen) times more iterations than JAGS.

At first sight, we would also conclude that MCMC efficiency per effective value is better with NIMBLE, since both languages had the same target for the minimum number of effective values - 1,000 - and the total MCMC time was lower with NIMBLE. Yet a more rigorous comparison yields the reverse result:

Table: Comparison of the number of effective values between the default NIMBLE model and the JAGS model:

| | Nimble.default | Jags |
|---|---|---|
| Min. Number Eff. values | 1177.000000 | 1699.000000 |
| MCMC CPU time per Effective Value | 0.001776 | 0.001301 |

Indeed, with just its initial iterations JAGS produced a high number of effective values - actually much greater than the targeted neff.min - which makes the MCMC time per effective value lower with JAGS than with NIMBLE for this model (cf. table above).
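As a minimal sketch of how this comparison could be computed: the number of effective values can be obtained with coda::effectiveSize, and the MCMC CPU time is taken here as the sum of the transient and asymptotic CPU durations - stored, by assumption as above, in an attribute named final.params.

library(coda)

#minimum number of effective values over the saved parameters
min.neff.Nimble <- min(effectiveSize(out.mcmc))
min.neff.Jags <- min(effectiveSize(out.mcmc.Jags))

#MCMC CPU time per effective value; the attribute name "final.params" is an assumption
cpu.mcmc <- function(x) {
  pars <- attributes(x)$final.params
  unlist(pars["CPUduration.MCMC.transient"]) + unlist(pars["CPUduration.MCMC.asymptotic"])
}
c(Nimble.default = cpu.mcmc(out.mcmc)/min.neff.Nimble,
  Jags = cpu.mcmc(out.mcmc.Jags)/min.neff.Jags)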

Greta

We finally run the greta version of our model with runMCMC_btadjust. greta is rather different from JAGS and NIMBLE in that the model defines objects directly in R, so that neither a model code nor data or constants need to be passed to runMCMC_btadjust. We rely on the same data simulated above. The coding with greta is as follows:

#in my setting I need to load not only greta but R6 & tensorflow packages
library(greta)
library (R6)
library(tensorflow)

#first requirement of greta: declaring the data that will be analyzed with the function as_data
Y<-as_data(y1000)

#we then proceed by writing the model directly in R, starting with the priors of the parameters using greta functions for probability distributions - here uniform()
population.mean<-uniform(0,5000)
population.sd<-uniform(0,100)
    
#we then define the distribution of the data - here with the normal distribution - by default parametrized with a standard deviation in greta:
try({distribution(Y)<-normal(population.mean,population.sd) })

#we finally declare the greta model, which will be the object passed to runMCMC_btadjust 
m<-model(population.mean, population.sd)

### we finally have to prepare initial values with a specific greta function - initials:
ModelInits <- function()
    {initials(population.mean = rnorm(1,600,90), population.sd = runif(1, 1, 30))}

set.seed(1)
  Inits<-lapply(1:Nchains,function(x){ModelInits()})

We are now ready to fit the model with runMCMC_btadjust, specifying MCMC_language="Greta" and giving the argument model instead of code and data:

out.mcmc.greta<-runMCMC_btadjust(model=m, MCMC_language="Greta",
    Nchains=Nchains,params=params,inits=Inits,
		niter.min=1000,niter.max=300000,
    nburnin.min=100,nburnin.max=200000,
		thin.min=1,thin.max=1000,
		conv.max=1.05, neff.min=1000)
#> [1] "###################################################################################"
#> 
#> [1] "raw multiplier of thin:  4.878"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> 
#> [1] "raw multiplier of thin:  1.643"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"
#> 
#> [1] "raw multiplier of thin:  1.486"
#> [1] "###################################################################################"
#> [1] "Case of niter update: Convergence and trying to reach end of MCMC at the end of next cycle"
#> [1] "###################################################################################"
#> [1] "###################################################################################"

Table: Summary of the statistical parameters of the greta model:

| | Mean | SD | Naive SE | Time-series SE |
|---|---|---|---|---|
| population.mean | 599.601 | 0.973 | 0.029 | 0.029 |
| population.sd | 31.075 | 0.670 | 0.020 | 0.020 |

We first check that the estimates are similar to those obtained with NIMBLE and JAGS, using paired Kolmogorov-Smirnov tests:

Table: P-values of paired Kolmogorov-Smirnov tests of output parameters of the greta model with the default NIMBLE model and the JAGS model:

| | mean | sd |
|---|---|---|
| Nimble vs. greta | 0.15416 | 0.04417 |
| Jags vs. greta | 0.49189 | 0.56537 |

We then report the efficiency of the MCMCs.

Table: Comparison of the efficiency of the default NIMBLE, the JAGS and the greta models:

| | Nimble.default | Jags | Greta |
|---|---|---|---|
| converged | 1.0000 | 1.0000 | 1.0000 |
| burnin | 586.0000 | 101.0000 | 2.0000 |
| thin | 17.0000 | 1.0000 | 14.0000 |
| niter.tot | 7106.0000 | 1000.0000 | 5373.0000 |
| duration | 81.1279 | 5.9794 | 67.3941 |
| duration.MCMC.preparation | 72.2723 | 2.9442 | 0.3830 |
| duration.MCMC.transient | 0.1729 | 1.2223 | 9.8277 |
| duration.MCMC.asymptotic | 1.9162 | 0.9980 | 52.6792 |
| duration.btadjust | 6.7665 | 0.8148 | 4.5041 |
| CPUduration | 21.7400 | 4.5100 | 216.8800 |
| CPUduration.MCMC.preparation | 18.3200 | 2.2500 | 0.0000 |
| CPUduration.MCMC.transient | 0.1730 | 1.2166 | 34.0693 |
| CPUduration.MCMC.asymptotic | 1.9170 | 0.9934 | 182.6207 |
| CPUduration.btadjust | 1.3300 | 0.0500 | 0.1900 |

MCMC time (rows duration.MCMC.transient and duration.MCMC.asymptotic) was far greater with greta than with JAGS and NIMBLE, for a minimum number of effective values with greta of 1085. Total duration with greta is rather close to that of NIMBLE, due to the large amount of time NIMBLE requires for MCMC preparation - while this preparation is done outside runMCMC_btadjust with greta. Yet, when we compare total CPU durations (CPUduration), greta fares worse than NIMBLE, simply because greta parallelizes its computations and therefore uses more CPU time per unit of elapsed time.

We gave greta a second chance, based on the following post: https://forum.greta-stats.org/t/size-and-number-of-leapfrog-steps-in-hmc/332. The idea was to give greta more information to adapt its HMC parameters during the warm-up phase simply by running more chains - hereafter, 15.

Nchains<-15
ModelInits <- function()
    {initials(population.mean = rnorm(1,600,90), population.sd = runif(1, 1, 30))}

set.seed(1)
Inits<-lapply(1:Nchains,function(x){ModelInits()})
  
  out.mcmc.greta.morechains<-runMCMC_btadjust(model=m, MCMC_language="Greta",
    Nchains=Nchains,params=params,inits=Inits,
		niter.min=1000,niter.max=300000,
    nburnin.min=100,nburnin.max=200000,
		thin.min=1,thin.max=1000,
		conv.max=1.05, neff.min=1000)
#> [1] "###################################################################################"

Table: Summary of the statistical parameters of the greta model with 15 chains:

| | Mean | SD | Naive SE | Time-series SE |
|---|---|---|---|---|
| population.mean | 599.6811 | 0.9780 | 0.0080 | 0.0166 |
| population.sd | 31.1106 | 0.7094 | 0.0058 | 0.0111 |

This run was indeed much faster. Parameter estimates were still not significantly different from those with NIMBLE and JAGS based on paired Kolmogorov-Smirnov tests:

Table: P-values of paired Kolmogorov-Smirnov tests of output parameters of the greta model with 15 chains with the default NIMBLE model and the JAGS model:

| | mean | sd |
|---|---|---|
| Nimble vs. greta.morechains | 0.75058 | 0.22602 |
| Jags vs. greta.morechains | 0.11353 | 0.07976 |

We now report the efficiency of the MCMCs:

Table: Comparison of the efficiency of the default NIMBLE, the JAGS and the greta.morechains models:

| | Nimble.default | Jags | Greta.morechains |
|---|---|---|---|
| converged | 1.0000 | 1.0000 | 1.0000 |
| burnin | 586.0000 | 101.0000 | 1.0000 |
| thin | 17.0000 | 1.0000 | 1.0000 |
| niter.tot | 7106.0000 | 1000.0000 | 1000.0000 |
| duration | 81.1279 | 5.9794 | 29.0809 |
| duration.MCMC.preparation | 72.2723 | 2.9442 | 0.3913 |
| duration.MCMC.transient | 0.1729 | 1.2223 | 13.8746 |
| duration.MCMC.asymptotic | 1.9162 | 0.9980 | 13.8469 |
| duration.btadjust | 6.7665 | 0.8148 | 0.9681 |
| CPUduration | 21.7400 | 4.5100 | 111.0100 |
| CPUduration.MCMC.preparation | 18.3200 | 2.2500 | 0.0200 |
| CPUduration.MCMC.transient | 0.1730 | 1.2166 | 55.4504 |
| CPUduration.MCMC.asymptotic | 1.9170 | 0.9934 | 55.3396 |
| CPUduration.btadjust | 1.3300 | 0.0500 | 0.2000 |

We still observed a greater CPU duration with greta, although the difference had decreased and the associated number of effective values for greta was now 3318, which brought the MCMC CPU efficiency of greta closer to that of NIMBLE. Yet, overall, JAGS performed better than greta and NIMBLE on this example.

Conclusion

We hope we have convinced the R user of Bayesian models that runMCMC_btadjust can help make the use of these models more efficient and more quality-oriented, while saving analyst time and potentially computing time. Indeed, to recap, the aim of this function is to run a Markov Chain Monte Carlo (MCMC) for a specified Bayesian model while automatically adapting the burn-in and thinning parameters to meet pre-specified targets in terms of MCMC convergence and number of effective values of the MCMC output. This is done in a single call to the function, which repeatedly runs the MCMC until the criteria for convergence and number of effective values are met. The function has four main advantages:

(i) it saves the analyst’s programming time, since he/she does not have to repeatedly diagnose and re-run MCMCs until the desired levels of convergence and number of effective values are reached;

(ii) it provides a minimal, standardized quality control of MCMC outputs by ensuring that pre-specified levels of convergence and number of quasi-independent values are met;

(iii) it may save computing time compared to cases where the MCMC has to be restarted from the beginning because it has not converged or has not reached the specified number of effective values;

(iv) it can be used with different MCMC R languages - at present greta, NIMBLE and JAGS. This has two positive consequences in practice: first, it allows the user a more rigorous comparison of the three Bayesian fitting languages in terms of comparability of inference and of MCMC efficiency - especially CPU time per effective value; second, it makes it easier to develop the same Bayesian model with these different languages, which in our experience is welcome in practical cases, since each language has advantages over the others that vary from one context to another.

References