bridge-sampling-vignette

Introduction

This vignette is an elaboration of the Hierarchical Normal Example Stan vignette in the bridgesampling R package (Gronau, Singmann, and Wagenmakers 2020). It shows that the LoRaD method (Wang et al. 2023) produces an estimate of the marginal likelihood nearly identical to that produced by Bridge Sampling (Meng and Wong 1996) using the bridgesampling package. The steps in this vignette are:

  1. Simulate data
  2. Use rstan (Stan Development Team 2023) to obtain a posterior sample via MCMC
  3. Use bridgesampling to estimate the marginal likelihood
  4. Use LoRaD to estimate the marginal likelihood

The first three steps are identical to what is shown in the bridgesampling vignette.

Load packages

We will use three packages for this vignette: rstan, bridgesampling, and lorad.

library(rstan)
#> Loading required package: StanHeaders
#> 
#> rstan version 2.32.3 (Stan version 2.26.1)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)
library(bridgesampling)
library(lorad)

Simulating data

The data for this example comprise \(n=20\) values \(\{y_i : i=1,2,\cdots, n\}\), where \[y_i \sim N(\theta_i, \sigma^2) \\ \theta_i \sim N(\mu, \tau^2)\\ \mu = 0\\ \tau^2 = \tfrac{1}{2}\]

set.seed(12345)

mu <- 0
tau2 <- 0.5
sigma2 <- 1

n <- 20
theta <- rnorm(n, mu, sqrt(tau2))
y <- rnorm(n, theta, sqrt(sigma2))

For Bayesian analyses, the priors used for the group-level mean and variance are: \[\mu \sim N(\mu_0, \tau_0^2)\\ \tau^2 \sim \mbox{InvGamma}(\alpha, \beta)\\ \mu_0 = 0\\ \tau_0^2 = 1\\ \alpha = \beta = 1\]

mu0 <- 0
tau20 <- 1
alpha <- 1
beta <- 1

Specify the models for STAN

We will compare the following two models using marginal likelihood:

  1. \({\cal H}_0: \mu = 0\)
  2. \({\cal H}_1:\mu\) estimated

These models differ only in whether the group-level mean is fixed to 0 or is estimated. As mentioned in the original bridgesampling vignette, it is important when using STAN to use the target += ... method for specifying log probability densities so that normalizing constants are included.

stancodeH0 <- 'data {
  int<lower=1> n; // number of observations
  vector[n] y; // observations
  real<lower=0> alpha;
  real<lower=0> beta;
  real<lower=0> sigma2;
}
parameters {
  real<lower=0> tau2; // group-level variance
  vector[n] theta; // participant effects
}
model {
  target += inv_gamma_lpdf(tau2 | alpha, beta);
  target += normal_lpdf(theta | 0, sqrt(tau2));
  target += normal_lpdf(y | theta, sqrt(sigma2));
}
'
stancodeH1 <- 'data {
  int<lower=1> n; // number of observations
  vector[n] y; // observations
  real mu0;
  real<lower=0> tau20;
  real<lower=0> alpha;
  real<lower=0> beta;
  real<lower=0> sigma2;
}
parameters {
  real mu;
  real<lower=0> tau2; // group-level variance
  vector[n] theta; // participant effects
}
model {
  target += normal_lpdf(mu | mu0, sqrt(tau20));
  target += inv_gamma_lpdf(tau2 | alpha, beta);
  target += normal_lpdf(theta | mu, sqrt(tau2));
  target += normal_lpdf(y | theta, sqrt(sigma2));
}
'
# compile models
stanmodelH0 <- stan_model(model_code = stancodeH0, model_name="stanmodel")
stanmodelH1 <- stan_model(model_code = stancodeH1, model_name="stanmodel")

Fitting the models using rstan

Now use STAN to sample from the posterior distribution of both models.

# fit models
stanfitH0 <- sampling(stanmodelH0, data = list(y = y, n = n,
                                               alpha = alpha,
                                               beta = beta,
                                               sigma2 = sigma2),
                      iter = 50000, warmup = 1000, chains = 3, cores = 1, seed = 12345)
#> 
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 2.9e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.29 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:     1 / 50000 [  0%]  (Warmup)
#> Chain 1: Iteration:  1001 / 50000 [  2%]  (Sampling)
#> Chain 1: Iteration:  6000 / 50000 [ 12%]  (Sampling)
#> Chain 1: Iteration: 11000 / 50000 [ 22%]  (Sampling)
#> Chain 1: Iteration: 16000 / 50000 [ 32%]  (Sampling)
#> Chain 1: Iteration: 21000 / 50000 [ 42%]  (Sampling)
#> Chain 1: Iteration: 26000 / 50000 [ 52%]  (Sampling)
#> Chain 1: Iteration: 31000 / 50000 [ 62%]  (Sampling)
#> Chain 1: Iteration: 36000 / 50000 [ 72%]  (Sampling)
#> Chain 1: Iteration: 41000 / 50000 [ 82%]  (Sampling)
#> Chain 1: Iteration: 46000 / 50000 [ 92%]  (Sampling)
#> Chain 1: Iteration: 50000 / 50000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.041 seconds (Warm-up)
#> Chain 1:                1.892 seconds (Sampling)
#> Chain 1:                1.933 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 7e-06 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:     1 / 50000 [  0%]  (Warmup)
#> Chain 2: Iteration:  1001 / 50000 [  2%]  (Sampling)
#> Chain 2: Iteration:  6000 / 50000 [ 12%]  (Sampling)
#> Chain 2: Iteration: 11000 / 50000 [ 22%]  (Sampling)
#> Chain 2: Iteration: 16000 / 50000 [ 32%]  (Sampling)
#> Chain 2: Iteration: 21000 / 50000 [ 42%]  (Sampling)
#> Chain 2: Iteration: 26000 / 50000 [ 52%]  (Sampling)
#> Chain 2: Iteration: 31000 / 50000 [ 62%]  (Sampling)
#> Chain 2: Iteration: 36000 / 50000 [ 72%]  (Sampling)
#> Chain 2: Iteration: 41000 / 50000 [ 82%]  (Sampling)
#> Chain 2: Iteration: 46000 / 50000 [ 92%]  (Sampling)
#> Chain 2: Iteration: 50000 / 50000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 0.043 seconds (Warm-up)
#> Chain 2:                1.742 seconds (Sampling)
#> Chain 2:                1.785 seconds (Total)
#> Chain 2: 
#> 
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 3).
#> Chain 3: 
#> Chain 3: Gradient evaluation took 8e-06 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3: 
#> Chain 3: 
#> Chain 3: Iteration:     1 / 50000 [  0%]  (Warmup)
#> Chain 3: Iteration:  1001 / 50000 [  2%]  (Sampling)
#> Chain 3: Iteration:  6000 / 50000 [ 12%]  (Sampling)
#> Chain 3: Iteration: 11000 / 50000 [ 22%]  (Sampling)
#> Chain 3: Iteration: 16000 / 50000 [ 32%]  (Sampling)
#> Chain 3: Iteration: 21000 / 50000 [ 42%]  (Sampling)
#> Chain 3: Iteration: 26000 / 50000 [ 52%]  (Sampling)
#> Chain 3: Iteration: 31000 / 50000 [ 62%]  (Sampling)
#> Chain 3: Iteration: 36000 / 50000 [ 72%]  (Sampling)
#> Chain 3: Iteration: 41000 / 50000 [ 82%]  (Sampling)
#> Chain 3: Iteration: 46000 / 50000 [ 92%]  (Sampling)
#> Chain 3: Iteration: 50000 / 50000 [100%]  (Sampling)
#> Chain 3: 
#> Chain 3:  Elapsed Time: 0.04 seconds (Warm-up)
#> Chain 3:                1.716 seconds (Sampling)
#> Chain 3:                1.756 seconds (Total)
#> Chain 3:
stanfitH1 <- sampling(stanmodelH1, data = list(y = y, n = n,
                                               mu0 = mu0,
                                               tau20 = tau20,
                                               alpha = alpha,
                                               beta = beta,
                                               sigma2 = sigma2),
                      iter = 50000, warmup = 1000, chains = 3, cores = 1, seed = 12345)
#> 
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 3.1e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.31 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:     1 / 50000 [  0%]  (Warmup)
#> Chain 1: Iteration:  1001 / 50000 [  2%]  (Sampling)
#> Chain 1: Iteration:  6000 / 50000 [ 12%]  (Sampling)
#> Chain 1: Iteration: 11000 / 50000 [ 22%]  (Sampling)
#> Chain 1: Iteration: 16000 / 50000 [ 32%]  (Sampling)
#> Chain 1: Iteration: 21000 / 50000 [ 42%]  (Sampling)
#> Chain 1: Iteration: 26000 / 50000 [ 52%]  (Sampling)
#> Chain 1: Iteration: 31000 / 50000 [ 62%]  (Sampling)
#> Chain 1: Iteration: 36000 / 50000 [ 72%]  (Sampling)
#> Chain 1: Iteration: 41000 / 50000 [ 82%]  (Sampling)
#> Chain 1: Iteration: 46000 / 50000 [ 92%]  (Sampling)
#> Chain 1: Iteration: 50000 / 50000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.048 seconds (Warm-up)
#> Chain 1:                2.546 seconds (Sampling)
#> Chain 1:                2.594 seconds (Total)
#> Chain 1: 
#> 
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 2).
#> Chain 2: 
#> Chain 2: Gradient evaluation took 8e-06 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2: 
#> Chain 2: 
#> Chain 2: Iteration:     1 / 50000 [  0%]  (Warmup)
#> Chain 2: Iteration:  1001 / 50000 [  2%]  (Sampling)
#> Chain 2: Iteration:  6000 / 50000 [ 12%]  (Sampling)
#> Chain 2: Iteration: 11000 / 50000 [ 22%]  (Sampling)
#> Chain 2: Iteration: 16000 / 50000 [ 32%]  (Sampling)
#> Chain 2: Iteration: 21000 / 50000 [ 42%]  (Sampling)
#> Chain 2: Iteration: 26000 / 50000 [ 52%]  (Sampling)
#> Chain 2: Iteration: 31000 / 50000 [ 62%]  (Sampling)
#> Chain 2: Iteration: 36000 / 50000 [ 72%]  (Sampling)
#> Chain 2: Iteration: 41000 / 50000 [ 82%]  (Sampling)
#> Chain 2: Iteration: 46000 / 50000 [ 92%]  (Sampling)
#> Chain 2: Iteration: 50000 / 50000 [100%]  (Sampling)
#> Chain 2: 
#> Chain 2:  Elapsed Time: 0.049 seconds (Warm-up)
#> Chain 2:                2.274 seconds (Sampling)
#> Chain 2:                2.323 seconds (Total)
#> Chain 2: 
#> 
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 3).
#> Chain 3: 
#> Chain 3: Gradient evaluation took 8e-06 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3: 
#> Chain 3: 
#> Chain 3: Iteration:     1 / 50000 [  0%]  (Warmup)
#> Chain 3: Iteration:  1001 / 50000 [  2%]  (Sampling)
#> Chain 3: Iteration:  6000 / 50000 [ 12%]  (Sampling)
#> Chain 3: Iteration: 11000 / 50000 [ 22%]  (Sampling)
#> Chain 3: Iteration: 16000 / 50000 [ 32%]  (Sampling)
#> Chain 3: Iteration: 21000 / 50000 [ 42%]  (Sampling)
#> Chain 3: Iteration: 26000 / 50000 [ 52%]  (Sampling)
#> Chain 3: Iteration: 31000 / 50000 [ 62%]  (Sampling)
#> Chain 3: Iteration: 36000 / 50000 [ 72%]  (Sampling)
#> Chain 3: Iteration: 41000 / 50000 [ 82%]  (Sampling)
#> Chain 3: Iteration: 46000 / 50000 [ 92%]  (Sampling)
#> Chain 3: Iteration: 50000 / 50000 [100%]  (Sampling)
#> Chain 3: 
#> Chain 3:  Elapsed Time: 0.049 seconds (Warm-up)
#> Chain 3:                2.818 seconds (Sampling)
#> Chain 3:                2.867 seconds (Total)
#> Chain 3:

Estimate log marginal likelihoods using bridgesampling

Now load the bridgesampling package and estimate the log marginal likelihood for each model. The null model \(\cal{H}_0\) is slightly preferred, which makes sense because the data were simulated assuming \(\mu=0\).

# compute log marginal likelihood via bridge sampling for H0
H0.bridge <- bridge_sampler(stanfitH0, silent = TRUE)

# compute log marginal likelihood via bridge sampling for H1
H1.bridge <- bridge_sampler(stanfitH1, silent = TRUE)

print(H0.bridge)
#> Bridge sampling estimate of the log marginal likelihood: -37.53668
#> Estimate obtained in 4 iteration(s) via method "normal".
print(H1.bridge)
#> Bridge sampling estimate of the log marginal likelihood: -37.7984
#> Estimate obtained in 4 iteration(s) via method "normal".

The log marginal likelihood estimates are -37.53668 for \({\cal H}_0\) and -37.7984 for \({\cal H}_1\).

Estimate log marginal likelihoods using LoRaD

Export the sampled parameters

In order to use the LoRaD method, we need to export the sampled parameter values along with the unnormalized log posterior to a data frame. The permuted=TRUE setting causes samples from all 3 chains to be merged.

paramsH0 <- data.frame(extract(stanfitH0, permuted=TRUE))

paramsH1 <- data.frame(extract(stanfitH1, permuted=TRUE))

The null hypothesis

After loading the lorad package, specify what each column in the exported parameter file represents. STAN transforms all parameters so that they are unconstrained (e.g. it log-transforms parameters with support restricted to the positive real numbers so that the transformed parameter has support equal to the entire real line). This means that the lorad package need not perform any transformations. Create a named list that specifies unconstrained for all columns and posterior for the column labeled lp. The lorad package assumes that all columns labeled posterior should be summed to form the log posterior kernel (the posterior kernel is simply the unnormalized posterior: the product of prior and likelihood).

The command lorad_estimate estimates the log marginal likelihood. This function is provided

The LoRaD method uses the training sample to determine the mean and covariance matrix used to standardize the parameter vectors and the radius to be used in defining the working parameter space. It then uses only a small coverage fraction of the highest density points in the remaining estimation sample to estimate the marginal likelihood. The values 0.5 and 0.1 are generally good values to use for the training fraction and coverage. Using too large a fraction for coverage risks a poor fit of the multivariate normal reference distribution to the sampled values inside the working parameter space.

colspecH0 <- c("tau2"="unconstrained", "theta.1"="unconstrained", "theta.2"="unconstrained", "theta.3"="unconstrained", "theta.4"="unconstrained", "theta.5"="unconstrained", "theta.6"="unconstrained", "theta.7"="unconstrained", "theta.8"="unconstrained", "theta.9"="unconstrained", "theta.10"="unconstrained", "theta.11"="unconstrained", "theta.12"="unconstrained", "theta.13"="unconstrained", "theta.14"="unconstrained", "theta.15"="unconstrained", "theta.16"="unconstrained", "theta.17"="unconstrained", "theta.18"="unconstrained", "theta.19"="unconstrained", "theta.20"="unconstrained", "lp__"="posterior")
results0 <- lorad_estimate(paramsH0, colspecH0, 0.5, "random", 0.1)
lorad_summary(results0)
#> This is lorad 0.0.1.0 
#>     Parameter sample comprises 147000 sampled points
#>     Each sample point is a vector of 21 parameter values
#> 
#>    Training fraction is 0.5
#>     Coverage specified is 0.1
#>  
#> Partitioning samples into training and estimation:
#>     Sample size is 147000
#>     Training sample size is 73500
#>     Estimation sample size 73500
#>  
#> Processing training sample...
#>     Lowest radial distance is 3.418842393
#>     Log Delta -2.949772427
#>  
#> Processing estimation sample...
#>     Number of samples used is 7335
#>     Nominal coverage specified is 0.100000
#>     Realized coverage is 0.099796
#>     Log marginal likelihood is -37.385193
#> 

The log marginal likelihood for the null hypothesis, estimated by LoRaD, is -37.385193, which is quite close to the value -37.53668 estimated by the bridgesampling package.

The alternative hypothesis

Repeat the LoRaD analysis for the alternative hypothesis.

colspecH1 <- c("mu"="unconstrained", "tau2"="unconstrained", "theta.1"="unconstrained", "theta.2"="unconstrained", "theta.3"="unconstrained", "theta.4"="unconstrained", "theta.5"="unconstrained", "theta.6"="unconstrained", "theta.7"="unconstrained", "theta.8"="unconstrained", "theta.9"="unconstrained", "theta.10"="unconstrained", "theta.11"="unconstrained", "theta.12"="unconstrained", "theta.13"="unconstrained", "theta.14"="unconstrained", "theta.15"="unconstrained", "theta.16"="unconstrained", "theta.17"="unconstrained", "theta.18"="unconstrained", "theta.19"="unconstrained", "theta.20"="unconstrained", "lp__"="posterior")
results1 <- lorad_estimate(paramsH1, colspecH1, 0.5, "random", 0.1)
lorad_summary(results1)
#> This is lorad 0.0.1.0 
#>     Parameter sample comprises 147000 sampled points
#>     Each sample point is a vector of 22 parameter values
#> 
#>    Training fraction is 0.5
#>     Coverage specified is 0.1
#>  
#> Partitioning samples into training and estimation:
#>     Sample size is 147000
#>     Training sample size is 73500
#>     Estimation sample size 73500
#>  
#> Processing training sample...
#>     Lowest radial distance is 3.483237724
#>     Log Delta -3.091635220
#>  
#> Processing estimation sample...
#>     Number of samples used is 7161
#>     Nominal coverage specified is 0.100000
#>     Realized coverage is 0.097429
#>     Log marginal likelihood is -37.693758
#> 

The log marginal likelihood for the altenative hypothesis was -37.693758 for lorad (compared to -37.7984 for bridgesampling).

Literature Cited

Gronau, Quentin F., Henrik Singmann, and Eric-Jan Wagenmakers. 2020. bridgesampling: An R Package for Estimating Normalizing Constants.” Journal of Statistical Software 92 (10): 1–29. https://doi.org/10.18637/jss.v092.i10.
Meng, Xiao-Li, and Wing Hung Wong. 1996. “Simulating Ratios of Normalizing Constants via a Simple Identity: A Theoretical Exploration.” Statistica Sinica 6: 831–60.
Stan Development Team. 2023. RStan: The R Interface to Stan.” https://mc-stan.org/.
Wang, Yu-Bo, Analisa Milkey, Aolan Li, Ming-Hui Chen, Lynn Kuo, and Paul O. Lewis. 2023. “LoRaD: Marginal Likelihood Estimation with Haste (but No Waste).” Systematic Biology 72 (3): 639–48. https://doi.org/10.1093/sysbio/syad007.