Bayesian models are essential for studying complex biological systems of interest (SOIs), such as irradiation effects on metabolite concentrations or vaccine impacts on immune gene repertoires. In these models layered parameters describe key biological features of the SOIs. After model fitting, each parameter is characterized by a posterior distribution: a probability distribution representing all plausible effect values given the observed data. With thousands of such posteriors in high-dimensional analyses, identifying large, reliable effects becomes challenging.
Traditional volcano plots address this for frequentist analyses by plotting fold-changes against –log(p-values). We introduce Bayesian volcano plots that instead visualize the posterior mean effect size of parameter \(i\) (\(\theta_i\)) against the probability, \(\pi_i\), where the \(\pi_i\) quantifies the posterior probability that \(\theta_i\) is not equal the null effect. This directly highlights both magnitude and biological relevance. While Sousa et al. (2020) conceptualized Bayesian volcano plots, our R package provides the first practical implementation for automated \(\pi_i\) calculation and visualization of complex biological effects.
To create a Bayesian volcano plot, BayesVolcano needs two main inputs:
If you used rstan
or brms
to fit your model, our extract_fit() function automatically
converts your results into the right format. Just install the
corresponding package (rstan or brms) first, then run
extract_fit(your_model,parameter_name) to prepare your
data.
data("posterior")
head(posterior[, 1:4])
#> delta_mu.1.2.3.4 delta_mu.2.2.3.4 delta_mu.3.2.3.4 delta_mu.4.2.3.4
#> 1 1.1527980 0.24442670 -0.5283300 -2.8424959
#> 2 1.5428293 0.41885565 -0.8167788 0.2726311
#> 3 1.4201598 -0.92882009 -0.6627982 -1.8188774
#> 4 0.3277584 -0.40256872 0.3988543 -0.7784557
#> 5 0.8694004 -0.06791806 -0.9230497 0.3559686
#> 6 0.2562036 -2.11318184 1.0385491 -1.1102383This data frame maps parameter names to biological entities with at least a column named “parameter” and a column named “label”. Additional columns can be provided for later visualization (categorical like “group” or numerical like “value”).
data("annotation_df")
head(annotation_df)
#> parameter label group value
#> 1 delta_mu.1.2.3.4 1-Aminocyclopropanecarboxylic acid A 9.533826
#> 2 delta_mu.2.2.3.4 2-Aminomuconate B 12.031100
#> 3 delta_mu.3.2.3.4 2-Phosphoglyceric acid A 3.567905
#> 4 delta_mu.4.2.3.4 3-Hydroxybutyric acid B 8.407870
#> 5 delta_mu.5.2.3.4 Acetoacetate A 9.891394
#> 6 delta_mu.6.2.3.4 Acetyl coenzyme A B 9.806375The prepare_volcano_input() function takes as input the
posterior and the annotation data frame. Run this now:
The output contains
str(d)
#> 'data.frame': 98 obs. of 11 variables:
#> $ parameter : chr "delta_mu.1.2.3.4" "delta_mu.2.2.3.4" "delta_mu.3.2.3.4" "delta_mu.4.2.3.4" ...
#> $ pi.value : num 0.58 0.138 0.292 0.184 0.594 0.916 0.88 0.27 0.23 0.518 ...
#> $ null.effect : num 0 0 0 0 0 0 0 0 0 0 ...
#> $ parameter.median: num 0.643 -0.156 -0.271 -0.213 -0.9 ...
#> $ parameter.low : num -1.23 -2.06 -1.98 -2.11 -3.31 ...
#> $ parameter.high : num 2.59 1.96 1.34 1.77 1.5 ...
#> $ CrI.width : num 3.82 4.02 3.32 3.88 4.8 ...
#> $ CrI.level : num 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 ...
#> $ label : chr "1-Aminocyclopropanecarboxylic acid" "2-Aminomuconate" "2-Phosphoglyceric acid" "3-Hydroxybutyric acid" ...
#> $ group : chr "A" "B" "A" "B" ...
#> $ value : num 9.53 12.03 3.57 8.41 9.89 ...The function prepare_volcano_input() uses the posterior
of each parameter from the annotations (e.g., effect size \(\theta_i\) of parameter \(i\)) to calculate \(\pi_i\):
\(\pi_i = 2 \cdot \max\left(\int_{\theta_i = -\infty}^{\bar{\theta}} p(\theta_i)\mathrm{d}\theta_i, \int_{\theta_i = \bar{\theta}}^{\infty} p(\theta_i)\mathrm{d}\theta_i\right) - 1\)
Where \(\bar{\theta}\) is the null effect. This measures the probability that the effect is in the “direction” away from the null.
Important Note: A \(\pi\)-value near 0 doesn’t prove the absence of an effect - it indicates the posterior is widely distributed around the null.
The plot_volcano() function creates a ggplot visual
based on the formatted input:
This plot shows:
Where each point is a parameter with median effect size (x-axis) and \(\pi\) (y-axis). The null effect is shown by the vertical line.
To visualize effect size uncertainty, add credible intervals as error bars:
This adds:
You can also represent credible interval width through point size:
# Customization requires the ggplot2 package
if (!require("BayesVolcano", quietly = TRUE)) {
install.packages("BayesVolcano")
}
library(ggplot2)
p <- plot_volcano(d)
p + xlab("my informative parameter") +
ggtitle("My amazing plot")Use geom_text() for basic labeling or
ggrepel to avoid overplotting:
Show labels only for reliable effects:
p <- plot_volcano(d)
p + geom_text(aes(label = ifelse(abs(parameter.median) > 1.6 & # only show for parameter value > 0.5
pi > 0.95, # only show for pi > 0.95
label, # which variable contains the label
""
))) # do not display label if outside of set rangesThis highlights:
Julie de Sousa, Ondřej Vencálek, Karel Hron, Jan Václavík, David Friedecký, Tomáš Adam, Bayesian multiple hypotheses testing in compositional analysis of untargeted metabolomic data, Analytica Chimica Acta, Volume 1097, 2020, Pages 49-61, ISSN 0003-2670, https://doi.org/10.1016/j.aca.2019.11.006
Corresponding GitHub Repository: https://github.com/sousaju/BayesVolcano
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.2 BayesVolcano_1.0.1
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.7.2 cli_3.6.5 knitr_1.51 rlang_1.1.7
#> [5] xfun_0.56 otel_0.2.0 purrr_1.2.1 generics_0.1.4
#> [9] S7_0.2.1 jsonlite_2.0.0 labeling_0.4.3 glue_1.8.0
#> [13] htmltools_0.5.9 sass_0.4.10 scales_1.4.0 rmarkdown_2.30
#> [17] grid_4.5.1 tibble_3.3.1 evaluate_1.0.5 jquerylib_0.1.4
#> [21] fastmap_1.2.0 yaml_2.3.12 lifecycle_1.0.5 compiler_4.5.1
#> [25] dplyr_1.2.0 HDInterval_0.2.4 RColorBrewer_1.1-3 pkgconfig_2.0.3
#> [29] tidyr_1.3.2 rstudioapi_0.18.0 farver_2.1.2 digest_0.6.39
#> [33] viridisLite_0.4.3 R6_2.6.1 tidyselect_1.2.1 pillar_1.11.1
#> [37] magrittr_2.0.4 bslib_0.10.0 withr_3.0.2 tools_4.5.1
#> [41] gtable_0.3.6 cachem_1.1.0