Estimate growth using MCMC

library(BayesGrowth)

BayesGrowth combines length-at-age modelling with MCMC implemented using Stan and the rstan package. Growth modelling using models such as the von Bertalanffy growth function involves three parameters: \(L_{\infty}\), \(k\) and either \(L_{0}\) or \(t_{0}\). Two of these parameters: \(L_{0}\) and \(L_{\infty}\) have direct biological meaning as the length-at-birth and maximum length, respectively. This package provides the tools to run an MCMC model with these two parameters treated as size-at-birth and maximum length using a Stan model. This MCMC model is pre-specified and built into wrapper functions.

The user can therefore run an MCMC growth model using knowledge of species length-at-birth and maximum size as priors.

Installation

This package provides a series of wrapper functions to the rstan package which will run MCMC models in Stan using No U-turn sampling (NUTS). rstan models are automatically run by the package. However, you’ll probably want to install the rstan package anyway in order to work with the resulting model outputs.

install.packages("rstan")

You can install the released version of BayesGrowth from Github with:

devtools::install_github("jonathansmart/BayesGrowth")

Usage

The main BayesGrowth function is Estimate_MCMC_Growth() which is the wrapper function around an rstan model. It requires a data input that includes columns that can be identified “Age” and “Length”, the model needs to be specified (several options are available) and the priors must be specified. Priors include the max size with an error, length-at-birth with an error and upper limits for \(k\) and \(\sigma\). These latter two parameters have no informative priors and only require sensible upper bounds. Many fish species (including this example) have a size at birth of zero. Therefore, this can value can be used as a prior along with a very small error to indicate high certainty of this prior. The L0.se argument cannot be zero, but the model is specified to truncate \(L_{0}\) at zero and keep growth positive.

Note that rstan estimates parameter precision (\(\tau\)) rather than parameter standard error (\(\theta{\sigma}\)). Therefore, standard error is converted to precision as \(\tau = (1/\theta{\sigma})^2\). This is handled by the BayesGrowth package so the user does not need to do this conversion themselves.

Here is an example of Estimate_MCMC_Growth() in action:

library(BayesGrowth)

# built in dataset for running examples
data("example_data")

## Biological info - lengths in mm
max_size <- 440
max_size_se <- 5
birth_size <- 0
birth_size_se <- 0.001 #  cannot be zero

# Use the function to estimate the rstan model
MCMC_example_results <- Estimate_MCMC_Growth(example_data, 
                                             Model = "VB" ,
                                             iter = 10000, 
                                             n.chains = 4,  # minimum of 3 chains recommended
                                             BurnIn = 1000, # default is 10% of iterations
                                             thin = 10,      # a thinning rate of 10 is applied to overcome autocorrelation
                                             Linf = max_size,
                                             Linf.se = max_size_se,
                                             L0 = birth_size,
                                             L0.se = birth_size_se,
                                             sigma.max = 100,
                                             verbose = TRUE,
                                             k.max = 1)

Outputs and compatibility with rstan and bayesplot packages

Estimate_MCMC_Growth returns the rstan outputs which is an object of class stanfit. Therefore, all accompanying R packages to rstan can be used to interrogate the results. A useful package for analysing the posteriors is bayesplot which has several functions to explore chain convergence, posterior densities, autocorrelation, etc.

Summaries

As an rstan model object is returned from the function, all of the diagnostics from the rstan library can be used. For example, a summary and structure of results can be produced.

library(rstan)
summary(MCMC_example_results,pars = c("Linf", "k", "L0", "sigma"))$summary
#>               mean      se_mean           sd         2.5%          25%
#> Linf  3.180221e+02 6.880817e-02 4.1518326541 3.109460e+02 3.151457e+02
#> k     6.603822e-01 5.857158e-04 0.0344834676 5.928279e-01 6.373009e-01
#> L0    7.937532e-04 1.083977e-05 0.0005947255 3.223029e-05 3.237756e-04
#> sigma 2.431159e+01 1.502347e-02 0.8879123712 2.271614e+01 2.369463e+01
#>                50%          75%        97.5%    n_eff      Rhat
#> Linf  3.176650e+02 3.205548e+02 3.269781e+02 3640.824 0.9994656
#> k     6.602703e-01 6.843590e-01 7.264078e-01 3466.155 0.9996347
#> L0    6.680529e-04 1.142935e-03 2.152788e-03 3010.183 0.9998376
#> sigma 2.426977e+01 2.489364e+01 2.617737e+01 3493.011 0.9995634
str(MCMC_example_results,max.level = 2)
#> Formal class 'stanfit' [package "rstan"] with 10 slots
#>   ..@ model_name: chr "VB_stan_model"
#>   ..@ model_pars: chr [1:6] "L0" "Linf" "k" "sigma" ...
#>   ..@ par_dims  :List of 6
#>   ..@ mode      : int 0
#>   ..@ sim       :List of 12
#>   ..@ inits     :List of 4
#>   ..@ stan_args :List of 4
#>   ..@ stanmodel :Formal class 'stanmodel' [package "rstan"] with 5 slots
#>   ..@ date      : chr "Thu Nov 05 17:13:23 2020"
#>   ..@ .MISC     :<environment: 0x000002042e574400>

Chain convergence

The chains of the MCMC model can be examined using a trace plot. The right hand panels show the chains for each parameter as they progress through the MCMC iterations. The left hand panels show the posterior distributions of each parameter. As this model is specified to have a normal distribution, all parameter posteriors should be normally distributed once convergence has been reached. \(L_{0}\) is an exception as this parameter is truncated to remain above zero. Therefore, as \(L_{0}\) approaches zero it will be right skewed. This plot shows good model convergence. All of the chains overlap and each of the parameters has the correct posterior distribution with a single mode.

library(bayesplot)

mcmc_combo(MCMC_example_results,pars = c("Linf", "k", "L0", "sigma")) 

Convergence can also be checked using a Gelman Rubin test. If all point estimates of the Gelman-Rubin diagnostic (\(\hat{R}\)) for each parameter are less than 1.2 then convergence has been reached. Ideally these should be close to one. If not, then the number of iterations should be increased. This result is returned as part of the Estimate_MCMC_Growth summary and can also be accessed using the rhat function. These can also be plotted using mcmc_rhat.

rhats <- rhat(MCMC_example_results,pars = c("Linf", "k", "L0", "sigma"))

mcmc_rhat(rhats)

Autocorrelation

Autocorrelation can be examined using the mcmc_acf function in the bayesplot package. This plot shows the autocorrelation lag that occurs in the MCMC model for each parameter. If each parameter has an autocorrelation value of zero at the end of the lag series, then no autocorrelation has occurred.

mcmc_acf(MCMC_example_results,pars = c("Linf", "k", "L0", "sigma"))

Other useful BayesGrowth functions

Additional BayesGrowth functions are available that help the user manipulate the returned Estimate_MCMC_Growth object. For example Get_MCMC_parameters will return a tibble of parameter results and statistics as a more manageable object than summary().

Get_MCMC_parameters(MCMC_example_results)
#>   Parameter   mean se_mean   sd   2.5%    50%  97.5%   n_eff Rhat
#> 1      Linf 318.02    0.07 4.15 310.95 317.66 326.98 3640.82    1
#> 2         k   0.66    0.00 0.03   0.59   0.66   0.73 3466.16    1
#> 3        L0   0.00    0.00 0.00   0.00   0.00   0.00 3010.18    1
#> 4     sigma  24.31    0.02 0.89  22.72  24.27  26.18 3493.01    1

The Calculate_MCMC_growth_curve function will provide confidence intervals around the growth curve based on MCMC parameter percentiles. This is essentially a wrapper around the tidybayes::mean_qi() function which means it can be passed straight into a ggplot with the tidybayes::geom_line_ribbon function.

# Return a growth curve with 50th and 95th percentiles
growth_curve <- Calculate_MCMC_growth_curve(obj = MCMC_example_results, Model = "VB",
                                            max.age = max(example_data$Age), probs = c(.5,.95))
library(tidybayes)
library(ggplot2)

ggplot(growth_curve, aes(Age, LAA))+
  geom_point(data = example_data, aes(Age, Length), alpha = .3)+
geom_lineribbon(aes( ymin = .lower, ymax = .upper,
                       fill = factor(.width)), size = .8) +  labs(y = "Total Length (mm)", x = "Age (yrs)")+
  scale_fill_brewer(palette="BuPu", direction=1,name =NULL)+
  scale_y_continuous(expand = c(0,0))+
  scale_x_continuous(expand = c(0,0), breaks = seq(0,13,1))+
  theme_bw()+
  theme(text = element_text(size = 14))

This represents a much improved fit over a standard non-linear estimated model, even if the length-at-birth were fixed at zero. Here the fit is compared using an nls model fit using the AquaticLifeHistory package (https://jonathansmart.github.io/AquaticLifeHistory/).

Model comparison and selection

The Leave One Out Information Criterion (LOOIC) and/or Widely Applicable Information Critereon (WAIC) should be calculated for different candidate models to determine the best fitting growth model. The Compare_Growth_Models function can calculate both for all three models in a single function call. The user should specify the same arguments as the original Estimate_MCMC_Growth function call to ensure consistency. If the arguments of Estimate_MCMC_Growth are updated (for example priors are changed, more iterations are run, thinning is used, etc.) then Compare_Growth_Models should be re-run accordingly.

This function has a longer runtime than the initial model run as it will fit all three candidate Bayesian growth models. This is necessary as all models will be run with identical priors and model options which is important for determining the best model.

The model with the highest LOOIC or WAIC weights provides the best fit to the data and should be used as the final growth model. LOOIC is considered more robust than WAIC but comparing both can be useful. The stats argument in the function call lets the user decide if they would like to return results for LooIC, WAIC or both. This model comparison analysis should be the first step in the model fitting process so that subsequent diagnostics and results are only produced for the best fitting model.

From these results we can see that the best fitting model was the Von Bertalanffy growth model. Note that this function does not check model convergence (a large number of warning messages suggests that one or more models has struggled to converge). Therefore, full diagnostics need to be run on the best fitting model. If the model configurations are updated to deal with issues such as autocorrelation then model comparisons and selection should be undertaken again.

Looic_example_results <- Compare_Growth_Models(data = example_data,
                                               stats = "LooIC",
                                               iter = 10000, 
                                               n_cores = 3,
                                               n.chains = 4,
                                               BurnIn = 1000,
                                               thin = 1, 
                                               Linf = max_size,
                                               Linf.se = max_size_se,
                                               L0 = birth_size,
                                               L0.se = birth_size_se,
                                               verbose = TRUE,
                                               sigma.max = 100,
                                               k.max = 1)
#>      Model elpd_diff  se_diff  elpd_loo se_elpd_loo       p_loo     se_p_loo
#> 1       VB     0.000  0.00000 -3640.812    33.08032 8.864896494 1.476362e+00
#> 2 Gompertz -3669.257 40.17058 -7310.070    16.80033 0.295843815 2.751125e-02
#> 3 Logistic -3832.046 40.65479 -7472.858    20.21025 0.001686265 3.736302e-05
#>       looic se_looic looic_Weight
#> 1  7281.625 66.16063            1
#> 2 14620.139 33.60067            0
#> 3 14945.716 40.42050            0