Estimating the effective reproductive number using ern

Introduction

The package ern has two functions with which to estimate the daily effective reproduction number, \(\mathcal{R}_t\), each for a different data stream:

In both cases, the general method is the same:

  1. use the input data to infer daily incidence
  2. use the inferred daily incidence to compute \(\mathcal{R}_t\)

Step 2 is common to both wastewater and clinical methods; where they differ is in step 1. We give details of both steps below, followed by full demos for both wastewater and clinical input data.

Using inferred daily incidence to compute \(\mathcal{R}_t\)

In both the clinical and wastewater cases, step 1 will produce an ensemble of realizations for the inferred daily incidence. We use these timeseries of realizations, along with a family of generation interval distributions specified by the user, to compute \(\mathcal{R}_t\) in an ensemble of realizations.

To compute a single realization for the \(\mathcal{R}_t\) ensemble, we draw one realization out of the ensemble of inferred daily incidence as well as one generation interval distribution out of the user-specified family, and feed both of these components into EpiEstim::estimate_R(). Once all realizations of \(\mathcal{R}_t\) have been computed, the ensemble is summarized by day with a mean and confidence interval bounds.

Note that the estimation of \(R_t\), once the daily incidence has been inferred, is outsourced to the R library EpiEstim. Put simply, ern is a wrapper around EpiEstim.

Using the input data to infer daily incidence

How we infer daily incidence from the input depends on the input data source.

Wastewater input data

We convert the pathogen concentration in wastewater over time to a daily disease incidence by performing a deconvolution with a fecal shedding distribution, which describes the distribution of virus shed in feces by an infected individual during their disease course.

Clinical input data

If the input clinical reports are not daily, ern assumes that they are aggregated over the time between report dates and infer the daily count of cases using a Markov Chain Monte-Carlo algorithm implemented in the R library rjags.

Then, we convert the daily count of clinical reports over time to the actual daily incidence of infections in the following way:

  1. scale daily clinical reports up by the user-specified reporting fraction (assumed to be constant over time)
  2. take scaled clinical reports and perform a deconvolution with a reporting delay distribution to get daily scaled onsets
  3. take daily scaled onsets and perform a deconvolution with an incubation period distribution to get the daily incidence of infections

Estimating \(\mathcal{R}_t\) with wastewater data

The function estimate_R_ww() estimates \(R_t\) from pathogen concentration measured in wastewater. It takes several inputs and parameters, described in the next two sections.

Specifying inputs

estimate_R_ww() requires the following inputs from the user:

Here, we estimate \(\mathcal{R}_t\) from a subset of wastewater data from the Iona Island wastewater treatment plant in Vancouver.

# Loading sample SARS-CoV-2 wastewater data
ww.conc = ern::ww.data

head(ww.conc)
#> # A tibble: 6 × 2
#>   date       value
#>   <date>     <dbl>
#> 1 2023-07-23  11.4
#> 2 2023-07-27  38.3
#> 3 2023-07-30  59.4
#> 4 2023-08-03  36.1
#> 5 2023-08-06  24.9
#> 6 2023-08-10  12.0
# Define SARS-CoV-2 fecal shedding and generation interval distributions
dist.fec = ern::def_dist(
  dist = "gamma",
  mean = 12.90215,
  mean_sd = 1.136829,
  shape = 1.759937,
  shape_sd = 0.2665988,
  max = 33
)

dist.gi  = ern::def_dist(
  dist     = "gamma",
  mean     = 6.84,
  mean_sd  = 0.7486,
  shape    = 2.39,
  shape_sd = 0.3573,
  max      = 15
)

We can visualize the assumed distributions with plot_dist():

plot_dist(dist.fec) + labs(title = paste0("Mean fecal shedding distribution (", dist.fec$dist, ")"))

plot_dist(dist.gi) + labs(title = paste0("Mean generation interval distribution (", dist.gi$dist, ")"))

plot_dist() returns a ggplot object, and so it can be further annotated with the usual ggplot2 tools (like labs() as above).

Note that the above dist.x lists define families of distributions (there is uncertainty specified in the mean distribution parameters), while plot_dist() only plots the mean distribution in this family.

Specifying parameters

estimate_R_ww() also takes a number of parameter sets that give the user control over various components of the \(\mathcal{R}_t\) estimation:

All of these parameters have defaults, but they can also be adjusted by the user. These settings are further described in the example below, but you may also want to consult the documentation of estimate_R_ww() for more details.

# Initializing scaling factor
scaling.factor = 1

# Initializing smoothing parameters
prm.smooth = list(
  align  = 'center', # smoothing alignment
  method = 'loess',  # smoothing method
  span   = 0.30,     # smoothing span (used for loess smoothing only)
  floor  = 5         # minimum smoothed concentration value (optional, loess smoothing only)
)

# Initialzing Rt settings
prm.R = list(
  iter   = 20,   # number of iterations in Rt ensemble
  CI     = 0.95, # confidence interval
  window = 10,   # Time window for Rt calculations
  config.EpiEstim = NULL # optional EpiEstim configuration for Rt calculations
)

Estimating \(\mathcal{R}_t\)

Once the above inputs and parameters are defined, we estimate \(\mathcal{R}_t\) as follows:

r.estim = ern::estimate_R_ww(
  ww.conc        = ww.conc,
  dist.fec       = dist.fec,
  dist.gi        = dist.gi,
  scaling.factor = scaling.factor,
  prm.smooth     = prm.smooth,
  prm.R = prm.R,
  silent = TRUE # suppress output messages
)

estimate_R_ww() returns a list with four elements:

Visualizing \(\mathcal{R}_t\) estimates

The output of estimate_R_ww() can be visualized readily using plot_diagnostic_ww(), which generates a figure with the following panels:

  1. original wastewater concentration against the smoothed signal over time
  2. inferred daily incidence
  3. the time-varying effective reproduction number, \(\mathcal{R}_t\)
g = ern::plot_diagnostic_ww(r.estim)
plot(g)


Estimating \(\mathcal{R}_t\) with clinical data

estimate_R_cl() takes several inputs and parameters, described in the next two sections.

Specifying inputs

estimate_R_cl() requires the following inputs from the user:

Here, we estimate \(\mathcal{R}_t\) for a sample of weekly clinical COVID-19 reports in the province of Quebec:

dat <- (ern::cl.data 
    |> dplyr::filter(
      pt == "qc",
      dplyr::between(date, as.Date("2021-07-01"), as.Date("2021-09-01"))
))
# define reporting delay 
dist.repdelay = ern::def_dist(
  dist = 'gamma',
  mean = 5,
  mean_sd = 1,
  sd = 1,
  sd_sd = 0.1,
  max = 10
)

# define reporting fraction 
dist.repfrac = ern::def_dist(
  dist = "unif",
  min = 0.1,
  max = 0.3
)

# define incubation period
dist.incub = ern::def_dist(
  dist     = "gamma",
  mean     = 3.49,
  mean_sd  = 0.1477,
  shape    = 8.5,
  shape_sd = 1.8945,
  max      = 8
)

# define generation interval
dist.gi = ern::def_dist(
  dist     = "gamma",
  mean     = 6.84,
  mean_sd  = 0.7486,
  shape    = 2.39,
  shape_sd = 0.3573,
  max      = 15
)

We can visualize the assumed distributions with plot_dist():

plot_dist(dist.repdelay) + labs(title = paste0("Mean reporting delay distribution (", dist.repdelay$dist, ")"))

plot_dist(dist.incub) + labs(title = paste0("Mean incubation period distribution (", dist.incub$dist, ")"))

plot_dist(dist.gi) + labs(title = paste0("Mean generation interval distribution (", dist.gi$dist, ")"))

plot_dist() returns a ggplot object, and so it can be further annotated with the usual ggplot2 tools (like labs() as above).

Note that the above dist.x lists define families of distributions (there is uncertainty specified in the mean distribution parameters), while plot_dist() only plots the mean distribution in this family.

Specifying parameters

estimate_R_cl() also takes a number of parameter sets that give the user control over various components of the \(\mathcal{R}_t\) estimation:

All of these parameters have defaults, but they can also be adjusted by the user. These settings are further described in the example below, but you may also want to consult the documentation of estimate_R_cl() for more details.

# settings for daily report inference
prm.daily = list(
  method = "renewal",
  popsize = 1e7, # population size
   # Here, low value for `burn` and `iter` 
   # to have a fast compilation of the vignette.
   # For real-world applications, both `burn` and `iter`
   # should be significantly increased (e.g., 10,000).
   # Also, the number of chains should be at least 3 
   # (instead of 1 here) for real-world applications.
  burn = 100,
  iter = 200,
  chains = 1,
  prior_R0_shape = 2,
  prior_R0_rate = 0.6,
  prior_alpha_shape = 1,
  prior_alpha_rate = 1
)

# settings for checks of daily inferred reports
prm.daily.check = list(
  agg.reldiff.tol = 200
)

# smoothing settings for daily inferred reports
prm.smooth = list(
  method = "rollmean",
  window = 3,
  align = 'center'
)

# Rt settings
prm.R = list(
  iter = 10, # number of iterations in Rt ensemble
  CI = 0.95, # 95% confidence interval
  window = 7, # time window for each Rt estimate
  config.EpiEstim = NULL
)

Estimating \(\mathcal{R}_t\)

Once the above inputs and parameters are defined, we estimate \(\mathcal{R}_t\) as follows:

r.estim = estimate_R_cl(
  cl.data      = dat,
  dist.repdelay = dist.repdelay,
  dist.repfrac  = dist.repfrac,
  dist.incub    = dist.incub,
  dist.gi       = dist.gi,
  prm.daily     = prm.daily,
  prm.daily.check = prm.daily.check,
  prm.smooth    = prm.smooth,
  prm.R         = prm.R,
  silent        = TRUE # suppress output messages
)

estimate_R_cl() returns a list with four elements:

Visualizing \(\mathcal{R}_t\) estimates

The output of estimate_R_cl() can be visualized readily using plot_diagnostic_cl(), which generates a figure with the following panels:

  1. the time-varying effective reproduction number, \(\mathcal{R}_t\)
  2. the (optionally inferred and smoothed) daily case reports
  3. (if the input data were not daily) the aggregated case reports (as observed vs inferred). Note that the inference should improve with larger values for the MCMC parameters burn and iter.
g = plot_diagnostic_cl(r.estim)
plot(g)