This vignette describes the statistical model, likelihood, prior
distributions, and MCMC algorithm implemented in
hhdynamics. For a practical workflow guide, see
vignette("hhdynamics-intro").
The model is a discrete-time Bayesian household transmission model that separates community and within-household infection sources. It was originally described in Tsang et al. (2014) and has been extended here to support covariate effects, missing data imputation, and optional serial interval estimation.
The data come from a case-ascertained household study. Each household is enrolled through an index case (the first identified case), and household contacts are followed prospectively for a defined period. The observed data for each contact include whether they became infected and, if so, the day of symptom onset. Covariates (age, sex, etc.) may be recorded for each individual.
For household contact \(i\) in household \(h\) on day \(t\), the hazard of infection is:
\[ \lambda_i(t) = \left[\beta_c + \sum_{\substack{j \in \mathcal{I}_h \\ j \neq i}} \frac{\beta_h \cdot w(t - t_j) \cdot \exp(\mathbf{x}_j^\top \boldsymbol{\alpha}_{\text{inf}} + \varepsilon_j)}{(n_h - 1)^\gamma}\right] \cdot \exp(\mathbf{z}_i^\top \boldsymbol{\alpha}_{\text{sus}}) \]
where:
The daily probability that contact \(i\) becomes infected on day \(t\), given they are still susceptible, is:
\[ P(\text{infected on day } t \mid \text{susceptible at } t) = 1 - \exp(-\lambda_i(t)) \]
Covariate effects enter the hazard multiplicatively on the log scale:
Factor covariates are encoded as dummy variables using treatment contrasts (the first level is the reference).
For contact \(i\) in household \(h\), let \(T_i\) be the infection day (if infected) or the end of follow-up plus one (if not infected), and let \(s_h\) be the start of the observation period (the index case’s onset day). Let \(\delta_i = 1\) if infected, \(0\) otherwise.
The log-likelihood contribution of contact \(i\) is:
\[ \ell_i = -\sum_{t=s_h}^{T_i - 2} \lambda_i(t) + \delta_i \cdot \log\left(1 - \exp(-\lambda_i(T_i - 1))\right) \]
The first term captures the survival probability (not being infected on days before \(T_i\)), and the second term captures the probability of infection on day \(T_i\) (only included if the contact was actually infected).
The full data log-likelihood is the sum over all non-index contacts in all households:
\[ \ell(\boldsymbol{\theta}) = \sum_{h=1}^{H} \sum_{i=1}^{n_h - 1} \ell_i \]
where \(\boldsymbol{\theta} = (\beta_c, \beta_h, \gamma, \boldsymbol{\alpha}_{\text{inf}}, \boldsymbol{\alpha}_{\text{sus}}, \sigma_\varepsilon)\) is the full parameter vector.
When random effects are included, the likelihood also includes the random effects density:
\[ \ell_{\text{RE}} = \sum_{h=1}^{H} \sum_{j \in \mathcal{I}_h} \log \phi(\varepsilon_j; 0, \sigma_\varepsilon^2) \]
where \(\phi(\cdot; 0, \sigma^2)\) is the normal density.
| Parameter | Prior | Description |
|---|---|---|
| \(\sigma_\varepsilon\) | Uniform(0.009, 5) | RE standard deviation (fixed at 1 when random effects are disabled) |
| \(\beta_c\) | Uniform(\(10^{-18}\), 9.99) | Community infection rate |
| \(\beta_h\) | Uniform(\(10^{-18}\), 9.99) | Household transmission rate |
| \(\gamma\) | Uniform(0, 1) | Household size parameter (currently fixed at 0) |
| \(\alpha_{\text{inf},k}\) | Normal(0, 3) | Each infectivity covariate effect |
| \(\alpha_{\text{sus},k}\) | Normal(0, 3) | Each susceptibility covariate effect |
| Weibull shape | Uniform(0.1, 10) | SI shape (only when estimate_SI = TRUE) |
| Weibull scale | Uniform(0.1, 20) | SI scale (only when estimate_SI = TRUE) |
The uniform priors on \(\beta_c\) and \(\beta_h\) are deliberately wide and effectively non-informative for realistic transmission rates. The Normal(0, 3) prior on covariate effects is weakly informative — it assigns most probability mass to relative risks between \(\exp(-6) \approx 0.002\) and \(\exp(6) \approx 400\), which is very permissive for epidemiological covariate effects.
Parameters are updated one at a time via Metropolis-Hastings with Gaussian random-walk proposals. At iteration \(b\):
move[k] = 1):
If the proposed parameter falls outside the prior support, the prior log-density is \(-\infty\) and the proposal is automatically rejected.
After the first 500 iterations, proposal standard deviations \(\sigma_k\) are set to the empirical posterior standard deviation of the chain so far, with multiplicative adjustments based on acceptance rates:
| Acceptance rate | Adjustment |
|---|---|
| < 10% | \(\sigma_k \times 0.5\) |
| 10–15% | \(\sigma_k \times 0.8\) |
| 15–20% | \(\sigma_k \times 0.95\) |
| 20–30% | No change (target range) |
| 30–40% | \(\sigma_k \times 1.05\) |
| 40–90% | \(\sigma_k \times 1.2\) |
| > 90% | \(\sigma_k \times 2.0\) |
This adaptive scheme typically achieves stable acceptance rates within 1000–2000 iterations.
At each MCMC iteration, after updating all parameters, the following latent variables are updated for each household member via a joint Metropolis-Hastings step:
Infection times: For infected non-index contacts, a new onset time is proposed uniformly over the follow-up window. The proposal is accepted or rejected based on the full household likelihood ratio.
Random effects (when enabled): For each infected individual, a new random effect \(\varepsilon_j^*\) is proposed from \(\varepsilon_j + \text{Normal}(0, \sigma_\varepsilon)\).
Missing covariates: For individuals with missing factor covariates, a new level is proposed uniformly from all levels of the factor. This is a symmetric proposal, so no correction term is needed.
All three augmentation steps are proposed jointly for each household member and accepted or rejected together. This joint update is more efficient than separate updates because the onset time, random effect, and covariate value are correlated in the posterior. The augmentation step is parallelized across households using RcppParallel.
When estimate_SI = TRUE, two additional parameters
(Weibull shape and scale) are added to the MCMC. At each iteration, the
serial interval PMF is recomputed from the current Weibull
parameters:
\[ w(d) = F_W(d + 1; k, \lambda) - F_W(d; k, \lambda), \quad d = 1, 2, \ldots, 14 \]
where \(F_W(\cdot; k, \lambda)\) is the Weibull CDF with shape \(k\) and scale \(\lambda\). The Weibull parameters are updated via the same Metropolis-Hastings random-walk scheme as the other parameters.
Because the sampler uses a single chain with adaptive tuning, standard multi-chain diagnostics (such as \(\hat{R}\)) are not directly applicable. Instead, we recommend:
Trace plots: plot_diagnostics(fit)
shows trace plots and posterior densities for all parameters. Look for
stationarity after burn-in and good mixing (no long excursions or sticky
regions).
Effective sample size (ESS):
plot_diagnostics(fit, show_ess = TRUE) or
table_parameters(fit, show_ess = TRUE) report ESS using the
initial positive sequence estimator (Geyer, 1992). ESS values below 100
suggest poor mixing; consider increasing n_iteration or
investigating identifiability.
Acceptance rates:
table_parameters(fit) shows acceptance rates. Rates outside
15–40% indicate poor tuning, which may occur if the burn-in was
insufficient for the adaptive scheme to converge. In practice, the
default burn-in of 5000 iterations is usually sufficient.
Log-likelihood trace:
fit$log_likelihood stores the full log-likelihood trace
(before thinning) for visual inspection.
For most datasets:
n_iteration = 50000: sufficient for convergence with
moderate-dimensional models.burnin = 10000: allows the adaptive tuning to
stabilize.thinning = 10: reduces autocorrelation in stored
samples.For datasets with many covariates, missing data, or
estimate_SI = TRUE, consider longer chains (100,000+
iterations) and examine ESS for each parameter.
community: Reported as a daily probability on the
probability scale via \(1 -
\exp(-\beta_c)\). This is the daily probability of infection from
community sources, independent of household transmission.household: Reported as a daily probability via \(1 - \exp(-\beta_h)\). This is the baseline
daily probability of person-to-person transmission within a household
from a single infected member at peak serial interval.Covariate effects are estimated on the log scale. The
summary() method reports both the raw (log-scale) estimates
and the exponentiated estimates:
For example, if the susceptibility effect for age group “6-17” is \(\hat{\alpha} = -0.3\) with 95% CrI \((-0.8, 0.2)\), then \(\exp(\hat{\alpha}) = 0.74\) with 95% CrI \((0.45, 1.22)\). This means children aged 6-17 have an estimated 26% lower susceptibility than the reference group, but the credible interval includes 1 (no difference).
When estimate_SI = TRUE, the output includes
si_shape and si_scale — the Weibull shape
(\(k\)) and scale (\(\lambda\)) parameters. The mean serial
interval is \(\lambda \cdot \Gamma(1 +
1/k)\) and the variance is \(\lambda^2[\Gamma(1 + 2/k) - \Gamma(1 +
1/k)^2]\).
simulate_data()) is forced to
single-threaded mode because the R random number generator is not
thread-safe.Tsang TK, Lau EHY, Cauchemez S, Cowling BJ. Association between antibody titers and protection against influenza virus infection within households. Journal of Infectious Diseases. 2014;210(5):684–692. https://doi.org/10.1093/infdis/jiu186
Geyer CJ. Practical Markov chain Monte Carlo. Statistical Science. 1992;7(4):473–483.