hhdynamics fits a Bayesian household transmission model using MCMC to estimate:
The model accounts for tertiary transmission (household contacts infecting other household contacts) and uses a serial interval distribution to link infection times.
Input data must be in long format (one row per individual) with these required columns:
| Column | Description |
|---|---|
hhID |
Household identifier |
member |
Member index within household (0 = index case, 1, 2, … = contacts) |
size |
Number of individuals in the household |
end |
End date of follow-up for that individual |
inf |
Infection status (1 = infected, 0 = not infected). Index cases must be infected. |
onset |
Onset time of symptoms (use -1 or NA for uninfected) |
Additional columns can be included as covariates for infectivity or susceptibility.
library(hhdynamics)
data("inputdata")
str(inputdata)
#> 'data.frame': 1533 obs. of 8 variables:
#> $ hhID : int 1 1 1 100 100 100 100 10005 10005 10005 ...
#> $ member: int 0 1 2 0 1 2 3 0 1 2 ...
#> $ size : int 3 3 3 4 4 4 4 3 3 3 ...
#> $ end : int 15 15 15 66 66 66 66 922 922 922 ...
#> $ inf : int 1 0 0 1 0 0 0 1 0 0 ...
#> $ onset : int 8 NA NA 57 NA NA NA 913 NA NA ...
#> $ age : Factor w/ 3 levels "0","1","2": 1 1 3 2 1 1 3 1 3 2 ...
#> $ sex : Factor w/ 2 levels "0","1": 2 2 2 1 2 1 1 1 2 2 ...
head(inputdata)
#> hhID member size end inf onset age sex
#> 1 1 0 3 15 1 8 0 1
#> 2 1 1 3 15 0 NA 0 1
#> 3 1 2 3 15 0 NA 2 1
#> 4 100 0 4 66 1 57 1 0
#> 5 100 1 4 66 0 NA 0 1
#> 6 100 2 4 66 0 NA 0 0The model uses a discrete serial interval distribution (probability mass function of length 14, one value per day). By default, the bundled influenza serial interval from Tsang et al. (2014) is used:
data("SI")
print(SI)
#> [1] 5.872190e-02 3.155084e-01 4.863670e-01 1.369000e-01 2.502287e-03
#> [6] 3.537313e-07 1.257039e-14 0.000000e+00 0.000000e+00 0.000000e+00
#> [11] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
barplot(SI, names.arg = 1:14, xlab = "Days", ylab = "Probability",
main = "Default serial interval distribution (influenza)")You can provide your own SI vector, or let the model estimate it from data (see below).
Use household_dynamics() to fit the model. Specify
covariates using R formulas:
# With covariates (uses default influenza SI)
fit <- household_dynamics(inputdata, inf_factor = ~sex, sus_factor = ~age,
n_iteration = 50000, burnin = 10000, thinning = 10)
# Without covariates
fit <- household_dynamics(inputdata,
n_iteration = 50000, burnin = 10000, thinning = 10)
# With a custom serial interval
my_SI <- c(0, 0.01, 0.05, 0.15, 0.25, 0.25, 0.15, 0.08, 0.04, 0.015, 0.005, 0, 0, 0)
fit <- household_dynamics(inputdata, SI = my_SI,
n_iteration = 50000, burnin = 10000, thinning = 10)The function returns an hhdynamics_fit object containing
the full MCMC output.
print(fit)
# Household transmission model fit
# Data: 386 households, 1533 individuals
# MCMC: 4000 post-burnin samples (50000 iterations, burnin: 10000, thin: 10)
# Runtime: 85 seconds
# Parameters: community, household, sex1.0, age1.0, age2.0
#
# Use summary() for estimates, coef() for posterior means.summary() returns a table with posterior means, 95%
credible intervals, and exponentiated estimates for covariate
effects:
summary(fit)
# Variable Point estimate Lower bound Upper bound ...
# Daily probability of infection from community 0.004 0.002 0.007
# Probability of person-to-person transmission in households 0.057 0.034 0.084
# sex1.0 -0.081 -0.733 0.488
# age1.0 -0.065 -0.537 0.412
# age2.0 -0.312 -0.831 0.170Community and household parameters are reported on the probability
scale (via 1 - exp(-x) transform). Covariate effects are on
the log scale, with exponentiated values also shown.
The fit object stores all MCMC output for custom diagnostics and post-processing:
# Posterior samples matrix (post-burnin, thinned)
dim(fit$samples) # e.g. 4000 x 7
# Log-likelihood trace (full chain, for convergence checks)
dim(fit$log_likelihood) # e.g. 50000 x 3
# Per-parameter acceptance rates
fit$acceptance
# Trace plot for community parameter
plot(fit$samples[, "community"], type = "l",
ylab = "Community rate", xlab = "Iteration")
# Posterior density
hist(fit$samples[, "household"], breaks = 50, probability = TRUE,
main = "Household transmission probability (raw scale)",
xlab = "Rate parameter")hhdynamics provides several plotting functions for diagnostics and publication-ready figures.
# Forest plot of covariate effects (relative risks)
plot_covariates(fit)
# With custom labels for variable headers and factor levels
plot_covariates(fit, file = "covariates.pdf",
labels = list(
sex = list(name = "Sex", levels = c("Male", "Female")),
age = list(name = "Age Group", levels = c("0-5", "6-17", "18+"))))The file parameter auto-sizes the PDF dimensions based
on the number of covariates.
# Overall attack rate
plot_attack_rate(fit)
# Stratified by a single variable
plot_attack_rate(fit, by = ~age)
# Combine overall + multiple strata in one figure
plot_attack_rate(fit, by = list(~sex, ~age), include_overall = TRUE,
labels = list(
sex = list(name = "Sex", levels = c("Male", "Female")),
age = list(name = "Age Group", levels = c("0-5", "6-17", "18+"))))hhdynamics includes table functions that return data frames suitable for reporting.
household_dynamics() validates inputs before running the
MCMC, catching common errors early:
# Missing column
household_dynamics(inputdata[, -1])
# Error: 'input' is missing required columns: hhID
# Wrong formula variable
household_dynamics(inputdata, inf_factor = ~nonexistent)
# Error: Variables in inf_factor not found in input: nonexistent
# Old string syntax (no longer supported)
household_dynamics(inputdata, inf_factor = "~sex")
# Error: 'inf_factor' must be a formula (e.g. ~sex) or NULL.Infected household contacts with missing (NA) onset
times are automatically imputed during MCMC. The sampler proposes onset
times uniformly within the follow-up window and accepts/rejects via the
full likelihood. Index case onsets must be non-missing.
# Some infected contacts have unknown symptom onset
inputdata_missing_onset <- inputdata
infected_contacts <- which(inputdata$member > 0 & inputdata$inf == 1)
inputdata_missing_onset$onset[infected_contacts[1:5]] <- NA
fit_onset <- household_dynamics(inputdata_missing_onset, ~sex, ~age,
n_iteration = 50000, burnin = 10000, thinning = 10)
# Note: 5 of 92 infected contact(s) (5.4%) have missing onset times. These will be imputed during MCMC.household_dynamics() also imputes missing factor
covariates via Bayesian data augmentation. Simply pass your data with
NA values in factor columns:
# Introduce some missing values
inputdata_missing <- inputdata
inputdata_missing$sex[c(5, 12, 30)] <- NA
inputdata_missing$age[c(8, 20)] <- NA
# Fit as usual — missing values are imputed automatically
fit_missing <- household_dynamics(inputdata_missing, ~sex, ~age,
n_iteration = 50000, burnin = 10000, thinning = 10)
# Note: Covariate 'sex' has 3 missing value(s) (0.2%). These will be imputed during MCMC.
# Note: Covariate 'age' has 2 missing value(s) (0.1%). These will be imputed during MCMC.
summary(fit_missing)Restrictions:
NA will produce an error.~sex*age) are not supported when
covariates have missing data.inf_factor and
sus_factor if it has missing values.If the serial interval is unknown or you want to estimate it jointly
with the transmission parameters, set estimate_SI = TRUE.
The model parameterizes the SI as a Weibull distribution and estimates
the shape and scale parameters via MCMC:
fit_si <- household_dynamics(inputdata, inf_factor = ~sex, sus_factor = ~age,
n_iteration = 50000, burnin = 10000, thinning = 10,
estimate_SI = TRUE)
summary(fit_si)
# Output now includes si_shape and si_scale parameters
# Reconstruct the estimated SI distribution
est_shape <- mean(fit_si$samples[, "si_shape"])
est_scale <- mean(fit_si$samples[, "si_scale"])
est_SI <- pweibull(2:15, est_shape, est_scale) - pweibull(1:14, est_shape, est_scale)
barplot(est_SI, names.arg = 1:14, xlab = "Days", ylab = "Probability",
main = "Estimated serial interval")The Weibull priors are: shape ~ Uniform(0.1, 10), scale ~
Uniform(0.1, 20). When estimate_SI = TRUE, the
SI argument is not used.
simulate_data() generates synthetic datasets for
validation or power analysis: