Model Selection for Masked Series Systems via Likelihood Ratio Tests

Introduction

When analyzing masked series system data, a fundamental question precedes parameter estimation: which parametric family should we use for the component lifetimes? The maskedcauses package provides three models of increasing generality — exponential, homogeneous Weibull, and heterogeneous Weibull — each nested in the next. This hierarchy forms a natural scaffold for model selection via the likelihood ratio test (LRT).

We develop a two-step testing cascade that moves top-down from the most general model, simplifying only when the data cannot distinguish between levels. We demonstrate the cascade on worked examples and then assess the power of the shape heterogeneity test through Monte Carlo simulation.

The Weibull Nesting Chain

Three levels of complexity

The three models form a strict nesting chain: \[ \underbrace{\text{Exponential}}_{\lambda_1, \ldots, \lambda_m} \;\subset\; \underbrace{\text{Hom.\ Weibull}}_{k,\, \beta_1, \ldots, \beta_m} \;\subset\; \underbrace{\text{Het.\ Weibull}}_{k_1, \beta_1, \ldots, k_m, \beta_m} \]

Model Constructor Parameters Count
Exponential exp_series_md_c1_c2_c3() \(\lambda_1, \ldots, \lambda_m\) \(m\)
Hom. Weibull wei_series_homogeneous_md_c1_c2_c3() \(k, \beta_1, \ldots, \beta_m\) \(m + 1\)
Het. Weibull wei_series_md_c1_c2_c3() \(k_1, \beta_1, \ldots, k_m, \beta_m\) \(2m\)

The two LRT steps

Each nesting defines a hypothesis test:

Step 1: Shape heterogeneity. Does the data support different shapes for different components? \[ H_0: k_1 = k_2 = \cdots = k_m \quad \text{(homogeneous)} \quad\text{vs}\quad H_1: \text{shapes unrestricted (heterogeneous)} \] The LRT statistic is \(\Lambda = -2[\ell(\hat\theta_{\text{hom}}) - \ell(\hat\theta_{\text{het}})]\), which under \(H_0\) follows \(\chi^2_{m-1}\) asymptotically.

Step 2: Aging. If the homogeneous model is retained, does the common shape differ from 1 (exponential)? \[ H_0: k = 1 \quad \text{(exponential)} \quad\text{vs}\quad H_1: k \neq 1 \quad \text{(homogeneous Weibull)} \] The LRT statistic follows \(\chi^2_1\) under \(H_0\).

Physical interpretation

The nesting chain reflects increasing physical realism:

Top-down testing cascade

We recommend a top-down approach: start with the most general model and simplify until a restriction is rejected:

  1. Fit all three models to the data.
  2. Test heterogeneous vs homogeneous. If we cannot reject \(H_0\), adopt the homogeneous model.
  3. Test homogeneous vs exponential. If we cannot reject \(H_0\), adopt the exponential model.

This approach controls the risk of prematurely simplifying the model.

Worked Example: Homogeneous True Model

We generate data from a 5-component homogeneous Weibull system — the “middle” model in the chain — and show that the LRT correctly identifies it.

Setup and data generation

m <- 5
theta_hom <- c(k = 1.5, beta1 = 300, beta2 = 400, beta3 = 500,
               beta4 = 600, beta5 = 700)

model_hom <- wei_series_homogeneous_md_c1_c2_c3()

# Right-censoring time at the 80th percentile of the system lifetime
beta_sys <- wei_series_system_scale(theta_hom[1], theta_hom[-1])
tau <- qweibull(0.8, shape = theta_hom[1], scale = beta_sys)

cat("System scale:", round(beta_sys, 1), "\n")
#> System scale: 152.9
cat("Right-censoring time (q = 0.8):", round(tau, 1), "\n")
#> Right-censoring time (q = 0.8): 210
set.seed(2024)
gen_hom <- rdata(model_hom)

df <- gen_hom(theta_hom, n = 300, p = 0.2,
              observe = observe_right_censor(tau = tau))

cat("Observation types:\n")
#> Observation types:
print(table(df$omega))
#> 
#> exact right 
#>   246    54

Fitting all three models

# Exponential (m params)
model_exp <- exp_series_md_c1_c2_c3()
fit_e <- fit(model_exp)(df, par = rep(0.002, m), method = "Nelder-Mead")

# Homogeneous Weibull (m+1 params)
fit_h <- fit(model_hom)(df, par = c(1, rep(500, m)),
             method = "Nelder-Mead", control = list(maxit = 5000))

# Heterogeneous Weibull (2m params) — warm-start from hom fit
model_het <- wei_series_md_c1_c2_c3()
het_start <- as.numeric(rbind(rep(fit_h$par[1], m), fit_h$par[-1]))
fit_w <- fit(model_het)(df, par = het_start,
             method = "Nelder-Mead", control = list(maxit = 10000))

Likelihood ratio tests

np_exp <- m
np_hom <- m + 1
np_het <- 2 * m
n_obs <- nrow(df)

# LRT: heterogeneous vs homogeneous
LRT_shape <- -2 * (fit_h$loglik - fit_w$loglik)
df_shape <- np_het - np_hom   # m - 1 = 4
p_shape <- pchisq(LRT_shape, df = df_shape, lower.tail = FALSE)

# LRT: homogeneous vs exponential
LRT_aging <- -2 * (fit_e$loglik - fit_h$loglik)
df_aging <- np_hom - np_exp   # 1
p_aging <- pchisq(LRT_aging, df = df_aging, lower.tail = FALSE)

# AIC and BIC
aic_val <- function(ll, k) -2 * ll + 2 * k
bic_val <- function(ll, k, n) -2 * ll + log(n) * k

comparison <- data.frame(
  Model = c("Exponential", "Hom. Weibull", "Het. Weibull"),
  Params = c(np_exp, np_hom, np_het),
  LogLik = c(fit_e$loglik, fit_h$loglik, fit_w$loglik),
  AIC = c(aic_val(fit_e$loglik, np_exp),
          aic_val(fit_h$loglik, np_hom),
          aic_val(fit_w$loglik, np_het)),
  BIC = c(bic_val(fit_e$loglik, np_exp, n_obs),
          bic_val(fit_h$loglik, np_hom, n_obs),
          bic_val(fit_w$loglik, np_het, n_obs))
)

knitr::kable(comparison, digits = 2,
             caption = "Model comparison (true model: homogeneous Weibull)",
             col.names = c("Model", "# Params", "Log-Lik", "AIC", "BIC"))
Model comparison (true model: homogeneous Weibull)
Model # Params Log-Lik AIC BIC
Exponential 5 -1728 3465 3484
Hom. Weibull 6 -1698 3407 3430
Het. Weibull 10 -1697 3414 3451
cat("Step 1: Het vs Hom (shape heterogeneity)\n")
#> Step 1: Het vs Hom (shape heterogeneity)
cat(sprintf("  LRT = %.2f, df = %d, p = %.4f\n", LRT_shape, df_shape, p_shape))
#>   LRT = 1.36, df = 4, p = 0.8514
if (p_shape < 0.05) {
  cat("  Decision: Reject homogeneous -> adopt heterogeneous model\n\n")
} else {
  cat("  Decision: Fail to reject -> retain homogeneous model\n\n")
}
#>   Decision: Fail to reject -> retain homogeneous model

cat("Step 2: Hom vs Exp (aging)\n")
#> Step 2: Hom vs Exp (aging)
cat(sprintf("  LRT = %.2f, df = %d, p = %.4f\n", LRT_aging, df_aging, p_aging))
#>   LRT = 59.58, df = 1, p = 0.0000
if (p_aging < 0.05) {
  cat("  Decision: Reject exponential -> adopt homogeneous model\n")
} else {
  cat("  Decision: Fail to reject -> retain exponential model\n")
}
#>   Decision: Reject exponential -> adopt homogeneous model

Since the true model is homogeneous Weibull with \(k = 1.5\), we expect:

  1. The heterogeneous vs homogeneous test to fail to reject — the additional shape parameters do not significantly improve the fit.
  2. The homogeneous vs exponential test to reject — the data shows clear evidence of \(k \neq 1\).

The cascade correctly identifies the homogeneous model. Both AIC and BIC should agree, with the homogeneous model having the smallest values.

Worked Example: Heterogeneous True Model

We now generate data from a system where components have meaningfully different shapes, and show that the LRT detects this.

# Shapes span DFR to strong IFR
theta_het_true <- c(0.8, 300,    # DFR (electronics)
                    1.0, 400,    # CFR (random)
                    1.5, 500,    # mild IFR (seals)
                    2.0, 600,    # moderate IFR (bearings)
                    2.5, 700)    # strong IFR (wear-out)

model_het <- wei_series_md_c1_c2_c3()
gen_het <- rdata(model_het)

set.seed(7231)
df_het <- gen_het(theta_het_true, n = 300, p = 0.2,
                  observe = observe_right_censor(tau = tau))

cat("Observation types:\n")
#> Observation types:
print(table(df_het$omega))
#> 
#> exact right 
#>   239    61
fit_e2 <- fit(model_exp)(df_het, par = rep(0.002, m), method = "Nelder-Mead")
fit_h2 <- fit(model_hom)(df_het, par = c(1, rep(500, m)),
              method = "Nelder-Mead", control = list(maxit = 5000))
het_start2 <- as.numeric(rbind(rep(fit_h2$par[1], m), fit_h2$par[-1]))
fit_w2 <- fit(model_het)(df_het, par = het_start2,
              method = "Nelder-Mead", control = list(maxit = 10000))
# LRT: heterogeneous vs homogeneous
LRT_shape2 <- -2 * (fit_h2$loglik - fit_w2$loglik)
p_shape2 <- pchisq(LRT_shape2, df = m - 1, lower.tail = FALSE)

# LRT: homogeneous vs exponential
LRT_aging2 <- -2 * (fit_e2$loglik - fit_h2$loglik)
p_aging2 <- pchisq(LRT_aging2, df = 1, lower.tail = FALSE)

comparison2 <- data.frame(
  Model = c("Exponential", "Hom. Weibull", "Het. Weibull"),
  Params = c(np_exp, np_hom, np_het),
  LogLik = c(fit_e2$loglik, fit_h2$loglik, fit_w2$loglik),
  AIC = c(aic_val(fit_e2$loglik, np_exp),
          aic_val(fit_h2$loglik, np_hom),
          aic_val(fit_w2$loglik, np_het)),
  BIC = c(bic_val(fit_e2$loglik, np_exp, nrow(df_het)),
          bic_val(fit_h2$loglik, np_hom, nrow(df_het)),
          bic_val(fit_w2$loglik, np_het, nrow(df_het)))
)

knitr::kable(comparison2, digits = 2,
             caption = "Model comparison (true model: heterogeneous Weibull)",
             col.names = c("Model", "# Params", "Log-Lik", "AIC", "BIC"))
Model comparison (true model: heterogeneous Weibull)
Model # Params Log-Lik AIC BIC
Exponential 5 -1617 3244 3263
Hom. Weibull 6 -1617 3246 3268
Het. Weibull 10 -1538 3095 3132

cat(sprintf("\nStep 1: Het vs Hom — LRT = %.2f, df = %d, p = %.4f\n",
            LRT_shape2, m - 1, p_shape2))
#> 
#> Step 1: Het vs Hom — LRT = 158.43, df = 4, p = 0.0000
cat(sprintf("Step 2: Hom vs Exp — LRT = %.2f, df = %d, p = %.4f\n",
            LRT_aging2, 1, p_aging2))
#> Step 2: Hom vs Exp — LRT = 0.29, df = 1, p = 0.5902

With true shapes spanning 0.8 to 2.5, the cascade correctly:

  1. Rejects the homogeneous constraint (step 1) — the shapes are genuinely different. The cascade stops here and adopts the heterogeneous model.

Note that the homogeneous vs exponential test (step 2) may fail to reject even though component aging is present. This is because the pooled shape estimate averages across DFR (\(k < 1\)) and IFR (\(k > 1\)) components, yielding \(\hat{k} \approx 1\) — which is close to exponential. The top-down cascade avoids this trap by testing heterogeneity first.

Monte Carlo Study: Power of the Heterogeneity Test

The worked examples show the cascade on individual datasets. We now assess the statistical power of the shape heterogeneity test (step 1) as a function of how different the component shapes actually are.

Design

We parameterize shape heterogeneity by the coefficient of variation (CV) of the shape parameters. All \(m = 5\) shapes are set to \(k_j = \bar{k}(1 + \text{cv} \cdot z_j)\) where \(\bar{k} = 1.5\) and \(z_1, \ldots, z_m\) are fixed offsets with mean 0 and standard deviation 1.

CV Meaning
0% All shapes equal — homogeneous model is correct
5–10% Mild heterogeneity — hard to detect
20–30% Moderate heterogeneity — detectable with adequate \(n\)
50% Strong heterogeneity — reliably detected

Other settings: \(n = 500\) observations, \(p = 0.2\) (masking probability), right-censoring at the 80th percentile of the system lifetime under the homogeneous baseline, \(\alpha = 0.05\).

set.seed(42)

m <- 5
base_k <- 1.5
base_scales <- c(300, 400, 500, 600, 700)
cv_levels <- c(0, 0.05, 0.10, 0.20, 0.30, 0.50)
n_reps <- 200
n_obs <- 500
p_mask <- 0.2
alpha <- 0.05

# Fixed offsets: mean 0, sd 1
offsets <- c(-2, -1, 0, 1, 2)
offsets <- offsets / sd(offsets)

# Right-censoring time (based on homogeneous baseline at CV = 0)
beta_sys <- wei_series_system_scale(base_k, base_scales)
tau <- qweibull(0.8, shape = base_k, scale = beta_sys)

# Models and solvers
model_exp <- exp_series_md_c1_c2_c3()
model_hom <- wei_series_homogeneous_md_c1_c2_c3()
model_het <- wei_series_md_c1_c2_c3()

solver_exp <- fit(model_exp)
solver_hom <- fit(model_hom)
solver_het <- fit(model_het)

gen_het <- rdata(model_het)

np_exp <- m
np_hom <- m + 1
np_het <- 2 * m
results_list <- vector("list", length(cv_levels) * n_reps)
idx <- 1

for (cv in cv_levels) {
  shapes <- base_k * (1 + cv * offsets)
  theta <- as.numeric(rbind(shapes, base_scales))

  for (r in seq_len(n_reps)) {
    df_r <- gen_het(theta, n = n_obs, p = p_mask,
                    observe = observe_right_censor(tau = tau))

    row <- list(cv = cv, rep = r,
                reject_het_vs_hom = NA, reject_hom_vs_exp = NA,
                ll_exp = NA, ll_hom = NA, ll_het = NA,
                aic_exp = NA, aic_hom = NA, aic_het = NA,
                bic_exp = NA, bic_hom = NA, bic_het = NA,
                converged_all = FALSE)

    tryCatch({
      fe <- solver_exp(df_r, par = rep(0.002, m), method = "Nelder-Mead")

      # Two-phase for hom: Nelder-Mead warm-up -> L-BFGS-B refinement
      fh_nm <- solver_hom(df_r, par = c(1, rep(500, m)),
                           method = "Nelder-Mead",
                           control = list(maxit = 5000))
      fh <- solver_hom(df_r, par = fh_nm$par,
                        method = "L-BFGS-B", lower = rep(1e-6, np_hom))

      # Two-phase for het: warm-start from hom, Nelder-Mead -> L-BFGS-B
      het_start <- as.numeric(rbind(rep(fh$par[1], m), fh$par[-1]))
      fw_nm <- solver_het(df_r, par = het_start,
                           method = "Nelder-Mead",
                           control = list(maxit = 10000))
      fw <- solver_het(df_r, par = fw_nm$par,
                        method = "L-BFGS-B", lower = rep(1e-6, np_het))

      if (fe$converged && fh$converged && fw$converged) {
        row$ll_exp <- fe$loglik
        row$ll_hom <- fh$loglik
        row$ll_het <- fw$loglik

        row$aic_exp <- -2 * fe$loglik + 2 * np_exp
        row$aic_hom <- -2 * fh$loglik + 2 * np_hom
        row$aic_het <- -2 * fw$loglik + 2 * np_het

        row$bic_exp <- -2 * fe$loglik + log(n_obs) * np_exp
        row$bic_hom <- -2 * fh$loglik + log(n_obs) * np_hom
        row$bic_het <- -2 * fw$loglik + log(n_obs) * np_het

        LRT_het <- -2 * (fh$loglik - fw$loglik)
        LRT_hom <- -2 * (fe$loglik - fh$loglik)

        row$reject_het_vs_hom <- pchisq(LRT_het, df = m - 1,
                                          lower.tail = FALSE) < alpha
        row$reject_hom_vs_exp <- pchisq(LRT_hom, df = 1,
                                          lower.tail = FALSE) < alpha
        row$converged_all <- TRUE
      }
    }, error = function(e) NULL)

    results_list[[idx]] <- row
    idx <- idx + 1
  }

  cat(sprintf("CV = %.0f%% complete\n", cv * 100))
}

power_study <- do.call(rbind, lapply(results_list, as.data.frame))

Rejection rates

ps <- mc_data$power_study
ps_conv <- ps[ps$converged_all, ]

rej_summary <- do.call(rbind, lapply(split(ps_conv, ps_conv$cv), function(d) {
  data.frame(
    CV_pct = d$cv[1] * 100,
    n_converged = nrow(d),
    reject_het_vs_hom = mean(d$reject_het_vs_hom),
    reject_hom_vs_exp = mean(d$reject_hom_vs_exp)
  )
}))

knitr::kable(rej_summary, digits = 3, row.names = FALSE,
             caption = sprintf(
               "Rejection rates at alpha = %.2f (n = %d, m = %d)",
               mc_data$config$alpha, mc_data$config$n, mc_data$config$m),
             col.names = c("Shape CV (%)", "Converged",
                          "Reject Hom (Het vs Hom)",
                          "Reject Exp (Hom vs Exp)"))
Rejection rates at alpha = 0.05 (n = 500, m = 5)
Shape CV (%) Converged Reject Hom (Het vs Hom) Reject Exp (Hom vs Exp)
0 200 0.045 1.00
5 200 0.050 1.00
10 200 0.145 1.00
20 200 0.350 1.00
30 200 0.860 0.78
50 200 1.000 0.99

Power curve

plot(rej_summary$CV_pct, rej_summary$reject_het_vs_hom,
     type = "b", pch = 19, col = "steelblue", lwd = 2,
     xlab = "Coefficient of Variation of Shapes (%)",
     ylab = "Rejection Rate",
     ylim = c(0, 1),
     main = "Power of the LRT Cascade")
lines(rej_summary$CV_pct, rej_summary$reject_hom_vs_exp,
      type = "b", pch = 17, col = "firebrick", lwd = 2, lty = 2)
abline(h = mc_data$config$alpha, lty = 3, col = "gray50")
legend("right",
       c("Het vs Hom (shape heterogeneity)",
         "Hom vs Exp (aging)",
         expression(alpha == 0.05)),
       col = c("steelblue", "firebrick", "gray50"),
       lty = c(1, 2, 3), pch = c(19, 17, NA), lwd = 2, cex = 0.85)
grid()

At CV = 0% (no heterogeneity), the rejection rate of the shape heterogeneity test should be near the nominal level \(\alpha = 0.05\) — the Type I error rate. As CV increases, the power rises, eventually reaching near 1 at CV = 50%.

The aging test (hom vs exp) maintains high rejection rates across all CV levels, because the baseline shape \(\bar{k} = 1.5\) is far from 1 regardless of how much the individual shapes vary around it.

AIC/BIC model selection accuracy

We also evaluate how often each information criterion selects the correct model. At CV = 0, the correct model is homogeneous Weibull. At CV > 0, it is heterogeneous Weibull.

aic_correct <- do.call(rbind, lapply(split(ps_conv, ps_conv$cv), function(d) {
  cv <- d$cv[1]

  aic_choice <- ifelse(d$aic_het < d$aic_hom & d$aic_het < d$aic_exp, "het",
                ifelse(d$aic_hom < d$aic_exp, "hom", "exp"))
  bic_choice <- ifelse(d$bic_het < d$bic_hom & d$bic_het < d$bic_exp, "het",
                ifelse(d$bic_hom < d$bic_exp, "hom", "exp"))

  correct_model <- if (cv == 0) "hom" else "het"

  data.frame(
    CV_pct = cv * 100,
    AIC_correct = mean(aic_choice == correct_model),
    BIC_correct = mean(bic_choice == correct_model)
  )
}))

knitr::kable(aic_correct, digits = 3, row.names = FALSE,
             caption = "Proportion selecting the correct model",
             col.names = c("Shape CV (%)", "AIC Correct", "BIC Correct"))
Proportion selecting the correct model
Shape CV (%) AIC Correct BIC Correct
0 0.925 1.000
5 0.080 0.000
10 0.210 0.000
20 0.445 0.025
30 0.920 0.405
50 1.000 1.000

Practical Guidelines

Based on the Monte Carlo study and general theory, we offer the following recommendations for practitioners:

Shape CV LRT Power Recommended Approach
0% (identical) \(\sim\alpha\) (nominal) Homogeneous Weibull — no benefit to extra parameters
\(< 10\%\) Low (\(< 20\%\)) Prefer homogeneous — heterogeneity is undetectable
10–25% Moderate Run the cascade; decision depends on \(n\) and censoring
\(> 25\%\) High (\(> 80\%\)) Heterogeneous Weibull — shape differences are reliably detected

The aging test is almost always decisive. When the true common shape is far from 1, even moderate sample sizes (\(n \geq 200\)) overwhelmingly reject the exponential model. The choice between exponential and Weibull is rarely ambiguous.

The heterogeneity test is the crux. Power depends strongly on how different the component shapes are. With \(n = 500\) and \(m = 5\), the test has good power for CV \(\geq 25\%\) but is nearly blind to CV \(< 10\%\). Practitioners should consider:

For extensive sensitivity analysis varying \(n\), \(m\), \(p\), and censoring schemes, see the companion paper on model selection for masked series systems.