The Weibull Nesting Chain

Three levels of complexity

The three models form a strict nesting chain: \[ \underbrace{\text{Exponential}}_{\lambda_1, \ldots, \lambda_m} \;\subset\; \underbrace{\text{Hom.\ Weibull}}_{k,\, \beta_1, \ldots, \beta_m} \;\subset\; \underbrace{\text{Het.\ Weibull}}_{k_1, \beta_1, \ldots, k_m, \beta_m} \]

Model	Constructor	Parameters	Count
Exponential	`exp_series_md_c1_c2_c3()`	\(\lambda_1, \ldots, \lambda_m\)	\(m\)
Hom. Weibull	`wei_series_homogeneous_md_c1_c2_c3()`	\(k, \beta_1, \ldots, \beta_m\)	\(m + 1\)
Het. Weibull	`wei_series_md_c1_c2_c3()`	\(k_1, \beta_1, \ldots, k_m, \beta_m\)	\(2m\)

The two LRT steps

Each nesting defines a hypothesis test:

Step 1: Shape heterogeneity. Does the data support different shapes for different components? \[ H_0: k_1 = k_2 = \cdots = k_m \quad \text{(homogeneous)} \quad\text{vs}\quad H_1: \text{shapes unrestricted (heterogeneous)} \] The LRT statistic is \(\Lambda = -2[\ell(\hat\theta_{\text{hom}}) - \ell(\hat\theta_{\text{het}})]\), which under \(H_0\) follows \(\chi^2_{m-1}\) asymptotically.

Step 2: Aging. If the homogeneous model is retained, does the common shape differ from 1 (exponential)? \[ H_0: k = 1 \quad \text{(exponential)} \quad\text{vs}\quad H_1: k \neq 1 \quad \text{(homogeneous Weibull)} \] The LRT statistic follows \(\chi^2_1\) under \(H_0\).

Physical interpretation

The nesting chain reflects increasing physical realism:

Exponential \(\to\) Homogeneous Weibull tests whether components age (\(k > 1\), increasing hazard) or exhibit infant mortality (\(k < 1\), decreasing hazard). Rejecting \(k = 1\) means the constant-hazard assumption is inadequate.
Homogeneous \(\to\) Heterogeneous Weibull tests whether components have different aging patterns. Rejecting equal shapes indicates at least one component has a fundamentally different failure mechanism — for example, an electronic component with infant mortality alongside a bearing with wear-out.

Top-down testing cascade

We recommend a top-down approach: start with the most general model and simplify until a restriction is rejected:

Fit all three models to the data.
Test heterogeneous vs homogeneous. If we cannot reject \(H_0\), adopt the homogeneous model.
Test homogeneous vs exponential. If we cannot reject \(H_0\), adopt the exponential model.

This approach controls the risk of prematurely simplifying the model.

Worked Example: Homogeneous True Model

We generate data from a 5-component homogeneous Weibull system — the “middle” model in the chain — and show that the LRT correctly identifies it.

Setup and data generation

m <- 5
theta_hom <- c(k = 1.5, beta1 = 300, beta2 = 400, beta3 = 500,
               beta4 = 600, beta5 = 700)

model_hom <- wei_series_homogeneous_md_c1_c2_c3()

# Right-censoring time at the 80th percentile of the system lifetime
beta_sys <- wei_series_system_scale(theta_hom[1], theta_hom[-1])
tau <- qweibull(0.8, shape = theta_hom[1], scale = beta_sys)

cat("System scale:", round(beta_sys, 1), "\n")
#> System scale: 152.9
cat("Right-censoring time (q = 0.8):", round(tau, 1), "\n")
#> Right-censoring time (q = 0.8): 210

set.seed(2024)
gen_hom <- rdata(model_hom)

df <- gen_hom(theta_hom, n = 300, p = 0.2,
              observe = observe_right_censor(tau = tau))

cat("Observation types:\n")
#> Observation types:
print(table(df$omega))
#> 
#> exact right 
#>   246    54

Fitting all three models

# Exponential (m params)
model_exp <- exp_series_md_c1_c2_c3()
fit_e <- fit(model_exp)(df, par = rep(0.002, m), method = "Nelder-Mead")

# Homogeneous Weibull (m+1 params)
fit_h <- fit(model_hom)(df, par = c(1, rep(500, m)),
             method = "Nelder-Mead", control = list(maxit = 5000))

# Heterogeneous Weibull (2m params) — warm-start from hom fit
model_het <- wei_series_md_c1_c2_c3()
het_start <- as.numeric(rbind(rep(fit_h$par[1], m), fit_h$par[-1]))
fit_w <- fit(model_het)(df, par = het_start,
             method = "Nelder-Mead", control = list(maxit = 10000))

Likelihood ratio tests

np_exp <- m
np_hom <- m + 1
np_het <- 2 * m
n_obs <- nrow(df)

# LRT: heterogeneous vs homogeneous
LRT_shape <- -2 * (fit_h$loglik - fit_w$loglik)
df_shape <- np_het - np_hom   # m - 1 = 4
p_shape <- pchisq(LRT_shape, df = df_shape, lower.tail = FALSE)

# LRT: homogeneous vs exponential
LRT_aging <- -2 * (fit_e$loglik - fit_h$loglik)
df_aging <- np_hom - np_exp   # 1
p_aging <- pchisq(LRT_aging, df = df_aging, lower.tail = FALSE)

# AIC and BIC
aic_val <- function(ll, k) -2 * ll + 2 * k
bic_val <- function(ll, k, n) -2 * ll + log(n) * k

comparison <- data.frame(
  Model = c("Exponential", "Hom. Weibull", "Het. Weibull"),
  Params = c(np_exp, np_hom, np_het),
  LogLik = c(fit_e$loglik, fit_h$loglik, fit_w$loglik),
  AIC = c(aic_val(fit_e$loglik, np_exp),
          aic_val(fit_h$loglik, np_hom),
          aic_val(fit_w$loglik, np_het)),
  BIC = c(bic_val(fit_e$loglik, np_exp, n_obs),
          bic_val(fit_h$loglik, np_hom, n_obs),
          bic_val(fit_w$loglik, np_het, n_obs))
)

knitr::kable(comparison, digits = 2,
             caption = "Model comparison (true model: homogeneous Weibull)",
             col.names = c("Model", "# Params", "Log-Lik", "AIC", "BIC"))

Model comparison (true model: homogeneous Weibull)
Model	# Params	Log-Lik	AIC	BIC
Exponential	5	-1728	3465	3484
Hom. Weibull	6	-1698	3407	3430
Het. Weibull	10	-1697	3414	3451

cat("Step 1: Het vs Hom (shape heterogeneity)\n")
#> Step 1: Het vs Hom (shape heterogeneity)
cat(sprintf("  LRT = %.2f, df = %d, p = %.4f\n", LRT_shape, df_shape, p_shape))
#>   LRT = 1.36, df = 4, p = 0.8514
if (p_shape < 0.05) {
  cat("  Decision: Reject homogeneous -> adopt heterogeneous model\n\n")
} else {
  cat("  Decision: Fail to reject -> retain homogeneous model\n\n")
}
#>   Decision: Fail to reject -> retain homogeneous model

cat("Step 2: Hom vs Exp (aging)\n")
#> Step 2: Hom vs Exp (aging)
cat(sprintf("  LRT = %.2f, df = %d, p = %.4f\n", LRT_aging, df_aging, p_aging))
#>   LRT = 59.58, df = 1, p = 0.0000
if (p_aging < 0.05) {
  cat("  Decision: Reject exponential -> adopt homogeneous model\n")
} else {
  cat("  Decision: Fail to reject -> retain exponential model\n")
}
#>   Decision: Reject exponential -> adopt homogeneous model

Since the true model is homogeneous Weibull with \(k = 1.5\), we expect:

The heterogeneous vs homogeneous test to fail to reject — the additional shape parameters do not significantly improve the fit.
The homogeneous vs exponential test to reject — the data shows clear evidence of \(k \neq 1\).

The cascade correctly identifies the homogeneous model. Both AIC and BIC should agree, with the homogeneous model having the smallest values.

Worked Example: Heterogeneous True Model

We now generate data from a system where components have meaningfully different shapes, and show that the LRT detects this.

# Shapes span DFR to strong IFR
theta_het_true <- c(0.8, 300,    # DFR (electronics)
                    1.0, 400,    # CFR (random)
                    1.5, 500,    # mild IFR (seals)
                    2.0, 600,    # moderate IFR (bearings)
                    2.5, 700)    # strong IFR (wear-out)

model_het <- wei_series_md_c1_c2_c3()
gen_het <- rdata(model_het)

set.seed(7231)
df_het <- gen_het(theta_het_true, n = 300, p = 0.2,
                  observe = observe_right_censor(tau = tau))

cat("Observation types:\n")
#> Observation types:
print(table(df_het$omega))
#> 
#> exact right 
#>   239    61

fit_e2 <- fit(model_exp)(df_het, par = rep(0.002, m), method = "Nelder-Mead")
fit_h2 <- fit(model_hom)(df_het, par = c(1, rep(500, m)),
              method = "Nelder-Mead", control = list(maxit = 5000))
het_start2 <- as.numeric(rbind(rep(fit_h2$par[1], m), fit_h2$par[-1]))
fit_w2 <- fit(model_het)(df_het, par = het_start2,
              method = "Nelder-Mead", control = list(maxit = 10000))

# LRT: heterogeneous vs homogeneous
LRT_shape2 <- -2 * (fit_h2$loglik - fit_w2$loglik)
p_shape2 <- pchisq(LRT_shape2, df = m - 1, lower.tail = FALSE)

# LRT: homogeneous vs exponential
LRT_aging2 <- -2 * (fit_e2$loglik - fit_h2$loglik)
p_aging2 <- pchisq(LRT_aging2, df = 1, lower.tail = FALSE)

comparison2 <- data.frame(
  Model = c("Exponential", "Hom. Weibull", "Het. Weibull"),
  Params = c(np_exp, np_hom, np_het),
  LogLik = c(fit_e2$loglik, fit_h2$loglik, fit_w2$loglik),
  AIC = c(aic_val(fit_e2$loglik, np_exp),
          aic_val(fit_h2$loglik, np_hom),
          aic_val(fit_w2$loglik, np_het)),
  BIC = c(bic_val(fit_e2$loglik, np_exp, nrow(df_het)),
          bic_val(fit_h2$loglik, np_hom, nrow(df_het)),
          bic_val(fit_w2$loglik, np_het, nrow(df_het)))
)

knitr::kable(comparison2, digits = 2,
             caption = "Model comparison (true model: heterogeneous Weibull)",
             col.names = c("Model", "# Params", "Log-Lik", "AIC", "BIC"))

Model comparison (true model: heterogeneous Weibull)
Model	# Params	Log-Lik	AIC	BIC
Exponential	5	-1617	3244	3263
Hom. Weibull	6	-1617	3246	3268
Het. Weibull	10	-1538	3095	3132


cat(sprintf("\nStep 1: Het vs Hom — LRT = %.2f, df = %d, p = %.4f\n",
            LRT_shape2, m - 1, p_shape2))
#> 
#> Step 1: Het vs Hom — LRT = 158.43, df = 4, p = 0.0000
cat(sprintf("Step 2: Hom vs Exp — LRT = %.2f, df = %d, p = %.4f\n",
            LRT_aging2, 1, p_aging2))
#> Step 2: Hom vs Exp — LRT = 0.29, df = 1, p = 0.5902

With true shapes spanning 0.8 to 2.5, the cascade correctly:

Rejects the homogeneous constraint (step 1) — the shapes are genuinely different. The cascade stops here and adopts the heterogeneous model.

Note that the homogeneous vs exponential test (step 2) may fail to reject even though component aging is present. This is because the pooled shape estimate averages across DFR (\(k < 1\)) and IFR (\(k > 1\)) components, yielding \(\hat{k} \approx 1\) — which is close to exponential. The top-down cascade avoids this trap by testing heterogeneity first.

Monte Carlo Study: Power of the Heterogeneity Test

The worked examples show the cascade on individual datasets. We now assess the statistical power of the shape heterogeneity test (step 1) as a function of how different the component shapes actually are.

Design

We parameterize shape heterogeneity by the coefficient of variation (CV) of the shape parameters. All \(m = 5\) shapes are set to \(k_j = \bar{k}(1 + \text{cv} \cdot z_j)\) where \(\bar{k} = 1.5\) and \(z_1, \ldots, z_m\) are fixed offsets with mean 0 and standard deviation 1.

CV	Meaning
0%	All shapes equal — homogeneous model is correct
5–10%	Mild heterogeneity — hard to detect
20–30%	Moderate heterogeneity — detectable with adequate \(n\)
50%	Strong heterogeneity — reliably detected

Other settings: \(n = 500\) observations, \(p = 0.2\) (masking probability), right-censoring at the 80th percentile of the system lifetime under the homogeneous baseline, \(\alpha = 0.05\).

set.seed(42)

m <- 5
base_k <- 1.5
base_scales <- c(300, 400, 500, 600, 700)
cv_levels <- c(0, 0.05, 0.10, 0.20, 0.30, 0.50)
n_reps <- 200
n_obs <- 500
p_mask <- 0.2
alpha <- 0.05

# Fixed offsets: mean 0, sd 1
offsets <- c(-2, -1, 0, 1, 2)
offsets <- offsets / sd(offsets)

# Right-censoring time (based on homogeneous baseline at CV = 0)
beta_sys <- wei_series_system_scale(base_k, base_scales)
tau <- qweibull(0.8, shape = base_k, scale = beta_sys)

# Models and solvers
model_exp <- exp_series_md_c1_c2_c3()
model_hom <- wei_series_homogeneous_md_c1_c2_c3()
model_het <- wei_series_md_c1_c2_c3()

solver_exp <- fit(model_exp)
solver_hom <- fit(model_hom)
solver_het <- fit(model_het)

gen_het <- rdata(model_het)

np_exp <- m
np_hom <- m + 1
np_het <- 2 * m

results_list <- vector("list", length(cv_levels) * n_reps)
idx <- 1

for (cv in cv_levels) {
  shapes <- base_k * (1 + cv * offsets)
  theta <- as.numeric(rbind(shapes, base_scales))

  for (r in seq_len(n_reps)) {
    df_r <- gen_het(theta, n = n_obs, p = p_mask,
                    observe = observe_right_censor(tau = tau))

    row <- list(cv = cv, rep = r,
                reject_het_vs_hom = NA, reject_hom_vs_exp = NA,
                ll_exp = NA, ll_hom = NA, ll_het = NA,
                aic_exp = NA, aic_hom = NA, aic_het = NA,
                bic_exp = NA, bic_hom = NA, bic_het = NA,
                converged_all = FALSE)

    tryCatch({
      fe <- solver_exp(df_r, par = rep(0.002, m), method = "Nelder-Mead")

      # Two-phase for hom: Nelder-Mead warm-up -> L-BFGS-B refinement
      fh_nm <- solver_hom(df_r, par = c(1, rep(500, m)),
                           method = "Nelder-Mead",
                           control = list(maxit = 5000))
      fh <- solver_hom(df_r, par = fh_nm$par,
                        method = "L-BFGS-B", lower = rep(1e-6, np_hom))

      # Two-phase for het: warm-start from hom, Nelder-Mead -> L-BFGS-B
      het_start <- as.numeric(rbind(rep(fh$par[1], m), fh$par[-1]))
      fw_nm <- solver_het(df_r, par = het_start,
                           method = "Nelder-Mead",
                           control = list(maxit = 10000))
      fw <- solver_het(df_r, par = fw_nm$par,
                        method = "L-BFGS-B", lower = rep(1e-6, np_het))

      if (fe$converged && fh$converged && fw$converged) {
        row$ll_exp <- fe$loglik
        row$ll_hom <- fh$loglik
        row$ll_het <- fw$loglik

        row$aic_exp <- -2 * fe$loglik + 2 * np_exp
        row$aic_hom <- -2 * fh$loglik + 2 * np_hom
        row$aic_het <- -2 * fw$loglik + 2 * np_het

        row$bic_exp <- -2 * fe$loglik + log(n_obs) * np_exp
        row$bic_hom <- -2 * fh$loglik + log(n_obs) * np_hom
        row$bic_het <- -2 * fw$loglik + log(n_obs) * np_het

        LRT_het <- -2 * (fh$loglik - fw$loglik)
        LRT_hom <- -2 * (fe$loglik - fh$loglik)

        row$reject_het_vs_hom <- pchisq(LRT_het, df = m - 1,
                                          lower.tail = FALSE) < alpha
        row$reject_hom_vs_exp <- pchisq(LRT_hom, df = 1,
                                          lower.tail = FALSE) < alpha
        row$converged_all <- TRUE
      }
    }, error = function(e) NULL)

    results_list[[idx]] <- row
    idx <- idx + 1
  }

  cat(sprintf("CV = %.0f%% complete\n", cv * 100))
}

power_study <- do.call(rbind, lapply(results_list, as.data.frame))

Rejection rates

ps <- mc_data$power_study
ps_conv <- ps[ps$converged_all, ]

rej_summary <- do.call(rbind, lapply(split(ps_conv, ps_conv$cv), function(d) {
  data.frame(
    CV_pct = d$cv[1] * 100,
    n_converged = nrow(d),
    reject_het_vs_hom = mean(d$reject_het_vs_hom),
    reject_hom_vs_exp = mean(d$reject_hom_vs_exp)
  )
}))

knitr::kable(rej_summary, digits = 3, row.names = FALSE,
             caption = sprintf(
               "Rejection rates at alpha = %.2f (n = %d, m = %d)",
               mc_data$config$alpha, mc_data$config$n, mc_data$config$m),
             col.names = c("Shape CV (%)", "Converged",
                          "Reject Hom (Het vs Hom)",
                          "Reject Exp (Hom vs Exp)"))

Rejection rates at alpha = 0.05 (n = 500, m = 5)
Shape CV (%)	Converged	Reject Hom (Het vs Hom)	Reject Exp (Hom vs Exp)
0	200	0.045	1.00
5	200	0.050	1.00
10	200	0.145	1.00
20	200	0.350	1.00
30	200	0.860	0.78
50	200	1.000	0.99

Power curve

plot(rej_summary$CV_pct, rej_summary$reject_het_vs_hom,
     type = "b", pch = 19, col = "steelblue", lwd = 2,
     xlab = "Coefficient of Variation of Shapes (%)",
     ylab = "Rejection Rate",
     ylim = c(0, 1),
     main = "Power of the LRT Cascade")
lines(rej_summary$CV_pct, rej_summary$reject_hom_vs_exp,
      type = "b", pch = 17, col = "firebrick", lwd = 2, lty = 2)
abline(h = mc_data$config$alpha, lty = 3, col = "gray50")
legend("right",
       c("Het vs Hom (shape heterogeneity)",
         "Hom vs Exp (aging)",
         expression(alpha == 0.05)),
       col = c("steelblue", "firebrick", "gray50"),
       lty = c(1, 2, 3), pch = c(19, 17, NA), lwd = 2, cex = 0.85)
grid()

At CV = 0% (no heterogeneity), the rejection rate of the shape heterogeneity test should be near the nominal level \(\alpha = 0.05\) — the Type I error rate. As CV increases, the power rises, eventually reaching near 1 at CV = 50%.

The aging test (hom vs exp) maintains high rejection rates across all CV levels, because the baseline shape \(\bar{k} = 1.5\) is far from 1 regardless of how much the individual shapes vary around it.

AIC/BIC model selection accuracy

We also evaluate how often each information criterion selects the correct model. At CV = 0, the correct model is homogeneous Weibull. At CV > 0, it is heterogeneous Weibull.

aic_correct <- do.call(rbind, lapply(split(ps_conv, ps_conv$cv), function(d) {
  cv <- d$cv[1]

  aic_choice <- ifelse(d$aic_het < d$aic_hom & d$aic_het < d$aic_exp, "het",
                ifelse(d$aic_hom < d$aic_exp, "hom", "exp"))
  bic_choice <- ifelse(d$bic_het < d$bic_hom & d$bic_het < d$bic_exp, "het",
                ifelse(d$bic_hom < d$bic_exp, "hom", "exp"))

  correct_model <- if (cv == 0) "hom" else "het"

  data.frame(
    CV_pct = cv * 100,
    AIC_correct = mean(aic_choice == correct_model),
    BIC_correct = mean(bic_choice == correct_model)
  )
}))

knitr::kable(aic_correct, digits = 3, row.names = FALSE,
             caption = "Proportion selecting the correct model",
             col.names = c("Shape CV (%)", "AIC Correct", "BIC Correct"))

Proportion selecting the correct model
Shape CV (%)	AIC Correct	BIC Correct
0	0.925	1.000
5	0.080	0.000
10	0.210	0.000
20	0.445	0.025
30	0.920	0.405
50	1.000	1.000

Practical Guidelines

Based on the Monte Carlo study and general theory, we offer the following recommendations for practitioners:

Shape CV	LRT Power	Recommended Approach
0% (identical)	\(\sim\alpha\) (nominal)	Homogeneous Weibull — no benefit to extra parameters
\(< 10\%\)	Low (\(< 20\%\))	Prefer homogeneous — heterogeneity is undetectable
10–25%	Moderate	Run the cascade; decision depends on \(n\) and censoring
\(> 25\%\)	High (\(> 80\%\))	Heterogeneous Weibull — shape differences are reliably detected

The aging test is almost always decisive. When the true common shape is far from 1, even moderate sample sizes (\(n \geq 200\)) overwhelmingly reject the exponential model. The choice between exponential and Weibull is rarely ambiguous.

The heterogeneity test is the crux. Power depends strongly on how different the component shapes are. With \(n = 500\) and \(m = 5\), the test has good power for CV \(\geq 25\%\) but is nearly blind to CV \(< 10\%\). Practitioners should consider:

Increasing sample size if moderate heterogeneity is suspected
Reducing censoring to provide more information per observation
Using AIC/BIC as complementary criteria — BIC tends to favor simpler models while AIC is more sensitive to heterogeneity

For extensive sensitivity analysis varying \(n\), \(m\), \(p\), and censoring schemes, see the companion paper on model selection for masked series systems.

Model Selection for Masked Series Systems via Likelihood Ratio Tests

Introduction