causaldef supports survival analysis and competing risks. This is critical for clinical applications where “time-to-event” is the primary outcome.
The executable survival examples in this vignette require a compatible survival runtime. In the current support matrix, that means R >= 4.0.
We use causal_spec_survival() to handle censoring and event times.
library(causaldef)
library(ggplot2) # Ensure ggplot2 is loaded
# DAG Helper
plot_dag <- function(coords, edges, title = NULL) {
edges_df <- merge(edges, coords, by.x = "from", by.y = "name")
colnames(edges_df)[c(3,4)] <- c("x_start", "y_start")
edges_df <- merge(edges_df, coords, by.x = "to", by.y = "name")
colnames(edges_df)[c(5,6)] <- c("x_end", "y_end")
ggplot2::ggplot(coords, ggplot2::aes(x = x, y = y)) +
ggplot2::geom_segment(data = edges_df, ggplot2::aes(x = x_start, y = y_start, xend = x_end, yend = y_end),
arrow = ggplot2::arrow(length = ggplot2::unit(0.3, "cm"), type = "closed"),
color = "gray40", size = 1, alpha = 0.8) +
ggplot2::geom_point(size = 14, color = "white", fill = "#CD5C5C", shape = 21, stroke = 1.5) + # IndianRed
ggplot2::geom_text(ggplot2::aes(label = name), fontface = "bold", size = 3.5, color = "white") +
ggplot2::ggtitle(title) +
ggplot2::theme_void(base_size = 14) +
ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5, face = "bold", margin = ggplot2::margin(b = 10))) +
ggplot2::coord_fixed()
}
data(hct_outcomes)
head(hct_outcomes)
#> id age disease_status kps donor_type conditioning_intensity time_to_event
#> 1 1 57 Early 89 HLA-Matched Myeloablative 14.31
#> 2 2 50 Early 85 Mismatched Reduced 11.05
#> 3 3 59 Intermediate 97 Mismatched Myeloablative 20.23
#> 4 4 65 Early 84 Unrelated Myeloablative 2.90
#> 5 5 54 Advanced 96 Unrelated Reduced 2.22
#> 6 6 48 Early 64 Mismatched Myeloablative 1.25
#> event_status
#> 1 Relapse
#> 2 Relapse
#> 3 Relapse
#> 4 Relapse
#> 5 Relapse
#> 6 DeathWe want to estimate the effect of conditioning_intensity on time_to_event for Death, treating “Relapse” as a competing event.
spec_surv <- causal_spec_survival(
data = hct_outcomes,
treatment = "conditioning_intensity",
time = "time_to_event",
event = "event_status",
# Note: The event setup needs binary 0/1 or Surv object logic
# We will convert it internally or preprocess
covariates = c("age", "disease_status", "kps"),
estimand = "ATE" # Difference in survival probability
)Note: In the current version, ensure your event column follows standard R survival conventions (0=censored, 1=event) or use proper factor levels.
We will analyze the hct_outcomes dataset to estimate the effect of conditioning intensity on the risk of relapse, while accounting for the competing risk of death without relapse.
In a competing risks setting, we need to distinguish between the event of interest (Relapse) and the competing event (Death). We create separate indicators for proper specification.
# Create indicator for Primary Event (Relapse)
hct_outcomes$relapse <- ifelse(hct_outcomes$event_status == "Relapse", 1, 0)
# Create indicator for Competing Event (Death)
# Patients who die without relapse are removed from the risk set for relapse
# but are NOT merely censored in the same way as loss-to-follow-up.
hct_outcomes$death_cr <- ifelse(hct_outcomes$event_status == "Death", 1, 0)
# Check the distribution of events
# Check the distribution of events
table(hct_outcomes$event_status)
# Visualize Competing Risks
coords <- data.frame(
name = c("Covariates", "CondIntensity", "Relapse", "Death"),
x = c(0, -1.5, 1.5, 1.5),
y = c(1, 0, 0.5, -0.5)
)
edges <- data.frame(
from = c("Covariates", "Covariates", "Covariates", "CondIntensity", "CondIntensity"),
to = c("CondIntensity", "Relapse", "Death", "Relapse", "Death")
)
plot_dag(coords, edges, title = "Competing Risks Structure")We using causal_spec_survival() to fully define the structure, including the competing event.
spec_hct <- causal_spec_survival(
data = hct_outcomes,
treatment = "conditioning_intensity",
time = "time_to_event",
event = "relapse", # Primary event of interest
competing_event = "death_cr", # Competing event
covariates = c("age", "disease_status", "kps", "donor_type"),
estimand = "ATE"
)
print(spec_hct)We compare unadjusted analysis (Naive) with Inverse Probability of Treatment Weighting (IPTW) to see if we can reduce the causal deficiency.
# Estimate deficiency
# Note: In survival settings, this measures how well we can recover the
# interventional survival curves.
results_hct <- estimate_deficiency(
spec_hct,
methods = c("unadjusted", "iptw"),
n_boot = 0 # Set > 0 for bootstrap confidence intervals
)
print(results_hct)Clinical Interpretation of Deficiency:
The deficiency \(\delta\) quantifies how much information is lost by using observational data instead of a randomized trial:
For this HCT Study:
Clinical Significance:
Why This Matters:
In HCT, patients receiving myeloablative conditioning are typically younger and healthier (confounding by indication). IPTW attempts to create a “synthetic randomization” by upweighting underrepresented patient types. Low \(\delta\) validates this approach.
To validate our findings and translate them into clinical metrics, we use standard survival models. We calculate the Restricted Mean Survival Time (RMST), which provides an interpretable effect size (difference in life expectancy up to a specific time horizon).
Why RMST and not Hazard Ratios? In a decision-theoretic framework, we need an estimand that maps directly to a utility function (e.g., “months of life gained”). Hazard ratios are relative measures that are difficult to translate into absolute utility bounds (\(M\)) needed to interpret regret bounds (transfer penalty \(M\delta\) and minimax floor \((M/2)\delta\)). RMST, being on the time scale (e.g., “months gained”), allows us to define \(M\) concretely (e.g., the horizon), making these guarantees interpretable.
# 1. Visualize Survival Curves
# We can use the standard survival package to visualize the unadjusted curves first
library(survival)
hct_outcomes$rfs_event <- ifelse(hct_outcomes$event_status != "Censored", 1, 0)
surv_obj <- Surv(time = hct_outcomes$time_to_event, event = hct_outcomes$rfs_event)
km_fit_naive <- survfit(surv_obj ~ conditioning_intensity, data = hct_outcomes)
plot(km_fit_naive, col = c("blue", "red"), lwd = 2,
main = "Relapse-Free Survival (Unadjusted)",
xlab = "Time (Months)", ylab = "Survival Probability")
legend("topright", legend = levels(hct_outcomes$conditioning_intensity),
col = c("blue", "red"), lwd = 2)
# 2. Estimate Causal Effect (RMST Difference)
# We use estimate_effect() on our deficiency object to compute the RMST difference
# using the IPTW weights we found were effective (low deficiency).
# Horizon = 24 months
horizon <- 24
# We reuse the deficiency object (and its learned weights) for the new estimand.
# Since deficiency measures the distributional match, the same weights apply
# to any outcome derived from the survival curve.
results_hct$spec$estimand <- "RMST"
results_hct$spec$horizon <- 24
effect_iptw <- estimate_effect(
results_hct,
target_method = "iptw",
contrast = c("Myeloablative", "Reduced") # Defined as Treated vs Control
)
print(effect_iptw)
# Compare with Unadjusted Effect
effect_naive <- estimate_effect(
results_hct,
target_method = "unadjusted",
contrast = c("Myeloablative", "Reduced")
)
print(effect_naive)
cat(sprintf("\nCausal RMST Difference (IPTW): %.2f months\n", effect_iptw$estimate))
cat(sprintf("Biased RMST Difference (Unadjusted): %.2f months\n", effect_naive$estimate))Interpreting RMST (Restricted Mean Survival Time) Results:
RMST is the average time patients survive (relapse-free) up to the time horizon (here, 24 months). The difference in RMST is the average survival benefit of one conditioning regimen over another.
What These Numbers Mean:
Clinical Decision-Making:
Example Interpretation:
If IPTW RMST difference = +2.5 months favoring Myeloablative:
“After adjusting for age, disease status, and comorbidity index, patients receiving myeloablative conditioning had 2.5 additional months of relapse-free survival on average within the first 24 months post-transplant, compared to reduced-intensity conditioning.”
Caution:
RMST is truncated at the horizon (24 months). Differences beyond 24 months are not captured. Choose the horizon based on clinical relevance (e.g., 1-year, 2-year, 5-year survival).
The RMST difference provides the observed treatment effect. However, this is biased if treatment assignment was confounded.
The deficiency \(\delta\) we calculated earlier tells us how much we can trust this estimate.
policy_regret_bound() converts this information gap into decision-relevant regret bounds on the same scale as the utility (here: months of RMST up to the horizon).# Calculate policy regret bounds (transfer penalty + minimax floor)
# Utility range is [0, horizon] (months of survival).
# The bound is tied to the previously estimated IPTW deficiency proxy.
bound <- policy_regret_bound(results_hct, utility_range = c(0, horizon), method = "iptw")
print(bound)Clinical Interpretation (Transfer Penalty and Minimax Safety Floor):
The bounds answer: “How much regret could confounding add to my decision, and what regret is unavoidable without stronger assumptions?”
Formulas:
where \(M\) is the utility range (here, the RMST horizon in months) and \(\delta\) is the deficiency proxy.
What It Means:
Clinical Decision Framework (use Transfer Penalty):
If Transfer Penalty < 1 month:
[PASS] Proceed with confidence – Observational evidence is decision-relevant at this horizon.
If 1 \(\leq\) Transfer Penalty < 3 months:
[CAUTION] Proceed with acknowledgment – Often acceptable, but document the bound: “Based on observational evidence with transfer penalty of X months…”
Example:
If \(\delta\) = 0.02 (excellent IPTW performance) and \(M\) = 24 months:
Transfer Penalty = \(24 \times 0.02\) = 0.48 months
Minimax Safety Floor = \((24/2) \times 0.02\) = 0.24 months
Interpretation: If the observed RMST difference is 4 months, then even the conservative transfer-penalty ratio is roughly \(4/0.48 \approx 8:1\), suggesting the decision is robust at this horizon (subject to the modeling assumptions).
Reporting to Clinicians:
“Our observational analysis suggests a 4-month survival advantage for myeloablative conditioning (95% CI: [2.1, 5.9]). After adjustment, the deficiency proxy is \(\delta \approx 0.02\) at a 24-month horizon, implying a transfer penalty of about 0.5 months and a minimax safety floor of about 0.25 months. These bounds are well below the observed benefit, supporting the recommendation, though an RCT would eliminate this residual uncertainty.”
The key insight:
The deficiency \(\delta\) and regret bounds work together:
Both should be reported to provide complete transparency about observational evidence quality.