Notes on Causal Survival Forest 🛟🌲
In this post, I provide summary notes on the paper “Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests” by Cui et al. (2023).
Motivation
How to estimate heterogeneous treatment effects with right-censored data?
- Heterogeneous treatment effect (HTE) estimation plays a central role in data-driven personalization
- Existing methods often can’t handle censored survival outcomes, which are common in medical and business applications
Causal Survival Forests (CSF)
To address this challenge, the paper proposes causal survival forests (CSF):
- An adaptation of the causal forest algorithm of Athey et al. (2019)
- It adjusts for censoring using doubly robust estimating equations developed in the survival analysis literature
Advantages
- Robust, computationally tractable, and outperforms available baselines in the paper's experiments
- Good statistical properties: UCAN
Statistical Setting
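Assume i.i.d. tuples $\{X_i, T_i, C_i, W_i\}$, $i = 1, \dots, n$, where
- $X_i \in \mathcal{X}$ denotes covariates
- $T_i \in \mathbb{R}_+$ is the survival time for the $i$-th unit
- $C_i \in \mathbb{R}_+$ is the censoring time (the time at which the $i$-th unit gets censored)
- $W_i \in \{0, 1\}$ denotes treatment assignment
Using the potential outcome framework, posit potential outcomes $\{T_i(0), T_i(1)\}$ and define the conditional average treatment effect (CATE) as
$$\tau(x) = \mathbb{E}\left[y(T_i(1)) - y(T_i(0)) \mid X_i = x\right] \tag{1}$$
where
- $y(\cdot)$ is a known, deterministic outcome transformation
- $y(T) = T \wedge h$ for the restricted mean survival time (RMST); here $h$ is some chosen maximum considered time
- $y(T) = 1\{T \geq h\}$ for the survival probability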
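As a small illustration of the two outcome transformations, here is a minimal Python sketch. It assumes the definitions $y(T) = \min(T, h)$ for RMST and $y(T) = 1\{T \geq h\}$ for the survival probability; the function names `y_rmst` and `y_survival_probability` and the toy numbers are hypothetical, chosen only for this note.

```python
def y_rmst(t: float, h: float) -> float:
    """RMST transformation: survival time truncated at the horizon h."""
    return min(t, h)


def y_survival_probability(t: float, h: float) -> float:
    """Survival-probability transformation: 1.0 if the unit survives to h."""
    return 1.0 if t >= h else 0.0


horizon = 2.0  # illustrative horizon, not a value from the paper
survival_times = [0.5, 1.9, 2.0, 7.3]  # toy survival times

rmst_values = [y_rmst(t, horizon) for t in survival_times]
surv_values = [y_survival_probability(t, horizon) for t in survival_times]
print(rmst_values)
print(surv_values)
```

Both transformations stop changing once the survival time passes the horizon, which is the property the finite-horizon assumption asks for.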
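To estimate the treatment effect $\tau(x)$, we cannot use $T_i$ and $C_i$ directly, because only the smaller of the two is observed. Instead, we observe
- censored survival time: $U_i = T_i \wedge C_i$
- non-censoring indicator: $\Delta_i = 1\{T_i \leq C_i\}$
Based on Assumption 1 (see later), we define the effective non-censoring indicator as follows:
$$\Delta_i^h = 1\{(T_i \wedge h) \leq C_i\} \tag{2}$$
Note that $\Delta_i^h$ in eqn (2) can be computed from observed data, since $\Delta_i^h = 1$ exactly when $\Delta_i = 1$ or $U_i \geq h$. We can regard an observation with $\Delta_i^h = 1$ as a complete observation.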
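The following Python sketch shows how the observed quantities could be built from latent survival and censoring times. It assumes the definitions $U = \min(T, C)$, $\Delta = 1\{T \leq C\}$ and $\Delta^h = 1\{\min(T, h) \leq C\}$; the exponential distributions, the horizon, and all variable names are illustrative assumptions, not part of the paper.

```python
import random

random.seed(0)
h = 2.0   # horizon (illustrative value)
n = 1000  # toy sample size

# Latent times: in real data these are never both observed.
T = [random.expovariate(1.0) for _ in range(n)]  # survival times
C = [random.expovariate(0.5) for _ in range(n)]  # censoring times

# Observed data: censored survival time and non-censoring indicator.
U = [min(t, c) for t, c in zip(T, C)]
delta = [1 if t <= c else 0 for t, c in zip(T, C)]

# Effective non-censoring indicator, computed from the latent definition...
delta_h_latent = [1 if min(t, h) <= c else 0 for t, c in zip(T, C)]
# ...and from observed data only: complete if uncensored or followed up to h.
delta_h_observed = [1 if (d == 1 or u >= h) else 0 for d, u in zip(delta, U)]

print(sum(delta), sum(delta_h_observed))
```

The two versions of the effective indicator agree, and it marks at least as many observations as complete as the plain non-censoring indicator does.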
Assumptions
In order to identify treatment effects, we need to rely on two sets of assumptions.
- Assumptions 2-4 enable us to identify the causal effect of $W_i$ on $T_i$ without censoring
- Assumptions 5-6 guarantee that censoring due to $C_i$ does not break the identification results
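Assumption 1 (Finite Horizon)
The outcome transformation $y(\cdot)$ admits a maximal horizon $0 < h < \infty$ such that $y(t) = y(h)$ for all $t \geq h$.
Assumption 2 (Potential Outcomes)
$T_i = T_i(W_i)$ almost surely.
Assumption 3 (Ignorability)
Treatment assignment is as good as random conditionally on covariates, $\{T_i(0), T_i(1)\} \perp W_i \mid X_i$.
Assumption 4 (Overlap)
Propensity score $e(x) = \mathbb{P}[W_i = 1 \mid X_i = x]$ is uniformly bounded away from 0 and 1, i.e. $\eta_e \leq e(x) \leq 1 - \eta_e$ for some $0 < \eta_e \leq 1/2$.
Assumption 5 (Ignorable censoring)
Censoring is independent of survival time conditionally on treatment and covariates, $T_i \perp C_i \mid X_i, W_i$.
Assumption 6 (Positivity)
Censoring before the horizon is never certain, $\mathbb{P}[C_i < h \mid X_i, W_i] \leq 1 - \eta_C$ for some $0 < \eta_C \leq 1$.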
Causal Forests Without Censoring
How does causal forest work?
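Essentially, we are running a “forest”-localized version of Robinson’s regression
$$\hat\tau(x) = \frac{\sum_{i=1}^n \alpha_i(x)\,(W_i - \hat e(X_i))\,(Y_i - \hat m(X_i))}{\sum_{i=1}^n \alpha_i(x)\,(W_i - \hat e(X_i))^2}$$
where $Y_i$ is a fully observed outcome and $\alpha_i(x)$ are forest weights that measure how relevant the $i$-th training sample is for estimating the treatment effect at $x$.
Using the notation of the previous section (with $Y_i = y(T_i)$), we estimate $\tau(x)$ by solving
$$\sum_{i=1}^n \alpha_i(x)\, \psi^{(c)}_{\hat\tau(x)}(X_i, y(T_i), W_i; \hat e, \hat m) = 0 \tag{cf}$$
where,
$$\psi^{(c)}_{\tau}(X_i, y(T_i), W_i; \hat e, \hat m) = [W_i - \hat e(X_i)]\,[y(T_i) - \hat m(X_i) - \tau\,(W_i - \hat e(X_i))]$$
is the orthogonal complete score function (the superscript $(c)$ stands for complete data)
- $e(x) = \mathbb{P}[W_i = 1 \mid X_i = x]$ is the propensity score
- $m(x) = \mathbb{E}[y(T_i) \mid X_i = x]$ is the conditional mean of the transformed survival time
- $\hat e$ and $\hat m$ are estimates derived via cross-fitting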
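To make the localized Robinson regression concrete, here is a minimal Python sketch. It assumes the score $(W_i - \hat e_i)\,(Y_i - \hat m_i - \tau (W_i - \hat e_i))$ weighted by forest weights $\alpha_i(x)$; the weights, outcomes, and nuisance estimates below are made-up toy numbers, and the function names are hypothetical. In a real forest the weights come from how often each sample shares a leaf with the query point $x$.

```python
def localized_robinson(alpha, w, y, e_hat, m_hat):
    """Solve sum_i alpha_i * (w_i - e_i) * (y_i - m_i - tau * (w_i - e_i)) = 0 for tau."""
    numerator = sum(
        a * (wi - ei) * (yi - mi)
        for a, wi, yi, ei, mi in zip(alpha, w, y, e_hat, m_hat)
    )
    denominator = sum(a * (wi - ei) ** 2 for a, wi, ei in zip(alpha, w, e_hat))
    return numerator / denominator


def weighted_score(tau, alpha, w, y, e_hat, m_hat):
    """Weighted sum of the complete-data scores evaluated at a candidate tau."""
    return sum(
        a * (wi - ei) * (yi - mi - tau * (wi - ei))
        for a, wi, yi, ei, mi in zip(alpha, w, y, e_hat, m_hat)
    )


# Toy inputs (hypothetical): forest weights for one query point x.
alpha = [0.4, 0.3, 0.2, 0.1, 0.0]  # the last sample gets no weight at x
w = [1, 0, 1, 0, 1]                # treatment indicators
y = [3.0, 1.0, 2.5, 1.5, 9.0]      # fully observed outcomes
e_hat = [0.5] * 5                  # propensity estimates
m_hat = [2.0] * 5                  # conditional mean estimates

tau_hat = localized_robinson(alpha, w, y, e_hat, m_hat)
print(tau_hat, weighted_score(tau_hat, alpha, w, y, e_hat, m_hat))
```

Because the score is linear in $\tau$, the estimating equation has a closed-form solution, and plugging that solution back into the weighted score returns zero up to floating-point error.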
Adjusting for Censoring via Weighting
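In the presence of censoring, the survival time $T_i$ is not observed for every unit, so the complete-data score function cannot be evaluated for all observations.
Simply ignoring censoring and building models only on complete observations (i.e. $\Delta_i^h = 1$) leads to biased estimates, because units with longer survival times are more likely to be censored.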
Simple Censoring Adjustment via IPCW
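Define the conditional survival function for the censoring process as
$$S_w^C(s \mid x) = \mathbb{P}[C_i \geq s \mid W_i = w, X_i = x]$$
Under Assumption 5, we have
$$\mathbb{P}[\Delta_i^h = 1 \mid X_i, W_i, T_i] = S_{W_i}^C(T_i \wedge h \mid X_i)$$
- the LHS is the conditional probability of observing a complete observation (i.e. $\Delta_i^h = 1$)
- the RHS is the conditional probability that the censoring time is at least as large as the survival time (truncated at $h$)
- Does the above $S_w^C(s \mid x)$ look like a propensity score function?
The main idea of IPCW estimation is to only consider complete cases, but up-weight all complete observations by $1 / S_{W_i}^C(T_i \wedge h \mid X_i)$.
As a result, IPCW estimators succeed in eliminating censoring bias.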
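To see why this re-weighting removes censoring bias, here is a short worked derivation for a generic function $g$ of the complete data. It is a sketch that assumes ignorable censoring (Assumption 5), a strictly positive $S_w^C$, the definition $S_w^C(s \mid x) = \mathbb{P}[C_i \geq s \mid W_i = w, X_i = x]$, and $\Delta_i^h = 1\{(T_i \wedge h) \leq C_i\}$; the symbol $g$ is introduced only for this illustration.

$$
\begin{aligned}
\mathbb{E}\left[\frac{\Delta_i^h \, g(X_i, W_i, T_i \wedge h)}{S_{W_i}^C(T_i \wedge h \mid X_i)} \,\Big|\, X_i, W_i, T_i\right]
&= \frac{g(X_i, W_i, T_i \wedge h)}{S_{W_i}^C(T_i \wedge h \mid X_i)} \, \mathbb{P}\left[C_i \geq T_i \wedge h \mid X_i, W_i, T_i\right] \\
&= \frac{g(X_i, W_i, T_i \wedge h)}{S_{W_i}^C(T_i \wedge h \mid X_i)} \, S_{W_i}^C(T_i \wedge h \mid X_i) \\
&= g(X_i, W_i, T_i \wedge h)
\end{aligned}
$$

The second line uses Assumption 5 to drop the conditioning on $T_i$ inside the censoring probability. Since $y(T_i) = y(T_i \wedge h)$ under the finite-horizon assumption, the complete-data score is a function of this type, so the weighted complete cases reproduce the complete-data estimating equation in expectation.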
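With IPCW, we estimate $\tau(x)$ by solving
$$\sum_{\{i:\, \Delta_i^h = 1\}} \frac{\alpha_i(x)}{\hat S_{W_i}^C(T_i \wedge h \mid X_i)}\, \psi^{(c)}_{\hat\tau(x)}(X_i, y(T_i), W_i; \hat e, \hat m) = 0 \tag{IPCW}$$
- For eqn (cf), we sum over all observations; for eqn (IPCW), we only sum over complete observations
- For eqn (IPCW), we add $1 / \hat S_{W_i}^C(T_i \wedge h \mid X_i)$ as a part of the weight (for complete observations, $T_i \wedge h = U_i \wedge h$ is observed)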
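The effect of the extra weight can be seen in a toy simulation. The Python sketch below is not a causal forest: it targets the simpler quantity $\mathbb{E}[T \wedge h]$ with no covariates or treatment, and it assumes the censoring survival function is known exactly, $S^C(s) = \exp(-0.5\,s)$. All distributions, constants, and names are illustrative assumptions.

```python
import math
import random

random.seed(1)
h = 2.0            # horizon (illustrative)
n = 20000          # toy sample size
censor_rate = 0.5  # C ~ Exponential(0.5), so S^C(s) = exp(-0.5 * s)


def censoring_survival(s: float) -> float:
    """Known censoring survival function P[C >= s] in this toy setup."""
    return math.exp(-censor_rate * s)


naive_sum, naive_count, ipcw_sum = 0.0, 0, 0.0
for _ in range(n):
    t = random.expovariate(1.0)          # latent survival time, T ~ Exponential(1)
    c = random.expovariate(censor_rate)  # latent censoring time
    t_h = min(t, h)
    if t_h <= c:  # complete observation; here t_h equals the observed min(U, h)
        naive_sum += t_h
        naive_count += 1
        ipcw_sum += t_h / censoring_survival(t_h)

truth = 1.0 - math.exp(-h)               # E[min(T, h)] for T ~ Exponential(1)
naive_estimate = naive_sum / naive_count  # complete-case average, no weights
ipcw_estimate = ipcw_sum / n              # weighted complete cases, averaged over all n

print(truth, naive_estimate, ipcw_estimate)
```

In this setup the unweighted complete-case average is pulled toward short survival times, while the weighted version lands close to the true value.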
For more details on IPCW, please check:
- Chapters 8 and 12 in the textbook Causal Inference: What If (Hernán and Robins, 2020). In particular, “Ch 12.6 Censoring and missing data” is very helpful.
- Chapter 21 “Treatment Heterogeneity with Survival Outcomes” in the textbook Handbook of Matching and Weighting Adjustments for Causal Inference (Zubizarreta et al., 2023)
A Doubly Robust Correction
Two limitations of the IPCW approach:
- It only uses complete observations; it throws away all observations with $\Delta_i^h = 0$, and this may hurt efficiency
- IPCW-type methods are generally not robust to estimation errors in the censoring model; the Neyman orthogonality condition does not hold (Chernozhukov et al. 2018)
CSF Method
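The CSF method does not rely on IPCW. Instead, it uses a doubly robust correction to make the estimating equation robust to censoring.
Recall the simplest case (without censoring), we have,
$$\sum_{i=1}^n \alpha_i(x)\, \psi^{(c)}_{\hat\tau(x)}(X_i, y(T_i), W_i; \hat e, \hat m) = 0$$
where the score function is,
$$\psi^{(c)}_{\tau}(X_i, y(T_i), W_i; \hat e, \hat m) = [W_i - \hat e(X_i)]\,[y(T_i) - \hat m(X_i) - \tau\,(W_i - \hat e(X_i))]$$
With censoring, CSF replaces $\psi^{(c)}_{\tau}$ with a score that depends only on observed data,
$$\psi_{\tau} = [W_i - \hat e(X_i)] \left( \frac{\hat Q_{W_i}(U_i \wedge h \mid X_i) + \Delta_i^h\,[y(U_i) - \hat Q_{W_i}(U_i \wedge h \mid X_i)] - \hat m(X_i) - \tau\,(W_i - \hat e(X_i))}{\hat S^C_{W_i}(U_i \wedge h \mid X_i)} - \int_0^{U_i \wedge h} \frac{\hat\lambda^C_{W_i}(s \mid X_i)}{\hat S^C_{W_i}(s \mid X_i)}\,[\hat Q_{W_i}(s \mid X_i) - \hat m(X_i) - \tau\,(W_i - \hat e(X_i))]\, ds \right)$$
where $U_i = T_i \wedge C_i$ is the censored survival time, $\Delta_i^h$ is the effective non-censoring indicator, and $S_w^C(s \mid x) = \mathbb{P}[C_i \geq s \mid W_i = w, X_i = x]$ is the conditional survival function of the censoring process;
the conditional expectation of the transformed survival time is defined as:
$$Q_w(s \mid x) = \mathbb{E}[y(T_i) \mid X_i = x, W_i = w, T_i \wedge h > s]$$
and the conditional hazard function of the censoring process is defined as:
$$\lambda_w^C(s \mid x) = -\frac{d}{ds} \log S_w^C(s \mid x)$$
Why does the score take this particular functional form? The short answer is that it emerges from the math (i.e., the desire for a doubly robust adjustment); unlike the basic AIPW formula, it is not immediately intuitive.1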
💡 KEY Points
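We should think about the Neyman-orthogonal property. In summary, CSF alleviates the drawbacks of IPCW by taking the (complete-data) causal forest estimating equation and turning it into a censoring-robust score using estimates of two nuisance processes:
- censoring process: $S_w^C(s \mid x) = \mathbb{P}[C_i \geq s \mid W_i = w, X_i = x]$
- survival process: the conditional distribution of $T_i$ given $X_i = x$ and $W_i = w$, which determines $Q_w(s \mid x) = \mathbb{E}[y(T_i) \mid X_i = x, W_i = w, T_i \wedge h > s]$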
The upshot of this “orthogonal” estimating equation is that it will be consistent if either the survival or censoring process is correctly specified, which is very beneficial when we want to estimate these by modern ML tools, such as random survival forests.
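The robustness claim can be illustrated with a toy simulation. The Python sketch below is not the CSF estimator: it targets the simpler quantity $\mathbb{E}[T \wedge h]$ with no covariates or treatment, plugs in the true survival-side function $Q(s) = \mathbb{E}[T \wedge h \mid T \wedge h > s]$ in closed form, and deliberately uses a wrong censoring rate. The augmented score follows the standard doubly robust form, $[Q(U \wedge h) + \Delta^h (y - Q(U \wedge h))] / S^C(U \wedge h) - \int_0^{U \wedge h} \lambda^C(s) / S^C(s)\, Q(s)\, ds$; all distributions, constants, and function names are assumptions made for this note.

```python
import math
import random

random.seed(2)
h = 2.0                    # horizon (illustrative)
n = 5000                   # toy sample size
true_censor_rate = 0.5     # data are generated with C ~ Exponential(0.5)
working_censor_rate = 1.0  # deliberately misspecified censoring model


def q_true(s: float) -> float:
    """E[min(T, h) | min(T, h) > s] for T ~ Exponential(1) and 0 <= s <= h."""
    return s + 1.0 - math.exp(-(h - s))


def s_c_working(s: float) -> float:
    """Working (wrong) censoring survival function."""
    return math.exp(-working_censor_rate * s)


def augmentation_integral(upper: float, steps: int = 50) -> float:
    """Midpoint rule for the integral of lambda^C(s) / S^C(s) * Q(s) over [0, upper]."""
    width = upper / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * width
        total += working_censor_rate / s_c_working(s) * q_true(s) * width
    return total


ipcw_sum, dr_sum = 0.0, 0.0
for _ in range(n):
    t = random.expovariate(1.0)
    c = random.expovariate(true_censor_rate)
    u_h = min(t, c, h)                    # observed time truncated at h
    complete = 1 if min(t, h) <= c else 0  # effective non-censoring indicator
    y = u_h                               # equals min(T, h) when complete
    ipcw_sum += complete * y / s_c_working(u_h)
    dr_sum += (
        (q_true(u_h) + complete * (y - q_true(u_h))) / s_c_working(u_h)
        - augmentation_integral(u_h)
    )

truth = 1.0 - math.exp(-h)
ipcw_estimate = ipcw_sum / n
dr_estimate = dr_sum / n
print(truth, ipcw_estimate, dr_estimate)
```

With the censoring model misspecified, the IPCW average drifts far from the truth, while the augmented score stays close because the survival-side function is correct. This only shows one direction of the double robustness, and in practice both nuisance functions are estimated rather than known.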
For more details, Rubin & van der Laan (2007) and the chapter on RCTs with time-to-event data in Targeted Learning (2011) give a more digestible account of doubly robust estimation with survival data.3
References
Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. “Generalized Random Forests.” The Annals of Statistics 47 (2): 1148–78. https://doi.org/10.1214/18-AOS1709.
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1): C1–68. https://doi.org/10.1111/ectj.12097.
Cui, Yifan, Michael R Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. 2023. “Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests.” Journal of the Royal Statistical Society Series B: Statistical Methodology 85 (2): 179–211. https://doi.org/10.1093/jrsssb/qkac001.
Hernán, Miguel A., and James M. Robins. 2020. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
Zubizarreta, José R., Elizabeth A. Stuart, Dylan S. Small, and Paul R. Rosenbaum. 2023. Handbook of Matching and Weighting Adjustments for Causal Inference. CRC Press.
1. This was suggested by Professor Wager in an email conversation.
2. Check more on grf tutorial: Causal forest with time-to-event data
3. Suggested by Erik Sverdrup. Many thanks!