Adjusting for Censoring and Confounding Bias by IP Weighting
Introduction
In the context of causal inference, adjusting for both censoring bias and confounding bias is crucial, particularly in survival analysis where right-censored data often complicates causal effect estimation. Right censoring occurs when the outcome of interest (e.g., time to an event) is not observed within the study period for some subjects, making it challenging to correctly assess the causal effect of a treatment. Moreover, confounding bias arises when treatment assignment is influenced by pre-treatment covariates, potentially leading to biased estimates of treatment effects if not properly accounted for.
To address these issues, inverse probability weights (IPW) are commonly employed. IPW adjusts for confounding by reweighting observations based on their treatment probabilities given covariates. In addition, inverse probability of censoring weights (IPCW) further adjust for censoring by reweighting observations based on their probabilities of being uncensored. Together, these techniques allow us to estimate causal effects in the presence of both confounding and censoring biases, providing more accurate insights into the relationships between treatments and outcomes.
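As a rough illustration of how the two sets of weights are put together in practice, here is a minimal sketch (my own, not taken from the cited references; the toy data, column names, and logistic models are all illustrative assumptions): fit one model for treatment given covariates, one for remaining uncensored given treatment and covariates, and weight each uncensored record by the product of the two inverse fitted probabilities.

```python
# Minimal sketch of combining IPW and IPCW weights (illustrative assumptions only).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({"x": rng.normal(size=n)})            # pre-treatment covariate
df["w"] = rng.binomial(1, 1 / (1 + np.exp(-df["x"])))   # treatment depends on x
df["s"] = rng.binomial(1, 0.6 + 0.2 * df["w"])          # 1 = uncensored (e.g., sold)

# Propensity score e(X) = P(W = 1 | X)
e_model = LogisticRegression().fit(df[["x"]], df["w"])
e_hat = e_model.predict_proba(df[["x"]])[:, 1]

# Censoring score c(W, L) = P(S = 1 | W, L); here L = {x} for simplicity
c_model = LogisticRegression().fit(df[["x", "w"]], df["s"])
c_hat = c_model.predict_proba(df[["x", "w"]])[:, 1]

# Combined weight: inverse probability of the received treatment times
# inverse probability of being uncensored.
p_treat = np.where(df["w"] == 1, e_hat, 1 - e_hat)
df["ip_weight"] = 1.0 / (p_treat * c_hat)               # used only where s == 1
```

In the downstream analysis, only the uncensored rows (those with `s == 1`) enter the weighted estimator, each counted `ip_weight` times.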
This post explains how to apply IP weights to adjust for these biases, with references to key resources:

- Hernán and Robins' Causal Inference: What If (2020), especially Chapters 8.5 and 12.6, covers censoring and missing data in detail.
- Zubizarreta et al.'s Handbook of Matching and Weighting Adjustments for Causal Inference (2023), Chapter 21, discusses treatment heterogeneity in survival outcomes.

These resources form the basis for our explanation of using IP weights in survival analysis.
IP Weighting
Imagine we want to estimate the causal effect of eco-friendly packaging on a product’s selling price. However, the selling price is right-censored — meaning we only observe the price for items that have been sold, while unsold items remain censored (i.e., their final selling price is unknown).
Setting:
- Let $W \in \{0,1\}$ be a binary treatment variable (e.g., $W = 1$ if the item has an eco-friendly package; $W = 0$ otherwise)
- Let $Y(w, s)$ be the potential outcome under treatment level $w$ and non-censoring status $s$
- Let $S = \textbf{1} \{C = 0\}$ be the non-censoring indicator, where $C \in \{0,1\}$ is the censoring indicator
    - e.g., $S = 1$ if the item is uncensored (in this case, sold); $S = 0$ if the item is censored (in this case, still on sale)
- Let $X, L \in \mathcal{X}$ be two sets of covariates with $X \subset L$. Then there exists a function $f(\cdot)$ such that $X = f(L)$ (a toy simulation of this setting is sketched right after this list)
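To make the setting concrete, here is a toy simulation sketch (illustrative only; the covariate names, coefficients, and functional forms are my assumptions, not part of the references): $L$ holds two covariates, $X = f(L)$ keeps one of them, treatment depends on $X$, censoring depends on $(W, L)$, and the selling price is observed only when the item is sold.

```python
# Toy simulation of the setting (illustrative assumptions only).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5_000
L = pd.DataFrame({"brand_score": rng.normal(size=n),
                  "shelf_weeks": rng.exponential(2.0, size=n)})
X = L[["brand_score"]]                                   # X = f(L): a subset of L

# Treatment: eco-friendly packaging, more likely for high brand_score items
e_X = 1 / (1 + np.exp(-X["brand_score"]))
W = rng.binomial(1, e_X)

# Non-censoring indicator S = 1{C = 0}: the item is sold within the study period
c_WL = 1 / (1 + np.exp(-(0.5 + 0.5 * W - 0.3 * L["shelf_weeks"])))
S = rng.binomial(1, c_WL)

# Potential selling prices and the observed outcome (missing when censored)
Y1 = 50 + 5 * L["brand_score"] + rng.normal(size=n)      # price with eco packaging
Y0 = 48 + 5 * L["brand_score"] + rng.normal(size=n)      # price without
Y = np.where(S == 1, np.where(W == 1, Y1, Y0), np.nan)   # observed only if sold
```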
Considering $Y(W = w, S = s)$, our analysis was necessarily restricted to uncensored individuals, i.e., those with $S = 1$, because those were the only ones with known values of the outcome $Y$. Thus, the causal effect of interest is:

$$ \tau = \mathbb{E}\{Y(w = 1, s = 1)\} - \mathbb{E}\{Y(w = 0, s = 1)\} $$

Assumptions:
Recall the three identification conditions in Chapter 8.5 (Hernán and Robins, 2020, pp. 113-114). Note that the book uses $Y, A, C, L$ to represent the outcome, treatment, censoring indicator, and covariates, respectively.
“First, the average outcome in the uncensored individuals must equal the unobserved average outcome in the censored individuals with the same values of $A$ and $L$. This provision will be satisfied … if the variables in $A$ and $L$ are sufficient to block all backdoor paths between $C$ and $Y$.”
“Second, IP weighting requires that all conditional probabilities of being uncensored given $A$ and the variables in $L$ must be greater than zero.”
“The third condition is consistency, including sufficiently well-defined interventions. IP weighting is used to create a pseudo-population in which censoring $C$ has been abolished, and in which the effect of the treatment $A$ is the same as in the original population.”
Following the above three identification conditions, we make the following assumptions. Denote the propensity score by $e(\cdot)$ and the censoring score by $c(\cdot)$:

$$ \begin{alignat}{2} &\text{1. (Unconfoundedness):} \quad && \{Y(w= 0, s = 1), Y(w= 1, s = 1)\} \perp \!\!\! \perp W \mid X \\[1.5em] &\text{2. (Overlap):} \quad && 0 < e(X) := \mathbb{P}(W = 1 \mid X) < 1 \\[1.5em] &\text{3. (Ignorable censoring):} \quad && \{Y(w= 0, s = 1), Y(w= 1, s = 1)\} \perp \!\!\! \perp S \mid (W, L) \\[1.5em] &\text{4. (Positivity):} \quad && c(W, L) := \mathbb{P}(S = 1 \mid W, L) > 0 \end{alignat} $$

Claim:
$$ \begin{align} \tau &= \mathbb{E}\left[Y(w= 1, s = 1) - Y(w= 0, s = 1)\right] \\[1em] &= \mathbb{E}\left[\frac{S W Y}{c(W, L) e(X)} - \frac{S (1-W) Y}{c(W, L) (1-e(X))}\right] \quad \tag{2} \end{align} $$

Proof:
First, note that eqn (2) is well defined because of A2 and A4.

I only give the proof for $$\mathbb{E}\left[Y(w= 1, s = 1)\right] = \mathbb{E}\left[\frac{S W Y}{c(W, L) e(X)}\right],$$ as the $\mathbb{E}\left[Y(w= 0, s = 1)\right]$ part follows the same logic.
Recall that,
- By our setting, $X = f(L)$ for some known function $f$
- $c(W, L) := \mathbb{P}(S = 1 \mid W, L) = \mathbb{E}\left[S \mid W,L \right]$
Then we have,
$$ \begin{aligned} \mathbb{E}\left[\frac{S W Y}{c(W, L) e(X)}\right] &= \mathbb{E}\left\{\frac{W}{e(f(L))\, c(W,L)} \cdot \mathbb{E}\left[SY(w, s=1) \,\middle|\, W,L \right] \right\} &&\text{(by LIE)} \\[1.5em] & = \mathbb{E}\left\{\frac{W}{e(f(L))\, c(W,L)} \cdot \mathbb{E}\left[S \,\middle|\, W,L \right] \cdot \mathbb{E}\left[Y(w, s=1) \,\middle|\, W,L \right] \right\} &&\text{(by A3)} \\[1.5em] & = \mathbb{E}\left\{\frac{W}{e(f(L))} \cdot \mathbb{E}\left[Y(w, s=1) \,\middle|\, W,L \right] \right\} \\[1.5em] & = \mathbb{E}\left\{\mathbb{E}\left[\frac{W}{e(f(L))} \cdot Y(w=1, s=1) \,\middle|\, W,L \right] \right\} &&\text{(by SUTVA)} \\[1.5em] & = \mathbb{E}\left[\frac{W}{e(X)} \cdot Y(w=1, s=1) \right] \\[1.5em] & = \mathbb{E}\left[Y(w=1, s=1) \right] &&\text{(by A1 and IPW)} \end{aligned} $$

The last equality holds by the classical proof of the unbiasedness of the inverse propensity score weighting (IPW) estimator; for more details, see Theorem 11.3 on pages 158-159 of Peng Ding's textbook (Ding, 2023).
Q.E.D.
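As a sanity check on the claim, here is a quick Monte Carlo sketch (my own illustration, mirroring the toy setting above) that plugs the true scores $e(X)$ and $c(W, L)$ into the right-hand side of eqn (2); by construction $\tau = 2$, and the weighted estimator should recover it on average.

```python
# Monte Carlo check of eqn (2) with the true scores (a sketch, not from the references).
import numpy as np

rng = np.random.default_rng(2)

def one_replication(n=20_000):
    x = rng.normal(size=n)                        # X = f(L): covariate in both X and L
    u = rng.exponential(2.0, size=n)              # extra covariate only in L
    e = 1 / (1 + np.exp(-x))                      # true propensity score e(X)
    w = rng.binomial(1, e)
    c = 1 / (1 + np.exp(-(0.5 + 0.5 * w - 0.3 * u)))  # true censoring score c(W, L)
    s = rng.binomial(1, c)
    y1 = 50 + 5 * x + rng.normal(size=n)
    y0 = 48 + 5 * x + rng.normal(size=n)
    y = np.where(w == 1, y1, y0)                  # consistency; y enters only where s == 1
    # Plug-in version of eqn (2): weight uncensored treated and control units
    return np.mean(s * w * y / (c * e)) - np.mean(s * (1 - w) * y / (c * (1 - e)))

estimates = np.array([one_replication() for _ in range(200)])
print(f"mean estimate = {estimates.mean():.3f} (true tau = 2), sd = {estimates.std():.3f}")
```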
Future Work
How can we create a doubly robust estimator based on the above setting? One related line of work is the causal survival forest (Cui et al., 2023), which estimates heterogeneous treatment effects in the time-to-event setting and enjoys the double robustness property. But sometimes we are more interested in the downstream outcomes (e.g., selling price) after the event (e.g., being sold). This motivates us to create a new estimator that is doubly robust…
References
Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
Zubizarreta, J. R., Stuart, E. A., Small, D. S., & Rosenbaum, P. R. (2023). Handbook of Matching and Weighting Adjustments for Causal Inference. CRC Press.
Cheng, C., Li, F., Thomas, L. E., & Li, F. (Frank). (2022). Addressing Extreme Propensity Scores in Estimating Counterfactual Survival Functions via the Overlap Weights. American Journal of Epidemiology, 191(6), 1140–1151. https://doi.org/10.1093/aje/kwac043
YouTube tutorial about IPCW: “Survival Analysis, Censoring and Time Scales”
Cui, Y., Kosorok, M. R., Sverdrup, E., Wager, S., & Zhu, R. (2023). Estimating heterogeneous treatment effects with right-censored data via causal survival forests. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2), 179–211. https://doi.org/10.1093/jrsssb/qkac001
Ding, P. (2023). A First Course in Causal Inference. arXiv. https://doi.org/10.48550/arXiv.2305.18793