Intuition for Doubly Robust Estimator

Introduction

To estimate the average treatment effect (ATE), we can use either outcome regression or inverse propensity weighting (IPW). While each approach has merits, combining them offers a significant advantage: double robustness. In this post, I summarize the intuition for the doubly robust estimator from Professor Ding's awesome textbook (Ding, 2024), and connect this framework to debiased machine learning (DML) through Riesz representation theory. Understanding these connections gives insight into how modern causal inference methods correct for bias in treatment effect estimation.

Two characterizations of the ATE

Assumption 1 (Basic setting).
SUTVA, unconfoundedness $\{Y(1), Y(0)\} \perp D \mid X$, and overlap $0 < e(X) < 1$.

Let $Y(0), Y(1)$ be the potential outcomes and $D$ be a binary treatment variable. Consider the ATE, $\tau = E\{Y(1) - Y(0)\}$.

First, we can use the outcome regression,

$$\tau = E\{\mu_1(X) - \mu_0(X)\},$$

where $\mu_1(X) = E\{Y(1) \mid X\} = E\{Y \mid D=1, X\}$ and $\mu_0(X) = E\{Y(0) \mid X\} = E\{Y \mid D=0, X\}$.

Second, we can use the inverse propensity score weighting (IPW) approach,

$$\tau = E\left\{\frac{DY}{e(X)}\right\} - E\left\{\frac{(1-D)Y}{1-e(X)}\right\},$$

where $e(X) = P(D=1 \mid X)$ is the propensity score.

The IPW estimator completely ignores the outcome model. However, if there exist covariates $X$ that are predictive of $Y$, then even when the outcome model is misspecified, including it can reduce the variance compared to using IPW alone.
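To make this concrete, here is a minimal sample-average sketch of the IPW estimator above (my addition, not from the original post; `e_hat` is an illustrative name for propensity scores fitted by any model you trust):

```python
import numpy as np

def ipw_ate(y, d, e_hat):
    """Plug-in IPW estimate of the ATE.

    y: outcomes; d: binary treatment indicator;
    e_hat: fitted propensity scores, estimates of P(D=1 | X).
    All arguments are numpy arrays of the same length.
    """
    return np.mean(d * y / e_hat) - np.mean((1 - d) * y / (1 - e_hat))
```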

Key Insights

This motivates combining the two approaches to:

1. Reduce the variance of the IPW estimator

2. Reduce the bias of the outcome regression

Reducing the Variance

$$\mu_1 = E\{Y(1)\} = E\{Y(1) - \mu_1(X, \beta_1)\} + E\{\mu_1(X, \beta_1)\},$$

where $\mu_1(X, \beta_1)$ is a (possibly misspecified) working model for $\mu_1(X)$.

Idea: View $Y - \mu_1(X, \beta_1)$ as a “pseudo potential outcome”, then apply IPW to it:

$$\mu_1 = E\left\{\frac{D\{Y - \mu_1(X, \beta_1)\}}{e(X)}\right\} + E\{\mu_1(X, \beta_1)\} = E\left\{\frac{D\{Y - \mu_1(X, \beta_1)\}}{e(X)} + \mu_1(X, \beta_1)\right\}.$$

Similarly,

$$\mu_0 = E\left\{\frac{(1-D)\{Y - \mu_0(X, \beta_0)\}}{1-e(X)}\right\} + E\{\mu_0(X, \beta_0)\} = E\left\{\frac{(1-D)\{Y - \mu_0(X, \beta_0)\}}{1-e(X)} + \mu_0(X, \beta_0)\right\}.$$

Notice that $\mu_1 - \mu_0$ gives us the augmented IPW (AIPW) estimator.
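In sample form, a minimal AIPW sketch reads as follows (again my addition; `mu1_hat`, `mu0_hat`, and `e_hat` are illustrative names for fitted nuisance values):

```python
import numpy as np

def aipw_ate(y, d, e_hat, mu1_hat, mu0_hat):
    """AIPW estimate of the ATE: outcome regression plus IPW-weighted residuals.

    mu1_hat, mu0_hat: fitted values of E[Y | D=1, X] and E[Y | D=0, X];
    e_hat: fitted propensity scores. All arguments are numpy arrays.
    """
    mu1 = np.mean(d * (y - mu1_hat) / e_hat + mu1_hat)
    mu0 = np.mean((1 - d) * (y - mu0_hat) / (1 - e_hat) + mu0_hat)
    return mu1 - mu0
```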

Reducing the Bias

$$\mu_1 = E\left\{\frac{D\{Y - \mu_1(X, \beta_1)\}}{e(X)}\right\} + E\{\mu_1(X, \beta_1)\}.$$

Idea: We can view $Y - \mu_1(X, \beta_1)$ as the regression residual, to which we apply IPW to extract the remaining signal. Alternatively, we can view $\mu_1(X, \beta_1) - Y$ as the bias of the outcome regression, which we then correct using IPW.
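To see why the first term corrects the bias, here is a short derivation (my addition; it assumes the propensity score is correctly specified and uses unconfoundedness plus iterated expectations):

$$E\left\{\frac{D\{Y - \mu_1(X, \beta_1)\}}{e(X)}\right\} = E\left\{\frac{e(X)\{\mu_1(X) - \mu_1(X, \beta_1)\}}{e(X)}\right\} = \mu_1 - E\{\mu_1(X, \beta_1)\},$$

which is exactly the bias of the outcome-regression term $E\{\mu_1(X, \beta_1)\}$, so the sum of the two terms recovers $\mu_1$ even when $\mu_1(X, \beta_1)$ is misspecified.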

Connecting to DML: Riesz Representation for Bias Correction

In the generic debiased framework (Chernozhukov, Newey, and Singh 2022), we leverage the Riesz representer to correct for bias, paralleling the way we used IPW above to extract signal from the regression residuals.

  • The data is $Z = \{Y, D, X\}$.

  • Let $g$ be the outcome regression, $g(D, X) := E[Y \mid D, X]$.

  • Suppose the parameter solves a moment condition $\theta - E[m(Z; g)] = 0$ for some moment function $m(\cdot)$ that is linear in $g$; for the ATE, $\theta = E[m(Z; g)] := E[g(1, X) - g(0, X)]$.

Then the debiased version of the moment condition is of the form

$$\theta - E[m(Z; g) + a(D, X)\{Y - g(D, X)\}] = 0,$$

where

$a(D, X)$ is the Riesz representer of the linear functional $L(g) := E[m(Z; g)]$. The existence of $a(D, X)$ is guaranteed by the Riesz representation theorem:

$$\forall g: \quad E[m(Z; g)] = E[a(D, X)\, g(D, X)].$$

As we consider the ATE, the Riesz representer is just the familiar “inverse propensity score” term,

$$a(D, X) := \frac{D}{e(X)} - \frac{1-D}{1-e(X)}.$$

Consider the expression

$$E[m(Z; g) + a(D, X)\{Y - g(D, X)\}].$$

The key intuition here is that $Y - g(D, X)$ is the residual, and the Riesz representer $a(D, X)$ weights it to correct the bias of the plug-in moment. For the ATE, this debiased moment is exactly the AIPW estimand from before.
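As a sanity check, the Riesz identity can be verified numerically. The following simulation sketch (my addition; it uses the true nuisances rather than fitted ones, for clarity) confirms both the representation $E[m(Z; g)] = E[a(D, X) g(D, X)]$ and that the debiased moment recovers the ATE:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))            # true propensity score e(X)
d = rng.binomial(1, e)
g1, g0 = 2 + x, x                   # true g(1, X) and g(0, X); true ATE = 2
y = np.where(d == 1, g1, g0) + rng.normal(size=n)

a = d / e - (1 - d) / (1 - e)       # Riesz representer a(D, X) for the ATE
g = np.where(d == 1, g1, g0)        # g(D, X) evaluated at the observed D

# Riesz representation: E[m(Z; g)] = E[a(D, X) g(D, X)]
print(np.mean(g1 - g0), np.mean(a * g))   # both approximately 2

# Debiased moment = plug-in moment + Riesz-weighted residual (the AIPW moment)
print(np.mean(g1 - g0 + a * (y - g)))     # approximately 2
```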

Doubly robust estimation of the ATT

What about the doubly robust estimator for the ATT? Can we use the same idea to derive it? Yes!

Assumption 2 ("one-sided" unconfoundedness and overlap).
$D \perp Y(0) \mid X$ and $e(X) < 1$.
Theorem 1.
Under the "one-sided" unconfoundedness and overlap assumption,

$$E\{Y(0) \mid D=1\} = \frac{1}{e} E\left\{\frac{e(X)}{1-e(X)}(1-D)Y\right\} \tag{1}$$

and

$$\tau_T = E(Y \mid D=1) - E\left\{\frac{e(X)}{e}\,\frac{1-D}{1-e(X)}\,Y\right\},$$

where $e = P(D=1)$ is the marginal probability of the treatment.

We also have a doubly robust estimator for $E\{Y(0) \mid D=1\}$ which combines the propensity score and the outcome models.

Theorem 2.
Define

$$\tilde{\mu}_{0T}^{\mathrm{dr}} := \frac{1}{e} E\left[\frac{e(X, \alpha)}{1-e(X, \alpha)}\,(1-D)\{Y - \mu_0(X, \beta_0)\} + D\,\mu_0(X, \beta_0)\right].$$

Under Assumption 2, if either $e(X, \alpha) = e(X)$ or $\mu_0(X, \beta_0) = \mu_0(X)$, then $\tilde{\mu}_{0T}^{\mathrm{dr}} = E\{Y(0) \mid D=1\}$.

How to come up with $\tilde{\mu}_{0T}^{\mathrm{dr}}$? Exactly the same idea as before!

$$E\{Y(0) \mid D=1\} = E\{Y(0) - \mu_0(X, \beta_0) \mid D=1\} + E\{\mu_0(X, \beta_0) \mid D=1\}.$$

Now, we can view $Y(0) - \mu_0(X, \beta_0)$ as a “pseudo potential outcome” under the control and apply eqn (1) to weight it, which gives exactly the form of $\tilde{\mu}_{0T}^{\mathrm{dr}}$.
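In sample form, this yields a minimal doubly robust ATT sketch (my addition; `e_hat` and `mu0_hat` are illustrative names for fitted nuisances):

```python
import numpy as np

def dr_att(y, d, e_hat, mu0_hat):
    """Doubly robust ATT estimate based on Theorem 2.

    e_hat: fitted propensity scores e(X, alpha);
    mu0_hat: fitted control outcome model mu_0(X, beta_0).
    All arguments are numpy arrays of the same length.
    """
    e_marg = np.mean(d)                  # e = P(D = 1)
    odds = e_hat / (1 - e_hat)           # weights e(X) / (1 - e(X)) for controls
    y0_dr = np.mean(odds * (1 - d) * (y - mu0_hat) + d * mu0_hat) / e_marg
    return np.mean(y[d == 1]) - y0_dr
```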

References

Ding, Peng (2024), A First Course in Causal Inference, Chapman & Hall.

Chernozhukov, Victor, Whitney K. Newey, and Rahul Singh (2022), “Automatic Debiased Machine Learning of Causal and Structural Effects,” Econometrica, 90 (3), 967–1027.
