Calibrate Your Nuisances: A Simple Fix for Doubly Robust Inference
Doubly Robust Inference via Calibration
TL;DR
Van der Laan, Luedtke, and Carone introduce “calibrated debiased machine learning” (calibrated DML), a method that achieves doubly robust asymptotic normality for causal inference estimators by simply adding an isotonic regression calibration step to standard DML pipelines. The key innovation is that valid inference requires only one of two nuisance functions (outcome regression or propensity score) to be estimated well—the other can converge arbitrarily slowly or even inconsistently. This bridges a long-standing gap where consistency was doubly robust but inference was not, and the method can be implemented by adding just a few lines of code to existing workflows.
What is this paper about?
Doubly robust estimators like AIPW are popular for estimating average treatment effects because they remain consistent if either the outcome regression or the propensity score is correctly specified. However, there's a crucial asymmetry: while consistency requires only one nuisance to be correct, valid inference (asymptotic normality, correct confidence intervals) typically requires both nuisances to converge at sufficiently fast rates.
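For concreteness, here is the standard AIPW estimator of the ATE and the remainder term driving this asymmetry (standard notation, sketched here rather than quoted from the paper):

```latex
\hat\psi_{\text{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat\mu_1(X_i)-\hat\mu_0(X_i)
  + \frac{A_i\,(Y_i-\hat\mu_1(X_i))}{\hat\pi(X_i)}
  - \frac{(1-A_i)\,(Y_i-\hat\mu_0(X_i))}{1-\hat\pi(X_i)}\right]
```

Its bias is a second-order remainder bounded by the product of the nuisance estimation errors, roughly \(\lVert\hat\mu_a-\mu_a\rVert \cdot \lVert\hat\pi-\pi\rVert\), so root-n inference requires this product to be \(o(n^{-1/2})\)—a condition no single well-estimated nuisance can deliver on its own.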
Motivation
The authors ask: can we achieve doubly robust inference using a simple, general-purpose procedure that works with any machine learning estimator?
What do the authors do?
The authors develop calibrated DML, a two-step procedure that:
- Step 1: takes cross-fitted nuisance estimators from any machine learning algorithm;
- Step 2: calibrates them using isotonic regression before constructing the debiased estimator.
The calibration step ensures that nuisance estimates satisfy certain empirical orthogonality conditions that linearize the bias term in the doubly robust expansion. Specifically, they calibrate the outcome regression using squared error loss and the Riesz representer (inverse propensity weights for ATE) using a tailored Riesz loss.
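To make the recipe concrete, here is a minimal sketch of the two-step procedure for the ATE using scikit-learn's IsotonicRegression. This is an illustration, not the authors' implementation: their package uses xgboost, and their Riesz-loss calibration of the inverse weights is simplified here to calibrating the propensity score against treatment and inverting it.

```python
# Minimal sketch of calibrated AIPW for the ATE, assuming cross-fitted
# nuisance predictions are already in hand. Illustrative only; see the
# authors' calibratedDML package for the actual implementation.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def calibrated_aipw(y, a, mu1, mu0, pi):
    """y: outcomes; a: binary treatment; mu1/mu0: cross-fitted outcome
    regressions; pi: cross-fitted propensity scores (all 1-d arrays)."""
    # Step 2a: calibrate each outcome regression with isotonic regression
    # (squared-error loss), fit within the corresponding treatment arm.
    mu1_c = IsotonicRegression(out_of_bounds="clip").fit(mu1[a == 1], y[a == 1]).predict(mu1)
    mu0_c = IsotonicRegression(out_of_bounds="clip").fit(mu0[a == 0], y[a == 0]).predict(mu0)
    # Step 2b: calibrate the propensity score against the treatment indicator,
    # then invert (a simplification of the paper's Riesz-loss calibration).
    pi_c = IsotonicRegression(y_min=0.01, y_max=0.99, out_of_bounds="clip").fit(pi, a).predict(pi)
    # Debiased (AIPW) estimator and standard error from the calibrated nuisances.
    psi = mu1_c - mu0_c + a * (y - mu1_c) / pi_c - (1 - a) * (y - mu0_c) / (1 - pi_c)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))
```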
The paper proves that this calibration is sufficient to achieve "doubly robust asymptotic linearity" (DRAL)—meaning the estimator is asymptotically normal whenever at least one nuisance converges at a sufficiently fast rate, even if the other converges arbitrarily slowly or is inconsistent.
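In symbols, asymptotic linearity means the estimator admits an influence-function expansion (a generic definition, not quoted from the paper):

```latex
\hat\psi - \psi_0 = \frac{1}{n}\sum_{i=1}^{n}\varphi(Z_i) + o_P(n^{-1/2}),
\qquad\text{so}\qquad
\sqrt{n}\,(\hat\psi - \psi_0) \rightsquigarrow N\big(0,\operatorname{Var}\varphi(Z)\big),
```

which is what licenses Wald-type confidence intervals; DRAL asserts that such an expansion holds under the weaker, either-or nuisance condition.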
Why is this important?
Applied researchers using flexible ML methods (random forests, neural networks, gradient boosting) for nuisance estimation face a dilemma: these methods provide consistency but may not converge fast enough to guarantee valid inference, especially in moderate dimensions. The standard "product rate" condition—requiring the two nuisance errors to have product o(n^{-1/2}), e.g., each nuisance converging faster than n^{-1/4}—is hard to verify and can easily fail for such learners.
The practical benefits are substantial: better finite-sample coverage (e.g., improving from 32% to 90% coverage in ACIC-2017 simulations), reduced bias, and a method that integrates into existing pipelines with minimal code changes.
The theoretical contribution is equally significant—it establishes a novel connection between prediction calibration and causal inference validity, showing that calibration of nuisances provides the “debiasing” needed for doubly robust inference.
Who should care?
Applied researchers estimating treatment effects with observational data using ML for nuisance estimation will find immediate practical value—this method provides insurance against nuisance misspecification without computational overhead.
Econometricians and biostatisticians working on semiparametric inference will appreciate the theoretical framework connecting calibration to doubly robust properties.
Methodologists developing new estimands can use this as a general recipe: the approach applies to any linear functional of the outcome regression (counterfactual means, ATE, partial covariance, survival outcomes under missingness), not just the ATE.
Researchers in policy evaluation, epidemiology, marketing, and tech who regularly use AIPW-style estimators should consider adopting calibrated DML as a default given its robustness benefits.
Do we have code?
Yes, the authors provide both R and Python implementations.
An R package calibratedDML and Python code are available on GitHub at https://github.com/Larsvanderlaan/calibratedDML. The paper includes complete code listings for calibrating inverse propensity weights and outcome regressions using xgboost with monotonicity constraints.
The implementation is straightforward: isotonic regression is performed using gradient-boosted trees with monotone_constraints=1 and a single boosting round, making it computationally efficient and easy to integrate into existing DML workflows.
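As a rough illustration of that trick (a sketch under assumed hyperparameters, not the paper's exact listing), an isotonic-style calibrator can be fit in xgboost as follows:

```python
# Isotonic-style calibration via xgboost: a single deep tree with a
# monotonicity constraint approximates an isotonic regression fit.
# Hyperparameters here are illustrative assumptions, not the authors' settings.
import numpy as np
import xgboost as xgb

def isotonic_calibrate(preds, targets):
    dtrain = xgb.DMatrix(np.asarray(preds).reshape(-1, 1), label=targets)
    params = {
        "max_depth": 15,                # deep tree ≈ piecewise-constant isotonic fit
        "min_child_weight": 20,         # limits how finely the calibrator can split
        "eta": 1.0,                     # full-size step so a single round suffices
        "monotone_constraints": "(1)",  # force the fitted function to be non-decreasing
        "objective": "reg:squarederror",
    }
    booster = xgb.train(params, dtrain, num_boost_round=1)
    return booster.predict(dtrain)
```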
In summary, this paper provides a remarkably practical solution to a longstanding theoretical problem. The insight that isotonic calibration—a standard tool from prediction—can unlock doubly robust inference is both elegant and immediately actionable. For anyone running AIPW or related estimators with ML nuisances, calibrating cross-fitted estimates before debiasing is now the obvious default: it costs almost nothing computationally and provides genuine protection against the scenario where one of your nuisance models is less reliable than you hoped.
Reference
van der Laan, Lars, Alex Luedtke, and Marco Carone (2024), “Doubly robust inference via calibration,” arXiv preprint arXiv:2411.02771.