A Review of Causal Inference Methods – Models and Comparison
last update: 06-05-2025
Introduction
Establishing causation from observational data remains one of the most challenging tasks in empirical research. This guide examines twelve major causal inference methods, from traditional econometric approaches (DiD, IV, RD) to modern machine learning techniques (Causal Forests, DML, Auto-DML).
Each method targets different identification challenges and data structures. Understanding their assumptions, strengths, and limitations is crucial for making credible causal claims in marketing research.
1. Difference-in-Differences (DiD)
Main Idea: DiD estimates causal effects by comparing the change in outcomes over time between a treatment and control group, assuming parallel trends in the absence of treatment.
Data Requirements:
- Panel data or repeated cross-sectional data with observations before and after treatment for treated and control groups.
- Outcome $Y_{it}$ for unit $i$ at time $t$, treatment group indicator $D_i$, and post-period indicator $Post_t$.
Key Equation:
$Y_{it} = \alpha + \beta D_i + \gamma Post_t + \delta (D_i \times Post_t) + \varepsilon_{it}$, where $\delta$ is the DiD estimate of the treatment effect.
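A minimal sketch of this regression in Python on simulated two-period panel data (all variable names illustrative), with cluster-robust standard errors in the spirit of Bertrand et al. (2004):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated two-period panel: 200 units, roughly half treated, true effect = 2.0
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n), 2),
    "post": np.tile([0, 1], n),
    "treated": np.repeat(rng.integers(0, 2, n), 2),
})
df["y"] = (1.0 + 0.5 * df["treated"] + 1.0 * df["post"]
           + 2.0 * df["treated"] * df["post"] + rng.normal(0, 1, 2 * n))

# The coefficient on the interaction term is the DiD estimate (delta)
fit = smf.ols("y ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]})
print(fit.params["treated:post"])  # ~ 2.0
```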
Assumptions:
- Parallel Trends: Absent treatment, the average change in untreated potential outcomes would have been the same in the treatment and control groups.
- No Anticipation: Treatment does not affect pre-treatment outcomes.
- Stable Unit Treatment Value Assumption (SUTVA): No spillovers between units.
- Common Support: Treated and control units are comparable.
Advantages:
- Controls for time-invariant unobserved confounders.
- Intuitive for policy evaluation.
- Robust to baseline differences.
Limitations:
- Sensitive to violations of parallel trends.
- Requires sufficient pre- and post-treatment data.
- Cannot handle time-varying confounders unless extended.
- Assumes homogeneous treatment effects unless modified.
References:
- Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.
- Bertrand, M., et al. (2004). “How Much Should We Trust Differences-in-Differences Estimates?” Quarterly Journal of Economics.
2. Regression Discontinuity (RD)
Main Idea: RD exploits a discontinuity in treatment assignment based on a continuous running variable to estimate local treatment effects at the cutoff.
Data Requirements:
- Cross-sectional or panel data with a continuous running variable $X_i$, outcome $Y_i$, and treatment indicator $D_i = \mathbf{1}\{X_i \ge c\}$ for a known cutoff $c$ (sharp RD).
Key Equation:
$\tau_{RD} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$
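A minimal local linear sketch, assuming a sharp cutoff at zero and a hand-picked bandwidth (real applications should use a data-driven bandwidth selector):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1, 1, n)            # running variable, cutoff c = 0
d = (x >= 0).astype(float)           # sharp treatment assignment
y = 1 + 0.8 * x + 1.5 * d + rng.normal(0, 0.5, n)   # true jump = 1.5

h = 0.25                             # bandwidth, chosen by hand here
w = np.abs(x) <= h
# Local linear fit with separate slopes on each side of the cutoff
X = sm.add_constant(np.column_stack([d[w], x[w], d[w] * x[w]]))
fit = sm.OLS(y[w], X).fit()
print(fit.params[1])                 # discontinuity at the cutoff, ~ 1.5
```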
Assumptions:
- Continuity: Potential outcomes
and are continuous at . - No Manipulation: Units cannot precisely manipulate
around the cutoff. - Local Randomization: Units just above/below the cutoff are comparable.
Advantages:
- Strong internal validity near the cutoff.
- Clear quasi-experimental design.
- Flexible with non-linear models.
Limitations:
- Estimates LATE only at the cutoff.
- Sensitive to bandwidth choice and functional form.
- Requires sufficient data near the cutoff.
- Vulnerable to manipulation of the running variable $X_i$.
References:
- Imbens, G. W., & Lemieux, T. (2008). “Regression Discontinuity Designs: A Guide to Practice.” Journal of Econometrics.
- Lee, D. S., & Lemieux, T. (2010). “Regression Discontinuity Designs in Economics.” Journal of Economic Literature.
3. Instrumental Variables (IV)
Main Idea: IV uses an exogenous instrument $Z$, which shifts the endogenous treatment but affects the outcome only through that treatment, to isolate exogenous variation and estimate the causal effect.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, endogenous treatment $D_i$, and instrument $Z_i$.
Key Equations:
First stage: $D_i = \pi_0 + \pi_1 Z_i + \nu_i$
Second stage (2SLS): $Y_i = \beta_0 + \beta_1 \hat{D}_i + \varepsilon_i$
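A manual two-stage sketch on simulated data; this reproduces the 2SLS point estimate only (dedicated IV routines correct the second-stage standard errors):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
u = rng.normal(size=n)                  # unobserved confounder
z = rng.normal(size=n)                  # instrument: shifts d, excluded from y
d = 0.6 * z + u + rng.normal(size=n)    # endogenous treatment
y = 2.0 * d + u + rng.normal(size=n)    # true effect = 2.0; naive OLS is biased

# First stage: project the treatment on the instrument
d_hat = sm.OLS(d, sm.add_constant(z)).fit().fittedvalues
# Second stage: regress the outcome on the fitted treatment
print(sm.OLS(y, sm.add_constant(d_hat)).fit().params[1])  # ~ 2.0
```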
Assumptions:
- Relevance: $\mathrm{Cov}(Z_i, D_i) \neq 0$.
- Exogeneity: $\mathrm{Cov}(Z_i, \varepsilon_i) = 0$ (exclusion restriction and no confounding).
- Monotonicity: $Z_i$ affects $D_i$ in one direction (for LATE).
- SUTVA: No spillovers.
Advantages:
- Handles endogeneity from unobserved confounders.
- Flexible for various treatment types.
- Well-established framework.
Limitations:
- Finding valid instruments is challenging.
- Estimates LATE, not ATE.
- Sensitive to exclusion restriction violations.
- Requires large samples for precision.
References:
- Angrist, J. D., & Krueger, A. B. (2001). “Instrumental Variables and the Search for Identification.” Journal of Economic Perspectives.
- Imbens, G. W., & Angrist, J. D. (1994). “Identification and Estimation of Local Average Treatment Effects.” Econometrica.
4. Instrument-Free Approach (Latent IV and Gaussian Copula)
Main Idea: Instrument-free methods model the joint distribution of treatment and outcome to account for endogeneity without explicit instruments, using latent variables or copulas.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$ and endogenous treatment $P_i$.
- No explicit instrument required.
Key Equations:
- Latent IV: $P_i = \theta_i + \nu_i$, where the latent variable $\theta_i$ acts as a model-based instrument and $\nu_i$ absorbs the confounded variation in the treatment.
- Gaussian Copula: $F(P_i, \varepsilon_i) = C\big(F_P(P_i), F_\varepsilon(\varepsilon_i); \rho\big)$, where $C$ is the Gaussian copula function and $\rho$ captures the dependence between treatment and error.
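In applications the copula approach is usually operationalized as a generated regressor, $P_i^* = \Phi^{-1}(H(P_i))$, added to the outcome equation. A sketch under Park and Gupta's (2012) assumptions (normal structural error, non-normal endogenous regressor); all names are illustrative:

```python
import numpy as np
from scipy.stats import norm, rankdata
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
u = rng.normal(size=n)                      # unobserved demand shock
p = np.exp(0.8 * u + rng.normal(size=n))    # endogenous, non-normal "price"
y = 2.0 - 1.0 * p + u + rng.normal(size=n)  # true price effect = -1.0

# Copula correction term: p* = Phi^{-1}(empirical CDF of p)
p_star = norm.ppf(rankdata(p) / (n + 1))
X = sm.add_constant(np.column_stack([p, p_star]))
print(sm.OLS(y, X).fit().params[1])         # ~ -1.0 if assumptions hold
```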
Assumptions:
- Latent IV: Correct specification of the latent variable model and distributional assumptions.
- Gaussian Copula: Joint normality or specific copula structure.
- Identification: Sufficient variation to identify parameters.
Advantages:
- No need for external instruments.
- Flexible for complex dependence structures.
- Handles continuous or discrete treatments.
Limitations:
- Relies on strong distributional assumptions.
- Computationally intensive.
- Limited empirical validation.
- Sensitive to model misspecification.
References:
- Park, S., & Gupta, S. (2012). “Handling Endogenous Regressors by Joint Estimation Using Copulas.” Marketing Science.
- Chesher, A. (2010). “Instrumental Variable Models for Discrete Outcomes.” Econometrica.
5. Control Function Approach
Main Idea: The control function approach corrects for endogeneity by modeling the relationship between the endogenous treatment and unobserved confounders, using residuals from a first-stage regression.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, endogenous treatment $D_i$, and instruments $Z_i$.
Key Equations:
First stage: $D_i = \pi_0 + \pi_1 Z_i + \nu_i$
Second stage: $Y_i = \beta_0 + \beta_1 D_i + \lambda \hat{\nu}_i + \varepsilon_i$, where the first-stage residual $\hat{\nu}_i$ is the control function.
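A minimal sketch on simulated data; the first-stage residual serves as the control function:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5000
u = rng.normal(size=n)                  # unobserved confounder
z = rng.normal(size=n)                  # instrument
d = 0.7 * z + u + rng.normal(size=n)    # endogenous treatment
y = 1.5 * d + u + rng.normal(size=n)    # true effect = 1.5

# First stage: residuals proxy for the unobserved confounder
v_hat = sm.OLS(d, sm.add_constant(z)).fit().resid
# Second stage: include the control function alongside the treatment
X = sm.add_constant(np.column_stack([d, v_hat]))
print(sm.OLS(y, X).fit().params[1])     # ~ 1.5
```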
Assumptions:
- Valid Instruments: $Z_i$ satisfies relevance and exogeneity.
- Correct Specification: The first-stage model captures the $D$-$Z$ relationship.
- Additivity: Unobserved confounders enter additively.
Advantages:
- Explicitly models endogeneity.
- Flexible for non-linear models.
- Can be combined with IV.
Limitations:
- Requires valid instruments.
- Sensitive to first-stage misspecification.
- Computationally complex in non-linear settings.
- Weak instruments cause bias.
References:
- Heckman, J. J. (1979). “Sample Selection Bias as a Specification Error.” Econometrica.
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press.
6. Propensity Score Matching (PSM)
Main Idea: PSM matches treated and control units with similar propensity scores to balance observed confounders.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and covariates $X_i$.
- Propensity score: $e(X_i) = P(D_i = 1 \mid X_i)$.
Key Equation:
ATT: $\hat{\tau}_{ATT} = \frac{1}{N_1} \sum_{i: D_i = 1} \big( Y_i - Y_{m(i)} \big)$, where $m(i)$ indexes the control unit matched to treated unit $i$ on $e(X)$.
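A 1-nearest-neighbor sketch (matching with replacement) using scikit-learn; refinements such as calipers and balance diagnostics are omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(5)
n = 4000
x = rng.normal(size=(n, 3))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))     # confounded treatment
y = x.sum(axis=1) + 2.0 * d + rng.normal(size=n)    # true ATT = 2.0

# Estimate e(X), then match each treated unit to its nearest control on e(X)
e_hat = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]
treated, control = np.where(d == 1)[0], np.where(d == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(e_hat[control].reshape(-1, 1))
_, idx = nn.kneighbors(e_hat[treated].reshape(-1, 1))
print(np.mean(y[treated] - y[control[idx.ravel()]]))  # ATT, ~ 2.0
```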
Assumptions:
- Conditional Independence: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Common Support: $0 < e(X_i) < 1$.
- SUTVA: No spillovers.
Advantages:
- Balances observed covariates.
- Flexible matching methods.
- Reduces model dependence.
Limitations:
- Sensitive to unobserved confounders.
- Requires good overlap.
- Matching discards data, reducing efficiency.
- Choice of matching algorithm affects results.
References:
- Rosenbaum, P. R., & Rubin, D. B. (1983). “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika.
- Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
7. Propensity Score Weighting (PSW)
Main Idea: PSW uses propensity scores to weight observations, creating a pseudo-population where treatment is independent of covariates.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and covariates $X_i$.
- Propensity score: $e(X_i) = P(D_i = 1 \mid X_i)$.
Key Equation:
IPTW for ATE: $\hat{\tau}_{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{D_i Y_i}{\hat{e}(X_i)} - \frac{(1 - D_i) Y_i}{1 - \hat{e}(X_i)} \right]$
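A sketch of the IPTW estimator with clipped scores to tame extreme weights:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 4000
x = rng.normal(size=(n, 3))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))     # confounded treatment
y = x.sum(axis=1) + 1.5 * d + rng.normal(size=n)    # true ATE = 1.5

e_hat = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]
e_hat = np.clip(e_hat, 0.01, 0.99)                  # guard against extreme weights
ate = np.mean(d * y / e_hat - (1 - d) * y / (1 - e_hat))
print(ate)                                          # ~ 1.5
```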
Assumptions:
- Conditional Independence: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Common Support: $0 < e(X_i) < 1$.
- Correct Specification: The propensity score model is correct.
Advantages:
- Uses all data, improving efficiency.
- Flexible for ATE or ATT.
- Handles continuous treatments.
Limitations:
- Sensitive to misspecified propensity scores.
- Extreme weights cause instability.
- Requires strong overlap.
- Vulnerable to unobserved confounders.
References:
- Hirano, K., et al. (2003). “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score.” Econometrica.
- Imbens, G. W. (2004). “Nonparametric Estimation of Average Treatment Effects Under Exogeneity.” Review of Economics and Statistics.
8. Regression Adjustment
Main Idea: Regression adjustment models the outcome as a function of treatment and covariates, assuming conditional independence.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and covariates $X_i$.
Key Equation:
$\hat{\tau}_{ATE} = \frac{1}{N} \sum_{i=1}^{N} \big[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \big]$, where $\hat{\mu}_d(x)$ estimates $E[Y_i \mid D_i = d, X_i = x]$.
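A g-computation style sketch: fit one outcome model per treatment arm, then average the predicted contrast over all units:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 4000
x = rng.normal(size=(n, 3))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))     # confounded treatment
y = x.sum(axis=1) + 1.2 * d + rng.normal(size=n)    # true ATE = 1.2

# Separate outcome regressions by arm, contrast averaged over the full sample
mu1 = LinearRegression().fit(x[d == 1], y[d == 1])
mu0 = LinearRegression().fit(x[d == 0], y[d == 0])
print(np.mean(mu1.predict(x) - mu0.predict(x)))     # ~ 1.2
```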
Assumptions:
- Conditional Independence: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Correct Specification: The regression model is correct.
- SUTVA: No spillovers.
Advantages:
- Simple and flexible.
- Efficient with large samples.
- Can incorporate complex functional forms.
Limitations:
- Sensitive to model misspecification.
- Assumes no unobserved confounders.
- Poor performance with high-dimensional covariates.
References:
- Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.
- Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
9. Synthetic Control
Main Idea: Synthetic control constructs a weighted combination of control units to mimic the pre-treatment trajectory of a treated unit.
Data Requirements:
- Panel data with outcome $Y_{jt}$ and covariates $X_{jt}$ for the treated unit ($j = 1$) and control units ($j = 2, \dots, J+1$) over time, with treatment starting after period $T_0$.
Key Equation:
$\hat{\tau}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j^* Y_{jt}$ for $t > T_0$, where the weights $w_j^* \ge 0$ with $\sum_j w_j^* = 1$ are chosen to match the treated unit's pre-treatment trajectory.
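A sketch that solves for simplex-constrained weights by pre-treatment least squares; real applications typically also match on covariates and use permutation (placebo) tests for inference:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
T0, T, J = 20, 30, 10                     # pre-periods, total periods, donors
trend = np.linspace(0, 2, T)
donors = trend + rng.normal(0, 0.3, (J, T))              # donor pool outcomes
treated = donors[:3].mean(axis=0) + rng.normal(0, 0.1, T)
treated[T0:] += 1.0                                      # true effect = 1.0

# Weights on the simplex minimizing pre-treatment fit
loss = lambda w: np.mean((treated[:T0] - w @ donors[:, :T0]) ** 2)
cons = ({"type": "eq", "fun": lambda w: w.sum() - 1},)
res = minimize(loss, np.full(J, 1 / J), bounds=[(0, 1)] * J, constraints=cons)
gap = treated - res.x @ donors            # treated minus synthetic control
print(gap[T0:].mean())                    # ~ 1.0
```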
Assumptions:
- No Interference: Treated unit’s outcome is unaffected by controls.
- Convex Hull: Pre-treatment outcomes can be approximated by controls.
- Stable Weights: Weights remain valid post-treatment.
Advantages:
- Ideal for single-unit interventions.
- Transparent counterfactual construction.
- Robust to time-varying confounders.
Limitations:
- Requires long pre-treatment periods.
- Limited to few treated units.
- Sensitive to poor pre-treatment fit.
- No formal inference in basic form.
References:
- Abadie, A., et al. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association.
- Abadie, A. (2021). “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature.
10. Causal Forests
Main Idea: Causal forests use random forests to estimate heterogeneous treatment effects (CATE) as a function of covariates.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and covariates $X_i$.
Key Equation:
$\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x]$, estimated by a forest of honest causal trees that adaptively weight nearby observations.
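A sketch assuming the EconML package, whose CausalForestDML implements a forest-based CATE estimator in the spirit of Wager and Athey (2018); hyperparameters are left at their defaults:

```python
import numpy as np
from econml.dml import CausalForestDML   # assumes EconML is installed

rng = np.random.default_rng(9)
n = 3000
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 0.5, n)              # randomized here for simplicity
tau = 1.0 + X[:, 0]                      # heterogeneous effect in X[:, 0]
Y = X[:, 1] + tau * T + rng.normal(size=n)

est = CausalForestDML(discrete_treatment=True, random_state=0)
est.fit(Y, T, X=X)                       # learn tau(x)
cate = est.effect(X)                     # per-unit CATE estimates
print(np.corrcoef(cate, tau)[0, 1])      # high if heterogeneity is recovered
```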
Assumptions:
- Unconfoundedness: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Overlap: $0 < P(D_i = 1 \mid X_i) < 1$.
- Honest Splitting: Separate data for splitting and estimation.
- SUTVA: No spillovers.
Advantages:
- Handles high-dimensional covariates.
- Estimates heterogeneous effects.
- Robust to non-linear relationships.
Limitations:
- Requires large samples.
- Sensitive to unobserved confounders.
- Computationally intensive.
- Interpretability challenges.
References:
- Wager, S., & Athey, S. (2018). “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.” Journal of the American Statistical Association.
- Athey, S., et al. (2019). “Generalized Random Forests.” Annals of Statistics.
11. Double/Debiased Machine Learning (DML)
Main Idea: DML combines machine learning with orthogonalized estimating equations to estimate causal effects robustly in high-dimensional settings.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and high-dimensional covariates $X_i$.
Key Equation:
Partially linear model: $Y_i = \theta D_i + g(X_i) + \varepsilon_i$, $D_i = m(X_i) + \nu_i$; $\theta$ is estimated by regressing the cross-fitted residuals $Y_i - \hat{E}[Y_i \mid X_i]$ on $D_i - \hat{E}[D_i \mid X_i]$.
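A sketch of the partially linear DML estimator with 5-fold cross-fitting, using random forests as nuisance learners:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(10)
n = 4000
X = rng.normal(size=(n, 10))
d = np.sin(X[:, 0]) + rng.normal(size=n)            # treatment depends on X
y = 1.0 * d + np.cos(X[:, 1]) + rng.normal(size=n)  # true theta = 1.0

# Cross-fitted residualization of outcome and treatment
y_res, d_res = np.zeros(n), np.zeros(n)
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    y_res[te] = y[te] - RandomForestRegressor(random_state=0).fit(X[tr], y[tr]).predict(X[te])
    d_res[te] = d[te] - RandomForestRegressor(random_state=0).fit(X[tr], d[tr]).predict(X[te])

# Orthogonal final step: residual-on-residual regression
print((d_res @ y_res) / (d_res @ d_res))            # ~ 1.0
```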
Assumptions:
- Unconfoundedness: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Overlap: $0 < P(D_i = 1 \mid X_i) < 1$.
- Regularity Conditions: ML estimators converge sufficiently fast.
- SUTVA: No spillovers.
Advantages:
- Handles high-dimensional covariates.
- Robust to model misspecification.
- Provides valid inference.
- Flexible for ATE or CATE.
Limitations:
- Requires large samples.
- Sensitive to unobserved confounders.
- Computationally intensive.
- Depends on ML algorithm quality.
References:
- Chernozhukov, V., et al. (2018). “Double/Debiased Machine Learning for Treatment and Structural Parameters.” Econometrics Journal.
- Athey, S., & Wager, S. (2021). “Policy Learning with Observational Data.” Econometrica.
12. Auto-Debiased Machine Learning (Auto-DML)
Main Idea: Auto-DML extends DML by automating the estimation of the Riesz representer, a weighting function that ensures efficient and unbiased causal estimation, using advanced machine learning (e.g., neural networks).
Data Requirements:
- Cross-sectional or panel data with outcome $Y$, treatment $D$, and high-dimensional covariates $X$.
Key Idea:
Chernozhukov, Newey, and Singh (2022) leverage the Riesz representer to correct for bias.
- Data is $W = (Y, D, X)$.
- Let $g(W) = E[Y \mid D, X]$ be the outcome regression.
- Suppose the moment function $m(W, g)$ is linear in $g$ (for the ATE, $m(W, g) = g(1, X) - g(0, X)$).
- Then the debiased version of the moment is of the form $m(W, \hat{g}) + \hat{\alpha}(W)\,\big(Y - \hat{g}(W)\big)$, where $\alpha$ is the Riesz representer of the linear functional $g \mapsto E[m(W, g)]$. The existence of $\alpha$ is guaranteed by the Riesz representation theorem; for the ATE, $\alpha(W) = \frac{D}{e(X)} - \frac{1 - D}{1 - e(X)}$.
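A sketch of the debiased ATE moment on simulated data. Note the simplification: the Riesz representer is plugged in here via its known closed form with an estimated propensity score, whereas Auto-DML would learn $\alpha(W)$ directly (e.g., with a neural network) without requiring that form:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(11)
n = 4000
X = rng.normal(size=(n, 5))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # confounded treatment
y = X[:, 1] + 1.0 * d + rng.normal(size=n)        # true ATE = 1.0

psi = np.zeros(n)
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    g = RandomForestRegressor(random_state=0).fit(np.column_stack([d[tr], X[tr]]), y[tr])
    g1 = g.predict(np.column_stack([np.ones(len(te)), X[te]]))
    g0 = g.predict(np.column_stack([np.zeros(len(te)), X[te]]))
    g_obs = np.where(d[te] == 1, g1, g0)
    # Closed-form Riesz representer for the ATE (the step Auto-DML automates)
    e = np.clip(LogisticRegression().fit(X[tr], d[tr]).predict_proba(X[te])[:, 1], 0.01, 0.99)
    alpha = d[te] / e - (1 - d[te]) / (1 - e)
    psi[te] = g1 - g0 + alpha * (y[te] - g_obs)   # m(W, g) + alpha * (Y - g)
print(psi.mean())                                 # ~ 1.0
```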
Assumptions:
- Unconfoundedness: $(Y(0), Y(1)) \perp D \mid X$.
- Overlap: $0 < P(D = 1 \mid X) < 1$.
- Regularity Conditions: ML estimators converge appropriately.
- SUTVA: No spillovers.
- Correct Riesz Representer: The automated model captures $\alpha(W)$.
Advantages:
- Automates Riesz representer estimation.
- Handles complex, high-dimensional data.
- Robust to nuisance function misspecification.
- Flexible for ATE or CATE.
Limitations:
- Computationally intensive.
- Black-box nature reduces interpretability.
- Sensitive to unobserved confounders.
- Requires large samples.
- Less developed theoretical guarantees.
References:
- Chernozhukov, V., et al. (2022). “Automatic Debiased Machine Learning via Neural Riesz Regression.” Working Paper.
- Farrell, M. H., et al. (2021). “Deep Neural Networks for Estimation and Inference.” Econometrica.
Comparison Summary
| Method | Data Type | Key Assumption | Strength | Weakness |
|---|---|---|---|---|
| DiD | Panel | Parallel trends | Controls time-invariant confounders | Sensitive to parallel trends violation |
| RD | Cross-sectional/Panel | Continuity at cutoff | Strong internal validity | Local effect, bandwidth sensitivity |
| IV | Cross-sectional/Panel | Valid instruments | Handles endogeneity | Hard to find valid instruments |
| Instrument-Free | Cross-sectional/Panel | Distributional assumptions | No instruments needed | Strong distributional assumptions |
| Control Function | Cross-sectional/Panel | Valid instruments | Explicit endogeneity correction | Misspecification sensitivity |
| PSM | Cross-sectional/Panel | Unconfoundedness | Balances observed covariates | Sensitive to unobserved confounders |
| PSW | Cross-sectional/Panel | Unconfoundedness | Uses all data, flexible | Extreme weight instability |
| Regression Adjustment | Cross-sectional/Panel | Unconfoundedness | Simple, flexible forms | Misspecification sensitivity |
| Synthetic Control | Panel | Convex hull, no interference | Ideal for single-unit studies | Limited to few treated units |
| Causal Forests | Cross-sectional/Panel | Unconfoundedness | Heterogeneous effects, high-dimensional | Large sample requirement |
| DML | Cross-sectional/Panel | Unconfoundedness | Robust high-dimensional inference | Computationally intensive |
| Auto-DML | Cross-sectional/Panel | Unconfoundedness | Automates Riesz representer, flexible | Black-box, computationally intensive |
Final Notes
The choice of method depends on data structure, identification strategy, confounding, and the causal parameter of interest (ATE, ATT, LATE, CATE). DiD and synthetic control suit panel data with policy interventions, while RD is ideal for cutoff-based assignment. IV and the control function approach address endogeneity but require instruments. PSM, PSW, and regression adjustment rely on unconfoundedness and are therefore vulnerable to unobserved confounders. Causal forests, DML, and Auto-DML leverage machine learning for high-dimensional settings, with Auto-DML automating the Riesz representer for efficiency. Instrument-free approaches are promising but rely on strong distributional assumptions. Combining methods or running robustness checks is recommended.
Conclusion
The choice of causal inference method depends on your data structure, identification strategy, and research context. Traditional methods like DiD and IV remain powerful for natural experiments, while machine learning approaches (DML, Causal Forests) excel in high-dimensional settings.
Key principles: Match the method to your identification challenge, validate assumptions where possible, and consider triangulation across multiple approaches. Remember that sophisticated methods cannot overcome fundamental identification problems—careful research design remains paramount for credible causal inference.
Reference
Causal Inference with Quasi-Experimental Data. (2024, November 20). American Marketing Association. https://www.ama.org/marketing-news/causal-inference-with-quasi-experimental-data/