A Review of Causal Inference Methods – Models and Comparison
last update: 06-05-2025
Introduction
Establishing causation from observational data remains one of the most challenging tasks in empirical research. This guide examines twelve major causal inference methods, from traditional econometric approaches (DiD, IV, RD) to modern machine learning techniques (Causal Forests, DML, Auto-DML).
Each method targets different identification challenges and data structures. Understanding their assumptions, strengths, and limitations is crucial for making credible causal claims in marketing research.
1. Difference-in-Differences (DiD)
Main Idea: DiD estimates causal effects by comparing the change in outcomes over time between a treatment and control group, assuming parallel trends in the absence of treatment.
Data Requirements:
- Panel data or repeated cross-sectional data with observations before and after treatment for treated and control groups.
- Outcome $Y_{it}$ for unit $i$ at time $t$, treatment group indicator $D_i$, and post-period indicator $Post_t$.
Key Equation:
$Y_{it} = \alpha + \beta D_i + \gamma Post_t + \delta (D_i \times Post_t) + \varepsilon_{it}$, where $\delta$ is the DiD estimate of the treatment effect.
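A minimal sketch of this regression in Python on simulated two-period panel data (all variable names illustrative), with cluster-robust standard errors in the spirit of Bertrand et al. (2004):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated two-period panel: 200 units, roughly half treated, true effect = 2.0
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n), 2),
    "post": np.tile([0, 1], n),
    "treated": np.repeat(rng.integers(0, 2, n), 2),
})
df["y"] = (1.0 + 0.5 * df["treated"] + 1.0 * df["post"]
           + 2.0 * df["treated"] * df["post"] + rng.normal(0, 1, 2 * n))

# The coefficient on the interaction term is the DiD estimate (delta)
fit = smf.ols("y ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]})
print(fit.params["treated:post"])  # ~ 2.0
```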
Assumptions:
- Parallel Trends: Absent treatment, the average change in untreated potential outcomes would have been the same in the treatment and control groups.
- No Anticipation: Treatment does not affect pre-treatment outcomes.
- Stable Unit Treatment Value Assumption (SUTVA): No spillovers between units.
- Common Support: Treated and control units are comparable.
Advantages:
- Controls for time-invariant unobserved confounders.
- Intuitive for policy evaluation.
- Robust to baseline differences.
Limitations:
- Sensitive to violations of parallel trends.
- Requires sufficient pre- and post-treatment data.
- Cannot handle time-varying confounders unless extended.
- Assumes homogeneous treatment effects unless modified.
References:
- Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.
- Bertrand, M., et al. (2004). “How Much Should We Trust Differences-in-Differences Estimates?” Quarterly Journal of Economics.
2. Regression Discontinuity (RD)
Main Idea: RD exploits a discontinuity in treatment assignment based on a continuous running variable to estimate local treatment effects at the cutoff.
Data Requirements:
- Cross-sectional or panel data with a continuous running variable $X_i$, outcome $Y_i$, and treatment indicator $D_i = \mathbf{1}\{X_i \ge c\}$ for a known cutoff $c$ (sharp RD).
Key Equation:
$\tau_{RD} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$
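A minimal local linear sketch, assuming a sharp cutoff at zero and a hand-picked bandwidth (real applications should use a data-driven bandwidth selector):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1, 1, n)            # running variable, cutoff c = 0
d = (x >= 0).astype(float)           # sharp treatment assignment
y = 1 + 0.8 * x + 1.5 * d + rng.normal(0, 0.5, n)   # true jump = 1.5

h = 0.25                             # bandwidth, chosen by hand here
w = np.abs(x) <= h
# Local linear fit with separate slopes on each side of the cutoff
X = sm.add_constant(np.column_stack([d[w], x[w], d[w] * x[w]]))
fit = sm.OLS(y[w], X).fit()
print(fit.params[1])                 # discontinuity at the cutoff, ~ 1.5
```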
Assumptions:
- Continuity: Potential outcomes
and are continuous at . - No Manipulation: Units cannot precisely manipulate
around the cutoff. - Local Randomization: Units just above/below the cutoff are comparable.
Advantages:
- Strong internal validity near the cutoff.
- Clear quasi-experimental design.
- Flexible with non-linear models.
Limitations:
- Estimates LATE only at the cutoff.
- Sensitive to bandwidth choice and functional form.
- Requires sufficient data near the cutoff.
- Vulnerable to manipulation of the running variable $X_i$.
References:
- Imbens, G. W., & Lemieux, T. (2008). “Regression Discontinuity Designs: A Guide to Practice.” Journal of Econometrics.
- Lee, D. S., & Lemieux, T. (2010). “Regression Discontinuity Designs in Economics.” Journal of Economic Literature.
3. Instrumental Variables (IV)
Main Idea: IV uses an exogenous instrument $Z$, which shifts the endogenous treatment but affects the outcome only through that treatment, to isolate exogenous variation and estimate the causal effect.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, endogenous treatment $D_i$, and instrument $Z_i$.
Key Equations:
First stage: $D_i = \pi_0 + \pi_1 Z_i + \nu_i$
Second stage (2SLS): $Y_i = \beta_0 + \beta_1 \hat{D}_i + \varepsilon_i$
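A manual two-stage sketch on simulated data; this reproduces the 2SLS point estimate only (dedicated IV routines correct the second-stage standard errors):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
u = rng.normal(size=n)                  # unobserved confounder
z = rng.normal(size=n)                  # instrument: shifts d, excluded from y
d = 0.6 * z + u + rng.normal(size=n)    # endogenous treatment
y = 2.0 * d + u + rng.normal(size=n)    # true effect = 2.0; naive OLS is biased

# First stage: project the treatment on the instrument
d_hat = sm.OLS(d, sm.add_constant(z)).fit().fittedvalues
# Second stage: regress the outcome on the fitted treatment
print(sm.OLS(y, sm.add_constant(d_hat)).fit().params[1])  # ~ 2.0
```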
Assumptions:
- Relevance: $\mathrm{Cov}(Z_i, D_i) \neq 0$.
- Exogeneity: $\mathrm{Cov}(Z_i, \varepsilon_i) = 0$ (exclusion restriction and no confounding).
- Monotonicity: $Z_i$ affects $D_i$ in one direction (for LATE).
- SUTVA: No spillovers.
Advantages:
- Handles endogeneity from unobserved confounders.
- Flexible for various treatment types.
- Well-established framework.
Limitations:
- Finding valid instruments is challenging.
- Estimates LATE, not ATE.
- Sensitive to exclusion restriction violations.
- Requires large samples for precision.
References:
- Angrist, J. D., & Krueger, A. B. (2001). “Instrumental Variables and the Search for Identification.” Journal of Economic Perspectives.
- Imbens, G. W., & Angrist, J. D. (1994). “Identification and Estimation of Local Average Treatment Effects.” Econometrica.
4. Instrument-Free Approach (Latent IV and Gaussian Copula)
Main Idea: Instrument-free methods model the joint distribution of treatment and outcome to account for endogeneity without explicit instruments, using latent variables or copulas.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$ and endogenous treatment $P_i$.
- No explicit instrument required.
Key Equations:
- Latent IV: $P_i = \theta_i + \nu_i$, where the latent variable $\theta_i$ acts as a model-based instrument and $\nu_i$ absorbs the confounded variation in the treatment.
- Gaussian Copula: $F(P_i, \varepsilon_i) = C\big(F_P(P_i), F_\varepsilon(\varepsilon_i); \rho\big)$, where $C$ is the Gaussian copula function and $\rho$ captures the dependence between treatment and error.
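In applications the copula approach is usually operationalized as a generated regressor, $P_i^* = \Phi^{-1}(H(P_i))$, added to the outcome equation. A sketch under Park and Gupta's (2012) assumptions (normal structural error, non-normal endogenous regressor); all names are illustrative:

```python
import numpy as np
from scipy.stats import norm, rankdata
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
u = rng.normal(size=n)                      # unobserved demand shock
p = np.exp(0.8 * u + rng.normal(size=n))    # endogenous, non-normal "price"
y = 2.0 - 1.0 * p + u + rng.normal(size=n)  # true price effect = -1.0

# Copula correction term: p* = Phi^{-1}(empirical CDF of p)
p_star = norm.ppf(rankdata(p) / (n + 1))
X = sm.add_constant(np.column_stack([p, p_star]))
print(sm.OLS(y, X).fit().params[1])         # ~ -1.0 if assumptions hold
```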
Assumptions:
- Latent IV: Correct specification of the latent variable model and distributional assumptions.
- Gaussian Copula: Joint normality or specific copula structure.
- Identification: Sufficient variation to identify parameters.
Advantages:
- No need for external instruments.
- Flexible for complex dependence structures.
- Handles continuous or discrete treatments.
Limitations:
- Relies on strong distributional assumptions.
- Computationally intensive.
- Limited empirical validation.
- Sensitive to model misspecification.
References:
- Park, S., & Gupta, S. (2012). “Handling Endogenous Regressors by Joint Estimation Using Copulas.” Marketing Science.
- Chesher, A. (2010). “Instrumental Variable Models for Discrete Outcomes.” Econometrica.
5. Control Function Approach
Main Idea: The control function approach corrects for endogeneity by modeling the relationship between the endogenous treatment and unobserved confounders, using residuals from a first-stage regression.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, endogenous treatment $D_i$, and instruments $Z_i$.
Key Equations:
First stage: $D_i = \pi_0 + \pi_1 Z_i + \nu_i$
Second stage: $Y_i = \beta_0 + \beta_1 D_i + \lambda \hat{\nu}_i + \varepsilon_i$, where the first-stage residual $\hat{\nu}_i$ is the control function.
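A minimal sketch on simulated data; the first-stage residual serves as the control function:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5000
u = rng.normal(size=n)                  # unobserved confounder
z = rng.normal(size=n)                  # instrument
d = 0.7 * z + u + rng.normal(size=n)    # endogenous treatment
y = 1.5 * d + u + rng.normal(size=n)    # true effect = 1.5

# First stage: residuals proxy for the unobserved confounder
v_hat = sm.OLS(d, sm.add_constant(z)).fit().resid
# Second stage: include the control function alongside the treatment
X = sm.add_constant(np.column_stack([d, v_hat]))
print(sm.OLS(y, X).fit().params[1])     # ~ 1.5
```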
Assumptions:
- Valid Instruments: $Z_i$ satisfies relevance and exogeneity.
- Correct Specification: The first-stage model captures the $D$-$Z$ relationship.
- Additivity: Unobserved confounders enter additively.
Advantages:
- Explicitly models endogeneity.
- Flexible for non-linear models.
- Can be combined with IV.
Limitations:
- Requires valid instruments.
- Sensitive to first-stage misspecification.
- Computationally complex in non-linear settings.
- Weak instruments cause bias.
References:
- Heckman, J. J. (1979). “Sample Selection Bias as a Specification Error.” Econometrica.
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press.
6. Propensity Score Matching (PSM)
Main Idea: PSM matches treated and control units with similar propensity scores to balance observed confounders.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and covariates $X_i$.
- Propensity score: $e(X_i) = P(D_i = 1 \mid X_i)$.
Key Equation:
ATT: $\hat{\tau}_{ATT} = \frac{1}{N_1} \sum_{i: D_i = 1} \big( Y_i - Y_{m(i)} \big)$, where $m(i)$ indexes the control unit matched to treated unit $i$ on $e(X)$.
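A 1-nearest-neighbor sketch (matching with replacement) using scikit-learn; refinements such as calipers and balance diagnostics are omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(5)
n = 4000
x = rng.normal(size=(n, 3))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))     # confounded treatment
y = x.sum(axis=1) + 2.0 * d + rng.normal(size=n)    # true ATT = 2.0

# Estimate e(X), then match each treated unit to its nearest control on e(X)
e_hat = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]
treated, control = np.where(d == 1)[0], np.where(d == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(e_hat[control].reshape(-1, 1))
_, idx = nn.kneighbors(e_hat[treated].reshape(-1, 1))
print(np.mean(y[treated] - y[control[idx.ravel()]]))  # ATT, ~ 2.0
```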
Assumptions:
- Conditional Independence: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Common Support: $0 < e(X_i) < 1$.
- SUTVA: No spillovers.
Advantages:
- Balances observed covariates.
- Flexible matching methods.
- Reduces model dependence.
Limitations:
- Sensitive to unobserved confounders.
- Requires good overlap.
- Matching discards data, reducing efficiency.
- Choice of matching algorithm affects results.
References:
- Rosenbaum, P. R., & Rubin, D. B. (1983). “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika.
- Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
7. Propensity Score Weighting (PSW)
Main Idea: PSW uses propensity scores to weight observations, creating a pseudo-population where treatment is independent of covariates.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and covariates $X_i$.
- Propensity score: $e(X_i) = P(D_i = 1 \mid X_i)$.
Key Equation:
IPTW for ATE: $\hat{\tau}_{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{D_i Y_i}{\hat{e}(X_i)} - \frac{(1 - D_i) Y_i}{1 - \hat{e}(X_i)} \right]$
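A sketch of the IPTW estimator with clipped scores to tame extreme weights:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 4000
x = rng.normal(size=(n, 3))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))     # confounded treatment
y = x.sum(axis=1) + 1.5 * d + rng.normal(size=n)    # true ATE = 1.5

e_hat = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]
e_hat = np.clip(e_hat, 0.01, 0.99)                  # guard against extreme weights
ate = np.mean(d * y / e_hat - (1 - d) * y / (1 - e_hat))
print(ate)                                          # ~ 1.5
```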
Assumptions:
- Conditional Independence: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Common Support: $0 < e(X_i) < 1$.
- Correct Specification: The propensity score model is correct.
Advantages:
- Uses all data, improving efficiency.
- Flexible for ATE or ATT.
- Handles continuous treatments.
Limitations:
- Sensitive to misspecified propensity scores.
- Extreme weights cause instability.
- Requires strong overlap.
- Vulnerable to unobserved confounders.
References:
- Hirano, K., et al. (2003). “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score.” Econometrica.
- Imbens, G. W. (2004). “Nonparametric Estimation of Average Treatment Effects Under Exogeneity.” Review of Economics and Statistics.
8. Regression Adjustment
Main Idea: Regression adjustment models the outcome as a function of treatment and covariates, assuming conditional independence.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and covariates $X_i$.
Key Equation:
$\hat{\tau}_{ATE} = \frac{1}{N} \sum_{i=1}^{N} \big[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) \big]$, where $\hat{\mu}_d(x)$ estimates $E[Y_i \mid D_i = d, X_i = x]$.
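A g-computation style sketch: fit one outcome model per treatment arm, then average the predicted contrast over all units:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 4000
x = rng.normal(size=(n, 3))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))     # confounded treatment
y = x.sum(axis=1) + 1.2 * d + rng.normal(size=n)    # true ATE = 1.2

# Separate outcome regressions by arm, contrast averaged over the full sample
mu1 = LinearRegression().fit(x[d == 1], y[d == 1])
mu0 = LinearRegression().fit(x[d == 0], y[d == 0])
print(np.mean(mu1.predict(x) - mu0.predict(x)))     # ~ 1.2
```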
Assumptions:
- Conditional Independence: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Correct Specification: The regression model is correct.
- SUTVA: No spillovers.
Advantages:
- Simple and flexible.
- Efficient with large samples.
- Can incorporate complex functional forms.
Limitations:
- Sensitive to model misspecification.
- Assumes no unobserved confounders.
- Poor performance with high-dimensional covariates.
References:
- Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.
- Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
9. Synthetic Control
Main Idea: Synthetic control constructs a weighted combination of control units to mimic the pre-treatment trajectory of a treated unit.
Data Requirements:
- Panel data with outcome $Y_{jt}$ and covariates $X_{jt}$ for the treated unit ($j = 1$) and control units ($j = 2, \dots, J+1$) over time, with treatment starting after period $T_0$.
Key Equation:
$\hat{\tau}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j^* Y_{jt}$ for $t > T_0$, where the weights $w_j^* \ge 0$ with $\sum_j w_j^* = 1$ are chosen to match the treated unit's pre-treatment trajectory.
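A sketch that solves for simplex-constrained weights by pre-treatment least squares; real applications typically also match on covariates and use permutation (placebo) tests for inference:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
T0, T, J = 20, 30, 10                     # pre-periods, total periods, donors
trend = np.linspace(0, 2, T)
donors = trend + rng.normal(0, 0.3, (J, T))              # donor pool outcomes
treated = donors[:3].mean(axis=0) + rng.normal(0, 0.1, T)
treated[T0:] += 1.0                                      # true effect = 1.0

# Weights on the simplex minimizing pre-treatment fit
loss = lambda w: np.mean((treated[:T0] - w @ donors[:, :T0]) ** 2)
cons = ({"type": "eq", "fun": lambda w: w.sum() - 1},)
res = minimize(loss, np.full(J, 1 / J), bounds=[(0, 1)] * J, constraints=cons)
gap = treated - res.x @ donors            # treated minus synthetic control
print(gap[T0:].mean())                    # ~ 1.0
```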
Assumptions:
- No Interference: Treated unit’s outcome is unaffected by controls.
- Convex Hull: Pre-treatment outcomes can be approximated by controls.
- Stable Weights: Weights remain valid post-treatment.
Advantages:
- Ideal for single-unit interventions.
- Transparent counterfactual construction.
- Robust to time-varying confounders.
Limitations:
- Requires long pre-treatment periods.
- Limited to few treated units.
- Sensitive to poor pre-treatment fit.
- No formal inference in basic form.
References:
- Abadie, A., et al. (2010). “Synthetic Control Methods for Comparative Case Studies.” Journal of the American Statistical Association.
- Abadie, A. (2021). “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature.
10. Causal Forests
Main Idea: Causal forests use random forests to estimate heterogeneous treatment effects (CATE) as a function of covariates.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and covariates $X_i$.
Key Equation:
$\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x]$, estimated by a forest of honest causal trees that adaptively weight nearby observations.
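A sketch assuming the EconML package, whose CausalForestDML implements a forest-based CATE estimator in the spirit of Wager and Athey (2018); hyperparameters are left at their defaults:

```python
import numpy as np
from econml.dml import CausalForestDML   # assumes EconML is installed

rng = np.random.default_rng(9)
n = 3000
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 0.5, n)              # randomized here for simplicity
tau = 1.0 + X[:, 0]                      # heterogeneous effect in X[:, 0]
Y = X[:, 1] + tau * T + rng.normal(size=n)

est = CausalForestDML(discrete_treatment=True, random_state=0)
est.fit(Y, T, X=X)                       # learn tau(x)
cate = est.effect(X)                     # per-unit CATE estimates
print(np.corrcoef(cate, tau)[0, 1])      # high if heterogeneity is recovered
```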
Assumptions:
- Unconfoundedness: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Overlap: $0 < P(D_i = 1 \mid X_i) < 1$.
- Honest Splitting: Separate data for splitting and estimation.
- SUTVA: No spillovers.
Advantages:
- Handles high-dimensional covariates.
- Estimates heterogeneous effects.
- Robust to non-linear relationships.
Limitations:
- Requires large samples.
- Sensitive to unobserved confounders.
- Computationally intensive.
- Interpretability challenges.
References:
- Wager, S., & Athey, S. (2018). “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.” Journal of the American Statistical Association.
- Athey, S., et al. (2019). “Generalized Random Forests.” Annals of Statistics.
11. Double/Debiased Machine Learning (DML)
Main Idea: DML combines machine learning with orthogonalized estimating equations to estimate causal effects robustly in high-dimensional settings.
Data Requirements:
- Cross-sectional or panel data with outcome $Y_i$, treatment $D_i$, and high-dimensional covariates $X_i$.
Key Equation:
Partially linear model: $Y_i = \theta D_i + g(X_i) + \varepsilon_i$, $D_i = m(X_i) + \nu_i$; $\theta$ is estimated by regressing the cross-fitted residuals $Y_i - \hat{E}[Y_i \mid X_i]$ on $D_i - \hat{E}[D_i \mid X_i]$.
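A sketch of the partially linear DML estimator with 5-fold cross-fitting, using random forests as nuisance learners:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(10)
n = 4000
X = rng.normal(size=(n, 10))
d = np.sin(X[:, 0]) + rng.normal(size=n)            # treatment depends on X
y = 1.0 * d + np.cos(X[:, 1]) + rng.normal(size=n)  # true theta = 1.0

# Cross-fitted residualization of outcome and treatment
y_res, d_res = np.zeros(n), np.zeros(n)
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    y_res[te] = y[te] - RandomForestRegressor(random_state=0).fit(X[tr], y[tr]).predict(X[te])
    d_res[te] = d[te] - RandomForestRegressor(random_state=0).fit(X[tr], d[tr]).predict(X[te])

# Orthogonal final step: residual-on-residual regression
print((d_res @ y_res) / (d_res @ d_res))            # ~ 1.0
```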
Assumptions:
- Unconfoundedness: $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
- Overlap: $0 < P(D_i = 1 \mid X_i) < 1$.
- Regularity Conditions: ML estimators converge sufficiently fast.
- SUTVA: No spillovers.
Advantages:
- Handles high-dimensional covariates.
- Robust to model misspecification.
- Provides valid inference.
- Flexible for ATE or CATE.
Limitations:
- Requires large samples.
- Sensitive to unobserved confounders.
- Computationally intensive.
- Depends on ML algorithm quality.
References:
- Chernozhukov, V., et al. (2018). “Double/Debiased Machine Learning for Treatment and Structural Parameters.” Econometrics Journal.
- Athey, S., & Wager, S. (2021). “Policy Learning with Observational Data.” Econometrica.
12. Auto-Debiased Machine Learning (Auto-DML)
Main Idea: Auto-DML extends DML by automating the estimation of the Riesz representer, a weighting function that ensures efficient and unbiased causal estimation, using advanced machine learning (e.g., neural networks).
Data Requirements:
- Cross-sectional or panel data with outcome $Y$, treatment $D$, and high-dimensional covariates $X$.
Key Idea:
Chernozhukov, Newey, and Singh (2022) leverage the Riesz representer to correct for bias.
- Data is $W = (Y, D, X)$.
- Let $g(W) = E[Y \mid D, X]$ be the outcome regression.
- Suppose the moment function $m(W, g)$ is linear in $g$ (for the ATE, $m(W, g) = g(1, X) - g(0, X)$).
- Then the debiased version of the moment is of the form $m(W, \hat{g}) + \hat{\alpha}(W)\,\big(Y - \hat{g}(W)\big)$, where $\alpha$ is the Riesz representer of the linear functional $g \mapsto E[m(W, g)]$. The existence of $\alpha$ is guaranteed by the Riesz representation theorem; for the ATE, $\alpha(W) = \frac{D}{e(X)} - \frac{1 - D}{1 - e(X)}$.
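A sketch of the debiased ATE moment on simulated data. Note the simplification: the Riesz representer is plugged in here via its known closed form with an estimated propensity score, whereas Auto-DML would learn $\alpha(W)$ directly (e.g., with a neural network) without requiring that form:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(11)
n = 4000
X = rng.normal(size=(n, 5))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # confounded treatment
y = X[:, 1] + 1.0 * d + rng.normal(size=n)        # true ATE = 1.0

psi = np.zeros(n)
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    g = RandomForestRegressor(random_state=0).fit(np.column_stack([d[tr], X[tr]]), y[tr])
    g1 = g.predict(np.column_stack([np.ones(len(te)), X[te]]))
    g0 = g.predict(np.column_stack([np.zeros(len(te)), X[te]]))
    g_obs = np.where(d[te] == 1, g1, g0)
    # Closed-form Riesz representer for the ATE (the step Auto-DML automates)
    e = np.clip(LogisticRegression().fit(X[tr], d[tr]).predict_proba(X[te])[:, 1], 0.01, 0.99)
    alpha = d[te] / e - (1 - d[te]) / (1 - e)
    psi[te] = g1 - g0 + alpha * (y[te] - g_obs)   # m(W, g) + alpha * (Y - g)
print(psi.mean())                                 # ~ 1.0
```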
Assumptions:
- Unconfoundedness: $(Y(0), Y(1)) \perp D \mid X$.
- Overlap: $0 < P(D = 1 \mid X) < 1$.
- Regularity Conditions: ML estimators converge appropriately.
- SUTVA: No spillovers.
- Correct Riesz Representer: The automated model captures $\alpha(W)$.
Advantages:
- Automates Riesz representer estimation.
- Handles complex, high-dimensional data.
- Robust to nuisance function misspecification.
- Flexible for ATE or CATE.
Limitations:
- Computationally intensive.
- Black-box nature reduces interpretability.
- Sensitive to unobserved confounders.
- Requires large samples.
- Less developed theoretical guarantees.
References:
- Chernozhukov, V., et al. (2022). “Automatic Debiased Machine Learning via Neural Riesz Regression.” Working Paper.
- Farrell, M. H., et al. (2021). “Deep Neural Networks for Estimation and Inference.” Econometrica.
Comparison Summary
| Method | Data Type | Key Assumption | Strength | Weakness |
|---|---|---|---|---|
| DiD | Panel | Parallel trends | Controls time-invariant confounders | Sensitive to parallel trends violation |
| RD | Cross-sectional/Panel | Continuity at cutoff | Strong internal validity | Local effect, bandwidth sensitivity |
| IV | Cross-sectional/Panel | Valid instruments | Handles endogeneity | Hard to find valid instruments |
| Instrument-Free | Cross-sectional/Panel | Distributional assumptions | No instruments needed | Strong distributional assumptions |
| Control Function | Cross-sectional/Panel | Valid instruments | Explicit endogeneity correction | Misspecification sensitivity |
| PSM | Cross-sectional/Panel | Unconfoundedness | Balances observed covariates | Sensitive to unobserved confounders |
| PSW | Cross-sectional/Panel | Unconfoundedness | Uses all data, flexible | Extreme weight instability |
| Regression Adjustment | Cross-sectional/Panel | Unconfoundedness | Simple, flexible forms | Misspecification sensitivity |
| Synthetic Control | Panel | Convex hull, no interference | Ideal for single-unit studies | Limited to few treated units |
| Causal Forests | Cross-sectional/Panel | Unconfoundedness | Heterogeneous effects, high-dimensional | Large sample requirement |
| DML | Cross-sectional/Panel | Unconfoundedness | Robust high-dimensional inference | Computationally intensive |
| Auto-DML | Cross-sectional/Panel | Unconfoundedness | Automates Riesz representer, flexible | Black-box, computationally intensive |
Final Notes
The choice of method depends on data structure, identification strategy, confounding, and the causal parameter of interest (ATE, ATT, LATE, CATE). DiD and synthetic control suit panel data with policy interventions, while RD is ideal for cutoff-based assignment. IV and the control function approach address endogeneity but require instruments. PSM, PSW, and regression adjustment rely on unconfoundedness and are therefore vulnerable to unobserved confounders. Causal forests, DML, and Auto-DML leverage machine learning for high-dimensional settings, with Auto-DML automating the Riesz representer for efficiency. Instrument-free approaches are promising but rely on strong distributional assumptions. Combining methods or running robustness checks is recommended.
Conclusion
The choice of causal inference method depends on your data structure, identification strategy, and research context. Traditional methods like DiD and IV remain powerful for natural experiments, while machine learning approaches (DML, Causal Forests) excel in high-dimensional settings.
Key principles: Match the method to your identification challenge, validate assumptions where possible, and consider triangulation across multiple approaches. Remember that sophisticated methods cannot overcome fundamental identification problems—careful research design remains paramount for credible causal inference.
Reference
Causal Inference with Quasi-Experimental Data. (2024, November 20). American Marketing Association. https://www.ama.org/marketing-news/causal-inference-with-quasi-experimental-data/