Synthetic Controls for Experimental Design (Abadie and Zhao 2025)
TL;DR
Abadie and Zhao propose using synthetic control methods to design experiments when only one or a few large aggregate units (e.g., markets, cities) can be treated. Rather than randomizing treatment assignment—which can produce large post-randomization biases with small samples—their method jointly selects which units receive treatment and which serve as controls by matching pre-treatment characteristics. Simulations and an empirical application using Walmart sales data show that synthetic control designs substantially outperform randomized designs in settings with few treated units, reducing bias and improving statistical power while allowing for valid inference through novel permutation-based methods.
What is this paper about?
This paper addresses a fundamental challenge in experimental design: how to evaluate interventions when the experimental units are large aggregate entities (like local markets or cities) and only one or a small number can be exposed to treatment. In such settings, standard randomization can produce substantial post-randomization bias because the treated unit(s) may differ markedly from control units in baseline characteristics that affect outcomes. The motivating example involves a ride-sharing company testing a new driver compensation plan in one city—randomizing drivers within a city creates equity concerns and spillover effects, while randomizing across cities with few units risks poor balance. The authors propose adapting synthetic control methods (traditionally used in observational studies) to experimental design, where both the choice of treated units and control units are optimized jointly to reproduce aggregate counterfactuals of interest.
What do the authors do?
The authors develop several variants of synthetic control experimental designs that select units for treatment and control by matching weighted averages of pre-treatment predictors (including lagged outcomes and covariates) to population-level or treated-unit averages. Their baseline “unconstrained” design minimizes discrepancies between synthetic treated units and the population average, and between synthetic control units and the population average, subject to non-negativity and sum-to-one constraints on weights. They extend this to:
- “constrained” designs (limiting the number of treated units for cost reasons)
- “weakly-targeted” designs (targeting either average treatment effects or treatment effects on the treated)
- “unit-level” designs (fitting separate synthetic controls for each treated unit)
- clustered designs (accounting for natural groupings like regions)
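At its core, the weight-selection step described above is a least-squares problem over the probability simplex: choose non-negative weights that sum to one so a weighted average of units' pre-treatment predictors matches a target (e.g., the population average). The following is a minimal pure-Python sketch of that idea using Frank-Wolfe with exact line search; it is illustrative only and is not the authors' implementation (they solve these problems with lsei/limSolve and Gurobi in R). The function name and toy data are my own.

```python
def synthetic_weights(X, target, iters=500):
    """Find simplex weights w (w >= 0, sum(w) == 1) minimizing the squared
    discrepancy between the weighted average of unit predictors and a target.

    X: list of J units, each a length-k list of pre-treatment predictors.
    target: length-k list, e.g. the population average of the predictors.
    Uses Frank-Wolfe: repeatedly move toward the simplex vertex (single unit)
    with the steepest descent direction, with an exact line-search step.
    """
    J, k = len(X), len(target)
    w = [1.0 / J] * J  # start from equal weights
    for _ in range(iters):
        # predictors of the current synthetic unit, and the residual
        fit = [sum(w[j] * X[j][p] for j in range(J)) for p in range(k)]
        resid = [fit[p] - target[p] for p in range(k)]
        # gradient wrt w_j is 2 * <X_j, resid>; best vertex minimizes it
        grad = [2.0 * sum(X[j][p] * resid[p] for p in range(k))
                for j in range(J)]
        s = min(range(J), key=lambda j: grad[j])
        # direction in predictor space when moving toward unit s
        d = [X[s][p] - fit[p] for p in range(k)]
        dd = sum(x * x for x in d)
        if dd == 0.0:
            break  # already at an exact fit in this direction
        # exact line search for the quadratic objective, clipped to [0, 1]
        step = max(0.0, min(1.0, -sum(resid[p] * d[p] for p in range(k)) / dd))
        w = [(1.0 - step) * w[j] + (step if j == s else 0.0) for j in range(J)]
    return w
```

For example, with three control units whose predictors are [0, 0], [2, 2], and [10, 10] and a target of [1, 1], the returned weights form a convex combination whose weighted predictor average matches the target (the solution is not unique here; any simplex weights reproducing [1, 1] are optimal).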
They derive formal bias bounds under a linear factor model, showing that bias depends on the number of fitting periods, the scale of idiosyncratic shocks, and the number of unobserved factors. For inference, they propose novel permutation tests that rearrange estimated treatment effects across “blank periods” (pre-treatment periods not used for fitting) and post-treatment periods, proving the test is exact when factor loadings are exchangeable and approximately valid more generally. They also construct confidence intervals using split conformal prediction. The methods are validated through Monte Carlo simulations under both linear and nonlinear data-generating processes, and through a placebo test using weekly sales data from 45 Walmart stores, comparing performance against randomization, stratified randomization, regression adjustment, and nearest-neighbor matching.
Why is this important?
This work fills a critical gap in experimental methodology for settings where large-scale randomization is infeasible or undesirable—situations increasingly common in tech companies, policy evaluation, and corporate decision-making. When only a few aggregate units can be treated, randomization provides unbiased estimates ex ante (before randomization) but can yield large biases ex post (after randomization) if treated and control groups differ substantially at baseline. The synthetic control design directly addresses this by selecting units that jointly minimize imbalance, making the realized experiment more credible. The paper’s theoretical contributions—bias bounds that clarify the role of fitting periods, idiosyncratic noise, and unobserved factors, plus new inferential methods that allow for time-series dependence and non-stationarity—strengthen the foundation for applied work. The empirical validation demonstrates practical feasibility: in the Walmart data, synthetic control designs achieve root mean squared errors 3-10 times smaller than randomized alternatives. This makes rigorous causal inference possible in settings where researchers previously either accepted high bias or abandoned the project when pre-trends looked poor.
Who should care?
Applied researchers and practitioners conducting experiments with aggregate units should care, particularly in settings where intervention at the micro-unit level is impractical (e.g., testing market-level policies, district-level education reforms, or state-level regulations). This includes data scientists in technology companies designing market experiments, policy analysts evaluating place-based programs, economists studying geographically clustered interventions, and corporate strategists testing operational changes across stores or regions. Methodologists working on causal inference, experimental design, and synthetic controls will find the paper’s extensions—especially the bias-variance trade-offs across designs, the treatment of time-series dependence in inference, and the integration of design selection with estimation—valuable for advancing the field. Anyone frustrated by the limitations of randomization in small-sample settings with aggregate units now has a principled alternative.
Do we have code?
Yes. The authors state that replication code is available on GitHub (linked in the footnote on the paper’s first page). The online appendix provides detailed implementation guidance, including how to solve the optimization problems using both enumeration (for constrained designs with small numbers of treated units) and quadratic programming (for unconstrained and penalized designs). The synthetic control problems were solved with the “lsei” function from the “limSolve” package in R 4.0.2, and the quadratic programs with Gurobi 9.0.2 called from R. The appendix also documents the computational approach for each design variant, making the methods reproducible and accessible to practitioners.
In summary, this paper transforms synthetic controls from an observational tool into an experimental design strategy, showing both theoretically and empirically that careful unit selection can dramatically reduce bias compared to randomization when few aggregate units are available for treatment. By jointly optimizing the choice of treated and control units while preserving valid inference, it expands the frontier of rigorous experimental evaluation in settings previously considered methodologically challenging.
Reference
Abadie, Alberto and Jinglong Zhao (2025), “Synthetic controls for experimental design.” https://doi.org/10.48550/arXiv.2108.02196