Potential Outcomes and Causal Inference
This page introduces the fundamental problem of causal inference, the potential outcomes framework, and the three identification assumptions needed to estimate causal effects from data. It draws on the Women's Health Initiative (WHI) hormone replacement therapy case study as a motivating example.
Motivating example: hormone replacement therapy
Observational evidence (1980s–1990s)
Throughout the 1980s and 1990s, observational studies suggested that oestrogen therapy reduced all-cause mortality in postmenopausal women by roughly 30% (a hazard ratio of about 0.7 for current users vs. never users). Professional bodies endorsed hormone replacement therapy (HRT) on this basis:
- 1992, American College of Physicians: "Women who have coronary heart disease or who are at increased risk ... are likely to benefit from hormone therapy."
- 1996, American Heart Association: "ERT does look promising as a long-term protection against heart attack."
The experiment disagreed
The Women's Health Initiative (WHI) was a large randomised, double-blind, placebo-controlled trial enrolling 16,000 women aged 50–79. Participants were randomly assigned to oestrogen-plus-progestin therapy or placebo and followed for up to eight years.
The experimental hazard ratio for all-cause mortality was 1.23 (initiators vs. non-initiators), the opposite direction from the observational finding.
What went wrong?
The discrepancy was not a failure of causal assumptions. It was a failure of study design: the observational studies did not correctly emulate a target trial. Specifically, they failed to align "time zero" (the start of follow-up) with the moment of treatment initiation, introducing survivor bias. When investigators re-analysed the observational data using a target trial emulation framework that matched treatment initiation to the start of follow-up, the observational estimates aligned with the experimental findings.
If you want causal inferences from observational data, design the analysis as though you were running an experiment. Specify the target trial first.
The fundamental problem of causal inference
Causality is never directly observed. To quantify a causal effect, we need to compare two states of the world for the same individual, but each individual can experience only one.
Notation
Let $A$ denote a binary exposure ($A = 1$: treated, $A = 0$: untreated) and $Y$ denote the outcome.
- $Y_i^{a=1}$: the potential outcome for individual $i$ under treatment.
- $Y_i^{a=0}$: the potential outcome for individual $i$ under control.
The individual causal effect is $Y_i^{a=1} - Y_i^{a=0}$.
We say there is a causal effect for individual $i$ when $Y_i^{a=1} \neq Y_i^{a=0}$.
The missing-data problem
At most one potential outcome is observed for each individual. The unobserved outcome is the counterfactual:
- If $Y_i^{a=1}$ is observed (the individual was treated), then $Y_i^{a=0}$ is counterfactual.
- If $Y_i^{a=0}$ is observed (the individual was untreated), then $Y_i^{a=1}$ is counterfactual.
Individual-level causal effects are therefore generally unidentifiable. However, under certain assumptions, we can identify average causal effects at the population level.
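The missing-data structure can be displayed directly. The potential-outcome values below are invented; the point is that for each unit, whichever column does not match the received treatment is unobservable.

```python
import numpy as np

# Hypothetical potential outcomes for five individuals. In reality these
# two columns are never jointly observed; the values here are invented.
y1 = np.array([1, 1, 0, 1, 0])   # Y^{a=1}: outcome if treated
y0 = np.array([0, 1, 0, 0, 0])   # Y^{a=0}: outcome if untreated
a  = np.array([1, 0, 1, 0, 1])   # treatment actually received

# By causal consistency, the observed outcome is the potential outcome
# under the treatment actually received; the other one is counterfactual.
y_obs = np.where(a == 1, y1, y0)

print("unit  A  Y^1  Y^0  observed Y")
for i in range(len(a)):
    shown_y1 = y1[i] if a[i] == 1 else "?"   # counterfactual shown as "?"
    shown_y0 = y0[i] if a[i] == 0 else "?"
    print(f"{i:>4}  {a[i]}  {shown_y1:>3}  {shown_y0:>3}  {y_obs[i]:>9}")
```

Every row has exactly one "?", so the individual effect $Y_i^{a=1} - Y_i^{a=0}$ can never be computed from observed data alone.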
Three identification assumptions
1. Causal consistency
The potential outcome corresponding to the exposure an individual actually receives equals their observed outcome: if $A_i = a$, then $Y_i = Y_i^a$.
This assumption requires that treatment is well-defined (no hidden versions of treatment) and that there is no interference between units (one person's treatment does not affect another's outcome).
2. Exchangeability
The potential outcomes are independent of treatment assignment: $Y^a \perp\!\!\!\perp A$ for all $a$. In a randomised experiment, this holds by design. In observational studies, we instead require conditional exchangeability: after conditioning on a set of measured covariates $L$, treatment assignment is independent of the potential outcomes: $Y^a \perp\!\!\!\perp A \mid L$ for all $a$.
When exchangeability holds, the average treatment effect (ATE) is identified: $E[Y^{a=1}] - E[Y^{a=0}] = E[Y \mid A = 1] - E[Y \mid A = 0]$.
In observational settings with confounders $L$, standardisation gives $E[Y^a] = \sum_l E[Y \mid A = a, L = l] \Pr(L = l)$.
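A small simulation illustrates why the standardisation formula is needed. The data-generating process below is invented (binary confounder, true ATE of 0.2): the naive treated-vs-untreated contrast is confounded, while the standardised estimate recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical data-generating process (assumed for illustration):
# L confounds both treatment and outcome; the true ATE is 0.2.
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.2 + 0.6 * L)            # treatment more likely when L = 1
Y = rng.binomial(1, 0.1 + 0.2 * A + 0.3 * L)  # outcome depends on A and L

# Naive (confounded) contrast: E[Y | A=1] - E[Y | A=0]
naive = Y[A == 1].mean() - Y[A == 0].mean()

# Standardisation: E[Y^a] = sum_l E[Y | A=a, L=l] * Pr(L=l)
def standardized_mean(a):
    return sum(Y[(A == a) & (L == l)].mean() * (L == l).mean() for l in (0, 1))

ate = standardized_mean(1) - standardized_mean(0)
print(f"naive contrast: {naive:.3f}   standardised ATE: {ate:.3f}   (truth: 0.200)")
```

The naive contrast is inflated because treated individuals are disproportionately drawn from the high-risk $L = 1$ stratum; weighting the stratum-specific means by $\Pr(L = l)$ removes that imbalance.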
3. Positivity
Every individual has a non-zero probability of receiving each treatment level, conditional on their covariates: $\Pr(A = a \mid L = l) > 0$ for all $a$ and all $l$ with $\Pr(L = l) > 0$.
Positivity is the only assumption that can be verified with data. Violations occur when certain subgroups never receive a particular treatment level, making causal effect estimates for those subgroups extrapolations rather than identifiable quantities.
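Because positivity is checkable, a routine diagnostic is to estimate the treatment probability within each covariate stratum. A minimal sketch with invented data, where one stratum structurally never receives treatment:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical covariate with three strata; stratum 2 never receives treatment.
L = rng.integers(0, 3, n)
p_treat = np.array([0.5, 0.3, 0.0])[L]   # structural zero in stratum 2
A = rng.binomial(1, p_treat)

# Positivity check: estimate Pr(A = 1 | L = l) in each stratum.
for l in range(3):
    p = A[L == l].mean()
    flag = "  <-- positivity violation" if p in (0.0, 1.0) else ""
    print(f"Pr(A=1 | L={l}) = {p:.3f}{flag}")
```

Any stratum with an estimated probability of exactly 0 or 1 is a red flag: causal contrasts there rest on extrapolation rather than identification.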
In observational settings, all three assumptions face threats. Causal consistency may fail when treatment varies across individuals (e.g., different forms of "religious service attendance"). Exchangeability is violated when unmeasured confounders exist. Positivity fails when certain subgroups have no access to treatment. These threats motivate the careful study designs and sensitivity analyses covered in later weeks.
From experiments to observational data
Randomised experiments address the fundamental problem by balancing confounders across treatment groups. Random assignment satisfies exchangeability by design, and controlled treatment administration satisfies consistency. Although individual causal effects remain unobservable, random assignment allows inference about average (marginal) causal effects.
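This logic can be checked in simulation. With invented potential outcomes and heterogeneous individual effects, randomisation makes the simple difference in means an unbiased estimate of the average causal effect even though no individual effect is observed:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Hypothetical potential outcomes with heterogeneous individual effects;
# the true average treatment effect is 0.5.
y0 = rng.normal(0, 1, n)
y1 = y0 + rng.normal(0.5, 1, n)

# Random assignment is independent of (Y^1, Y^0): exchangeability by design.
A = rng.binomial(1, 0.5, n)
Y = np.where(A == 1, y1, y0)   # consistency: observed outcome matches assignment

diff_in_means = Y[A == 1].mean() - Y[A == 0].mean()
print(f"difference in means: {diff_in_means:.3f}   (true ATE: 0.500)")
```

Each unit still has one missing potential outcome, but randomisation guarantees the treated and untreated groups are exchangeable, so their observed means estimate $E[Y^{a=1}]$ and $E[Y^{a=0}]$.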
In observational data, we must satisfy these three assumptions through study design, covariate adjustment, and sensitivity analysis. The remainder of this course develops the tools for doing so: causal diagrams (weeks 2–4), estimation methods (weeks 5–6, 8–9), and measurement considerations (week 10).
Discussion questions
- Where in your own research would an average treatment effect be the right causal estimand, and when would it mask disparities that stakeholders care about?
- Which of the three identification assumptions is most fragile in your field, and what designs or measurements could strengthen it?
- What are examples of post-treatment variables you have been tempted to adjust for, and how would doing so bias the total effect?
Further reading
- Hernán MA, Robins JM. Causal Inference: What If. Chapman & Hall/CRC, 2025. Chapters 1–3.
- See the Course Readings page for a chapter-by-week guide.