Week 2: Causal Diagrams — Five Elementary Structures
- Barrett M (2023). ggdag: Analyze and Create Elegant Directed Acyclic Graphs. R package version 0.2.7.9000. https://github.com/malcolmbarrett/ggdag
- "An Introduction to Directed Acyclic Graphs": https://r-causal.github.io/ggdag/articles/intro-to-dags.html
- "Common Structures of Bias": https://r-causal.github.io/ggdag/articles/bias-structures.html
- Confounding
- Causal Directed Acyclic Graph
- Five elementary causal structures
- d-separation
- Back door path
- Conditioning
- Fork bias
- Collider bias
- Mediator bias
- Four rules of confounding control
- Create a new `.R` file called `02-lab.R` with your name, contact, date, and a title such as "Simulating the five basic causal structures in R."
- Copy and paste the code chunks below during class.
- Save in a clearly defined project directory.
You may also download the lab here: Download the R script for Lab 02
Seminar
Overview
- Understand basic features of causal diagrams: definitions and applications
- Introduction to the five elementary causal structures
- Lab: gentle introduction to simulation and regression
Review
- Psychological research begins with two questions:
- What do I want to know?
- For which population does this knowledge generalise?
This course considers how to ask psychological questions that pertain to populations with different characteristics.
- In psychological research, we typically ask questions about the causes and consequences of thought and behaviour: "What if?" questions (Hernán & Robins, 2025).
- The following concepts help us describe two distinct failure modes:
- External validity: the extent to which findings generalise to other situations, people, settings, and time periods. We want to know if our findings carry beyond the sample population to the target population. We fail when our results do not generalise as we think, or when we have not clearly defined our question or target population.
- Internal validity: the extent to which the associations we obtain from data reflect causality. When asking "What if?" questions, we want to understand what would happen if we intervened. In this course, we use "treatment" or "exposure" to denote the intervention, and "outcome" to denote the effect of an intervention.
- During the first part of the course, our primary focus is on challenges to internal validity from confounding bias.
How randomisation identifies causal effects
Last week we introduced the fundamental problem of causal inference: we cannot observe both potential outcomes $Y(1)$ and $Y(0)$ for the same individual. Randomisation solves this problem at the population level. When treatment $A$ is randomly assigned, treatment assignment is independent of all potential outcomes:

$$\{Y(0), Y(1)\} \perp\!\!\perp A$$

This condition is called unconditional exchangeability. It means that the treated and untreated groups are, on average, identical in every respect except the treatment they received. Any difference in average outcomes between groups can therefore be attributed to the treatment itself. The ATE is then identified by a simple difference in group means:

$$\text{ATE} = E[Y(1)] - E[Y(0)] = E[Y \mid A = 1] - E[Y \mid A = 0]$$
Experiments are the benchmark for causal inference because randomisation eliminates confounding without requiring the investigator to know or measure the confounders. However, most psychological questions cannot be answered by experiment alone: we cannot randomly assign people to experience grief, adopt a religion, or grow up in poverty. For these questions, we need observational methods that achieve conditional exchangeability through careful adjustment. The causal diagrams we learn today are the tools for deciding what to adjust for.
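To make this concrete, here is a minimal simulation sketch in R. The variable names (`n`, `a`, `y`) and the true effect of 0.5 are illustrative assumptions, not values from the course materials.

```r
# Minimal sketch: randomisation identifies the ATE.
# Assumptions: n, a, y, and the true effect (0.5) are illustrative.
set.seed(2025)
n <- 10000

a <- rbinom(n, 1, 0.5)   # randomly assigned treatment
y <- 0.5 * a + rnorm(n)  # outcome with a true causal effect of 0.5

# Difference in group means estimates the ATE (truth = 0.5)
mean(y[a == 1]) - mean(y[a == 0])

# Equivalently, the coefficient on a in a simple regression
coef(lm(y ~ a))["a"]
```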
Definitions
Internal validity is compromised if the association between the treatment and outcome in a study does not consistently reflect causality in the sample population as defined at baseline.
External validity is compromised if the association between the treatment and outcome in a study does not consistently reflect causality in the target population as defined at baseline.
Confounding bias exists if there is an open back-door path between the treatment and outcome, or if a causal path from the treatment to the outcome is blocked (for example, by conditioning on a mediator).
Today, our purpose is to clarify the meaning of each term in this definition. To that end, we introduce the five elementary graphical structures employed in causal diagrams, then explain the four elementary rules that allow investigators to identify causal effects from the asserted relations in a causal diagram.
Introduction to Causal Diagrams
Causal diagrams (also called causal graphs, Directed Acyclic Graphs, or Causal DAGs) are graphical tools whose primary purpose is to enable investigators to detect confounding biases.
Remarkably, causal diagrams are rarely used in psychology!
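As a gentle preview, the ggdag package listed in the readings can draw causal diagrams directly from a set of asserted relations. The sketch below assumes a simple confounding triangle (`l` causes both `a` and `y`); the node names are illustrative, not a prescribed convention.

```r
# Sketch: draw a confounding DAG with ggdag
# (l = confounder, a = exposure, y = outcome; names are illustrative)
library(ggdag)

dag <- dagify(
  y ~ a + l,      # y is caused by a and by l
  a ~ l,          # a is caused by l, so l opens a backdoor path
  exposure = "a",
  outcome  = "y"
)

ggdag(dag) + theme_dag()                 # draw the diagram
ggdag_adjustment_set(dag) + theme_dag()  # {l} closes the backdoor path
```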
The meaning of our symbols
For us:
- $X$ denotes a variable without reference to its role.
- $A$ denotes the "treatment" or "exposure" variable: the variable for which we seek to understand the effect of intervening on it. It is the "cause."
- $Y$ denotes the outcome or response of an intervention. It is the "effect."
- $Y(a)$ denotes the counterfactual or potential state of $Y$ in response to setting the level of the exposure $A$ to $a$. To consistently estimate causal effects we need to evaluate counterfactual states of the world. This might seem like science fiction, but we are already familiar with methods for obtaining such contrasts: randomised controlled experiments.
- $L$ denotes a measured confounder or set of confounders: a variable which, if conditioned upon, closes an open back-door path between $A$ and $Y$.
- $U$ denotes an unmeasured confounder: a variable that may affect both the treatment and the outcome but for which we have no direct measurement.
- $M$ denotes a mediator: a variable along the path from exposure to outcome. Conditioning on a mediator when estimating the total effect of $A$ on $Y$ will bias that estimate.
- $\bar{X} = (X_0, X_1, \dots, X_t)$ denotes a sequence of variables, for example a sequence of treatments $\bar{A}$.
- $R$ denotes a randomisation or a chance event.
Elements of our causal graphs
Time indexing
In our causal diagrams, we implement two conventions for temporal order. First, the layout is structured left to right to reflect the sequence of causality. Second, we index nodes according to the relative timing of events. If $A_0$ precedes $Y_1$, the indexing indicates this chronological order.
Representing uncertainty in timing
When the sequence of events is ambiguous (particularly in cross-sectional data), we use indexing to propose a temporal order without clear time-specific measurements. We denote an event presumed to occur first as $X_0$ and a subsequent event as $X_1$.
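A sketch of these conventions in ggdag, assuming illustrative node names `a_0` and `y_1` and a manual left-to-right layout:

```r
# Sketch: a time-indexed diagram laid out left to right.
# Node names a_0 (earlier) and y_1 (later) are illustrative.
library(ggdag)

dag <- dagify(
  y_1 ~ a_0,
  coords = list(
    x = c(a_0 = 0, y_1 = 1),  # left-to-right reflects temporal order
    y = c(a_0 = 0, y_1 = 0)
  )
)
ggdag(dag) + theme_dag()
```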
Arrows
- Black arrows denote causality.
- Red arrows reveal an open backdoor path.
- Dashed black arrows denote attenuation.
- Red dashed arrows denote bias in a true causal association.
- A blue arrow with a circle point denotes effect-measure modification.
- An arrow from $R$ into the treatment node ($R \to A$) denotes random treatment assignment.
Boxes
- A black box denotes conditioning that reduces confounding or is inert.
- A red box describes conditioning that introduces confounding bias.
- A dashed circle denotes a latent variable (not measured or not conditioned upon).
Terminology for conditional independence
- Statistical independence ($\perp\!\!\perp$): $Y(a) \perp\!\!\perp A$ means treatment assignment is independent of the potential outcomes.
- Statistical dependence ($\not\perp\!\!\perp$): $Y(a) \not\perp\!\!\perp A$ means treatment assignment is related to potential outcomes, potentially introducing bias.
- Conditioning ($\mid$): the vertical bar specifies contexts under which independence or dependence holds.
- Conditional independence: $Y(a) \perp\!\!\perp A \mid L$ means that once we account for $L$, treatment and potential outcomes are independent.
- Conditional dependence: $Y(a) \not\perp\!\!\perp A \mid L$ means potential outcomes and treatments are not independent after conditioning on $L$.
The Five Elementary Structures of Causality
Judea Pearl showed that causal relationships can be represented graphically, and that all causal diagrams are built from five elementary structures (Pearl, 2009).
Two variables:
- Causality absent: no causal effect between $A$ and $Y$. They are statistically independent: $A \perp\!\!\perp Y$.
- Causality: $A$ causally affects $Y$. They are statistically dependent: $A \not\perp\!\!\perp Y$.
Three variables:
- Fork: $L$ causally affects both $A$ and $Y$. Variables $A$ and $Y$ are conditionally independent given $L$: $A \perp\!\!\perp Y \mid L$.
- Chain: $Y$ is affected by $M$, which is affected by $A$. Variables $A$ and $Y$ are conditionally independent given $M$: $A \perp\!\!\perp Y \mid M$. Here $M$ mediates the effect of $A$ on $Y$.
- Collider: $Z$ is affected by both $A$ and $Y$, which are independent. Conditioning on $Z$ induces an association: $A \not\perp\!\!\perp Y \mid Z$.
Once we understand these basic relationships, we can build more complex causal diagrams. These structures help us see how statistical independencies and dependencies emerge from the data, allowing us to identify confounders so that $Y(a) \perp\!\!\perp A \mid L$.
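The lab offers a gentle introduction to these ideas through simulation and regression. As a preview, here is a sketch of all three three-variable structures; the variable names and coefficients are illustrative assumptions.

```r
# Sketch: simulate the fork, chain, and collider, then check
# (conditional) independence with regression. Names are illustrative.
set.seed(123)
n <- 10000

## Fork: l -> a, l -> y (l is a common cause)
l <- rnorm(n)
a <- l + rnorm(n)
y <- l + rnorm(n)
coef(lm(y ~ a))["a"]      # spurious association (nonzero)
coef(lm(y ~ a + l))["a"]  # conditioning on l removes it (~ 0)

## Chain: a -> m -> y (m is a mediator)
a <- rnorm(n)
m <- a + rnorm(n)
y <- m + rnorm(n)
coef(lm(y ~ a))["a"]      # total effect of a on y (~ 1)
coef(lm(y ~ a + m))["a"]  # conditioning on m blocks the path (~ 0)

## Collider: a -> z <- y (z is a common effect)
a <- rnorm(n)
y <- rnorm(n)             # a and y are truly independent
z <- a + y + rnorm(n)
coef(lm(y ~ a))["a"]      # ~ 0: no association
coef(lm(y ~ a + z))["a"]  # conditioning on z induces an association
```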
The Three Identification Assumptions
Every causal inference, whether from an experiment or an observational study, rests on three assumptions. When all three hold, the causal effect of on is identified: it can be estimated from the observed data.
Consistency
The observed outcome for an individual equals the potential outcome under the treatment that individual actually received. Formally: if $A = a$, then $Y = Y(a)$. This assumption requires that the treatment is well-defined and that there are no hidden versions of treatment. For example, if "exercise" is the treatment, the assumption requires that exercising means roughly the same thing for everyone in the study.
Conditional exchangeability
Within levels of the measured covariates $L$, treatment assignment is independent of the potential outcomes:

$$Y(a) \perp\!\!\perp A \mid L \quad \text{for all } a$$

This is the assumption that, after conditioning on $L$, the treated and untreated groups are exchangeable. It is satisfied automatically by randomisation (where $L$ is empty and the condition is unconditional). In observational studies, it requires that we have measured and conditioned on all common causes of treatment and outcome. No statistical test can confirm this assumption; it is justified by subject-matter knowledge encoded in a causal diagram.
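A sketch of conditional exchangeability in a simulated observational study (names and coefficients are illustrative): the naive estimate is confounded, while conditioning on the measured confounder recovers the true effect.

```r
# Sketch: conditional exchangeability in an observational study.
# Assumptions: l confounds a and y; the true effect of a is 0.5.
set.seed(11)
n <- 10000
l <- rnorm(n)                 # measured confounder
a <- rbinom(n, 1, plogis(l))  # treatment probability depends on l
y <- 0.5 * a + l + rnorm(n)   # outcome depends on a and l

coef(lm(y ~ a))["a"]      # confounded: overestimates the effect
coef(lm(y ~ a + l))["a"]  # conditioning on l recovers ~ 0.5
```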
Positivity
Every individual has a non-zero probability of receiving each treatment level, within every stratum of the covariates:

$$0 < \Pr(A = a \mid L = l) < 1 \quad \text{for all } a, l \text{ with } \Pr(L = l) > 0$$

In plain language, there must be both treated and untreated individuals at every combination of covariate values. If some subgroup never receives treatment (for example, if no one over age 90 is prescribed the drug), we cannot estimate the causal effect for that subgroup because we have no counterfactual comparison.
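One crude empirical check, sketched below under illustrative assumptions, is to cross-tabulate treatment against the covariates and look for empty cells.

```r
# Sketch: a crude positivity check for a binary covariate l and
# treatment a. Names and parameters are illustrative.
set.seed(42)
n <- 1000
l <- rbinom(n, 1, 0.5)
a <- rbinom(n, 1, plogis(-1 + 2 * l))  # treatment depends on l

table(l, a)  # zeros (or near-zeros) in any cell signal a violation
```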
These three assumptions form the foundation for everything that follows in this course. Causal diagrams help us evaluate whether the second assumption (conditional exchangeability) holds: if we can identify a set $L$ that blocks all backdoor paths between $A$ and $Y$, and if we condition on $L$, then conditional exchangeability is satisfied (given that our diagram is correct).
You might wonder: "If not from the data, where do our assumptions about causality come from?" Our assumptions are based on existing knowledge. Otto Neurath, an Austrian philosopher, used the metaphor of a ship rebuilt at sea:
We are like sailors who on the open sea must reconstruct their ship but are never able to start afresh from the bottom. Where a beam is taken away a new one must at once be put there, and for this the rest of the ship is used as support. In this way, by using the old beams and driftwood, the ship can be shaped entirely anew, but only by gradual reconstruction. (Neurath, 1973, p. 199)
The Four Rules of Confounding Control
- Condition on a common cause or its proxy. When $A$ and $Y$ share common causes, conditioning on these common causes blocks the open backdoor paths.
- Do not condition on a mediator. When $M$ mediates the path $A \to Y$, conditioning on $M$ biases the estimate of the total causal effect.
- Do not condition on a collider. When $Z$ is a common effect of $A$ and $Y$, conditioning on $Z$ induces a spurious association. For example, if marriage causes wealth and happiness causes wealth, conditioning on wealth induces an association between marriage and happiness even without a causal connection (see the simulation sketch after this list).
- Proxy rule: conditioning on a descendant is akin to conditioning on its parent. When $Z$ is an effect of $X$, conditioning on $Z$ is approximately equivalent to conditioning on $X$. This can introduce bias (if $X$ is a collider) or reduce bias (if $X$ is an unmeasured common cause $U$ and $Z$ is a measured proxy).
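Here is the simulation sketch promised in rule 3, using illustrative variable names: marriage and happiness are generated independently, yet conditioning on their common effect, wealth, induces an association.

```r
# Sketch: collider bias (rule 3). Marriage and happiness independently
# cause wealth; conditioning on wealth induces a spurious association.
set.seed(7)
n <- 10000
marriage  <- rbinom(n, 1, 0.5)
happiness <- rnorm(n)
wealth    <- marriage + happiness + rnorm(n)

coef(lm(happiness ~ marriage))["marriage"]           # ~ 0: independent
coef(lm(happiness ~ marriage + wealth))["marriage"]  # nonzero: collider bias
```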
The three identification assumptions and randomisation framework are covered in:
- Hernán MA, Robins JM (2025). Causal Inference: What If. Chapters 1–2. link
- Bulbulia JA (2024). "Methods in causal inference. Part 1: causal diagrams and confounding." Evolutionary Human Sciences. link
Why experiments are the benchmark for causal inference:
- Bulbulia JA (2024). "Methods in causal inference. Part 4: confounding in experiments." Evolutionary Human Sciences. link
Lab materials: Lab 2: Simulating the five basic causal structures in R