Causal Diagrams: The Structures of Confounding Bias
PSYC 434 — Week 3
Motivating example
Suppose investigators regress charitable giving on religious attendance using NZAVS data, adjusting for age, income, and education. The fitted model has high R^2.
Can we conclude that attendance causes giving?
Not yet
Conscientiousness may affect both attendance and giving. If that common cause is unmeasured, confounding remains, regardless of how well the model fits.
Good fit \neq good causal identification.
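A small simulation can make this concrete. The sketch below is a hypothetical toy world, not NZAVS data: an unmeasured variable U (standing in for conscientiousness) drives both A (attendance) and Y (giving), and the effect of A on Y is set to zero. The coefficients, noise levels, seed, and sample size are illustrative assumptions, and the measured covariates (age, income, education) are left out for brevity.

```python
import random

random.seed(434)  # arbitrary seed so the sketch is reproducible
n = 5000

# Hypothetical toy world: attendance (A) has NO causal effect on giving (Y).
true_effect = 0.0
U = [random.gauss(0, 1) for _ in range(n)]       # unmeasured common cause
A = [u + random.gauss(0, 0.3) for u in U]        # attendance, driven by U
Y = [true_effect * a + 2 * u + random.gauss(0, 0.3)  # giving, driven by U
     for a, u in zip(A, U)]

# Simple OLS of Y on A, computed from sums of squares and cross-products.
mean_a, mean_y = sum(A) / n, sum(Y) / n
s_ay = sum((a - mean_a) * (y - mean_y) for a, y in zip(A, Y))
s_aa = sum((a - mean_a) ** 2 for a in A)
s_yy = sum((y - mean_y) ** 2 for y in Y)

naive_slope = s_ay / s_aa              # association of A with Y, U ignored
r_squared = s_ay ** 2 / (s_aa * s_yy)  # squared correlation = R^2 here

print(f"true effect of A on Y: {true_effect}")
print(f"naive slope:           {naive_slope:.2f}")
print(f"R^2:                   {r_squared:.2f}")
```

Under these assumed values the naive slope is far from zero and R^2 is high, even though A does nothing to Y. The fit reflects how well U predicts both variables, not a causal effect.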
What is confounding?
Confounding exists when a common cause of treatment A and outcome Y opens a non-causal backdoor path. A backdoor path is a path between A and Y that begins with an arrow into A.
While this path is open, the observed association between A and Y mixes the causal effect with spurious association. If the common cause L is unmeasured, the two cannot be separated.
Conditioning on the common cause
Conditioning on \boxed{L} blocks the backdoor path. The arrows turn from red to white: no bias flows through a blocked path.
The backdoor criterion
A set L satisfies the backdoor criterion for A and Y if:
1. No variable in L is a descendant of A.
2. L blocks every backdoor path from A to Y.
If both conditions hold, the potential outcomes are independent of treatment within levels of L:
Y(a) \coprod A \mid L.
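One way to see why this independence matters is to trace how it licenses standardisation over L. The derivation below is a sketch under assumptions the slide does not state: L is discrete, consistency holds (Y = Y(a) whenever A = a), and positivity holds (every level of L with positive probability contains units at treatment level a).

```latex
\begin{aligned}
E[Y(a)] &= \sum_{l} E[Y(a) \mid L = l] \, P(L = l) \\
        &= \sum_{l} E[Y(a) \mid A = a, L = l] \, P(L = l)
           && \text{(since } Y(a) \coprod A \mid L \text{)} \\
        &= \sum_{l} E[Y \mid A = a, L = l] \, P(L = l)
           && \text{(consistency)}
\end{aligned}
```

The last line contains only observed quantities, so under these assumptions the counterfactual mean can be computed from data on A, Y, and L.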
Confounding and regression
Regression is one way to condition on measured variables:
Y = \beta_0 + \beta_1 A + \beta_2 L + \varepsilon
| Symbol | Meaning |
|---|---|
| \beta_0 | Expected outcome when A and L are both zero |
| \beta_1 | Expected difference in the outcome per unit change in A, conditional on the other model terms |
| R^2 | Proportion of outcome variance explained by the model |
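The regression above can be sketched in a few lines of plain Python. This is an illustrative simulation, not course code: the data-generating values (\beta_1 = 0.5, the strength of L, the seed) are assumptions, and the helper names `slope`, `residuals`, and `adjusted_slope` are hypothetical. The coefficient on A is obtained with the Frisch-Waugh-Lovell result: regress A and Y on L, then regress the Y residuals on the A residuals.

```python
import random


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def slope(x, y):
    """OLS slope from regressing y on x (intercept included)."""
    x, y = center(x), center(y)
    return sum(p * q for p, q in zip(x, y)) / sum(p * p for p in x)


def residuals(y, z):
    """Residuals from regressing y on z (intercept included)."""
    b = slope(z, y)
    y, z = center(y), center(z)
    return [p - b * q for p, q in zip(y, z)]


def adjusted_slope(a, y, l):
    """Coefficient on a when y is regressed on a and l (Frisch-Waugh-Lovell)."""
    return slope(residuals(a, l), residuals(y, l))


random.seed(434)  # arbitrary seed for reproducibility
n = 5000
beta_1 = 0.5      # assumed true effect of A on Y

L = [random.gauss(0, 1) for _ in range(n)]                 # measured confounder
A = [0.8 * l + random.gauss(0, 1) for l in L]              # L -> A
Y = [beta_1 * a + 1.0 * l + random.gauss(0, 1)             # A -> Y and L -> Y
     for a, l in zip(A, L)]

unadjusted = slope(A, Y)             # backdoor path through L is open
adjusted = adjusted_slope(A, Y, L)   # conditioning on L blocks it

print(f"assumed beta_1:   {beta_1}")
print(f"unadjusted slope: {unadjusted:.2f}")
print(f"adjusted slope:   {adjusted:.2f}")
```

In this toy setting the adjusted slope lands near the assumed \beta_1 while the unadjusted slope overshoots. That result depends on L being the only common cause and on the linear model being correct, both of which hold here by construction.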
Why model fit is misleading for causality
A model can fit very well and still be causally wrong:
| Mistake | Consequence | Detected by R^2? |
|---|---|---|
| Condition on a mediator | Blocks part of the target effect | No |
| Condition on a collider | Opens a spurious path | No |
| Omit a confounder | Leaves a spurious path open | No |
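The first two mistakes can be illustrated with a short simulation. Everything here is a hypothetical sketch: the structures, coefficients, seed, and helper names are assumptions chosen for illustration. In the mediator case the assumed total effect is 0.3 + 0.8 * 0.5 = 0.7. In the collider case A has no effect on Y at all.

```python
import random


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def slope(x, y):
    """OLS slope from regressing y on x (intercept included)."""
    x, y = center(x), center(y)
    return sum(p * q for p, q in zip(x, y)) / sum(p * p for p in x)


def residuals(y, z):
    """Residuals from regressing y on z (intercept included)."""
    b = slope(z, y)
    y, z = center(y), center(z)
    return [p - b * q for p, q in zip(y, z)]


def adjusted_slope(a, y, z):
    """Coefficient on a when y is regressed on a and z (Frisch-Waugh-Lovell)."""
    return slope(residuals(a, z), residuals(y, z))


random.seed(434)  # arbitrary seed for reproducibility
n = 5000

# Mediator: A -> M -> Y plus a direct A -> Y arrow.
A = [random.gauss(0, 1) for _ in range(n)]
M = [0.8 * a + random.gauss(0, 1) for a in A]
Y = [0.3 * a + 0.5 * m + random.gauss(0, 1) for a, m in zip(A, M)]
total_effect = slope(A, Y)                 # targets the total effect (0.7)
given_mediator = adjusted_slope(A, Y, M)   # only the direct part (0.3) remains

# Collider: A and Y are independent, and both cause C.
A2 = [random.gauss(0, 1) for _ in range(n)]
Y2 = [random.gauss(0, 1) for _ in range(n)]
C = [a + y + random.gauss(0, 1) for a, y in zip(A2, Y2)]
no_adjustment = slope(A2, Y2)               # near zero, as it should be
given_collider = adjusted_slope(A2, Y2, C)  # spurious negative association

print(f"mediator: unadjusted {total_effect:.2f}, adjusted for M {given_mediator:.2f}")
print(f"collider: unadjusted {no_adjustment:.2f}, adjusted for C {given_collider:.2f}")
```

In both cases the extra covariate cannot lower in-sample R^2, because adding a regressor to an OLS model never does. Fit therefore gives no warning that the coefficient on A has moved away from the target effect.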
Time ordering can resolve some confounding
Time-resolved confounding
A common longitudinal strategy: measure confounders at t_0, treatment at t_1, and outcome at t_2. Time ordering ensures that the baseline covariates cannot be consequences of treatment. If they also include all common causes of A and Y, conditioning on them blocks the backdoor paths.
What time ordering cannot resolve
Confounding not resolved by time
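M-bias occurs when investigators condition on a pre-treatment collider. In the structure U_1 \to L \leftarrow U_2, with U_1 \to A and U_2 \to Y, the path A \leftarrow U_1 \to L \leftarrow U_2 \to Y is blocked at the collider L. Conditioning on L opens this path, even though L precedes treatment.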
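The M-structure can be simulated directly. The sketch below is illustrative only: U_1 and U_2 are independent, A has no effect on Y, and the coefficients, noise levels, seed, and helper names are assumptions.

```python
import random


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def slope(x, y):
    """OLS slope from regressing y on x (intercept included)."""
    x, y = center(x), center(y)
    return sum(p * q for p, q in zip(x, y)) / sum(p * p for p in x)


def residuals(y, z):
    """Residuals from regressing y on z (intercept included)."""
    b = slope(z, y)
    y, z = center(y), center(z)
    return [p - b * q for p, q in zip(y, z)]


def adjusted_slope(a, y, z):
    """Coefficient on a when y is regressed on a and z (Frisch-Waugh-Lovell)."""
    return slope(residuals(a, z), residuals(y, z))


random.seed(434)  # arbitrary seed for reproducibility
n = 5000

U1 = [random.gauss(0, 1) for _ in range(n)]   # unmeasured cause of L and A
U2 = [random.gauss(0, 1) for _ in range(n)]   # unmeasured cause of L and Y
L = [u1 + u2 + random.gauss(0, 0.5) for u1, u2 in zip(U1, U2)]  # baseline collider
A = [u1 + random.gauss(0, 1) for u1 in U1]    # treatment: depends on U1 only
Y = [u2 + random.gauss(0, 1) for u2 in U2]    # outcome: depends on U2 only, not on A

without_L = slope(A, Y)              # path blocked at L: estimate near zero
with_L = adjusted_slope(A, Y, L)     # conditioning on L opens the path

print(f"true effect of A on Y: 0")
print(f"slope without L:       {without_L:.2f}")
print(f"slope adjusting for L: {with_L:.2f}")
```

Here the unadjusted estimate is close to the truth and the adjusted one is biased away from zero, the reverse of the usual confounding pattern. The size of the bias depends on the assumed coefficients.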
“Control for everything measured at baseline” is not a safe rule.
Mediation assumptions
Mediation structure
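Mediation analysis needs stronger assumptions than total-effect analysis. When treatment affects a confounder of the mediator-outcome relation, adjusting for that confounder blocks part of the treatment effect, while omitting it leaves the mediator-outcome relation confounded. Standard regression can then be unsuitable for estimating direct and indirect effects.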
Return to the opening example
Back to the NZAVS regression of charitable giving on religious attendance.
High R^2 does not answer the causal question. To estimate the effect of attendance on giving, we need a defended DAG, a valid adjustment set, and a design that addresses remaining bias paths.
This is why we separate modelling from causal identification.
Readings
Required and optional readings for each week are listed on the course readings page.