Causal Diagrams: The Structures of Confounding Bias

Motivating example

Suppose investigators regress charitable giving on religious attendance using NZAVS data, adjusting for age, income, and education. The fitted model has a high R^2.

Can we conclude that attendance causes giving?

Not yet

Conscientiousness may affect both attendance and giving. If that common cause is unmeasured, confounding remains, regardless of how well the model fits.

Good fit \neq good causal identification.

What is confounding?

Confounding exists when a common cause of treatment A and outcome Y opens a non-causal backdoor path. A backdoor path starts with an arrow into A.

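With the backdoor path A \leftarrow L \to Y open, the observed association between A and Y mixes the causal effect with spurious association. If the common cause L is unmeasured, adjustment cannot separate the two.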

Conditioning on the common cause

Conditioning on \boxed{L} blocks the backdoor path. The arrows turn from red to white: no bias flows through a blocked path.

The backdoor criterion

A set L satisfies the backdoor criterion for A and Y if:

1. No variable in L is a descendant of A.
2. L blocks every backdoor path from A to Y.

If both conditions hold:

Y(a) \coprod A \mid L.
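
Conditional exchangeability is what licenses adjustment. As a sketch, the mean potential outcome can be rewritten in terms of observed quantities. This derivation assumes two conditions not stated above: consistency (Y = Y(a) whenever A = a) and positivity (P(A = a \mid L = l) > 0 for every l with P(L = l) > 0). It also takes L to be discrete for simplicity.

```latex
\begin{aligned}
E[Y(a)] &= \sum_{l} E[Y(a) \mid L = l] \, P(L = l) \\
        &= \sum_{l} E[Y(a) \mid A = a, L = l] \, P(L = l) && \text{(since } Y(a) \coprod A \mid L\text{)} \\
        &= \sum_{l} E[Y \mid A = a, L = l] \, P(L = l) && \text{(consistency)}
\end{aligned}
```

The last line contains only observable quantities, which is why a valid adjustment set turns a causal question into a statistical one.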

Confounding and regression

Regression is one way to condition on measured variables:

Y = \beta_0 + \beta_1 A + \beta_2 L + \varepsilon

Symbol	Meaning
\beta_0	Expected outcome when A and L are both zero
\beta_1	Expected outcome difference per unit change in A, conditional on model terms
R^2	Proportion of variance explained
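
A small simulation can illustrate what adjusting for L does to \beta_1. This is a minimal sketch on simulated data, not NZAVS data. The effect sizes, the seed, and the helper names (`slope`, `residuals`, `adjusted_slope`) are assumptions made for illustration. The adjusted coefficient is computed by residualising Y and A on L, which by the Frisch-Waugh-Lovell theorem equals the coefficient on A in the regression of Y on A and L.

```python
import random
from statistics import fmean

def slope(y, x):
    """Simple-regression slope of y on x: cov(x, y) / var(x)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residuals(y, x):
    """Residuals from the simple regression of y on x (with intercept)."""
    b = slope(y, x)
    a = fmean(y) - b * fmean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]

def adjusted_slope(y, a, l):
    """Coefficient on a in y ~ a + l, via Frisch-Waugh-Lovell."""
    return slope(residuals(y, l), residuals(a, l))

rng = random.Random(2024)   # fixed seed so the run is reproducible
n = 20_000
TRUE_EFFECT = 0.5           # assumed causal effect of A on Y

L = [rng.gauss(0, 1) for _ in range(n)]                  # measured confounder
A = [0.8 * l_i + rng.gauss(0, 1) for l_i in L]           # L -> A
Y = [TRUE_EFFECT * a_i + 1.0 * l_i + rng.gauss(0, 1)     # A -> Y and L -> Y
     for a_i, l_i in zip(A, L)]

beta1_unadjusted = slope(Y, A)             # backdoor path A <- L -> Y left open
beta1_adjusted = adjusted_slope(Y, A, L)   # backdoor path blocked by conditioning on L

print(f"true effect:      {TRUE_EFFECT:.3f}")
print(f"unadjusted beta1: {beta1_unadjusted:.3f}")
print(f"adjusted beta1:   {beta1_adjusted:.3f}")
```

Under this assumed data-generating process the unadjusted slope tends in large samples to 0.5 + 0.8/1.64, roughly 0.99, while the adjusted slope tends to the true 0.5. The adjustment works here only because L is measured and the linear model is correctly specified.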

Why model fit is misleading for causality

A model can fit very well and still be causally wrong:

Mistake	Consequence	Detected by R^2?
Condition on a mediator	Blocks part of the target effect	No
Condition on a collider	Opens a spurious path	No
Omit a confounder	Leaves a spurious path open	No
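
The first two rows of the table can be illustrated with a short simulation. This is a sketch under assumed linear structures. The coefficients, the seed, and the helpers `ols` and `r_squared` are hypothetical choices for illustration, written with the standard library only. In each scenario the second model adds a variable that should not be adjusted for.

```python
import random

def ols(y, *xs):
    """Return least-squares coefficients [intercept, slope_1, ...]."""
    cols = [[1.0] * len(y), *xs]
    k = len(cols)
    # Augmented normal equations [X'X | X'y].
    m = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols]
         + [sum(a * b for a, b in zip(ci, y))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(k):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [a - f * b for a, b in zip(m[r], m[i])]
    return [m[i][k] / m[i][i] for i in range(k)]

def r_squared(y, *xs):
    """In-sample R^2 of the least-squares fit of y on xs (with intercept)."""
    b = ols(y, *xs)
    fitted = [b[0] + sum(bj * x[i] for bj, x in zip(b[1:], xs))
              for i in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rng = random.Random(7)      # fixed seed so the run is reproducible
n = 20_000

def noise():
    return rng.gauss(0, 1)

# Mistake 1: conditioning on a mediator (A -> M -> Y plus a direct A -> Y).
A1 = [noise() for _ in range(n)]
M = [a + noise() for a in A1]
Y1 = [0.3 * a + 0.7 * m + noise() for a, m in zip(A1, M)]  # total effect 1.0

# Mistake 2: conditioning on a collider (A -> C <- Y).
A2 = [noise() for _ in range(n)]
Y2 = [0.5 * a + noise() for a in A2]                        # total effect 0.5
C = [a + y + noise() for a, y in zip(A2, Y2)]

med_correct = (ols(Y1, A1)[1], r_squared(Y1, A1))
med_wrong = (ols(Y1, A1, M)[1], r_squared(Y1, A1, M))
col_correct = (ols(Y2, A2)[1], r_squared(Y2, A2))
col_wrong = (ols(Y2, A2, C)[1], r_squared(Y2, A2, C))

for label, (beta, r2) in [("mediator: Y ~ A", med_correct),
                          ("mediator: Y ~ A + M", med_wrong),
                          ("collider: Y ~ A", col_correct),
                          ("collider: Y ~ A + C", col_wrong)]:
    print(f"{label:<22} beta_A = {beta:6.3f}   R^2 = {r2:.3f}")
```

Under these assumed structures, adding the mediator moves the coefficient on A from the total effect of 1.0 toward the direct effect of 0.3, and adding the collider moves it from 0.5 toward -0.25, reversing its sign. In both cases R^2 rises, from about 0.40 to 0.60 and from about 0.20 to 0.60 in large samples. In-sample R^2 cannot fall when a regressor is added, so its increase carries no information about whether the adjustment was causally appropriate.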

Time ordering can resolve some confounding

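A common longitudinal strategy is to measure confounders at t_0, treatment at t_1, and outcome at t_2. Time ordering helps ensure that the baseline covariates are not descendants of A, so adjusting for them does not block part of the effect. Time ordering alone does not block backdoor paths: those are blocked by conditioning on the baseline covariates, and only if they capture all common causes of A and Y.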

What time ordering cannot resolve

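M-bias occurs when investigators condition on a pre-treatment collider. In the structure U_1 \to L \leftarrow U_2, with U_1 \to A and U_2 \to Y, the path A \leftarrow U_1 \to L \leftarrow U_2 \to Y is blocked at the collider L when L is left alone. Conditioning on L opens this path, even though L is measured before treatment.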
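
The M-structure can be simulated to show the bias numerically. This is a minimal sketch: the unit coefficients, the assumed true effect of 0.5, the seed, and the helper names are illustrative assumptions, and U_1 and U_2 are treated as unmeasured. The adjusted coefficient is obtained by residualising on L (Frisch-Waugh-Lovell), which matches the coefficient on A in a regression of Y on A and L.

```python
import random
from statistics import fmean

def slope(y, x):
    """Simple-regression slope of y on x: cov(x, y) / var(x)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residuals(y, x):
    """Residuals from the simple regression of y on x (with intercept)."""
    b = slope(y, x)
    a = fmean(y) - b * fmean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]

rng = random.Random(11)     # fixed seed so the run is reproducible
n = 20_000
TRUE_EFFECT = 0.5           # assumed causal effect of A on Y

U1 = [rng.gauss(0, 1) for _ in range(n)]                 # unmeasured cause of L and A
U2 = [rng.gauss(0, 1) for _ in range(n)]                 # unmeasured cause of L and Y
L = [u1 + u2 + rng.gauss(0, 1) for u1, u2 in zip(U1, U2)]  # pre-treatment collider
A = [u1 + rng.gauss(0, 1) for u1 in U1]                  # U1 -> A
Y = [TRUE_EFFECT * a + u2 + rng.gauss(0, 1)              # A -> Y and U2 -> Y
     for a, u2 in zip(A, U2)]

# No adjustment: the path through L is blocked at the collider.
beta_unadjusted = slope(Y, A)
# Adjusting for L: the path A <- U1 -> L <- U2 -> Y is opened.
beta_adjusted_for_L = slope(residuals(Y, L), residuals(A, L))

print(f"true effect:        {TRUE_EFFECT:.3f}")
print(f"unadjusted:         {beta_unadjusted:.3f}")
print(f"adjusted for L:     {beta_adjusted_for_L:.3f}")
```

With these assumed coefficients the unadjusted estimate tends to the true 0.5, while the estimate adjusted for L tends to 0.3 in large samples. Here the simpler model is the unbiased one, and adding a baseline covariate creates bias that was not there before.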

“Control for everything measured at baseline” is not a safe rule.

Mediation assumptions

Mediation analysis needs stronger assumptions than total-effect analysis. Treatment-induced confounding of the mediator-outcome relation can make standard regression unsuitable for estimating direct and indirect effects.

Return to the opening example

Back to the NZAVS regression of charitable giving on religious attendance.

High R^2 does not answer the causal question. To estimate the effect of attendance on giving, we need a defended DAG, a valid adjustment set, and a design that addresses remaining bias paths.

This is why we separate modelling from causal identification.

Readings

Required and optional readings for each week are listed on the course readings page.