would conditioning on the middle variable help, hurt, or change the question?
warm-up 1: common cause
question: if you want the causal effect of A on Y, would you condition on \boxed{L}?
warm-up 2: mediator
question: if you want the total effect of A on Y, would you condition on \boxed{M}?
warm-up 3: collider
question: would conditioning on \boxed{C} help?
warm-up 4: descendant of a collider
question: if you do not condition on C, but you do condition on its descendant \boxed{D}, what happens?
warm-up 5: descendant of a common cause
question: if L is unmeasured, would conditioning on \boxed{D} be the same as conditioning on \boxed{L}?
warm-up summary
three rules from earlier weeks:
common cause: often condition
mediator: do not condition if you want the total effect
collider: do not condition
two extensions for this week:
descendant of a collider: also dangerous to condition on
descendant of a common cause: not a guaranteed substitute for the common cause
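The two extensions can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the linear structure, unit variances, and sample size are all assumptions chosen for illustration, and A has no effect on Y in either scenario.

```r
set.seed(1)
n <- 100000  # assumed sample size, large so estimates are stable

# Scenario 1 (assumed): A -> C <- Y and C -> D; A has NO effect on Y
A <- rnorm(n)
Y <- rnorm(n)
C <- A + Y + rnorm(n)  # collider
D <- C + rnorm(n)      # descendant of the collider

b_none       <- coef(lm(Y ~ A))[["A"]]      # no conditioning
b_collider   <- coef(lm(Y ~ A + C))[["A"]]  # conditioning on the collider
b_descendant <- coef(lm(Y ~ A + D))[["A"]]  # conditioning on its descendant

# Scenario 2 (assumed): L -> A, L -> Y, L -> P; A has NO effect on Y
L  <- rnorm(n)
A2 <- L + rnorm(n)
Y2 <- L + rnorm(n)
P  <- L + rnorm(n)     # noisy descendant (proxy) of the common cause

b_crude <- coef(lm(Y2 ~ A2))[["A2"]]      # confounded
b_full  <- coef(lm(Y2 ~ A2 + L))[["A2"]]  # adjusted for the common cause
b_proxy <- coef(lm(Y2 ~ A2 + P))[["A2"]]  # adjusted for the proxy only

round(c(none = b_none, collider = b_collider, descendant = b_descendant,
        crude = b_crude, full = b_full, proxy = b_proxy), 2)
```

Under these assumptions, conditioning on C or on D pulls the coefficient for A away from zero even though the true effect is zero, and adjusting for the proxy P removes only part of the confounding that adjusting for L removes.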
Motivating example: one study, two failure modes
bilingualism study recruited through university mailing lists:
Selection: people with high academic motivation and strong language confidence are more likely to enrol
Measurement: the cognitive task is validated only in English, so non-English-dominant participants may be mismeasured
Why this week extends Weeks 2–3
weeks 2–3: confounding
week 4 adds:
| Threat | Source of bias |
| Selection bias | Who enters the analytic sample |
| Measurement bias | How variables are recorded |
Common causal questions as graphs
Common causal questions
different questions require different graphs
Measurement error: two dimensions
two dimensions of measurement error:
| | Uncorrelated errors | Correlated errors |
| Independent | Attenuates effects | Creates spurious associations |
| Dependent | Opens non-causal paths | Biases in either direction |
Independent measurement error
left: uncorrelated errors attenuate effects toward zero. right: a shared cause of errors (U) creates spurious associations even when no causal effect exists.
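Both panels can be illustrated with simulated data. This is a sketch under assumed values (a true effect of 0.5, unit-variance errors, linear models); none of these numbers come from the slides.

```r
set.seed(2)
n <- 100000  # assumed sample size

# Left panel (assumed): true effect of A on Y is 0.5; errors are unrelated
A_true <- rnorm(n)
Y_true <- 0.5 * A_true + rnorm(n)
A_obs  <- A_true + rnorm(n)  # error in the exposure
Y_obs  <- Y_true + rnorm(n)  # error in the outcome

b_true <- coef(lm(Y_true ~ A_true))[["A_true"]]
b_attenuated <- coef(lm(Y_obs ~ A_obs))[["A_obs"]]

# Right panel (assumed): NO effect of A on Y, but U feeds both errors
A0 <- rnorm(n)
Y0 <- rnorm(n)
U  <- rnorm(n)     # shared cause of the two measurement errors
A0_obs <- A0 + U
Y0_obs <- Y0 + U

b_null <- coef(lm(Y0 ~ A0))[["A0"]]
b_spurious <- coef(lm(Y0_obs ~ A0_obs))[["A0_obs"]]

round(c(true = b_true, attenuated = b_attenuated,
        null = b_null, spurious = b_spurious), 2)
```

With these assumed variances the estimate from the noisy measures is roughly half the true slope, and the shared error source U produces a clearly non-zero slope where the true effect is zero.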
Dependent measurement error
left: the true exposure affects measurement of the outcome (red diagonal), opening a non-causal path. right: dependent errors with a shared cause (U) can bias in either direction.
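A sketch of the left panel, assuming the true exposure shifts the recorded outcome linearly. The coefficients (0.5, -0.5, 0.3) are illustrative assumptions; the second half only shows that the sign of the dependence decides the direction of the bias.

```r
set.seed(3)
n <- 100000  # assumed sample size
A <- rnorm(n)

# Case 1 (assumed): NO effect of A on Y, but A inflates the recorded outcome
Y_null     <- rnorm(n)
Y_null_obs <- Y_null + 0.5 * A + rnorm(n)

b_null_true <- coef(lm(Y_null ~ A))[["A"]]
b_null_obs  <- coef(lm(Y_null_obs ~ A))[["A"]]

# Case 2 (assumed): true effect 0.3, and A deflates the recorded outcome
Y_eff     <- 0.3 * A + rnorm(n)
Y_eff_obs <- Y_eff - 0.5 * A + rnorm(n)

b_eff_true <- coef(lm(Y_eff ~ A))[["A"]]
b_eff_obs  <- coef(lm(Y_eff_obs ~ A))[["A"]]

round(c(null_true = b_null_true, null_obs = b_null_obs,
        eff_true = b_eff_true, eff_obs = b_eff_obs), 2)
```

In case 1 the measured outcome shows an association that does not exist; in case 2 the measured association has the opposite sign to the true effect.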
Selection bias
selection can act like collider conditioning
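A minimal sketch of selection as collider conditioning. The selection model (a logistic function of A and Y with the coefficients shown) is an assumption made for illustration; A and Y are generated independently.

```r
set.seed(4)
n <- 100000  # assumed size of the source population

A <- rnorm(n)
Y <- rnorm(n)  # independent of A: no causal effect, no confounding

# Assumed selection model: enrolment is more likely when A or Y is high
p_select <- plogis(-1 + A + Y)
S <- rbinom(n, 1, p_select)

population <- data.frame(A = A, Y = Y, S = S)
analytic   <- population[population$S == 1, ]  # only the selected are analysed

b_population <- coef(lm(Y ~ A, data = population))[["A"]]
b_analytic   <- coef(lm(Y ~ A, data = analytic))[["A"]]

round(c(population = b_population, analytic = b_analytic), 2)
```

Restricting to S == 1 is conditioning on a common effect of A and Y, so the analytic sample shows a negative association that is absent in the full population.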
Selection bias without colliders
No confounding, no collider. A is randomised. Yet if Z modifies the effect of A on Y (open circle), and Z is distributed differently in the sample than in the target population, the sample ATE does not transport.
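The point can be sketched with a binary effect modifier. All numbers here are assumptions for illustration: the effect of A is 1 when Z = 0 and 3 when Z = 1, Z = 1 for 80% of the sample but only 20% of the target population.

```r
set.seed(5)
n <- 100000           # assumed sample size
p_z_sample <- 0.8     # assumed share with Z = 1 in the sample
p_z_target <- 0.2     # assumed share with Z = 1 in the target population

Z <- rbinom(n, 1, p_z_sample)
A <- rbinom(n, 1, 0.5)            # randomised treatment
Y <- (1 + 2 * Z) * A + rnorm(n)   # effect of A is 1 if Z = 0, 3 if Z = 1
df <- data.frame(A = A, Y = Y, Z = Z)

# Difference in mean outcome between treated and untreated rows
diff_in_means <- function(d) mean(d$Y[d$A == 1]) - mean(d$Y[d$A == 0])

ate_sample <- diff_in_means(df)   # valid for the sample, not the target

# Standardise: weight stratum-specific effects by the target distribution of Z
effect_z0 <- diff_in_means(df[df$Z == 0, ])
effect_z1 <- diff_in_means(df[df$Z == 1, ])
ate_transported <- (1 - p_z_target) * effect_z0 + p_z_target * effect_z1

round(c(sample = ate_sample, transported = ate_transported), 2)
```

The sample ATE is unbiased for the sample, yet it overstates the target-population effect here because the high-effect stratum is over-represented. Reweighting the stratum effects recovers the target value only if Z is measured and its target distribution is known.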
Target, source, and analytic populations
External validity and transport
| Population | Role |
| Target | Where we want the causal claim to apply |
| Source | Where recruitment occurs |
| Analytic sample | Who is actually analysed |
transportability asks whether effect-relevant structure carries across populations
WEIRD samples and effect heterogeneity
WEIRD samples are a problem when effect modifiers are distributed differently in the target population
Link to Week 10
measurement invariance is transport for constructs
Return to the opening example
back to the bilingualism study:
Selection: why did these participants enter the analytic sample, and does that selection depend on treatment or outcome?
Measurement: do the instruments measure the same constructs across all participants?
reading a regression in R
basic pattern:
fit <- lm(exam_score ~ study_hours + motivation, data = df_scores)
read it left to right:
fit <- : store the model
lm() : fit a linear model
exam_score : outcome
~ : modelled as a function of
study_hours + motivation : predictors
data = df_scores : where the variables live
what changes when the formula changes?
# one predictor
lm(exam_score ~ study_hours, data = df_scores)

# no predictor, only an intercept
lm(exam_score ~ 1, data = df_scores)

# two predictors
lm(exam_score ~ study_hours + motivation, data = df_scores)

# interaction
lm(exam_score ~ study_hours * workshop, data = df_scores)
~ 1 : fits a flat line at the mean of exam_score
+ motivation : adjusts for one more variable
* workshop : allows the slope for study_hours to differ by workshop (expands to study_hours + workshop + study_hours:workshop)
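To try these formulas without the course data, the sketch below builds a hypothetical df_scores. The column names match the slides, but the simulated values and coefficients are invented for illustration only.

```r
set.seed(6)
n <- 500  # assumed number of students

# Hypothetical data: only the column names come from the slides
df_scores <- data.frame(
  study_hours = runif(n, 0, 20),
  motivation  = rnorm(n),
  workshop    = rbinom(n, 1, 0.5)  # 1 = attended, 0 = did not
)
df_scores$exam_score <- 50 +
  1.5 * df_scores$study_hours +
  3.0 * df_scores$motivation +
  0.5 * df_scores$study_hours * df_scores$workshop +
  rnorm(n, sd = 5)

fit_mean  <- lm(exam_score ~ 1, data = df_scores)
fit_one   <- lm(exam_score ~ study_hours, data = df_scores)
fit_two   <- lm(exam_score ~ study_hours + motivation, data = df_scores)
fit_inter <- lm(exam_score ~ study_hours * workshop, data = df_scores)

# The intercept-only model reproduces the sample mean
c(intercept = unname(coef(fit_mean)), mean = mean(df_scores$exam_score))

# The * operator adds both main effects and their product term
names(coef(fit_inter))
```

In fit_inter, the coefficient named study_hours:workshop is the difference in the study_hours slope between workshop = 1 and workshop = 0.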