Selection Bias and Measurement Bias

PSYC 434 — Week 4

quick reset: what would you condition on?

for each graph, ask:

  1. what path connects A and Y?
  2. would conditioning on the middle variable help, hurt, or change the question?

warm-up 1: common cause

question: if you want the causal effect of A on Y, would you condition on \boxed{L}?

warm-up 2: mediator

question: if you want the total effect of A on Y, would you condition on \boxed{M}?

warm-up 3: collider

question: if you want the causal effect of A on Y, would conditioning on \boxed{C} help?

warm-up 4: descendant of a collider

question: if you do not condition on C, but you do condition on its descendant \boxed{D}, what happens?

warm-up 5: descendant of a common cause

question: if L is unmeasured, would conditioning on \boxed{D} be the same as conditioning on \boxed{L}?

warm-up summary

three rules from earlier weeks:

  1. common cause: often condition
  2. mediator: do not condition if you want the total effect
  3. collider: do not condition

two extensions for this week:

  1. descendant of a collider: also dangerous to condition on
  2. descendant of a common cause: not a guaranteed substitute for the common cause
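
The rules and extensions above can be checked by simulation. The following is a minimal sketch in R, not part of the original slides: all variable names, sample sizes, and coefficients are assumptions chosen for illustration. In every scenario the true effect of the exposure on the outcome is zero, so any non-zero slope is bias.

```r
set.seed(434)
n <- 10000

# --- common cause: L affects both A and Y; A has no effect on Y ---
L <- rnorm(n)
A <- 0.8 * L + rnorm(n)
Y <- 0.8 * L + rnorm(n)

b_unadjusted <- coef(lm(Y ~ A))[["A"]]      # confounded: clearly above 0
b_adjusted   <- coef(lm(Y ~ A + L))[["A"]]  # conditioning on L: close to 0

# --- descendant of a common cause: a noisy proxy for L (hypothetical) ---
D_proxy <- L + rnorm(n)
b_proxy <- coef(lm(Y ~ A + D_proxy))[["A"]] # only partly de-confounded

# --- collider: A2 and Y2 are independent, and both cause C ---
A2 <- rnorm(n)
Y2 <- rnorm(n)
C  <- A2 + Y2 + rnorm(n)

b_no_collider   <- coef(lm(Y2 ~ A2))[["A2"]]      # close to 0
b_with_collider <- coef(lm(Y2 ~ A2 + C))[["A2"]]  # pushed below 0

# --- descendant of a collider: D_collider is caused by C ---
D_collider <- C + rnorm(n)
b_with_descendant <- coef(lm(Y2 ~ A2 + D_collider))[["A2"]]  # also below 0

round(c(
  unadjusted      = b_unadjusted,
  adjusted        = b_adjusted,
  proxy           = b_proxy,
  no_collider     = b_no_collider,
  with_collider   = b_with_collider,
  with_descendant = b_with_descendant
), 2)
```

Under these assumed values, adjusting for the proxy should shrink the bias without removing it, and conditioning on the collider's descendant should produce a weaker version of the collider bias.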

Motivating example: one study, two failure modes

bilingualism study recruited through university mailing lists:

  1. Selection: people with high academic motivation and strong language confidence are more likely to enrol
  2. Measurement: the cognitive task is validated only in English, so non-English-dominant participants may be mismeasured

Why this week extends Weeks 2–3

weeks 2–3: confounding

week 4 adds:

Threat             Source of bias
Selection bias     Who enters the analytic sample
Measurement bias   How variables are recorded

Common causal questions as graphs

Common causal questions

different questions require different graphs

Measurement error: two dimensions

two dimensions of measurement error:

              Uncorrelated errors      Correlated errors
Independent   Attenuates effects       Creates spurious associations
Dependent     Opens non-causal paths   Biases in either direction

Independent measurement error

left: uncorrelated errors attenuate effects toward zero. right: a shared cause of errors (U) creates spurious associations even when no causal effect exists.
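
A small simulation can make both panels concrete. This is a sketch under assumed values (the effect size of 0.5, the unit error variances, and all variable names are illustrative, not from the slides). In the first scenario the attenuation in the slope comes from the error in the exposure; error in the outcome alone would add noise without biasing the unstandardised slope.

```r
set.seed(434)
n <- 10000

# --- left panel: independent, uncorrelated errors ---
A <- rnorm(n)
Y <- 0.5 * A + rnorm(n)   # assumed true effect of A on Y: 0.5
A_obs <- A + rnorm(n)     # error in A, unrelated to anything else
Y_obs <- Y + rnorm(n)     # error in Y, unrelated to anything else

b_true  <- coef(lm(Y ~ A))[["A"]]              # close to 0.5
b_noisy <- coef(lm(Y_obs ~ A_obs))[["A_obs"]]  # attenuated toward 0

# --- right panel: independent, correlated errors ---
A0 <- rnorm(n)
Y0 <- rnorm(n)            # A0 has no effect on Y0
U  <- rnorm(n)            # shared cause of both measurement errors
A0_obs <- A0 + U
Y0_obs <- Y0 + U

b_null     <- coef(lm(Y0 ~ A0))[["A0"]]              # close to 0
b_spurious <- coef(lm(Y0_obs ~ A0_obs))[["A0_obs"]]  # clearly above 0

round(c(true = b_true, noisy = b_noisy,
        null = b_null, spurious = b_spurious), 2)
```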

Dependent measurement error

left: the true exposure affects measurement of the outcome (red diagonal), opening a non-causal path. right: dependent errors with a shared cause (U) can bias in either direction.
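
The left panel can be sketched in R as follows. The setup is an assumption for illustration: a binary exposure with no effect on the true outcome, and a recorded outcome whose error shifts by 0.4 for exposed participants. Variable names and values are hypothetical.

```r
set.seed(434)
n <- 10000

A <- rbinom(n, 1, 0.5)   # exposure
Y <- rnorm(n)            # true outcome: A has no effect on Y

# the measurement error in Y depends on the true exposure A
Y_obs <- Y + 0.4 * A + rnorm(n, sd = 0.5)

b_true_outcome <- coef(lm(Y ~ A))[["A"]]      # close to 0
b_observed     <- coef(lm(Y_obs ~ A))[["A"]]  # close to the assumed shift, 0.4

round(c(true_outcome = b_true_outcome, observed = b_observed), 2)
```

The association between A and the recorded outcome here arises entirely from how Y is measured, not from any effect of A on Y.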

Selection bias

selection can act like collider conditioning
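
One way to see this is to simulate a study in which exposure and outcome are unrelated in the full population, but both raise the chance of entering the sample. This is a minimal sketch; the selection rule, threshold, and names are assumptions for illustration.

```r
set.seed(434)
n <- 20000

A <- rnorm(n)   # exposure
Y <- rnorm(n)   # outcome, independent of A in the full population

# hypothetical selection rule: enrolment is more likely when A or Y is high
S <- (A + Y + rnorm(n)) > 1

population <- data.frame(A = A, Y = Y, S = S)
enrolled   <- subset(population, S)

b_population <- coef(lm(Y ~ A, data = population))[["A"]]  # close to 0
b_enrolled   <- coef(lm(Y ~ A, data = enrolled))[["A"]]    # below 0

round(c(population = b_population, enrolled = b_enrolled), 2)
```

Restricting the analysis to enrolled participants is the same operation as conditioning on S, which is a common effect of A and Y in this setup.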

Selection bias without colliders

No confounding, no collider. A is randomised. Yet if Z modifies the effect of A on Y (open circle), and Z is distributed differently in the sample than in the target population, the sample average treatment effect (ATE) does not transport to the target population.
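
A simulation sketch of this case, with assumed numbers: the effect of A is 0.2 when Z = 0 and 0.8 when Z = 1, Z = 1 holds for 80% of the sample, and Z = 1 is assumed to hold for only 20% of the target population. All names and values are hypothetical.

```r
set.seed(434)
n <- 20000

Z <- rbinom(n, 1, 0.8)   # effect modifier: common in the sample
A <- rbinom(n, 1, 0.5)   # randomised exposure
Y <- 0.2 * A + 0.6 * A * Z + rnorm(n)
sample_df <- data.frame(A = A, Y = Y, Z = Z)

# ATE in the sample: unbiased for the sample, because A is randomised
ate_sample <- coef(lm(Y ~ A, data = sample_df))[["A"]]

# stratum-specific effects, estimated from the same sample
fit_z     <- lm(Y ~ A * Z, data = sample_df)
effect_z0 <- coef(fit_z)[["A"]]
effect_z1 <- coef(fit_z)[["A"]] + coef(fit_z)[["A:Z"]]

# re-weight the stratum effects to the assumed target distribution of Z
p_z1_target <- 0.2
ate_target  <- (1 - p_z1_target) * effect_z0 + p_z1_target * effect_z1

round(c(sample = ate_sample, target = ate_target), 2)
```

The re-weighting step only works here because Z is measured and the target distribution of Z is assumed known; without both, the sample estimate cannot be corrected this way.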

Target, source, and analytic populations

External validity and transport
Population        Role
Target            Where we want the causal claim to apply
Source            Where recruitment occurs
Analytic sample   Who is actually analysed

transportability asks whether effect-relevant structure carries across populations

WEIRD samples and effect heterogeneity

WEIRD sampling is a problem when effect modifiers are distributed differently in the sample than in the target population

Return to the opening example

back to the bilingualism study:

  1. Selection: why did these participants enter the analytic sample, and does that selection depend on treatment or outcome?
  2. Measurement: do the instruments measure the same constructs across all participants?

reading a regression in R

basic pattern:

fit <- lm(
  exam_score ~ study_hours + motivation,
  data = df_scores
)

read it left to right:

  • fit <-                     store the model
  • lm()                       fit a linear model
  • exam_score                 outcome
  • ~                          modelled as a function of
  • study_hours + motivation   predictors
  • data = df_scores           where the variables live

what changes when the formula changes?

# one predictor
lm(exam_score ~ study_hours, data = df_scores)

# no predictor, only an intercept
lm(exam_score ~ 1, data = df_scores)

# two predictors
lm(exam_score ~ study_hours + motivation, data = df_scores)

# interaction
lm(exam_score ~ study_hours * workshop, data = df_scores)

  • ~ 1            fits a flat line at the mean of exam_score
  • + motivation   adjusts for one more variable
  • * workshop     allows the slope for study_hours to differ by workshop (expands to study_hours + workshop + study_hours:workshop)
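
Two of these claims can be checked directly. The sketch below builds a hypothetical df_scores, since the course data are not shown here; the column values and coefficients are assumptions for illustration only.

```r
set.seed(434)
n <- 1000

# hypothetical stand-in for df_scores
df_scores <- data.frame(
  study_hours = runif(n, 0, 20),
  workshop    = rbinom(n, 1, 0.5)
)
df_scores$exam_score <- 50 +
  1.0 * df_scores$study_hours +
  0.5 * df_scores$study_hours * df_scores$workshop +
  rnorm(n, sd = 5)

# ~ 1: the only coefficient is the intercept, which equals the sample mean
fit_mean       <- lm(exam_score ~ 1, data = df_scores)
intercept_only <- coef(fit_mean)[["(Intercept)"]]
all.equal(intercept_only, mean(df_scores$exam_score))

# * workshop: one slope for workshop = 0, a different slope for workshop = 1
fit_int           <- lm(exam_score ~ study_hours * workshop, data = df_scores)
slope_no_workshop <- coef(fit_int)[["study_hours"]]
slope_workshop    <- slope_no_workshop + coef(fit_int)[["study_hours:workshop"]]

round(c(no_workshop = slope_no_workshop, workshop = slope_workshop), 2)
```

The coefficient named study_hours:workshop is the difference between the two slopes, not the slope for the workshop group itself.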

useful follow-up lines

summary(fit)

inspect the fitted model.

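# abline() draws one intercept and one slope, so use a one-predictor fit here
fit_simple <- lm(exam_score ~ study_hours, data = df_scores)
plot(df_scores$study_hours, df_scores$exam_score)
abline(fit_simple)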

see what the fitted line is doing.

predict(fit)

get fitted values from the model.
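
predict() can also return predictions for new rows through its newdata argument. A short sketch, using a hypothetical df_scores and two made-up students (all values are assumptions for illustration):

```r
set.seed(434)
n <- 200

# hypothetical stand-in for df_scores
df_scores <- data.frame(
  study_hours = runif(n, 0, 20),
  motivation  = rnorm(n)
)
df_scores$exam_score <- 50 +
  1.5 * df_scores$study_hours +
  3.0 * df_scores$motivation +
  rnorm(n, sd = 5)

fit <- lm(exam_score ~ study_hours + motivation, data = df_scores)

# with no newdata, predict() returns one fitted value per row used in the fit
fitted_values <- predict(fit)

# with newdata, the column names must match the predictors in the formula
new_students <- data.frame(
  study_hours = c(5, 15),
  motivation  = c(0, 0)
)
predicted_new <- predict(fit, newdata = new_students)
predicted_new
```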

Readings

Required and optional readings for each week are listed on the course readings page.