Week 4: Selection Bias and Measurement Bias
Readings
Required
Optional
Key concepts for the test(s)
- Independent/uncorrelated measurement error bias
- Independent/correlated measurement error bias
- Dependent/uncorrelated measurement error bias
- Dependent/correlated measurement error bias
- Collider bias as a distinct mechanism
- Selection bias and transportability
Lab 4 setup
Use Lab 4: Writing Regression Models for this week's practical work. The lab page links the student practice script first and the instructor script second.
Seminar
Motivating example: one study, two failure modes
Suppose investigators recruit participants for a bilingualism study through university mailing lists. Recruitment is selective. People with high academic motivation and strong language confidence are more likely to enrol.
Now add measurement error. Suppose the cognitive task is validated only in English. Then non-English-dominant participants may be mismeasured.
The same study can therefore fail in two distinct ways: through biased sampling and through biased measurement.
Learning outcomes
By the end of this week, you should be able to:
- Use fork, chain, and collider structures to recognise how bias enters a study.
- Classify four structural types of measurement error.
- Explain how conditioning on selection can bias causal contrasts.
- Distinguish target, source, and analytic populations for transport claims.
Why this week extends Weeks 2-3
Weeks 2-3 focused on confounding paths between treatment and outcome. That may have made DAGs look like a tool just for "finding confounders". They are more general than that.
Week 4 adds two structural threats: selection bias and measurement bias. Both can be analysed with the same elementary graph logic you already know.
The same graph logic still applies
The key idea from the earlier weeks is not just "adjust for confounders". The key idea is that a small number of graph structures determine which paths are open and which are blocked. Once you can recognise a fork, a chain, and a collider, you can start to diagnose bias in almost any study.
More complicated studies are built from simpler pieces. Selection problems, attrition problems, and measurement problems usually look confusing at first because they involve more nodes. But the underlying logic is still the same.
Five elementary structures
- No causal relation: no arrow connects the variables.
- Direct causation: one variable causes another.
- Fork: one variable is a common cause of two others.
- Chain: one variable sits on the path between two others.
- Collider: one variable is a common effect of two others.
When you see $A \coprod Y$, read this as "A is statistically independent of Y". When you see $A \cancel\coprod Y$, read this as "A is statistically dependent on Y".
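These (in)dependence claims can be checked numerically. The sketch below is a minimal simulation with made-up effect sizes (all coefficients set to 1): it generates data from the fork $A \leftarrow L \to Y$ and shows that $A$ and $Y$ are dependent marginally but approximately independent within a narrow stratum of $L$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Fork: A <- L -> Y. L is a common cause of A and Y.
# (Unit coefficients and standard-normal noise are arbitrary choices.)
L = rng.normal(size=n)
A = L + rng.normal(size=n)
Y = L + rng.normal(size=n)

# Marginally, A and Y are associated through the open backdoor path.
r_marginal = np.corrcoef(A, Y)[0, 1]          # clearly positive (~0.5)

# Conditioning on L (here: restricting to a narrow stratum of L)
# blocks the path, so the association vanishes.
stratum = np.abs(L) < 0.1
r_conditional = np.corrcoef(A[stratum], Y[stratum])[0, 1]  # near zero

print(f"corr(A, Y)           = {r_marginal:.3f}")
print(f"corr(A, Y | L ~ 0)   = {r_conditional:.3f}")
```

Restricting to a thin stratum is a crude stand-in for conditioning, but it makes the d-separation claim concrete: inside the stratum, the only remaining variation in $A$ and $Y$ is independent noise.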
A short rulebook for week 4
Four practical rules
- Condition on common causes or their good proxies when you want to block a non-causal backdoor path.
- Do not condition on mediators when you want the total effect, because you would block part of the causal pathway.
- Do not condition on colliders, because conditioning opens a path that was previously blocked.
- Be careful with descendants and proxies, because conditioning on them can behave like conditioning on their parent.
This is the bridge to this week. Selection bias often appears because study entry, study retention, or analytic restriction acts like a collider or a descendant of a collider. Measurement bias often appears because the variable we record is a noisy proxy or downstream consequence of the variable we actually care about.
Pair check: what does conditioning do?
- In a fork, $A \leftarrow L \to Y$, what does conditioning on $L$ do?
- In a chain, $A \to M \to Y$, what happens if we condition on $M$ when we want the total effect of $A$ on $Y$?
- In a collider, $A \to C \leftarrow Y$, what happens if we condition on $C$?
You only need one sentence for each answer.
Common causal questions as graphs
Different questions require different graphs, different causal estimands, and different assumptions. The important point for this week is that the same picture language can represent confounding, selection, measurement, mediation, and transport problems. The effect modification graph (open circle into the outcome) reappears below when we distinguish two mechanisms of selection bias.
A typology of measurement error bias
Measurement error is not a separate universe from the DAGs you have already seen. It becomes easier to understand once we separate the true variable from the recorded variable. Then the same questions return: what causes what, what is shared, and what happens if we condition on the recorded value?
Four structural types of measurement error
Measurement error is classified along two dimensions.
Dimension 1: independent vs dependent
- Independent (undirected): neither true variable causally affects the other variable's measurement error; no arrow runs from a true variable into the other error term.
- Dependent (directed): one true variable causally affects the other variable's measurement error; an arrow runs from a true variable into the other error term.
Dimension 2: uncorrelated vs correlated
- Uncorrelated: errors do not share a common cause.
- Correlated: errors share a common cause.
Combinations:
- Independent, uncorrelated: often attenuates effects toward the null. Example: a self-report anxiety scale adds random noise to the true score. The noise is unrelated to treatment status, so it blurs the signal without creating a false one.
- Independent, correlated: can create spurious associations even when no causal effect exists. Example: societies with advanced record-keeping produce more precise records of both religious beliefs and social complexity. The shared cause (record-keeping quality) induces a non-causal association between treatment and outcome measures.
- Dependent, uncorrelated: can open non-causal paths from treatment to measured outcome. Example: participants who receive an intervention report their outcomes more favourably because the treatment itself changes how they interpret survey items. The exposure causally affects measurement of the outcome.
- Dependent, correlated: can bias in either direction, and the direction is hard to predict analytically. Example: social complexity shapes how historical archives record both religious beliefs and governance structures, and the errors in both records share a common cause in elite patronage of scribes.
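Two of these combinations can be illustrated with a short simulation (a sketch, with arbitrary made-up parameter values): classical independent, uncorrelated error in the measured exposure attenuates a regression slope toward the null, while a shared cause of both errors (here labelled `U`, standing in for record-keeping quality) manufactures a slope where no causal effect exists.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def slope(x, y):
    """OLS slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x)

# --- Type 1: independent, uncorrelated error in the exposure ---
# True model: Y = 0.5 * A (the 0.5 is an arbitrary choice).
# Classical error in the measured exposure attenuates the slope
# by the reliability Var(A) / (Var(A) + Var(error)).
A = rng.normal(size=n)
Y = 0.5 * A + rng.normal(size=n)
A_star = A + rng.normal(size=n)        # measured exposure, error variance 1

b_true = slope(A, Y)                   # ~ 0.5
b_attenuated = slope(A_star, Y)        # ~ 0.5 * 1/(1+1) = 0.25

# --- Type 2: independent, correlated errors ---
# No causal effect of A on Y, but a shared cause U of both
# measurement errors induces a spurious association between
# the measured variables.
U = rng.normal(size=n)                 # e.g. record-keeping quality
A2 = rng.normal(size=n)
Y2 = rng.normal(size=n)                # A2 has no effect on Y2
A2_star = A2 + U + rng.normal(size=n)
Y2_star = Y2 + U + rng.normal(size=n)

b_spurious = slope(A2_star, Y2_star)   # clearly nonzero despite no effect

print(f"true slope       = {b_true:.3f}")
print(f"attenuated slope = {b_attenuated:.3f}")
print(f"spurious slope   = {b_spurious:.3f}")
```

The attenuation factor and the size of the spurious slope both follow directly from the simulated variances, which is why type 1 is the only combination whose direction of bias is easy to predict.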
Pair exercise: classifying measurement error
For each scenario, classify the measurement error using the two dimensions (independent/dependent, correlated/uncorrelated) and name the type number (1-4).
- A self-report screen-time measure adds noise because participants guess rather than track. Social desirability also inflates wellbeing reports. The errors are unrelated to each other and unrelated to treatment status.
- A cognitive test for bilingualism effects is validated only in English. Non-English-dominant participants are systematically mismeasured. The treatment (bilingualism) causally affects measurement of the outcome.
- A cross-cultural study uses the same translation team for exposure and outcome instruments. Shared translation quality introduces correlated errors in both measures.
For scenario 2, draw a short DAG showing how the treatment ($A$) creates a path through the measurement node to the recorded outcome.
Selection bias and transportability
Selection bias occurs when inclusion in the analytic sample depends on variables related to treatment, outcome, or effect modifiers. It threatens validity in two structurally distinct ways.
Mechanism 1: collider conditioning (internal validity). When both treatment and outcome affect who enters the sample, selection acts as a collider. Conditioning on it opens a non-causal path between $A$ and $Y$. The estimate is biased for the population it claims to describe.
Mechanism 2: effect modifier imbalance (external validity). Even without confounding, a sample can fail to generalise. Suppose treatment $A$ is randomised, so no backdoor paths are open. If a variable $Z$ modifies the effect of $A$ on $Y$, and $Z$ is distributed differently in the analytic sample than in the target population, the sample ATE does not equal the population ATE. No non-causal path is opened; the internal validity is intact. The problem is that the average treatment effect is a weighted average of subgroup effects, and the weights differ between populations.
The open circle on the arrow from $Z$ to $Y$ denotes effect modification: $Z$ changes the size of $A$'s effect on $Y$. This is not a standard causal arrow. No confounding is present. Yet if $Z$ is distributed differently in the sample than in the target population, the sample ATE does not transport.
This second mechanism does not require a collider. A study of exercise and blood pressure conducted entirely in young adults may correctly estimate the ATE for young adults. If older adults benefit more (effect modification by age), the sample ATE underestimates the population ATE. The design is unconfounded but the conclusion does not transport.
Transportability asks whether effect-relevant structure is compatible between analytic and target populations. This requires knowing where effect modifiers differ, not just whether the sample is "representative" in some demographic sense.
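Mechanism 2 can be made concrete with a simulation. The numbers below are hypothetical: a randomised treatment whose effect is 1 in young adults and 3 in older adults, a sample that is 10% older adults, and a target population that is 50% older adults. The sample ATE is internally valid but does not transport; reweighting the subgroup effects to the target's composition recovers the population ATE.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(n, p_old, rng):
    """Randomised trial where the effect of A is modified by age group Z."""
    Z = rng.binomial(1, p_old, size=n)       # 1 = older adult
    A = rng.binomial(1, 0.5, size=n)         # randomised treatment
    # Effect is 1 for young adults, 3 for older adults (made-up values).
    Y = (1 + 2 * Z) * A + rng.normal(size=n)
    return Z, A, Y

def ate(A, Y):
    return Y[A == 1].mean() - Y[A == 0].mean()

# Analytic sample: mostly young adults.
Z, A, Y = simulate(500_000, p_old=0.1, rng=rng)

sample_ate = ate(A, Y)                       # ~ 1 + 2*0.1 = 1.2

# Transport by reweighting: estimate subgroup effects in the sample,
# then average them with the target population's Z distribution (50/50).
ate_young = ate(A[Z == 0], Y[Z == 0])
ate_old = ate(A[Z == 1], Y[Z == 1])
transported = 0.5 * ate_young + 0.5 * ate_old   # ~ 2.0, the target ATE

print(f"sample ATE      = {sample_ate:.2f}")
print(f"transported ATE = {transported:.2f}")
```

Note that no confounding is present anywhere in this simulation: the gap between 1.2 and 2.0 comes entirely from the weights on the subgroup effects.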
Target, source, and analytic populations
- Target population: where we want the causal claim to apply.
- Source population: where recruitment occurs.
- Analytic sample: who is actually analysed.
Transportability requires that effect-relevant structure is compatible between analytic and target populations.
Collider bias: a distinct mechanism
Collider bias can feel new because the earlier weeks mostly taught you how to close open backdoor paths. Here the warning runs in the opposite direction: some conditioning decisions create bias rather than remove it.
Why collider bias is not confounding. Confounding arises from an open backdoor path through a common cause: $A \leftarrow L \to Y$. We usually reduce confounding by conditioning on $L$. Collider bias works in the opposite direction. In the structure $A \to C \leftarrow Y$, the path is blocked at first ($A \coprod Y$). Conditioning on $C$ opens a spurious association ($A \cancel\coprod Y \mid C$).
Why collider bias is not identical to selection bias. When collider conditioning happens through sample restriction, it appears as selection bias because the sample is truncated. Berkson's bias is the classic example. But collider bias can also appear when we stratify or adjust for a common effect inside a complete dataset. In that case the problem comes from the analytic decision, not from who entered the sample.
Why the same DAG rules still work. Pearl's d-separation criterion tells us which paths are opened and closed by conditioning. That is why DAGs help with more than confounding. The same framework lets us reason about collider bias, mediator bias, measurement error, and selection problems.
For this course, the practical upshot is simple: never condition on common effects, whether through sample restriction, stratification, or statistical adjustment. Conditioning on a collider opens a non-causal path.
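The stratification and adjustment route can be demonstrated in a complete dataset, with no sample restriction at all. In this sketch (made-up unit coefficients throughout), $A$ and $Y$ are simulated as causally unrelated, yet adding their common effect $C$ as a regression covariate induces a clearly negative coefficient on $A$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# A and Y are causally unrelated; C is their common effect.
A = rng.normal(size=n)
Y = rng.normal(size=n)
C = A + Y + rng.normal(size=n)

def ols(cols, y):
    """OLS coefficients (with intercept) of y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Unadjusted: the coefficient on A is ~0, correctly showing no effect.
b_unadjusted = ols([A], Y)[1]

# "Adjusting" for the collider in a complete dataset opens the
# blocked path: the coefficient on A becomes clearly negative (~ -0.5).
b_adjusted = ols([A, C], Y)[1]

print(f"coef on A, unadjusted      = {b_unadjusted:.3f}")
print(f"coef on A, adjusting for C = {b_adjusted:.3f}")
```

The bias appears without anyone being excluded from the data, which is the point of the distinction above: the analytic decision alone conditions on the common effect.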
Pair exercise: collider bias versus confounding
- A hospital study investigates whether depression ($A$) slows recovery ($Y$). Ward admission ($C$) depends on both depression severity and injury severity. Only admitted patients are analysed.
- Draw a DAG with $A \to C \leftarrow Y$ (ward admission as a collider of depression and recovery-related injury severity).
- Explain the non-causal path that opens when the study conditions on $C$ by restricting to admitted patients.
- Your partner argues "this is just confounding by injury severity." Counter by explaining the structural difference: confounding is an open backdoor path through a common cause, whereas collider bias opens a previously blocked path by conditioning on a common effect.
- Propose one design change that avoids this bias.
Attrition as a measurement error structure
Right-censoring (attrition) can bias causal estimates through two distinct mechanisms. The first is distortion: if the outcome affects who drops out, conditioning on the end-of-study sample conditions on a common effect of exposure and outcome. This opens a non-causal path. The bias is an internal validity problem; the estimate is wrong for the population it claims to describe.
The second mechanism is restriction: if effect modifiers are distributed differently among survivors than in the baseline population, the average treatment effect (ATE) estimated from the end-of-study sample may not match the ATE for the target population. No non-causal path is opened, but the sample no longer represents the population of interest. This is an external validity problem; the estimate may be correct for survivors but does not transport.
The structural parallel to measurement error is direct. Distortion through attrition mirrors dependent measurement error (type 3 above): the outcome causally affects what is recorded. Restriction through attrition mirrors independent, uncorrelated measurement error (type 1): the signal is diluted because the analytic sample differs from the target population in composition. Investigators should diagnose which mechanism is operating, because the remedies differ: inverse-probability-of-censoring weights address distortion, whereas reweighting to the target population addresses restriction.
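The first remedy can be sketched in a simulation. All coefficients below are hypothetical, and dropout is made to depend on treatment and a measured cause of the outcome (a tractable stand-in for outcome-dependent dropout). The complete-case estimate is biased; inverse-probability-of-censoring weights, here computed from the known dropout model rather than an estimated one, recover the true effect of 1.0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Randomised treatment A, a measured prognostic covariate L,
# and an outcome with true treatment effect 1.0 (made-up values).
A = rng.binomial(1, 0.5, size=n)
L = rng.normal(size=n)                 # e.g. baseline health
Y = 1.0 * A + 2.0 * L + rng.normal(size=n)

# Dropout depends on both treatment and the prognostic covariate,
# so the end-of-study sample is selected on causes of the outcome.
p_stay = 1 / (1 + np.exp(-(0.5 - 1.0 * A + 1.5 * L)))
stay = rng.binomial(1, p_stay).astype(bool)

def ate(A, Y, w=None):
    """(Weighted) difference in mean outcomes between arms."""
    w = np.ones_like(Y) if w is None else w
    return (np.average(Y[A == 1], weights=w[A == 1])
            - np.average(Y[A == 0], weights=w[A == 0]))

naive = ate(A[stay], Y[stay])          # biased complete-case estimate

# Weight each survivor by the inverse of their probability of staying,
# reconstructing the baseline population from the survivors.
weights = 1 / p_stay[stay]
ipcw = ate(A[stay], Y[stay], weights)  # ~ 1.0, the true effect

print(f"complete-case ATE = {naive:.2f}")
print(f"IPCW ATE          = {ipcw:.2f}")
```

In practice the dropout model is unknown and must itself be estimated from measured covariates, which is where the method's assumptions bite: censoring must be explainable by what was measured.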
WEIRD samples and effect heterogeneity
A WEIRD sample is not automatically invalid. The problem is Mechanism 2: if effect modifiers are distributed differently between the analytic and target populations, and treatment effects vary by those modifiers, the sample ATE does not transport. A perfectly unconfounded study in a WEIRD sample can produce a correct estimate for that sample and a wrong estimate for the population of interest.
Link to Week 10
Measurement invariance is a transport problem for constructs. If a scale measures different constructs across groups, between-group contrasts can reflect measurement artefact.
Return to the opening example
Back to the bilingualism study. Two design checks are non-negotiable. First, why did these participants enter the analytic sample? Second, do the instruments measure the same constructs across participants? If either check fails, causal interpretation weakens.
With the structural threat landscape mapped (confounding, selection, measurement), Week 5 shows how the three identification assumptions introduced in Week 2 connect a causal question to a population-level causal contrast.
Pair exercise: auditing a study for two failure modes
- Return to the bilingualism example from the start of this lecture.
- Name the selection bias mechanism (what variable is acting as a collider or filter?).
- Name the measurement bias type from the four-type classification (independent/dependent, correlated/uncorrelated).
- Write a two-sentence design critique stating both problems and how each distorts the causal contrast.
Further reading
All open access: J. A. Bulbulia (2024c); J. A. Bulbulia (2024a); J. A. Bulbulia (2024b).
Lab materials: Lab 4: Writing Regression Models
Bulbulia, J. A. (2024a). Methods in causal inference part 1: Causal diagrams and confounding. Evolutionary Human Sciences, 6, e40. https://doi.org/10.1017/ehs.2024.35
Bulbulia, J. A. (2024b). Methods in causal inference part 2: Interaction, mediation, and time-varying treatments. Evolutionary Human Sciences, 6, e41. https://doi.org/10.1017/ehs.2024.32
Bulbulia, J. A. (2024c). Methods in causal inference part 3: Measurement error and external validity threats. Evolutionary Human Sciences, 6, e42. https://doi.org/10.1017/ehs.2024.33
Bulbulia, J., & Hine, D. W. (2024). Causal inference in environmental psychology. PsyArXiv. https://osf.io/preprints/psyarxiv/tbjx8
Hernán, M. A. (2017). Invited commentary: Selection bias without colliders. American Journal of Epidemiology, 185(11), 1048–1050. https://doi.org/10.1093/aje/kwx077
Hernán, M. A., & Cole, S. R. (2009). Invited commentary: Causal diagrams and measurement bias. American Journal of Epidemiology, 170(8), 959–962. https://doi.org/10.1093/aje/kwp293
Hernán, M. A., Hernández-Díaz, S., & Robins, J. M. (2004). A structural approach to selection bias. Epidemiology, 15(5), 615–625. https://www.jstor.org/stable/20485961
Hernán, M. A., & Robins, J. M. (2025). Causal inference: What if. Chapman & Hall/CRC. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
VanderWeele, T. J., & Hernán, M. A. (2012). Results on differential and dependent measurement error of the exposure and the outcome using signed directed acyclic graphs. American Journal of Epidemiology, 175(12), 1303–1310. https://doi.org/10.1093/aje/kwr458