Measurement And Selection Biases

Effect-Modification on Causal Directed Acyclic Graphs

The primary function of a causal directed acyclic graph is to allow investigators to apply Pearl’s backdoor adjustment theorem to evaluate whether causal effects may be identified from data, as shown in ?@tbl-terminologygeneral. We have noted that modification of a causal effect within one or more strata of the target population opens the possibility of biased average treatment effect estimates when the distribution of these effect modifiers differs in the analytic sample (Bulbulia 2024b).

Bulbulia, J. A. 2024a. “Methods in Causal Inference Part 1: Causal Diagrams and Confounding.” Evolutionary Human Sciences 6: e40. https://doi.org/10.1017/ehs.2024.35.
———. 2024b. “Methods in Causal Inference Part 2: Interaction, Mediation, and Time-Varying Treatments.” Evolutionary Human Sciences 6: e41. https://doi.org/10.1017/ehs.2024.32.

We do not generally represent non-linearities in causal directed acyclic graphs. These graphs are tools for deriving relationships of conditional and unconditional independence from the assumed structural relationships encoded in a causal diagram, including those structures that may give rise to a non-causal treatment/outcome association (Bulbulia 2024a).

Table 1 presents our convention for highlighting a relationship of effect modification in settings where (1) we assume no confounding of treatment and outcome and (2) there is effect modification such that the effect of A on Y differs in at least one stratum of the target population.

To focus on effect modification, we do not draw a causal arrow from the direct effect modifier F to the outcome Y. This convention is specific to this article (refer to Hernán and Robins (2020), pp. 126–127, for a discussion of ‘non-causal’ arrows).

Hernán, M. A., and J. M. Robins. 2020. Causal Inference: What If? Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/.

Part 1: How Measurement Error Bias Makes Your Causal Inferences Weird {#id-sec-1}

(wrongly estimated inferences due to inappropriate restriction and distortion)

Measurements record reality, but they are not always accurate. Whenever variables are measured with error, our results can be misleading. Every study must therefore consider how its measurements might mislead.

Causal graphs can deepen understanding here because, as the word ‘record’ implies, measurement error arises from structural (causal) processes. Measurement error can take various forms, each with distinct implications for causal inference:

  • Independent (undirected) and uncorrelated: Errors in different variables do not influence each other.
  • Independent (undirected) and correlated: Errors in different variables are related through a shared cause.
  • Dependent (directed) and uncorrelated: Errors in one variable influence the measurement of another, but these influences are not related through a shared cause.
  • Dependent (directed) and correlated: Errors in one variable influence the measurement of another, and these influences are related through a shared cause (Hernán and Cole 2009; VanderWeele and Hernán 2012).
Hernán, Miguel A., and Stephen R. Cole. 2009. “Invited Commentary: Causal Diagrams and Measurement Bias.” American Journal of Epidemiology 170 (8): 959–62. https://doi.org/10.1093/aje/kwp293.

The six causal diagrams presented in Table 2 illustrate structural features of measurement error bias and clarify how these structural features compromise causal inferences.

Six examples of measurement error bias

Table 2

Understanding these structural features will help explain why measurement error bias cannot typically be evaluated with statistical models, and will prepare us to link target-population restriction biases to measurement error.

Example 1: Uncorrelated Non-Differential Errors under Sharp Null (No Treatment Effect)

Table 2 \mathcal{G}_1 illustrates uncorrelated non-differential measurement error under the ‘sharp null’, which arises when the error terms in the exposure and outcome are independent. In this setting, measurement error is not expected to bias estimates.

Example: A study on whether beliefs in big Gods affect social complexity in ancient societies, where societies randomly omitted or inaccurately recorded such beliefs and complexity, with errors independent across variables. Under randomisation, uncorrelated undirected errors will generally not bias estimates under the sharp null, assuming all backdoor paths are closed. However, mismeasured confounders can open backdoor paths (Robins and Hernán 2008).

Robins, James, and Miguel Hernán. 2008. “Estimation of the Causal Effects of Time-Varying Exposures.” Chapman & Hall/CRC Handbooks of Modern Statistical Methods, 553–99.
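This null-preserving behaviour is easy to check by simulation. The sketch below (Python; the misclassification rate and noise level are illustrative assumptions, not values from the article) randomises A, imposes the sharp null, and adds independent errors to both variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Randomised treatment A; sharp null, so Y does not depend on A
A = rng.binomial(1, 0.5, n)
Y = rng.normal(0, 1, n)

# Independent, non-differential errors: 10% misclassification of A and
# additive noise on Y; neither error depends on the other or on a shared cause
A_star = np.where(rng.random(n) < 0.10, 1 - A, A)
Y_star = Y + rng.normal(0, 0.5, n)

# Naive contrast computed entirely from the mismeasured data
est = Y_star[A_star == 1].mean() - Y_star[A_star == 0].mean()
print(round(est, 3))  # stays close to the true null effect of 0
```

Misclassifying A merely mixes the two arms, and the added outcome noise is symmetric across arms, so the contrast remains centred on zero.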

Example 2: Uncorrelated Non-Differential Errors “Off the Null” (True Effect Present)

Table 2 \mathcal{G}_2 illustrates uncorrelated non-differential measurement error when there is a true treatment effect. This bias, also called information bias (Lash, Fox, and Fink 2009), often attenuates the effect toward the null, but not always (Jurek et al. 2005, 2006; Jurek, Greenland, and Maldonado 2008).

Lash, Timothy L., Matthew P. Fox, and Aliza K. Fink. 2009. Applying Quantitative Bias Analysis to Epidemiologic Data. Springer.
Jurek, Anne M., Sander Greenland, George Maldonado, and Timothy R. Church. 2005. “Proper Interpretation of Non-Differential Misclassification Effects: Expectations Vs Observations.” International Journal of Epidemiology 34 (3): 680–87.
Jurek, Anne M., George Maldonado, Sander Greenland, and Timothy R. Church. 2006. “Exposure-Measurement Error Is Frequently Ignored When Interpreting Epidemiologic Study Results.” European Journal of Epidemiology 21 (12): 871–76. https://doi.org/10.1007/s10654-006-9083-0.
Jurek, Anne M., Sander Greenland, and George Maldonado. 2008. “Brief Report: How Far from Non-Differential Does Exposure or Disease Misclassification Have to Be to Bias Measures of Association Away from the Null?” International Journal of Epidemiology 37 (2): 382–85.

Example: Same setting as above, but a real effect exists. Measurement error often underestimates it, but attenuation is not guaranteed. Mismeasured confounders may still open backdoor paths.
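The attenuation described above can be illustrated with a small simulation (Python sketch; the true effect of 1.0 and the 20% misclassification rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# True treatment effect of A on Y is 1.0
A = rng.binomial(1, 0.5, n)
Y = 1.0 * A + rng.normal(0, 1, n)

# Non-differential exposure misclassification: 20% of records flip A
A_star = np.where(rng.random(n) < 0.20, 1 - A, A)

true_est = Y[A == 1].mean() - Y[A == 0].mean()
naive_est = Y[A_star == 1].mean() - Y[A_star == 0].mean()
print(round(true_est, 2), round(naive_est, 2))  # naive estimate is attenuated
```

With symmetric 20% misclassification and a 50/50 exposure, each recorded arm is an 80/20 mixture of the true arms, so the naive contrast shrinks toward 0.6 of the truth. As the citations above note, attenuation is the typical but not guaranteed outcome.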

Example 3: Correlated Non-Differential (Undirected) Measurement Errors

Table 2 \mathcal{G}_3 illustrates correlated non-differential measurement error, which arises when the error terms of the treatment and outcome share a common cause.

Example: Societies with advanced record-keeping produce more precise records of both big God beliefs and social complexity. This common cause creates a spurious association even in the absence of a true causal effect.
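A shared cause of the two error terms manufactures an association where none exists. The sketch below (Python; the error loading of 0.8 is an illustrative assumption, with U standing in for record-keeping quality) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Sharp null: true belief measure A and true complexity Y are causally unrelated
A = rng.normal(0, 1, n)
Y = rng.normal(0, 1, n)

# U is a shared cause of both error terms (e.g. record-keeping quality)
U = rng.normal(0, 1, n)
A_star = A + 0.8 * U + rng.normal(0, 0.3, n)
Y_star = Y + 0.8 * U + rng.normal(0, 0.3, n)

r_true = np.corrcoef(A, Y)[0, 1]       # near zero
r_obs = np.corrcoef(A_star, Y_star)[0, 1]  # spurious positive association
print(round(r_true, 2), round(r_obs, 2))
```

The mismeasured variables correlate only because both inherit variation from U; no adjustment of A* or Y* alone removes this without measuring the shared error cause.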

Example 4: Uncorrelated Differential Measurement Error — Exposure → Error in Outcome

Table 2 \mathcal{G}_4 illustrates uncorrelated differential measurement error, which occurs when the exposure influences how the outcome is measured.

Example: Big God beliefs lead to inflated historical records of social complexity, introducing bias even without a true causal effect.
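Differential error of this kind biases the contrast even under the sharp null, as a minimal sketch shows (Python; the 0.5 inflation of recorded complexity under A = 1 is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Sharp null: belief in big Gods (A) has no effect on true complexity (Y)
A = rng.binomial(1, 0.5, n)
Y = rng.normal(0, 1, n)

# Differential error: exposure inflates the *recorded* outcome
Y_star = Y + 0.5 * A

est = Y_star[A == 1].mean() - Y_star[A == 0].mean()
print(round(est, 2))  # non-zero despite the sharp null
```

The estimated contrast reflects the recording distortion, not any causal effect of A on Y.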

Example 5: Uncorrelated Differential Measurement Error — Outcome → Error in Exposure

Table 2 \mathcal{G}_5 illustrates uncorrelated differential measurement error, which occurs when the outcome influences the measurement of the exposure.

Example: If social complexity shapes historical narratives, victors might record big God beliefs selectively to support political legitimacy.

Example 6: Correlated Differential Measurement Error

Table 2 \mathcal{G}_6 illustrates correlated differential measurement error, which occurs when the exposure influences error terms that are already correlated through a shared cause.

Example: Social complexity fosters elites who glorify both political reach and big God beliefs, biasing both measures in a correlated fashion.

Summary

In Part 1, we examined independent, correlated, dependent, and correlated–dependent forms of measurement error bias. These structural features clarify why such biases threaten causal inference and often cannot be resolved with statistical adjustment alone (VanderWeele and Hernán 2012).

VanderWeele, Tyler J., and Miguel A. Hernán. 2012. “Results on Differential and Dependent Measurement Error of the Exposure and the Outcome Using Signed Directed Acyclic Graphs.” American Journal of Epidemiology 175 (12): 1303–10. https://doi.org/10.1093/aje/kwr458.

We return to measurement error in Part 4.


Part 2: Target Population Restriction Bias at the End of Study {#id-sec-2}

Suppose the analytic sample matches the target population at baseline. Attrition (right-censoring) may bias causal effect estimates by:
1) opening biasing pathways (distortion), or
2) restricting the analytic sample so it is no longer representative (restriction).

Selection bias over time: five examples of right-censoring bias.

Table 3

Example 1: Confounding by common cause of treatment and attrition

Table 3 \mathcal{G}_1 illustrates confounding by a common cause of treatment and censoring, such that the potential outcomes of the population at baseline, Y(a), may differ from those of the population remaining at the end of study, Y'(a), so Y'(a) \neq Y(a). Suppose investigators ask whether religious service attendance affects volunteering, and an unmeasured variable (loyalty) affects attendance, attrition, and volunteering, opening a backdoor path.

We have encountered this bias before: the structure matches correlated measurement errors (Table 2 \mathcal{G}_3). Attrition may exacerbate measurement error bias by opening a path A \;\associationred\; U \;\associationred\; U_{\Delta A} \;\associationred\; Y'.

Example 2: Treatment affects censoring

Table 3 \mathcal{G}_2 illustrates bias in which the treatment affects the censoring process. Here, the treatment causally affects how the outcome is reported, but not the outcome itself.

Example: In a meditation trial with no true effect on well-being, Buddha-like detachment increases attrition and also changes how well-being is reported. This opens a path A \;\associationred\; U_{\Delta{A\to Y}} \;\associationred\; Y' (not confounding; no common cause of A and Y). Structurally, this is directed uncorrelated measurement error (Table 2 \mathcal{G}_4). Results risk distortion via end-of-study restriction.

Example 3: No treatment effect when outcome causes censoring

Table 3 \mathcal{G}_3 shows outcome-driven censoring under the sharp null. In theory, the ATE may remain unbiased, though the analytic sample is restricted. This corresponds to undirected uncorrelated measurement error (Table 2 \mathcal{G}_1). In practice the sharp-null assumption is untestable and rarely known in advance.

Example 4: Treatment effect when outcome causes censoring and a true effect exists

Table 3 \mathcal{G}_4 shows that if the outcome affects censoring in the presence of a true effect, bias arises (at least on one effect scale). This structure is equivalent to measurement error bias and can occur without confounding. See the worked example in Part 4.

Example 5: Treatment effect and effect-modifiers differ in censored group (restriction bias without confounding)

Table 3 \mathcal{G}_5 represents a setting with a true treatment effect, but the distribution of effect modifiers differs at study end. If missingness is at random and models are correctly specified, inverse probability weighting or multiple imputation can recover valid estimates (Cole and HernĂĄn 2008; Leyrat et al. 2021; Shiba and Kawahara 2021). If not (e.g., MNAR or model misspecification), causal estimation is compromised (Tchetgen Tchetgen and Wirth 2017; Malinsky, Shpitser, and Tchetgen Tchetgen 2022).

Cole, Stephen R, and Miguel A Hernán. 2008. “Constructing Inverse Probability Weights for Marginal Structural Models.” American Journal of Epidemiology 168 (6): 656–64.
Leyrat, Clémence, James R Carpenter, Sébastien Bailly, and Elizabeth J Williamson. 2021. “Common Methods for Handling Missing Data in Marginal Structural Models: What Works and Why.” American Journal of Epidemiology 190 (4): 663–72.
Shiba, Koichiro, and Takuya Kawahara. 2021. “Using Propensity Scores for Causal Inference: Pitfalls and Tips.” Journal of Epidemiology 31 (8): 457–63.
Tchetgen Tchetgen, Eric J, and Kathleen E Wirth. 2017. “A General Instrumental Variable Framework for Regression Analysis with Outcome Missing Not at Random.” Biometrics 73 (4): 1123–31.
Malinsky, Daniel, Ilya Shpitser, and Eric J Tchetgen Tchetgen. 2022. “Semiparametric Inference for Nonmonotone Missing-Not-at-Random Data: The No Self-Censoring Model.” Journal of the American Statistical Association 117 (539): 1415–23.

Note that Table 3 \mathcal{G}_5 resembles Table 2 \mathcal{G}_2. Replacing unmeasured effect modifiers \circledotted{F} and U_{\Delta F} by \circledotted{U_Y} shows the link to uncorrelated independent measurement error ‘off the null’.

In this setting there may be a common cause of A and Y, and, additionally, the end-of-study analytic sample is an undesirable restriction of the target population: marginal effects differ between the restricted sample and the target (see Supplement S4 for a simulation). Hence results can be weird due to inappropriate restriction.
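The restriction mechanism in Example 5, and its repair under missingness at random, can be sketched as follows (Python; the effect sizes, modifier prevalence, and censoring probabilities are illustrative assumptions, and the censoring model is treated as known rather than estimated):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Binary effect modifier F; treatment effect is 2 when F = 1, 0 when F = 0,
# so the marginal ATE in the baseline target population is 1.0
F = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.5, n)
Y = 2.0 * A * F + rng.normal(0, 1, n)

# Censoring depends only on F (missing at random given F): F = 1 drops out more
p_obs = np.where(F == 1, 0.3, 0.9)
obs = rng.random(n) < p_obs

def wmean_diff(y, a, w):
    """Weighted difference in mean outcome between treated and untreated."""
    return (np.average(y[a == 1], weights=w[a == 1])
            - np.average(y[a == 0], weights=w[a == 0]))

naive = wmean_diff(Y[obs], A[obs], np.ones(obs.sum()))   # complete-case ATE
ipw = wmean_diff(Y[obs], A[obs], 1.0 / p_obs[obs])       # censoring-weighted ATE
print(round(naive, 2), round(ipw, 2))
```

The complete-case contrast recovers only the effect in the retained, modifier-depleted sample, while reweighting each retained unit by the inverse of its retention probability restores the baseline distribution of F, returning the weighted contrast to the marginal ATE. In practice, the retention probabilities must be estimated and the missing-at-random assumption is untestable.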

Summary

Right-censoring can bias effect estimates by changing the distribution of effect modifiers between baseline and study end. Investigators should ensure the end-of-study potential outcomes distribution aligns with the target population. Methods such as inverse probability weighting and multiple imputation can mitigate this bias (Bulbulia 2024c), subject to their assumptions.

———. 2024c. “A Practical Guide to Causal Inference in Three-Wave Panel Studies.” PsyArXiv Preprints, February. https://doi.org/10.31234/osf.io/uyg3d.

The take-home message: attrition is nearly inevitable; if unchecked, it yields weird results (wrongly estimated inferences due to inappropriate restriction and distortion). See Supplement S3 for a formal explanation and S4 for a simulation.

Part 3: Target Population Restriction Bias at the Start of Study

Target‐restriction bias occurs when the analytic sample at baseline differs from the target population in the distribution of confounders and/or treatment‐effect modifiers. Misalignment may arise if the source population does not match the target, or if study selection alters distributions. Alignment cannot generally be verified from data (see Supplement S3).

Collider‐Restriction Bias at Baseline

Table 4: Collider‐stratification bias at the start of a study (“M‐bias”)

In Table 4 \mathcal{G}_1, unmeasured health awareness (U_1) influences both activity (A) and participation (S=1), and unmeasured SES (U_2) influences both heart health (Y) and S=1. Conditioning on S=1 opens biasing paths:
  • U_1: over-representation of active individuals, overstating benefits.
  • U_2: a confounding path via SES, inflating effect estimates.

Adjusting for U_1, U_2, or proxies can block these paths (Table 4 \mathcal{G}_2).
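The M-bias structure of Table 4 \mathcal{G}_1 can be reproduced in a few lines (Python sketch; all coefficients and the selection threshold are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

U1 = rng.normal(0, 1, n)               # unmeasured health awareness
U2 = rng.normal(0, 1, n)               # unmeasured SES
A = U1 + rng.normal(0, 1, n)           # physical activity
Y = U2 + rng.normal(0, 1, n)           # heart health; A has NO effect on Y
S = (U1 + U2 + rng.normal(0, 1, n)) > 1.0   # study participation (collider)

r_all = np.corrcoef(A, Y)[0, 1]        # full population: near zero
r_sel = np.corrcoef(A[S], Y[S])[0, 1]  # participants only: spurious association
print(round(r_all, 3), round(r_sel, 3))
```

Among participants, learning that someone is highly active makes low SES more likely, linking A and Y without any causal path; with these coefficients the induced association is negative, but its sign depends on the assumed structure. Adjusting for U_1, U_2, or good proxies closes the path.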

Restriction Bias Without Collider Stratification

Table 5: Selection bias “off the null”.

Example 1: WEIRD Sample, Non‐WEIRD Target

If effect modifiers differ between a WEIRD sample and the general population (Table 5 \mathcal{G}_{1.1}), estimates may be biased even without confounding. The structure matches Table 3 \mathcal{G}_5 and Table 2 \mathcal{G}_2. When the effect-modifier distribution in the target is known, weighting can transport estimates, but the mapping from restricted to target effects (f_W) is usually unknown.
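The weighting just described can be sketched for this example (Python; the modifier prevalences of 0.8 and 0.2 and the stratum-specific effects are illustrative assumptions, and the target prevalence is treated as known):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400_000

# Effect modifier F: prevalence 0.8 in the sampled (WEIRD) population but
# 0.2 in the target; the effect of A is 2 when F = 1 and 0 when F = 0
F = rng.binomial(1, 0.8, n)
A = rng.binomial(1, 0.5, n)
Y = 2.0 * A * F + rng.normal(0, 1, n)

# Transport weights: target prevalence / sample prevalence within each stratum
w = np.where(F == 1, 0.2 / 0.8, 0.8 / 0.2)

def wmean_diff(y, a, w):
    """Weighted difference in mean outcome between treated and untreated."""
    return (np.average(y[a == 1], weights=w[a == 1])
            - np.average(y[a == 0], weights=w[a == 0]))

sample_ate = wmean_diff(Y, A, np.ones(n))  # ATE in the WEIRD sample
target_ate = wmean_diff(Y, A, w)           # ATE transported to the target
print(round(sample_ate, 2), round(target_ate, 2))
```

Here the sample ATE (about 2 x 0.8) overstates the target ATE (about 2 x 0.2) fourfold; reweighting by the known modifier distribution recovers the target quantity, but only because F is measured and its target distribution is known.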

Example 2: Overly Broad Sample for Narrow Target

If the target is a restricted stratum (e.g., NZ men > 40 without vasectomy) but sampling is broader (Table 5 \mathcal{G}_{2.1}), bias mirrors right‐censoring with effect modifiers. Correct restriction (Table 5 \mathcal{G}_{2.2}) aligns sample with target.

Example 3: Correlated Covariate/Outcome Measurement Error Across Strata

In cross‐cultural studies, correlated measurement errors in L and Y (Table 5 \mathcal{G}_{3.1}) can open biasing paths even if A is measured perfectly. Without local validation, pooling cultures risks contamination; restricting to cultures with reliable measures (Table 5 \mathcal{G}_{3.2}) is safer.

Example 4: Correlated Measurement Error in Effect Modifiers

With perfect A and Y, correlated errors in effect‐modifier measures (Table 5 \mathcal{G}_{4.1}) still prevent valid heterogeneity estimation or use of target weights. Best practice: restrict to settings with reliable effect‐modifier measurement (Table 5 \mathcal{G}_{4.2}) and report strata separately if errors differ.