Heterogeneity

Effect Modification, Interaction, and Conditional Average Treatment Effects

Published

August 12, 2025


Key concepts

  • Causal estimand: The specific causal quantity of interest (e.g., the average effect in the population).
  • Statistical estimand: The quantity computed from data to approximate the causal estimand.
  • Interaction: The joint effect of two or more interventions.
  • Effect modification: When the effect of one intervention varies by baseline characteristics.
  • Heterogeneous treatment effects (HTE): The phenomenon that effects differ across individuals.
  • Conditional average treatment effect (CATE), \tau(x): The average effect for a subgroup with characteristics x.
  • Estimated CATE, \hat{\tau}(x): The empirical estimate of \tau(x) for the subgroup with covariate profile X = x.

The fundamental problem of causal inference

Consider whether bilingualism improves cognitive ability:

  • Y_i^{a=1}: Cognitive ability of child i if bilingual.
  • Y_i^{a=0}: Cognitive ability of child i if monolingual.

The individual causal effect is: Y_i^{a=1} - Y_i^{a=0}.

If this difference \neq 0, bilingualism has an effect for child i. However, we observe only one potential outcome per child, because no child can be raised both bilingual and monolingual. This is the fundamental problem of causal inference.

Although individual effects are unobservable, the average treatment effect (ATE) can be identified under the assumptions listed below:

E(\delta) = E(Y^{a=1} - Y^{a=0}) = E(Y^{a=1}) - E(Y^{a=0}).
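A small simulation can make the fundamental problem concrete. The sketch below uses a hypothetical data-generating process (the outcome scale, the mean effect of 2 and its spread are assumptions, not estimates from any study). It creates both potential outcomes for each child, reveals only one of them according to a randomized exposure, and compares the difference in observed means with the true average of the individual effects, which we can compute only because the data are simulated.

```python
import random
import statistics

def simulate(n=20000, seed=1):
    """Hypothetical data: each child has both potential outcomes, but only one is observed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y0 = rng.gauss(100, 15)      # Y^{a=0}: ability if monolingual (assumed distribution)
        delta = rng.gauss(2, 5)      # individual effect, varies across children (assumed)
        y1 = y0 + delta              # Y^{a=1}: ability if bilingual
        a = rng.random() < 0.5       # randomized exposure, so exchangeability holds by design
        y_obs = y1 if a else y0      # consistency: we see the outcome under the exposure received
        rows.append((a, y_obs, y1 - y0))
    return rows

rows = simulate()
# The true ATE needs both potential outcomes, which real data never provide.
true_ate = statistics.fmean(d for _, _, d in rows)
# The difference in observed means uses only what a real study would see.
est_ate = (statistics.fmean(y for a, y, _ in rows if a)
           - statistics.fmean(y for a, y, _ in rows if not a))
print(f"true ATE {true_ate:.2f}, difference in observed means {est_ate:.2f}")
```

Under randomization the two numbers agree up to sampling error, even though no individual effect Y_i^{a=1} - Y_i^{a=0} is ever observed.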

Identification assumptions

Causal consistency:

The exposure levels compared correspond to well-defined interventions represented in the data.

Positivity:

Every exposure level occurs with positive probability in all covariate strata.

Exchangeability:

Conditional on measured covariates, exposure assignment is independent of potential outcomes.
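When exposure is not randomized, these assumptions license standardization (the g-formula) over the measured covariates: \mathbb{E}[Y(a)] = \sum_l \mathbb{E}[Y \mid A=a, L=l] \Pr(L=l). The sketch below assumes a single hypothetical binary confounder L that raises both the probability of exposure and the outcome, with a true ATE of 3 chosen for illustration. It checks positivity in each stratum of L before standardizing, and compares the result with the naive contrast.

```python
import random
import statistics
from collections import defaultdict

def simulate_confounded(n=50000, seed=2):
    """Hypothetical data: binary confounder L affects both exposure A and outcome Y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        l = int(rng.random() < 0.4)
        a = int(rng.random() < (0.8 if l else 0.2))  # L raises the probability of exposure
        y = 50 + 3 * a + 10 * l + rng.gauss(0, 5)    # assumed true ATE = 3
        data.append((l, a, y))
    return data

def standardized_ate(data):
    """sum over l of (E[Y|A=1,L=l] - E[Y|A=0,L=l]) * P(L=l)."""
    cells = defaultdict(list)
    for l, a, y in data:
        cells[(l, a)].append(y)
    n = len(data)
    ate = 0.0
    for l in {l for l, _, _ in data}:
        # Positivity: both exposure levels must be observed in every stratum of L.
        if not cells[(l, 0)] or not cells[(l, 1)]:
            raise ValueError(f"positivity violated in stratum L={l}")
        p_l = (len(cells[(l, 0)]) + len(cells[(l, 1)])) / n
        ate += (statistics.fmean(cells[(l, 1)]) - statistics.fmean(cells[(l, 0)])) * p_l
    return ate

data = simulate_confounded()
# The naive contrast mixes the effect of A with the effect of L.
naive = (statistics.fmean(y for _, a, y in data if a == 1)
         - statistics.fmean(y for _, a, y in data if a == 0))
print(f"naive {naive:.2f}, standardized {standardized_ate(data):.2f}")
```

The naive contrast is inflated because exposed units are disproportionately in the L = 1 stratum; standardizing over L recovers the assumed effect only because L is the sole confounder in this simulated example.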

We also assume:

Measurement error

Variables used to define exposures, outcomes, and confounders are measured without error or with error that does not induce bias after adjustment. Systematic or differential measurement error, especially in exposures or confounders, can bias effect estimates even when the other assumptions hold.

Selection bias

The study sample is representative of the target population with respect to the causal effect of interest, or differences are accounted for through design or analysis. In longitudinal settings, attrition or loss to follow-up must not depend jointly on treatment and outcome in ways not captured by measured covariates; otherwise, the estimated effect may differ systematically from the true population effect.

Basic counterfactual logic

Causal inference asks: What would happen under alternative interventions?

For example, test scores Y under:

  • a: old teaching method,
  • a^*: new teaching method.

The ATE:

\mathbb{E}[Y(a^*) - Y(a)]

is the average change in scores if the whole population switched from a to a^*.

Confounding (common causes of A and Y) must be addressed for valid inference.

⸻

Interaction vs. effect modification

  • Interaction: Joint effects of two or more interventions.
  • Effect modification: Variation in the effect of a single intervention across subgroups.

Interaction

Let A = teaching method, B = tutoring, Y = test score. We compare:

  • Effect of both: \mathbb{E}[Y(1,1)] - \mathbb{E}[Y(0,0)]
  • Sum of individual effects: \big(\mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,0)]\big) + \big(\mathbb{E}[Y(0,1)] - \mathbb{E}[Y(0,0)]\big).

Additive interaction exists if:

\mathbb{E}[Y(1,1)] - \mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,1)] + \mathbb{E}[Y(0,0)] \neq 0.

This is the effect of both interventions minus the sum of their separate effects. Positive values indicate synergy; negative values indicate antagonism. Identification requires controlling for all confounders of A \to Y (L) and of B \to Y (Q), i.e., adjusting for both L and Q.
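Given the four counterfactual means, the additive interaction contrast is plain arithmetic, and it equals the joint effect minus the sum of the separate effects. The means below are hypothetical test scores, not data from any study; they only illustrate the sign convention.

```python
def additive_interaction(m00, m10, m01, m11):
    """E[Y(1,1)] - E[Y(1,0)] - E[Y(0,1)] + E[Y(0,0)]."""
    return m11 - m10 - m01 + m00

def describe(contrast, tol=1e-9):
    """Label the contrast using the synergy/antagonism convention."""
    if contrast > tol:
        return "synergy"
    if contrast < -tol:
        return "antagonism"
    return "no additive interaction"

# Hypothetical means: method alone +5, tutoring alone +4, both together +12.
m00, m10, m01, m11 = 60.0, 65.0, 64.0, 72.0
joint = m11 - m00                      # effect of both
separate = (m10 - m00) + (m01 - m00)   # sum of individual effects
c = additive_interaction(m00, m10, m01, m11)
print(joint, separate, c, describe(c))  # 12.0 9.0 3.0 synergy
```

In practice each mean would first have to be identified from data by adjusting for L and Q; the arithmetic only begins once those counterfactual means are in hand.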

⸻

Effect modification and \tau(x)

Effect modification examines whether the effect of a single intervention A on Y differs by baseline characteristics X. The conditional average treatment effect (CATE) is: \tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x], the average effect for the subgroup with X = x. Comparing \tau(x) across x quantifies effect modification.

If the modifier is categorical (e.g., G denotes sex), the group-specific ATEs are: \delta_{g} = \mathbb{E}[Y(1) \mid G=g] - \mathbb{E}[Y(0) \mid G=g], and effect modification exists if \delta_{g_1} - \delta_{g_2} \neq 0. Identification requires adjusting for the confounders of A \to Y within each group.
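A sketch of the group-specific contrast \delta_{g_1} - \delta_{g_2}, assuming a randomized A so that within-group differences in means identify each \delta_g; with observational data one would instead adjust for the A \to Y confounders inside each group. The group labels, baseline shift, and effect sizes (2 and 6) are hypothetical.

```python
import random
import statistics

def simulate_by_group(n=40000, seed=3):
    """Hypothetical randomized trial in which the effect of A differs by group G."""
    rng = random.Random(seed)
    effect = {"g1": 2.0, "g2": 6.0}  # assumed true delta_g
    data = []
    for _ in range(n):
        g = "g1" if rng.random() < 0.5 else "g2"
        a = int(rng.random() < 0.5)
        # g2 also has a higher baseline; a baseline difference alone is not effect modification.
        y = 70 + (3 if g == "g2" else 0) + effect[g] * a + rng.gauss(0, 8)
        data.append((g, a, y))
    return data

def group_ate(data, g):
    """delta_g = E[Y | A=1, G=g] - E[Y | A=0, G=g]; valid here because A is randomized."""
    treated = [y for gg, a, y in data if gg == g and a == 1]
    control = [y for gg, a, y in data if gg == g and a == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

data = simulate_by_group()
d1, d2 = group_ate(data, "g1"), group_ate(data, "g2")
print(f"delta_g1 {d1:.2f}, delta_g2 {d2:.2f}, difference {d2 - d1:.2f}")
```

A nonzero difference indicates effect modification by G on the additive scale; it says nothing about what would happen if G itself were intervened on.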

⸻

HTE, CATE, and estimated \hat{\tau}(x)

  • HTE: The fact that \tau(x) varies with x.
  • CATE, \tau(x): The true subgroup-average effect.
  • Estimated CATE, \hat{\tau}(x): The model-based prediction for a subgroup with features x.

For unit i with profile X_i = x:

  • Unobservable individual effect: Y_i(1) - Y_i(0).
  • \hat{\tau}(x): Our estimate of \tau(x), the average effect for all units with profile x.

High-dimensional X makes manual interaction modeling impractical. Modern ML methods such as causal forests, implemented in the R package grf (Tibshirani et al. 2024), are designed to estimate \tau(x) flexibly, yielding \hat{\tau}(x).
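Causal forests are not reproduced here. As a much cruder stand-in that only shows what \hat{\tau}(x) is, the sketch below uses a T-learner: it estimates the outcome separately for treated and control units (here with plain bin-wise means over a single hypothetical covariate x in [0, 1)) and takes the difference. The data-generating process, with randomized A and true \tau(x) = 4x, is assumed for illustration; binning does not scale to high-dimensional X, which is exactly why methods like causal forests exist.

```python
import random
import statistics

def simulate(n=60000, seed=4):
    """Hypothetical randomized data with true CATE tau(x) = 4x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        a = int(rng.random() < 0.5)
        y = 10 + 2 * x + 4 * x * a + rng.gauss(0, 1)
        rows.append((x, a, y))
    return rows

def t_learner_bins(rows, n_bins=5):
    """tau_hat for each bin: mean(Y | A=1, bin) - mean(Y | A=0, bin)."""
    groups = {(b, a): [] for b in range(n_bins) for a in (0, 1)}
    for x, a, y in rows:
        b = min(int(x * n_bins), n_bins - 1)  # assign x to one of n_bins equal-width bins
        groups[(b, a)].append(y)
    return [statistics.fmean(groups[(b, 1)]) - statistics.fmean(groups[(b, 0)])
            for b in range(n_bins)]

tau_hat = t_learner_bins(simulate())
for b, t in enumerate(tau_hat):
    midpoint = (b + 0.5) / len(tau_hat)
    print(f"bin centre x={midpoint:.1f}: tau_hat={t:.2f}, true tau={4 * midpoint:.2f}")
```

Each \hat{\tau} value is an estimated average effect for units in that bin, not an estimate of any single unit's Y_i(1) - Y_i(0).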

| Concept | Notation | Definition | Scope | Requirements |
|---|---|---|---|---|
| Interaction | – | Joint effect of multiple interventions compared with the sum of their separate effects | Multiple interventions (A, B) | Adjust for all confounders of A \to Y and B \to Y (both L and Q) |
| Effect modification | \tau(x) varies with x | Effect of a single intervention (A) differs by subgroup defined by X = x | Single intervention | Adjust for confounders of A \to Y within each subgroup |
| Estimated CATE | \hat{\tau}(x) | Model-based estimate of \tau(x) for X = x | Prediction task | Flexible estimation methods (e.g., causal forests) |

Appendix: Identification of Interaction Effects

Figure 1: Diagram illustrating causal interaction. Assessing the joint effect of two interventions, A (e.g., teaching method) and B (e.g., tutoring), on outcome Y (e.g., test score). L represents confounders of the A-Y relationship, and Q represents confounders of the B-Y relationship. Red arrows indicate biasing backdoor paths requiring adjustment. Here, A and B are assumed to be assigned independently of each other.

Figure 2 shows that we must condition on (adjust for) both L_0 and Q_0.

Figure 2: Identification of causal interaction requires adjusting for all confounders of A-Y (L) and B-Y (Q). Boxes around L and Q indicate conditioning, closing backdoor paths.
Figure 3: How shall we investigate effect modification of A on Y by G? Can you see the problem?

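Thus, it is essential to understand that when we control for confounding along the A \to Y path, we do not identify the causal effects of effect modifiers: adjusting for the confounders of A \to Y does not close backdoor paths into G, so differences in the effect of A across levels of G need not reflect what would happen if we intervened on G. Rather, we should treat effect modifiers as prognostic indicators. Moreover, we will need to develop methods for clarifying prognostic indicators in multi-dimensional settings where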

© 2025 Joseph Bulbulia. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

References

Hernán, M. A., and J. M. Robins. 2020. Causal Inference: What If? Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/.
Tibshirani, Julie, Susan Athey, Erik Sverdrup, and Stefan Wager. 2024. Grf: Generalized Random Forests. https://github.com/grf-labs/grf.
VanderWeele, Tyler J. 2009. “On the Distinction Between Interaction and Effect Modification.” Epidemiology, 863–71.
VanderWeele, Tyler J., and James M. Robins. 2007. “Four Types of Effect Modification: A Classification Based on Directed Acyclic Graphs.” Epidemiology (Cambridge, Mass.) 18 (5): 561–68. https://doi.org/10.1097/EDE.0b013e318127181b.