Heterogeneity

Moderation (as Effect Modification: Conditional Average Treatment Effects)

Published: November 19, 2025


Key concepts

  • Causal estimand: The specific causal quantity of interest (e.g., the average effect in the population).
  • Statistical estimand: The quantity computed from data to approximate the causal estimand.
  • Interaction: The joint effect of two or more interventions.
  • Effect modification: When the effect of one intervention varies by baseline characteristics.
  • Heterogeneous treatment effects (HTE): The phenomenon that effects differ across individuals.
  • Conditional average treatment effect (CATE), \tau(x): The average effect for a subgroup with characteristics x.
  • Estimated CATE, \hat{\tau}(x): The empirical estimate of \tau(x) for units with covariate profile X = x.

The fundamental problem of causal inference

Consider whether bilingualism improves cognitive ability:

  • Y_i^{a=1}: Cognitive ability of child i if bilingual.
  • Y_i^{a=0}: Cognitive ability of child i if monolingual.

The individual causal effect is: Y_i^{a=1} - Y_i^{a=0}.

If this difference is not 0, bilingualism has a causal effect for child i. However, we observe at most one potential outcome per child: a child cannot simultaneously be raised bilingual and monolingual, so the other outcome is always missing.

Although individual effects are unobservable, the average treatment effect (ATE), the mean of the individual effects \delta_i = Y_i^{a=1} - Y_i^{a=0}, can be identified under the assumptions below:

E(\delta) = E(Y^{a=1} - Y^{a=0}) = E(Y^{a=1}) - E(Y^{a=0}).
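A small simulation makes this concrete. The sketch below is a hypothetical illustration, not real data. Because it is a simulation, we can generate both potential outcomes for every child, so the true ATE is known. Random assignment then reveals only one outcome per child, yet the difference in observed means still recovers the ATE. The score scale (mean 100, SD 15) and the average effect of 2 points are arbitrary assumptions.

```python
import random
import statistics

def simulate_ate(n=100_000, seed=1):
    """Hypothetical potential outcomes: y0 = score if monolingual, y1 = score if bilingual."""
    rng = random.Random(seed)
    y0 = [rng.gauss(100, 15) for _ in range(n)]
    y1 = [y + rng.gauss(2, 5) for y in y0]          # individual effects vary around 2 points
    true_ate = statistics.fmean(a - b for a, b in zip(y1, y0))  # knowable only in a simulation

    # Random assignment: each child reveals exactly one potential outcome.
    treated, control = [], []
    for a1, a0 in zip(y1, y0):
        if rng.random() < 0.5:
            treated.append(a1)   # we observe Y^{a=1} only
        else:
            control.append(a0)   # we observe Y^{a=0} only
    est_ate = statistics.fmean(treated) - statistics.fmean(control)
    return true_ate, est_ate

true_ate, est_ate = simulate_ate()
print(f"true ATE = {true_ate:.2f}, estimated ATE = {est_ate:.2f}")
```

Randomization is what makes the simple difference in means valid here. Without it, the identification assumptions that follow are needed.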

Identification assumptions

Causal consistency:

The exposure levels compared correspond to well-defined interventions represented in the data.

Positivity:

Every exposure level occurs with positive probability in all covariate strata.

Exchangeability:

Conditional on measured covariates, exposure assignment is independent of potential outcomes.

We also assume:

Measurement error

Variables used to define exposures, outcomes, and confounders are measured without error or with error that does not induce bias after adjustment. Systematic or differential measurement error—especially in exposures or confounders—can bias effect estimates even when other assumptions hold.

Selection bias

The study sample is representative of the target population with respect to the causal effect of interest, or differences are accounted for through design or analysis. In longitudinal settings, attrition or loss to follow-up must not depend jointly on treatment and outcome in ways not captured by measured covariates; otherwise, the estimated effect may differ systematically from the true population effect.

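Correctly specified model

The statistical model used for estimation correctly represents how the outcome depends on treatment and covariates. This assumption is probably always violated in standard regression models.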

Basic counterfactual logic

Causal inference asks: What would happen under alternative interventions?

For example, test scores Y under:

  • a: old teaching method,
  • a^*: new teaching method.

The ATE:

\mathbb{E}[Y(a^*) - Y(a)]

is the average change in scores if the whole population switched from a to a^*.

Confounding—common causes of A and Y—must be addressed for valid inference.

Interaction vs. effect modification

  • Interaction: Joint effects of two or more interventions.
  • Effect modification: Variation in the effect of a single intervention across subgroups.

Interaction

Let A = teaching method, B = tutoring, and Y = test score. We compare:

  • Effect of both: \mathbb{E}[Y(1,1)] - \mathbb{E}[Y(0,0)]
  • Sum of individual effects: \big(\mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,0)]\big) + \big(\mathbb{E}[Y(0,1)] - \mathbb{E}[Y(0,0)]\big)

Additive interaction exists if:

\mathbb{E}[Y(1,1)] - \mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,1)] + \mathbb{E}[Y(0,0)] \neq 0.
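Once the four mean potential outcomes are available, the contrast is simple arithmetic. The helper below has a hypothetical name and uses made-up mean test scores. It computes the contrast as the joint effect minus the sum of the separate effects, which reduces to the expression above. In real data, these four means are identified only after adjusting for the confounders L and Q.

```python
def additive_interaction(e11, e10, e01, e00):
    """Additive interaction contrast from mean potential outcomes e_ab = E[Y(a, b)]."""
    joint = e11 - e00                          # effect of receiving both interventions
    separate = (e10 - e00) + (e01 - e00)       # sum of the two single-intervention effects
    return joint - separate                    # algebraically e11 - e10 - e01 + e00

# Hypothetical mean test scores: A = new teaching method, B = tutoring
print(additive_interaction(e11=78.0, e10=72.0, e01=70.0, e00=65.0))  # 1.0 (> 0: synergy)
```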

Positive values indicate synergy; negative values, antagonism (on the additive scale). Identification requires controlling for all confounders of A \to Y (L) and of B \to Y (Q); that is, we must condition on both L and Q.

Effect modification and \tau(x)

Effect modification examines whether the effect of a single intervention A on Y differs by baseline characteristics X. The conditional average treatment effect (CATE) is:

\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x],

the average effect for the subgroup with X = x. Comparing \tau(x) across x quantifies effect modification.

If X is categorical (e.g., G = sex), the group-specific ATEs are \delta_{g} = \mathbb{E}[Y(1) \mid G=g] - \mathbb{E}[Y(0) \mid G=g], and effect modification by G exists if \delta_{g_1} - \delta_{g_2} \neq 0. Identification requires adjusting for confounders of A \to Y within each group.
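Here is a sketch of identification within groups, under stated assumptions. We simulate a binary group G, a single measured confounder L of A \to Y, and true group-specific effects \delta_0 = 1 and \delta_1 = 3. All of these are hypothetical. Within each group, we standardise over L: we average the L-specific treated-minus-control contrasts, weighted by P(L = l \mid G = g). Standardisation is one standard way to adjust. It assumes consistency, positivity, and exchangeability given L within each group.

```python
import random
import statistics
from collections import defaultdict

def simulate(n=200_000, seed=2):
    """Hypothetical data: group G, confounder L, treatment A, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = int(rng.random() < 0.5)                     # baseline group G
        lval = int(rng.random() < 0.5)                  # confounder L of A -> Y
        a = int(rng.random() < (0.8 if lval else 0.2))  # uptake of A depends on L
        delta = 3.0 if g == 1 else 1.0                  # true delta_g
        y = delta * a + 2.0 * lval + rng.gauss(0, 1)    # L also raises Y
        rows.append((g, lval, a, y))
    return rows

def naive_group_ate(rows, g):
    """Unadjusted treated-minus-control difference within G = g (confounded by L)."""
    treated = [y for gi, _, a, y in rows if gi == g and a == 1]
    control = [y for gi, _, a, y in rows if gi == g and a == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

def standardised_group_ate(rows, g):
    """delta_g = sum_l P(L=l | G=g) * (mean Y | A=1, L=l, G=g  -  mean Y | A=0, L=l, G=g)."""
    cells = defaultdict(list)
    n_l = defaultdict(int)
    for gi, lval, a, y in rows:
        if gi == g:
            cells[(lval, a)].append(y)
            n_l[lval] += 1
    n_g = sum(n_l.values())
    return sum(
        n_l[lv] / n_g * (statistics.fmean(cells[(lv, 1)]) - statistics.fmean(cells[(lv, 0)]))
        for lv in n_l
    )

rows = simulate()
for g in (0, 1):
    print(g, round(naive_group_ate(rows, g), 2), round(standardised_group_ate(rows, g), 2))
# In expectation the naive estimates are too large by 1.2; the standardised ones are near 1 and 3.
```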

Why psychology keeps confusing interactions with effect modification

Psychological papers typically fit linear models with multiplicative terms (e.g., Y ~ A * M) and call the coefficient on A:M “the interaction”. That coefficient is:

\beta_{AM} = \frac{\partial^2 \mathbb{E}[Y \mid A, M]}{\partial A \, \partial M}

It captures how the statistical mean function bends when we move both A and M. This is useful for prediction, but it is not automatically an effect modifier because:

  1. The cross-term blends confounding, scaling, and model misspecification. Unless the model is correctly specified and all confounders are adjusted for, \beta_{AM} has no causal meaning.
  2. Even with perfect specification, \beta_{AM} is a property of the conditional mean, not of potential outcome contrasts like Y(1)-Y(0).
  3. Psychologists often interact treatment with post-treatment mediators, which cannot identify effect modification because the mediator is itself altered by A.

Effect modification is about contrasts of potential outcomes stratified (or smoothed) over baseline features X; it is a causal estimand. Interactions in a single regression equation rarely align with that estimand.
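A simulation can show the gap between \beta_{AM} and a contrast of CATEs. With binary A and M, the saturated model Y ~ A * M reproduces the four cell means exactly. Its A:M coefficient therefore equals the difference in differences (\bar{Y}_{11} - \bar{Y}_{01}) - (\bar{Y}_{10} - \bar{Y}_{00}), with subscripts (A, M).

In the hypothetical setup below, the true effects are \tau(0) = 1.0 and \tau(1) = 1.5, so the true effect modification is 0.5. A confounder U, omitted from the regression, drives treatment uptake only when M = 1. Standardising over U within each level of M recovers the causal contrast. The unadjusted \beta_{AM} does not; analytically, it is about 1.7 in this setup.

```python
import random
import statistics
from collections import defaultdict

def simulate(n=200_000, seed=3):
    """Hypothetical data: baseline modifier M, confounder U, treatment A, outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        m = int(rng.random() < 0.5)                        # baseline modifier M
        u = int(rng.random() < 0.5)                        # confounder U, omitted from Y ~ A * M
        p_treat = (0.8 if u else 0.2) if m else 0.5        # U drives uptake of A only when M = 1
        a = int(rng.random() < p_treat)
        y = 1.0 * a + 0.5 * a * m + 2.0 * u + rng.gauss(0, 1)  # true tau(0) = 1.0, tau(1) = 1.5
        rows.append((m, u, a, y))
    return rows

def beta_am(rows):
    """A:M coefficient of the saturated model Y ~ A * M: a difference in differences of cell means."""
    cells = defaultdict(list)
    for m, _, a, y in rows:
        cells[(a, m)].append(y)
    mean = {key: statistics.fmean(ys) for key, ys in cells.items()}
    return (mean[(1, 1)] - mean[(0, 1)]) - (mean[(1, 0)] - mean[(0, 0)])

def cate_contrast(rows):
    """tau(1) - tau(0), where each tau(m) is standardised over U within M = m."""
    tau = {}
    for m_val in (0, 1):
        cells = defaultdict(list)
        n_u = defaultdict(int)
        for m, u, a, y in rows:
            if m == m_val:
                cells[(u, a)].append(y)
                n_u[u] += 1
        n_m = sum(n_u.values())
        tau[m_val] = sum(
            n_u[u] / n_m * (statistics.fmean(cells[(u, 1)]) - statistics.fmean(cells[(u, 0)]))
            for u in n_u
        )
    return tau[1] - tau[0]

rows = simulate()
print(f"beta_AM, unadjusted:       {beta_am(rows):.2f}")        # about 1.7 in expectation
print(f"tau(1) - tau(0), adjusted: {cate_contrast(rows):.2f}")  # about 0.5 in expectation
```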

Rule of thumb

If you cannot write your estimand as \mathbb{E}[Y(1)-Y(0)\mid X] for baseline X, you are not estimating a CATE. You’re fitting a predictive surface with interaction terms.

Causal forests pivot directly to \tau(x)

Generalised random forests (the grf package) estimate \tau(x) without ever fitting a single interaction coefficient. Instead they:

  1. Build trees that split on baseline covariates X to expose treatment heterogeneity while enforcing honesty (separate subsamples for splits vs. estimation).
  2. Average leaf-level treatment contrasts to produce \hat{\tau}(x) for every unit.
  3. Provide diagnostics (variance estimates, forest weights) so we can aggregate \hat{\tau}(x) into interpretable summaries—RATE/Qini curves, policy trees, etc.
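A deliberately tiny toy can illustrate the honesty idea in steps 1 and 2: one covariate, randomized treatment, and a single split. This is not grf's algorithm. grf grows many trees on subsamples, uses a gradient-based approximation when choosing splits, and combines trees through forest weights. The toy only shows the division of labour between two subsamples: one chooses the split, and a separate one estimates the leaf contrasts.

The split criterion n_L n_R (\hat{\tau}_L - \hat{\tau}_R)^2 rewards splits that separate units with different treatment effects. The data-generating process (\tau(x) = 2 for x > 0.6, else 0) is hypothetical.

```python
import random
import statistics

def simulate_units(n=20_000, seed=4):
    """Hypothetical data (x, a, y): randomised a; tau(x) = 2 if x > 0.6 else 0."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.random()
        a = int(rng.random() < 0.5)
        tau = 2.0 if x > 0.6 else 0.0
        units.append((x, a, x + tau * a + rng.gauss(0, 1)))
    return units

def contrast(units):
    """Treated-minus-control difference in mean outcome (valid here because a is randomised)."""
    treated = [y for _, a, y in units if a == 1]
    control = [y for _, a, y in units if a == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

def honest_stump(units, seed=5, min_leaf=200):
    """Toy honest causal tree with a single split on x."""
    shuffled = units[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    split_half, est_half = shuffled[:half], shuffled[half:]   # honesty: disjoint subsamples

    # Step 1: choose the threshold using split_half only.
    best_c, best_score = None, float("-inf")
    for c in [i / 20 for i in range(1, 20)]:
        left = [u for u in split_half if u[0] <= c]
        right = [u for u in split_half if u[0] > c]
        if min(len(left), len(right)) < min_leaf:
            continue
        score = len(left) * len(right) * (contrast(left) - contrast(right)) ** 2
        if score > best_score:
            best_c, best_score = c, score
    if best_c is None:
        raise ValueError("no admissible split")

    # Step 2: estimate leaf contrasts on est_half only; tau_hat(x) is the contrast of x's leaf.
    tau_left = contrast([u for u in est_half if u[0] <= best_c])
    tau_right = contrast([u for u in est_half if u[0] > best_c])
    return best_c, tau_left, tau_right

c, tau_left, tau_right = honest_stump(simulate_units())
print(c, round(tau_left, 2), round(tau_right, 2))   # threshold near 0.6; leaf effects near 0 and 2
```

Because the leaf estimates never reuse the outcomes that chose the split, they avoid the optimism of "finding" heterogeneity and grading it on the same data.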

BIG implications:

  • We no longer guess which pairwise interactions to include. The forest discovers the nonlinear combination of age and baseline charity in our simulation.
  • \hat{\tau}(x) respects causal identification (we still need consistency/positivity/exchangeability) but does not require a parametric regression form.
  • CATE inference is local: we can report estimates (and uncertainty) for subgroups defined by the split structure rather than global \beta_{AM} coefficients.

HTE, CATE, and estimated \hat{\tau}(x) in practice

  • HTE means \tau(x) is not constant; psychology often assumes it is.
  • CATE, \tau(x), is the target: the average causal effect among units with baseline profile x.
  • Estimated CATE, \hat{\tau}(x), is what a fitted grf::causal_forest() provides for each observation (e.g., out-of-bag predictions from predict()); we aggregate these to describe how effects vary.

For unit i with profile X_i=x:

  • The individual effect Y_i(1)-Y_i(0) remains unobservable.
  • \hat{\tau}(x) is our best estimate of \tau(x), derived from nearby observations that share x-like features in the forest’s learned metric.

Because X is typically high-dimensional, manual interaction modeling is brittle. Causal forests—and the broader Margot workflow built around them—let us:

  1. Estimate \hat{\tau}(x) flexibly.
  2. Diagnose whether detected heterogeneity is actionable (RATE/Qini, policy value).
  3. Communicate effect modification in language policymakers understand (e.g., “Older, high-charity baseline participants benefit the most”).

This is the bridge between the theory of effect modification and the empirical machinery psychologists need to stop conflating statistical model results with causal moderation.
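Diagnosing whether heterogeneity is actionable (step 2 above) often starts from a targeting curve. We rank units by \hat{\tau}(x) and ask how much larger the average effect is among the top-ranked fraction q than in the whole sample. The sketch below computes this targeting operator characteristic, TOC(q), and its average over q (AUTOC). AUTOC is one member of the RATE family that grf's rank_average_treatment_effect() estimates.

This is a simplification under a strong assumption. Because the data are simulated, we score the ranking using each unit's true effect. grf instead evaluates rankings with doubly robust scores, ideally on held-out data. The effect sizes and noise levels are hypothetical.

```python
import random
import statistics

def toc_curve(priorities, effects, qs):
    """TOC(q): mean effect among the top-q fraction ranked by priority, minus the overall mean."""
    order = sorted(range(len(priorities)), key=lambda i: priorities[i], reverse=True)
    overall = statistics.fmean(effects)
    curve = []
    for q in qs:
        k = max(1, int(q * len(order)))     # size of the top-q group
        curve.append(statistics.fmean(effects[i] for i in order[:k]) - overall)
    return curve

def autoc(priorities, effects, grid=100):
    """Area under the TOC curve, approximated by averaging TOC over q = 1/grid, ..., 1."""
    qs = [(j + 1) / grid for j in range(grid)]
    return statistics.fmean(toc_curve(priorities, effects, qs))

def simulate(n=10_000, seed=6):
    """Hypothetical units: true effects, a noisy but informative estimate, and a random ranking."""
    rng = random.Random(seed)
    tau = [2.0 if rng.random() > 0.6 else 0.0 for _ in range(n)]
    tau_hat = [t + rng.gauss(0, 0.5) for t in tau]   # informative estimate of tau
    random_rank = [rng.random() for _ in range(n)]    # uninformative ranking
    return tau, tau_hat, random_rank

tau, tau_hat, random_rank = simulate()
print(round(autoc(tau_hat, tau), 2))       # clearly positive: targeting by tau_hat helps
print(round(autoc(random_rank, tau), 2))   # near zero: no targeting benefit
```

A ranking that carries no information about who benefits produces a TOC curve hovering around zero. In that case, detected heterogeneity would not justify differentiated treatment.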

| Concept | Notation | Definition | Scope | Requirements |
| --- | --- | --- | --- | --- |
| Interaction | | Joint effect of multiple interventions compared with the sum of their separate effects | Multiple interventions (A, B) | Adjust for all confounders of A \to Y (L) and of B \to Y (Q), i.e., both L and Q |
| Effect modification | \tau(x) varies with x | Effect of a single intervention (A) differs by subgroup defined by X = x | Single intervention | Adjust for confounders of A \to Y within each subgroup |
| Estimated CATE | \hat{\tau}(x) | Model-based estimate of \tau(x) for X = x | Prediction task | Flexible estimation methods (e.g., causal forests) |

Appendix: Identification of Interaction Effects

Figure 1: Diagram illustrating causal interaction. Assessing the joint effect of two interventions, A (e.g., teaching method) and B (e.g., tutoring), on outcome Y (e.g., test score). L represents confounders of the A-Y relationship, and Q represents confounders of the B-Y relationship. Red arrows indicate biasing backdoor paths requiring adjustment. Assumes A and B are decided independently here.

Figure 2 shows we need to condition on (adjust for) both L_0 and Q_0.

Figure 2: Identification of causal interaction requires adjusting for all confounders of A-Y (L) and B-Y (Q). Boxes around L and Q indicate conditioning, closing backdoor paths.
Figure 3: How shall we investigate effect modification of A on Y by G? Can you see the problem?

Thus, it is essential to understand that when we adjust for confounders of the A \to Y relationship, we do not identify the causal effects of effect modifiers. Rather, we should treat effect modifiers as prognostic indicators. Moreover, we’re going to need to develop methods for clarifying prognostic indicators in multi-dimensional settings where

Group discussion

  • Where in your field have ‘interaction effects’ been over-interpreted as causal moderation? Find an example and diagnose which assumption (baseline covariate, post-treatment mediator, model form) was actually violated.
  • How would you explain to a collaborator the difference between \beta_{AM} in a regression and \tau(x) from a causal forest, without using equations: what metaphors would work in psychology?
  • Suppose CATE estimates show that a subgroup benefits less. What ethical or practical considerations arise before recommending differentiated treatment, and how will you communicate that uncertainty?
