Week 6: Effect Modification and CATE

Slides

Date: 1 Apr 2026

Required reading

  • Hernán & Robins (2025), chapters 4-5.

Optional reading

Key concepts for assessment

  • Causal estimand versus statistical estimand
  • Interaction (joint interventions)
  • Effect modification (one intervention, subgroup contrasts)
  • CATE $\tau(x)$ and estimated CATE $\hat{\tau}(x)$
  • Why statistical interaction terms do not automatically imply causal effect modification

Week 5 defined the average treatment effect (ATE) and the assumptions required to estimate it from a well-defined intervention at a clear time zero. An average, though, can hide meaningful variation. This week extends the framework from "does the intervention work on average?" to "for whom does it work more, or less?"

The main difficulty this week is vocabulary. Psychology often uses "interaction", "moderation", "heterogeneity", and "personalised effects" as if they were interchangeable. They are not.

Seminar

Motivating example

A randomised exercise programme lowers blood pressure by 3 mmHg on average. Behind that average, some participants improve a lot, while others barely change. If we report only the ATE, we miss information needed for treatment and policy decisions.

A simple map for this week

Keep these four ideas separate from the start.

Four ideas to keep separate

  • Interaction: the joint effect of two interventions.
  • Effect modification: variation in the effect of one intervention across subgroups.
  • Regression product term: a feature of a statistical model, such as $A \times G$.
  • CATE: the subgroup-level causal contrast, $\tau(x)$.

Rule of thumb

If you cannot write the estimand as $\mathbb{E}[Y(1) - Y(0) \mid X = x]$ for baseline $X$ measured at time zero, you are not estimating a CATE.

First distinction: interaction versus effect modification

Start with the scientific question, not the software output. If the design involves one intervention and subgroup contrasts, the question is about effect modification. If the design involves two interventions taken together, the question is about interaction.

Interaction

Interaction concerns two interventions, not one. Let $A$ and $B$ be interventions and let $Y$ be the outcome. On the additive scale, interaction is

$$ \mathbb{E}[Y(1,1)] - \mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,1)] + \mathbb{E}[Y(0,0)]. $$

If this contrast is non-zero, the joint effect is not additive on this scale.
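To make the contrast concrete, here is a minimal R sketch. The four potential-outcome means are hypothetical numbers chosen for illustration, not results from any study. It shows that the additive interaction contrast equals the joint effect minus the sum of the two separate effects.

```r
# Hypothetical potential-outcome means E[Y(a, b)] (illustrative numbers only)
ey <- c("11" = 14, "10" = 9, "01" = 8, "00" = 5)

# Additive interaction contrast, as in the formula above
interaction_additive <- ey[["11"]] - ey[["10"]] - ey[["01"]] + ey[["00"]]

# Effects of each intervention alone, and of both together, versus neither
effect_a_alone <- ey[["10"]] - ey[["00"]]
effect_b_alone <- ey[["01"]] - ey[["00"]]
joint_effect   <- ey[["11"]] - ey[["00"]]

interaction_additive
joint_effect - (effect_a_alone + effect_b_alone)
```

Both expressions return the same value, which is non-zero here because the joint effect exceeds the sum of the separate effects.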

Effect modification

Effect modification concerns one intervention across subgroups. For a subgroup variable $G$, effect modification exists when

$$ \mathbb{E}[Y(1) - Y(0) \mid G = g_1] \neq \mathbb{E}[Y(1) - Y(0) \mid G = g_2]. $$

This is still the effect of $A$ on $Y$. It is not the causal effect of $G$ on $Y$.

Scale matters

Interaction and effect-modification claims are scale-specific. A difference-scale result need not match a ratio-scale result.
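A small numerical sketch can show why the scale must be stated. The subgroup risks below are hypothetical and chosen only for illustration: the treatment raises risk by the same absolute amount in both subgroups, yet the risk ratios differ.

```r
# Hypothetical mean potential outcomes by subgroup (illustrative risks only)
risk <- data.frame(
  g  = c("g1", "g2"),
  y0 = c(0.10, 0.30),  # E[Y(0) | G = g]
  y1 = c(0.20, 0.40)   # E[Y(1) | G = g]
)

risk$difference <- risk$y1 - risk$y0  # additive (difference) scale
risk$ratio      <- risk$y1 / risk$y0  # multiplicative (ratio) scale
risk
```

On the difference scale there is no effect modification by $G$ in these numbers. On the ratio scale there is.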

Second distinction: causal modification versus model terms

This is where many psychology papers go wrong.

A regression interaction term ($A\times G$) is a model parameter. Causal effect modification is a property of potential-outcome contrasts under identification assumptions. A model term can be non-zero because of misspecification or bias, so it is not causal evidence by itself.
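The following simulation is one hedged illustration of the bias case, using base R only. All variable names and numbers are assumptions made for this sketch. The true effect of $A$ is 1 for everyone, so there is no causal effect modification by $G$. Treatment is randomised when $G = 0$ but depends on an unmeasured variable $U$ when $G = 1$.

```r
set.seed(2026)
n <- 20000

g <- rbinom(n, 1, 0.5)                # subgroup indicator
u <- rnorm(n)                         # unmeasured common cause of a and y
# Treatment is randomised when g = 0 but depends on u when g = 1
a <- rbinom(n, 1, plogis(2 * g * u))
# True effect of a is 1 for everyone: no causal effect modification by g
y <- 1 * a + 2 * u + rnorm(n)

# Naive model with a product term, omitting the unmeasured u
fit <- lm(y ~ a * g)
coef(fit)[c("a", "a:g")]
```

The coefficient on `a` should sit near the true effect, while the product term should be clearly non-zero. In this sketch the product term reflects confounding within one stratum, not a difference in causal effects.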

Pair exercise: interaction versus effect modification

  1. A study reports a "significant exercise-by-age interaction" in a regression of blood pressure on exercise, age, and their product term.
  2. State the causal estimand for interaction (hint: it requires four potential outcomes under joint interventions on exercise and age, which is conceptually odd because we cannot intervene on age).
  3. State the causal estimand for effect modification (hint: it involves one intervention on exercise, with subgroup contrasts across age groups).
  4. Which concept, interaction or effect modification, matches the study design? Give a reason the regression interaction term could be non-zero without any causal modification (e.g., model misspecification or collider bias).

CATE as the operational target

For a measured baseline profile $X=x$ defined at time zero,

$$ \tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]. $$

$\tau(x)$ is a subgroup average causal contrast. For person $i$, $\hat{\tau}(X_i)$ is an estimate of that subgroup contrast. $\hat{\tau}(X_i)$ is not the unobservable individual contrast $Y_i(1)-Y_i(0)$.

Personalised effects versus true individual effects

Students sometimes read $\hat{\tau}(X_i)$ as the effect of treatment on person $i$. It is not. Person $i$'s true effect, $Y_i(1) - Y_i(0)$, requires both potential outcomes, and we observe at most one. What $\hat{\tau}(X_i)$ estimates is the average effect across all people who share person $i$'s measured profile $X_i$. The estimate is "personalised" in the sense that it uses person $i$'s covariates, but it remains a subgroup average. Two people with identical $X_i$ can have different true effects if they differ on unmeasured variables.

When the literature refers to "individualised treatment effects," the intended meaning is almost always $\hat{\tau}(X_i)$: an estimated subgroup average, not the unknowable individual contrast.
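A simulation can make the distinction visible, because only in a simulation do we get to see both potential outcomes. The sketch below is illustrative: the data-generating values are assumptions, with a measured covariate `x` and an unmeasured variable `u` that also shifts the individual effect.

```r
set.seed(2026)
n <- 10000

x <- rbinom(n, 1, 0.5)   # measured baseline covariate
u <- rnorm(n)            # unmeasured variable that also shifts the effect

y0  <- rnorm(n)              # potential outcome under control
y1  <- y0 + 2 + 3 * x + u    # potential outcome under treatment
ite <- y1 - y0               # individual effects: visible only in a simulation

# tau(x): average of individual effects within each measured profile
tau_x <- tapply(ite, x, mean)
# Spread of individual effects among people who share the same x
sd_x <- tapply(ite, x, sd)

tau_x
sd_x
```

The subgroup averages differ by `x`, and within each subgroup the individual effects still vary. An estimate of $\tau(x)$ targets the first quantity, not any one person's effect.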

Identification reminders

Week 4's graph logic still matters here. Week 5's design logic still matters too: the treatment must remain well-defined, covariates must precede treatment, and subgroup contrasts are causal only if the same identification conditions still hold. Effect-modification questions are still causal questions, so confounding does not disappear just because we are now interested in subgroup differences.

For interaction with two interventions, we need identification of the joint intervention contrast. A common condition is conditional exchangeability for joint treatment assignment:

$$ Y(a, b) \coprod (A, B) \mid L. $$

Here $L$ must block all relevant backdoor paths from $A$ and $B$ to $Y$.
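As a rough sketch of what identification buys us, the simulation below estimates the additive interaction contrast by standardisation, assuming that a single measured confounder `l` is sufficient and that the outcome model is correctly specified. Both are assumptions built into the simulated data. They are not guaranteed in real studies.

```r
set.seed(2026)
n <- 10000

l <- rnorm(n)                   # measured confounder of both treatments
a <- rbinom(n, 1, plogis(l))    # treatment a depends on l
b <- rbinom(n, 1, plogis(-l))   # treatment b depends on l
# True additive interaction is 0.5 by construction
y <- 1 * a + 1 * b + 0.5 * a * b + 2 * l + rnorm(n)

fit <- lm(y ~ a * b + l)

# Standardise: predict for everyone under each joint intervention, then average
mean_under <- function(a_val, b_val) {
  mean(predict(fit, newdata = data.frame(a = a_val, b = b_val, l = l)))
}

interaction_hat <- mean_under(1, 1) - mean_under(1, 0) -
  mean_under(0, 1) + mean_under(0, 0)
interaction_hat
```

With a linear outcome model the standardised contrast coincides with the fitted product-term coefficient. The standardisation step is written out because it carries over to models where that shortcut does not hold.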

Diagram illustrating causal interaction. Assessing the joint effect of two interventions, A (e.g., teaching method) and B (e.g., tutoring), on outcome Y (e.g., test score). L_A represents confounders of the A-Y relationship, and L_B represents confounders of the B-Y relationship. Red arrows indicate biasing backdoor paths requiring adjustment.

Identification of causal interaction requires adjusting for all confounders of A-Y (L_A) and B-Y (L_B). Boxes around L_A and L_B indicate conditioning, closing backdoor paths.

For effect modification of $A$ by $G$, we still need valid control of confounding for the $A \to Y$ relation, typically within strata of $G$.

How shall we investigate effect modification of A on Y by G? Can you see the problem?

For a larger handout version of these effect-modification graphs, see Effect modification using causal graphs.

Effect modification by proxy

A variable can modify the treatment effect without directly causing the outcome. In the graph below, $Z$ is the direct effect modifier (open circle: it changes the size of $A$'s effect on $Y$). $G$ inherits this modification through its association with $Z$.

Effect modification by proxy: $G$ modifies $A$'s effect on $Y$ through its relationship to the direct effect modifier $Z$. Open circle denotes effect modification, not a standard causal arrow.

Whether $G$ remains an effect modifier depends on what else is in the model. If investigators condition on $Z$, then $G$ becomes independent of $Y$ and is no longer an effect modifier. Effect modification is relative to the adjustment set, not an intrinsic property of $G$ (VanderWeele & Robins (2007); VanderWeele (2012)).

d-separation does not imply absence of effect modification

The graph below poses a subtler problem. To identify the effect of $A$ on $Y$, we condition on $L$. But $G$ causes $L$, and conditioning on $L$ d-separates $G$ from $Y$. Does this mean $G$ is not an effect modifier?

d-separation $\neq$ no effect modification: $G$ is d-separated from $Y$ conditional on $L$, yet $G$ can still modify the effect of $A$ on $Y$ because $G$ shifts the distribution of $L$.

No. Even when $G \perp\!\!\!\perp Y \mid L$, the CATE for a given level of $G$ is a weighted average of the $L$-specific treatment effects, where the weights come from the distribution of $L$ given $G$:

$$ \tau(g) = \mathbb{E}\left[\mathbb{E}[Y(1) - Y(0) \mid L] \middle| G = g\right]. $$

Two conditions together generally produce effect modification by $G$. First, the effect of $A$ on $Y$ varies across levels of $L$. Second, the distribution of $L$ differs across levels of $G$ (which it does, because $G \to L$). When both hold, $\tau(g)$ will typically vary with $g$, barring exact cancellation in the weighted averages. Effect modification by $G$ is then present even though $G$ has no direct structural path to $Y$ after conditioning.

This result has practical consequences. Investigators who equate d-separation with absence of effect modification will miss genuine heterogeneity. A non-significant regression interaction term between $A$ and $G$, after adjusting for $L$, does not prove that $G$ is irrelevant to treatment targeting. The CATE can still vary by $G$ because $G$ shifts the covariate distribution over which the $L$-specific effects are averaged.
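The weighted-average formula for $\tau(g)$ can be worked through by hand. In the sketch below, the $L$-specific effects and the distributions of $L$ given $G$ are hypothetical numbers chosen for illustration.

```r
# Hypothetical L-specific treatment effects, E[Y(1) - Y(0) | L = l]
tau_l <- c(low_fitness = 6, high_fitness = 2)

# Hypothetical distribution of L within each level of G (each row sums to 1)
p_l_given_g <- rbind(
  younger = c(low_fitness = 0.2, high_fitness = 0.8),
  older   = c(low_fitness = 0.7, high_fitness = 0.3)
)

# tau(g) = sum over l of tau(l) * P(L = l | G = g)
tau_g <- drop(p_l_given_g %*% tau_l)
tau_g
```

The same two $L$-specific effects are averaged in both groups. Only the weights differ, and that alone makes $\tau(g)$ differ between the groups.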

Two rules of thumb

  • A variable can modify the treatment effect even if it has no direct arrow to the outcome in the adjusted DAG.
  • Whether a variable is an effect modifier depends on what other variables are in the conditioning set. Effect modification is relative, not absolute.

Pair exercise: why conditioning changes effect modification

  1. An exercise programme ($A$) targets blood pressure ($Y$). Age ($G$) affects fitness ($L$), and $L$ affects $Y$. There is no direct $G \to Y$ path.
  2. Explain to your partner why the CATE still varies by age, even without a direct $G \to Y$ arrow (hint: the distribution of $L$ differs across age groups).
  3. A colleague fits a regression with an $A \times G$ interaction term and finds it non-significant. They conclude "age does not modify the treatment effect." Evaluate this conclusion.
  4. Describe a scenario where two apparent effect modifiers ($G_1$ and $G_2$) both show significant CATE variation individually, but the variation disappears when you condition on both simultaneously.

Why flexible estimators matter

With many covariates, hand-built interaction models are fragile for four reasons. First, the number of possible interactions grows combinatorially: $k$ covariates generate $\binom{k}{2}$ pairwise interactions and far more higher-order terms. Second, each interaction subgroup contains fewer observations, so estimates become noisy. Third, searching across many interactions inflates false-positive rates unless corrected. Fourth, the analyst must specify the functional form in advance, and real treatment-response surfaces are rarely linear.
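The combinatorial growth is easy to tabulate. The sketch below counts pairwise products with `choose(k, 2)` and, as one way of counting "higher-order terms", all products of two or more distinct covariates, $2^k - k - 1$. The values of $k$ are arbitrary.

```r
k <- c(5, 10, 20, 50)

growth <- data.frame(
  covariates            = k,
  pairwise_terms        = choose(k, 2),   # products of exactly two covariates
  all_interaction_terms = 2^k - k - 1     # products of two or more covariates
)
growth
```

Even at ten covariates there are over a thousand candidate interaction terms, which is why specifying them by hand does not scale.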

Flexible estimators such as causal forests learn the heterogeneity surface from data. They can recover non-linear and high-dimensional patterns without requiring the analyst to guess the correct specification. These estimators help with functional form, but they do not remove confounding by design.

Demo: functional form matters

This simulation has randomised treatment, so there is no confounding. The challenge is functional form.

```r
# install once
# remotes::install_github("go-bayes/causalworkshop@v0.2.1")
library(causalworkshop)

# simulate data with non-linear heterogeneous effects
d <- simulate_nonlinear_data(n = 2000, seed = 2026)

# compare four estimation methods
results <- compare_ate_methods(d)

# summary table: ATE and individual-level RMSE
results$summary

# plot: estimated vs true treatment effects
results$plot_comparison

# plot: estimated effect as a function of x1
results$plot_by_x1
```

All methods recover the ATE in this simulation. They differ in how well they recover heterogeneity.

Return to the opening example

Back to exercise and blood pressure: the ATE tells us whether the programme helps on average, and the CATE tells us where gains are concentrated. For policy and clinical decisions, we usually need both.

After the mid-trimester break and Test 1 (Week 7), Week 8 introduces machine-learning methods that estimate these subgroup contrasts in high dimensions, without requiring the analyst to specify the functional form in advance.

Pair exercise: from average to subgroup

  1. An exercise programme has ATE = 3 mmHg reduction in blood pressure.
  2. Construct a scenario where the conditional average treatment effect (CATE) is 8 mmHg for one subgroup and $-2$ mmHg for another, consistent with this ATE (specify group sizes).
  3. Explain what a policy-maker reading only the ATE is missing.
  4. Your partner claims "$\hat{\tau}(X_i) = 8$ means the programme will reduce my blood pressure by 8 mmHg." Correct this claim using the distinction between estimated subgroup averages and unobservable individual effects.

Lab materials: Lab 6: CATE and Effect Modification


Appendix A: additive interaction simplification

Starting from

$$ \begin{aligned} &\big(\mathbb{E}[Y(1,1)] - \mathbb{E}[Y(0,0)]\big) \\ &\quad - \big(\mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,0)]\big) \\ &\quad - \big(\mathbb{E}[Y(0,1)] - \mathbb{E}[Y(0,0)]\big) \end{aligned} $$

we collect terms to obtain

$$ \mathbb{E}[Y(1,1)] - \mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,1)] + \mathbb{E}[Y(0,0)]. $$

Hernán, M. A., & Robins, J. M. (2025). Causal inference: What if. Chapman & Hall/CRC. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/

VanderWeele, T. J. (2009). On the distinction between interaction and effect modification. Epidemiology, 20(6), 863–871.

VanderWeele, T. J. (2012). Confounding and Effect Modification: Distribution and Measure. Epidemiologic Methods, 1(1), 55–82. https://doi.org/10.1515/2161-962X.1004

VanderWeele, T. J., & Robins, J. M. (2007). Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology, 18(5), 561–568. https://doi.org/10.1097/EDE.0b013e318127181b