Effect Modification and CATE

Motivating example

A randomised exercise programme lowers blood pressure by 3 mmHg on average. That average hides meaningful variation: older participants with high baseline blood pressure improve substantially, while young normotensive participants show no change.

If we report only the average treatment effect (ATE), we lose the information needed to target treatment.

Two concepts to distinguish this week

Concept	What varies	Number of interventions
Interaction	The joint effect of two interventions	Two (A and B)
Effect modification	The effect of one intervention across subgroups	One (A), stratified by G

These concepts use different estimands, different identification conditions, and different DAGs.

Interaction

Interaction concerns two interventions (A and B) acting on the same outcome. On the additive scale:

\mathbb{E}[Y(1,1)] - \mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,1)] + \mathbb{E}[Y(0,0)].

If this contrast is non-zero, the joint effect is not additive.
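Here is a minimal R sketch of the additive interaction contrast. It assumes a simulated 2×2 factorial trial in which A and B are randomised independently, so each cell mean estimates the corresponding \mathbb{E}[Y(a,b)]. The outcome model and its coefficients, including a built-in interaction of 2, are illustrative assumptions. The helper name `cell_mean` is introduced here only for the example.

```r
set.seed(2026)
n <- 40000

# Hypothetical 2x2 factorial trial: both interventions randomised independently
A <- rbinom(n, 1, 0.5)
B <- rbinom(n, 1, 0.5)

# Illustrative outcome model: main effects 1 and 3, additive interaction of 2
Y <- 1 * A + 3 * B + 2 * A * B + rnorm(n)

# Under randomisation, each cell mean estimates E[Y(a, b)]
cell_mean <- function(a, b) mean(Y[A == a & B == b])

interaction_contrast <- cell_mean(1, 1) - cell_mean(1, 0) -
  cell_mean(0, 1) + cell_mean(0, 0)
interaction_contrast  # close to the built-in value of 2
```

In this model the true contrast is (1 + 3 + 2) - 1 - 3 + 0 = 2. If you set the `A * B` coefficient to 0, the estimate moves towards 0.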

Effect modification

Effect modification concerns one intervention (A) across subgroups defined by a baseline variable G:

\mathbb{E}[Y(1) - Y(0) \mid G = g_1] \neq \mathbb{E}[Y(1) - Y(0) \mid G = g_2].

This is still the causal effect of A. It is not the causal effect of G.
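The R sketch below shows effect modification in a simulated randomised trial. It assumes a binary baseline variable G (standing in for, say, high baseline blood pressure) and an illustrative outcome model in which A lowers the outcome by 1 mmHg when G = 0 and by 5 mmHg when G = 1. Because A is randomised, the difference in means within each stratum estimates \mathbb{E}[Y(1) - Y(0) \mid G = g]. The 15 mmHg level shift attached to G is never estimated, and it need not be causal.

```r
set.seed(2026)
n <- 40000

# Hypothetical baseline modifier G and randomised treatment A
G <- rbinom(n, 1, 0.4)
A <- rbinom(n, 1, 0.5)

# Illustrative model: effect of A is -1 when G = 0 and -5 when G = 1;
# G also shifts the outcome level, but that shift is not our target
Y <- 130 + 15 * G + A * ifelse(G == 1, -5, -1) + rnorm(n, sd = 5)

# Subgroup ATE: treated-minus-control difference in means within each stratum of G
subgroup_ate <- sapply(c(0, 1), function(g) {
  mean(Y[A == 1 & G == g]) - mean(Y[A == 0 & G == g])
})
names(subgroup_ate) <- c("G=0", "G=1")
subgroup_ate  # close to -1 and -5
```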

Scale matters

Interaction and effect-modification claims are scale-specific. A non-zero additive interaction does not imply non-zero multiplicative interaction, and vice versa.

Scale	Contrast	No modification when
Additive (risk difference)	\text{RD}_{g_1} - \text{RD}_{g_2}	= 0
Multiplicative (risk ratio)	\text{RR}_{g_1} / \text{RR}_{g_2}	= 1
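This worked R example uses hypothetical risks, chosen only to show how the two contrasts can disagree. Treatment doubles the risk in both subgroups, so the multiplicative contrast is 1. The risk differences, however, are 0.10 and 0.20, so the additive contrast is not 0.

```r
# Hypothetical risks P(Y = 1) under each treatment level in two subgroups
risk <- data.frame(
  g  = c("g1", "g2"),
  r0 = c(0.10, 0.20),  # risk under A = 0
  r1 = c(0.20, 0.40)   # risk under A = 1
)

risk$RD <- risk$r1 - risk$r0  # 0.10 and 0.20
risk$RR <- risk$r1 / risk$r0  # 2 and 2

# Additive scale: difference in risk differences (non-zero => modification)
additive_contrast <- risk$RD[1] - risk$RD[2]
# Multiplicative scale: ratio of risk ratios (equal to 1 => no modification)
multiplicative_contrast <- risk$RR[1] / risk$RR[2]

c(additive = additive_contrast, multiplicative = multiplicative_contrast)
```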

Causal modification versus model terms

A regression interaction term (A \times G) is a model parameter. Causal effect modification is a property of potential-outcome contrasts under identification assumptions.

A non-zero interaction term can arise from misspecification or bias, not just genuine effect heterogeneity.
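The following R simulation shows, under assumed parameters, how bias alone can produce a non-zero A \times G coefficient. The true effect of A is 1 in both strata of G. However, A is confounded by a prognostic variable U only when G = 1. Fitting `lm(Y ~ A * G)` without U gives a clearly non-zero `A:G` term, and adding U brings it back near 0.

```r
set.seed(2026)
n <- 100000

G <- rbinom(n, 1, 0.5)
U <- rnorm(n)  # prognostic factor; observed here only so the fix can be shown

# A is randomised when G = 0 but depends strongly on U when G = 1
A <- rbinom(n, 1, ifelse(G == 1, plogis(2 * U), 0.5))

# True effect of A is 1 in both strata: no effect modification by G
Y <- 1 * A + 2 * U + rnorm(n)

naive    <- lm(Y ~ A * G)      # omits the confounder U
adjusted <- lm(Y ~ A * G + U)  # controls for U

coef(naive)["A:G"]     # clearly non-zero, driven by confounding in G = 1
coef(adjusted)["A:G"]  # close to 0
```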

CATE as the operational target

For a measured baseline profile X = x, the conditional average treatment effect is:

\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x].

Symbol	Meaning
\tau(x)	Subgroup average causal contrast
\hat{\tau}(X_i)	An estimate of that subgroup contrast

\hat{\tau}(X_i) is not the unobservable individual contrast Y_i(1) - Y_i(0).
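This R sketch contrasts \hat{\tau}(X_i) with the individual contrast. It assumes a simulated randomised trial in which each individual effect equals \tau(x) = 2x plus an individual-level term e that no covariate explains. A T-learner, meaning separate linear outcome models per arm (chosen here for simplicity), estimates \tau(x) well. It cannot recover Y_i(1) - Y_i(0), which we can compute here only because both potential outcomes were simulated.

```r
set.seed(2026)
n <- 20000

X <- runif(n, 0, 2)
A <- rbinom(n, 1, 0.5)  # randomised treatment

# Individual effects vary around tau(x) = 2 * X through an unexplained term e
e  <- rnorm(n, sd = 1.5)
y0 <- X + rnorm(n)
y1 <- y0 + 2 * X + e
Y  <- ifelse(A == 1, y1, y0)  # only one potential outcome is observed per person

# T-learner: fit an outcome model in each arm, then difference the predictions
fit1 <- lm(Y ~ X, subset = A == 1)
fit0 <- lm(Y ~ X, subset = A == 0)
nd <- data.frame(X = X)
tau_hat <- predict(fit1, newdata = nd) - predict(fit0, newdata = nd)

tau_x <- 2 * X   # true CATE
ite   <- y1 - y0 # individual contrast: available only because we simulated it

mean(abs(tau_hat - tau_x))  # small: tau_hat tracks tau(x)
mean(abs(tau_hat - ite))    # large: tau_hat does not recover Y_i(1) - Y_i(0)
```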

Identification for interaction

For interaction with two interventions, we need identification of the joint intervention contrast:

Y(a,b) \perp\!\!\!\perp (A,B) \mid L \quad \text{for all } a, b.

L must block all relevant backdoor paths from both A and B to Y.
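Below is an R sketch of identifying the joint contrast by standardisation over L (the g-formula). It assumes a single binary confounder L that affects A, B and Y, and it assumes positivity, so every (a, b, l) cell is populated. The coefficients, including a true additive interaction of 1.5, are illustrative. The helper names `std_mean` and `crude_mean` are introduced only for this example. The crude contrast ignores L and drifts away from 1.5 because the four (A, B) cells have different distributions of L.

```r
set.seed(2026)
n <- 200000

# Binary confounder L affects both interventions and the outcome
L <- rbinom(n, 1, 0.5)
A <- rbinom(n, 1, plogis(-2 + 4 * L))
B <- rbinom(n, 1, plogis(-2 + 2 * L))
Y <- 1 * A + 1 * B + 1.5 * A * B + 4 * L + rnorm(n)

# Standardisation: E[Y(a, b)] = sum over l of E[Y | A = a, B = b, L = l] * P(L = l)
p_L <- mean(L)
std_mean <- function(a, b) {
  m1 <- mean(Y[A == a & B == b & L == 1])
  m0 <- mean(Y[A == a & B == b & L == 0])
  m1 * p_L + m0 * (1 - p_L)
}
interaction_std <- std_mean(1, 1) - std_mean(1, 0) - std_mean(0, 1) + std_mean(0, 0)

# Crude version: ignores L, so the cells' differing L distributions leak in
crude_mean <- function(a, b) mean(Y[A == a & B == b])
interaction_crude <- crude_mean(1, 1) - crude_mean(1, 0) -
  crude_mean(0, 1) + crude_mean(0, 0)

c(standardised = interaction_std, crude = interaction_crude)
```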

Identification for effect modification

For effect modification of A by G, we need valid confounding control for the A \to Y relation within strata of G. We do not need to identify the causal effect of G.
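This R sketch illustrates the point above with assumed parameters. An unmeasured U affects both G and Y, so the causal effect of G on Y is not identified. A, by contrast, is confounded only by the measured L. Adjusting for L within each stratum of G still recovers how the effect of A varies: 1 when G = 0 and 3 when G = 1.

```r
set.seed(2026)
n <- 100000

# Unmeasured U drives both G and Y: the effect of G itself is NOT identified
U <- rnorm(n)
G <- rbinom(n, 1, plogis(1.5 * U))

# Measured L is the only confounder of A -> Y
L <- rnorm(n)
A <- rbinom(n, 1, plogis(L))

# Effect of A is 1 when G = 0 and 3 when G = 1
Y <- A * ifelse(G == 1, 3, 1) + 2 * L + 2 * U + rnorm(n)

# Within each stratum of G, adjust for L only; U remains omitted
effect_by_G <- sapply(c(0, 1), function(g) {
  coef(lm(Y ~ A + L, subset = G == g))[["A"]]
})
names(effect_by_G) <- c("G=0", "G=1")
effect_by_G  # close to 1 and 3
```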

Why flexible estimators matter

With many covariates, hand-built interaction models are fragile. Flexible estimators such as causal forests can recover complex heterogeneity patterns when identification assumptions are plausible.

These estimators address functional form. They do not remove confounding by design.
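The R sketch below illustrates both claims. It assumes a non-linear CATE \tau(x) = 2\sin(2x). In place of a causal forest, it uses a T-learner with natural splines (`splines::ns`), purely to keep the example dependency-free; the helper `t_learner` is hypothetical. With A randomised, the spline learner tracks \tau(x) far better than a linear `A * X` interaction model. When an unmeasured U drives both A and Y, the same flexible learner is biased by roughly the confounding gap.

```r
library(splines)  # distributed with base R

set.seed(2026)
n <- 20000
X <- runif(n, -2, 2)
tau <- function(x) 2 * sin(2 * x)  # illustrative non-linear CATE

# Hypothetical T-learner: natural-spline outcome model in each arm
t_learner <- function(Y, A, X, df = 8) {
  d <- data.frame(Y = Y, X = X)
  fit1 <- lm(Y ~ ns(X, df = df), data = d[A == 1, ])
  fit0 <- lm(Y ~ ns(X, df = df), data = d[A == 0, ])
  predict(fit1, newdata = d) - predict(fit0, newdata = d)
}
rmse <- function(est) sqrt(mean((est - tau(X))^2))

# Scenario 1: randomised A, so functional form is the only difficulty
A <- rbinom(n, 1, 0.5)
Y <- X^2 + A * tau(X) + rnorm(n)
lin <- lm(Y ~ A * X)
tau_linear  <- coef(lin)[["A"]] + coef(lin)[["A:X"]] * X
rmse_linear <- rmse(tau_linear)
rmse_spline <- rmse(t_learner(Y, A, X))

# Scenario 2: unmeasured U confounds A; flexibility does not help
U <- rnorm(n)
A_conf <- rbinom(n, 1, plogis(1.5 * U))
Y_conf <- X^2 + A_conf * tau(X) + 2 * U + rnorm(n)
rmse_confounded <- rmse(t_learner(Y_conf, A_conf, X))

c(linear = rmse_linear, spline = rmse_spline, spline_confounded = rmse_confounded)
```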

Demo: functional form matters

library(causalworkshop)

# simulate data with non-linear heterogeneous effects
d <- simulate_nonlinear_data(n = 2000, seed = 2026)

# compare four estimation methods
results <- compare_ate_methods(d)

# summary table and comparison plot
results$summary
results$plot_comparison

All methods recover the ATE. They differ in how well they recover heterogeneity.

Return to the opening example

Back to exercise and blood pressure.

Estimand	What it tells us
ATE	Whether the programme helps on average
CATE	Where gains are concentrated
Policy decision	Who should receive the programme

For treatment and policy decisions, we usually need both the ATE and the CATE.
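This R sketch shows how a CATE can feed a targeting rule, and every quantity in it is hypothetical. It uses simulated ages and baseline systolic blood pressures, an assumed true CATE that grows with both, and an assumed threshold of 3 mmHg for offering the programme. In practice \tau(x) would be estimated, and the threshold would come from costs and benefits rather than being fixed by hand.

```r
set.seed(2026)
n <- 10000

# Hypothetical participant profiles
age <- runif(n, 25, 75)               # years
sbp <- rnorm(n, mean = 135, sd = 15)  # baseline systolic blood pressure, mmHg

# Assumed true CATE: expected reduction (mmHg), larger for older, more hypertensive people
tau <- pmax(0, 0.1 * (age - 40) + 0.15 * (sbp - 130))

ate <- mean(tau)  # what reporting the ATE alone would show

# Assumed targeting rule: offer the programme where the expected reduction exceeds 3 mmHg
treat <- tau > 3
share_treated    <- mean(treat)
gain_if_targeted <- mean(tau[treat])  # average reduction among those offered

c(ate = ate, share_treated = share_treated, gain_if_targeted = gain_if_targeted)
```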

Effect modification by proxy

G modifies the effect of A on Y by proxy: G’s relationship to Z (a direct effect modifier) is what drives the heterogeneity. In the DAG, the open circle marks a variable that modifies the effect without being conditioned on.
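Here is an R simulation of effect modification by proxy, with assumed parameters. G only shifts the prevalence of a binary Z, and only Z changes the effect of a randomised A. Stratifying on G shows different effects (about -2 and -4.5). Within a level of Z, however, the effect is the same for both values of G. The helper `diff_means` is introduced only for this example.

```r
set.seed(2026)
n <- 100000

# G shifts the prevalence of Z; only Z modifies the effect of A
G <- rbinom(n, 1, 0.5)
Z <- rbinom(n, 1, ifelse(G == 1, 0.7, 0.2))
A <- rbinom(n, 1, 0.5)  # randomised
Y <- A * ifelse(Z == 1, -6, -1) + rnorm(n)

# Treated-minus-control difference in means within a chosen subset
diff_means <- function(keep) mean(Y[A == 1 & keep]) - mean(Y[A == 0 & keep])

# Stratifying on G alone: the effect appears to differ by G
by_G <- c(diff_means(G == 0), diff_means(G == 1))

# Within Z = 1, the effect no longer depends on G
by_G_within_Z1 <- c(diff_means(G == 0 & Z == 1), diff_means(G == 1 & Z == 1))

list(by_G = by_G, by_G_within_Z1 = by_G_within_Z1)
```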

d-separation \neq no effect modification

G \perp\!\!\!\perp Y \mid \boxed{L}, yet G can still modify the effect of A on Y. Because G shifts the distribution of L, and the treatment effect varies across levels of L, the CATE \tau(g) changes with g.
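One way to make this precise is as follows. Assume the potential outcomes are mean-independent of G given L, that is, Y(a) \perp\!\!\!\perp G \mid L, and write \tau(l) for the CATE at L = l. Then:

\tau(g) = \mathbb{E}[Y(1) - Y(0) \mid G = g] = \sum_{l} \mathbb{E}[Y(1) - Y(0) \mid L = l, G = g]\, P(L = l \mid G = g) = \sum_{l} \tau(l)\, P(L = l \mid G = g).

The second equality is the law of total expectation, and the third uses the assumption. For instance, suppose \tau(0) = -1, \tau(1) = -6, and P(L = 1 \mid G = g) is 0.2 in one group and 0.7 in the other. The formula then gives \tau(g) = -2 and -4.5, so \tau(g) varies with g even though G is d-separated from Y given L.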

Readings

Required and optional readings for each week are listed on the course readings page.

Appendix: additive interaction simplification

Starting from the joint effect of both interventions minus each single-intervention effect, all measured against \mathbb{E}[Y(0,0)]:

\big(\mathbb{E}[Y(1,1)] - \mathbb{E}[Y(0,0)]\big) - \big(\mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,0)]\big) - \big(\mathbb{E}[Y(0,1)] - \mathbb{E}[Y(0,0)]\big),

we expand the brackets. The three \mathbb{E}[Y(0,0)] terms net to +\mathbb{E}[Y(0,0)], which gives:

\mathbb{E}[Y(1,1)] - \mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,1)] + \mathbb{E}[Y(0,0)].

VanderWeele, Tyler J. 2012. “Confounding and Effect Modification: Distribution and Measure.” Epidemiologic Methods 1 (1): 55–82. https://doi.org/10.1515/2161-962X.1004.

VanderWeele, Tyler J., and James M. Robins. 2007. “Four types of effect modification: a classification based on directed acyclic graphs.” Epidemiology (Cambridge, Mass.) 18 (5): 561–68. https://doi.org/10.1097/EDE.0b013e318127181b.