Heterogeneity

Effect Modification, Interaction, and Conditional Average Treatment Effects

Published

August 12, 2025

Note

Required Reading

(Hernan and Robins 2020) Chapters 4-5 link

Optional Reading

(Tyler J. VanderWeele and Robins 2007) link
(Tyler J. VanderWeele 2009) link

Important

Key concepts

Causal estimand: The specific causal quantity of interest (e.g., the average effect in the population).
Statistical estimand: The quantity defined on the observed-data distribution that, under the identification assumptions, equals the causal estimand.
Interaction: The joint effect of two or more interventions.
Effect modification: When the effect of one intervention varies by baseline characteristics.
Heterogeneous treatment effects (HTE): The phenomenon that effects differ across individuals.
Conditional average treatment effect (CATE), \tau(x): The average effect for a subgroup with characteristics x.
Estimated CATE, \hat{\tau}(x): The empirical estimate of \tau(x) for a subgroup with characteristics x.

The fundamental problem of causal inference

Consider whether bilingualism improves cognitive ability:

• Y_i^{a=1}: Cognitive ability of child i if bilingual.
• Y_i^{a=0}: Cognitive ability of child i if monolingual.

The individual causal effect is: Y_i^{a=1} - Y_i^{a=0}.

If this difference \neq 0, bilingualism has an effect for i. However, we observe only one potential outcome per child—physics prevents observing both.

Although individual effects are unobservable, average treatment effects (ATE) can be identified under assumptions:

E(\delta) = E(Y^{a=1} - Y^{a=0}) = E(Y^{a=1}) - E(Y^{a=0}).
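As a minimal sketch of this logic, the simulation below generates both potential outcomes for every child (something only a simulation permits), then reveals just one of them according to a randomised exposure. The sample size, the true average effect of 2.0, and the noise scales are illustrative assumptions, not values from the readings.

```python
import random
import statistics

random.seed(2025)

n = 20_000
true_ate = 2.0  # assumed for illustration only

# Both potential outcomes exist in the simulation, never in real data.
y0 = [random.gauss(100, 15) for _ in range(n)]
y1 = [y + random.gauss(true_ate, 5) for y in y0]  # individual effects vary

# Randomised exposure: each child reveals exactly one potential outcome.
a = [random.random() < 0.5 for _ in range(n)]
y_obs = [y1[i] if a[i] else y0[i] for i in range(n)]

# Oracle ATE (needs both potential outcomes) vs. difference in observed means.
ate_oracle = statistics.fmean(y1[i] - y0[i] for i in range(n))
ate_hat = (statistics.fmean(y for y, t in zip(y_obs, a) if t)
           - statistics.fmean(y for y, t in zip(y_obs, a) if not t))

print(f"oracle ATE:          {ate_oracle:.2f}")
print(f"difference in means: {ate_hat:.2f}")
```

The individual differences y1[i] - y0[i] are available only to the oracle line. The difference in observed means uses one outcome per child and still lands near the average effect, because randomisation makes the two exposure groups comparable.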

Identification assumptions

Causal consistency:

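The exposure levels compared correspond to well-defined interventions represented in the data, so that a unit observed with exposure A = a has an observed outcome Y equal to its potential outcome Y^{a}.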

Positivity:

Every exposure level occurs with positive probability in all covariate strata.

Exchangeability:

Conditional on measured covariates, exposure assignment is independent of potential outcomes.
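A sketch of how the three assumptions combine, written for a discrete set of measured covariates L (the symbol L is borrowed from the appendix; the steps are the usual standardisation argument, offered here as an illustration rather than a full proof):

```latex
\begin{aligned}
E(Y^{a}) &= \sum_{l} E(Y^{a} \mid L = l)\, P(L = l)
  && \text{(law of total expectation)} \\
&= \sum_{l} E(Y^{a} \mid A = a, L = l)\, P(L = l)
  && \text{(exchangeability; positivity makes the conditioning well defined)} \\
&= \sum_{l} E(Y \mid A = a, L = l)\, P(L = l)
  && \text{(consistency)}
\end{aligned}
```

The final line contains only observed quantities, which is what it means for the causal estimand to be identified.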

We also assume:

Measurement error

Variables used to define exposures, outcomes, and confounders are measured without error or with error that does not induce bias after adjustment. Systematic or differential measurement error—especially in exposures or confounders—can bias effect estimates even when other assumptions hold.

Selection bias

The study sample is representative of the target population with respect to the causal effect of interest, or differences are accounted for through design or analysis. In longitudinal settings, attrition or loss to follow-up must not depend jointly on treatment and outcome in ways not captured by measured covariates; otherwise, the estimated effect may differ systematically from the true population effect.

Basic counterfactual logic

Causal inference asks: What would happen under alternative interventions?

For example, test scores Y under:

a: old teaching method,
a^*: new teaching method.

The ATE:

\mathbb{E}[Y(a^*) - Y(a)]

is the average change in scores if the whole population switched from a to a^*.

Confounding—common causes of A and Y—must be addressed for valid inference.
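To illustrate why confounding matters, the sketch below simulates a hypothetical binary common cause L (say, prior achievement) that raises both the chance of receiving the new method and the test score. The variable names, the true effect of 5 points, and all probabilities are assumptions made for the example.

```python
import random
import statistics

random.seed(7)

n = 40_000
true_ate = 5.0  # assumed effect of the new method on test scores

rows = []
for _ in range(n):
    l = random.random() < 0.5           # L: hypothetical binary confounder
    p_new = 0.8 if l else 0.2           # L raises the chance of the new method
    a = random.random() < p_new         # A: True = new method a*, False = old method a
    y = 60 + 10 * l + true_ate * a + random.gauss(0, 5)  # L also raises scores
    rows.append((l, a, y))

def mean_y(a_val, l_val=None):
    """Mean observed score among units with A = a_val (and L = l_val if given)."""
    return statistics.fmean(
        y for l, a, y in rows if a == a_val and (l_val is None or l == l_val)
    )

# Naive contrast ignores L and absorbs part of its effect on Y.
naive = mean_y(True) - mean_y(False)

# Standardisation: stratum-specific contrasts weighted by P(L = l).
p_l = statistics.fmean(1.0 if l else 0.0 for l, _, _ in rows)
adjusted = sum(
    (mean_y(True, l_val) - mean_y(False, l_val)) * w
    for l_val, w in ((True, p_l), (False, 1 - p_l))
)

print(f"naive contrast:    {naive:.2f}")
print(f"adjusted contrast: {adjusted:.2f}")
```

Under these assumed numbers the naive contrast overstates the effect by roughly 6 points, the confounder's 10-point effect times the 0.6 difference in its prevalence between exposure groups. The adjusted contrast should sit near the assumed effect of 5.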

⸻

Interaction vs. effect modification

Interaction: Joint effects of two or more interventions.
Effect modification: Variation in the effect of a single intervention across subgroups.

Interaction

Let A = teaching method, B = tutoring, Y = test score. We compare:

• Effect of both: \mathbb{E}[Y(1,1)] - \mathbb{E}[Y(0,0)]
• Sum of individual effects: \big(\mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,0)]\big) + \big(\mathbb{E}[Y(0,1)] - \mathbb{E}[Y(0,0)]\big).

Additive interaction exists if:

\mathbb{E}[Y(1,1)] - \mathbb{E}[Y(1,0)] - \mathbb{E}[Y(0,1)] + \mathbb{E}[Y(0,0)] \neq 0.

Positive values indicate synergy; negative values, antagonism. Identification requires controlling for all confounders of A \to Y (L) and B \to Y (Q), i.e., the combined set L \cup Q.
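A small worked example of the additive interaction contrast, using made-up mean test scores for the four combinations of teaching method (A) and tutoring (B). The numbers and the function name are hypothetical.

```python
def additive_interaction(m11, m10, m01, m00):
    """Return E[Y(1,1)] - E[Y(1,0)] - E[Y(0,1)] + E[Y(0,0)]."""
    return m11 - m10 - m01 + m00

# Hypothetical mean test scores, keyed by (A, B).
means = {(0, 0): 60.0, (1, 0): 65.0, (0, 1): 64.0, (1, 1): 75.0}

joint_effect = means[(1, 1)] - means[(0, 0)]            # 15.0
sum_of_separate = ((means[(1, 0)] - means[(0, 0)])
                   + (means[(0, 1)] - means[(0, 0)]))   # 5.0 + 4.0 = 9.0
interaction = additive_interaction(
    means[(1, 1)], means[(1, 0)], means[(0, 1)], means[(0, 0)]
)                                                       # 6.0

print(joint_effect, sum_of_separate, interaction)       # 15.0 9.0 6.0
```

The contrast equals the joint effect minus the sum of the separate effects (15 - 9 = 6), so in this invented example the two interventions are synergistic on the additive scale.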

⸻

Effect modification and \tau(x)

Effect modification examines whether the effect of a single intervention A on Y differs by baseline characteristics X. The conditional average treatment effect (CATE) is:

\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x],

the average effect for the subgroup with X = x. Comparing \tau(x) across x quantifies effect modification.

If the modifier is a categorical variable G (e.g., sex), group-specific ATEs are:

\delta_{g} = \mathbb{E}[Y(1) \mid G=g] - \mathbb{E}[Y(0) \mid G=g],

and effect modification exists if \delta_{g_1} - \delta_{g_2} \neq 0. Identification requires adjusting for A \to Y confounders within each group.
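The group-specific contrast can be sketched with simulated data. Here treatment is randomised, so no within-group adjustment is needed; with observational data each group-specific mean would first have to be adjusted for A \to Y confounders. The group labels g1 and g2 and the assumed effects of 2 and 6 are placeholders.

```python
import random
import statistics

random.seed(11)

# Assumed true effects by group; g1 and g2 are placeholder labels.
true_delta = {"g1": 2.0, "g2": 6.0}

rows = []
for _ in range(40_000):
    g = random.choice(["g1", "g2"])
    a = random.random() < 0.5      # randomised, so no confounding within groups
    y = 50 + true_delta[g] * a + random.gauss(0, 5)
    rows.append((g, a, y))

def delta(g_val):
    """Treated-minus-untreated difference in mean outcomes within group g_val."""
    treated = statistics.fmean(y for g, a, y in rows if g == g_val and a)
    control = statistics.fmean(y for g, a, y in rows if g == g_val and not a)
    return treated - control

delta_g1, delta_g2 = delta("g1"), delta("g2")
effect_modification = delta_g1 - delta_g2   # nonzero => effect modification by G

print(f"delta_g1 = {delta_g1:.2f}, delta_g2 = {delta_g2:.2f}, "
      f"difference = {effect_modification:.2f}")
```

G is not intervened upon anywhere in this code. The comparison describes how the effect of A differs between groups and says nothing about what would happen if group membership were changed.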

⸻

HTE, CATE, and estimated \hat{\tau}(x)

HTE: The fact that \tau(x) varies with x.
CATE, \tau(x): The true subgroup-average effect.
Estimated CATE, \hat{\tau}(x): The model-based prediction for a subgroup with features x.

For unit i with profile X_i = x:

Unobservable individual effect: Y_i(1) - Y_i(0).
\hat{\tau}(x): Our estimate of \tau(x), the average effect for all units with profile x.

High-dimensional X makes manual specification of interaction terms impractical. Modern ML methods such as causal forests (grf in R) (Tibshirani et al. 2024) are designed to estimate \tau(x) flexibly, returning a prediction \hat{\tau}(x) for any covariate profile x.
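The sketch below makes the distinction between \tau(x) and \hat{\tau}(x) concrete. It does not use a causal forest or the grf package. It swaps in a much simpler stand-in, sometimes called a T-learner: fit one outcome model per treatment arm (here a straight line in a single covariate) and take the difference in predictions. The data-generating process and every name in the code are assumptions for illustration.

```python
import random
import statistics

random.seed(3)

def true_tau(x):
    """Assumed CATE for the simulation: the effect grows linearly with x."""
    return 1.0 + 2.0 * x

data = []
for _ in range(20_000):
    x = random.uniform(0, 1)
    a = random.random() < 0.5                  # randomised treatment
    y = 3.0 * x + true_tau(x) * a + random.gauss(0, 1)
    data.append((x, a, y))

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# One outcome model per treatment arm.
b0_t, b1_t = fit_line([(x, y) for x, a, y in data if a])
b0_c, b1_c = fit_line([(x, y) for x, a, y in data if not a])

def tau_hat(x):
    """Estimated CATE: predicted outcome under treatment minus under control."""
    return (b0_t + b1_t * x) - (b0_c + b1_c * x)

for x in (0.1, 0.5, 0.9):
    print(f"x = {x:.1f}  true tau = {true_tau(x):.2f}  tau_hat = {tau_hat(x):.2f}")
```

With one covariate and a correctly specified linear model this works well. The case for forest-based methods is that with many covariates the functional form is unknown, and hand-specified models like this one become fragile.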

Concept	Notation	Definition	Scope	Requirements
Interaction	–	Joint effect of multiple interventions compared with the sum of their separate effects	Multiple interventions (A, B)	Adjust for all confounders of A \to Y and B \to Y (L \cup Q)
Effect modification	\tau(x) varies with x	Effect of a single intervention (A) differs by subgroup defined by X = x	Single intervention	Adjust for confounders of A \to Y within each subgroup
Estimated CATE	\hat{\tau}(x)	Model-based estimate of \tau(x) for X = x	Prediction task	Flexible estimation methods (e.g., causal forests)

Appendix: Identification of Interaction Effects

Figure 1: Diagram illustrating causal interaction. Assessing the joint effect of two interventions, A (e.g., teaching method) and B (e.g., tutoring), on outcome Y (e.g., test score). L represents confounders of the A-Y relationship, and Q represents confounders of the B-Y relationship. Red arrows indicate biasing backdoor paths requiring adjustment. Assumes A and B are decided independently here.

Figure 2 shows we need to condition on (adjust for) both L_0 and Q_0.

Figure 2: Identification of causal interaction requires adjusting for all confounders of A-Y (L) and B-Y (Q). Boxes around L and Q indicate conditioning, closing backdoor paths.
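A simulation sketch of the adjustment the figure describes. L confounds A and Y, Q confounds B and Y, and each mean E[Y(a, b)] is estimated by standardising over the joint strata of (L, Q). The coefficients, including the true additive interaction of 3, are assumed for the example.

```python
import random
import statistics
from itertools import product

random.seed(5)

rows = []
for _ in range(80_000):
    l = random.random() < 0.5                   # L0: confounder of A -> Y
    q = random.random() < 0.5                   # Q0: confounder of B -> Y
    a = random.random() < (0.7 if l else 0.3)   # A1 depends on L0
    b = random.random() < (0.7 if q else 0.3)   # B1 depends on Q0
    # Assumed outcome model with a true additive interaction of 3.
    y = 60 + 4 * a + 2 * b + 3 * (a and b) + 8 * l + 6 * q + random.gauss(0, 5)
    rows.append((l, q, a, b, y))

def standardised_mean(a_val, b_val):
    """Estimate E[Y(a, b)]: average stratum means over the distribution of (L, Q)."""
    total = 0.0
    for l_val, q_val in product((False, True), repeat=2):
        stratum = [r for r in rows if r[0] == l_val and r[1] == q_val]
        cell = [r[4] for r in stratum if r[2] == a_val and r[3] == b_val]
        total += statistics.fmean(cell) * len(stratum) / len(rows)
    return total

m = {(a, b): standardised_mean(bool(a), bool(b)) for a, b in product((0, 1), repeat=2)}
interaction_hat = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]

# For comparison: the unadjusted mean among units with A = 0 and B = 0.
naive_m00 = statistics.fmean(r[4] for r in rows if not r[2] and not r[3])

print(f"standardised E[Y(0,0)]: {m[(0, 0)]:.2f}   unadjusted: {naive_m00:.2f}")
print(f"estimated additive interaction: {interaction_hat:.2f}")
```

In this design the unadjusted mean for the untreated cell falls below the standardised one, because untreated units are disproportionately those with L = 0 and Q = 0. Once both sets of confounders are conditioned on, the four standardised means give an interaction contrast near the assumed value of 3.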

Figure 3: How shall we investigate effect modification of A on Y by G? Can you see the problem?

Thus, it is essential to understand that when we control for confounding along the A \to Y path, we do not identify the causal effects of effect-modifiers. Rather, we should treat effect-modifiers as prognostic indicators. Moreover, we’re going to need to develop methods for clarifying prognostic indicators in multi-dimensional settings where

References

Hernan, M. A., and J. M. Robins. 2020. Causal Inference: What If? Chapman & Hall/CRC Monographs on Statistics & Applied Probab. Taylor & Francis. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/.

Tibshirani, Julie, Susan Athey, Erik Sverdrup, and Stefan Wager. 2024. Grf: Generalized Random Forests. https://github.com/grf-labs/grf.

VanderWeele, Tyler J. 2009. “On the Distinction Between Interaction and Effect Modification.” Epidemiology, 863–71.

VanderWeele, Tyler J., and James M. Robins. 2007. “Four types of effect modification: a classification based on directed acyclic graphs.” Epidemiology (Cambridge, Mass.) 18 (5): 561–68. https://doi.org/10.1097/EDE.0b013e318127181b.

© 2025 Joseph Bulbulia. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-nc-sa/4.0/).