Lab 6: Conditional Average Treatment Effects
This lab explores why functional form matters for estimating heterogeneous treatment effects. You will compare parametric and non-parametric estimators, examine individual-level predictions from causal forests, and test whether treatment effects genuinely vary across individuals.
- Why OLS can miss treatment effect heterogeneity
- How to extract individual treatment effect predictions from a causal forest
- How to test for significant heterogeneity using test_calibration()
- How to identify which covariates drive effect modification
Why functional form matters
When treatment effects vary across individuals, the method we use to estimate them matters. A linear model assumes effects change at a constant rate with each covariate; a causal forest can capture non-linear and interactive patterns.
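To make "a constant rate" concrete, consider a generic linear specification with treatment-by-covariate interactions. The symbols below are illustrative notation chosen for this sketch, not taken from the lab's packages:

```latex
Y_i = \alpha + \beta W_i + \gamma^\top X_i + W_i \, \delta^\top X_i + \varepsilon_i
\quad\Longrightarrow\quad
\tau(x) = \mathbb{E}\left[ Y_i(1) - Y_i(0) \mid X_i = x \right] = \beta + \delta^\top x
```

Under this model, a one-unit increase in covariate $x_j$ shifts the effect by the same amount $\delta_j$ everywhere. Curvature, thresholds, and covariate-by-covariate interactions in $\tau(x)$ are missed unless the analyst adds them by hand.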
library(causalworkshop)
library(grf)
library(tidyverse)
The simulate_nonlinear_data() function generates data where the true treatment effect surface is deliberately non-linear, so that flexible methods outperform rigid ones:
# simulate data with non-linear treatment effects
d_nl <- simulate_nonlinear_data(n = 2000, seed = 2026)
# compare four estimation methods
result <- compare_ate_methods(d_nl)
All four methods (OLS, polynomial, GAM, causal forest) recover the overall ATE reasonably well. But their ability to predict individual effects differs dramatically:
# compare RMSE for individual-level predictions
print(result$summary)
RMSE (root mean squared error) measures how well each method predicts the true individual treatment effect $\tau(x)$. A lower RMSE means the method captures the heterogeneity pattern more accurately. Because OLS assumes a linear effect surface, it typically has the highest RMSE.
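The RMSE comparison can be reproduced in miniature with base R. This is a minimal sketch on a hypothetical toy data set (one covariate, randomised treatment, and an assumed true effect of 1 + x^2), not the data produced by simulate_nonlinear_data(); the helper names rmse() and predict_tau() are introduced here for illustration only:

```r
# Hypothetical toy example: the data-generating process below is an
# assumption for illustration, not the one used by the lab's package
set.seed(2026)
n <- 2000
x <- rnorm(n)
w <- rbinom(n, 1, 0.5)            # randomised binary treatment
tau_true <- 1 + x^2               # assumed non-linear effect surface
y <- x + w * tau_true + rnorm(n)
dat <- data.frame(y = y, w = w, x = x)

# root mean squared error between estimated and true effects
rmse <- function(est, truth) sqrt(mean((est - truth)^2))

# predicted individual effect: prediction under w = 1 minus prediction under w = 0
predict_tau <- function(fit, data) {
  predict(fit, transform(data, w = 1)) - predict(fit, transform(data, w = 0))
}

# OLS with a linear treatment-by-covariate interaction
fit_lin <- lm(y ~ w * x, data = dat)
# OLS with a quadratic interaction (matches the toy effect surface)
fit_quad <- lm(y ~ w * (x + I(x^2)), data = dat)

rmse_lin <- rmse(predict_tau(fit_lin, dat), tau_true)
rmse_quad <- rmse(predict_tau(fit_quad, dat), tau_true)
cat("RMSE, linear interaction:   ", round(rmse_lin, 3), "\n")
cat("RMSE, quadratic interaction:", round(rmse_quad, 3), "\n")
```

Both fits should give a similar average effect, yet only the quadratic fit tracks the individual-level surface. The same contrast separates ATE recovery from RMSE in the comparison above.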
Individual treatment effects from the causal forest
Now we return to the NZAVS data from Lab 5. The causal forest estimates $\hat{\tau}(x)$ for each individual: how much would their outcome change if they were treated rather than untreated?
# simulate NZAVS data (same as Lab 5)
d <- simulate_nzavs_data(n = 5000, seed = 2026)
d0 <- d |> filter(wave == 0)
d1 <- d |> filter(wave == 1)
d2 <- d |> filter(wave == 2)
# construct matrices
covariate_cols <- c(
  "age", "male", "nz_european", "education", "partner", "employed",
  "log_income", "nz_dep", "agreeableness", "conscientiousness",
  "extraversion", "neuroticism", "openness",
  "community_group", "wellbeing"
)
X <- as.matrix(d0[, covariate_cols])
Y <- d2$wellbeing
W <- d1$community_group
# fit causal forest
cf <- causal_forest(
  X, Y, W,
  num.trees = 1000,
  honesty = TRUE,
  tune.parameters = "all",
  seed = 2026
)
Extract predicted individual treatment effects:
# predicted treatment effects for each individual
tau_hat <- predict(cf)$predictions
# summary statistics
cat("Mean tau_hat: ", round(mean(tau_hat), 3), "\n")
cat("SD tau_hat: ", round(sd(tau_hat), 3), "\n")
cat("Range tau_hat: ", round(range(tau_hat), 3), "\n")
Compare with the true individual effects:
# true individual effects from the data-generating process
tau_true <- d0$tau_community_wellbeing
# how well does the forest recover individual effects?
cat("Correlation(tau_hat, tau_true):", round(cor(tau_hat, tau_true), 3), "\n")
cat("RMSE:", round(sqrt(mean((tau_hat - tau_true)^2)), 3), "\n")
Visualise the distribution of predicted effects:
# histogram of predicted treatment effects
ggplot(data.frame(tau_hat = tau_hat), aes(x = tau_hat)) +
  geom_histogram(bins = 40, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = mean(tau_hat), colour = "red", linetype = "dashed") +
  labs(
    title = "Distribution of predicted treatment effects",
    x = expression(hat(tau)(x)),
    y = "Count"
  ) +
  theme_minimal()
If treatment effects were homogeneous, this histogram would be tightly concentrated around the ATE. A wide spread indicates heterogeneity: some people benefit more from community group participation than others.
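One caveat is that the spread of the histogram mixes true heterogeneity with estimation noise. The sketch below shows one way to attach pointwise confidence intervals to the predictions, using the estimate.variance argument of predict() in grf. It runs on hypothetical toy data (all *_toy objects and the effect formula are assumptions for illustration), not the NZAVS simulation:

```r
library(grf)

# Hypothetical toy data: effect depends on the second covariate only
set.seed(2026)
n_toy <- 500
X_toy <- matrix(rnorm(n_toy * 4), n_toy, 4)
W_toy <- rbinom(n_toy, 1, 0.5)
Y_toy <- X_toy[, 1] + W_toy * pmax(X_toy[, 2], 0) + rnorm(n_toy)

cf_toy <- causal_forest(X_toy, Y_toy, W_toy, num.trees = 500, seed = 2026)

# out-of-bag predictions together with variance estimates
pred_toy <- predict(cf_toy, estimate.variance = TRUE)
se_toy <- sqrt(pred_toy$variance.estimates)

# approximate 95% pointwise intervals
ci_toy <- data.frame(
  tau_hat = pred_toy$predictions,
  lower   = pred_toy$predictions - 1.96 * se_toy,
  upper   = pred_toy$predictions + 1.96 * se_toy
)
head(ci_toy)
```

If most intervals are wide and overlap the mean prediction, much of the visible spread may be noise rather than heterogeneity, which is why a formal test is useful.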
Test for heterogeneity
The test_calibration() function tests whether the forest has detected genuine heterogeneity, or whether the variation in $\hat{\tau}(x)$ is just noise.
# test for heterogeneity
cal_test <- test_calibration(cf)
print(cal_test)
The key row is differential.forest.prediction. If its coefficient is significantly greater than zero (p < 0.05), the forest has detected meaningful variation in treatment effects beyond the overall mean. The mean.forest.prediction row tests whether the average effect is non-zero.
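As a sketch of what sits behind these two rows, the grf documentation describes the test as a best linear fit with two regressors. Written out, with $\hat{m}$ and $\hat{e}$ denoting the forest's out-of-bag estimates of the conditional mean outcome and the propensity score, and $\bar{\tau}$ the mean of the predictions (notation assumed here for illustration):

```latex
Y_i - \hat{m}(X_i)
  = \alpha \, \bar{\tau} \, \bigl( W_i - \hat{e}(X_i) \bigr)
  + \beta \, \bigl( \hat{\tau}(X_i) - \bar{\tau} \bigr) \bigl( W_i - \hat{e}(X_i) \bigr)
  + \varepsilon_i
```

Here $\alpha$ corresponds to the mean.forest.prediction row and $\beta$ to the differential.forest.prediction row. A value of $\alpha$ near 1 suggests the mean prediction is well calibrated. A value of $\beta$ near 1 suggests the same of the heterogeneity estimates.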
Variable importance
Which covariates drive the heterogeneity? The variable_importance() function measures how frequently each variable is used for splitting in the forest, giving more weight to splits near the root of each tree:
# variable importance
var_imp <- variable_importance(cf)
importance_df <- data.frame(
  variable = colnames(X),
  importance = as.numeric(var_imp)
) |>
  arrange(desc(importance))
print(importance_df)
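To see what kind of score this is, here is a minimal base-R sketch of a depth-weighted split frequency. It assumes, as we understand the grf defaults, that the share of splits on each variable at depth k is weighted by k^-2. The split counts are made up for illustration and do not come from any fitted forest:

```r
# Hypothetical split counts: rows are tree depths, columns are covariates
split_counts <- matrix(
  c(60, 30,  10,
    50, 80,  70,
    40, 90, 170),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("depth", 1:3), c("extraversion", "partner", "age"))
)

decay_exponent <- 2  # assumed weighting exponent

# share of splits on each variable within each depth
split_freq <- split_counts / pmax(1, rowSums(split_counts))

# shallower depths receive larger weights
depth_weight <- seq_len(nrow(split_freq))^(-decay_exponent)

# weighted average of the per-depth shares
importance_toy <- drop(t(split_freq) %*% depth_weight) / sum(depth_weight)
print(round(importance_toy, 3))
```

Because the score reflects where the forest chose to split, it indicates which variables the forest used, not how large their effect-modifying role is. Correlated covariates can also share or trade importance.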
The true treatment effect formula for community group participation on wellbeing (see the simulate_nzavs_data documentation) involves extraversion, partner status, and neuroticism, so these should appear as important variables. Does the forest recover this pattern?
Subgroup analysis
We can examine whether predicted effects differ across subgroups defined by the important covariates:
# compare effects by extraversion
high_extra <- tau_hat[d0$extraversion > 0]
low_extra <- tau_hat[d0$extraversion <= 0]
cat("Mean tau_hat (high extraversion):", round(mean(high_extra), 3), "\n")
cat("Mean tau_hat (low extraversion): ", round(mean(low_extra), 3), "\n")
cat("Difference: ", round(mean(high_extra) - mean(low_extra), 3), "\n")
# compare effects by partner status
partnered <- tau_hat[d0$partner == 1]
unpartnered <- tau_hat[d0$partner == 0]
cat("\nMean tau_hat (partnered): ", round(mean(partnered), 3), "\n")
cat("Mean tau_hat (unpartnered):", round(mean(unpartnered), 3), "\n")
cat("Difference: ", round(mean(partnered) - mean(unpartnered), 3), "\n")
The tau formula adds positive terms for extraversion and for being partnered, so highly extraverted and partnered individuals should show larger predicted treatment effects. Check whether this matches what you observe.
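Differences in mean tau_hat come without standard errors. One option is to estimate subgroup effects with average_treatment_effect() in grf, which accepts a subset argument and returns an estimate with a standard error. The sketch below uses hypothetical toy data (the *_toy objects and the effect formula are assumptions for illustration), not the NZAVS simulation:

```r
library(grf)

# Hypothetical toy data: larger effect for above-average extraversion
set.seed(2026)
n_toy <- 1000
X_toy <- cbind(
  extraversion = rnorm(n_toy),
  partner      = rbinom(n_toy, 1, 0.5)
)
W_toy <- rbinom(n_toy, 1, 0.5)
tau_toy <- 0.2 + 0.3 * (X_toy[, "extraversion"] > 0)  # assumed effect
Y_toy <- 0.5 * X_toy[, "extraversion"] + W_toy * tau_toy + rnorm(n_toy)

cf_toy <- causal_forest(X_toy, Y_toy, W_toy, num.trees = 500, seed = 2026)

# doubly robust subgroup estimates with standard errors
high_toy <- X_toy[, "extraversion"] > 0
ate_high <- average_treatment_effect(cf_toy, subset = high_toy)
ate_low  <- average_treatment_effect(cf_toy, subset = !high_toy)
print(rbind(high_extraversion = ate_high, low_extraversion = ate_low))
```

Comparing the two estimates against their standard errors gives a rough sense of whether a subgroup difference is distinguishable from noise.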
Predicted vs true effects scatter plot
# scatter plot of predicted vs true individual effects
ggplot(data.frame(true = tau_true, predicted = tau_hat),
       aes(x = true, y = predicted)) +
  geom_point(alpha = 0.1, colour = "steelblue") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", colour = "red") +
  labs(
    title = "Predicted vs true individual treatment effects",
    x = expression(tau(x)),
    y = expression(hat(tau)(x))
  ) +
  theme_minimal()
Causal forests can detect meaningful heterogeneity in treatment effects without requiring the analyst to specify the functional form in advance. The test_calibration() function provides a formal test for heterogeneity, and variable_importance() identifies which covariates drive it. In Lab 8, we will use these individual predictions to evaluate targeting strategies.
Exercises
- Different seed. Regenerate the data with seed = 42 instead of seed = 2026 and re-run compare_ate_methods(). Do the relative RMSE rankings change? Why or why not?
- Different exposure-outcome pair. Fit a causal forest for volunteer_work on self_esteem. Run test_calibration() and variable_importance(). Which covariates drive heterogeneity? Does this match the ground-truth tau formula? (Hint: check the simulate_nzavs_data documentation.)
- Why does OLS miss heterogeneity? In one paragraph, explain why a linear model that includes only main effects cannot capture the non-linear term in the treatment effect formula. What would you need to add to the linear model to capture this non-linearity?