Lab 5: Average Treatment Effects
This lab introduces causal forests as a tool for estimating average treatment effects (ATEs). You will compare naive, regression-adjusted, g-computation, and causal forest estimates against known ground-truth effects.
- Why naive estimates of causal effects are biased when confounding is present
- How covariate adjustment and g-computation reduce this bias
- How to fit a causal forest and extract the ATE
- How to validate estimates against ground truth
This lab uses the causalworkshop, grf, and tidyverse packages. Install them before proceeding if you haven't already.
Setup and data
Install and load the required packages:
# install causalworkshop if needed
# install.packages("remotes")
# remotes::install_github("go-bayes/causalworkshop")
library(causalworkshop)
library(grf)
library(tidyverse)
Generate a simulated three-wave panel dataset. The data are modelled on the New Zealand Attitudes and Values Study (NZAVS), with baseline confounders (wave 0), binary exposures (wave 1), and continuous outcomes (wave 2). Crucially, the data contain known ground-truth treatment effects in the tau_* columns.
# simulate data
d <- simulate_nzavs_data(n = 5000, seed = 2026)
# check structure
dim(d)
names(d)
The data are in long format (three rows per individual). We need to separate the waves:
# separate waves
d0 <- d |> filter(wave == 0) # baseline confounders
d1 <- d |> filter(wave == 1) # exposure assignment
d2 <- d |> filter(wave == 2) # outcomes
# verify alignment
stopifnot(all(d0$id == d1$id), all(d0$id == d2$id))
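The check above compares ids position by position, so it only passes when each wave's rows are in the same id order. A minimal sketch on a made-up long-format data frame (the `toy` object and its values are hypothetical, not output of simulate_nzavs_data()) shows how sorting by id before splitting makes the alignment independent of the original row order:

```r
# toy long-format panel: 3 ids x 3 waves (hypothetical values)
toy <- data.frame(
  id   = rep(1:3, each = 3),
  wave = rep(0:2, times = 3),
  x    = c(10, 11, 12, 20, 21, 22, 30, 31, 32)
)

# shuffle the rows to mimic data that arrive in arbitrary order
set.seed(1)
toy <- toy[sample(nrow(toy)), ]

# sort by id (then wave) so every wave lists individuals in the same order
toy <- toy[order(toy$id, toy$wave), ]
t0 <- toy[toy$wave == 0, ]
t1 <- toy[toy$wave == 1, ]
t2 <- toy[toy$wave == 2, ]

# same positional check as in the lab
stopifnot(all(t0$id == t1$id), all(t0$id == t2$id))
```

If the simulated data are already ordered by id within wave, the sort changes nothing; it is only a safeguard.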
We will estimate the effect of community group participation (community_group) at wave 1 on wellbeing (wellbeing) at wave 2.
# ground truth: the true ATE
true_ate <- mean(d0$tau_community_wellbeing)
cat("True ATE:", round(true_ate, 3), "\n")
Naive ATE (biased)
A naive estimate ignores confounders. We simply regress the outcome on the exposure:
fit_naive <- lm(d2$wellbeing ~ d1$community_group)
naive_ate <- coef(fit_naive)[2]
cat("Naive ATE:", round(naive_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
cat("Bias: ", round(naive_ate - true_ate, 3), "\n")
People who join community groups differ systematically from those who don't. They tend to be more extraverted, more agreeable, and less neurotic. These same traits also affect wellbeing directly. The naive estimate therefore mixes the causal effect with confounding bias.
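To see how this kind of confounding inflates a naive estimate, here is a minimal, self-contained simulation. It is a sketch under assumed values: the single confounder, its coefficients, and the true effect of 0.2 are all invented for illustration and are not taken from the workshop data.

```r
set.seed(2026)
n <- 10000

# hypothetical confounder: raises both the chance of joining and wellbeing
extraversion <- rnorm(n)
joined <- rbinom(n, 1, plogis(0.8 * extraversion))

# the true causal effect of joining is fixed at 0.2 in this toy model
wellbeing <- 0.2 * joined + 0.5 * extraversion + rnorm(n)

# joiners are more extraverted on average, so the groups are not comparable
imbalance <- mean(extraversion[joined == 1]) - mean(extraversion[joined == 0])

# naive contrast absorbs the confounder's effect; adjusting for it removes the bias
naive    <- coef(lm(wellbeing ~ joined))[["joined"]]
adjusted <- coef(lm(wellbeing ~ joined + extraversion))[["joined"]]

cat("Imbalance in extraversion:", round(imbalance, 3), "\n")
cat("Naive:", round(naive, 3), " Adjusted:", round(adjusted, 3), " Truth: 0.2\n")
```

In this toy model the naive bias is roughly the confounder's effect on the outcome (0.5) multiplied by the imbalance in the confounder between groups.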
Adjusted ATE (regression)
We can reduce bias by conditioning on baseline confounders:
# construct analysis dataframe
df <- data.frame(
  y = d2$wellbeing,
  a = d1$community_group,
  age = d0$age,
  male = d0$male,
  nz_european = d0$nz_european,
  education = d0$education,
  partner = d0$partner,
  employed = d0$employed,
  log_income = d0$log_income,
  nz_dep = d0$nz_dep,
  agreeableness = d0$agreeableness,
  conscientiousness = d0$conscientiousness,
  extraversion = d0$extraversion,
  neuroticism = d0$neuroticism,
  openness = d0$openness,
  community_t0 = d0$community_group,
  wellbeing_t0 = d0$wellbeing
)
# regression with covariates
fit_adj <- lm(y ~ a + age + male + nz_european + education + partner +
                employed + log_income + nz_dep + agreeableness +
                conscientiousness + extraversion + neuroticism + openness +
                community_t0 + wellbeing_t0, data = df)
adj_ate <- coef(fit_adj)["a"]
cat("Adjusted ATE:", round(adj_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
cat("Bias: ", round(adj_ate - true_ate, 3), "\n")
The adjusted estimate should be much closer to the true ATE. Conditioning on confounders breaks the spurious association between exposure and outcome (recall the fork structure from the ggdag tutorial).
G-computation by hand
G-computation estimates the ATE by predicting outcomes under counterfactual treatment assignments. We create two copies of the data, one where everyone is treated and one where everyone is untreated, predict outcomes for each, and take the average difference.
# create counterfactual datasets
df_treated <- df
df_treated$a <- 1
df_control <- df
df_control$a <- 0
# predict outcomes under each scenario
y_hat_treated <- predict(fit_adj, newdata = df_treated)
y_hat_control <- predict(fit_adj, newdata = df_control)
# ATE via g-computation
gcomp_ate <- mean(y_hat_treated - y_hat_control)
cat("G-computation ATE:", round(gcomp_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
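In a linear model with no treatment-by-covariate interactions, the g-computation ATE equals the regression coefficient on the treatment variable, because every individual's predicted contrast is that same coefficient. The two diverge when such interactions are present: the coefficient then gives the effect only where the interacting covariates equal zero, whereas g-computation averages the predicted contrasts over the empirical distribution of covariates.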
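The divergence can be seen in a small simulation with a treatment-by-covariate interaction. This is a sketch on invented data: the covariate `x`, the coefficients, and the resulting true ATE of 0.7 are assumptions of the toy model, not features of the workshop data.

```r
set.seed(2026)
n <- 5000

# hypothetical covariate with mean 1, so "x = 0" is not the average person
x <- rnorm(n, mean = 1)
a <- rbinom(n, 1, plogis(0.5 * x))

# the effect of a varies with x: 0.3 + 0.4 * x, so the true ATE is 0.3 + 0.4 * E[x] = 0.7
y <- 0.3 * a + 0.5 * x + 0.4 * a * x + rnorm(n)
toy <- data.frame(y, a, x)

# outcome model with a treatment-by-covariate interaction
fit_int <- lm(y ~ a * x, data = toy)

# coefficient on a: the effect when x = 0, not the ATE
coef_a <- coef(fit_int)[["a"]]

# g-computation: average the predicted contrast over the observed x values
gcomp <- mean(
  predict(fit_int, newdata = transform(toy, a = 1)) -
    predict(fit_int, newdata = transform(toy, a = 0))
)

cat("Coefficient on a:", round(coef_a, 3), "\n")
cat("G-computation ATE:", round(gcomp, 3), " (true ATE: 0.7)\n")
```

Here the g-computation estimate equals the coefficient on `a` plus the interaction coefficient times the sample mean of `x`, which is why it tracks the ATE while the coefficient alone does not.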
ATE via causal forest
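A causal forest non-parametrically estimates conditional average treatment effects (CATEs), that is, how the treatment effect varies with covariates. By default, grf's average_treatment_effect() combines these estimates with the forest's outcome and propensity estimates in a doubly robust (augmented inverse-propensity weighted) estimator, which yields an ATE with a standard error that accounts for estimation uncertainty.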
# construct matrices for the causal forest
covariate_cols <- c(
  "age", "male", "nz_european", "education", "partner", "employed",
  "log_income", "nz_dep", "agreeableness", "conscientiousness",
  "extraversion", "neuroticism", "openness",
  "community_t0", "wellbeing_t0"
)
X <- as.matrix(df[, covariate_cols])
Y <- df$y
W <- df$a
# fit causal forest
cf <- causal_forest(
  X, Y, W,
  num.trees = 1000,
  honesty = TRUE,
  tune.parameters = "all",
  seed = 2026
)
# extract ATE with standard error
ate_cf <- average_treatment_effect(cf)
cat("Causal forest ATE:", round(ate_cf["estimate"], 3),
"(SE:", round(ate_cf["std.err"], 3), ")\n")
cat("True ATE: ", round(true_ate, 3), "\n")
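With honesty = TRUE (the default in grf), each tree splits its subsample of the training data into two parts, halves by default: one part determines the tree structure, the other estimates the treatment effects within each leaf. Because no observation is used both to choose splits and to estimate effects, honesty reduces overfitting bias and supports the validity of the forest's confidence intervals.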
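The estimate and standard error returned by average_treatment_effect() come, by default, from a doubly robust (AIPW) estimator. The sketch below illustrates that idea by hand on invented data. It is not grf's implementation: it swaps in parametric nuisance models (glm and lm) fitted in-sample where grf uses out-of-bag forest predictions, and the true ATE of 0.25 is an assumption of the toy model.

```r
set.seed(2026)
n <- 5000

# hypothetical data: one confounder x, binary treatment w, true ATE fixed at 0.25
x <- rnorm(n)
w <- rbinom(n, 1, plogis(0.6 * x))
y <- 0.25 * w + 0.5 * x + rnorm(n)
toy <- data.frame(y, w, x)

# nuisance models: propensity score and outcome regression
e_hat <- fitted(glm(w ~ x, family = binomial, data = toy))
fit_y <- lm(y ~ w + x, data = toy)
mu1 <- predict(fit_y, newdata = transform(toy, w = 1))
mu0 <- predict(fit_y, newdata = transform(toy, w = 0))

# AIPW score per person: outcome-model contrast plus
# inverse-propensity-weighted residual corrections
score <- mu1 - mu0 +
  toy$w * (toy$y - mu1) / e_hat -
  (1 - toy$w) * (toy$y - mu0) / (1 - e_hat)

# the ATE is the mean score; its standard error comes from the score's spread
aipw_ate <- mean(score)
aipw_se  <- sd(score) / sqrt(n)
ci <- aipw_ate + c(-1, 1) * qnorm(0.975) * aipw_se

cat("AIPW ATE:", round(aipw_ate, 3), "(SE:", round(aipw_se, 3), ")\n")
cat("95% CI:", round(ci[1], 3), "to", round(ci[2], 3), "\n")
```

The same recipe, estimate plus or minus qnorm(0.975) times the standard error, gives an approximate 95% confidence interval from the two numbers reported for the causal forest.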
Compare all estimates
# collect the point estimates; unname() stops the data frame inheriting odd row names
estimates <- unname(c(naive_ate, adj_ate, gcomp_ate, ate_cf["estimate"]))
results <- data.frame(
  method = c("Naive", "Adjusted regression", "G-computation", "Causal forest"),
  estimate = round(estimates, 3),
  bias = round(estimates - true_ate, 3)
)
print(results)
cat("\nTrue ATE:", round(true_ate, 3), "\n")
All three adjusted methods (regression, g-computation, causal forest) should recover the true ATE reasonably well. The naive estimate is substantially biased because it does not account for confounding. The causal forest additionally provides valid standard errors and, as we will see in Lab 6, individual-level treatment effect predictions.
Exercises
- Different exposure-outcome pair. Repeat the analysis using religious_service as the exposure and belonging as the outcome. How does the bias of the naive estimate compare? Check the true ATE using mean(d0$tau_religious_belonging).
- Omit baseline adjustment. Re-fit the causal forest without including community_t0 and wellbeing_t0 in the covariate matrix. How much does the ATE estimate change? Why might baseline values of the exposure and outcome be important confounders?
- Sample size comparison. Generate data with n = 1000 and n = 10000. How do the causal forest ATE estimates and standard errors change? What does this tell you about the precision of causal forest estimates?