Lab 5: Average Treatment Effects
R script
Download the R script for this lab (right-click → Save As)
This lab introduces several ways to estimate average treatment effects (ATEs). You will compare naive, regression-adjusted, g-computation, and causal forest estimates against known ground-truth effects, then finish with one short illustration of inverse probability of treatment weighting (IPTW).
What you will learn
- Why naive estimates of causal effects are biased when confounding is present
- How covariate adjustment and g-computation reduce this bias
- How confounding control can also come from an exposure model through IPTW
- How to fit a causal forest and extract the ATE
- How to validate estimates against ground truth
New packages
This lab uses the causalworkshop and grf packages. Install them before proceeding if you haven't already.
Setup and data
Install and load the required packages:
# install packages if needed
# install.packages("grf")
# if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# remotes::install_github("go-bayes/causalworkshop@v0.2.1")
library(causalworkshop)
library(grf)
library(tidyverse)
Generate a simulated three-wave panel dataset. The data are modelled on
the New Zealand Attitudes and Values Study (NZAVS), with baseline
confounders (wave 0), binary exposures (wave 1), and continuous outcomes
(wave 2). Crucially, the data contain known ground-truth treatment
effects in the tau_* columns.
# simulate data
d <- simulate_nzavs_data(n = 5000, seed = 2026)
# check structure
dim(d)
names(d)
The data are in long format (three rows per individual). We need to separate the waves:
# separate waves
d0 <- d |> filter(wave == 0) # baseline confounders
d1 <- d |> filter(wave == 1) # exposure assignment
d2 <- d |> filter(wave == 2) # outcomes
# verify alignment
stopifnot(all(d0$id == d1$id), all(d0$id == d2$id))
We will estimate the effect of community group participation
(community_group) at wave 1 on wellbeing (wellbeing) at wave 2.
# ground truth: the true ATE
true_ate <- mean(d0$tau_community_wellbeing)
cat("True ATE:", round(true_ate, 3), "\n")
Naive ATE (biased)
A naive estimate ignores confounders. We simply regress the outcome on the exposure:
fit_naive <- lm(d2$wellbeing ~ d1$community_group)
naive_ate <- coef(fit_naive)[2]
cat("Naive ATE:", round(naive_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
cat("Bias: ", round(naive_ate - true_ate, 3), "\n")
Why is the naive estimate biased?
People who join community groups differ systematically from those who don't. They tend to be more extraverted, more agreeable, and less neurotic. These same traits also affect wellbeing directly. The naive estimate captures both the causal effect and the confounding.
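This fork structure is easy to reproduce. Here is a hypothetical toy example (not the lab's data): a single confounder L raises both the probability of treatment and the outcome, so the naive coefficient on the treatment overstates the true effect.

```r
# toy confounding demo: L affects both treatment A and outcome Y
set.seed(1)
n <- 10000
L <- rnorm(n)                      # confounder (think: extraversion)
A <- rbinom(n, 1, plogis(L))       # treatment is more likely when L is high
Y <- 0.5 * A + 1.0 * L + rnorm(n)  # true causal effect of A is 0.5
coef(lm(Y ~ A))["A"]               # naive estimate: well above 0.5
coef(lm(Y ~ A + L))["A"]           # adjusted estimate: close to 0.5
```

Conditioning on L recovers the true effect here because L is the only backdoor path; in the lab data, the full set of baseline confounders plays that role.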
Adjusted ATE (regression)
We can reduce bias by conditioning on baseline confounders:
# construct analysis dataframe
df <- data.frame(
y = d2$wellbeing,
a = d1$community_group,
age = d0$age,
male = d0$male,
nz_european = d0$nz_european,
education = d0$education,
partner = d0$partner,
employed = d0$employed,
log_income = d0$log_income,
nz_dep = d0$nz_dep,
agreeableness = d0$agreeableness,
conscientiousness = d0$conscientiousness,
extraversion = d0$extraversion,
neuroticism = d0$neuroticism,
openness = d0$openness,
community_t0 = d0$community_group,
wellbeing_t0 = d0$wellbeing
)
# regression with covariates
fit_adj <- lm(y ~ a + age + male + nz_european + education + partner +
employed + log_income + nz_dep + agreeableness +
conscientiousness + extraversion + neuroticism + openness +
community_t0 + wellbeing_t0, data = df)
adj_ate <- coef(fit_adj)["a"]
cat("Adjusted ATE:", round(adj_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
cat("Bias: ", round(adj_ate - true_ate, 3), "\n")
What changed?
The adjusted estimate should be much closer to the true ATE. Conditioning on confounders breaks the spurious association between exposure and outcome (recall the fork structure from the ggdag tutorial).
G-computation by hand
G-computation estimates the ATE by predicting outcomes under counterfactual treatment assignments. We create two copies of the data, one where everyone is treated and one where everyone is untreated, predict outcomes for each, and take the average difference.
# create counterfactual datasets
df_treated <- df
df_treated$a <- 1
df_control <- df
df_control$a <- 0
# predict outcomes under each scenario
y_hat_treated <- predict(fit_adj, newdata = df_treated)
y_hat_control <- predict(fit_adj, newdata = df_control)
# ATE via g-computation
gcomp_ate <- mean(y_hat_treated - y_hat_control)
cat("G-computation ATE:", round(gcomp_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
G-computation vs regression coefficient
When the treatment is binary and the model has no interactions, the g-computation ATE equals the regression coefficient on the treatment variable. They diverge when interactions are present, because g-computation averages over the empirical distribution of covariates.
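The divergence under interactions can be checked directly. This is a hedged sketch on toy data (not the lab's df): with an a-by-x interaction, the coefficient on a is the effect at x = 0, while g-computation averages the effect over the observed distribution of x.

```r
# g-computation vs the raw coefficient when an interaction is present
set.seed(2)
n <- 5000
x <- rnorm(n, mean = 2)                        # covariate centred away from zero
a <- rbinom(n, 1, 0.5)
y <- 1 + 0.5 * a + 0.3 * x + 0.4 * a * x + rnorm(n)
fit <- lm(y ~ a * x)
coef(fit)["a"]                                 # effect at x = 0 (about 0.5)
dat1 <- transform(data.frame(a, x), a = 1)
dat0 <- transform(data.frame(a, x), a = 0)
gcomp <- mean(predict(fit, dat1) - predict(fit, dat0))
gcomp                                          # about 0.5 + 0.4 * mean(x)
```

With no interaction term, the per-person predicted contrast is constant and the two numbers coincide exactly.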
ATE via causal forest
A causal forest estimates individual-level treatment effects $\widehat{\tau}(x_i)$ non-parametrically. The ATE is the average of these individual effects, with a valid standard error that accounts for the estimation uncertainty.
# construct matrices for the causal forest
covariate_cols <- c(
"age", "male", "nz_european", "education", "partner", "employed",
"log_income", "nz_dep", "agreeableness", "conscientiousness",
"extraversion", "neuroticism", "openness",
"community_t0", "wellbeing_t0"
)
X <- as.matrix(df[, covariate_cols])
Y <- df$y
W <- df$a
# fit causal forest
cf <- causal_forest(
X, Y, W,
num.trees = 1000,
honesty = TRUE,
tune.parameters = "all",
seed = 2026
)
# extract ATE with standard error
ate_cf <- average_treatment_effect(cf)
cat("Causal forest ATE:", round(ate_cf["estimate"], 3),
"(SE:", round(ate_cf["std.err"], 3), ")\n")
cat("True ATE: ", round(true_ate, 3), "\n")
What is honesty?
Setting honesty = TRUE splits the training data in half: one half builds the tree structure, the other estimates the treatment effects within each leaf. This prevents overfitting and ensures valid confidence intervals.
Compare all estimates
results <- data.frame(
method = c("Naive", "Adjusted regression", "G-computation", "Causal forest"),
estimate = c(naive_ate, adj_ate, gcomp_ate, ate_cf["estimate"]),
bias = c(naive_ate - true_ate, adj_ate - true_ate,
gcomp_ate - true_ate, ate_cf["estimate"] - true_ate)
)
results$estimate <- round(results$estimate, 3)
results$bias <- round(results$bias, 3)
print(results)
cat("\nTrue ATE:", round(true_ate, 3), "\n")
Key takeaway
All three adjusted methods (regression, g-computation, causal forest) should recover the true ATE reasonably well. The naive estimate is substantially biased because it does not account for confounding. The causal forest additionally provides valid standard errors and, as we will see in Lab 6, individual-level treatment effect predictions.
Optional extension: the same ATE from an exposure model
So far we have controlled confounding through an outcome model.
G-computation works by modelling $Y \mid A, L$ and then using
predict() to compare the treated and untreated worlds.
IPTW takes the other route. It models treatment assignment, $A \mid L$, then gives more weight to people who received an unexpectedly rare treatment for their covariate pattern. This creates a pseudo-population in which treatment is less confounded by $L$.
# model the probability of treatment
ps_model <- glm(
a ~ age + male + nz_european + education + partner +
employed + log_income + nz_dep + agreeableness +
conscientiousness + extraversion + neuroticism + openness +
community_t0 + wellbeing_t0,
data = df,
family = binomial()
)
ps_hat <- predict(ps_model, type = "response")
# stabilised IPTW weights
p_treated <- mean(df$a)
iptw <- ifelse(
df$a == 1,
p_treated / ps_hat,
(1 - p_treated) / (1 - ps_hat)
)
# quick weight check
tibble(
statistic = c("min", "median", "max"),
value = c(min(iptw), median(iptw), max(iptw))
)
# weighted ATE model
fit_iptw <- lm(y ~ a, data = df, weights = iptw)
iptw_ate <- coef(fit_iptw)[["a"]]
tibble(
method = c("G-computation", "IPTW"),
estimate = c(gcomp_ate, iptw_ate),
bias = c(gcomp_ate - true_ate, iptw_ate - true_ate)
)
What to notice
IPTW targets the same ATE as g-computation, but it reaches it through an exposure model rather than an outcome model.
This is why IPTW is useful to see now. Later, doubly robust estimators combine both ideas: an outcome model and an exposure model.
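As a preview, here is a hedged sketch of one doubly robust estimator, AIPW, on toy data (a hypothetical example, not the lab's objects): it takes the g-computation contrast and corrects it with inverse-probability-weighted residuals, and remains consistent if either the outcome model or the exposure model is correctly specified.

```r
# AIPW on toy data: combine an outcome model and an exposure model
set.seed(4)
n <- 5000
L <- rnorm(n)
a <- rbinom(n, 1, plogis(L))
y <- 0.5 * a + L + rnorm(n)                                 # true ATE is 0.5
dat <- data.frame(y, a, L)
out <- lm(y ~ a + L, data = dat)                            # outcome model
ps  <- fitted(glm(a ~ L, data = dat, family = binomial()))  # exposure model
mu1 <- predict(out, transform(dat, a = 1))
mu0 <- predict(out, transform(dat, a = 0))
aipw_ate <- mean(
  mu1 - mu0 +
    a * (y - mu1) / ps -
    (1 - a) * (y - mu0) / (1 - ps)
)
aipw_ate                                                    # close to the true 0.5
```
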
Exercises
Lab diary
Complete at least two of the following exercises for your lab diary.
- Different exposure-outcome pair. Repeat the analysis using religious_service as the exposure and belonging as the outcome. How does the bias of the naive estimate compare? Check the true ATE using mean(d0$tau_religious_belonging).
- Omit baseline adjustment. Re-fit the causal forest without including community_t0 and wellbeing_t0 in the covariate matrix. How much does the ATE estimate change? Why might baseline values of the exposure and outcome be important confounders?
- Sample size comparison. Generate data with n = 1000 and n = 10000. How do the causal forest ATE estimates and standard errors change? What does this tell you about the precision of causal forest estimates?