Lab 5: Average Treatment Effects
R script
Download the R script for this lab (right-click → Save As)
This lab introduces several ways to estimate average treatment effects (ATEs). You will compare naive, regression-adjusted, g-computation, and causal forest estimates against known ground-truth effects, then finish with one short illustration of inverse probability of treatment weighting (IPTW).
What you will learn
- Why naive estimates of causal effects are biased when confounding is present
- How covariate adjustment and g-computation reduce this bias
- How confounding control can also come from an exposure model through IPTW
- How to fit a causal forest and extract the ATE
- How to validate estimates against ground truth
New packages
This lab uses the causalworkshop and grf packages. Install them before proceeding if you haven't already.
Setup and data
Install and load the required packages:
# install packages if needed
# install.packages("grf")
# if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# remotes::install_github("go-bayes/causalworkshop@v0.2.1")
library(causalworkshop)
library(grf)
library(tidyverse)
Generate a simulated three-wave panel dataset. The data are modelled on
the New Zealand Attitudes and Values Study (NZAVS), with baseline
confounders (wave 0), binary exposures (wave 1), and continuous outcomes
(wave 2). Crucially, the data contain known ground-truth treatment
effects in the tau_* columns.
# simulate data
d <- simulate_nzavs_data(n = 5000, seed = 2026)
# check structure
dim(d)
names(d)
The data are in long format (three rows per individual). We need to separate the waves:
# separate waves
d0 <- d |> filter(wave == 0) # baseline confounders
d1 <- d |> filter(wave == 1) # exposure assignment
d2 <- d |> filter(wave == 2) # outcomes
# verify alignment
stopifnot(all(d0$id == d1$id), all(d0$id == d2$id))
We will estimate the effect of community group participation
(community_group) at wave 1 on wellbeing (wellbeing) at wave 2.
# ground truth: the true ATE
true_ate <- mean(d0$tau_community_wellbeing)
cat("True ATE:", round(true_ate, 3), "\n")
Naive ATE (biased)
A naive estimate ignores confounders. We simply regress the outcome on the exposure:
fit_naive <- lm(d2$wellbeing ~ d1$community_group)
naive_ate <- coef(fit_naive)[2]
cat("Naive ATE:", round(naive_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
cat("Bias: ", round(naive_ate - true_ate, 3), "\n")
Why is the naive estimate biased?
People who join community groups differ systematically from those who don't. They tend to be more extraverted, more agreeable, and less neurotic. These same traits also affect wellbeing directly. The naive estimate captures both the causal effect and the confounding.
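This fork structure is easy to reproduce. Here is a hypothetical toy example (not the lab's data): a single confounder L raises both the probability of treatment and the outcome, so the naive coefficient on the treatment overstates the true effect.

```r
# toy confounding demo: L affects both treatment A and outcome Y
set.seed(1)
n <- 10000
L <- rnorm(n)                      # confounder (think: extraversion)
A <- rbinom(n, 1, plogis(L))       # treatment is more likely when L is high
Y <- 0.5 * A + 1.0 * L + rnorm(n)  # true causal effect of A is 0.5
coef(lm(Y ~ A))["A"]               # naive estimate: well above 0.5
coef(lm(Y ~ A + L))["A"]           # adjusted estimate: close to 0.5
```

Conditioning on L recovers the true effect here because L is the only backdoor path; in the lab data, the full set of baseline confounders plays that role.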
Adjusted ATE (regression)
We can reduce bias by conditioning on baseline confounders:
# construct analysis dataframe
df <- data.frame(
y = d2$wellbeing,
a = d1$community_group,
age = d0$age,
male = d0$male,
nz_european = d0$nz_european,
education = d0$education,
partner = d0$partner,
employed = d0$employed,
log_income = d0$log_income,
nz_dep = d0$nz_dep,
agreeableness = d0$agreeableness,
conscientiousness = d0$conscientiousness,
extraversion = d0$extraversion,
neuroticism = d0$neuroticism,
openness = d0$openness,
community_t0 = d0$community_group,
wellbeing_t0 = d0$wellbeing
)
# regression with covariates
fit_adj <- lm(y ~ a + age + male + nz_european + education + partner +
employed + log_income + nz_dep + agreeableness +
conscientiousness + extraversion + neuroticism + openness +
community_t0 + wellbeing_t0, data = df)
adj_ate <- coef(fit_adj)["a"]
cat("Adjusted ATE:", round(adj_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
cat("Bias: ", round(adj_ate - true_ate, 3), "\n")
What changed?
The adjusted estimate should be much closer to the true ATE. Conditioning on confounders breaks the spurious association between exposure and outcome (recall the fork structure from the ggdag tutorial).
G-computation by hand
G-computation estimates the ATE by predicting outcomes under counterfactual treatment assignments. We create two copies of the data, one where everyone is treated and one where everyone is untreated, predict outcomes for each, and take the average difference.
# create counterfactual datasets
df_treated <- df
df_treated$a <- 1
df_control <- df
df_control$a <- 0
# predict outcomes under each scenario
y_hat_treated <- predict(fit_adj, newdata = df_treated)
y_hat_control <- predict(fit_adj, newdata = df_control)
# ATE via g-computation
gcomp_ate <- mean(y_hat_treated - y_hat_control)
cat("G-computation ATE:", round(gcomp_ate, 3), "\n")
cat("True ATE: ", round(true_ate, 3), "\n")
G-computation vs regression coefficient
When the treatment is binary and the model has no interactions, the g-computation ATE equals the regression coefficient on the treatment variable. They diverge when interactions are present, because g-computation averages over the empirical distribution of covariates.
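The divergence under interactions can be checked directly. This is a hedged sketch on toy data (not the lab's df): with an a-by-x interaction, the coefficient on a is the effect at x = 0, while g-computation averages the effect over the observed distribution of x.

```r
# g-computation vs the raw coefficient when an interaction is present
set.seed(2)
n <- 5000
x <- rnorm(n, mean = 2)                        # covariate centred away from zero
a <- rbinom(n, 1, 0.5)
y <- 1 + 0.5 * a + 0.3 * x + 0.4 * a * x + rnorm(n)
fit <- lm(y ~ a * x)
coef(fit)["a"]                                 # effect at x = 0 (about 0.5)
dat1 <- transform(data.frame(a, x), a = 1)
dat0 <- transform(data.frame(a, x), a = 0)
gcomp <- mean(predict(fit, dat1) - predict(fit, dat0))
gcomp                                          # about 0.5 + 0.4 * mean(x)
```

With no interaction term, the per-person predicted contrast is constant and the two numbers coincide exactly.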
ATE via causal forest
A causal forest estimates individual-level treatment effects $\widehat{\tau}(x_i)$ non-parametrically. The ATE is the average of these individual effects, with a valid standard error that accounts for the estimation uncertainty.
# construct matrices for the causal forest
covariate_cols <- c(
"age", "male", "nz_european", "education", "partner", "employed",
"log_income", "nz_dep", "agreeableness", "conscientiousness",
"extraversion", "neuroticism", "openness",
"community_t0", "wellbeing_t0"
)
X <- as.matrix(df[, covariate_cols])
Y <- df$y
W <- df$a
# fit causal forest
cf <- causal_forest(
X, Y, W,
num.trees = 1000,
honesty = TRUE,
tune.parameters = "all",
seed = 2026
)
# extract ATE with standard error
ate_cf <- average_treatment_effect(cf)
cat("Causal forest ATE:", round(ate_cf["estimate"], 3),
"(SE:", round(ate_cf["std.err"], 3), ")\n")
cat("True ATE: ", round(true_ate, 3), "\n")
What is honesty?
Setting honesty = TRUE splits the training data in half: one half builds the tree structure, the other estimates the treatment effects within each leaf. This prevents overfitting and ensures valid confidence intervals.
Compare all estimates
results <- data.frame(
method = c("Naive", "Adjusted regression", "G-computation", "Causal forest"),
estimate = c(naive_ate, adj_ate, gcomp_ate, ate_cf["estimate"]),
bias = c(naive_ate - true_ate, adj_ate - true_ate,
gcomp_ate - true_ate, ate_cf["estimate"] - true_ate)
)
results$estimate <- round(results$estimate, 3)
results$bias <- round(results$bias, 3)
print(results)
cat("\nTrue ATE:", round(true_ate, 3), "\n")
Key takeaway
All three adjusted methods (regression, g-computation, causal forest) should recover the true ATE reasonably well. The naive estimate is substantially biased because it does not account for confounding. The causal forest additionally provides valid standard errors and, as we will see in Lab 6, individual-level treatment effect predictions.
Optional extension: the same ATE from an exposure model
So far we have controlled confounding through an outcome model.
G-computation works by modelling $Y \mid A, L$ and then using
predict() to compare the treated and untreated worlds.
IPTW takes the other route. It models treatment assignment, $A \mid L$, then gives more weight to people who received an unexpectedly rare treatment for their covariate pattern. This creates a pseudo-population in which treatment is less confounded by $L$.
# model the probability of treatment
ps_model <- glm(
a ~ age + male + nz_european + education + partner +
employed + log_income + nz_dep + agreeableness +
conscientiousness + extraversion + neuroticism + openness +
community_t0 + wellbeing_t0,
data = df,
family = binomial()
)
ps_hat <- predict(ps_model, type = "response")
# stabilised IPTW weights
p_treated <- mean(df$a)
iptw <- ifelse(
df$a == 1,
p_treated / ps_hat,
(1 - p_treated) / (1 - ps_hat)
)
# quick weight check
tibble(
statistic = c("min", "median", "max"),
value = c(min(iptw), median(iptw), max(iptw))
)
# weighted ATE model
fit_iptw <- lm(y ~ a, data = df, weights = iptw)
iptw_ate <- coef(fit_iptw)[["a"]]
tibble(
method = c("G-computation", "IPTW"),
estimate = c(gcomp_ate, iptw_ate),
bias = c(gcomp_ate - true_ate, iptw_ate - true_ate)
)
What to notice
IPTW targets the same ATE as g-computation, but it reaches it through an exposure model rather than an outcome model.
This is why IPTW is useful to see now. Later, doubly robust estimators combine both ideas: an outcome model and an exposure model.
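As a preview, here is a hedged sketch of one doubly robust estimator, AIPW, on toy data (a hypothetical example, not the lab's objects): it takes the g-computation contrast and corrects it with inverse-probability-weighted residuals, and remains consistent if either the outcome model or the exposure model is correctly specified.

```r
# AIPW on toy data: combine an outcome model and an exposure model
set.seed(4)
n <- 5000
L <- rnorm(n)
a <- rbinom(n, 1, plogis(L))
y <- 0.5 * a + L + rnorm(n)                                 # true ATE is 0.5
dat <- data.frame(y, a, L)
out <- lm(y ~ a + L, data = dat)                            # outcome model
ps  <- fitted(glm(a ~ L, data = dat, family = binomial()))  # exposure model
mu1 <- predict(out, transform(dat, a = 1))
mu0 <- predict(out, transform(dat, a = 0))
aipw_ate <- mean(
  mu1 - mu0 +
    a * (y - mu1) / ps -
    (1 - a) * (y - mu0) / (1 - ps)
)
aipw_ate                                                    # close to the true 0.5
```
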
Exercises
Lab diary
Complete at least two of the following exercises for your lab diary.
- Different exposure-outcome pair. Repeat the analysis using religious_service as the exposure and belonging as the outcome. How does the bias of the naive estimate compare? Check the true ATE using mean(d0$tau_religious_belonging).
- Omit baseline adjustment. Re-fit the causal forest without including community_t0 and wellbeing_t0 in the covariate matrix. How much does the ATE estimate change? Why might baseline values of the exposure and outcome be important confounders?
- Sample size comparison. Generate data with n = 1000 and n = 10000. How do the causal forest ATE estimates and standard errors change? What does this tell you about the precision of causal forest estimates?