Lab 9: Policy Trees
This lab moves from evaluating heterogeneity (Lab 8) to making decisions. Policy trees learn simple, interpretable treatment assignment rules from causal forest predictions. You will fit policy trees of different depths, evaluate their performance, and discuss the ethical implications of algorithmic treatment assignment.
By the end of this lab, you will know:
- How to construct a reward matrix from causal forest predictions
- How to fit and interpret depth-1 and depth-2 policy trees
- How to evaluate policy performance against random assignment
- How to express learned rules in plain language
This lab uses the causal forest and individual treatment effect predictions from Labs 5-8. The progression is: estimate effects (Lab 5) → discover heterogeneity (Lab 6) → evaluate targeting (Lab 8) → learn assignment rules (this lab).
Setup
library(causalworkshop)
library(grf)
library(policytree)
library(tidyverse)
Re-fit the causal forest (or re-use from previous labs):
# simulate data
d <- simulate_nzavs_data(n = 5000, seed = 2026)
d0 <- d |> filter(wave == 0)
d1 <- d |> filter(wave == 1)
d2 <- d |> filter(wave == 2)
# construct matrices
covariate_cols <- c(
"age", "male", "nz_european", "education", "partner", "employed",
"log_income", "nz_dep", "agreeableness", "conscientiousness",
"extraversion", "neuroticism", "openness",
"community_group", "wellbeing"
)
X <- as.matrix(d0[, covariate_cols])
Y <- d2$wellbeing
W <- d1$community_group
cf <- causal_forest(
X, Y, W,
num.trees = 1000,
honesty = TRUE,
tune.parameters = "all",
seed = 2026
)
tau_hat <- predict(cf)$predictions
The gamma matrix
A policy tree needs a reward matrix (called the "gamma matrix"). Each row is an individual; each column is an action. The entry gives the expected reward for assigning that individual to that action.
With two actions (treat vs not treat), the gamma matrix has two columns:
# construct gamma matrix
# column 1: reward if not treated (control) = 0 (baseline)
# column 2: reward if treated = predicted treatment effect
gamma_matrix <- cbind(
control = rep(0, length(tau_hat)),
treatment = tau_hat
)
head(gamma_matrix)
We normalise the control reward to zero so that the treatment column represents the gain from treating. A positive value means treatment helps; a negative value means treatment harms. The policy tree then simply needs to decide: for which individuals is the gain positive enough to justify treatment?
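Because the control column is zero, an unconstrained rule would simply treat everyone with tau_hat > 0. A quick check of how many individuals that describes, using objects already defined above:
# share of individuals with a positive predicted gain from treatment
mean(tau_hat > 0)
# distribution of predicted gains
summary(tau_hat)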
Fit a depth-1 policy tree
A depth-1 tree makes a single split, dividing the population into two groups based on one covariate:
# subsample for speed (policy tree fitting can be slow on large datasets)
set.seed(2026)
n_sample <- 500
idx <- sample(seq_len(nrow(X)), n_sample)
X_sample <- as.data.frame(X[idx, ])
gamma_sample <- gamma_matrix[idx, ]
# fit depth-1 policy tree
pt_depth1 <- policy_tree(X_sample, gamma_sample, depth = 1)
# print the tree
print(pt_depth1)
Visualise the tree:
plot(pt_depth1)
The tree shows one splitting variable and a threshold. Individuals at or below the threshold follow the left branch; individuals above it follow the right. The leaf labels (1 or 2) correspond to the columns of the gamma matrix: 1 = control, 2 = treatment.
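To see how the single split partitions the sample, tabulate the predicted actions on the fitting data. A minimal sketch; the action_labels helper is ours, not part of policytree:
# map action ids (gamma-matrix columns) to readable labels
action_labels <- c("control", "treatment")
# count how many sampled individuals are assigned to each action
table(action_labels[predict(pt_depth1, X_sample)])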
Fit a depth-2 policy tree
A depth-2 tree makes two sequential splits, creating four groups:
# fit depth-2 policy tree
pt_depth2 <- policy_tree(X_sample, gamma_sample, depth = 2)
# print and plot
print(pt_depth2)
plot(pt_depth2)
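Tabulating assignments again shows how the two splits carve the sample into four groups. This sketch assumes your policytree version supports type = "node.id" in predict(), which returns leaf identifiers rather than actions; fall back to the action tabulation above if it does not:
# count individuals per leaf (node ids, not actions)
table(predict(pt_depth2, X_sample, type = "node.id"))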
Evaluate policies
Predict treatment assignments on the full dataset and compare with random assignment:
# predict actions for full dataset
X_full <- as.data.frame(X)
actions_depth1 <- predict(pt_depth1, X_full)
actions_depth2 <- predict(pt_depth2, X_full)
# compute expected reward under each policy
# action = 1 means control, action = 2 means treatment
reward_depth1 <- ifelse(actions_depth1 == 1, gamma_matrix[, 1], gamma_matrix[, 2])
reward_depth2 <- ifelse(actions_depth2 == 1, gamma_matrix[, 1], gamma_matrix[, 2])
reward_random <- 0.5 * gamma_matrix[, 1] + 0.5 * gamma_matrix[, 2]
# compare policies
policy_comparison <- tibble(
policy = c("Random assignment", "Depth-1 tree", "Depth-2 tree", "Treat everyone"),
expected_reward = c(
mean(reward_random),
mean(reward_depth1),
mean(reward_depth2),
mean(tau_hat)
),
treat_rate = c(
0.50,
mean(actions_depth1 == 2),
mean(actions_depth2 == 2),
1.00
)
)
print(policy_comparison |> mutate(across(where(is.numeric), \(x) round(x, 3))))
A depth-2 tree is harder to explain than a depth-1 tree but may assign treatments more efficiently. Compare the expected rewards: if depth-2 is only marginally better, the simpler depth-1 rule may be preferable for transparency.
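To make that comparison concrete, compute the incremental reward directly (a quick check using the rewards computed above):
# how much expected reward does the extra depth buy?
gain_depth2 <- mean(reward_depth2) - mean(reward_depth1)
cat("Depth-2 gain over depth-1:", round(gain_depth2, 4), "\n")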
Interpret the rules in plain language
Read the tree output and translate it into a decision rule:
# what variables and thresholds does the depth-2 tree use?
print(pt_depth2)
# example interpretation (your values will differ):
# "treat individuals with extraversion > 0.3 and baseline wellbeing < 0.1"
Write out the depth-2 policy tree as a set of plain-language if-then rules. For example: "If extraversion is above X, then treat. Otherwise, if neuroticism is below Y, treat; otherwise, do not treat."
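If you prefer to extract the rules programmatically rather than reading the printed tree, the sketch below walks the fitted tree recursively. It assumes the policy_tree object stores its structure in $nodes (each node carrying $is_leaf, $split_variable, $split_value, $left_child, $right_child, or $action), plus $columns and $action.names, as in recent versions of policytree; run str(pt_depth2) first to confirm the fields on your version.
# walk the tree and print one plain-language rule per leaf
# (assumes policytree's internal node representation; see note above)
print_rules <- function(tree, node = 1, path = character()) {
  nd <- tree$nodes[[node]]
  if (nd$is_leaf) {
    condition <- if (length(path) == 0) "always" else paste(path, collapse = " and ")
    cat("If", condition, "then:", tree$action.names[nd$action], "\n")
  } else {
    var <- tree$columns[nd$split_variable]
    print_rules(tree, nd$left_child, c(path, paste(var, "<=", round(nd$split_value, 3))))
    print_rules(tree, nd$right_child, c(path, paste(var, ">", round(nd$split_value, 3))))
  }
}
print_rules(pt_depth2)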
Compare policy assignments with actual treatment
How do the policy-recommended assignments compare with who actually received treatment in the data?
# agreement between policy and observed treatment
agreement <- tibble(
actual = W,
policy_depth2 = ifelse(actions_depth2 == 2, 1, 0)
) |>
mutate(agree = actual == policy_depth2)
cat("Agreement rate:", round(mean(agreement$agree), 3), "\n")
cat("Policy treats: ", round(mean(agreement$policy_depth2), 3), "\n")
cat("Actual treated:", round(mean(agreement$actual), 3), "\n")
Ethical considerations
Learning an assignment rule from data means an algorithm, not a person, decides who receives the treatment. Before the exercises, discuss: Is a transparent depth-1 rule preferable to a more accurate but harder-to-explain depth-2 rule? Who bears the cost when the rule assigns someone the wrong action? And when should a human decision-maker override the algorithm's recommendation? The final exercise below takes up the override question.
Exercises
- Different outcome. Fit a policy tree for religious_service on belonging. Do the splitting variables change? What does this suggest about which covariates drive effect modification for different outcomes?
- Depth-3 tree. Fit a depth = 3 policy tree. Does the expected reward improve substantially over depth-2? Is the tree still interpretable?
- Discuss override. In one paragraph, describe a scenario where a clinician should override a policy tree recommendation. What information would the clinician have that the model does not?