Week 9: Resource Allocation and Policy Trees
Date: 6 May 2026
Readings
Required
Optional
Key concepts
- Policy learning turns CATE scores into treatment rules.
- Policy value must be evaluated out of sample.
- Shallow policy trees trade a little value for a lot of interpretability.
- Fairness constraints are design choices, not automatic outputs.
Week 8 estimated individual-level treatment contrasts and assessed whether targeting has practical value. Rankings alone are not policy. This week translates those rankings into interpretable, publicly defensible allocation rules.
Seminar
Motivating example
A district health board can fund a wellbeing intervention for only 20% of eligible residents.
A causal forest suggests large heterogeneity. The top-ranked group appears to benefit most.
Now we need a rule that can be defended in public, not just a ranking in code.
From ranking to policy
Let $d(x)\in\{0,1\}$ denote a treatment rule: $d(x)=1$ means treat an individual with covariates $x$.
Its policy value is the expected outcome when everyone follows the rule,
$$ V(d)=\mathbb{E}[Y(d(X))]. $$
With a budget cap $q$, we impose
$$ \mathbb{E}[d(X)]\le q. $$
Policy learning seeks a rule with high value under that constraint.
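A minimal numerical sketch of the value and the budget constraint, using hypothetical CATE scores in place of real causal-forest output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CATE estimates tau_hat(X) for 10,000 eligible residents
# (stand-ins for causal-forest output; the distribution is assumed).
tau_hat = rng.normal(loc=0.1, scale=0.3, size=10_000)

q = 0.20  # budget: E[d(X)] <= 0.20

# One budget-respecting rule: treat the top-q fraction by estimated effect.
cutoff = np.quantile(tau_hat, 1 - q)
d = (tau_hat >= cutoff).astype(int)

# Plug-in estimate of the value gain over treating no one,
# E[d(X) * tau(X)], substituting tau_hat for the unknown tau.
gain = np.mean(d * tau_hat)
print(f"share treated: {d.mean():.2f}")
print(f"estimated gain over no treatment: {gain:.3f}")
```

The rule is feasible because the treated share sits at the 20% cap; the gain estimate is only as good as the scores it plugs in, which is why out-of-sample evaluation comes later.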
Why policy trees
A causal forest maps a high-dimensional covariate vector $X$ to a personalised score $\hat{\tau}(X)$. The function itself is too tangled (thousands of overlapping splits) to hand to a decision-maker. The policytree algorithm bridges that gap. It collapses the forest's many $\hat{\tau}(X)$ values into a single shallow decision tree. Each split maximises expected benefit under the budget constraint (Sverdrup et al., 2024).
In this course we cap tree depth at two, for three reasons. First, a depth-2 tree asks any individual at most two yes/no questions (three splits in the whole tree), so the logic fits on a slide for policy-makers or clinicians. Second, each leaf retains enough observations to yield a stable effect estimate. Third, the search for an optimal tree becomes dramatically more expensive at each additional level of depth, and the gain from a depth-3 tree over a depth-2 tree is usually small relative to the added opacity.
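The lab uses the R policytree package, which maximises doubly robust welfare exactly over trees. As a rough Python analogue only, one can regress hypothetical CATE scores on covariates with a depth-2 tree and print the resulting rule; everything here (data, variable names, thresholds) is assumed for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical covariates and forest-style CATE scores (assumed data).
loneliness = rng.normal(size=n)
age = rng.uniform(18, 90, size=n)
X = np.column_stack([loneliness, age])
tau_hat = 0.5 * (loneliness > 0.8) + rng.normal(scale=0.1, size=n)

# Depth-2 surrogate: the rule reduces to at most two questions per person.
# A large min_samples_leaf keeps each leaf's effect estimate stable.
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=500, random_state=0)
tree.fit(X, tau_hat)

# The entire rule fits on a slide:
print(export_text(tree, feature_names=["loneliness", "age"]))
```

The printed if-then structure is exactly the artefact handed to decision-makers; the forest itself never leaves the analysis pipeline.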
Pair exercise: designing a depth-2 policy rule
- You have a 20% treatment budget. From this list, choose two splitting variables: age, deprivation index, baseline loneliness, self-esteem, neuroticism.
- Sketch a depth-2 tree with your two variables. Label each leaf "treat" or "do not treat."
- Verify that approximately 20% of the population falls in the "treat" leaves (make plausible assumptions about the variable distributions).
- Give two reasons to prefer a depth-2 tree over a depth-4 tree for a public-health decision.
The result is a transparent allocation rule: a short set of if-then conditions that approximates what the full forest would recommend, at the cost of some lost precision.
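The budget check in the pair exercise (roughly 20% of the population in the "treat" leaves) can be verified by simulation. The distributions and thresholds below are assumptions chosen to make the arithmetic work, not empirical facts:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Assumed distributions: deprivation decile 1-10 (uniform) and a
# continuous baseline loneliness score on 0-10.
deprivation = rng.integers(1, 11, size=n)
loneliness = rng.uniform(0, 10, size=n)

# One candidate depth-2 rule:
#   deprivation >= 8?        -> top 3 deciles, ~30% of the population
#     loneliness >= 10/3?    -> ~2/3 of that branch
# so the "treat" leaf should hold roughly 0.30 * (2/3) = 20%.
treat = (deprivation >= 8) & (loneliness >= 10 / 3)
print(f"share treated: {treat.mean():.3f}")
```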
Evaluating policy performance
We should not evaluate a policy on the same data used to train the ranking.
Use held-out or cross-fitted evaluation to estimate policy value and uncertainty.
If gains disappear out of sample, we do not deploy.
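The held-out evaluation can be sketched as follows. The simulated trial, the T-learner scores, and the 50/50 randomisation probability are all illustrative assumptions; the lab's grf-based workflow uses doubly robust scores instead:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 8_000

# Simulated randomised trial with a heterogeneous effect (assumed DGP).
X = rng.normal(size=(n, 3))
T = rng.integers(0, 2, size=n)              # P(T = 1) = 0.5 by design
tau = np.where(X[:, 0] > 0, 0.5, -0.1)      # true CATE, unknown in practice
Y = X[:, 1] + tau * T + rng.normal(size=n)

X_tr, X_te, T_tr, T_te, Y_tr, Y_te = train_test_split(
    X, T, Y, test_size=0.5, random_state=0)

# Learn scores on the training half only (a crude T-learner stand-in
# for the causal forest used in the lab).
m1 = RandomForestRegressor(random_state=0).fit(X_tr[T_tr == 1], Y_tr[T_tr == 1])
m0 = RandomForestRegressor(random_state=0).fit(X_tr[T_tr == 0], Y_tr[T_tr == 0])
tau_hat = m1.predict(X_te) - m0.predict(X_te)

# Budget-constrained rule evaluated on the held-out half: treat the top 20%.
d = tau_hat >= np.quantile(tau_hat, 0.8)

# Inverse-probability-weighted estimate of V(d) = E[Y(d(X))]: a unit
# contributes only when its observed arm matches the rule's recommendation.
V_hat = np.mean(Y_te * (T_te == d) / 0.5)
print(f"estimated out-of-sample policy value: {V_hat:.3f}")
```

Because the rule was learned on one half and valued on the other, an apparent gain that vanishes here is exactly the signal not to deploy.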
Practical interpretation
If a tree splits on self-esteem and neuroticism, that does not mean these variables are morally privileged causes.
It means they helped separate high-value and low-value treatment regions under the chosen objective.
Equity and governance
Efficiency is not enough.
Proxy variables can encode historical inequity. A split on deprivation may indirectly stratify by ethnicity or structural disadvantage.
Under Te Tiriti o Waitangi, allocation rules in health settings need explicit equity consideration.
Before deployment, investigators should check:
- Who gains and who loses?
- Are protected groups differentially affected through proxies?
- Does the rule reduce or worsen disparities?
- Can affected communities understand and contest the rule?
Pair exercise: equity audit
- A policy tree splits first on deprivation index, then on age. The rule treats high-deprivation residents under 40.
- Explain how a deprivation split indirectly stratifies by ethnicity in an Aotearoa New Zealand context (consider the correlation between deprivation and Māori/Pasifika populations).
- Apply two of the four governance checks above to this rule.
- Propose one modification informed by Te Tiriti o Waitangi obligations (e.g., guaranteed allocation floors, community consultation, or co-governance of the decision rule).
- Your partner says "the algorithm is objective because it only uses data." Counter this claim in two sentences.
Workflow for this week
- Estimate heterogeneity and targeting value.
- Fit a shallow policy tree under explicit constraints.
- Evaluate policy value out of sample.
- Report trade-offs: value, fairness, transparency, and feasibility.
Heterogeneity as scientific discovery
Conditional average treatment effect (CATE) machinery does more than power allocation decisions. It helps science move past a one-size-fits-all mindset. Mapping treatment effects across a high-dimensional covariate space tests whether our conventional categories (gender, age group, clinical severity) capture the differences that matter. Sometimes they do; often they do not. Discovering where the forest finds meaningful splits can generate fresh hypotheses about who responds and why, even when no policy decision is on the table. A forest that splits on loneliness rather than age, for example, suggests the psychological mechanism operates through social connection, not biological ageing.
Return to the opening example
Back to the district health board.
The right question is not "what rule maximises sample gain?" The right question is "what rule performs robustly and remains acceptable under equity and governance standards?"
The workflow from question to policy rule is now in place. One assumption has been present throughout but never examined: that our instruments measure the same construct across the groups we compare. Week 10 asks whether that assumption holds.
Pair exercise: policy tree versus ranking
- Strategy A ranks individuals by $\hat{\tau}(X_i)$ and treats the top 20%. Strategy B fits a depth-2 policy tree with a 20% budget constraint.
- Compare the two strategies on: (a) estimated policy value, (b) explainability to a non-technical audience, and (c) ability to answer "why was I selected?"
- State one scenario where Strategy A (pure ranking) is preferable to Strategy B (policy tree).
- State one scenario where Strategy B is preferable.
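Parts (a) and (b) of the exercise can be made concrete with a small simulation. The scores and the whole-leaf budget logic below are illustrative assumptions, not the policytree algorithm itself:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
n = 20_000
X = rng.normal(size=(n, 2))
tau_hat = X[:, 0] + 0.3 * X[:, 1]   # hypothetical CATE scores

q = 0.20
# Strategy A: treat the top 20% ranked directly by the score.
a = tau_hat >= np.quantile(tau_hat, 1 - q)

# Strategy B: depth-2 tree surrogate. Leaves are all-or-nothing, so we
# treat whole leaves best-first until the budget would be exceeded;
# the 20% cap therefore binds only approximately.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, tau_hat)
leaf = tree.predict(X)
b = np.zeros(n, dtype=bool)
for v in np.unique(leaf)[::-1]:     # best leaf first
    mask = leaf == v
    if b.mean() + mask.mean() > q and b.any():
        break
    b |= mask

# Plug-in value gain E[d(X) * tau_hat(X)] for each strategy.
print(f"A gain: {np.mean(a * tau_hat):.3f}, share treated: {a.mean():.2f}")
print(f"B gain: {np.mean(b * tau_hat):.3f}, share treated: {b.mean():.2f}")
```

Strategy A typically extracts more value per treated unit, but only Strategy B can answer "why was I selected?" with a sentence rather than a score.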
Lab materials: Lab 9: Policy Trees
Bulbulia, J. A. (2024). A practical guide to causal inference in three-wave panel studies. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/uyg3d
Hoffman, K. L., Salazar-Barreto, D., Rudolph, K. E., & Díaz, I. (2023). Introducing longitudinal modified treatment policies: A unified framework for studying complex exposures. arXiv preprint. https://doi.org/10.48550/arXiv.2304.09460
Suzuki, E., Shinozaki, T., & Yamamoto, E. (2020). Causal diagrams: Pitfalls and tips. Journal of Epidemiology, 30(4), 153–162. https://doi.org/10.2188/jea.JE20190192
Sverdrup, E., Kanodia, A., Zhou, Z., Athey, S., & Wager, S. (2024). policytree: Policy learning via doubly robust empirical welfare maximization over trees. R package. https://CRAN.R-project.org/package=policytree
VanderWeele, T. J., Mathur, M. B., & Chen, Y. (2020). Outcome-wide longitudinal designs for causal inference: A new template for empirical studies. Statistical Science, 35(3), 437–466.