Assessment Self-Checks

Use these checks when preparing for Test 2, the presentation, and the research report. They keep you focused on the causal framework taught in the course.

Test 2 is in class, on paper, with one A4 sheet of notes, no devices, and no AI tools. The important work is yours: state the target population, causal contrast, outcome, estimand, assumptions, and limits.

Causal Question Check

Before estimating or interpreting anything, state:

  1. Target population: who the answer is meant to apply to.
  2. Causal contrast: the intervention and comparator, with timing.
  3. Outcome: what is measured, when it is measured, and how the score is constructed.
  4. Causal estimand: average treatment effect (ATE), conditional average treatment effect (CATE), or another estimand.
  5. Identification assumptions: consistency, exchangeability, and positivity.
  6. Measurement assumptions: whether the measure represents the same outcome across people, groups, treatment states, and time.

If you cannot state these pieces, the analysis is not yet interpretable.
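
For reference, the estimands and identification assumptions listed above can be written in potential-outcomes notation. The symbols are assumptions made for this sketch rather than notation fixed by this section: a binary treatment A, potential outcomes Y(1) and Y(0), effect modifiers X, and measured confounders L. Exchangeability is written in its conditional form.

```latex
\begin{align*}
\text{ATE} &= \mathbb{E}[\,Y(1) - Y(0)\,] \\
\text{CATE}(x) &= \mathbb{E}[\,Y(1) - Y(0) \mid X = x\,] \\[4pt]
\text{Consistency:} &\quad Y = Y(a) \ \text{ if } A = a \\
\text{Exchangeability:} &\quad Y(a) \perp\!\!\!\perp A \mid L \ \text{ for } a \in \{0, 1\} \\
\text{Positivity:} &\quad 0 < \Pr(A = 1 \mid L = l) < 1 \ \text{ for all } l \text{ with } \Pr(L = l) > 0
\end{align*}
```

Each expectation is taken over the target population named in item 1, so the same formula can give a different answer if the target population changes.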

HTE and Policy Tree Check

Heterogeneous treatment effect (HTE) methods describe how estimated treatment effects vary across measured covariates. Policy trees turn those estimates into simple rules or summaries of high-response regions. They are useful but easy to over-read.

Before reporting HTE or policy tree results, check:

  1. Outcome: which outcome the rule is for. A rule for one outcome may not be a rule for another.
  2. Estimand: whether you are reporting CATEs, ranked heterogeneity, policy value, a high-response region, or an allocation rule.
  3. Modifier status: whether a split variable is a measured descriptor, proxy, or downstream marker. Do not read a split variable as the causal source of effect modification.
  4. Support: whether both treatment levels are represented in the relevant covariate regions. A policy tree cannot rescue poor positivity.
  5. Parsimony: whether the depth-2 tree's gain in held-out policy value over the simpler rule clears the prespecified threshold. Prefer the simpler rule by default, then use uncertainty, stability, equity, and implementation burden to decide how cautiously to interpret any depth-2 rule that does clear the threshold.
  6. Uncertainty: whether the policy value and subgroup effects are reported with confidence intervals.
  7. Oversight: what value judgement would authorise acting on the rule. The model can estimate who appears to benefit more; it cannot decide which public value should win.
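
The support and parsimony checks (items 4 and 5) can be sketched in a few lines of Python. This is a minimal illustration, not the course's workflow: the function names, the `min_per_arm` cut-off, the toy data, and the choice to treat "clears the threshold" as "gain is at least the threshold" are all assumptions made here. The simpler rule is assumed to be a depth-1 tree.

```python
from collections import Counter


def support_by_region(regions, treatments, min_per_arm=1):
    """Count units per treatment arm within each covariate region.

    A region is flagged as supported only if every treatment arm has
    at least `min_per_arm` units in it (hypothetical cut-off).
    """
    counts = Counter(zip(regions, treatments))
    arms = sorted(set(treatments))
    report = {}
    for region in sorted(set(regions)):
        per_arm = {arm: counts[(region, arm)] for arm in arms}
        report[region] = {
            "counts": per_arm,
            "supported": all(n >= min_per_arm for n in per_arm.values()),
        }
    return report


def prefer_depth(value_depth1, value_depth2, min_gain):
    """Return 2 only if the depth-2 gain in held-out policy value is at
    least `min_gain`; otherwise keep the simpler depth-1 rule."""
    return 2 if (value_depth2 - value_depth1) >= min_gain else 1


# Invented toy data: region labels and a binary treatment indicator.
regions = ["low", "low", "low", "high", "high", "high"]
treatments = [0, 1, 1, 1, 1, 1]

report = support_by_region(regions, treatments)
print(report["low"]["supported"])   # True: both arms appear in "low"
print(report["high"]["supported"])  # False: no untreated units in "high"

# Invented policy values: a gain of about 0.01 does not reach a 0.02 threshold.
print(prefer_depth(0.10, 0.11, min_gain=0.02))  # 1
```

A region flagged as unsupported is one where any estimated effect rests on extrapolation, which is why the tree cannot rescue it. A depth-2 rule that passes `prefer_depth` still needs the uncertainty, stability, equity, and implementation checks before it is interpreted.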

Useful wording:

The policy tree describes where estimated treatment benefits are largest in the measured data. The split variables should be interpreted as descriptors for targeting and hypothesis generation, not as identified causes of differential response.

Avoid:

The tree shows that deprivation causes the treatment to work better.

Better:

The tree split on deprivation, which may be a useful marker of differential response. The causal source of that heterogeneity could lie upstream, including in variables that are not measured or not included in the tree.

Measurement Check

For measurement questions, do not treat associational model output as evidence of causal structure. Regression coefficients, factor loadings, structural equation models, invariance tests, fit indices, and reliability statistics summarise patterns in the data. They do not identify the causal structure that produced those patterns.

Ask:

  1. Does the measured score represent the same outcome under the intervention and control conditions?
  2. Does the measure work the same way across the groups being compared?
  3. Could exposure, intervention, or group membership affect how people interpret or answer the items?
  4. Could missingness, translation, response style, or social desirability affect the measured outcome?
  5. If a statistic is used to support "validity", what causal or measurement assumption is it meant to support, and what alternative explanation does it fail to rule out?
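
Question 3 can be made concrete with a small arithmetic sketch. The numbers and function names below are invented for illustration. The example assumes the underlying outcome is identical in both arms, and that the intervention changes only how people answer the items, modelled here as a constant shift in the measured score.

```python
from statistics import mean

# Invented underlying outcome values, identical in both arms,
# so the true effect on the outcome is zero by construction.
latent = [2.0, 3.0, 4.0, 5.0]


def measured_scores(latent_values, response_shift):
    """Measured score = underlying value + a constant shift in responding."""
    return [value + response_shift for value in latent_values]


def difference_in_means(treated, control):
    """Simple contrast of arm means."""
    return mean(treated) - mean(control)


control_scores = measured_scores(latent, response_shift=0.0)
# Hypothetical: the intervention changes how items are answered, not the outcome.
treated_scores = measured_scores(latent, response_shift=0.5)

latent_effect = difference_in_means(latent, latent)
measured_effect = difference_in_means(treated_scores, control_scores)
print(latent_effect, measured_effect)  # 0.0 0.5
```

The measured contrast of 0.5 reflects only the change in responding. A reliability statistic or fit index computed on these scores would not reveal that, which is the alternative explanation question 5 asks you to name.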