Reporting Guide

This guide sets out what to include in your research report. Use it as a checklist for average treatment effects, heterogeneous effects, policy trees, and sensitivity analyses.

The ten-step causal inference checklist

Before reporting results, check that each piece is in place.

Steps 0–3: problem definition

Well-defined treatment. Specify the exposure precisely, including the contrast (e.g., "weekly religious service attendance vs. less than weekly").
Time zero. State when treatment assignment or initiation begins and when follow-up starts.
Well-defined outcome. State the outcome measure, its scale, and when it was assessed.
Target population. Define who the results apply to, including any weighting for population representativeness.

Steps 4–6: identification strategy

Exchangeability. Describe the baseline covariates you adjust for and why they are enough, under your causal directed acyclic graph (causal DAG).
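Consistency. Explain why the treatment is well defined, so that the outcome observed under the exposure a person received can be read as their potential outcome under that exposure, and why variation in how the exposure is delivered is not expected to matter.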
Positivity. Report checks showing that both exposure levels appear across the relevant covariate patterns.
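
A positivity check can be sketched in base R as follows. The data are simulated here, and the variable names (age_z, belong_z, exposure) and the coarse bands are assumptions chosen for illustration, not names from the course simulator.

```r
# Hypothetical simulated data: a binary exposure and two baseline covariates.
set.seed(434)
n <- 2000
age_z <- rnorm(n)
belong_z <- rnorm(n)
exposure <- rbinom(n, 1, plogis(-0.5 + 0.6 * age_z + 0.4 * belong_z))

# Check 1: both exposure levels appear in every coarse covariate stratum.
age_band <- cut(age_z, breaks = c(-Inf, -1, 0, 1, Inf))
belong_band <- cut(belong_z, breaks = c(-Inf, 0, Inf))
cell_counts <- table(age_band, belong_band, exposure)
min_cell <- min(cell_counts)

# Check 2: estimated propensity scores stay away from 0 and 1 in both arms.
ps_fit <- glm(exposure ~ age_z + belong_z, family = binomial())
ps <- fitted(ps_fit)
ps_range_by_arm <- tapply(ps, exposure, range)

print(cell_counts)
print(ps_range_by_arm)
```

Empty or near-empty cells, or propensity scores piled up near 0 or 1, would be worth reporting and discussing rather than hiding.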

Steps 7–9: implementation

Measurement. Explain how the outcome is measured, and state any assumptions needed for the measure to mean the same thing across people, groups, and time.
Attrition handling. The simulated data do not have missing responses, so follow the Lab 10 template: state how panel dropout would be handled if present, for example with inverse probability of censoring weights.
Transparent reporting. Document the analysis choices and assumptions.
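
If dropout were present, inverse probability of censoring weights could be built along the following lines. This is a minimal base R sketch on simulated data; the dropout model, the variable names, and the choice of stabilised weights are assumptions for illustration and are not taken from the Lab 10 template.

```r
# Hypothetical simulated panel: retention depends on age and exposure.
set.seed(434)
n <- 1000
age_z <- rnorm(n)
exposure <- rbinom(n, 1, 0.4)
retained <- rbinom(n, 1, plogis(1.2 + 0.5 * age_z - 0.3 * exposure))

# Model the probability of remaining in the panel given baseline information.
retain_fit <- glm(retained ~ age_z + exposure, family = binomial())
p_retain <- fitted(retain_fit)

# Stabilised weights: marginal retention probability over the conditional one.
# People who dropped out get weight 0 because their outcomes are not observed.
ipcw <- ifelse(retained == 1, mean(retained) / p_retain, 0)

summary(ipcw[retained == 1])
```

When you describe attrition handling, state which covariates enter the retention model and whether the weights were stabilised or truncated.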

Target trial emulation

Frame your causal question as: "How would outcomes change if we intervened to set everyone's exposure to level $a=1$ rather than $a=0$, conditional on baseline characteristics?"
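
In potential-outcomes notation, the estimand behind this question can be written as below. The symbols are assumptions introduced here for illustration: $Y(a)$ is the outcome a person would have under exposure level $a$, $A$ is the observed exposure, and $L$ is the set of baseline covariates.

$$
\begin{aligned}
\text{ATE} &= \mathbb{E}[Y(1) - Y(0)] \\
           &= \mathbb{E}\big[\, \mathbb{E}[Y \mid A = 1, L] - \mathbb{E}[Y \mid A = 0, L] \,\big]
\end{aligned}
$$

The second line holds only if conditional exchangeability, consistency, and positivity are satisfied, which is why the checklist asks you to justify each of them.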

Reporting average treatment effects

Standard ATE table format

Include these elements for each outcome:

Outcome	Estimate (SD units)	Bonferroni 95% CI	E-value (point)	E-value (Bonferroni bound)
Outcome A	0.12	[0.05, 0.19]	1.8	1.3
Outcome B	0.15	[0.08, 0.22]	2.1	1.4

Key reporting elements

Effect sizes: report in standard deviation units. The simulator's four wellbeing outcomes are pre-z-scored, so the standardised scale is the only scale available; there is no original 1–7 scale to recover.
Confidence intervals: report the multiplicity-adjusted (Bonferroni) interval, since the design is outcome-wide across four outcomes.
E-values: report two, one for the point estimate and one for the lower end of the multiplicity-adjusted confidence interval (the bound nearest the null). Both are produced by the ate_table() helper in the research-report template's setup.R.
Sample size: total analysed after exclusions and weighting.
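
The Bonferroni adjustment for four outcomes can be sketched in base R. The point estimate and standard error below are hypothetical values, and the normal-approximation interval is an assumption; the ate_table() helper may compute its intervals differently.

```r
# Hypothetical point estimate and standard error, in SD units.
estimate <- 0.12
std_error <- 0.028
n_outcomes <- 4
alpha <- 0.05

# Unadjusted critical value: alpha split across two tails.
z_unadjusted <- qnorm(1 - alpha / 2)

# Bonferroni: split alpha across the outcomes first, then across two tails.
z_bonferroni <- qnorm(1 - alpha / (2 * n_outcomes))

ci_unadjusted <- estimate + c(-1, 1) * z_unadjusted * std_error
ci_bonferroni <- estimate + c(-1, 1) * z_bonferroni * std_error

round(rbind(ci_unadjusted, ci_bonferroni), 3)
```

The adjusted interval is always wider than the unadjusted one, so an estimate can exclude zero before adjustment and include it afterwards.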

Example results text

"Weekly religious service attendance was estimated to improve all four wellbeing outcomes. The largest estimates were for sense of belonging ($\beta = 0.18$ SD units, Bonferroni 95% CI: 0.11–0.25) and life satisfaction ($\beta = 0.15$ SD units, Bonferroni 95% CI: 0.08–0.22). The point E-values were above 1.6 and the Bonferroni-bound E-values above 1.3, meaning an unmeasured confounder would need to be associated with both attendance and these outcomes by a risk ratio of at least 1.3 each, above and beyond measured covariates, to push the multiplicity-adjusted lower bound to the null."

Reporting heterogeneous treatment effects

For Option A, the policy-tree workflow is the only heterogeneity output. The rank-weighted average treatment effect (RATE) and Qini diagnostics introduced in Lab 8 are not part of this report's scaffold; the template does not ship code for them, and you do not need to include them.

Reporting policy tree results

Present each policy tree on the standardised outcome scale used by the simulator (the four wellbeing outcomes are pre-z-scored, so SD units are the only scale available). The tree describes the action or high-response region implied by the supplied rewards; it does not force a fixed percentage treated.

The course policy-tree workflow uses an outcome-only objective. A do-not-treat leaf means the rule assigns the no-treatment action for that covariate profile. The workflow does not compute money saved, staff time saved, or other resource savings. To make savings part of the analysis, investigators would need to specify a treatment cost in outcome units, subtract that cost from the treatment reward, and refit or compare trees across plausible cost values.
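
The cost adjustment can be illustrated with a small base R sketch. The reward matrix, the cost values, and the helper best_action() are hypothetical; the sketch picks the better action row by row rather than refitting a tree, so it shows only the arithmetic of subtracting a cost, not the course workflow itself.

```r
# Hypothetical reward estimates in SD units for four covariate profiles:
# column 1 = value of no treatment, column 2 = value of treatment.
rewards <- cbind(
  control = c(0.00, 0.00, 0.00, 0.00),
  treated = c(0.28, 0.22, 0.08, -0.03)
)

# Pick the action with the higher reward in each row (ties go to control).
best_action <- function(r) colnames(r)[max.col(r, ties.method = "first")]

# Outcome-only objective: no cost is subtracted.
best_action(rewards)

# Cost-adjusted objective: subtract an assumed treatment cost (in SD units)
# from the treatment column, then compare across plausible cost values.
for (cost in c(0, 0.05, 0.10, 0.25)) {
  adjusted <- rewards
  adjusted[, "treated"] <- adjusted[, "treated"] - cost
  cat("cost =", cost, ":", best_action(adjusted), "\n")
}
```

As the assumed cost rises, more profiles switch to no treatment. Only in that cost-adjusted setting is it accurate to describe a do-not-treat assignment in terms of resources.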

Parsimony rule

Fit both depth-1 and depth-2 policy trees. Prefer the depth-1 tree unless the depth-2 tree improves the point estimate of held-out policy value by at least min_gain_for_depth_switch. The course default is:

min_gain_for_depth_switch <- 0.01

State the threshold before reporting the selected tree. If depth-2 does not clear the threshold, report the simpler tree in the main text and place the full depth comparison in the appendix. If depth-2 clears the threshold but uncertainty is wide, describe the rule as promising or fragile and use stability, equity, and implementation burden to temper the conclusion. The margot_policy_workflow() function applies the point-gain rule automatically and exposes the comparison in wf$best$depth_summary_df.
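
The point-gain rule itself is simple enough to sketch in base R. The function select_depth() and the policy values below are hypothetical and are not part of margot; they only illustrate the comparison that the workflow is described as applying.

```r
min_gain_for_depth_switch <- 0.01

# Prefer depth 1 unless depth 2 improves the point estimate of held-out
# policy value by at least the stated threshold.
select_depth <- function(value_depth_1, value_depth_2,
                         min_gain = min_gain_for_depth_switch) {
  gain <- value_depth_2 - value_depth_1
  if (gain >= min_gain) 2L else 1L
}

# Hypothetical held-out policy values in SD units.
select_depth(value_depth_1 = 0.112, value_depth_2 = 0.118)  # gain below threshold
select_depth(value_depth_1 = 0.100, value_depth_2 = 0.130)  # gain above threshold
```

Because the rule uses point estimates only, a depth-2 tree can clear the threshold while its interval remains wide, which is the case the paragraph above asks you to temper in prose.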

Graphing rule

Apply margot::margot_select_grf_policy_trees() after the policy-tree workflow runs. The graphing rule keeps a policy tree only when both the policy-value lower confidence limit and the treated-uplift lower confidence limit exceed zero. Course defaults set both lower-CI thresholds at zero:

policy_value_lower_threshold <- 0
treated_uplift_lower_threshold <- 0

State the thresholds in your methods. Outcomes that fail the rule remain in your tables and prose; their trees do not appear as figures. The rule is a precommitment device: state the test, then graph only what passes.
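
The logic of the graphing rule can be sketched in base R. The data frame, its column names, and the lower confidence limits are hypothetical; this is not the output format of margot::margot_select_grf_policy_trees(), only an illustration of the two-condition filter.

```r
policy_value_lower_threshold <- 0
treated_uplift_lower_threshold <- 0

# Hypothetical lower confidence limits for four outcomes, in SD units.
tree_summary <- data.frame(
  outcome = c("outcome_a", "outcome_b", "outcome_c", "outcome_d"),
  policy_value_lower = c(0.04, 0.02, -0.01, 0.03),
  treated_uplift_lower = c(0.06, 0.01, 0.02, -0.02)
)

# A tree is graphed only when BOTH lower limits exceed their thresholds.
tree_summary$graph_tree <-
  tree_summary$policy_value_lower > policy_value_lower_threshold &
  tree_summary$treated_uplift_lower > treated_uplift_lower_threshold

tree_summary
```

In this sketch the last two outcomes each fail one condition, so they would stay in the tables and prose without a tree figure.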

Subgroup reporting

For each graphed tree, report the high-response subgroups with their estimated effects, uncertainty, and sample proportions. Use the standardised scale; do not invent an original-scale interpretation, because the simulator outcomes are not on a 1–7 (or any other) raw scale.

Report coverage as the treated share implied by the selected rule, not as a budget constraint. If a programme can treat only a fixed share, say so explicitly. The default policy tree estimates the value of a shallow allocation rule; it does not solve a fixed-capacity allocation problem.

Example: high-response subgroups for life satisfaction

Older adults with high baseline belonging (age > 45, baseline belonging > +1 SD)
- Standardised effect: $\beta = 0.28$ SD units (95% CI: 0.21–0.35)
- Sample proportion: 23%
Younger adults with lower baseline purpose (age <= 45, baseline purpose < 0)
- Standardised effect: $\beta = 0.22$ SD units (95% CI: 0.16–0.28)
- Sample proportion: 31%
All others
- Standardised effect: $\beta = 0.08$ SD units (95% CI: 0.04–0.12)
- Sample proportion: 46%

Example policy tree text

"The policy tree estimates the expected value of assigning the single treatment according to the fitted rule. In this example, the rule assigns treatment to two profiles with larger estimated gains in life satisfaction. Older adults (over 45) with high baseline belonging, who make up 23% of the sample, had the largest estimated gain ($\beta = 0.28$ SD units). Interpret these leaves as parts of an allocation rule, with age and baseline belonging used as splitting variables."

Avoid phrasing such as "the do-not-treat leaf saves resources" unless a treatment cost has been built into the objective. Under the course workflow, the accurate wording is: "the outcome-only rule assigns these profiles to no treatment because the no-treatment action has the higher estimated value for that leaf."

Sensitivity analysis: E-values

Interpretation

An E-value says how strongly an unmeasured confounder would need to be associated, on the risk ratio scale, with both the treatment and the outcome, above and beyond the measured covariates, to explain away the observed effect.

There is no universal threshold at which an E-value becomes "safe". Interpret it against the study design, the covariates already measured, and plausible omitted causes in the setting. Report the E-value for the point estimate and for the confidence limit closest to the null, and explain what kind of unmeasured confounder would be needed for the result to disappear.
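
For reference, the E-value arithmetic can be sketched in base R. The sketch uses the standard formula for a risk ratio, $RR + \sqrt{RR(RR - 1)}$, together with the common approximation $RR \approx \exp(0.91 \times d)$ for a standardised mean difference $d$. The function names and the input numbers are hypothetical, and the ate_table() helper may use a different conversion, so treat its output as the value to report.

```r
# E-value for a risk ratio: RR + sqrt(RR * (RR - 1)).
e_value_rr <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)  # protective effects: invert first
  rr + sqrt(rr * (rr - 1))
}

# Approximate conversion from a standardised mean difference to a risk ratio.
smd_to_rr <- function(d) exp(0.91 * d)

# Hypothetical estimate and Bonferroni-adjusted interval, in SD units.
estimate <- 0.20
ci_lower <- 0.10
ci_upper <- 0.30

e_point <- e_value_rr(smd_to_rr(estimate))

# Use the confidence limit nearest the null; if the interval crosses zero,
# the bound is set to the null and its E-value is 1.
bound_nearest_null <- if (ci_lower > 0) {
  ci_lower
} else if (ci_upper < 0) {
  ci_upper
} else {
  0
}
e_bound <- e_value_rr(smd_to_rr(bound_nearest_null))

round(c(point = e_point, bound = e_bound), 2)
```

The bound E-value is always closer to 1 than the point E-value, which is why both are reported.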

Example sensitivity text

"For sense of belonging, the E-value was 2.4. An unmeasured confounder would need to be associated with both religious service attendance and belonging by a risk ratio of 2.4 each, above and beyond the measured covariates, to explain away the estimate."

Methods section template

A complete methods section should include:

Treatment definition: what the exposure is, how it is coded, and the contrast of interest.
Time zero and follow-up: when assignment occurs, when follow-up starts, and why that timing matches the intervention.
Outcome definition: measures used, timing of assessment, any transformations applied.
Target population: sampling frame, weighting strategy, eligibility criteria.
Causal identification: covariates adjusted for, with a justification for conditional exchangeability.
Statistical analysis: estimation method, key tuning parameters (e.g., number of trees, minimum node size).
Attrition handling: censoring weights, stages of dropout addressed.
Heterogeneity assessment: policy-tree depth (with parsimony decision), policy value, treated uplift, and the graphing-rule decision.
Sensitivity analysis: E-values for all primary estimates.

Reporting checklist

Do report

Effect sizes in SD units (the simulator's outcome scale) with multiplicity-adjusted (Bonferroni) confidence intervals
Sample sizes after exclusions and weighting
E-values for both the point estimate and the Bonferroni-adjusted lower bound
Clear practical interpretation of effect sizes
Subgroup sizes and subgroup estimates for graphed policy trees
The parsimony threshold and the graphing-rule thresholds you used
Target trial framework and causal question
Explicit treatment and outcome definitions

Do not report

Model coefficients without interpretation
p-values alone without effect sizes
Original-scale (1–7) effects for the simulated outcomes; the simulator returns z-scored outcomes only
Technical details that obscure main findings
Causal claims beyond your identification strategy

PSYC 434: Conducting Research Across Cultures