Standard GRF and policy-tree workflow

This vignette describes the standard margot workflow for generalised random forest (GRF) analyses with policy-tree reporting. The workflow is designed for outcome-wide studies: the same exposure, covariates, and analysis population are used to estimate effects for multiple outcomes.

The workflow separates four tasks:

1. Estimate average treatment effects (ATEs) for each outcome.
2. Diagnose whether the forest predictions show calibrated heterogeneity.
3. Evaluate policy-tree learning with held-out folds.
4. Summarise cross-outcome recurrence descriptively.

The ATE task estimates the primary causal estimand. The policy-tree task asks whether a shallow rule can summarise useful variation in the forest’s doubly robust action scores. Because a policy tree is an optimiser, its value on the observations used to fit it is optimistically biased, so policy-tree learning should be evaluated on held-out observations using cross-validation.
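
The case for held-out evaluation can be illustrated with a minimal base-R sketch. It does not use margot or grf: the score contrasts are pure noise, the candidate rules are simple hypothetical depth-one splits at zero, and all object names are illustrative assumptions rather than package objects.

```r
# Hypothetical illustration in base R; no margot or grf functions are used.
set.seed(1)
n <- 400
p <- 10

# Covariates and a treatment-minus-control score contrast with no true signal.
x <- matrix(rnorm(n * p), nrow = n)
score_contrast <- rnorm(n)

# Split into a training half and a held-out half.
train <- seq_len(n) <= n / 2

# Value of a rule relative to all-control: mean of (treated indicator * contrast).
rule_value <- function(treat, contrast) mean(treat * contrast)

# Candidate depth-one rules: treat when x[, k] > 0, or treat when x[, k] <= 0.
candidates <- expand.grid(variable = seq_len(p), treat_above = c(TRUE, FALSE))
train_values <- mapply(function(k, above) {
  treat <- if (above) x[train, k] > 0 else x[train, k] <= 0
  rule_value(treat, score_contrast[train])
}, candidates$variable, candidates$treat_above)

# The optimiser keeps the rule with the largest training value.
best <- which.max(train_values)
best_rule <- candidates[best, ]

# Apply the selected rule, unchanged, to the held-out half.
heldout_treat <- if (best_rule$treat_above) {
  x[!train, best_rule$variable] > 0
} else {
  x[!train, best_rule$variable] <= 0
}

in_sample_value <- train_values[best]
heldout_value <- rule_value(heldout_treat, score_contrast[!train])

c(in_sample = in_sample_value, heldout = heldout_value)
```

Because the rule is chosen to maximise the training value, the in-sample figure will typically overstate what the same rule achieves on observations it has not seen, even though no real heterogeneity exists in this toy setup.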

The standard workflow uses the package defaults unless a registration states an override. For margot_policy_tree_cv(), the defaults compare depths one and two with five held-out folds repeated 20 times, select depth two only when it clears the package parsimony rule, and prefer fastpolicytree when it is available.

Reporting convention

Policy trees optimise action-score rewards. The reporting layer therefore separates the action selected by the tree from the signed score contrast shown in each leaf.

Let $\Gamma_{ja}$ denote the action score for observation $j$ under action $a$, let $w_j$ denote the evaluation weight, let $L$ denote a policy-tree leaf, and let $E_L$ denote the evaluation observations routed to that leaf. For binary actions $C$ and $T$, margot reports the signed evaluation-sample treatment-control contrast

$\Delta_L = \frac{\sum_{j \in E_L} w_j\{\Gamma_{jT} - \Gamma_{jC}\}} {\sum_{j \in E_L} w_j}.$

Positive values of $\Delta_L$ favour treatment, negative values favour control, and zero values indicate no directional contrast in the action scores. The selected action is reported separately. With $S_L$ denoting the training observations routed to leaf $L$ when the tree is fitted, the stored action is learned as

$\pi(L) = \arg\max_{a \in \{C,T\}} \frac{\sum_{j \in S_L} w_j \Gamma_{ja}}{\sum_{j \in S_L} w_j}.$

If action-score means tie within numerical tolerance, the leaf has no directional contrast. The reported selected action follows the action stored by the fitted tree or a prespecified tie rule; it should not be read as a substantive preference when $\Delta_L = 0$. In held-out CV summaries, the stored action is learned on training folds and $\Delta_L$ is computed on held-out observations; the held-out table does not reselect actions from held-out means.
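
The two formulas above can be worked through by hand in base R. The scores and weights below are hypothetical toy numbers for a single leaf, and the object names are illustrative assumptions rather than margot objects.

```r
# Toy action scores for one leaf; all numbers are hypothetical.
# Columns are actions C (control) and T (treatment).
train_scores <- matrix(
  c(0.10, 0.30,
    0.20, 0.25,
    0.00, 0.40),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("C", "T"))
)
train_weights <- c(1, 1, 2)

eval_scores <- matrix(
  c(0.15, 0.05,
    0.10, 0.30),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("C", "T"))
)
eval_weights <- c(1, 3)

# pi(L): action with the largest weighted mean score on training observations.
# Weighted means are C = 0.075 and T = 0.3375, so the stored action is "T".
train_means <- colSums(train_scores * train_weights) / sum(train_weights)
selected_action <- names(which.max(train_means))

# Delta_L: weighted mean of (T - C) on evaluation observations.
# (1 * -0.10 + 3 * 0.20) / 4 = 0.125
delta_L <- sum(eval_weights * (eval_scores[, "T"] - eval_scores[, "C"])) /
  sum(eval_weights)

selected_action
delta_L
```

The selected action comes only from the training scores, while the signed contrast comes only from the evaluation scores, which mirrors the separation the reporting convention describes.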

This distinction affects interpretation when every leaf selects the same action. Such a tree may still split observations because score-contrast magnitude varies across the covariate space, but it should not be described as a selective rule that changes action across leaves. Between-leaf differences in $\Delta_L$ describe variation in magnitude, not the decision rule itself.

The standard public table therefore leads with selected_action, tc_score_contrast, score_interval, sample_percent, and direction. Lower-level diagnostics remain available, but selected-action advantage and value-contribution columns are hidden unless explicitly requested.

The score interval is a source-specific action-score summary. For display-tree tables, it is an approximate interval for row-level scores after tree selection. For held-out CV tables, it pools held-out action-score summaries from fold-level trees that may have different leaf structures. In neither case should the interval be described as a formal post-selection subgroup test.
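
As a rough illustration of what an approximate interval for row-level scores can look like, the sketch below builds an unweighted normal-approximation interval in base R. This is an assumption made for illustration: the vignette does not state how margot constructs score_interval, and the toy contrasts are hypothetical.

```r
# Hypothetical row-level score contrasts (T minus C) for one leaf.
leaf_contrast <- c(0.12, -0.05, 0.30, 0.08, 0.21, -0.02, 0.15, 0.10)

estimate <- mean(leaf_contrast)
std_error <- sd(leaf_contrast) / sqrt(length(leaf_contrast))

# Normal-approximation 95% interval. It is descriptive only: the leaf was
# chosen by a tree, so the interval ignores the selection step.
score_interval <- estimate + c(-1, 1) * qnorm(0.975) * std_error

estimate
score_interval
```

An interval of this kind describes spread in the scores within a leaf; it does not adjust for the fact that the leaf itself was selected from the data.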

Simulate two outcomes

The simulation creates two outcomes with related, but not identical, heterogeneity. Age and socioeconomic position recur as candidate organising variables, while the second outcome adds a distinct baseline variable.

library(margot)
library(dplyr)

set.seed(20260620)
n <- 900

sim <- tibble(
  age_z = rnorm(n),
  status_z = rnorm(n),
  income_z = rnorm(n),
  baseline_y1 = rnorm(n),
  baseline_y2 = rnorm(n)
) |>
  mutate(
    propensity = plogis(-0.15 + 0.35 * age_z - 0.25 * status_z),
    exposure = rbinom(n(), 1, propensity),
    tau_y1 = 0.06 + 0.08 * (age_z > 0) + 0.04 * (status_z > 0),
    tau_y2 = 0.03 + 0.06 * (age_z > 0) - 0.05 * (income_z < -0.5),
    y1 = 0.25 * baseline_y1 + 0.15 * status_z + exposure * tau_y1 + rnorm(n(), sd = 0.8),
    y2 = 0.30 * baseline_y2 - 0.10 * income_z + exposure * tau_y2 + rnorm(n(), sd = 0.8)
  )

covariates <- sim |>
  select(age_z, status_z, income_z, baseline_y1, baseline_y2)
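
Because the covariates are independent standard normals, the population ATEs implied by tau_y1 and tau_y2 follow directly from the simulation design. The short base-R calculation below is a benchmark for the estimates that follow; the object names are illustrative.

```r
# Population ATEs implied by the simulation design (standard normal covariates).
p_age_pos <- 1 - pnorm(0)      # P(age_z > 0) = 0.5
p_status_pos <- 1 - pnorm(0)   # P(status_z > 0) = 0.5
p_income_low <- pnorm(-0.5)    # P(income_z < -0.5)

# E[tau_y1] = 0.06 + 0.08 * 0.5 + 0.04 * 0.5 = 0.12
true_ate_y1 <- 0.06 + 0.08 * p_age_pos + 0.04 * p_status_pos

# E[tau_y2] = 0.03 + 0.06 * 0.5 - 0.05 * P(income_z < -0.5)
true_ate_y2 <- 0.03 + 0.06 * p_age_pos - 0.05 * p_income_low

c(y1 = true_ate_y1, y2 = true_ate_y2)
```

Both effects are small relative to the outcome noise (sd = 0.8), so estimates from a sample of 900 should be expected to vary noticeably around these values.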

Estimate outcome-wide ATEs

The ATE layer uses the fitted forests and grf::average_treatment_effect(). Do not add an external policy-tree cross-validation step to the ATE estimate.

fit <- margot_causal_forest(
  data = sim,
  outcome_vars = c("y1", "y2"),
  covariates = covariates,
  W = sim$exposure,
  weights = NULL,
  use_train_test_split = FALSE,
  compute_rate = FALSE,
  compute_conditional_means = FALSE,
  save_models = TRUE,
  save_data = TRUE,
  verbose = FALSE
)

ate_table <- margot_recompute_ate(fit)
ate_table

Add bridge diagnostics

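grf::test_calibration() fits a best linear predictor of the target estimand from the mean forest prediction and the differential forest prediction, using held-out forest predictions. A mean-prediction coefficient near one suggests that the average prediction is calibrated; the differential-prediction coefficient also serves as an omnibus diagnostic for heterogeneity. grf::variable_importance() is a descriptive split-use summary and should not be interpreted as a confirmed moderator test.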

calibration <- lapply(fit$full_models, grf::test_calibration)

importance <- lapply(fit$full_models, function(forest) {
  tibble(
    variable = colnames(covariates),
    importance = as.numeric(grf::variable_importance(forest))
  ) |>
    arrange(desc(importance))
})

calibration
importance

Evaluate policy trees on held-out folds

The policy-tree layer learns trees on training folds and evaluates the learned tree on held-out folds. The output includes policy value, split frequencies, threshold summaries, and leaf-level signed treatment-control contrasts. Tree-level policy value is summarised against all-control, all-treatment, and the best constant action, so an all-treatment or all-control tree is not mistaken for a selective rule.

policy_cv <- margot_policy_tree_cv(
  fit,
  verbose = FALSE
)

policy_cv$depth_selection
policy_cv$value_summary
margot_table_policy_value(policy_cv)
policy_cv$leaf_summary

The first call uses the workflow defaults. To make the defaults explicit in a registration, record them as:

# record the standard policy-tree validation settings in the study protocol.
policy_tree_settings <- list(
  depths = c(1L, 2L),
  num_folds = 5L,
  n_repeats = 20L,
  min_gain_for_depth_switch = 0.01,
  max_stability_loss_for_depth_switch = 0.05,
  tree_method = "fastpolicytree"
)
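
The two threshold names suggest a gate of the following kind. The sketch is a hypothetical reading, not the package's implementation: the function choose_depth(), its arguments, and the assumption that gain and stability loss are simple differences are all illustrative, and the actual parsimony rule in margot_policy_tree_cv() may differ.

```r
# Hypothetical sketch of a parsimony gate; not the package's implementation.
choose_depth <- function(value_depth1, value_depth2,
                         stability_depth1, stability_depth2,
                         min_gain = 0.01, max_stability_loss = 0.05) {
  # Assumed definitions: gain and stability loss as simple differences.
  gain <- value_depth2 - value_depth1
  stability_loss <- stability_depth1 - stability_depth2
  # Prefer the shallower tree unless depth two clears both thresholds.
  if (gain >= min_gain && stability_loss <= max_stability_loss) 2L else 1L
}

choose_depth(0.040, 0.045, 0.90, 0.88)  # gain 0.005 is below 0.01: returns 1
choose_depth(0.040, 0.060, 0.90, 0.88)  # gain 0.020, loss 0.02: returns 2
```

Whatever the exact rule, recording the thresholds in the registration fixes the depth decision before held-out results are seen.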

Users may restrict candidate policy-tree variables when the scientific question justifies it. For confirmatory analyses, variable restrictions should be pre-specified or chosen inside the training folds.

policy_cv_subset <- margot_policy_tree_cv(
  fit,
  custom_covariates = c("age_z", "status_z", "income_z"),
  covariate_mode = "custom",
  verbose = FALSE
)
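
The comparison of a learned policy against all-control, all-treatment, and the best constant action, described at the start of this section, reduces to a few means over held-out action scores. The base-R sketch below uses hypothetical scores and assignments; it does not reproduce the internals of margot_table_policy_value().

```r
# Hypothetical held-out action scores; columns are actions C and T.
scores <- matrix(
  c(0.20, 0.10,
    0.05, 0.30,
    0.10, 0.25,
    0.30, 0.15),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("C", "T"))
)

# Actions assigned by a tree learned on other folds (hypothetical).
assigned <- c("C", "T", "T", "T")

# Value of a policy: mean score of the action each observation receives.
row_col <- cbind(seq_len(nrow(scores)), match(assigned, colnames(scores)))
policy_value <- mean(scores[row_col])            # 0.225

value_all_control <- mean(scores[, "C"])         # 0.1625
value_all_treat <- mean(scores[, "T"])           # 0.20
value_best_constant <- max(value_all_control, value_all_treat)

# Gain over the best constant action: 0.225 - 0.20 = 0.025
gain_vs_best_constant <- policy_value - value_best_constant
gain_vs_best_constant
```

Comparing against the best constant action, rather than against all-control alone, prevents a tree that mostly treats from being credited for the average treatment effect.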

Report a selected policy tree

The held-out CV object selects the reporting depth. The display tree then shows the fitted full-sample tree at that depth. Display-tree leaf labels describe the displayed tree; the held-out CV object remains the source for depth, value, and split-frequency claims.

selected_depth <- policy_cv$depth_map[["model_y1"]]

# use the policy-named wrapper for the branching decision tree.
margot_plot_policy_decision_tree(
  fit,
  model_name = "model_y1",
  max_depth = selected_depth,
  show_leaf_metrics = TRUE
)

Use margot_plot_policy_projection() when the intended artefact is the evaluation-point projection, and margot_plot_policy_tree_panels() when the report should show the branching tree in panel A and the projection in panel B. The older margot_plot_decision_tree() and margot_plot_policy_tree() remain available, but the policy-named wrappers make the intended plot type explicit.

The modular reporting helpers expose the same convention at different levels:

# compute low-level leaf diagnostics for the selected display tree.
leaf_diagnostics <- margot_policy_leaf_summary(
  fit,
  model_name = "model_y1",
  depth = selected_depth
)

# create the public display-tree table with signed T-C first.
display_leaf_table <- margot_table_policy_tree(
  fit,
  model_name = "model_y1",
  depth = selected_depth
)

# create the held-out action-score summary table at the selected depth.
heldout_leaf_table <- margot_table_policy_tree(
  policy_cv,
  model_name = "model_y1",
  source = "heldout_cv"
)

# compare the learned policy with universal action baselines.
value_table <- margot_table_policy_value(
  policy_cv,
  model_name = "model_y1",
  depth = selected_depth
)

# generate cautious stock text for a manuscript or report.
policy_text <- margot_text_policy_tree(source = "heldout_cv")

display_leaf_table
heldout_leaf_table
value_table
policy_text

For exploratory diagnostics, callers can expose the lower-level selected-action advantage and value-contribution columns. These columns are useful for auditing the score arithmetic but should not lead public manuscript tables.

# request diagnostic columns only when auditing the policy-tree arithmetic.
margot_table_policy_tree(
  fit,
  model_name = "model_y1",
  depth = selected_depth,
  include_selected_action_difference = TRUE,
  include_value_contribution = TRUE
)

The integrated helper assembles the standard plot, display-tree table, held-out leaf table, held-out policy-value table, and interpretation text. Each component remains an ordinary R object, so manuscript workflows can replace or omit pieces without losing the shared reporting convention.

# assemble the standard policy-tree report components for one outcome.
policy_report <- margot_report_policy_tree(
  fit,
  model_name = "model_y1",
  policy_cv = policy_cv
)

names(policy_report)
policy_report$metadata
policy_report$table
policy_report$heldout_table
policy_report$policy_value
policy_report$text
policy_report$plots$combined_plot

The table-level metadata flags whether the selected actions vary across leaves. When uniform_selected_action is TRUE, avoid describing the tree as a selective rule even if the tree has multiple leaves.

# check whether the selected action changes across leaves.
display_leaf_table |>
  select(
    model,
    depth,
    node_id,
    selected_action,
    tc_score_contrast,
    n_selected_actions,
    uniform_selected_action
  )

The split summaries remain separate from leaf contrasts:

# inspect root-split stability at the selected reporting depth.
policy_cv$split_summary |>
  filter(model == "model_y1", depth == selected_depth, node_id == 1)

Summarise outcome-wide recurrence

Outcome-wide recurrence asks whether the same baseline variables recur across outcomes. This layer is descriptive unless a study defines a formal family-level target.

recurrence <- margot_policy_recurrence_summary(policy_cv)
recurrence
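
For intuition, a descriptive recurrence count can be sketched in base R from a table of root-split frequencies. The data frame, its column names, and the 25% threshold below are hypothetical illustrations; they are not the structure returned by margot_policy_recurrence_summary().

```r
# Hypothetical root-split shares across repeated held-out folds.
split_freq <- data.frame(
  model = c("model_y1", "model_y1", "model_y2", "model_y2", "model_y2"),
  variable = c("age_z", "status_z", "age_z", "income_z", "status_z"),
  root_share = c(0.70, 0.30, 0.55, 0.35, 0.10)
)

# Count a variable as recurring for an outcome when it is the root split in
# at least 25% of fold-level trees; the threshold is an arbitrary illustration.
recurs <- split_freq[split_freq$root_share >= 0.25, ]
recurrence_count <- table(recurs$variable)

# Number of outcomes in which each variable recurs, most frequent first.
recurrence_count[order(-recurrence_count)]
```

A count of this kind is a tally across outcomes, not a test, which is why the layer is described as descriptive.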

A cautious report might say:

Age recurred as a root or near-root policy-tree variable across both outcomes, while held-out value against the best constant action was small. We treat age as a recurring exploratory organiser, not a confirmed moderator.

Optional extensions

RATE/AUTOC can be added when investigators need an explicit heterogeneity test. When used, RATE/AUTOC should be cross-validated. Qini/uplift curves remain optional and exploratory until the analysis has a clearly defined cost-benefit interpretation.