Run Multiple Generalized Random Forest (GRF) Causal Forest Models with Enhanced Qini Cross-Validation

This function runs GRF causal forest models for the outcomes named in outcome_vars. In addition to estimating causal effects, it can compute the Rank-Weighted Average Treatment Effect (RATE) for each model. It can also train a separate "Qini forest" on a subset of the data and compute Qini curves on held-out data, which avoids in-sample optimism in the Qini plots.
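
The held-out idea behind the Qini split can be illustrated with a minimal base-R sketch. It only shows how an index split keeps the evaluation rows separate from the rows used to fit the forest. The names `n`, `train_idx`, and `test_idx` are illustrative assumptions, and this is not the package's internal code.

```r
# Illustrative only: split row indices in the proportion given by qini_train_prop.
set.seed(123)
n <- 1000                 # hypothetical number of usable rows
qini_train_prop <- 0.7    # same value as the documented default

# Rows used to fit the Qini forest.
train_idx <- sample(seq_len(n), size = round(qini_train_prop * n))

# Remaining rows, reserved for computing the Qini curve.
test_idx <- setdiff(seq_len(n), train_idx)

# A forest fitted on train_idx would be scored on test_idx only,
# so the Qini curve never evaluates rows the forest was trained on.
length(intersect(train_idx, test_idx))
```

Because the two index sets do not overlap, any curve computed on `test_idx` reflects out-of-sample performance.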

Usage

margot_causal_forest(
  data,
  outcome_vars,
  covariates,
  W,
  weights,
  grf_defaults = list(),
  save_data = FALSE,
  compute_rate = TRUE,
  top_n_vars = 15,
  save_models = TRUE,
  train_proportion = 0.7,
  qini_split = TRUE,
  qini_train_prop = 0.7,
  verbose = TRUE
)

Arguments

data: A data frame containing all necessary variables.
outcome_vars: A character vector of outcome variable names to be modelled.
covariates: A matrix of covariates to be used in the GRF models.
W: A vector of binary treatment assignments.
weights: A vector of weights for the observations.
grf_defaults: A list of default parameters for the GRF models.
save_data: Logical indicating whether to save data, covariates, and weights. Default is FALSE.
compute_rate: Logical indicating whether to compute RATE for each model. Default is TRUE.
top_n_vars: Integer specifying the number of top variables to use for additional computations. Default is 15.
save_models: Logical indicating whether to save the full GRF model objects. Default is TRUE.
train_proportion: Numeric value between 0 and 1 indicating the proportion of non-missing data to use for training policy trees. Default is 0.7.
qini_split: Logical indicating whether to perform a separate train/test split exclusively for the Qini calculation. Default is TRUE (i.e., Qini is computed out-of-sample).
qini_train_prop: Numeric value between 0 and 1 indicating the proportion of data to use for the Qini training set (used only if qini_split = TRUE). Default is 0.7.
verbose: Logical indicating whether to display detailed messages during execution. Default is TRUE.
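
As a rough illustration of how the arguments above fit together, the following sketch builds simulated inputs in the documented shapes and checks that they are mutually consistent. All variable names and values are made up for the example. The `grf_defaults` entries assume that the list is forwarded to `grf::causal_forest()`, which this section does not state explicitly.

```r
# Simulated inputs; all names and values here are illustrative assumptions.
set.seed(42)
n <- 500

# Covariate matrix: one row per observation, named columns.
covariates <- matrix(
  rnorm(n * 5),
  nrow = n,
  dimnames = list(NULL, paste0("x", 1:5))
)

# Binary treatment vector and observation weights.
W <- rbinom(n, size = 1, prob = 0.5)
weights <- rep(1, n)

# Data frame holding the outcomes named in outcome_vars.
data <- data.frame(
  y1 = covariates[, "x1"] * W + rnorm(n),
  y2 = 0.5 * W + rnorm(n)
)
outcome_vars <- c("y1", "y2")

# Assumption: these entries are passed on to grf::causal_forest(),
# which accepts num.trees and honesty among its arguments.
grf_defaults <- list(num.trees = 2000, honesty = TRUE)

# Basic consistency checks before fitting.
stopifnot(
  all(outcome_vars %in% names(data)),
  nrow(covariates) == nrow(data),
  length(W) == nrow(data),
  length(weights) == nrow(data),
  all(W %in% c(0, 1))
)
```

Running checks like these before fitting tends to surface mismatched row counts or a non-binary treatment early, rather than part-way through a long model run.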

Value

A list containing model results, a combined table, and other relevant information.
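
Since the element names of the returned list are not enumerated here, one hedged way to explore the result is to call the function on a small simulated data set and inspect the top level of the output. The sketch below assumes the margot and grf packages are installed and uses only the arguments shown under Usage. The data and the choice of `top_n_vars = 3` (kept below the number of simulated covariates) are illustrative assumptions.

```r
# Runs only if the margot package is available; data are simulated.
if (requireNamespace("margot", quietly = TRUE)) {
  set.seed(1)
  n <- 400

  # Simulated covariates, treatment, and a single outcome.
  X <- matrix(rnorm(n * 4), nrow = n,
              dimnames = list(NULL, paste0("x", 1:4)))
  W <- rbinom(n, size = 1, prob = 0.5)
  dat <- data.frame(y1 = X[, 1] * W + rnorm(n))

  result <- margot::margot_causal_forest(
    data = dat,
    outcome_vars = "y1",
    covariates = X,
    W = W,
    weights = rep(1, n),
    top_n_vars = 3,        # assumption: kept below ncol(X) for this toy example
    qini_split = TRUE,
    qini_train_prop = 0.7,
    verbose = FALSE
  )

  # Inspect the top-level structure rather than assuming element names.
  print(names(result))
  str(result, max.level = 1)
}
```

Setting `save_models = FALSE` may be worth considering when many outcomes are modelled, since full forest objects can be large.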