Cross-Validation Test for Treatment Effect Heterogeneity
Source:R/margot_rate_cv.R
margot_rate_cv.Rd
Performs uncorrelated sequential cross-validation to test for treatment effect heterogeneity using the approach recommended by the grf package (Wager 2024). This method provides robust statistical testing that avoids overfitting by using sequential, non-overlapping training sets.
Usage
margot_rate_cv(
model_results,
num_folds = 5,
target = c("AUTOC", "QINI"),
model_names = NULL,
use_full_data = TRUE,
seed = 12345,
verbose = TRUE,
parallel = FALSE,
n_cores = future::availableCores() - 1,
future_globals_maxSize = 22,
alpha = 0.05,
adjust = "none",
label_mapping = NULL,
...
)
Arguments
- model_results
Output from `margot_causal_forest()` or `margot_flip_forests()`
- num_folds
Integer. Number of cross-validation folds (default 5). Must be at least 3.
- target
Character vector. RATE targets to test: "AUTOC", "QINI", or c("AUTOC", "QINI") for both (default). When both are specified, results are combined with adjusted p-values accounting for testing both metrics.
- model_names
Character vector. Specific models to test (default NULL tests all)
- use_full_data
Logical. If TRUE and data is available, refit forests on full data rather than using the training split (default TRUE)
- seed
Integer or NULL. Random seed for reproducibility. If NULL, defaults to 12345
- verbose
Logical. Print progress messages (default TRUE)
- parallel
Logical. Use parallel processing for multiple models (default FALSE). Note: Parallel processing requires the margot package to be installed (not just loaded with devtools::load_all()). If the package is not installed, the function will fall back to sequential processing with a warning.
- n_cores
Integer. Number of cores for parallel processing when parallel = TRUE (default all cores - 1)
- future_globals_maxSize
Numeric. Maximum allowed size for exporting globals to parallel workers in GiB. Default is 22. Set to Inf to disable the limit.
- alpha
Numeric. Significance level for hypothesis tests (default 0.05). When using Bonferroni correction, consider alpha = 0.2 due to its conservative nature.
- adjust
Character. Multiple testing adjustment method. Only "bonferroni" or "none" are valid for cross-validation due to the martingale aggregation method. Default is "none". Other methods like BH/BY/FDR are not statistically appropriate for CV aggregation.
- label_mapping
Optional named list for custom label mappings. Keys should be model names (with or without "model_" prefix), and values should be the desired display labels.
- ...
Additional arguments passed to grf::causal_forest
Value
A list containing:
- cv_results
Data frame with CV test results for each model
- interpretation
Character string interpreting the results
- significant_models
Character vector of models with significant heterogeneity
- tables
List of formatted tables similar to margot_rate() output (rate_autoc, rate_qini)
- method_details
List with details about the CV procedure
Details
The function implements the sequential cross-validation approach from Wager (2024) and Nie & Wager (2020). This method: - Splits data into K folds - Trains on folds 1 to k-1, tests on fold k - Aggregates t-statistics using the martingale property - Provides valid p-values for heterogeneity testing
The null hypothesis is that there is no treatment effect heterogeneity (all individuals have the same treatment effect).
## Multiple Testing Correction
Due to the martingale aggregation method used in CV, only Bonferroni correction or no correction are statistically valid. Methods like Benjamini-Hochberg (BH) or false discovery rate (FDR) control are not appropriate because they assume independent p-values, which is violated by the sequential CV structure.
Bonferroni correction is very conservative, especially with noisy heterogeneous treatment effect models. We recommend using alpha = 0.2 when applying Bonferroni correction to maintain reasonable statistical power.
For memory efficiency with large model objects, the function pre-extracts only the necessary data for each model before parallel processing. However, parallel processing is currently experimental and disabled by default due to potential memory issues with large model objects. If you want to try parallel processing, set parallel = TRUE and consider increasing the future.globals.maxSize option with: options(future.globals.maxSize = 15 * 1024^3) # 15 GB
References
Wager, S. (2024). Sequential validation for heterogeneous treatment effects. arXiv preprint arXiv:2401.06818.
Nie, X., & Wager, S. (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2), 299-319.
Examples
if (FALSE) { # \dontrun{
# Run causal forest
cf_results <- margot_causal_forest(
data = mydata,
outcome_vars = c("outcome1", "outcome2"),
covariates = covariates,
W = treatment,
save_data = TRUE # Important for CV
)
# Test for heterogeneity with cross-validation (both targets by default)
cv_results <- margot_rate_cv(
model_results = cf_results,
num_folds = 5,
alpha = 0.2, # recommended with Bonferroni
adjust = "bonferroni"
)
# With custom labels
label_mapping <- list(
"model_happiness" = "Happiness",
"model_depression" = "Depression"
)
cv_results <- margot_rate_cv(
model_results = cf_results,
label_mapping = label_mapping
)
# Test single target only
cv_results_autoc <- margot_rate_cv(
model_results = cf_results,
target = "AUTOC"
)
# View results
cat(cv_results$interpretation)
# Access formatted tables (similar to margot_rate output)
cv_results$tables$rate_autoc # AUTOC table
cv_results$tables$rate_qini # QINI table
# Get detailed interpretation with tables (similar to margot_interpret_rate)
detailed_results <- margot_interpret_rate_cv(cv_results)
cat(detailed_results$interpretation)
# View individual target interpretations
cat(detailed_results$autoc_results)
cat(detailed_results$qini_results)
# Comparison between AUTOC and QINI
if (!is.null(detailed_results$comparison)) {
cat(detailed_results$comparison)
}
# Create visualization
plot <- margot_plot_cv_results(cv_results)
} # }