Development Version of Causal Forest with Enhanced Features
Source:R/margot_causal_forest_dev.R
margot_causal_forest_dev.Rd
This development version implements the new architecture with proper train/test splits, evaluation forests, and integrated missing data handling. It maintains a 50/50 train/test split by default and supports toy data sampling.
Usage
margot_causal_forest_dev(
data,
outcome_vars,
treatment = "A",
covariates = NULL,
weights = NULL,
train_prop = 0.5,
toy_data_prop = NULL,
grf_defaults = list(),
eval_forest = TRUE,
save_data = TRUE,
save_forests = TRUE,
handle_missing = "complete",
seed = 12345,
verbose = TRUE
)
Arguments
- data
Data frame containing all variables
- outcome_vars
Character vector of outcome variable names
- treatment
Character name of treatment variable (default: "A")
- covariates
Character vector of covariate names. If NULL, uses all variables except outcomes, treatment, and special columns.
- weights
Optional weights vector or column name
- train_prop
Numeric. Proportion for train/test split (default: 0.5)
- toy_data_prop
Numeric. If provided, uses a random sample of this proportion for testing (default: NULL)
- grf_defaults
List of parameters passed to grf::causal_forest()
- eval_forest
Logical. Whether to create evaluation forest for test set doubly robust scores (default: TRUE)
- save_data
Logical. Whether to save input data (default: TRUE)
- save_forests
Logical. Whether to save forest objects (default: TRUE)
- handle_missing
Character. How to handle missing data: "complete" (default), "impute", or "forest" (let forest handle it)
- seed
Integer. Random seed (default: NULL)
- verbose
Logical. Whether to print progress (default: TRUE)
Value
List containing:
- results
List of model results by outcome
- data_info
Information about data splits and missing values
- forests
Causal forest objects (if save_forests = TRUE)
- eval_forests
Evaluation forest objects (if eval_forest = TRUE)
- data
Original data (if save_data = TRUE)
- metadata
Model metadata and parameters
Details
Key differences from original margot_causal_forest(): - Consistent 50/50 train/test split for all analyses - Optional evaluation forests for computing test set DR scores - Integrated missing data handling options - Support for toy data sampling - Simplified structure focused on forest estimation - Returns data needed for downstream analysis functions
The evaluation forest is trained on the training data and used to compute doubly robust scores for the test set, ensuring proper out-of-sample evaluation for QINI curves and other metrics.
Examples
if (FALSE) { # \dontrun{
# Generate test data
test_data <- margot_simulate_test_data()
# Run causal forest with default settings
cf_results <- margot_causal_forest_dev(
data = test_data$data,
outcome_vars = c("Y1", "Y2", "Y3", "Y4"),
treatment = "A"
)
# Run with toy data for quick testing
cf_results_toy <- margot_causal_forest_dev(
data = test_data$data,
outcome_vars = c("Y1", "Y2"),
treatment = "A",
toy_data_prop = 0.1
)
} # }