Overview
This vignette introduces the key components of the margot package for causal inference with longitudinal data. The package provides tools for three main stages:
- Data Preparation: Converting longitudinal data to wide format
- Causal Inference: Estimating treatment effects using causal forests
- Interpretation: Visualising and understanding results
Installation
The margot package has a modular design. Install only what you need:
# core package (data manipulation and basic functions)
install.packages("margot")
# for causal inference models (optional)
install.packages(c("grf", "lmtp", "policytree", "maq"))
# for visualisation (optional)
install.packages(c("ggplot2", "patchwork", "ggokabeito"))
# for reporting tables (optional)
install.packages(c("gt", "gtsummary", "flextable"))
# or install everything at once (future)
# install.packages("margot.models") # all estimation packages
# install.packages("margot.viz") # all visualisation packages
# install.packages("margot.report") # all reporting packages
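Because the optional dependencies are only needed for particular stages, it can help to check which ones are already installed before starting. The following is a minimal sketch in base R. The package list mirrors the install commands above, and the object names (optional_pkgs, installed, missing_pkgs) are illustrative and not part of margot.

```r
# optional packages, grouped as in the install commands above
optional_pkgs <- c(
  "grf", "lmtp", "policytree", "maq",    # causal inference models
  "ggplot2", "patchwork", "ggokabeito",  # visualisation
  "gt", "gtsummary", "flextable"         # reporting tables
)

# TRUE where the package namespace can be loaded, FALSE otherwise
installed <- vapply(
  optional_pkgs,
  requireNamespace,
  logical(1),
  quietly = TRUE
)

# report anything that is still missing
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

The result is a named logical vector, so installed["grf"] can be used to guard any step that needs a particular package.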
Stage 1: Data Preparation
Loading and Exploring Data
library(margot)
library(dplyr)
# the package includes example data
data(df_nz)
# define variable groups
baseline_vars <- c(
  "male", "age", "eth_cat", "partner", "agreeableness",
  "conscientiousness", "extraversion", "honesty_humility",
  "openness", "neuroticism", "sample_weights"
)
exposure_var <- "forgiveness"
outcome_vars <- c(
  "alcohol_frequency", "alcohol_intensity",
  "hours_exercise", "hours_work", "life_satisfaction"
)
Converting to Wide Format
The margot_wide_machine() function handles the complete data preparation pipeline:
# prepare data for causal inference
wide_data <- margot_wide_machine(
  data = df_nz,
  baseline_vars = baseline_vars,
  exposure_var = exposure_var,
  outcome_vars = outcome_vars
)
# check the structure
str(wide_data)
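To make the idea of "wide format" concrete, here is a minimal base R sketch on a toy panel. It is not how margot_wide_machine() is implemented. The toy values, the id and wave columns, and the "_t0" / "_t1" naming are assumptions chosen for illustration, and margot's own column names may differ.

```r
# toy long-format panel: two people observed at two waves (made-up values)
long <- data.frame(
  id = c(1, 1, 2, 2),
  wave = c(0, 1, 0, 1),
  forgiveness = c(3, 4, 5, 5),
  hours_exercise = c(2, 3, 0, 1)
)

# one row per person, one column per variable-by-wave combination
wide <- reshape(
  long,
  idvar = "id",
  timevar = "wave",
  direction = "wide",
  sep = "_t"
)

names(wide)
nrow(wide)
```

Each person now occupies a single row, which is the layout that estimators expecting one observation per unit typically need.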
Stage 2: Causal Inference
Estimating Treatment Effects
# run causal forest (requires grf package)
results <- margot_causal_forest(
  data = wide_data,
  exposure = exposure_var,
  outcomes = outcome_vars,
  baseline_vars = baseline_vars,
  weights = "sample_weights"
)
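For readers who want to see what a causal forest estimate looks like outside the wrapper, the sketch below calls grf directly on simulated data. It is an illustration under stated assumptions and not a description of what margot_causal_forest() does internally. The simulated variables (X, W, Y) and the true effect of 0.6 are invented for the example.

```r
# a minimal sketch with simulated data; calls grf directly, not margot
ate <- NULL
if (requireNamespace("grf", quietly = TRUE)) {
  set.seed(1)
  n <- 1000
  X <- matrix(rnorm(n * 3), n, 3)          # baseline covariates
  W <- rbinom(n, 1, plogis(0.5 * X[, 1]))  # binary exposure, confounded by X1
  Y <- 0.6 * W + X[, 1] + rnorm(n)         # outcome; true effect is 0.6

  cf <- grf::causal_forest(X, Y, W, num.trees = 500, seed = 1)

  # doubly robust average treatment effect:
  # a named vector with elements "estimate" and "std.err"
  ate <- grf::average_treatment_effect(cf)
  print(ate)
}
```

The whole block is guarded by requireNamespace() so that it is skipped quietly when grf is not installed.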
Screening for Heterogeneity
# identify which outcomes show treatment effect heterogeneity
heterogeneity_results <- margot_rate(results)
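One common way to screen for heterogeneity with grf is the rank-weighted average treatment effect (RATE). Assuming, from its name, that margot_rate() builds on this idea, the sketch below shows the underlying grf calls on simulated data. The sample split, variable names, and effect pattern are assumptions made for illustration only.

```r
# a minimal RATE sketch with simulated data; calls grf directly, not margot
rate <- NULL
if (requireNamespace("grf", quietly = TRUE)) {
  set.seed(2)
  n <- 2000
  X <- matrix(rnorm(n * 3), n, 3)
  W <- rbinom(n, 1, 0.5)             # randomised binary exposure
  tau <- pmax(X[, 1], 0)             # effect varies with X1
  Y <- tau * W + X[, 2] + rnorm(n)

  # split so priorities are learned on data not used for evaluation
  train <- sample(n, n / 2)
  cf_train <- grf::causal_forest(X[train, ], Y[train], W[train], seed = 2)
  cf_eval <- grf::causal_forest(X[-train, ], Y[-train], W[-train], seed = 2)

  # rank evaluation units by the effect predicted from the training forest
  priorities <- predict(cf_train, X[-train, ])$predictions

  rate <- grf::rank_average_treatment_effect(
    cf_eval, priorities, target = "AUTOC"
  )

  # rough 95% interval; an interval excluding zero suggests heterogeneity
  print(rate$estimate + c(-1.96, 1.96) * rate$std.err)
}
```

Learning the ranking on one half of the sample and evaluating it on the other avoids rewarding a ranking for fitting noise in the data it is scored on.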
Policy Learning
# learn optimal treatment policies
policy_results <- margot_policy(
  results,
  outcomes = outcome_vars,
  baseline_vars = baseline_vars
)
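As a rough picture of what policy learning produces, the sketch below fits a shallow policy tree with the policytree package on simulated data. It is not margot_policy()'s implementation. The data, the column names x1 to x3, and the choice of a depth-1 tree are assumptions made to keep the example small.

```r
# a minimal policy-learning sketch; calls grf and policytree directly
tree <- NULL
if (requireNamespace("grf", quietly = TRUE) &&
    requireNamespace("policytree", quietly = TRUE)) {
  set.seed(3)
  n <- 1000
  X <- matrix(rnorm(n * 3), n, 3)
  colnames(X) <- c("x1", "x2", "x3")
  W <- rbinom(n, 1, 0.5)
  Y <- X[, 1] * W + rnorm(n)  # treatment helps when x1 > 0, harms otherwise

  cf <- grf::causal_forest(X, Y, W, seed = 3)

  # doubly robust reward estimates: one column per action (control, treated)
  gamma <- policytree::double_robust_scores(cf)

  # depth-1 tree: a single split, with one action assigned to each side
  tree <- policytree::policy_tree(X, gamma, depth = 1)
  print(tree)

  # recommended action per unit: 1 = control, 2 = treated
  actions <- predict(tree, X)
  print(table(actions))
}
```

A shallow tree trades some expected reward for a rule that is easy to read and report.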
Stage 3: Interpretation and Visualisation
Visualising Treatment Effects
# plot average treatment effects (requires ggplot2)
margot_plot(
  results,
  type = "effects",
  title = "Average Treatment Effects of Forgiveness"
)
# create table output (requires gt)
margot_plot(
  results,
  type = "table",
  format = "publication"
)
Understanding Heterogeneity
# visualise policy trees
margot_plot_policy_tree(
  policy_results,
  outcome = "hours_exercise"
)
# plot qini curves
margot_plot_qini(
  policy_results,
  outcome = "hours_exercise"
)
Working with Missing Packages
If an optional package is not installed, margot stops with an error message that names the missing package and shows how to install it:
# example: trying to use causal forest without grf installed
# margot_causal_forest(wide_data)
# Error: Package 'grf' is required for margot_causal_forest() (causal forest estimation).
# Install it with: install.packages('grf')
# For all estimation packages: install.packages('margot.models')
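A check of this kind is usually built on requireNamespace(). The sketch below shows one way such a guard could be written. The helper name check_pkg() and its arguments are hypothetical, and this is not margot's internal code.

```r
# hypothetical helper, not margot's internal code
check_pkg <- function(pkg, purpose) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for ", purpose, ".\n",
      "Install it with: install.packages('", pkg, "')",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# use a package name that should not exist, to trigger the error path
msg <- tryCatch(
  check_pkg("notARealPackage123", "this example"),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

Calling the guard at the top of a function means the user sees an actionable message before any computation starts.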
Simulating Data for Testing
The margot_simulate() function allows you to generate synthetic longitudinal data with known treatment effects:
# simulate data with known treatment effect
sim_data <- margot_simulate(
  n = 500,                      # 500 individuals
  waves = 3,                    # 3 time points
  p_covars = 2,                 # 2 time-varying covariates
  exposure_outcome = 0.6,       # true treatment effect
  positivity = "good",          # well-behaved propensity scores
  outcome_type = "continuous",  # continuous outcomes
  wide = TRUE,                  # return wide format
  seed = 123                    # for reproducibility
)
# simulate with treatment feedback and censoring
complex_sim <- margot_simulate(
  n = 1000,
  waves = 5,
  y_feedback = 0.5,      # past outcome affects future treatment
  covar_feedback = 0.3,  # treatment affects future covariates
  censoring = list(
    rate = 0.2,
    exposure_dependence = TRUE  # censoring depends on treatment
  ),
  seed = 456
)
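The value of simulating with a known effect is that an analysis can be checked against the truth. The base R sketch below illustrates the principle with a hand-rolled data-generating process. It does not use margot_simulate(), and the variable names (t0_x, t1_a, t2_y) and coefficients are assumptions chosen for the example.

```r
set.seed(123)
n <- 10000

# baseline confounder, binary exposure at wave 1, outcome at wave 2
t0_x <- rnorm(n)
t1_a <- rbinom(n, 1, plogis(0.5 * t0_x))    # exposure depends on confounder
t2_y <- 0.6 * t1_a + 0.8 * t0_x + rnorm(n)  # true exposure effect is 0.6

sim <- data.frame(t0_x, t1_a, t2_y)

# adjusting for the confounder should recover roughly 0.6
fit <- lm(t2_y ~ t1_a + t0_x, data = sim)
est <- unname(coef(fit)["t1_a"])

# omitting the confounder biases the estimate upward
naive <- unname(coef(lm(t2_y ~ t1_a, data = sim))["t1_a"])

round(c(adjusted = est, naive = naive), 2)
```

Comparing the adjusted and naive estimates against the known effect shows at a glance whether an estimator handles confounding as intended.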
Example Analysis
Here’s a minimal example to get started:
# load packages
library(margot)
library(dplyr)
# prepare data
data(df_nz)
# define variables
baseline_vars <- c("male", "age", "partner")
exposure_var <- "forgiveness"
outcome_vars <- c("hours_exercise")
# run complete pipeline
wide_data <- margot_wide_machine(
  df_nz, baseline_vars, exposure_var, outcome_vars
)
# estimate effects (requires grf)
if (requireNamespace("grf", quietly = TRUE)) {
  results <- margot_causal_forest(
    wide_data, exposure_var, outcome_vars, baseline_vars
  )
  # extract and view the average treatment effect
  ate <- results$ate
  cat("Average Treatment Effect:", round(ate$estimate, 3), "\n")
  cat("95% CI: [", round(ate$ci_lower, 3), ",", round(ate$ci_upper, 3), "]\n")
  # visualise if ggplot2 is available
  if (requireNamespace("ggplot2", quietly = TRUE)) {
    margot_plot(results, type = "effects")
  }
} else {
  message("Install the 'grf' package to run causal forest analysis")
}
Further Resources
- Package documentation: https://go-bayes.github.io/margot/
- GitHub repository: https://github.com/go-bayes/margot
- Course materials: https://go-bayes.github.io/psych-434-2025/