Simulate longitudinal exposures, outcomes, and covariates

`margot_simulate()` draws baseline covariates (`B`), time-varying covariates (`L`), exposures (`A`), and lead outcomes (`Y`) for a synthetic panel study. Monotone attrition can depend on past exposure and/or a latent shared frailty, and an optional indicator column records whether each unit remains uncensored at every wave. The simulator supports heterogeneous treatment effects, feedback from previous outcomes into future exposures, and marginal item missingness.

Usage

margot_simulate(
  n,
  waves,
  exposures = NULL,
  outcomes = NULL,
  p_covars = 20,
  censoring = list(rate = 0.25),
  item_missing_rate = 0,
  exposure_outcome = 0.5,
  y_feedback = 0.4,
  positivity = "good",
  outcome_type = NULL,
  wide = TRUE,
  seed = NULL,
  params = list(),
  ...
)

Arguments

n: Number of individuals.
waves: Number of follow-up waves (outcomes are produced for wave waves + 1).
exposures: Named list describing each exposure. Every element must contain a type field (`"binary"` or `"normal"`). Optional elements: het (baseline modifiers) and lag_Y = TRUE to enable outcome-to-exposure feedback.
outcomes: Named list describing outcomes. Defaults to a single normal outcome called `"Y"`.
p_covars: Number of baseline (`B`) covariates.
censoring: List controlling attrition. Must include rate; optional logical flags exposure_dependence, latent_dependence, numeric latent_rho, and logical indicator to append tX_not_censored columns.
item_missing_rate: MCAR probability an observed value is replaced by NA.
exposure_outcome: Coefficient for the exposure → outcome path.
y_feedback: Coefficient for lagged outcome feedback when an exposure lists lag_Y = TRUE.
positivity: `"good"`, `"poor"`, or a numeric probability in (0, 1) governing baseline exposure prevalence.
outcome_type: Shortcut for a single outcome: `"continuous"` (default) or `"binary"`. Ignored when outcomes is supplied.
wide: If TRUE (default) return a wide data set; otherwise long format.
seed: Integer seed for reproducibility.
params: Named list of scalar coefficients (see above).
...: Deprecated arguments; ignored with a warning.

Value

A `tibble` in wide or long form containing baseline `B` variables, time-varying `L` covariates, `A` exposures, optional censoring indicators, and lead outcomes `Y`. The object carries an attribute "margot_meta" with the matched call and a timestamp.

Details

The default parameter set is


 .default_sim_params()
 ## $cens_exp_coef   0.4
 ## $cens_latent_rho 0.5
 ## $exp_intercept   -0.2
 ## $exp_L1_coef     0.2
 ## $out_B1_coef     0.1

The `params` argument

Supply a named list to override the internal defaults given by .default_sim_params(). Typical entries include cens_exp_coef, exp_L1_coef, and out_B1_coef.

Examples

## basic usage
dat <- margot_simulate(n = 200, waves = 3, seed = 1)
dplyr::glimpse(dat)
#> Rows: 200
#> Columns: 35
#> $ id    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
#> $ B1    <dbl> 1.38717622, 0.30733801, 1.52385587, -1.45786586, -1.31218682, 0.…
#> $ B2    <dbl> 0.56867050, 0.32317764, -0.79634791, -0.84417680, -0.02831455, -…
#> $ B3    <dbl> 0.359327067, -0.498545661, 0.645835598, -1.284425798, 0.62591385…
#> $ B4    <dbl> 1.13120902, 2.15236607, 0.50850055, -1.05442367, -0.39867018, 0.…
#> $ B5    <dbl> -0.09360758, -0.65509314, -0.25503349, 0.02718249, 2.37682411, -…
#> $ B6    <dbl> 1.3358744996, -0.9704755209, 0.3616893118, -1.4063023965, -0.931…
#> $ B7    <dbl> -1.576558708, -0.088479569, 0.487995724, -2.264870579, 0.1313669…
#> $ B8    <dbl> 1.20071079, -0.59614574, -1.04053816, 0.47034678, 0.56500777, 0.…
#> $ B9    <dbl> 0.97922287, 0.42597734, -0.01367560, 0.58258505, -0.26852743, 2.…
#> $ B10   <dbl> 1.19511520, 0.23398869, -0.57308037, -0.95525308, -0.34721398, 1…
#> $ B11   <dbl> -0.61221854, -2.04242795, 1.02058737, -1.07656682, -1.29516511, …
#> $ B12   <dbl> 0.53291378, 0.08565599, 0.03440012, -2.29850687, -1.24441193, 1.…
#> $ B13   <dbl> -0.176066882, 0.742283929, 0.333243297, -1.440735602, 0.05931179…
#> $ B14   <dbl> 1.93000559, 0.05657933, 0.40769632, -1.33184913, 0.05518891, -1.…
#> $ B15   <dbl> -0.319607922, -0.445111940, 0.453195957, -2.113141884, 0.1863665…
#> $ B16   <dbl> -0.51130370, -1.25585264, 1.82817341, 1.61178682, -0.04679639, -…
#> $ B17   <dbl> 0.89312064, -0.01132154, 1.52172495, -0.22530871, -0.86537188, 0…
#> $ B18   <dbl> -0.073517558, -1.502434951, 1.923492831, -0.785495001, -0.452210…
#> $ B19   <dbl> -0.5040000, 1.3892949, 0.5235887, -1.2756173, -0.3453183, 0.9109…
#> $ B20   <dbl> -0.3947381, 0.2234016, 0.7777960, -1.3440720, -0.2789659, 1.1170…
#> $ t0_A1 <int> 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0…
#> $ t1_L1 <dbl> -1.47983426, 1.02834657, -2.22105108, NA, -1.63855763, NA, 0.357…
#> $ t1_L2 <dbl> -0.302092925, 0.517340829, -1.164086557, NA, 1.274679847, NA, -0…
#> $ t1_L3 <dbl> -0.107179673, -0.298918555, 0.309770555, NA, 0.254681415, NA, 0.…
#> $ t1_A1 <dbl> 0, 1, 0, NA, 1, NA, 1, 0, 1, 0, 0, NA, 1, NA, 1, NA, 1, 1, 1, 0,…
#> $ t2_L1 <dbl> NA, NA, -0.7394305, NA, -1.5425303, NA, 0.3851541, -1.4544780, -…
#> $ t2_L2 <dbl> NA, NA, 1.56739714, NA, -1.34127895, NA, -0.93517145, -0.0457230…
#> $ t2_L3 <dbl> NA, NA, 0.97422723, NA, 1.33379026, NA, 0.27279618, 1.96492237, …
#> $ t2_A1 <dbl> NA, NA, 0, NA, 1, NA, 0, 0, 0, 0, NA, NA, 1, NA, NA, NA, 0, 1, 1…
#> $ t3_L1 <dbl> NA, NA, 1.28184341, NA, -1.36408357, NA, -0.03276689, -1.1094694…
#> $ t3_L2 <dbl> NA, NA, 0.16440270, NA, 0.34927903, NA, 0.95314451, -2.22791423,…
#> $ t3_L3 <dbl> NA, NA, 1.9731395, NA, -0.6404094, NA, -0.5885797, 1.2987237, NA…
#> $ t3_A1 <dbl> NA, NA, 0, NA, 0, NA, 0, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, 0…
#> $ t4_Y  <dbl> NA, NA, 0.7151895, NA, NA, NA, 0.3641293, -1.8084884, NA, NA, NA…

## heterogeneous treatment effect with censoring indicator
dat2 <- margot_simulate(
  n = 800,
  waves = 4,
  exposures = list(
    A1 = list(
      type = "binary",
      het  = list(modifier = "B2", coef = 0.6)
    )
  ),
  censoring = list(
    rate = 0.25, exposure_dependence = TRUE,
    indicator = TRUE
  ),
  seed = 42
)