Transform longitudinal data to wide format with baseline imputation and optional NA indicators
Source: R/margot_wide_machine.R
margot_wide_machine.Rd
This function transforms longitudinal data from long format to wide format, ensuring that baseline measurements are correctly labeled and included. It handles multiple observations per subject across an arbitrary number of waves and lets you specify baseline variables, an exposure variable, outcome variables, and time-varying confounders.
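To make the long-to-wide idea concrete, here is a minimal base R sketch of the reshaping step on toy data. It is not the package's implementation: the data, the column names, and the `.t0`/`.t1` suffix naming are all assumptions chosen for illustration, and the names `margot_wide_machine()` actually produces may differ.

```r
# Toy long-format data: two subjects, three waves (all names hypothetical).
long <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  wave = c(2018, 2019, 2020, 2018, 2019, 2020),
  age  = c(30, 31, 32, 45, 46, 47),
  x    = c(0, 1, 1, 1, 0, 1),
  y    = c(2.5, 3.0, 3.5, 4.0, 4.5, 5.0)
)

# Index time from the first observed wave, so the baseline wave becomes "t0".
long$time <- paste0("t", match(long$wave, sort(unique(long$wave))) - 1)

# Reshape to one row per subject; measured columns become age.t0, x.t0, y.t0, ...
wide <- reshape(
  long[, c("id", "time", "age", "x", "y")],
  idvar = "id", timevar = "time", direction = "wide"
)

names(wide)
```

Each subject ends up on a single row, with one column per variable and time point.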
Usage
margot_wide_machine(
  .data,
  id = "id",
  wave = "wave",
  baseline_vars,
  exposure_var,
  outcome_vars,
  confounder_vars = NULL,
  imputation_method = c("median", "mice", "none"),
  include_exposure_var_baseline = TRUE,
  include_outcome_vars_baseline = TRUE,
  extend_baseline = FALSE,
  include_na_indicators = TRUE
)
Arguments
- .data
A data frame containing the longitudinal data in long format.
- id
The name of the ID column identifying subjects (default is "id").
- wave
The name of the wave/time column (default is "wave").
- baseline_vars
A character vector of baseline variable names to be included at t0.
- exposure_var
A character string specifying the name of the exposure variable to be tracked across time.
- outcome_vars
A character vector of outcome variable names to be tracked across time.
- confounder_vars
An optional character vector of time-varying confounder variable names to include without imputation (default is NULL).
- imputation_method
A character string specifying the imputation method to use for baseline variables. Options are "median" (default), "mice", or "none".
- include_exposure_var_baseline
Logical indicating whether to include the exposure variable at baseline (t0) (default TRUE).
- include_outcome_vars_baseline
Logical indicating whether to include outcome variables at baseline (t0) (default TRUE).
- extend_baseline
Logical indicating whether to include baseline_vars in all subsequent waves (default FALSE).
- include_na_indicators
Logical indicating whether to generate NA indicator columns for baseline variables (default TRUE).
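As a rough illustration of what median imputation combined with NA indicators means, the sketch below flags missing baseline values before filling them in. This is a base R approximation under stated assumptions, not the function's internal code: the column name `t0_income` and the `_na` suffix are hypothetical.

```r
# Toy baseline column with two missing values (names are hypothetical).
baseline <- data.frame(
  id        = 1:5,
  t0_income = c(10, NA, 30, 40, NA)
)

# Record missingness first, so the information survives imputation.
baseline$t0_income_na <- as.integer(is.na(baseline$t0_income))

# Replace missing values with the median of the observed values.
observed_median <- median(baseline$t0_income, na.rm = TRUE)
baseline$t0_income[is.na(baseline$t0_income)] <- observed_median

baseline
```

Creating the indicator before imputing matters: once the values are filled in, the original missingness pattern can no longer be recovered from the imputed column alone.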
Value
A wide-format data frame with one row per subject, with that subject's observations across time points spread over columns. Missing values in the baseline variables, and in the exposure and outcome variables at baseline when those are included, are imputed using imputation_method. NA indicators are created for variables at baseline only if include_na_indicators is TRUE. The exposure variable is tracked across waves but is not imputed beyond baseline. Outcome variables are included at the final wave, and also at baseline when include_outcome_vars_baseline is TRUE. Confounders, if supplied, are included without imputation.
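A call might look like the following sketch, which uses only the arguments documented above. The toy data frame and its column names (`age`, `male`, `religious_service`, `wellbeing`) are hypothetical, and the call is guarded so that it runs only if the margot package is installed. The output is printed with `str()` rather than asserted, since the exact column names of the result are not stated here.

```r
# Toy long-format data: four subjects, three waves (all names hypothetical).
long_df <- data.frame(
  id                = rep(1:4, each = 3),
  wave              = rep(c(2018, 2019, 2020), times = 4),
  age               = c(30, 31, 32, NA, 41, 42, 25, 26, 27, 52, 53, 54),
  male              = rep(c(0, 1, 1, 0), each = 3),
  religious_service = c(0, 1, 1, 2, 2, 0, 1, 1, 1, 0, 0, 3),
  wellbeing         = c(5.1, 5.4, 5.8, 4.0, NA, 4.4, 6.2, 6.0, 6.1, 3.9, 4.2, 4.6)
)

# Run only when the package is available; arguments mirror the Usage section.
if (requireNamespace("margot", quietly = TRUE)) {
  wide_df <- margot::margot_wide_machine(
    long_df,
    id = "id",
    wave = "wave",
    baseline_vars = c("age", "male"),
    exposure_var = "religious_service",
    outcome_vars = "wellbeing",
    imputation_method = "median",
    include_exposure_var_baseline = TRUE,
    include_outcome_vars_baseline = TRUE,
    include_na_indicators = TRUE
  )
  str(wide_df)  # inspect the resulting columns
}
```

Here subject 2 is missing `age` at the first wave, so with `imputation_method = "median"` that baseline value would be expected to be imputed and, with `include_na_indicators = TRUE`, flagged.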