This function transforms longitudinal data from long format to wide format, ensuring that baseline measurements are correctly labeled and included. It handles multiple observations per subject across an indefinite number of waves, and allows for the specification of baseline variables, exposure variables, outcome variables, and time-varying confounders.
Source: R/margot_wide_machine.R
Usage
margot_wide_machine(
  .data,
  id = "id",
  wave = "wave",
  baseline_vars,
  exposure_var,
  outcome_vars,
  confounder_vars = NULL,
  imputation_method = "median",
  include_exposure_var_baseline = TRUE,
  include_outcome_vars_baseline = TRUE
)
Arguments
- .data
A data frame containing the longitudinal data in long format.
- id
The name of the ID column identifying subjects (default is "id").
- wave
The name of the wave/time column (default is "wave").
- baseline_vars
A character vector of baseline variable names to be included at t0.
- exposure_var
A character string specifying the name of the exposure variable to be tracked across time.
- outcome_vars
A character vector of outcome variable names to be tracked across time.
- confounder_vars
An optional character vector of time-varying confounder variable names to include without imputation (default is NULL).
- imputation_method
A character string specifying the imputation method to use for baseline variables. Options are 'median' (default), 'mice', or 'none'.
- include_exposure_var_baseline
Logical indicating whether to include the exposure variable at baseline (t0).
- include_outcome_vars_baseline
Logical indicating whether to include outcome variables at baseline (t0).
Value
A wide-format data frame with one row per subject and each subject's observations across time points spread over columns. Missing values in baseline variables, and in the exposure and outcome variables at baseline (if included), are imputed according to `imputation_method`. NA indicators are created only for baseline variables that have missing values. The exposure variable is tracked across waves but is not imputed beyond baseline. Outcome variables appear at the final wave, and additionally at baseline when `include_outcome_vars_baseline` is `TRUE`. Confounders (if any) are included without imputation.
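To make the long-to-wide idea concrete, here is a minimal sketch using base R's `reshape()` on a hypothetical two-subject data set. It is not the function's implementation: the `_t` column suffix is an assumption made for illustration, and unlike `margot_wide_machine()` this sketch keeps every variable at every wave and performs no imputation.

```r
# Hypothetical long-format data: two subjects, three waves (0, 1, 2).
long <- data.frame(
  id           = rep(1:2, each = 3),
  wave         = rep(0:2, times = 2),
  treatment    = c(0, 1, 1, 1, 0, 1),
  health_score = c(10, 11, 12, 20, 21, 22)
)

# One row per subject; each variable gets one column per wave.
# The "_t" separator is an illustrative choice, not the package's naming scheme.
wide <- reshape(
  long,
  idvar     = "id",
  timevar   = "wave",
  direction = "wide",
  sep       = "_t"
)

names(wide)
```

The result has one row per `id`, with columns such as `treatment_t0` and `health_score_t2`.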
Details
Key functionalities:
- **Imputation at Baseline**: Missing values are imputed at baseline (`t0`) for:
  - `baseline_vars`
  - `exposure_var` (if `include_exposure_var_baseline = TRUE`)
  - `outcome_vars` (if `include_outcome_vars_baseline = TRUE`)

  Available methods:
  - 'median': For numeric variables, missing values are imputed with the median. For categorical variables, missing values are imputed with the mode.
  - 'mice': Multiple Imputation by Chained Equations is used. If MICE fails, the function falls back to median/mode imputation.
  - 'none': No imputation is performed.
- **NA Indicators**: NA indicator variables are created **only** for baseline variables that have missing values. Each such variable at baseline will have a corresponding NA indicator with the suffix `_na`.
- **Exposure Variables**: Tracked across waves but never imputed beyond baseline.
- **Outcome Variables**: Included only at the final wave unless specified to include at baseline.
- **Confounder Variables**: If specified, these are included across waves without any imputation. Missing values remain as `NA`.
- **Variable Inclusion per Wave**:
  - **Baseline (`t0`)**: Includes `baseline_vars`, exposure variables (if included), outcome variables (if included), and their NA indicators (only if necessary).
  - **Waves `t1` to `t(y_-2)`**: Include only the exposure variables (and confounders if specified).
  - **Final Wave (`t(y_-1)`)**: Includes only the outcome variables.
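The 'median' option and the `_na` indicator rule described above can be sketched in base R as follows. This is an illustrative approximation under stated assumptions, not the package's code: the helper names `mode_value()` and `impute_with_indicator()` and the toy `baseline` data frame are hypothetical, and the real function may order columns or break ties in the mode differently.

```r
# Hypothetical baseline (t0) slice; column names are illustrative only.
baseline <- data.frame(
  id        = 1:5,
  age       = c(30, NA, 50, 40, 20),
  education = factor(c("high", "low", NA, "high", "high")),
  income    = c(10, 20, 30, 40, 50)
)

# Most frequent non-missing value; ties resolve to the first entry in table() order.
mode_value <- function(x) {
  counts <- table(x, useNA = "no")
  names(counts)[which.max(counts)]
}

# Median/mode imputation with an "_na" indicator added only where values are missing.
impute_with_indicator <- function(df, vars) {
  for (v in vars) {
    missing <- is.na(df[[v]])
    if (!any(missing)) next  # complete variables get no indicator
    df[[paste0(v, "_na")]] <- as.integer(missing)
    if (is.numeric(df[[v]])) {
      df[[v]][missing] <- median(df[[v]], na.rm = TRUE)
    } else {
      df[[v]][missing] <- mode_value(df[[v]])
    }
  }
  df
}

baseline_imputed <- impute_with_indicator(baseline, c("age", "education", "income"))
```

In this toy data, `age` and `education` each gain an indicator column (`age_na`, `education_na`), while `income` has no missing values and therefore gets none.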
Examples
# Define variables
baseline_vars <- c("age", "education", "income")
exposure_var <- "treatment"
outcome_vars <- c("health_score", "quality_of_life")
confounder_vars <- c("stress_level", "exercise_frequency")
# Transform data to wide format with baseline imputation
df_wide_impute <- margot_wide_machine(
  data_long,
  id = "patient_id",
  wave = "visit_time",
  baseline_vars = baseline_vars,
  exposure_var = exposure_var,
  outcome_vars = outcome_vars,
  confounder_vars = confounder_vars,
  imputation_method = "mice",
  include_exposure_var_baseline = TRUE,
  include_outcome_vars_baseline = TRUE
)
#> ℹ Starting data transformation...
#> ℹ Pre-processing data...
#> Error: object 'data_long' not found
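The error above occurs because `data_long` is never defined in the example. A minimal sketch of a hypothetical `data_long` with the columns the call expects is shown below. All values are made up for illustration, and whether the function requires a particular type or coding for the wave column (here integer visits 0 to 2) is an assumption that should be checked against the package.

```r
# Hypothetical long-format data: 4 patients observed at 3 visits (0, 1, 2).
# expand.grid() varies patient_id fastest, so rows 1-4 are the baseline visit.
data_long <- expand.grid(patient_id = 1:4, visit_time = 0:2)

# Baseline variables
data_long$age       <- 40 + data_long$patient_id + data_long$visit_time
data_long$education <- factor(
  c("primary", "secondary", "tertiary", "secondary")[data_long$patient_id]
)
data_long$income    <- 1000 * data_long$patient_id

# Exposure variable
data_long$treatment <- as.integer((data_long$patient_id + data_long$visit_time) %% 2 == 0)

# Outcome variables
data_long$health_score    <- 50 + 2 * data_long$visit_time + data_long$patient_id
data_long$quality_of_life <- 5 + data_long$visit_time

# Time-varying confounders
data_long$stress_level       <- 3 - 0.5 * data_long$visit_time
data_long$exercise_frequency <- data_long$patient_id %% 3

# One missing baseline value (patient 2, visit 0) so imputation has something to do.
data_long$income[2] <- NA
```

With an object like this in the workspace, the call to `margot_wide_machine()` in the example has the `patient_id`, `visit_time`, baseline, exposure, outcome, and confounder columns it refers to.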