Impute Missing Values Using Carry Forward in Longitudinal Data
Source: R/margot_impute_carry_forward.R
Imputes missing values in longitudinal data by carrying forward previous observations up to a specified number of time points back. By default, it never imputes data for the final wave (end-of-study). Optionally, it can create indicator variables for imputed values.
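The carry-forward rule described above can be sketched in base R for a single participant's values across waves. This is an illustration only, not the package's implementation: `carry_forward_sketch`, `max_back`, and `impute_final` are hypothetical names, and the choice to look back only at originally observed values (rather than at values that were themselves imputed) is an assumption.

```r
# Hypothetical helper, not part of margot: carry values forward along one
# participant's waves, looking back at most `max_back` time points.
carry_forward_sketch <- function(x, max_back = 1, impute_final = FALSE) {
  observed <- x                            # assumption: only observed values are carried
  n <- length(x)
  last <- if (impute_final) n else n - 1   # skip the final wave unless asked
  for (i in seq_len(max(last, 0))) {
    if (i == 1 || !is.na(x[i])) next       # nothing earlier, or already observed
    for (k in seq_len(min(max_back, i - 1))) {
      if (!is.na(observed[i - k])) {
        x[i] <- observed[i - k]            # take the nearest earlier observed value
        break
      }
    }
  }
  x
}

carry_forward_sketch(c(1, NA, NA, 4), max_back = 1)  # 1  1 NA  4
carry_forward_sketch(c(1, NA, NA, 4), max_back = 2)  # 1  1  1  4
carry_forward_sketch(c(1, NA, NA), max_back = 2)     # 1  1 NA (final wave left alone)
```

With `max_back = 1` the third wave stays missing because the only value within reach is itself missing; widening the window to 2 lets the first wave's value fill it.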
Usage
margot_impute_carry_forward(
df_wide,
columns_to_impute,
max_carry_forward = 1,
time_point_prefixes = NULL,
time_point_regex = NULL,
require_one_observed = TRUE,
columns_no_future_required = NULL,
create_na_indicator = TRUE,
indicator_suffix = "_na",
indicator_as_suffix = TRUE,
verbose = TRUE,
impute_final_wave = FALSE
)
Arguments
- df_wide
A wide-format dataframe containing longitudinal data.
- columns_to_impute
Character vector of base column names to impute (without time prefixes).
- max_carry_forward
Maximum number of time points to look back for carrying forward values.
- time_point_prefixes
Optional vector of time point prefixes (e.g., c("t0", "t1", "t2")).
- time_point_regex
Optional regex pattern to identify time points. Overrides time_point_prefixes if provided.
- require_one_observed
Logical. If TRUE, only impute if at least one value is observed in a following wave.
- columns_no_future_required
Character vector of columns that do not require future observations for imputation. Defaults to all columns if require_one_observed = FALSE, or none if require_one_observed = TRUE.
- create_na_indicator
Logical. If TRUE, creates indicator variables for imputed values.
- indicator_suffix
Suffix to add to the original column name for the indicator variable (default is "_na").
- indicator_as_suffix
Logical. If TRUE, the indicator suffix is added as a suffix; if FALSE, it's added as a prefix.
- verbose
Logical. If TRUE, prints progress information.
- impute_final_wave
Logical. If FALSE (default), the final wave (end-of-study) is never imputed. If TRUE, the final wave can be imputed like other waves.
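To make the relationship between base column names and time point prefixes concrete, the following base R sketch shows one way wide column names such as t0_var1 could be split into a prefix and a base name. It assumes columns are named `<prefix>_<base>`, as in the example data on this page; the pattern used here is an assumption for illustration and is not necessarily the form that time_point_regex expects.

```r
# Column names following the <prefix>_<base> layout used in the example data
cols <- c("id", "t0_var1", "t1_var1", "t2_var1", "t0_var2", "t1_var2", "t2_var2")

# Assumed pattern: a "t" followed by digits, then an underscore
pattern <- "^(t[0-9]+)_.*$"

# Keep only time-varying columns and extract their prefixes
has_time <- grepl(pattern, cols)
prefixes <- unique(sub(pattern, "\\1", cols[has_time]))

# Order prefixes by wave number so that "t10" would sort after "t2"
prefixes <- prefixes[order(as.integer(sub("^t", "", prefixes)))]

# Rebuild the wave-ordered column names for one base variable
wave_cols <- paste0(prefixes, "_", "var1")

prefixes   # "t0" "t1" "t2"
wave_cols  # "t0_var1" "t1_var1" "t2_var1"
```

Ordering by the numeric part rather than alphabetically matters once a study has ten or more waves, since carry-forward depends on the waves being in time order.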
Examples
if (FALSE) { # \dontrun{
  # Example dataframe
  df <- data.frame(
    id = 1:5,
    t0_var1 = c(1, NA, 3, NA, 5),
    t1_var1 = c(NA, 2, NA, 4, NA),
    t2_var1 = c(1, NA, 3, NA, 5),
    t0_var2 = c(NA, 2, NA, 4, 5),
    t1_var2 = c(1, NA, 3, NA, NA),
    t2_var2 = c(NA, 2, NA, 4, 5),
    t2_end_of_study = c(1, 1, 1, 1, 1) # End-of-study indicator
  )

  # Impute missing values without imputing the final wave
  df_imputed <- margot_impute_carry_forward(
    df_wide = df,
    columns_to_impute = c("var1", "var2"),
    max_carry_forward = 2,
    create_na_indicator = TRUE,
    verbose = TRUE
  )

  # Impute missing values including the final wave
  df_imputed_final <- margot_impute_carry_forward(
    df_wide = df,
    columns_to_impute = c("var1", "var2"),
    max_carry_forward = 2,
    create_na_indicator = TRUE,
    impute_final_wave = TRUE,
    verbose = TRUE
  )
} # }