Imputes missing values in longitudinal data by carrying forward the most recent earlier observation, looking back at most a specified number of time points. By default, the final wave (end-of-study) is never imputed. Optionally, the function creates indicator variables that flag imputed values.

Usage

margot_impute_carry_forward(
  df_wide,
  columns_to_impute,
  max_carry_forward = 1,
  time_point_prefixes = NULL,
  time_point_regex = NULL,
  require_one_observed = TRUE,
  columns_no_future_required = NULL,
  create_na_indicator = TRUE,
  indicator_suffix = "_na",
  indicator_as_suffix = TRUE,
  verbose = TRUE,
  impute_final_wave = FALSE
)

Arguments

df_wide

A wide-format dataframe containing longitudinal data.

columns_to_impute

Character vector of base column names to impute (without time prefixes).

max_carry_forward

Maximum number of earlier time points to search for an observed value to carry forward. Defaults to 1.
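
The look-back limit can be illustrated on a single participant's values across waves. This is a minimal sketch in base R, not the package's implementation: `carry_forward_row()` and its `max_back` argument are hypothetical names, and the sketch assumes that only originally observed values are carried forward (imputed values are not chained).

```r
# Hypothetical helper, not part of the package: carry forward within one
# vector of repeated measures, looking back at most `max_back` positions.
carry_forward_row <- function(x, max_back = 1) {
  out <- x
  for (i in seq_along(x)) {
    if (is.na(x[i]) && i > 1) {
      # Candidate source positions, nearest wave first
      back <- seq(i - 1, max(1, i - max_back))
      # Assumption: only originally observed values in `x` are used
      observed <- back[!is.na(x[back])]
      if (length(observed) > 0) out[i] <- x[observed[1]]
    }
  }
  out
}

carry_forward_row(c(1, NA, NA, 4), max_back = 1)  # 1 1 NA 4
carry_forward_row(c(1, NA, NA, 4), max_back = 2)  # 1 1 1 4
```

With a look-back of 1 the third value stays missing because its immediate predecessor was itself missing; a look-back of 2 reaches the first wave.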

time_point_prefixes

Optional vector of time point prefixes (e.g., c("t0", "t1", "t2")).

time_point_regex

Optional regex pattern to identify time points. Overrides time_point_prefixes if provided.
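
As an illustration of how time points might be identified from wide-format column names, the following base R sketch extracts prefixes with a regular expression. The pattern and the `t<number>_<name>` naming scheme are assumptions taken from the example data; how the function itself interprets `time_point_regex` is not specified here.

```r
# Column names following the assumed "<prefix>_<base name>" scheme
cols <- c("id", "t0_var1", "t1_var1", "t2_var1", "t0_var2")

# Assumed pattern: a leading "t" plus digits, then an underscore
pattern <- "^(t[0-9]+)_.*$"

matched  <- grepl(pattern, cols)
prefixes <- unique(sub(pattern, "\\1", cols[matched]))

# Order prefixes by their numeric part so that t10 sorts after t2
prefixes <- prefixes[order(as.integer(sub("^t", "", prefixes)))]

# Base column names, as would be passed to columns_to_impute
base_names <- unique(sub("^t[0-9]+_", "", cols[matched]))

prefixes    # "t0" "t1" "t2"
base_names  # "var1" "var2"
```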

require_one_observed

Logical. If TRUE (default), a missing value is imputed only if at least one value is observed in a later wave.
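
The idea behind this condition can be sketched as a check for any observed value after a given wave. `has_later_observation()` is a hypothetical helper, and the sketch assumes the check is made on the same variable; the description above does not say whether the function looks at the same variable or at other columns in later waves.

```r
# Hypothetical helper: is any value observed after position `i`?
has_later_observation <- function(x, i) {
  if (i >= length(x)) return(FALSE)  # no later waves exist
  any(!is.na(x[(i + 1):length(x)]))
}

has_later_observation(c(1, NA, 3), 2)   # TRUE: wave 3 is observed
has_later_observation(c(1, NA, NA), 2)  # FALSE: nothing observed later
```

Under this reading, a participant with no later observations would keep their missing values rather than have an earlier value carried forward.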

columns_no_future_required

Character vector of columns that do not require future observations for imputation. Defaults to all columns if require_one_observed = FALSE, or none if require_one_observed = TRUE.

create_na_indicator

Logical. If TRUE, creates indicator variables for imputed values.

indicator_suffix

Suffix to add to the original column name for the indicator variable (default is "_na").

indicator_as_suffix

Logical. If TRUE (default), indicator_suffix is appended to the column name; if FALSE, it is prepended as a prefix.
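
The two naming options can be sketched with plain string concatenation. `indicator_name()` is a hypothetical helper, and the exact names the function produces are an assumption, so check the column names of the returned dataframe before relying on them.

```r
# Hypothetical helper: build an indicator column name from a data column name
indicator_name <- function(col, affix = "_na", as_suffix = TRUE) {
  if (as_suffix) paste0(col, affix) else paste0(affix, col)
}

indicator_name("t1_var1")                                     # "t1_var1_na"
indicator_name("t1_var1", affix = "na_", as_suffix = FALSE)   # "na_t1_var1"
```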

verbose

Logical. If TRUE, prints progress information.

impute_final_wave

Logical. If FALSE (default), the final wave (end-of-study) is never imputed. If TRUE, the final wave can be imputed like other waves.
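
A minimal sketch of which waves are candidates for imputation under each setting follows. `eligible_waves()` is a hypothetical helper; it assumes the first wave is never a target because no earlier value exists to carry forward.

```r
# Hypothetical helper: waves whose missing values may be filled
eligible_waves <- function(waves, impute_final_wave = FALSE) {
  if (length(waves) < 2) return(character(0))
  targets <- waves[-1]  # the first wave has nothing to carry forward from
  if (!impute_final_wave) targets <- targets[-length(targets)]
  targets
}

eligible_waves(c("t0", "t1", "t2"))                            # "t1"
eligible_waves(c("t0", "t1", "t2"), impute_final_wave = TRUE)  # "t1" "t2"
```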

Value

A dataframe with imputed values and optional indicator variables.

Examples

if (FALSE) { # \dontrun{
# Example dataframe
df <- data.frame(
  id = 1:5,
  t0_var1 = c(1, NA, 3, NA, 5),
  t1_var1 = c(NA, 2, NA, 4, NA),
  t2_var1 = c(1, NA, 3, NA, 5),
  t0_var2 = c(NA, 2, NA, 4, 5),
  t1_var2 = c(1, NA, 3, NA, NA),
  t2_var2 = c(NA, 2, NA, 4, 5),
  t2_end_of_study = c(1, 1, 1, 1, 1)  # End-of-study indicator
)

# Impute missing values without imputing the final wave
df_imputed <- margot_impute_carry_forward(
  df_wide = df,
  columns_to_impute = c("var1", "var2"),
  max_carry_forward = 2,
  create_na_indicator = TRUE,
  verbose = TRUE
)

# Impute missing values including the final wave
df_imputed_final <- margot_impute_carry_forward(
  df_wide = df,
  columns_to_impute = c("var1", "var2"),
  max_carry_forward = 2,
  create_na_indicator = TRUE,
  impute_final_wave = TRUE,
  verbose = TRUE
)
} # }