
This function processes wide-format longitudinal data with multiple time points:

- For each wave t < final wave:
  - If wave t+1 **has exposure columns**, a participant remains "not lost" at wave t only if *all* exposures at wave t+1 are present (no missing). Otherwise, they are censored at wave t.
  - If wave t+1 **has no exposures** (i.e., final wave is purely outcomes), we require *all* final-wave outcomes to be present. If *any* final-wave outcome is missing, the participant is censored from wave t onward.

Censoring sets all future waves to `NA`, and once censored, participants remain censored.

Usage

.strict_exposure_outcome_censoring(
  df_wide,
  exposure_vars,
  ordinal_columns = NULL,
  continuous_columns_keep = NULL,
  scale_exposure = FALSE,
  not_lost_in_following_wave = "not_lost_following_wave",
  lost_in_following_wave = "lost_following_wave",
  remove_selected_columns = TRUE,
  time_point_prefixes = NULL,
  time_point_regex = NULL,
  save_observed_y = FALSE
)

Arguments

df_wide

A wide-format dataframe with columns like t0_X, t1_X, t2_X, etc.

exposure_vars

Character vector of all exposure names (e.g. c("aaron_antagonism", "aaron_disinhibition", ...)).

ordinal_columns

Character vector of ordinal (factor) variables to be dummy-coded.

continuous_columns_keep

Numeric columns you do NOT want to scale (e.g. if they must remain in original units).

scale_exposure

If FALSE, do not scale exposures; if TRUE, exposures are also scaled.

not_lost_in_following_wave

Name for the "not lost" indicator (default "not_lost_following_wave").

lost_in_following_wave

Name for the "lost" indicator (default "lost_following_wave").

remove_selected_columns

If TRUE, remove original columns after dummy-coding ordinal columns.

time_point_prefixes

Optional vector of wave prefixes (like c("t0","t1","t2")); if NULL, we auto-detect via regex.

time_point_regex

Regex used to detect wave prefixes if `time_point_prefixes` is NULL.

save_observed_y

If FALSE, a participant with any missing final-wave outcome has all final-wave outcomes set to NA. If TRUE, partially observed final-wave outcomes are kept.
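When `time_point_prefixes` is NULL, wave prefixes are detected from the column names via `time_point_regex`. The default pattern is not stated here, so the following is only a minimal base R sketch of how such detection could work, assuming prefixes of the form `t<digits>_`; `col_names` and `wave_regex` are illustrative names, not part of the function.

```r
# Toy column names; "id" has no wave prefix and should be ignored.
col_names <- c("id", "t0_x", "t0_y", "t1_x", "t2_y", "t10_y")

# Assumed pattern: "t" followed by digits, then an underscore.
wave_regex <- "^(t[0-9]+)_.*$"

# Keep only matching columns, then extract the captured prefix.
matched  <- grepl(wave_regex, col_names)
prefixes <- unique(sub(wave_regex, "\\1", col_names[matched]))

# Order numerically so that "t10" sorts after "t2".
prefixes <- prefixes[order(as.integer(sub("^t", "", prefixes)))]
prefixes
```

Passing `time_point_prefixes` explicitly avoids any dependence on the pattern when column names do not follow a single convention.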

Value

A processed dataframe, with strict all-or-nothing censoring on exposures in earlier waves, and outcome-based censoring for the final wave if it lacks exposures.

Details

**Core Logic** With T waves indexed 0 to T-1, for each wave t from 0 to T-2 (i.e., up to the penultimate wave):


  needed_exposures <- paste0("t", t+1, "_", exposure_vars)
  not_lost[t] = 1 if rowSums(!is.na(df_wide[, needed_exposures])) == length(needed_exposures)
               else 0

  if not_lost[t] = 0, set waves t+1..T-1 to NA

If wave t+1 is the final wave and it has no exposures, we fall back to the final wave's outcome columns. Then "not_lost[t] = 1 if *all* final-wave outcomes are present, else 0".

This is a "strict" approach: if *any* exposure is missing at wave t+1, we censor from wave t onward.
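The rule described in this section can be illustrated on a toy data frame. This is a hedged base R sketch, not the function's actual implementation: the data, the `outcome_vars` vector, the `alive` tracker, and the `<wave>_not_lost_following_wave` column naming are all assumptions made for the example.

```r
# Toy wide data: three waves (t0, t1, t2); the final wave holds only an outcome.
df <- data.frame(
  id   = 1:4,
  t0_a = c(1, 2, 3, 4),
  t0_b = c(1, 2, 3, 4),
  t1_a = c(1, NA, 3, 4),   # participant 2 is missing an exposure at t1
  t1_b = c(1, 2, 3, 4),
  t2_y = c(5, 6, NA, 8)    # participant 3 is missing the final outcome
)
exposure_vars <- c("a", "b")
outcome_vars  <- "y"       # hypothetical; not an argument of the documented function
waves <- c("t0", "t1", "t2")

alive <- rep(TRUE, nrow(df))  # becomes FALSE once a participant is censored

for (i in seq_len(length(waves) - 1)) {
  next_wave <- waves[i + 1]
  needed <- paste0(next_wave, "_", exposure_vars)
  # Fall back to outcomes when the next wave has no exposure columns.
  if (!any(needed %in% names(df))) {
    needed <- paste0(next_wave, "_", outcome_vars)
  }
  # Strict rule: every needed column must be observed.
  complete_next <- rowSums(!is.na(df[, needed, drop = FALSE])) == length(needed)
  alive <- alive & complete_next
  df[[paste0(waves[i], "_not_lost_following_wave")]] <- as.integer(alive)

  # For censored rows, set every column of the later waves to NA.
  later <- paste(waves[(i + 1):length(waves)], collapse = "|")
  future_cols <- grep(paste0("^(", later, ")_"), names(df), value = TRUE)
  df[!alive, future_cols] <- NA
}
df
```

In this sketch participant 2 is censored at t0 because `t1_a` is missing, so their t1 and t2 columns become `NA`; participant 3 is censored at t1 through the outcome fallback. Because `alive` is carried forward with `&`, a censored participant stays censored in every later wave.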