Strict All-or-Nothing Censoring for Longitudinal Data
Source: R/helpers.R
dot-strict_exposure_outcome_censoring.Rd
This function processes wide-format longitudinal data with multiple time points:
- For each wave t before the final wave:
  - If wave t+1 **has exposure columns**, a participant remains "not lost" at wave t only if *all* exposures at wave t+1 are present (no missing values). Otherwise, they are censored at wave t.
  - If wave t+1 **has no exposures** (i.e., the final wave is purely outcomes), *all* final-wave outcomes must be present. If *any* final-wave outcome is missing, the participant is censored from wave t onward.
Censoring sets all future waves to `NA`, and once censored, participants remain censored.
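The all-or-nothing rule for a single wave can be illustrated in base R. This is a minimal sketch, not the function's actual implementation; the toy data frame `df` and the exposure names `x` and `z` are hypothetical.

```r
# Toy wide data: two hypothetical exposures (x, z) measured at wave t1.
df <- data.frame(
  t0_x = c(1, 2, 3),
  t1_x = c(1, NA, 3),
  t1_z = c(5, 6, NA)
)

# Exposure columns required at the following wave (t1).
needed <- c("t1_x", "t1_z")

# 1 only if every required exposure is observed, otherwise 0.
n_observed <- rowSums(!is.na(df[, needed, drop = FALSE]))
t0_not_lost <- as.integer(n_observed == length(needed))
t0_not_lost
```

Only the first participant has both wave-1 exposures observed, so only that row keeps an indicator of 1; a single missing exposure is enough to censor the other two.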
Usage
.strict_exposure_outcome_censoring(
df_wide,
exposure_vars,
ordinal_columns = NULL,
continuous_columns_keep = NULL,
scale_exposure = FALSE,
not_lost_in_following_wave = "not_lost_following_wave",
lost_in_following_wave = "lost_following_wave",
remove_selected_columns = TRUE,
time_point_prefixes = NULL,
time_point_regex = NULL,
save_observed_y = FALSE
)
Arguments
- df_wide
A wide-format dataframe with columns like t0_X, t1_X, t2_X, etc.
- exposure_vars
Character vector of all exposure names (e.g. c("aaron_antagonism", "aaron_disinhibition", ...)).
- ordinal_columns
Character vector of ordinal (factor) variables to be dummy-coded.
- continuous_columns_keep
Numeric columns you do NOT want to scale (e.g. if they must remain in original units).
- scale_exposure
If FALSE, do not scale exposures; if TRUE, exposures are also scaled.
- not_lost_in_following_wave
Name for the "not lost" indicator (default "not_lost_following_wave").
- lost_in_following_wave
Name for the "lost" indicator (default "lost_following_wave").
- remove_selected_columns
If TRUE, remove original columns after dummy-coding ordinal columns.
- time_point_prefixes
Optional vector of wave prefixes (like c("t0","t1","t2")); if NULL, we auto-detect via regex.
- time_point_regex
Regex used to detect wave prefixes if `time_point_prefixes` is NULL.
- save_observed_y
If FALSE, final-wave outcomes are set to NA for participants who are missing any of them. If TRUE, partially observed final-wave outcomes are kept.
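When `time_point_prefixes` is NULL, the wave prefixes are detected from the column names by regex. The sketch below shows one way such detection could work in base R. The pattern `wave_regex` and the column names are assumptions made for illustration; the function's actual default regex is not stated here.

```r
# Hypothetical column names in the t<wave>_<variable> layout.
cols <- c("id", "t0_x", "t10_y", "t1_x", "t2_y")

# Assumed pattern: "t" followed by digits, then an underscore.
wave_regex <- "^(t[0-9]+)_.*$"

# Keep only columns that carry a wave prefix, then extract the prefix.
matched <- grepl(wave_regex, cols)
prefixes <- unique(sub(wave_regex, "\\1", cols[matched]))

# Sort numerically so that t10 comes after t2 rather than after t1.
prefixes <- prefixes[order(as.integer(sub("^t", "", prefixes)))]
prefixes
```

Sorting on the numeric part matters once there are ten or more waves, because a plain alphabetical sort would place `t10` before `t2`.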
Value
A processed dataframe, with strict all-or-nothing censoring on exposures in earlier waves, and outcome-based censoring for the final wave if it lacks exposures.
Details
**Core Logic** For wave t from 0 to T-2, where T is the number of waves (i.e., up to the penultimate wave):
needed_exposures <- paste0("t", t + 1, "_", exposure_vars)
not_lost[t] = 1 if rowSums(!is.na(df_wide[, needed_exposures])) == length(needed_exposures)
else 0
if not_lost[t] == 0, set waves t+1..T-1 to NA
If wave t+1 is the final wave and it has no exposures, we fall back to the final wave's outcome columns: not_lost[t] = 1 if *all* final-wave outcomes are present, else 0.
This is a "strict" approach: if *any* exposure is missing at wave t+1, we censor from wave t onward.
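The core logic, including the outcome fallback and the rule that censoring is permanent, can be sketched as a small base R function. This is an illustration under stated assumptions, not the package's code: `strict_censor_sketch`, its `outcome_vars` argument, and the wave-prefixed indicator name (e.g. `t0_not_lost_following_wave`) are hypothetical, and scaling and dummy-coding are omitted.

```r
# Hypothetical helper: strict all-or-nothing censoring on wide data.
strict_censor_sketch <- function(df, prefixes, exposure_vars, outcome_vars) {
  n_waves <- length(prefixes)
  censored <- rep(FALSE, nrow(df))  # sticky: once TRUE, stays TRUE

  for (i in seq_len(n_waves - 1)) {
    next_prefix <- prefixes[i + 1]
    needed <- intersect(paste0(next_prefix, "_", exposure_vars), names(df))

    # Fall back to outcomes when the final wave has no exposure columns.
    if (length(needed) == 0 && i + 1 == n_waves) {
      needed <- intersect(paste0(next_prefix, "_", outcome_vars), names(df))
    }

    # A row is complete only if every required column is observed.
    complete_next <- rowSums(!is.na(df[, needed, drop = FALSE])) == length(needed)
    censored <- censored | !complete_next

    # Assumed naming: indicator column prefixed by the current wave.
    df[[paste0(prefixes[i], "_not_lost_following_wave")]] <- as.integer(!censored)

    # Set every column of the following waves to NA for censored rows.
    future_pattern <- paste0(
      "^(", paste(prefixes[(i + 1):n_waves], collapse = "|"), ")_"
    )
    future_cols <- grep(future_pattern, names(df), value = TRUE)
    df[censored, future_cols] <- NA
  }
  df
}

# Toy data: exposure x at t0 and t1, outcome y only at the final wave t2.
toy <- data.frame(
  t0_x = c(1, 1, 1, 1),
  t1_x = c(2, NA, 2, 2),
  t2_y = c(3, 3, NA, 3)
)
out <- strict_censor_sketch(toy, c("t0", "t1", "t2"), "x", "y")
out
```

In this toy example the second participant is missing the wave-1 exposure and is censored at wave 0, so their wave-2 outcome is set to NA even though it was observed. The third participant is censored at wave 1 through the outcome fallback.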