
This function performs naive cross-sectional linear regressions of multiple outcome variables on a single exposure variable, ignoring potential confounders. It produces output compatible with margot_plot() to demonstrate what happens when proper causal inference methods are not used. The results should be interpreted as estimates from misspecified models that do not account for confounding.

Usage

margot_naive_regressions(
  data,
  exposure_var,
  outcome_vars,
  baseline_vars = NULL,
  scale = c("RD", "RR"),
  delta = 1,
  sd = 1,
  coefficient_scale = 1,
  save_output = FALSE,
  save_path = here::here("push_mods"),
  base_filename = "naive_regressions_output",
  use_timestamp = FALSE,
  prefix = NULL
)

Arguments

data

A data frame containing all necessary variables.

exposure_var

A character string specifying the exposure variable name.

outcome_vars

A character vector of outcome variable names to be modeled.

baseline_vars

Optional character vector of baseline variables to include as covariates in the regression models. Default is NULL (no additional covariates).

scale

Character string specifying the scale for E-value calculation. Options are "RD" (risk difference, default) or "RR" (risk ratio).

delta

The hypothesised increase in outcome for RD scale E-value calculations. Default value is 1.

sd

The standard deviation of the outcome for RD scale E-value calculations. Default value is 1.

coefficient_scale

Numeric value to scale coefficients by. Default is 1 (no scaling). Use this to interpret effects for multi-unit changes (e.g., set to 4 to get effects for a 4-unit change in the exposure variable).

save_output

Logical, whether to save the complete output. Default is FALSE.

save_path

The directory path to save the output. Default is here::here("push_mods"), that is, a "push_mods" directory under the project root as resolved by here::here().

base_filename

The base filename for saving the output. Default is "naive_regressions_output".

use_timestamp

Logical, whether to include a timestamp in the filename. Default is FALSE.

prefix

Optional prefix to add to the saved output filename. Default is NULL.
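As a rough illustration of how delta and sd could enter an RD-scale E-value, the sketch below uses the approximation of VanderWeele and Ding, which converts a standardised difference d into an approximate risk ratio exp(0.91 * d). Whether margot_naive_regressions() computes its E-values in exactly this way is an assumption; the helper name evalue_rd() and the input values are hypothetical.

```r
# Hypothetical helper: approximate E-value for a linear-regression estimate.
# Not taken from the package source; shown only to illustrate the arguments.
evalue_rd <- function(est, delta = 1, sd = 1) {
  d  <- est * delta / sd        # standardised effect size
  rr <- exp(0.91 * abs(d))      # approximate risk ratio; abs() keeps rr >= 1
  rr + sqrt(rr * (rr - 1))      # E-value formula for a risk ratio >= 1
}

# A larger outcome sd shrinks the standardised effect and hence the E-value.
evalue_rd(est = 0.3)
evalue_rd(est = 0.3, sd = 2)
```

Under this approximation an estimate of zero gives an E-value of 1, meaning no unmeasured confounding is needed to explain the association away.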

Value

A list containing:

models

A list of lm() model objects for each outcome.

combined_table

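A data frame, compatible with margot_plot(), containing the estimate column E[Y|A], the confidence interval bounds (the lower bound is the 2.5 percentile), and the E-value columns, including E_Val_bound.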

individual_results

A list of individual regression summaries for each outcome.

Details

This function fits linear regressions of the form: outcome ~ exposure + baseline_vars. It calculates confidence intervals and E-values for each exposure coefficient. The output uses "E[Y|A]" notation to indicate that these are conditional expectations from naive regressions, not causal effects. The E-values are technically incorrect, because an E-value assumes a causal interpretation of the coefficient.
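The model form described above can be sketched in base R. This is a minimal illustration, not the function's actual implementation: the simulated data, the variable names (treatment, age, outcome1_z, outcome2_z), and the helper fit_one() are all hypothetical, and multiplying the estimate and its interval by coefficient_scale is an assumption about how the scaling is applied.

```r
# Simulated data; all variable names here are illustrative assumptions.
set.seed(1)
n <- 200
my_data <- data.frame(
  treatment = rnorm(n),
  age       = rnorm(n, mean = 40, sd = 10)
)
my_data$outcome1_z <- 0.3 * my_data$treatment + rnorm(n)
my_data$outcome2_z <- -0.2 * my_data$treatment + rnorm(n)

exposure_var      <- "treatment"
outcome_vars      <- c("outcome1_z", "outcome2_z")
baseline_vars     <- "age"  # set to NULL for the unadjusted model
coefficient_scale <- 1      # assumed to multiply the estimate and its CI

# Hypothetical helper: fit outcome ~ exposure + baseline_vars for one outcome
# and return the exposure coefficient with its 95% confidence interval.
fit_one <- function(outcome) {
  f   <- reformulate(c(exposure_var, baseline_vars), response = outcome)
  fit <- lm(f, data = my_data)
  est <- unname(coef(fit)[exposure_var])
  ci  <- confint(fit, parm = exposure_var, level = 0.95)
  data.frame(
    outcome  = outcome,
    estimate = est * coefficient_scale,
    lower    = ci[1, 1] * coefficient_scale,
    upper    = ci[1, 2] * coefficient_scale
  )
}

# One row per outcome, stacked into a single table.
naive_table <- do.call(rbind, lapply(outcome_vars, fit_one))
print(naive_table)
```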

This function is intended for educational purposes to demonstrate the difference between naive associations and properly estimated causal effects.
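To make the educational point concrete, the following base R simulation shows a naive regression overstating an effect when a common cause is omitted. The data-generating values (a true effect of 0.5 and a single measured confounder) are assumptions chosen for illustration, and the adjusted model recovers the truth here only because the simulation contains exactly one known confounder.

```r
# Simulated example; the data-generating values are illustrative assumptions.
set.seed(123)
n <- 5000
confounder <- rnorm(n)                       # common cause of exposure and outcome
exposure   <- 0.8 * confounder + rnorm(n)
outcome    <- 0.5 * exposure + 1.0 * confounder + rnorm(n)  # true effect is 0.5
sim <- data.frame(confounder, exposure, outcome)

# Naive model omits the confounder; adjusted model includes it.
naive_fit    <- lm(outcome ~ exposure, data = sim)
adjusted_fit <- lm(outcome ~ exposure + confounder, data = sim)

naive_est    <- unname(coef(naive_fit)["exposure"])
adjusted_est <- unname(coef(adjusted_fit)["exposure"])

# The naive estimate absorbs the confounder's contribution to the outcome.
print(c(naive = naive_est, adjusted = adjusted_est))
```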

Examples

if (FALSE) { # \dontrun{
# perform naive regressions
naive_results <- margot_naive_regressions(
  data = my_data,
  exposure_var = "treatment",
  outcome_vars = c("outcome1_z", "outcome2_z", "outcome3_z")
)

# perform naive regressions with baseline covariates
naive_results_adjusted <- margot_naive_regressions(
  data = my_data,
  exposure_var = "treatment",
  outcome_vars = c("outcome1_z", "outcome2_z", "outcome3_z"),
  baseline_vars = c("age", "gender", "baseline_outcome")
)

# perform naive regressions scaled for 4-unit change
naive_results_scaled <- margot_naive_regressions(
  data = my_data,
  exposure_var = "treatment",
  outcome_vars = c("outcome1_z", "outcome2_z", "outcome3_z"),
  coefficient_scale = 4
)

# plot results with misspecified label
# (the "Naive Association" label flags the estimates as non-causal)
margot_plot(naive_results$combined_table, rename_ate = "Naive Association")
} # }