Simulation: Population Estimate

Note

Suggested Readings

(Westreich and Greenland 2013) The table 2 fallacy: presenting and interpreting confounder and modifier coefficients link
(Bulbulia 2024b) Methods in causal inference. Part 2: Interaction, mediation, and time-varying treatments link

Simulations

S2 and S3 are adapted from the supplement to Bulbulia (2024c). Study S2 uses simulations to show how differences in the distribution of effect modifiers between a study sample and a target population can bias marginal effect estimates (average treatment effects, ATEs). Study S3 provides a mathematical explanation for this result. These simulations are then followed by others that demonstrate how inference may fail when the true timing of events differs from the temporal ordering assumed in one’s measurements.

S2. Generalisability and Transportability

Generalisability and transportability describe different ways of connecting study findings to a target population.

Generalisability

When a study sample is drawn at random from the target population, we may generalise from the sample to that population.

Suppose we sample randomly from a target population, where:

First, n_S denotes the size of the study’s analytic sample S.

Second, N_T denotes the total size of the target population T.

Third, \widehat{ATE}_{n_S} denotes the estimated average treatment effect in the analytic sample S.

Fourth, ATE_T denotes the true average treatment effect in the target population T.

Fifth, \epsilon denotes an arbitrarily small positive value.

Assuming that the rest of the causal inference workflow goes to plan (for example, randomisation succeeds, there is no measurement error, and models are correctly specified), as the random sample size n_S increases, the estimated treatment effect in the analytic sample S converges in probability to the true treatment effect in the target population T:

\lim_{n_S \to \infty} P\bigl(|\widehat{ATE}_{n_S} - ATE_T| < \epsilon\bigr) = 1

for any small positive value of \epsilon.

Informally, under these conditions and with a large enough random sample, the estimated ATE in the study converges to the true ATE in the target population.

Transportability

When the analytic sample is not drawn randomly from the target population, we cannot directly generalise the findings. However, we may still transport the estimated causal effect from the source population to the target population under additional assumptions. In essence, transportability requires us to adjust for differences in the distributions of effect modifiers between the two populations. The closer the source population is to the target population, the more plausible these assumptions become and the less we must rely on complex adjustment methods.

Suppose we have an analytic sample of size n_S drawn from a source population S, and we wish to estimate the average treatment effect in a target population T.

Define:

\widehat{ATE}_S as the estimated average treatment effect in the analytic sample drawn from the source population S.

\widehat{ATE}_T as the (hypothetical) average treatment effect in the target population T that we wish to learn.

f(n_S, R) as a mapping function that adjusts the estimated effect in the analytic sample using a set of measured covariates R (for example, effect modifiers and sampling variables), thereby allowing valid projection from the source population to the target population.

The transportability assumption is that there exists a function f such that

\widehat{ATE}_T = f(\widehat{ATE}_S, R).

Here, \widehat{ATE}_S is the estimated average treatment effect in the source population, and R represents the set of covariates used for adjustment. In practice, f may be implemented using methods such as re-weighting, outcome modelling, or doubly robust estimators that explicitly account for differences in the distribution of R between the two populations.

Finding a suitable function f is the central challenge in correcting sampling bias and achieving transportability (Bareinboim and Pearl 2013; Westreich et al. 2017; Issa J. Dahabreh et al. 2019; Deffner, Rohrer, and McElreath 2022).

S3. A Mathematical Explanation for the Difference in Marginal Effects between Censored and Uncensored Populations

This appendix explains why marginal effects may differ between censored and uncensored populations, even in the absence of unmeasured confounding.

Definitions

Let A denote the exposure variable, where a represents the reference level and a^* represents the comparison level.

Let Y denote the outcome variable.

Let F denote an effect modifier.

Let C denote an indicator for membership in the uncensored or censored population, where C = 0 indicates the uncensored population and C = 1 indicates the censored population.

Average Treatment Effects

The average treatment effects for the uncensored and censored populations are defined as

\Delta_{\text{uncensored}} = \mathbb{E}\bigl[Y(a^*) - Y(a) \mid C = 0\bigr],

\Delta_{\text{censored}} = \mathbb{E}\bigl[Y(a^*) - Y(a) \mid C = 1\bigr].

Potential Outcomes and Causal Consistency

By causal consistency, potential outcomes can be expressed in terms of observed outcomes:

\Delta_{\text{uncensored}} = \mathbb{E}[Y \mid A = a^*, C = 0] - \mathbb{E}[Y \mid A = a, C = 0],

\Delta_{\text{censored}} = \mathbb{E}[Y \mid A = a^*, C = 1] - \mathbb{E}[Y \mid A = a, C = 1].

Law of Total Probability

By the law of total probability, we can decompose these average treatment effects over the distribution of the effect modifier F:

\Delta_{\text{uncensored}} = \sum_{f} \Bigl\{ \mathbb{E}[Y \mid A = a^*, F = f, C = 0] - \mathbb{E}[Y \mid A = a, F = f, C = 0] \Bigr\} \times \Pr(F = f \mid C = 0),

\Delta_{\text{censored}} = \sum_{f} \Bigl\{ \mathbb{E}[Y \mid A = a^*, F = f, C = 1] - \mathbb{E}[Y \mid A = a, F = f, C = 1] \Bigr\} \times \Pr(F = f \mid C = 1).

These expressions show that the marginal effects are weighted averages of stratum-specific effects, where the weights are given by the distribution of F in each population.

Informative Censoring and Effect Modification

Assume that censoring is informative with respect to the effect modifier F, in the sense that

\Pr(F = f \mid C = 0) \neq \Pr(F = f \mid C = 1).

Under this assumption, the weights used to calculate the marginal effects differ between the uncensored and censored populations. Consequently, unless the effect of A on Y is identical across all levels of F, we should not expect the marginal effects to coincide.

In general, we cannot guarantee that

\Delta_{\text{uncensored}} = \Delta_{\text{censored}}.

Equality of marginal effects between the two populations would only hold in special cases, such as a universal null effect (no effect of the exposure on the outcome for any individual), by chance, or under specific conditions discussed by VanderWeele and Robins (2007) and further elucidated by Suzuki et al. (2013). Otherwise,

\Delta_{\text{uncensored}} \neq \Delta_{\text{censored}}.

Furthermore, VanderWeele (2012) proved that if there is effect modification of A by F, then there will be a difference in at least one scale of causal contrast. That is, either

\Delta^{\text{risk ratio}}_{\text{uncensored}} \neq \Delta^{\text{risk ratio}}_{\text{censored}},

\Delta^{\text{difference}}_{\text{uncensored}} \neq \Delta^{\text{difference}}_{\text{censored}}.

Thus, even under no unmeasured confounding, informative censoring combined with effect modification can generate genuine differences in marginal effects between censored and uncensored populations. For comprehensive discussions on sampling, transport, and inference, see Issa J. Dahabreh and Hernán (2019) and Issa J. Dahabreh et al. (2021).

## S4. R Simulation to Clarify Why The Distribution of Effect Modifiers Matters For Estimating Treatment Effects For A Target Population {#id-app-d}

First, we load the stdReg library, which obtains marginal effect estimates by simulating counterfactuals under different levels of treatment (Sjölander 2016). If a treatment is continuous, the levels can be specified.

We also load the parameters library, which creates nice tables (Lüdecke et al. 2020).

#|label: loadlibs

# to obtain marginal effects
if (!requireNamespace("stdReg", quietly = TRUE)) install.packages("stdReg")
library(stdReg)

#  to view data
if (!requireNamespace("skimr", quietly = TRUE)) install.packages("skimr")
library(skimr)

# to create nice tables
if (!requireNamespace("parameters", quietly = TRUE)) install.packages("parameters")
library(parameters)

Warning: package 'parameters' was built under R version 4.5.2

if (!require(margot, quietly = TRUE)) {
  devtools::install_github("go-bayes/margot") # ensure version is at least 1.0.45
  library("margot")
}

margot 1.0.294

Next, we write a function to simulate data for the sample and target populations.

We assume the treatment effect is the same in the sample and target populations, that the coefficient for the effect modifier and the coefficient for interaction are the same, that there is no unmeasured confounding throughout the study, and that there is only selective attrition of one effect modifier such that the baseline population differs from the analytic sample population at the end of the study.

That is: the distribution of effect modifiers is the only respect in which the sample will differ from the target population.

This function will generate data under a range of scenarios. Refer to documentation in the margot package: Bulbulia (2024a)

# function to generate data for the sample and population,
# Along with precise sample weights for the population, there are differences
# in the distribution of the true effect modifier but no differences in the treatment effect
# or the effect modification. all that differs between the sample and the population is
# the distribution of effect modifiers.
# seed
set.seed(123)

# simulate data -- you can use different parameters
data <- margot::simulate_ate_data_with_weights(
  n_sample = 10000,
  n_population = 100000,
  p_z_sample = 0.1,
  p_z_population = 0.5,
  beta_a = 1,
  beta_z = 2.5,
  noise_sd = 0.5
)

# inspect
# skimr::skim(data)

We have generated both sample and population data.

Next, we verify that the distributions of effect modifiers differ in the sample and in the target population:

# obtain the generated data
sample_data <- data$sample_data
population_data <- data$population_data

# check imbalance
table(sample_data$z_sample) # type 1 is rare


   0    1 
9041  959

table(population_data$z_population) # type 1 is common


    0     1 
49785 50215

The sample and population distributions differ.

Next, consider the question: ‘What are the differences in the coefficients that we obtain from the study population at the end of the study, compared with those we would obtain for the target population?’

First, we obtain the regression coefficients for the sample. They are as follows:

# model coefficients sample
model_sample <- glm(y_sample ~ a_sample * z_sample,
  data = sample_data
)

# summary
parameters::model_parameters(model_sample, ci_method = "wald")

Parameter           | Coefficient |       SE |        95% CI | t(9996) |      p
-------------------------------------------------------------------------------
(Intercept)         |   -2.10e-03 | 7.30e-03 | [-0.02, 0.01] |   -0.29 | 0.774 
a sample            |        1.00 |     0.01 | [ 0.98, 1.02] |   96.38 | < .001
z sample            |        2.53 |     0.02 | [ 2.49, 2.58] |  107.68 | < .001
a sample × z sample |        0.45 |     0.03 | [ 0.39, 0.52] |   13.46 | < .001


Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

Next, we obtain the regression coefficients for the weighted regression of the sample. Notice that the coefficients are virtually the same:

# model the sample weighted to the population, again note that these coefficients are similar
model_weighted_sample <- glm(y_sample ~ a_sample * z_sample,
  data = sample_data, weights = weights
)

# summary
summary(parameters::model_parameters(model_weighted_sample,
  ci_method = "wald"
))

Parameter           | Coefficient |        95% CI |      p
----------------------------------------------------------
(Intercept)         |   -2.10e-03 | [-0.02, 0.02] | 0.827 
a sample            |        1.00 | [ 0.97, 1.03] | < .001
z sample            |        2.53 | [ 2.51, 2.56] | < .001
a sample × z sample |        0.45 | [ 0.41, 0.49] | < .001

Model: y_sample ~ a_sample * z_sample (10000 Observations)
Sigma: 0.483 (df = 9996)


Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

We might be tempted to infer that weighting wasn’t relevant to the analysis. However, we’ll see that such an interpretation would be a mistake.

Next, we obtain model coefficients for the population. Note again there is no difference – only narrower errors owing to the large sample size.

# model coefficients population -- note that these coefficients are very similar.
model_population <- glm(y_population ~ a_population * z_population,
  data = population_data
)

parameters::model_parameters(model_population, ci_method = "wald")

Parameter                   | Coefficient |       SE |        95% CI | t(99996) |      p
----------------------------------------------------------------------------------------
(Intercept)                 |   -7.10e-04 | 3.19e-03 | [-0.01, 0.01] |    -0.22 | 0.824 
a population                |        1.00 | 4.49e-03 | [ 0.99, 1.01] |   222.88 | < .001
z population                |        2.50 | 4.49e-03 | [ 2.50, 2.51] |   557.23 | < .001
a population × z population |        0.50 | 6.34e-03 | [ 0.49, 0.51] |    78.46 | < .001


Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
  using a Wald t-distribution approximation.

Again, there is no difference. That is, we find that all model coefficients are practically equivalent. The different distribution of effect modifiers does not result in different coefficient values for the treatment effect, the effect-modifier ‘effect,’ or the interaction of the effect modifier and treatment.

Consider why this is the case: in a large sample where the causal effects are invariant – as we have simulated them to be – we will have good replication in the effect modifiers within the sample, so our statistical model can recover the coefficients for the population without challenge.

However, in causal inference, we are interested in the marginal effect of the treatment for a target population. That is, we seek an estimate for the counterfactual contrast in which everyone in a pre-specified population or stratum of a population was subject to one level of treatment compared with a counterfactual condition in which everyone in this same population was subject to another level of the same treatment.

The marginal effect estimates will differ in at least one measure of effect when the analytic sample population has a different distribution of effect modifiers compared to the target population.

To see this, we use the stdReg package to recover marginal effect estimates, comparing (1) the sample ATE, (2) the true oracle ATE for the population, and (3) the weighted sample ATE. We will use the outputs of the same models above. The only difference is that we will calculate marginal effects from these outputs. We will contrast a difference from an intervention in which everyone receives treatment = 0 with one in which everyone receives treatment = 1; however, this choice is arbitrary, and the general lessons apply irrespective of the estimand.

First, consider this Average Treatment Effect for the analytic population:

# What inference do we draw?
# we cannot say the models are unbiased for the marginal effect estimates.
# regression standardisation
library(stdReg) # to obtain marginal effects

# obtain sample ate
fit_std_sample <- stdReg::stdGlm(model_sample,
  data = sample_data, X = "a_sample"
)

# summary
summary(fit_std_sample, contrast = "difference", reference = 0)


Formula: y_sample ~ a_sample * z_sample
Family: gaussian 
Link function: identity 
Exposure:  a_sample 
Reference level:  a_sample = 0 
Contrast:  difference 

  Estimate Std. Error lower 0.95 upper 0.95
0     0.00    0.00000       0.00       0.00
1     1.04    0.00996       1.02       1.06

The treatment effect is given as a 1.04 unit change in the outcome across the analytic population, with a confidence interval from 1.02 to 1.06.

Next, we obtain the true (oracle) treatment effect for the target population under the same intervention:

## note the population effect is different

# obtain true ate
fit_std_population <- stdReg::stdGlm(model_population,
  data = population_data, X = "a_population"
)

# summary
summary(fit_std_population, contrast = "difference", reference = 0)


Formula: y_population ~ a_population * z_population
Family: gaussian 
Link function: identity 
Exposure:  a_population 
Reference level:  a_population = 0 
Contrast:  difference 

  Estimate Std. Error lower 0.95 upper 0.95
0     0.00    0.00000       0.00       0.00
1     1.25    0.00327       1.24       1.26

Note that the true treatment effect is a 1.25-unit change in the population, with a confidence bound between 1.24 and 1.26. This is well outside the ATE that we obtain from the analytic population!

Next, consider the ATE in the weighted regression, where the analytic sample was weighted to the target population’s true distribution of effect modifiers:

## next try weights adjusted ate where we correctly assign population weights to the sample
fit_std_weighted_sample_weights <- stdReg::stdGlm(model_weighted_sample,
  data = sample_data, X = "a_sample"
)

# this gives us the right answer
summary(fit_std_weighted_sample_weights, contrast = "difference", reference = 0)


Formula: y_sample ~ a_sample * z_sample
Family: gaussian 
Link function: identity 
Exposure:  a_sample 
Reference level:  a_sample = 0 
Contrast:  difference 

  Estimate Std. Error lower 0.95 upper 0.95
0     0.00     0.0000       0.00       0.00
1     1.22     0.0165       1.19       1.25

We find that we obtain the population-level causal effect estimate with accurate coverage by weighting the sample to the target population. So with appropriate weights, our results generalise from the sample to the target population.

Lessons

Regression coefficients do not clarify the problem of sample/target population mismatch — or selection bias as discussed in this manuscript.
Investigators should not rely on regression coefficients alone when evaluating the biases that arise from sample attrition. This advice applies to both methods that authors use to investigate threats of bias. To implement this advice, authors must first take it themselves.
Observed data are generally insufficient for assessing threats. Observed data do not clarify structural sources of bias, nor do they clarify effect-modification in the full counterfactual data condition where all receive the treatment and all do not receive the treatment (at the same level).
To properly assess bias, one needs access to the counterfactual outcome — what would have happened to the missing participants had they not been lost to follow-up or had they responded? The joint distributions over ‘full data’ are inherently unobservable (Van Der Laan and Rose 2011).
In simple settings, like the one we just simulated, we can address the gap between the sample and target population using methods such as modelling the censoring (e.g., censoring weighting). However, we never know what setting we are in or whether it is simple—such modelling must be handled carefully. There is a large and growing epidemiology literature on this topic (see, for example, Li, Miao, and Tchetgen Tchetgen (2023)).

Ambiguities from Cross-Sectional Data

Methodology

Data Generation: we simulate a dataset for 1,000 individuals, where e.g. degree of religious belief (A) influences wealth (L), which in turn affects charitable donations (Y$). The simulation is based on predefined parameters that establish L as a mediator between A and Y.

Parameter Definitions:

The probability of access to green space (A) is set at 0.5.
The effect of A on L (exercise) is given by \beta = 2.
The effect of L on Y (happiness) is given by \delta = 1.5.
Standard deviations for L and Y are set at 1 and 1.5, respectively.

Model 1 (Correct Assumption): fits a linear regression model assuming L as a mediator, including both A and L as regressors on Y. This model aligns with the data-generating process, and, by the rules of d-separation, induces mediator bias for the A\to Y path.

Model 2 (Incorrect Assumption): fits a linear regression model including only A as a regressor on Y, omitting the mediator L. This model assesses the direct effect of A on Y without accounting for mediation.

Analysis: We compares the estimated effects of A on Y under each model specification.

# load libraries
if (!require(kableExtra)) {
  install.packages("kableExtra")
} # tables
if (!require(gtsummary)) {
  install.packages("gtsummary")
} # tables

# simulation seed
set.seed(123) #  reproducibility

# define the parameters
n <- 1000 # Number of observations
p <- 0.5 # Probability of A = 1 (exposure)
alpha <- 0 # Intercept for L (mediator)
beta <- 2 # Effect of A on L
gamma <- 1 # Intercept for Y
delta <- 1.5 # Effect of L on Y
sigma_L <- 1 # Standard deviation of L
sigma_Y <- 1.5 # Standard deviation of Y

# simulate the data: fully mediated effect by L
A <- rbinom(n, 1, p) # binary exposure variable
L <- alpha + beta * A + rnorm(n, 0, sigma_L) # mediator L affect by A
Y <- gamma + delta * L + rnorm(n, 0, sigma_Y) # Y affected only by L,

# make the data frame
data <- data.frame(A = A, L = L, Y = Y)

# fit regression in which we control for L, a mediator
# (cross-sectional data is consistent with this model)
fit_1 <- lm(Y ~ A + L, data = data)

# fit regression in which L is assumed to be a mediator, not a confounder.
# (cross-sectional data is also consistent with this model)
fit_2 <- lm(Y ~ A, data = data)

# create gtsummary tables for each regression model
table1 <- gtsummary::tbl_regression(fit_1)
table2 <- gtsummary::tbl_regression(fit_2)

# merge the tables for comparison
table_comparison <- gtsummary::tbl_merge(
  list(table1, table2),
  tab_spanner = c(
    "Model: Exercise assumed confounder",
    "Model: Exercise assumed to be a mediator"
  )
)
# make html table (for publication)
markdown_table_0 <- as_kable_extra(table_comparison,
  format = "html"
)
# print table (note, you might prefer "markdown" or another format)
markdown_table_0

The following code is designed to estimate the Average Treatment Effect (ATE) using the clarify package in R, which is referenced here as (Greifer et al. 2023). The procedure involves two steps: simulating coefficient distributions for regression models and then calculating the ATE based on these simulations. This process is applied to two distinct models to demonstrate the effects of including versus excluding a mediator variable in the analysis.

Steps to Estimate the ATE

Load the clarify Package: this package provides functions to simulate regression coefficients and compute average marginal effects (AME), robustly facilitating the estimation of ATE.
Set seed: set.seed(123) ensures that the results of the simulations are reproducible, allowing for consistent outcomes across different code runs.
Simulate the data distribution:

sim_coefs_fit_1 and sim_coefs_fit_2 are generated using the sim function from the clarify package, applied to two fitted models (fit_1 and fit_2). These functions simulate the distribution of coefficients based on the specified models, capturing the uncertainty around the estimated parameters.
Calculate ATE:

For both models, the sim_ame function calculates the ATE as the marginal risk difference (RD) when the treatment variable (A) is present (A == 1). This function uses the simulated coefficients to estimate the treatment effect across the simulated distributions, providing a comprehensive view of the ATE under each model.

To streamline the output, the function is set to verbose mode off (verbose = FALSE).

Results:

Summaries of these estimates (summary_sim_est_fit_1 and summary_sim_est_fit_2) are obtained, providing detailed statistics including the estimated ATE and its 95% confidence intervals (CI).

Presentation: report ATE and CIs:

Using the glue package, the ATE and its 95% CIs for both models are formatted into a string for easy reporting. This step transforms the statistical output into a more interpretable form, highlighting the estimated treatment effect and its precision.

# use `clarify` package to obtain ATE
if (!require(clarify)) {
  install.packages("clarify")
} # clarify package
# simulate fit 1 ATE
set.seed(2025)
sim_coefs_fit_1 <- sim(fit_1)
sim_coefs_fit_2 <- sim(fit_2)

# marginal risk difference ATE, simulation-based: model 1 (L is a confounder)
sim_est_fit_1 <-
  sim_ame(
    sim_coefs_fit_1,
    var = "A",
    subset = A == 1,
    contrast = "RD",
    verbose = FALSE
  )
# marginal risk difference ATE, simulation-based: model 2 (L is a mediator)
sim_est_fit_2 <-
  sim_ame(
    sim_coefs_fit_2,
    var = "A",
    subset = A == 1,
    contrast = "RD",
    verbose = FALSE
  )
# obtain summaries
summary_sim_est_fit_1 <- summary(sim_est_fit_1, null = c(`RD` = 0))
summary_sim_est_fit_2 <- summary(sim_est_fit_2, null = c(`RD` = 0))

# reporting
# ate for fit 1, with 95% CI
ATE_fit_1 <- glue::glue(
  "ATE =
                        {round(summary_sim_est_fit_1[3, 1], 2)},
                        CI = [{round(summary_sim_est_fit_1[3, 2], 2)},
                        {round(summary_sim_est_fit_1[3, 3], 2)}]"
)
# ate for fit 2, with 95% CI
ATE_fit_2 <-
  glue::glue(
    "ATE = {round(summary_sim_est_fit_2[3, 1], 2)},
                        CI = [{round(summary_sim_est_fit_2[3, 2], 2)},
                        {round(summary_sim_est_fit_2[3, 3], 2)}]"
  )

# print(ATE_fit_1)
# print(ATE_fit_2)

Upshot of the Simulation and Analysis

Model 1 (L as a Confounder): this analysis assumes that L is a confounder in the relationship between the treatment (A) and the outcome (Y), and thus, it includes L in the model. The ATE estimated here reflects the effect of A while controlling for L.
Model 2 (L as a Mediator): in contrast, this analysis considers L to be a mediator, and the model either includes L explicitly in its estimation process or excludes it to examine the direct effect of A on Y. The approach to mediation analysis here is crucial as it influences the interpretation of the ATE.

By comparing the ATEs from both models, researchers can understand the effect of mediation (or the lack thereof) on the estimated treatment effect. This comparison sheds light on how assumptions about variable roles (confounder vs. mediator) can significantly alter causal inferences drawn from cross-sectional data.

Wherever it is uncertain whether a variable is a confounder or a mediator, we suggest creating two causal diagrams and reporting both analyses.

Appendix D: Simulation of Different Confounding Control Strategies

This appendix outlines the methodology and results of a data simulation designed to compare different strategies for controlling confounding in the context of environmental psychology research. Specifically, the simulation examines the effect of access to open green spaces (treatment, A_1) on happiness (outcome, Y_2) while addressing the challenge of unmeasured confounding. The simulation incorporates baseline measures of exposure and outcome (A_0, Y_0), baseline confounders (L_0), and an unmeasured confounder (U) to evaluate the effectiveness of different analytical approaches.

Methodology

1.Load Libraries kableExtra, gtsummary, and grf.

Target: we simulate data for 10,000 individuals, including baseline exposure to green spaces (A_0), baseline happiness (Y_0), baseline confounders (L_0), and an unmeasured confounder (U). The simulation uses a logistic model for treatment assignment and a linear model for the continuous outcome, incorporating interactions to assess how baseline characteristics modify the treatment effect.
Set seed and simulate the data distribution:

Treatment assignment coefficients: \beta_{A0} = 0.25, \beta_{Y0} = 0.3, \beta_{L0} = 0.2, and \beta_{U} = 0.1. Outcome model coefficients: \delta_{A1} = 0.3, \delta_{Y0} = 0.9, \delta_{A0} = 0.1, \delta_{L0} = 0.3, with an interaction effect (\theta_{A0Y0L0} = 0.5) indicating the combined influence of baseline exposure, outcome, and confounders on the follow-up outcome.

Model comparison:
- No control model: estimates the effect of A_1 on Y_2 without controlling for any confounders.
- Standard covariate control model: controls for baseline confounders (L_0) alongside treatment (A_1).
- Baseline exposure and outcome model: extends the standard model by including baseline treatment and outcome (A_0, Y_0) and their interaction with L_0.
Results: each model’s effectiveness in estimating the true treatment effect is assessed by comparing regression outputs. The simulation evaluates how well each model addresses the bias introduced by unmeasured confounding and the role of baseline characteristics in modifying treatment effects.
Presentation: the results are synthesised in a comparative table, formatted using the kableExtra {Zhu (2021)] and gtsummary packages (Sjoberg et al. 2021), highlighting the estimated treatment effects and their statistical significance across models.

Overall, we use the simulation to illustrate the importance of incorporating baseline characteristics and their interactions to mitigate the influence of unmeasured confounding.

Here is the simulation/model code:

library(kableExtra)
if (!require(kableExtra)) {
  install.packages("kableExtra")
} # causal forest
if (!require(gtsummary)) {
  install.packages("gtsummary")
} # causal forest

Loading required package: gtsummary

if (!require(grf)) {
  install.packages("grf")
} # causal forest

Loading required package: grf


Attaching package: 'grf'

The following object is masked from 'package:parameters':

    get_scores

# r_texmf()eproducibility
set.seed(123)

# set number of observations
n <- 10000

# baseline covariates
U <- rnorm(n) # Unmeasured confounder
A_0 <- rbinom(n, 1, prob = plogis(U)) # Baseline exposure
Y_0 <- rnorm(n, mean = U, sd = 1) # Baseline outcome
L_0 <- rnorm(n, mean = U, sd = 1) # Baseline confounders

# coefficients for treatment assignment
beta_A0 <- 0.25
beta_Y0 <- 0.3
beta_L0 <- 0.2
beta_U <- 0.1

# simulate treatment assignment
A_1 <- rbinom(n, 1, prob = plogis(-0.5 +
  beta_A0 * A_0 +
  beta_Y0 * Y_0 +
  beta_L0 * L_0 +
  beta_U * U))
# coefficients for continuous outcome
delta_A1 <- 0.3
delta_Y0 <- 0.9
delta_A0 <- 0.1
delta_L0 <- 0.3
theta_A0Y0L0 <- 0.5 # Interaction effect between A_1 and L_0
delta_U <- 0.05
# simulate continuous outcome including interaction
Y_2 <- rnorm(n,
  mean = 0 +
    delta_A1 * A_1 +
    delta_Y0 * Y_0 +
    delta_A0 * A_0 +
    delta_L0 * L_0 +
    theta_A0Y0L0 * Y_0 *
      A_0 * L_0 +
    delta_U * U,
  sd = .5
)
# assemble data frame
data <- data.frame(Y_2, A_0, A_1, L_0, Y_0, U)

# model: no control
fit_no_control <- lm(Y_2 ~ A_1, data = data)

# model: standard covariate control
fit_standard <- lm(Y_2 ~ A_1 + L_0, data = data)

# model: interaction with baseline confounders, and baseline outcome and exposure
fit_interaction <- lm(Y_2 ~ A_1 * (L_0 + A_0 + Y_0), data = data)

# create gtsummary tables for each regression model
tbl_fit_no_control <- tbl_regression(fit_no_control)
tbl_fit_standard <- tbl_regression(fit_standard)
tbl_fit_interaction <- tbl_regression(fit_interaction)

# get only the treatment variable
tbl_list_modified <- lapply(
  list(
    tbl_fit_no_control,
    tbl_fit_standard,
    tbl_fit_interaction
  ),
  function(tbl) {
    tbl %>%
      modify_table_body(~ .x %>% dplyr::filter(variable == "A_1"))
  }
)
# merge tables
table_comparison <- tbl_merge(
  tbls = tbl_list_modified,
  tab_spanner = c(
    "No Control",
    "Standard",
    "Interaction"
  )
) |>
  modify_table_styling(
    column = c(p.value_1, p.value_2, p.value_3),
    hide = TRUE
  )
# markdown table for publication
markdown_table <-
  as_kable_extra(table_comparison, format = "markdown")
print(markdown_table)

<table style="NAborder-bottom: 0;">
 <thead>
<tr>
<th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">No Control</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Standard</div></th>
<th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Interaction</div></th>
</tr>
  <tr>
   <th style="text-align:left;"> Characteristic </th>
   <th style="text-align:center;"> Beta </th>
   <th style="text-align:center;"> 95% CI </th>
   <th style="text-align:center;"> Beta </th>
   <th style="text-align:center;"> 95% CI </th>
   <th style="text-align:center;"> Beta </th>
   <th style="text-align:center;"> 95% CI </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> A_1 </td>
   <td style="text-align:center;"> 1.5 </td>
   <td style="text-align:center;"> 1.5, 1.6 </td>
   <td style="text-align:center;"> 0.86 </td>
   <td style="text-align:center;"> 0.80, 0.93 </td>
   <td style="text-align:center;"> 0.25 </td>
   <td style="text-align:center;"> 0.20, 0.31 </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> Abbreviation: CI = Confidence Interval</td></tr></tfoot>
</table>

Next, in the following code, we calculate the Average Treatment Effect (ATE) using simulation-based approaches for two distinct models: one with standard covariate control and another incorporating interaction. This approach leverages the clarify package in R, which facilitates the simulation and interpretation of estimated coefficients from linear models to derive ATEs under different modelling assumptions (Greifer et al. 2023).

First, we use the sim function from the clarify package to generate simulated coefficient distributions for the standard model (fit_standard) and the interaction model (fit_interaction). This step is crucial for capturing the uncertainty in our estimates arising from sampling variability.

Next, we employ each model’s sim_ame function to compute the average marginal effects (AME), focusing on the treatment variable (A_1). The calculation is done under the assumption that all individuals are treated (i.e., A_1 == 1), and we specify the contrast type as “RD” (Risk Difference) to directly obtain the ATE (Average Treatment Effect). The sim_ame function simulates the treatment effect across the distribution of simulated coefficients, providing a robust estimate of the ATE and its variability.

The summaries of these simulations (summary_sim_est_fit_std and summary_sim_est_fit_int) are then extracted to provide concise estimates of the ATE along with 95% confidence intervals (CIs) for both the standard and interaction models. This step is essential for understanding the magnitude and precision of the treatment effects estimated by the models.

Finally, we use the glue package to format these estimates into a human-readable form, presenting the ATE and its corresponding 95% CIs for each model. This presentation facilitates clear communication of the estimated treatment effects, allowing for direct comparison between the models and highlighting the effect of including baseline characteristics and their interactions on estimating the ATE (Hester and Bryan 2022).

This simulation-based approach to estimating the ATE underscores the importance of considering model complexity and the roles of confounders and mediators in causal inference analyses. By comparing the ATE estimates from different models, we can assess the sensitivity of our causal conclusions to various assumptions and modelling strategies.

# use `clarify` package to obtain ATE
if (!require(clarify)) {
  install.packages("clarify")
} # clarify package

Loading required package: clarify

# simulate fit 1 ATE
set.seed(123)
sim_coefs_fit_no_control <- sim(fit_no_control)
sim_coefs_fit_std <- sim(fit_standard)
sim_coefs_fit_int <- sim(fit_interaction)

# marginal risk difference ATE, no controls
sim_est_fit_no_control <-
  sim_ame(
    sim_coefs_fit_no_control,
    var = "A_1",
    subset = A_1 == 1,
    contrast = "RD",
    verbose = FALSE
  )
# marginal risk difference ATE, simulation-based: model 1 (L is a confounder)
sim_est_fit_std <-
  sim_ame(
    sim_coefs_fit_std,
    var = "A_1",
    subset = A_1 == 1,
    contrast = "RD",
    verbose = FALSE
  )
# marginal risk difference ATE, simulation-based: model 2 (L is a mediator)
sim_est_fit_int <-
  sim_ame(
    sim_coefs_fit_int,
    var = "A_1",
    subset = A_1 == 1,
    contrast = "RD",
    verbose = FALSE
  )
# obtain summaries
summary_sim_coefs_fit_no_control <-
  summary(sim_est_fit_no_control, null = c(`RD` = 0))
summary_sim_est_fit_std <-
  summary(sim_est_fit_std, null = c(`RD` = 0))
summary_sim_est_fit_int <-
  summary(sim_est_fit_int, null = c(`RD` = 0))

# get coefficients for reporting
# ate for fit 1, with 95% CI
ATE_fit_no_control <- glue::glue(
  "ATE = {round(summary_sim_coefs_fit_no_control[3, 1], 2)},
  CI = [{round(summary_sim_coefs_fit_no_control[3, 2], 2)},
  {round(summary_sim_coefs_fit_no_control[3, 3], 2)}]"
)
# ate for fit 2, with 95% CI
ATE_fit_std <- glue::glue(
  "ATE = {round(summary_sim_est_fit_std[3, 1], 2)},
  CI = [{round(summary_sim_est_fit_std[3, 2], 2)},
  {round(summary_sim_est_fit_std[3, 3], 2)}]"
)
# ate for fit 3, with 95% CI
ATE_fit_int <-
  glue::glue(
    "ATE = {round(summary_sim_est_fit_int[3, 1], 2)},
    CI = [{round(summary_sim_est_fit_int[3, 2], 2)},
    {round(summary_sim_est_fit_int[3, 3], 2)}]"
  )
# coefs they used in the manuscript

Using the clarify package, we infer the ATE for the standard model is ATE = 0.86, CI = [0.8, 0.93].

Using the clarify package, we infer the ATE for the model that conditions on the baseline exposure and baseline outcome to be: ATE = 0.43, CI = [0.39, 0.47], which is close to the values supplied to the data-generating mechanism.

Take-home message:

The baseline exposure and baseline outcome are often the most important variables to include for confounding control. The baseline exposure also allows us to estimate an incident-exposure effect. For this reason, we should endeavour to obtain at least three waves of data such that these variables and other baseline confounders are included at time 0, the exposure is included at time 1, and the outcome is included at time 2.

Appendix E: Non-parametric Estimation of Average Treatment Effects Using Causal Forests

This appendix provides a practical example of estimating average treatment effects (ATE) using a non-parametric approach, specifically applying causal forests. Unlike traditional regression models, causal forests allow for estimating treatment effects without imposing strict assumptions about the form of the relationship between treatment, covariates, and outcomes. This flexibility makes them particularly useful for analysing complex datasets where the treatment effect may vary across observations.

Causal Forest Model Implementation

Libraries: the implementation begins with loading the necessary R libraries: grf for estimating conditional and average treatment effects using causal forests and glue for formatting the results for reporting.
1. Data generation: the code assumes the presence of a data frame data generated from the previous code snippet containing the variables:
- A_1: Treatment indicator.
- L_0: A covariate.
- Y_2: Outcome of interest.
- A_0 and Y_0: Baseline exposure and outcome, respectively.
Treatment (W) and outcome (Y) vectors are extracted from data alongside a matrix X that includes covariates and baseline characteristics.
Causal Forest model: a causal forest model is fitted using the causal_forest function from the grf package (Tibshirani et al. 2024). This function takes the covariate matrix X, the outcome vector Y, and the treatment vector W as inputs, and it returns a model object that can be used for further analysis.
Average Treatment Effect estimation: the average_treatment_effect function computes the ATE from the fitted causal forest model. This step is crucial as it quantifies the overall effect of the treatment across the population, adjusting for covariates included in the model.
Reporting: The estimated ATE and its standard error (se) are extracted and formatted for reporting using the glue package (Hester and Bryan 2022). This facilitates clear communication of the results, showing the estimated effect size and its uncertainty.

Key Takeaways

First, causal forests offer a robust way to estimate treatment effects without making parametric solid assumptions. This approach is particularly advantageous in settings where the treatment effect may vary with covariates or across different subpopulations.

Second, the model estimates the ATE as the difference in expected outcomes between treated and untreated units, averaged across the population. This estimate reflects the overall effect of the treatment, accounting for the distribution of covariates in the sample.

Third, we find that the estimated ATE by the causal forest model converges to the actual value used in the data-generating process (assumed to be 0.3). This demonstrates the effectiveness of causal forests in uncovering the true treatment effect from complex data.

This example underscores the utility of semi-parametric and non-parametric methods, such as causal forests, in causal inference analyses.

# load causal forest library
library(grf) # estimate conditional and average treatment effects
library(glue) # reporting

#  'data' is our data frame with columns 'A_1' for treatment, 'L_0' for a covariate, and 'Y_2' for the outcome
#  we also have the baseline exposure 'A_0' and 'Y_0'
#  ensure W (treatment) and Y (outcome) are vectors
W <- as.matrix(data$A_1) # Treatment
Y <- as.matrix(data$Y_2) # Outcome
X <- as.matrix(data[, c("L_0", "A_0", "Y_0")])

# fit causal forest model
fit_causal_forest <- causal_forest(X, Y, W)

# estimate the average treatment effect (ATE)
ate <- average_treatment_effect(fit_causal_forest)

# make data frame for reporting using "glue'
ate <- data.frame(ate)

# obtain ate for report
ATE_fit_causal_forest <-
  glue::glue(
    "ATE = {round(ate[1, 1], 2)}, se = {round(ate[2, 1], 2)}"
  )

Causal forest estimates the average treatment effect as ATE = 0.3, se = 0.01. This approach converges to the true value supplied to the generating mechanism of 0.3

References

Bareinboim, Elias, and Judea Pearl. 2013. “A General Algorithm for Deciding Transportability of Experimental Results.” Journal of Causal Inference 1 (1): 107–34.

Bulbulia, J. A. 2024a. Margot: MARGinal Observational Treatment-Effects. https://doi.org/10.5281/zenodo.10907724.

———. 2024b. “Methods in Causal Inference Part 2: Interaction, Mediation, and Time-Varying Treatments.” Evolutionary Human Sciences 6: e41. https://doi.org/10.1017/ehs.2024.32.

———. 2024c. “Methods in Causal Inference Part 3: Measurement Error and External Validity Threats.” Evolutionary Human Sciences 6: e42. https://doi.org/10.1017/ehs.2024.33.

Dahabreh, Issa J, Sebastien JP A Haneuse, James M Robins, Sarah E Robertson, Ashley L Buchanan, Elizabeth A Stuart, and Miguel A Hernán. 2021. “Study Designs for Extending Causal Inferences from a Randomized Trial to a Target Population.” American Journal of Epidemiology 190 (8): 1632–42.

Dahabreh, Issa J., and Miguel A. Hernán. 2019. “Extending Inferences from a Randomized Trial to a Target Population.” European Journal of Epidemiology 34 (8): 719–22. https://doi.org/10.1007/s10654-019-00533-2.

Dahabreh, Issa J, James M Robins, Sebastien JP Haneuse, and Miguel A Hernán. 2019. “Generalizing Causal Inferences from Randomized Trials: Counterfactual and Graphical Identification.” arXiv Preprint arXiv:1906.10792.

Deffner, Dominik, Julia M. Rohrer, and Richard McElreath. 2022. “A Causal Framework for Cross-Cultural Generalizability.” Advances in Methods and Practices in Psychological Science 5 (3): 25152459221106366. https://doi.org/10.1177/25152459221106366.

Greifer, Noah, Steven Worthington, Stefano Iacus, and Gary King. 2023. Clarify: Simulation-Based Inference for Regression Models. https://iqss.github.io/clarify/.

Hester, Jim, and Jennifer Bryan. 2022. Glue: Interpreted String Literals. https://CRAN.R-project.org/package=glue.

Li, Wei, Wang Miao, and Eric Tchetgen Tchetgen. 2023. “Non-Parametric Inference about Mean Functionals of Non-Ignorable Non-Response Data Without Identifying the Joint Distribution.” Journal of the Royal Statistical Society Series B: Statistical Methodology 85 (3): 913–35.

Lüdecke, Daniel, Mattan S. Ben-Shachar, Indrajeet Patil, and Dominique Makowski. 2020. “Extracting, Computing and Exploring the Parameters of Statistical Models Using R.” Journal of Open Source Software 5 (53): 2445. https://doi.org/10.21105/joss.02445.

Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13: 570–80. https://doi.org/10.32614/RJ-2021-053.

Sjölander, Arvid. 2016. “Regression Standardization with the R Package stdReg.” European Journal of Epidemiology 31 (6): 563–74. https://doi.org/10.1007/s10654-016-0157-3.

Suzuki, Etsuji, Toshiharu Mitsuhashi, Toshihide Tsuda, and Eiji Yamamoto. 2013. “A Counterfactual Approach to Bias and Effect Modification in Terms of Response Types.” BMC Medical Research Methodology 13 (1): 1–17.

Tibshirani, Julie, Susan Athey, Erik Sverdrup, and Stefan Wager. 2024. Grf: Generalized Random Forests. https://github.com/grf-labs/grf.

Van Der Laan, Mark J., and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. New York, NY: Springer. https://link.springer.com/10.1007/978-1-4419-9782-1.

VanderWeele, Tyler J. 2012. “Confounding and Effect Modification: Distribution and Measure.” Epidemiologic Methods 1 (1): 55–82. https://doi.org/10.1515/2161-962X.1004.

VanderWeele, Tyler J., and James M. Robins. 2007. “Four types of effect modification: a classification based on directed acyclic graphs.” Epidemiology (Cambridge, Mass.) 18 (5): 561–68. https://doi.org/10.1097/EDE.0b013e318127181b.

Westreich, Daniel, Jessie K Edwards, Catherine R Lesko, Elizabeth Stuart, and Stephen R Cole. 2017. “Transportability of Trial Results Using Inverse Odds of Sampling Weights.” American Journal of Epidemiology 186 (8): 1010–14.

Westreich, Daniel, and Sander Greenland. 2013. “The Table 2 Fallacy: Presenting and Interpreting Confounder and Modifier Coefficients.” American Journal of Epidemiology 177 (4): 292–98.

Zhu, Hao. 2021. kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.