---
title: "Causal Inference: Average Treatment Effects"
date: "2025-08-12"
bibliography: /Users/joseph/GIT/templates/bib/references.bib
editor_options:
chunk_output_type: console
format:
html:
warnings: FALSE
error: FALSE
messages: FALSE
code-overflow: scroll
highlight-style: kate
code-tools:
source: true
toggle: FALSE
html-math-method: katex
reference-location: margin
cap-location: margin
---
```{r}
#| echo: false
#| warning: false
# code-block-border-left: true
# for working offline, source local copies of the helper functions:
# source("/Users/joseph/GIT/templates/functions/libs2.R")
# source("/Users/joseph/GIT/templates/functions/funs.R")
# otherwise, download the functions from JB's GitHub:
# source(
#   "https://raw.githubusercontent.com/go-bayes/templates/main/functions/experimental_funs.R"
# )
# for making graphs
library("tinytex")
library("extrafont")
loadfonts(device = "all")
```
::: {.callout-note}
**Suggested Readings**
- [@hernan2024WHATIF] Chapters 1-3 [link](https://www.dropbox.com/scl/fi/9hy6xw1g1o4yz94ip8cvd/hernanrobins_WhatIf_2jan24.pdf?rlkey=8eaw6lqhmes7ddepuriwk5xk9&dl=0)
- [@neal2020introduction] Chapters 1-2
:::
::: {.callout-important}
## Key concepts:
- **The Fundamental Problem of Causal Inference**
- **Causal Inference in Randomised Experiments**
- **Causal Inference in Observational Studies - Average (Marginal) Treatment Effects**
- **Three Fundamental Assumptions for Causal Inference**
:::
::: {.callout-important}
Here, we use the terms "counterfactual outcomes" and "potential outcomes" interchangeably.
:::
## Objectives
- You will understand why causation is never directly observed.
- You will understand how experiments address this "causal gap."
- You will understand how applying three principles from experimental research allows scientists to close this "causal gap" when making inferences about a population as a whole, that is, inferences about "marginal effects."
# **Part 1: Motivating Example**
## 1990s observational studies suggested a roughly 30% reduction in all-cause mortality from estrogen therapy
::: {.callout-note title="In the 1980s and 1990s estrogen treatments appeared to **benefit** postmenopausal women" icon="false"}
Hazard ratio for all-cause mortality: **0.68** current users vs. never users
[@grodstein2006hormone]
:::
## Standard Medical Advice
- 1992 American College of Obstetricians and Gynecologists: "Probable beneficial effect of estrogens on heart disease."
- 1992 American College of Physicians: "Women who have coronary heart disease or who are at increased risk of coronary heart disease are likely to benefit from hormone therapy."
- 1993 National Cholesterol Education Program: "Epidemiologic evidence for the benefit of estrogen replacement therapy is especially strong for secondary prevention in women with prior CHD."
- 1996 American Heart Association: "ERT does look promising as a long-term protection against heart attack."
## Women's Health Initiative: Evaluate Estrogens Experimentally
- Massive **randomised**, double-blind, placebo-controlled trial
- Over 16,000 U.S. women aged 50-79 years
- Randomly assigned to estrogen plus progestin therapy or placebo
- Women were followed up approximately annually for up to 8 years
## Findings: **Clear Discrepancy**
::: {.callout-warning title="Experimental Findings: **opposite of observational findings**" icon="false"}
All-cause mortality Hazard Ratio: **1.23** for initiators vs non-initiators. [@manson2003estrogen]
:::
## Medical community response: **Reject** all observational studies
- Can observational studies ever be trusted?
- Should observational studies ever be funded again?
- What went wrong?
## Opening
::: {.callout-note icon="false"}
**Robert Frost writes:**
> Two roads diverged in a yellow wood,
> And sorry I could not travel both
> And be one traveler, long I stood
> And looked down one as far as I could
> To where it bent in the undergrowth;
>
> Then took the other, as just as fair,
> And having perhaps the better claim,
> Because it was grassy and wanted wear;
> Though as for that the passing there
> Had worn them really about the same,
>
> And both that morning equally lay
> In leaves no step had trodden black.
> Oh, I kept the first for another day!
> Yet knowing how way leads on to way,
> I doubted if I should ever come back.
>
> I shall be telling this with a sigh
> Somewhere ages and ages hence:
> Two roads diverged in a wood, and I—
> I took the one less traveled by,
> And that has made all the difference.
>
> -- *The Road Not Taken*
:::
## Introduction: Motivating Example
Consider the following question:
> Alice attends religious service regularly. Does this increase her volunteering?
There is evidence that people who attend religious services volunteer more, but would Alice volunteer anyway?
**"And sorry I could not travel both. And be one traveler $\dots$"**
## Part 1: The Fundamental Problem of Causal Inference as a Missing Data Problem
**The fundamental problem of causal inference** is that causality is never directly observed.
Let $Y$ and $A$ denote random variables.
We formulate a causal question by asking whether setting an exposure $A$ to level $A = a$ would lead to a difference in the outcome $Y$, compared with what would have occurred had the exposure been set to a different level, say $A = a'$. For simplicity, we imagine a binary exposure such that $A = 1$ denotes receiving the "religious service" exposure and $A = 0$ denotes receiving the "no religious service" exposure. Assume these are the only two exposures of interest:
Let:
- $Y_i(a = 1)$ denote the volunteering of person $i$ if they attend religious service regularly (potential outcome when $A_i = 1$).
- $Y_i(a = 0)$ denote the volunteering of person $i$ if they do not attend (potential outcome when $A_i = 0$).
What does it mean to *quantify* a causal effect? We may define the individual-level causal effect of religious service on volunteering for Alice ($i$) as the difference between two states of the world: one in which Alice attends religious service regularly and one in which she does not. We write this contrast by referring to the potential outcomes under different levels of exposure:
$$
\text{Causal Effect}_i = Y_i(1) - Y_i(0).
$$
We say there is a causal effect of religious service if
$$
Y_i(1) - Y_i(0) \neq 0.
$$
Because each person experiences only **one** exposure condition in reality, we cannot directly compute this difference from any dataset — the missing observation is called the **counterfactual**:
- If $Y_i|A_i = 1$ is observed, then $Y_i(0)|A_i=1$ is counterfactual.
- If $Y_i|A_i = 0$ is observed, then $Y_i(1)|A_i=0$ is counterfactual.
**"And sorry I could not travel both / And be one traveler, long I stood $\dots$"**
In short, individuals cannot simultaneously experience both exposure conditions, so one outcome is inevitably missing.
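To make this missing-data structure concrete, here is a minimal sketch in R; all names and effect sizes are invented for illustration. We simulate both potential outcomes for each person, something nature never lets us do, then reveal only the outcome corresponding to the exposure actually received:
```{r}
#| label: sim-missing-potential-outcomes
#| echo: true
# illustrative simulation: both potential outcomes exist,
# but only one is ever observed per person
set.seed(123)
n <- 10
y_0 <- rnorm(n)               # Y(0): volunteering without religious service
y_1 <- y_0 + 0.5              # Y(1): volunteering with service (true effect 0.5, by construction)
a <- rbinom(n, 1, 0.5)        # exposure actually received
y <- ifelse(a == 1, y_1, y_0) # causal consistency: we observe Y(A)
# the analyst's view: one potential outcome is always missing
data.frame(
  a = a,
  y1 = ifelse(a == 1, y_1, NA), # missing whenever A = 0
  y0 = ifelse(a == 0, y_0, NA), # missing whenever A = 1
  y = y
)
```
Each row contains exactly one `NA`: the individual causal effect $Y_i(1) - Y_i(0)$ can never be computed from the observed columns.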
## How can we make contrasts between counterfactual (potential) outcomes?
### Fundamental Assumption 1: Causal Consistency
Causal consistency means that the potential outcome corresponding to the exposure an individual actually receives is exactly what we observe. In other words, if individual $i$ receives exposure $a$, then the *potential outcome* (or equivalently, the *counterfactual outcome*) under that level of exposure -- that is, $Y_i(a)$ -- is equivalent to the *observed outcome*, $Y_i \mid A_i = a$. Where the symbol $\equiv$ means "equivalent to", when we assume that the causal consistency assumption is satisfied, we assume that:
$$
\begin{aligned}
\underbrace{Y_i(1)}_{\text{counterfactual}} &\equiv \underbrace{(Y_i \mid A_i = 1)}_{\text{observable}}, \\
\underbrace{Y_i(0)}_{\text{counterfactual}} &\equiv \underbrace{(Y_i \mid A_i = 0)}_{\text{observable}}.
\end{aligned}
$$
Notice, however, that we cannot generally obtain individual causal effects because, at any given time, each individual may receive at most one level of an exposure. Where the symbol $\implies$ means "implies," receiving one level of an exposure at a given time precludes receiving any other level of that exposure:
$$
Y_i|A_i = 1 \implies Y_i(0)|A_i = 1~ \text{is counterfactual}
$$
Likewise:
$$
Y_i|A_i = 0 \implies Y_i(1)|A_i = 0~ \text{is counterfactual}
$$
Because of the laws of physics (above the atomic scale), an individual can experience only one exposure level at any moment. Consequently, we can observe only one of the two counterfactual outcomes needed to quantify a causal effect. This is the fundamental problem of causal inference. Counterfactual contrasts cannot be individually observed.
However, the causal consistency assumption lets us equate each observed outcome with its corresponding potential outcome, recovering half of the potential (or "counterfactual") outcomes needed to estimate average treatment effects. We may recover the remainder, on average, if two other assumptions are satisfied.
### Fundamental Assumption 2: Exchangeability
Exchangeability justifies recovering the unobserved counterfactuals of one exposure group from the observed outcomes of the other, and averaging them. Combined with consistency ($Y_i(a) = Y_i$ when $A_i = a$), this lets us estimate population-level average potential outcomes. In an experiment where exposure groups are comparable, we define the Average Treatment Effect (ATE) as:
$$
\begin{aligned}
\text{ATE} &= \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)] \\
&= \mathbb{E}(Y \mid A=1) \;-\; \mathbb{E}(Y \mid A=0).
\end{aligned}
$$
Because randomisation (or, more generally, control over the probability of receiving treatment) ensures that missing counterfactuals are exchangeable with those observed, we can still estimate $\mathbb{E}[Y(a)]$. For instance, assume:
$$
\underbrace{\mathbb{E}[Y(1)\mid A=1]}_{\text{counterfactual}} = \textcolor{red}{\underbrace{\mathbb{E}[Y(1)\mid A=0]}_{\text{unobservable}}} = \underbrace{\mathbb{E}[Y \mid A = 1]}_{\text{observed}}
$$
which lets us infer the average outcome if everyone were treated. Likewise, if
$$
\underbrace{\mathbb{E}[Y(0)\mid A=0]}_{\text{counterfactual}} = \textcolor{red}{\underbrace{\mathbb{E}[Y(0)\mid A=1]}_{\text{unobservable}}} = \underbrace{\mathbb{E}[Y \mid A = 0]}_{\text{observed}}
$$
then we can infer the average outcome if everyone were given the control. The difference between these two quantities gives the ATE:
$$
\text{ATE} = \Big[
\overbrace{\mathbb{E}[Y(1)\mid A=1]}^{\substack{\text{by consistency:}\\ \equiv \text{ observed } \; \mathbb{E}[Y\mid A=1]}} \Pr(A=1)
\;+\;
\overbrace{\textcolor{red}{\mathbb{E}[Y(1)\mid A=0]}}^{\substack{\text{by exchangeability:}\\ \text{unobservable, yet } \; \equiv \mathbb{E}[Y\mid A=1]}} \Pr(A=0)
\Big]
-\,
\Big[
\overbrace{\mathbb{E}[Y(0)\mid A=0]}^{\substack{\text{by consistency:}\\ \equiv \text{observed } \; \mathbb{E}[Y\mid A=0]}} \Pr(A=0)
\;+\;
\overbrace{\textcolor{red}{\mathbb{E}[Y(0)\mid A=1]}}^{\substack{\text{by exchangeability:}\\ \text{unobservable, yet } \; \equiv \mathbb{E}[Y\mid A=0]}} \Pr(A=1)
\Big]
$$
Only $\mathbb{E}[Y\mid A=1]$ and $\mathbb{E}[Y\mid A=0]$ are observed. If both consistency and exchangeability are satisfied, then we may use these observed quantities to identify contrasts of counterfactual quantities.
Thus, although individual-level counterfactuals are missing, the consistency and exchangeability assumptions allow us to identify the average effect of treatment using observed data. Randomised controlled experiments allow us to meet these assumptions: randomisation warrants the exchangeability assumption, and control warrants the consistency assumption.
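A quick simulation makes this identification argument tangible. Under an invented data-generating process with a constant true effect of 0.5 (an assumption made only for this sketch), randomising $A$ makes the simple difference in observed group means approximate the true ATE:
```{r}
#| label: sim-randomisation-recovers-ate
#| echo: true
# illustrative check that randomisation recovers the ATE
set.seed(123)
n <- 100000
y_0 <- rnorm(n)                # potential outcome under control
y_1 <- y_0 + 0.5               # true individual effect = 0.5 for everyone (by assumption)
a <- rbinom(n, 1, 0.5)         # randomised exposure
y <- ifelse(a == 1, y_1, y_0)  # observed outcome (consistency)
true_ate <- mean(y_1 - y_0)                   # knowable only inside a simulation
est_ate <- mean(y[a == 1]) - mean(y[a == 0])  # estimable from observed data
round(c(true = true_ate, estimated = est_ate), 3)
```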
### Fundamental Assumption 3: Positivity
There is one further assumption, called positivity. It states that treatment assignment cannot be deterministic. That is, for every covariate pattern $L = l$, each individual has a non-zero probability of receiving every treatment level to be compared:
$$
P(A = a \mid L = l) > 0.
$$
Randomised experiments achieve positivity by design -- at least for the sample selected into the study. In observational settings, violations occur if some subgroups never receive a particular treatment. If all treatment levels occur but some are rare, positivity may technically hold, yet we may lack sufficient data from which to obtain convincing causal inferences.
Positivity is the only assumption that can be verified with data.
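Here is a minimal sketch of such a check, using a hypothetical binary covariate `L`: cross-tabulate exposure within covariate strata, where an empty cell would signal a positivity violation for that subgroup.
```{r}
#| label: check-positivity
#| echo: true
# illustrative positivity check: exposure by covariate strata
set.seed(123)
n <- 1000
L <- rbinom(n, 1, 0.3)                 # hypothetical covariate
A <- rbinom(n, 1, plogis(-1 + 2 * L))  # exposure probability depends on L
table(L = L, A = A)                    # a zero cell would violate positivity
prop.table(table(L = L, A = A), margin = 1)  # estimated P(A = a | L = l)
```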
## Challenges with Observational Data
### 1. Satisfying Causal Consistency is Difficult in Observational Settings
Below are some ways in which real-world complexities can violate causal consistency in observational studies. Causal consistency requires that there is no interference between units (part of the "Stable Unit Treatment Value Assumption", or SUTVA) and that each treatment level is well-defined and applied uniformly. If these conditions fail, then $Y(a)$ may not reflect a consistent exposure across individuals. We are then comparing apples with oranges. Consider some examples:
- **Cultural dependence**: one group's "religious service" will differ qualitatively from another's. Attending a !Kung healing ritual and an Aztec human sacrifice may, plausibly, produce different responses in people: in one, charity; in the other, terror.
- **Social dependence**: Watts sees Alice give, so Watts gives too. Here the effect of treatment differs depending on how others respond.
If the actual exposures differ across individuals, then consistency $(Y_i(a) = Y_i \mid A_i)$ may fail, because $A=a$ is not the same phenomenon for everyone.
### 2. Conditional Exchangeability (No Unmeasured Confounding) Is Difficult to Achieve
In theory, we can identify a causal effect from observational data if all confounders $L$ are measured. Formally, we need the potential outcomes to be independent of treatment once we condition on $L$. One way to express this assumption is: $Y(a) \coprod A \mid L$. If the potential outcomes are independent of treatment assignment given $L$, we can identify the Average Treatment Effect (ATE) as:
$$
\text{ATE} \;=\; \sum_{l}
\Bigl[\mathbb{E}\bigl(Y \mid A=1, L=l\bigr) \;-\; \mathbb{E}\bigl(Y \mid A=0, L=l\bigr)\Bigr] \;\Pr(L=l).
$$
In randomised experiments, conditioning is automatic because $A$ is unrelated to potential outcomes by design. In observational studies, ensuring or approximating such **conditional exchangeability** is often difficult. For example, bilingualism research would need to consider:
- **Cultural histories**: cultures that value language acquisition might also value knowledge acquisition. Associations might arise from culture, not causation.
- **Personal values**: families who place a high priority on bilingualism may also promote other developmental resources.
If important confounders go unmeasured or are poorly measured, these differences can bias causal effect estimates.
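When the confounders *are* measured, the standardisation formula above can be computed directly. The following sketch uses a single binary confounder and invented effect sizes; standardising over $L$ recovers the true effect that the naive contrast misses:
```{r}
#| label: sim-standardisation
#| echo: true
# illustrative standardisation (g-formula) with one binary confounder
set.seed(123)
n <- 100000
L <- rbinom(n, 1, 0.4)                     # confounder
A <- rbinom(n, 1, plogis(-0.5 + 1.5 * L))  # exposure depends on L
Y <- 1 * A + 2 * L + rnorm(n)              # true effect of A on Y is 1
# naive (confounded) contrast
naive <- mean(Y[A == 1]) - mean(Y[A == 0])
# stratum-specific contrasts, weighted by P(L = l)
strata <- sapply(c(0, 1), function(l) {
  mean(Y[A == 1 & L == l]) - mean(Y[A == 0 & L == l])
})
ate_std <- sum(strata * c(mean(L == 0), mean(L == 1)))
round(c(naive = naive, standardised = ate_std), 3)
```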
### 3. The Positivity Assumption May Fail: Treatments Might Not Exist for All
Positivity requires that each individual could, in principle, receive *any* exposure level. But in real-world observational settings, some groups have no access to bilingual education (or no reason to be monolingual), making certain treatment levels impossible for them. If a treatment level does not appear in the data for a given subgroup, any causal effect estimate for that subgroup is purely an extrapolation [@westreich2010; @hernan2023].
## Summary
We introduced the fundamental problem of causal inference by distinguishing correlation (associations in the data) from causation (contrasts between potential outcomes, of which only one can be observed for each individual).
**Randomised experiments** address this problem by balancing confounding variables across treatment levels. Although individual causal effects are unobservable, random assignment allows us to infer **average** causal effects — also called *marginal* effects.
In **observational data**, inferring average treatment effects demands that we satisfy three assumptions that are automatically satisfied in (well-conducted) experiments: **causal consistency**, **exchangeability**, and **positivity**. These assumptions ensure that we compare like with like (that each exposure level is a well-defined condition, the same for everyone), that there are no unmeasured common causes of the exposure and outcome that could produce associations in the absence of causality, and that every exposure level is a real possibility for each subgroup.
## What is Causality?
## David Hume's Two Definitions in *Enquiries* (1751)
### Definition 1:
> We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second...
### Definition 2:
> Or, in other words, where, if the first object had not been, the second never would have existed
*Enquiries Concerning Human Understanding, and Concerning the Principles of Morals* 1751
## Our lives are filled with "What If?" questions
## **The Fundamental Problem of Causal Inference**
To **quantify** a causal effect requires a **counterfactual** contrast:
$$\tau_{you} = \Big[Y_{\text{you}}(a =1) - Y_{\text{you}}(a=0)\Big]$$
where $Y(a)$ denotes the potential outcome under an intervention $A = a$. Here, we assume a binary intervention. At any time, we may observe the outcome of only **one intervention, not both**.
## However, we only observe *facts* not *counterfactuals*
$$
Y_i|A_i = 1 \implies Y_i(0)|A_i = 1~ \textcolor{red}{\text{is counterfactual}}
$$
::: footer
"**And sorry I could not travel both. And be one traveller, long I stood** $\dots$"
:::
## Average Treatment Effects in randomised controlled experiments rest on assumptions
$$
\text{Average Treatment Effect} = \left[ \begin{aligned}
&\left( \underbrace{\mathbb{E}[Y(1)|A = 1]}_{\text{observed}}\Pr(A=1) + \textcolor{cyan}{\underbrace{\mathbb{E}[Y(1)|A = 0]}_{\text{unobserved}}}\Pr(A=0) \right) \\
&- \left( \underbrace{\mathbb{E}[Y(0)|A = 0]}_{\text{observed}}\Pr(A=0) + \textcolor{cyan}{\underbrace{\mathbb{E}[Y(0)|A = 1]}_{\text{unobserved}}}\Pr(A=1) \right)
\end{aligned} \right]
$$
## Under identifying assumptions, we may infer causal effects from associations
$$
\text{ATE} = \sum_{l} \left( \mathbb{E}[Y|A=1, \textcolor{cyan}{L=l}] - \mathbb{E}[Y|A=0, \textcolor{cyan}{L=l}] \right) \times \textcolor{cyan}{\Pr(L=l)}
$$
where $L$ is a set of measured covariates and $A \coprod Y(a) \mid L$.
# **Part 2: The First Self-Inflicted Error: The *What* Error**
## Paradigmatic Concern: Confounding by Common Cause
$$\commoncauseT$$
## Where assumptions justify, we may condition on measured confounders to obtain balance.
::: columns
::: {.column width="50%"}
$$
L \to A; L \to Y
$$
:::
::: {.column width="50%"}
$$
\boxed{L}
$$
:::
:::
# Error 1 – What (over-adjustment)
## "What Error #1": **Mediator Bias**
$$A \to \boxed{L} \to Y$$
## Data Generating Process
```{r}
#| label: sim1-dgp
#| echo: true
#| eval: true
#| tbl-cap: "Religious Service (A) → Wealth (L) → Charity (Y). The true direct effect of A on Y is zero."
library(tidyverse)
set.seed(123) # reproducibility
n <- 1000
service <- rbinom(n, 1, 0.5) # A
wealth <- 2 * service + rnorm(n) # L
charity <- 1.5 * wealth + rnorm(n, sd = 1.5) # Y
sim1 <- tibble(service, wealth, charity)
```
## Model comparison
```{r}
#| label: sim1-table
#| echo: false
#| message: false
#| warning: false
#| tbl-cap: "Controlling for the mediator reverses the sign of the service coefficient."
library(gtsummary)
library(broom.helpers)
fit_omit <- lm(charity ~ service, data = sim1) # correct
fit_adjust <- lm(charity ~ service + wealth, data = sim1) # biased
tbl_merge(
list(tbl_regression(fit_omit),
tbl_regression(fit_adjust)),
tab_spanner = c("Model A: Omit L",
"Model B: Control for L")
)
```
## Which model looks better?
```{r}
#| label: sim1-metrics
#| echo: true
#| eval: true
#| tbl-cap: "Model B wins on BIC and R² – yet its causal estimate is wrong."
library(performance)
compare_performance(fit_adjust, fit_omit, rank = TRUE)
# BIC(fit_adjust) - BIC(fit_omit) # negative → "better" fit
```
## "What Error #2": Collider Bias
$$A\to \boxed{L} \leftarrow Y$$
## Data Generating Process
```{r}
#| label: sim2-dgp
#| echo: true
#| eval: true
#| tbl-cap: "Religious service (A) ← wealth (L) → donations (Y). A and Y are _independent_ in truth."
set.seed(2025)
n <- 1000
service <- rbinom(n, 1, 0.5) # A
donations <- rnorm(n) # Y
wealth <- rnorm(n, mean = service + donations, sd = 1) # collider L
sim2 <- tibble(service, wealth, donations)
```
## Model Comparison
```{r}
#| label: sim2-table
#| echo: false
#| tbl-cap: "Adding the collider creates a spurious, significant effect of A."
fit_correct <- lm(donations ~ service, data = sim2) # correct
fit_biased <- lm(donations ~ service + wealth, data = sim2) # biased
tbl_merge(
list(tbl_regression(fit_correct),
tbl_regression(fit_biased)),
tab_spanner = c("Model A: Omit L",
"Model B: Control for L (collider)")
)
```
## Which model looks better?
```{r}
#| label: sim2-metrics
#| echo: true
#| eval: true
compare_performance(fit_biased, fit_correct, rank = TRUE)
BIC(fit_biased) - BIC(fit_correct)
```
## Take-Home
::: callout-important
Relying on model fit perpetuates the *causality crisis* in psychology [@bulbulia2022].
Draw the DAG first; decide what belongs in the model before looking at numbers.
:::
## The What Error is widespread in **experimental** studies in the social sciences
- "Overall, we find that **46.7% of the experimental studies** published in APSR, AJPS, and JOP from 2012 to 2014 engaged in posttreatment conditioning (35 of 75 studies) ..."
- "About **1 in 4 drop cases or subset the data based on post-treatment criteria,** and **nearly a third include post-treatment variables as covariates**"
- "Most tellingly, **nearly 1 in 8 articles directly conditions on variables that the authors themselves show as being an outcome of the experiment** -- an unambiguous indicator of **a fundamental lack of understanding ... that conditioning on posttreatment variables can invalidate results from randomized experiments.**"
- "Empirically, then, the answer to the question of **whether the discipline already understands posttreatment bias is clear: It does not.**" [@montgomery2018]
## Mediator Bias control strategy: **Longitudinal Hygiene**
::: columns
::: {.column width="50%"}
$$
A\to L \to Y
$$
:::
::: {.column width="50%"}
$$
A\to Y
$$
:::
Don't include $L$
:::
## How to Tame The What Error? **Hide your future**
Use Repeated Measures on the Same Individuals
## Collider bias control strategy: **Hide your future**
## Collider bias by proxy control strategy: **Hide your future**
## Post-exposure collider bias control strategy: **Hide your future**
<!-- The E-value is the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and the outcome, conditional on the measured covariates, to explain away treatment–outcome association. -->
## Are longitudinal data + sensitivity analysis enough?
::: columns
::: {.column width="50%"}
If the data we collect were like this: $$Y_{\text{time 0}} ~~...~~ A_{\text{time 1}}$$
:::
::: {.column width="50%"}
We should not be tempted to model this.
$$Y \to A$$
:::
:::
## Are longitudinal data + sensitivity analysis enough?
Is it too obvious that this is wrong?
$$Y \to A$$
# Error 2: When (time-zero)
## **Longitudinal data** are **not** enough
Temporal ordering was precisely the problem with the observational hormone studies of the 1980s and 1990s, which modelled **longitudinal data**.
**How?**
## Researchers failed to emulate an experiment in their data (Target Trial)
- Women's Health Initiative: overall hazard ratio 1.23 (0.99, 1.53)
- Women's Health Initiative: when broken down by years to follow-up:
- 0-2 years 1.51 (1.06, 2.14)
- 2-5 years 1.31 (0.93, 1.83)
- 5 or more years **0.67 (0.41, 1.09)**
::: {.callout-note title="Survivor Bias." icon="false"}
The observational results can be **entirely explained by selection bias**.
:::
::: {.callout-note title="Emulating a target trial with observational data recovers experimental effects." icon="false"}
Re-modelling **initiation into hormone therapy** recovers experimental findings [@hernan2016].
:::
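A small simulation illustrates how conditioning on survival can manufacture the protective association the observational studies reported. The data-generating process below is entirely hypothetical: therapy raises mortality in an early window, later mortality depends only on unmeasured frailty, and enrolling only women who survived the early window makes therapy look protective.
```{r}
#| label: sim-survivor-bias
#| echo: true
# hypothetical illustration of survivor (prevalent-user) bias
set.seed(2025)
n <- 100000
frailty <- rnorm(n)            # unmeasured health status
hormone <- rbinom(n, 1, 0.5)   # initiate therapy at time zero
# therapy and frailty both raise risk in the early window
early_death <- rbinom(n, 1, plogis(-1 + 1 * hormone + 2 * frailty))
# later mortality depends on frailty only: no late effect of therapy
late_death <- ifelse(early_death == 1, 1,
                     rbinom(n, 1, plogis(-1 + 2 * frailty)))
d <- data.frame(hormone, early_death, late_death)
# analysis from time zero (initiators vs non-initiators): harm is visible
coef(glm(late_death ~ hormone, family = binomial, data = d))["hormone"]
# "current user" analysis, enrolling only early survivors: spurious protection
coef(glm(late_death ~ hormone, family = binomial,
         data = subset(d, early_death == 0)))["hormone"]
```
The particular numbers do not matter; the structure does. Survival to enrolment is a collider between therapy and frailty, so conditioning on it induces exactly the kind of spurious benefit seen in the 1990s cohort studies.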
## Visualising the When Error
```{r}
#| label: fig-time-zero-3lines
#| fig-cap: "Three ways to start the clock. Only one is right."
#| fig-width: 8
#| fig-height: 4
#| echo: false
#| message: false
#| warning: false
library(tidyverse)
# lifeline coordinates ----------------------------------------------------
lifelines <- tibble(
scenario = factor(c("early time-zero", "correct time-zero", "late time-zero"),
levels = c("early time-zero", "correct time-zero", "late time-zero")),
clinic = 0, # day of clinic visit
rx = 30, # day prescription filled
event = 200 # day outcome occurs
)
# plot --------------------------------------------------------------------
ggplot(lifelines) +
## horizontal lifelines
geom_segment(aes(x = clinic, xend = event,
y = scenario, yend = scenario),
linewidth = 1.2) +
## milestone dots
geom_point(aes(x = clinic, y = scenario), size = 3) +
geom_point(aes(x = rx, y = scenario), size = 3) +
geom_point(aes(x = event, y = scenario), size = 3) +
## dashed t = 0 lines
# early (wrong)
geom_segment(x = 0, xend = 0, y = 0.7, yend = 1.3,
linetype = "dashed", colour = "red", linewidth = 1) +
# correct
geom_segment(x = 30, xend = 30, y = 1.7, yend = 2.3,
linetype = "dashed", colour = "forestgreen", linewidth = 1) +
# late (wrong)
geom_segment(x = 60, xend = 60, y = 2.7, yend = 3.3,
linetype = "dashed", colour = "red", linewidth = 1) +
## labels
annotate("text", x = 0, y = 1.35, label = "t = 0 (early, wrong)",
colour = "red", hjust = 0, size = 3.8) +
annotate("text", x = 30, y = 2.35, label = "t = 0 (correct)",
colour = "forestgreen", hjust = 0, size = 3.8) +
annotate("text", x = 60, y = 3.35, label = "t = 0 (late, wrong)",
colour = "red", hjust = 0, size = 3.8) +
annotate("text", x = 0, y = 1.05, label = "clinic visit", vjust = -1.2) +
annotate("text", x = 30, y = 2.05, label = "rx pickup", vjust = -1.2) +
annotate("text", x = 200, y = 3.05, label = "event", vjust = -1.2) +
theme_minimal() +
theme(axis.title = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank())
```
## Target Trial Checklist
```{r}
#| label: tbl-target-trial
#| tbl-cap: "Target-trial specification for the hormone-therapy example."
#| echo: false
library(tidyverse)
library(kableExtra)
tribble(
~element, ~definition,
"Eligibility", "Post-menopausal women aged 50–79, no prior CHD",
"Treatment", "Initiate oestrogen + progestin on day of Rx pickup",
"Comparison", "No hormone-therapy initiation on that day",
"Outcome", "All-cause mortality",
"Follow-up", "8 years or until death / loss to follow-up"
) |>
kbl(align = "l") |>
kable_styling(bootstrap_options = "striped")
```