Causal inference: advice to authors

Joseph Bulbulia https://josephbulbulia.netlify.app (Victoria University of Wellington)https://www.wgtn.ac.nz
2021-OCT-22
# packages

# function for installing dependencies
ipak <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
# usage
packages <- c(
  "coda",
  "plyr",
  "mvtnorm",
  "scales",
  "dagitty",
  "ggdag",
  "patchwork",
  "tidyverse",
  "brms",
  "rstan",
  "rstanarm",
  "bayesplot",
  "easystats",
  "ggplot2",
  "viridisLite"
)
ipak(packages)
       coda        plyr     mvtnorm      scales     dagitty 
       TRUE        TRUE        TRUE        TRUE        TRUE 
      ggdag   patchwork   tidyverse        brms       rstan 
       TRUE        TRUE        TRUE        TRUE        TRUE 
   rstanarm   bayesplot   easystats     ggplot2 viridisLite 
       TRUE        TRUE        TRUE        TRUE        TRUE 
#devtools::install_github("rmcelreath/rethinking")
# packages
library("tidyverse")
library("ggdag")
library("brms")
library("rstan")
library("rstanarm")
library("tidybayes")
library("bayesplot")
library("easystats")
library("ggplot2")
library("patchwork")

# rstan options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores ())
theme_set(theme_classic())

Directed Acyclical Graphs

A directed acyclical graph (DAG) is a heuristic tools for investigating causal inference in a regression model. A DAG is a graph that has nodes and edges. The edges point in the direction of a causal influence. The flows are “acyclical” because a cause is not at the same time an effect.

The ggdag package in R is useful for identifying potential confounding in your dataset.

Importantly: assessing causal inference using a DAG typically requires assumptions that are not part of your dataset. A poorly specified DAG will lead to poor inference. There is no working around this.

Omitted variable bias

Often when researchers include additional co-variates into their regression into their regression model they are worried about confounding from omitted variable bias.

They use the word “control” to indicate that they are addressing this hazard. The case is familiar. X and Y are uncorrelated. However a lurking third variable is responsible for their apparent correlation. 1 graphs omitted variable bias.

Show code
library(ggdag)
theme_set(theme_dag())

# create confounded triangle
d1 <- confounder_triangle() %>%
  ggdag_dconnected()



# code for creating a DAG
ggdag_ov <- dagify(Y ~ Z,
                   X ~ Z,
                   exposure = "X",
                   outcome = "Y") %>%
  tidy_dagitty(layout = "tree")

# d-connection
dp1 <- ggdag_ov %>%
  ggdag_dconnected(from = "X", to = "Y") + labs(title = "Omitted variable confounding")



d1 <- dagify(Y ~ Z,
             X ~ Z,
             exposure = "X",
             outcome = "Y") %>%
  tidy_dagitty(layout = "tree") %>%
  ggdag_dseparated(node_size = 16, from = "X", to = "Y") + labs(title = "Omitted variable\n confounding")


d2 <- dagify(Y ~ Z,
             X ~ Z,
             exposure = "X",
             outcome = "Y") %>%
  tidy_dagitty(layout = "tree") %>%
  ggdag_dseparated(controlling_for = "Z", node_size = 16) + labs(title = "Adding Z \nremoves confounding")

# # adjustment set
# dp2 <-
#   ggdag::ggdag_adjustment_set(ggdag_ov) + labs(title = "Adjustment set to avoid omitted variable onfounding")

# figure
library(patchwork)
fig1 <-
  d1 + d2 + plot_annotation(tag_levels = 'a') + plot_layout(guides = 'collect')

# graph figure
fig1 # 8 * 6
Ommitted variable bias occurs when a third variable, z, causes x and y to share information

Figure 1: Ommitted variable bias occurs when a third variable, z, causes x and y to share information

Show code
# save graph
ggsave(
  fig1,
  path = here::here("figs"),
  width = 10,
  height = 5,
  units = "in",
  filename = "fig1.jpg",
  device = 'jpeg',
  limitsize = FALSE,
  dpi = 1200
)

Simulation of omitted variable confounding

We can simulate date that have this causal relationship and ask what would happen were we to regress X on Y without accounting for Z.

Show code
# simulate ommitted variable bias
set.seed(123)
N <- 100
z <- rnorm(N)# sim z
x <- rnorm(N , -z) # sim z -> x
y <- rnorm(N , z) # sim z -> y
df <- data.frame(x, z, y)

# note there is a predictive relationship between X to Y
plot(x, y) 

Regressing X on Y:

Show code
m0 <- lm(y ~ x, data = df)
parameters::model_parameters(m0) %>%
  print_html()
Model Summary
Parameter Coefficient SE 95% CI t(98) p
(Intercept) 0.13 0.11 (-0.09, 0.35) 1.18 0.241
x -0.40 0.08 (-0.56, -0.23) -4.85 < .001

X and Y are correlated. If we know X we can better predict Y. However our ability to predict arises from a process outside of X and Y. Put metaphorically, X “listens to” Z and Y “listens to” Z because Z influences both X and Y. If Z were to change, then so too would X and so too would Y. However X does not cause Y. If we were to intervene in the world and change X this would not change Y. Put yet another way, once we know Z, we do not gain any additional information about Y from X.

To understand the implications of information flow with in a regression model we may use directed acylical graphs or DAGS

Figure 2b tells us to obtain an unbiased estimate of Y on X we must condition on Z.

And indeed, when we included the omitted variable Z in our simulated dateset it breaks the association between X and Y:

Show code
m1 <- lm(y ~ x + z, data = df)

parameters::model_parameters(m1) %>%
  print_html()
Model Summary
Parameter Coefficient SE 95% CI t(97) p
(Intercept) 0.14 0.10 (-0.06, 0.33) 1.40 0.163
x 0.02 0.10 (-0.17, 0.22) 0.24 0.810
z 0.89 0.15 (0.60, 1.18) 6.03 < .001

Post-treatment confounding

Many applied scientists equate omitted variable bias with confounding. For this reason, they seek to “control” for every variable that might be associated with both X and Y. If the relationship between X and Y holds after including such “controls,” many applied scientist will infer that there is a causal relationship between X and Y. We will see that this is not the case; quite the opposite, controlling for many variables is an invitation to confounding, leading to erroneous causal inference. Do not adopt a “causal salad” approach to regression.

“But the approach which dominates in many parts of biology and the social sciences is instead CAUSAL SALAD. Causal salad means tossing various “control” variables into a statistical model, observing changes in estimates, and then telling a story about causation. Causal salad seems founded on the notion that only omitted variables can mislead us about causation. But included variables can just as easily confound us. When tossing a causal salad, a model that makes good predictions may still mislead about causation. If we use the model to plan an intervention, it will get everything wrong. Richard McElreath “Statistical Rethinking” Chapter 1, p46.

Suppose we are interested in the effect of X on Y in a scenario when Z fully mediates the relationship of X on Y.

Show code
# medation model -- full mediation
library(ggdag)
dag_1 <- dagify(Y ~ Z,
                Z ~ X,
                exposure = "X",
                outcome = "Y") %>%
  tidy_dagitty(layout = "tree")

# dag_1 %>%
# ggdag()

pdm <-
  ggdag::ggdag_dconnected(dag_1, from = "X", to = "Y") + labs(title = "Y ~ X") + theme_dag_blank()


pdm2 <-
  ggdag::ggdag_dconnected(dag_1, from = "X", to = "Y", controlling_for = c("Z")) + labs(title = "Y ~ X + Z", subtitle = "Adding Z blocks X->Y, do not condition on Z.") + theme_dag_blank()



# figure
library("ggplot2")
library("patchwork")
fig2 <-
  pdm + pdm2  + plot_annotation(tag_levels = 'a',
                                      title = "Post-treatment confounding") +  plot_layout(guides = 'collect')
                                     # subtitle = "When the causal effect of x on y is fully mediated, \ncontrolling for a mediator will block the path between x and y")  + plot_layout(guides = 'collect')

# graph
fig2
Show code

# practical example, not run
# dm<- dagify( 
#        y ~ m,
#        m ~ x, 
#        labels = c("y" = "Satisfaction with life", 
#                   "m" = "Friendships",
#                   "x" = "Religious Service")
#        )
#   
#        
# pdm <- ggdag(dm, text = FALSE, from = "x",
#                    to = "y", use_labels = "label") %>%
#   ggd
# 
# pdm2 <- ggdag_dseparated(dm, 
#                          from = "x",
#                    to = "y",
#                    controlling_for = "m",text = FALSE, use_labels = "label")
# 
# pdm3 <- ggdag_adjustment_set(dm, 
#                          exposure  = "x",
#                    outcome = "y",text = FALSE, use_labels = "label")
Show code
#save graph
ggsave(
  fig2,
  path = here::here("figs"),
  width = 10,
  height = 5,
  units = "in",
  filename = "fig2.jpg",
  device = 'jpeg',
  limitsize = FALSE,
  dpi = 1200
)

Simulation of post-treatment confounding

What variables do we need to include to obtain an unbiased estimate of X on Y?

Let’s fill out this example out by imagining an experiment.

Suppose we want to know whether a ritual action condition (X) influences charity (Y). We have good reason to assume the effect of X on Y happens entirely through perceived social cohesion (M):

X\(\to\)M\(\to\)Z or ritual \(\to\) social cohesion \(\to\) charity

Lets simulate some data

Show code
set.seed(123)

# Participants
N <- 100

# initial charitable giving
c0 <- rnorm(N , 10 , 2)

# assign treatments and simulate charitable giving and increase in social cohesion
ritual <- rep(0:1 , each = N / 2)
cohesion <- ritual * rnorm(N, .5, .2)

# increase in charity
c1 <- c0 + ritual * cohesion

# dataframe
d <- data.frame(
  c0 = c0 ,
  c1 = c1 ,
  ritual = ritual ,
  cohesion = cohesion
)

# this code is handy from the rethinking package
rethinking::precis(d)
               mean        sd     5.5%      94.5%
c0       10.1808118 1.8256318 7.509343 13.0941897
c1       10.4346925 1.8494557 7.607934 13.3788824
ritual    0.5000000 0.5025189 0.000000  1.0000000
cohesion  0.2538807 0.2868199 0.000000  0.7038016
            histogram
c0         ▁▁▃▃▇▇▃▂▂▁
c1         ▁▁▂▃▇▇▅▃▂▁
ritual     ▇▁▁▁▁▁▁▁▁▇
cohesion ▇▁▁▁▂▁▁▁▁▁▁▁

Does the ritual increase charity?

If we only include the ritual condition in the model, we find that ritual condition reliable predicts increases in charitable giving:

Show code
parameters::model_parameters(lm(c1 ~  c0 + ritual, data = d)) %>% print_html()
Model Summary
Parameter Coefficient SE 95% CI t(97) p
(Intercept) 0.08 0.08 (-0.07, 0.23) 1.05 0.297
c0 0.99 7.26e-03 (0.98, 1.01) 136.74 < .001
ritual 0.51 0.03 (0.46, 0.56) 19.33 < .001

Does the ritual increase charity adjusting for levels of social cohesion?

Show code
parameters::model_parameters(lm(c1 ~  c0 + ritual + cohesion, data = d)) %>% print_html()
Model Summary
Parameter Coefficient SE 95% CI t(96) p
(Intercept) -5.68e-15 4.21e-16 (-6.52e-15, -4.85e-15) -13.49 < .001
c0 1.00 4.06e-17 (1.00, 1.00) 2.46e+16 < .001
ritual 1.78e-16 3.23e-16 (-4.63e-16, 8.19e-16) 0.55 0.583
cohesion 1.00 5.64e-16 (1.00, 1.00) 1.77e+15 < .001

The answer is that the (direct) effect of ritual entirely drops out when we include both ritual and social cohesion. Why is this? The answer is that once our model knows m it does not obtain any new information by knowing x.

If we were interested in assessing x\(\to\)y but x were to effect y through m (i.e x\(\to\)m\(\to\)y) then conditioning on m would block the path from x\(\to\)y. Including m leads to Pipe Confounding.

In experiments we should never condition on a post-treatment variable.

Collider confounding

Collider confounding occurs when we condition on a common effect of x and y.

Show code
# dag
dag_3 <- dagify(Z ~ Y,
                Z ~ X,
                exposure = "X",
                outcome = "Y") %>%
  tidy_dagitty(layout = "fr")

# check d-connected
ddm <-
  ggdag::ggdag_dconnected(dag_3) + labs(title = "Y ~ X (no Z)")

# check d-connected controlling for Z
ddm2 <-
  ggdag::ggdag_dconnected(dag_3, controlling_for = "Z") + labs(title = "Y ~ X + Z", subtitle = "Adding Z opens X->Y")



# experiment
library(ggdag)
dg_suz <- ggdag::dagify(
 b ~  im + ordr + rel + sr + edu +  st + cny,
  sr ~ rel + cny, 
  rel ~  age + ses + edu + male + cny,
  ses ~ cny + edu + age,
  edu ~ cny + male + age,
  im ~ mem + rel + cny + sr + edu,
  mem ~ age + edu + ordr,
  exposure = "sr",
  outcome = "b",
  labels = c(
    "b" = "statement credibility\n(outcome)",
    "sr" = "source\n(exposure)",
    "st" = "statement",
    "im" = "statement importance",
    "mem" = "statement memory",
    "rel" = "religiosity",
    "cny" = "country",
    "male" = "male",
    "ordr" = "order",
    "ses" = "perceived SES",
    "edu" = "education",
    "age" = "age"
  ) ) 

#  not run
# text_size <- 40
# dd4 <- ggdag::ggdag_collider(
#   dg_suz,
#   text = FALSE,
#   use_labels = "label",
#   node_size = 10,
# #  label_size = 10,
#   text_col = "white",
#   label_col = "white",
# ) +
#   #geom_dag_text(size = 18) +
#   # geom_dag_node(size = 40) +
#   geom_dag_collider_edges(
#     size = 0.6 , show.legend = NA) +
#     # angle = 90,
#     # ncp = 5,
#     # arrow = NULL,
#     # lineend = "butt",
#     # na.rm = FALSE,
#     # show.legend = NA) +
#     # geom_dag_text(colour = "steelblue")
#   #  scale_colour_viridis_d(option = "cividis") +
#     scale_color_viridis_d(
#      # name = "d-relationship",
#       end = .8
#     )


# colliders
dd4 <- ggdag::dagify(
  b ~  im + ordr + rel + sr + edu +  st + cny,
  sr ~ rel + cny, 
  rel ~  age + ses + edu + male + cny,
  ses ~ cny + edu + age,
  edu ~ cny + male + age,
  im ~ mem + rel + cny + sr + edu,
  mem ~ age + edu + ordr,
  exposure = "sr",
  outcome = "b",
  labels = c(
    "b" = "statement credibility\n(outcome)",
    "sr" = "source\n(exposure)",
    "st" = "statement",
    "im" = "statement importance",
    "mem" = "statement memory",
    "rel" = "religiosity",
    "cny" = "country",
    "male" = "male",
    "ordr" = "order",
    "ses" = "perceived SES",
    "edu" = "education",
    "age" = "age"
  )
) %>%
  tidy_dagitty() %>%
  node_collider() %>%
  ggplot(aes(
    x = x,
    y = y,
    xend = xend,
    yend = yend ,
    # shape = colliders,
    col = colliders
  ))  +
  scale_color_viridis_d(#name = "d-relationship",
    na.value = "grey85",
    option = "H") +
  geom_dag_edges() +
  #  geom_dag_collider_edges() +
  geom_dag_node() +
  geom_dag_label_repel(aes(label = label),
                       col = "steelblue",
                       show.legend = FALSE) +
  #  geom_dag_text(col = "white") +
  theme_dag()  +
  #+ scale_adjusted() +
  # expand_plot(expand_y = expansion(c(0.2, 0.2))) +
  labs(title = "Graph of collider relationships among variables \nin an experiment (ignoring exposure and outcome.)")

# Adjustment set
dd5 <- ggdag::dagify(
  b ~  im + ordr + rel + sr + edu +  st + cny,
  sr ~ rel + cny, 
  rel ~  age + ses + edu + male + cny,
  ses ~ cny + edu + age,
  edu ~ cny + male + age,
  im ~ mem + rel + cny + sr + edu,
  mem ~ age + edu + ordr,
  exposure = "sr",
  outcome = "b",
  labels = c(
    "b" = "statement credibility\n(outcome)",
    "sr" = "source\n(exposure)",
    "st" = "statement",
    "im" = "statement importance",
    "mem" = "statement memory",
    "rel" = "religiosity",
    "cny" = "country",
    "male" = "male",
    "ordr" = "order",
    "ses" = "perceived SES",
    "edu" = "education",
    "age" = "age"
  )
) %>%
  ggdag_adjustment_set() +
  scale_color_viridis_d(#name = "d-relationship",
    na.value = "grey85",
    option = "E") +
  geom_dag_edges(show.legend = FALSE) +
  # geom_dag_collider_edges() +
  geom_dag_node() +
  geom_dag_label_repel(aes(label = label),
                       col = "steelblue",
                       show.legend = FALSE) +
  #  geom_dag_text(col = "white") +
  theme_dag() +
  # scale_adjusted() +
  expand_plot(expand_y = expansion(c(0.2, 0.2))) +
  labs(title = "To obtain an unbiased estimate of source on credibility among people who\nvary by religiosity we need only condition on religiosity and country.")

# not run
# dd5 <-
#   ggdag::ggdag_adjustment_set(
#     dg_suz,
#     exposure = "rel",
#     outcome = "b",
#     text = FALSE,
#     use_labels = "label",
#     node_size = 10,
#     label_size = 10
#   ) +
#      scale_color_viridis_d(
#     #  name = "d-relationship",
#       na.value = "grey85",
#       end = .4
#     ) +
#   labs(title = "Adjustment sets") + geom_dag_text(na.rm = FALSE,
#                                                   show.legend = FALSE,
#                                                   inherit.aes = TRUE)
#


# figure

# not used
# fig3a <-
#   (ddm + ddm2) + plot_annotation(tag_levels = 'a', title = "Collider Bias")# + plot_layout(guides = 'collect')
# 

# not used
# fig3b <-
#   (dd4  + dd5) + plot_annotation(tag_levels = 'a', title = "Collider Bias in Experiments")# + plot_layout(guides = 'collect')


# graph
fig3 <-
  (ddm + ddm2) / (dd4  + dd5) + plot_annotation(tag_levels = 'a', title = "Collider Bias") # + plot_layout(guides = 'collect')

fig3 # 12 X16)
Show code

# not used
# subtitle = "Experimental control variables may induce collider bias.\nOnly a subset of experimental controls are required \nto close backdoor paths."
Show code
#save graph
ggsave(
  fig3,
  path = here::here("figs"),
  width = 15,
  height = 15,
  units = "in",
  filename = "fig3.jpg",
  device = 'jpeg',
  limitsize = FALSE,
  dpi = 1200
)

Simulation of collider confounding: the selection-distortion effect

Richard McElreath describes the selection-distortion effect (Berkson’s paradox) as an example of Collider Confounding [@mcelreath2020].

Imagine in science there is no relationship between the newsworthiness of science and its trustworthiness. Imagine further that selection committees make decisions on the basis of the both newsworthiness and the trustworthiness of scientific proposals.

This presents us with the following graph

Show code
dag_sd <- dagify(S ~ N,
                 S ~ T,
                 labels = c("S" = "Selection",
                            "N" = "Newsworthy",
                            "T" = "Trustworthy")) %>%
  tidy_dagitty(layout = "nicely")

# Graph
dag_sd %>%
  ggdag(text = FALSE, use_labels = "label")

When two arrows enter into an variable, it opens a path of information between the two variables: trustworth and newsworthy, with each negatively related to the other.

Often this openning of information has disasterous implications. McElreath makes the point that in the human sciences, included variable bias is a woefully underrated problem.

Show code
ggdag_dseparated(
  dag_sd,
  from = "T",
  to = "N",
  controlling_for = "S",
  text = FALSE,
  use_labels = "label"
)

To better avoid the pitfalls of collider bias, applied scientists can use the ggdag package to find colliders conditional on an assumed graph:

Show code
# code for finding colliders
ggdag::ggdag_collider(dag_sd,
                      text = FALSE,
                      use_labels = "label")

The following simulation by Solomon Kurz illustrates the selection-distortion effect, which Richard McElreath discusses in Statistical Rethinking. Kurz’s code is available the CC0-1.0 License here: https://github.com/ASKurz/Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse_2_ed

And his book is here: https://bookdown.org/ajkurz/Statistical_Rethinking_recoded/

I have slightedly modified this code.

First, we simulate uncorrelated variables and a process of selection for sub-populations score high on both indicators.

Show code
# simulate selection distortion effect, following Solomon Kurz
# https://bookdown.org/content/4857/the-haunted-dag-the-causal-terror.html
set.seed(123)
n <- 5000  # number of grant proposals
p <- 0.05  # proportion to select

d <-
  # uncorrelated newsworthiness and trustworthiness
  tibble(
    newsworthiness  = rnorm(n, mean = 0, sd = 1),
    trustworthiness = rnorm(n, mean = 0, sd = 1)
  ) %>%
  # total_score
  mutate(total_score = newsworthiness + trustworthiness) %>%
  # select top 10% of combined scores
  mutate(selected = ifelse(total_score >= quantile(total_score, 1 - p), TRUE, FALSE))

Next filter out the high scoring examples, and assess their correlation.

Note that the act of selection induces a correlation within our dataset.

Show code
d %>%
  filter(selected == TRUE) %>%
  select(newsworthiness, trustworthiness) %>%
  cor()
                newsworthiness trustworthiness
newsworthiness       1.0000000      -0.8038599
trustworthiness     -0.8038599       1.0000000

This makes it seems as if there is a relationship between Trustworthiness and Newsworthiness in science, even when there isn’t any.

Show code
# we'll need this for the annotation
text <-
  tibble(
    newsworthiness  = c(2, 1),
    trustworthiness = c(2.25, -2.5),
    selected        = c(TRUE, FALSE),
    label           = c("selected", "rejected")
  )

# graph 
d %>%
  ggplot(aes(x = newsworthiness, y = trustworthiness, color = selected)) +
  geom_point(aes(shape = selected), alpha = 3 / 4) +
  geom_text(data = text,
            aes(label = label)) +
  geom_smooth(
    data = . %>% filter(selected == TRUE),
    method = "lm",
    fullrange = T,
    color = "lightblue",
    se = F,
    size = 1
  ) +
  # scale_color_manual(values = c("black", "lightblue")) +
  scale_shape_manual(values = c(1, 19)) +
  scale_x_continuous(limits = c(-3, 3.9), expand = c(0, 0)) +
  coord_cartesian(ylim = range(d$trustworthiness)) +
  theme(legend.position = "none") +
  xlab("Newsworthy") +
  ylab("Trustworthy")

Notice, once we know a proposal has been selected, it if is newsworthy we can predict that it is less trustworthy. Our simulation produces this prediction even though we simulated a world in which there is no relationship between trustworthiness and newsworthiness.

Selection bias is commonplace. Imagine that religious service attendance reduces psychological distress and that having a sense of meaning in life reduces distress. Imagine that church attenance does not affect sense of meaning in life. “Controlling for” distress will make it appear as though church attendance causes meaning in life because distress “shares” information.

Show code
dag_ml <- dagify(d ~ rs,
                 d ~ m,
                 labels = c("rs" = "Religious Service",
                            "d" = "Psychological Distress",
                            "m" = "Meaning in Life")) %>%
  tidy_dagitty(layout = "nicely")



pp3 <- ggdag_dseparated(dag_ml, 
                 from = "rs",
                 to = "m",
                 controlling_for = "d",
                 text = FALSE, use_labels = "label")

p3 <- pp3 + labs(title="Collider confounding")
p3

Simulation of collider confounding within experiments

We noted that conditioning on a post-treatment variable can induce bias by blocking the path between the experimental manipulation and the outcome. However, such conditioning can open a path even when there is no experimental effect.

Show code
dag_ex2 <- dagify(
  C1 ~ C0 + U,
  Ch ~ U + R,
  labels = c(
    "R" = "Ritual",
    "C1" = "Charity-post",
    "C0" = "Charity-pre",
    "Ch" = "Cohesion",
    "U" = "Religiousness (Unmeasured)"
  ),
  exposure = "R",
  outcome = "C1",
  latent = "U"
) %>%
  control_for(c("Ch", "C0"))

dag_ex2 %>%
  ggdag(text = FALSE,
        use_labels = "label")

How do we avoid collider bias here?

Note what happens if we condition on cohesion?

Show code
dag_ex2 %>%
  ggdag_collider(text = FALSE,
                 use_labels = "label")  +
  ggtitle("Cohesion is a collider that opens a path from ritual to charity")

The moral of this story: don’t condition on a post-treatment variable!

Show code
dag_ex3 <- dagify(
  C1 ~ C0,
  C1 ~ U,
  Ch ~ U + R,
  labels = c(
    "R" = "Ritual",
    "C1" = "Charity-post",
    "C0" = "Charity-pre",
    "Ch" = "Cohesion",
    "U" = "Religiousness (Unmeasured)"
  ),
  exposure = "R",
  outcome = "C1",
  latent = "U"
)
ggdag_adjustment_set(dag_ex3)

Taxonomy of confounding

There is good news.

As McElreath points out [@mclreath2020], ultimately are only four basic types of confounding:

The Fork (omitted variable bias)

Show code
confounder_triangle(x = "Coffee",
                    y = "Lung Cancer",
                    z = "Smoking") %>%
  ggdag_dconnected(text = FALSE,
                   use_labels = "label")

The Pipe (fully mediated effects)

Show code
mediation_triangle(
  x = NULL,
  y = NULL,
  m = NULL,
  x_y_associated = FALSE
) %>%
  tidy_dagitty(layout = "nicely") %>%
  ggdag()

The Collider

Show code
collider_triangle() %>%
  ggdag_dseparated(controlling_for = "m")

The Descendant

If we “control for” a descendant of a collider, we will introduce collider bias.

Show code
dag_sd <- dagify(
  Z ~ Y,
  Z ~ X,
  D ~ Z,
  labels = c(
    "Z" = "Distress",
    "D" = "Friends",
    "X" = "Religion",
    "Y" = "Meaning"
  ),
  exposure = "X",
  outcome = "Y"
) %>%
  control_for("D") 


ds1 <- ggdag::ggdag_dconnected(dag_sd  , text = FALSE,
                               use_labels = "label") + labs(title = "Meaning ~ Religion") #+ geom_dag_point(color = c("steelblue", "red"))



ds2 <- dag_sd %>%
  ggdag_dseparated(
    from = "X",
    to = "Y",
    controlling_for = "D",
    text = FALSE,
    use_labels = "label"
  )  +
  ggtitle("Meaning ~ Religion + Friends",
          subtitle = "Conditioning on a descendent induces collider bias.") 

dgg <- dagify(
  Z ~ X,
  Z ~ Y,
  D ~ Z,
  labels = c(
    "Z" = "Distress",
    "D" = "Friends",
    "X" = "Religion",
    "Y" = "Meaning"
  ),
  exposure = "X",
  outcome = "Y"
)

ds3 <- ggdag::ggdag_adjustment_set(dgg,
                                   text = FALSE,
                                   use_labels = "label") + labs(title = "Do not condition on Friends") 



library(patchwork)

fig4 <-
  ds1 + ds2  + plot_annotation(tag_levels = 'a',
                               title = "Descendent confounding")  + plot_layout(guides = 'collect')

fig4
Show code
# graph figure
 # 8 x 6
Show code
#save graph
ggsave(
  fig4,
  path = here::here("figs"),
  width = 10,
  height = 10,
  units = "in",
  filename = "fig4.jpg",
  device = 'jpeg',
  limitsize = FALSE,
  dpi = 1200
)

Rules for avoiding confounding

McElreath offers the following advice in Statistical Rethinking, p.286

List all of the paths connecting X (the potential cause of interest) and Y (the outcome).

Classify each path by whether it is open or closed. A path is open unless it contains a collider.

Classify each path by whether it is a backdoor path. A backdoor path has an arrow entering X.

If there are any open backdoor paths, decide which variable(s) to condition on to close it (if possible).

Practically speaking this can be very difficult. We often have many paths to consider.

Appendix 1:

Conditional independence

In the causal inference literature, we evaluate dependencies using the language of “conditional independence.”

Once we condition on Z, the link between X and Y is broken.

Show code
# we can use the language of conditional independence
library(dagitty)
g <- dagitty("dag{ x <- z -> y }")
impliedConditionalIndependencies(g)
x _||_ y | z

We can use implied conditional independence to test whether our DAG is wrong For example, if X were to remain reliably predictive of Y, then we would know that the DAG we have drawn here is not correct.

The data can help us to rule out certain causal assumptions, but the data alone typically do not generate a unique causal model for the world. Causation is under-determined by the data.

Appendix 2: more about mediation

Direct and indirect effects

Suppose we were interested in the causal effect of X on Y. We have a direct effect of X on Y as well as an indirect effect of X on Y through M. We use ggdag to draw the DAG:

Show code
dag_1 <- dagify(y ~ x + m,
                m ~ x,
                exposure = "x",
                outcome = "y") %>%
  tidy_dagitty(layout = "tree")

dag_1 %>%
  ggdag()

What should we condition on if we are interested in the causal effect of changes in X on changes in Y?

We can pose the question to ggdag:

Show code
# ask ggdag which variables to condition on:

ggdag::ggdag_adjustment_set(dag_1)

‘Backdoor Paths Unconditionally Closed’ means that, assuming the DAG we have drawn is correct, we may obtain an unbiased estimate of X on Y without including additional variables.

Later we shall understand why this is the case.1

For now, we can enrich our language for causal inference by considering the concept of d-connected and d-separated:

Two variables are d-connected if information flows between them (condional on the graph), and they are d-separated if they are conditionally independent of each other.

Show code
# use this code to examine d-connectedness
ggdag::ggdag_dconnected(dag_1)

In this case, d-connection is a good thing because we can estimate the causal effect of X on Y. In other cases, d-connection will spoil the model. We have seen this for omitted variable bias. X and Y are d-separated conditional on Z, and that’s our motivation for including Z! These concepts are tricky, but they get easier with practice.

To add some grit to our exploration of mediation lets simulate data that are consistent with our mediation DAG

First we ask, is X is related to Y?

Show code
# regression model
m2 <- lm(y ~ x_s, data = df2)

#table
parameters::model_parameters(m2) %>%
  print_html()
Model Summary
Parameter Coefficient SE 95% CI t(98) p
(Intercept) 0.19 0.14 (-0.08, 0.47) 1.41 0.161
x s 1.66 0.14 (1.38, 1.93) 12.00 < .001

Yes.

Next we ask, Is X related to Y conditional on M?

Show code
# regression model
m2 <- lm(y ~ x_s + m_s, data = df2)

# table
parameters::model_parameters(m2) %>% print_html()
Model Summary
Parameter Coefficient SE 95% CI t(97) p
(Intercept) 0.19 0.10 (4.92e-03, 0.38) 2.04 0.044
x s 0.77 0.13 (0.51, 1.02) 6.00 < .001
m s 1.33 0.13 (1.07, 1.58) 10.34 < .001

Yes, but notice this is a different question. The effect of X is attenuated because M contributes to the causal effect of Y.

A mediation model would tell us the same. Recall from lecture 9 we can write a mediation model as follows:

Show code
path_m <- brms::bf(m_s ~ x_s)
path_y <- brms::bf(y ~ x_s + m_s)

m1 <- brms::brm(
  path_m + path_y + set_rescor(FALSE),
  data = df2,
  file = here::here("models", "mediation-lect11-1")
)

parameters::model_parameters(m1) %>% print_html()
Model Summary
ms
Parameter Median 95% CI pd Rhat ESS
(Intercept) 8.70e-04 (-0.14, 0.15) 50.32% 1.000 4869.00
x_s 0.67 (0.50, 0.81) 100% 1.000 5672.00
sigma 0.76 (0.65, 0.87) 100% 1.000 5393.00
(Intercept) 0.19 (-9.49e-04, 0.39) 97.60% 0.999 5134.00
x_s 0.77 (0.51, 1.04) 100% 1.000 3567.00
m_s 1.33 (1.07, 1.58) 100% 0.999 3585.00
sigma 0.96 (0.84, 1.11) 100% 1.000 5112.00

Recalling:

Show code
bmlm::mlm_path_plot(xlab = "Focal\n(X)",
              mlab = "Mediator\n(M)",
              ylab = "Outcome\n(Y)")

For a mediation model we may recover the indirect, direct and total effects as follow:

Show code
# get posterior distributions
post <- brms::as_draws_df(m1)
Show code
# sum and multiply the posterior distributions to obain parameter estimates
post <- as_tibble(post)
post2 <- post %>%
  transmute(
    a = b_ms_x_s ,
    b = b_y_m_s,
    cp = b_y_x_s,
    me = a * b,
    c = cp + me#,
    # pme = me / c
  )

# plot the results
mcmc_intervals(post2) + theme_classic()

So we can ask how X affects Y in relation to X’s effect on M.

However, to obtain an unbiased causal estimate of X on Y we only needed to include X. We didn’t need to condition on M to estimate the causal effect of X.

Masked relationships

Imagine two variables were to affect an outcome. Both are correlated with each other. One affects the outcome positively and the other affects the outcome negatively. How shall we investigate the causal role of the focal predictor?

Consider two correlated variables that jointly predict Political conservatism (C), religion (R). Imagine that one variable has a positive effect and the other has a negative effect on distress (K6).

First consider this relationship, where conservatism causes religion

Show code
dag_m1 <- dagify(K ~ C + R,
                 R ~ C,
                 exposure = "C",
                 outcome = "K") %>%
  tidy_dagitty(layout = "tree")

# graph
dag_m1 %>%
  ggdag()

We can simulate the data:

Show code
# C -> K <- R
# C -> R
set.seed(123)
n <- 100
C <- rnorm(n)
R <- rnorm(n , C)
K <- rnorm(n , R - C)


d_sim <- data.frame(K = K, R = R, C = C)

First we only condition on conservatism

Show code
ms1 <- parameters::model_parameters(lm(K  ~ C,
                                       data = d_sim))
plot(ms1)
Show code
ms1
Parameter   | Coefficient |   SE |        95% CI | t(98) |     p
----------------------------------------------------------------
(Intercept) |        0.03 | 0.14 | [-0.24, 0.30] |  0.22 | 0.829
C           |       -0.19 | 0.15 | [-0.49, 0.11] | -1.24 | 0.219

Next, only religion:

Show code
ms2 <- parameters::model_parameters(lm(K  ~ R, data = d_sim))
plot(ms2)

When we add both C and R, we see them “pop” in opposite directions, as is typical of masking:

Show code
ms3 <- parameters::model_parameters(lm(K  ~ C + R, data = d_sim))
plot(ms3)

Mediation model

Show code
path_m <- brms::bf(R ~ C)
path_y <- brms::bf(K ~ C + R)

m2 <- brms::brm(
  path_m + path_y + set_rescor(FALSE),
  data = d_sim,
  file = here::here("models", "mediation-lect11-2")
)

# parameters::model_parameters(m2)%>%
#   print_html()

Recalling:

Show code
bmlm::mlm_path_plot(xlab = "Focal\n(X)",
              mlab = "Mediator\n(M)",
              ylab = "Outcome\n(Y)")

We recover the indirect, direct and total effects as follows:

Show code

postA <- brms::as_draws_df(m2)
postA
# A draws_df: 1000 iterations, 4 chains, and 8 variables
   b_R_Intercept b_K_Intercept b_R_C b_K_C b_K_R sigma_R
1          0.041         0.204  0.97  -1.1  1.12    1.22
2         -0.131         0.207  0.99  -1.1  0.82    1.02
3         -0.014         0.107  1.05  -1.3  1.17    1.05
4         -0.075         0.148  1.00  -1.1  0.93    0.86
5         -0.133         0.128  0.88  -1.2  1.08    1.08
6         -0.236         0.172  1.09  -1.2  1.01    1.01
7          0.011         0.192  0.72  -1.1  1.01    0.95
8         -0.073         0.328  1.03  -1.3  1.00    1.00
9         -0.154         0.233  0.99  -1.1  1.11    0.94
10        -0.077         0.037  0.93  -1.0  1.00    1.02
   sigma_K lp__
1     0.99 -287
2     0.88 -285
3     1.06 -284
4     0.92 -282
5     0.99 -282
6     0.88 -282
7     1.06 -284
8     1.03 -283
9     0.99 -282
10    0.92 -281
# ... with 3990 more draws
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}
Show code

postA <- as_tibble(postA)
post2 <- postA %>%
  transmute(
    a = b_R_C ,
    b = b_K_R,
    cp = b_K_C,
    me = a * b,
    c = cp + me #,
    #  pme = me / c
  )
mcmc_intervals(post2) + theme_classic()

Note that when you ask ggdag to assess how to obtain an unbiased estimate of C on K it will tell you you don’t need to condition on R.

Show code
dag_m1 %>%
  ggdag_adjustment_set()

Yet recall when we just assessed the relationship of C on K we got this:

Show code
plot(ms1)

Is the DAG wrong?

No. The fact that C\(\to\)R is positive and R\(\to\)K is negative means that if we were to increase C, we wouldn’t reliably increase K. The total effect of C just isn’t reliable. (Note, Rubin describes this somewhere, as does Tyler Vanderweele’s 2015, add references … anon.)

Readings

Readings are as follows:


  1. We shall see there is no “backdoor path” from X to Y that would bias our estimate, hence the estimate X->Y is an unbiased causal estimate – again, conditional on our DAG.↩︎

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC-SA 4.0. Source code is available at https://github.com/go-bayes/rbb-causal, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".