Temporal Causal Diagrams: Unveiling Causal Order and Indexing Nodes by Time

Author

Joseph Bulbulia

Common Causal Graphs

Elements of causal DAGs:

Nodes: These symbolize variables within a causal system. We denote nodes with letters such as

$A$, $Y$

Edges: These are arrows connecting nodes, signifying causal relationships. We denote edges with arrows:

$A \to Y$

Variable Naming Conventions

  • Outcome: typically denoted by $Y$. The effect or outcome of interest. Do not attempt to draw a causal DAG unless this outcome is clearly defined.
  • Exposure or Treatment: typically denoted by $A$ or $X$. The intervention. Do not attempt to draw a causal DAG unless the exposure is clearly defined and does not violate deterministic non-positivity.
  • Confounders: typically denoted by $C$ or $L$. Informally, the variables influencing both the exposure/treatment and the outcome.
  • Unmeasured Confounders: typically denoted by $U$. Variables influencing both the exposure and the outcome that are not (or are only partially) measured.
  • Selection Variables: typically denoted by $S$. Variables affecting a unit’s inclusion in the study (including retention in the study).
  • Box: denotes conditioning on a variable. For example, to denote selection into the study we write

$\framebox{S}$

To denote conditioning on a confounder set $L$ we write

$\framebox{L}$

Key Concepts

  • Markov Factorisation: Pertains to a causal DAG in which the joint distribution of all nodes can be expressed as a product of conditional distributions, one for each variable given its parents. Each variable is conditionally independent of its non-descendants, given its parents. This factorisation is crucial for identifying conditional independencies within the graph (a minimal example follows this list).
  • D-separation (directional separation): Pertains to a condition in which there is no open path between two sets of variables in the graph, given the conditioned variables. Establishing d-separation allows us to infer conditional independencies, which in turn help identify the set of measured variables we need to adjust for to obtain an unbiased estimate of the causal effect, or, in the presence of unmeasured or partially measured confounders, to reduce bias.
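
As a minimal illustration, consider the elementary confounding structure $L \to A$, $L \to Y$, $A \to Y$. The Markov factorisation expresses the joint distribution as a product of each node given its graphical parents:

$P(L, A, Y) = P(L)\,P(A \mid L)\,P(Y \mid A, L)$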

Assumptions of causal diagrammes

Causal Markov Condition

The Causal Markov Condition is an assumption that each variable is independent of its non-descendants, given its parents in the graph. In other words, it assumes that all dependencies between variables are mediated by direct causal relationships. If two variables are correlated, it must be because one causes the other, or they have a shared cause, not because of any unmeasured confounding variables.

Formally, for each variable $X$ in the graph, $X$ is independent of its non-descendants $\text{NonDesc}(X)$, given its parents $\text{Pa}(X)$.

This is a strong assumption. Typically we must assume that there are hidden, unmeasured confounders that introduce dependencies between variables which are not depicted in the graph. It is important to (1) identify known unmeasured confounders and (2) label them on the causal diagramme.

Faithfulness

The Faithfulness assumption is the converse of the Causal Markov Condition. It states that if two variables are uncorrelated, it is because there is no direct or indirect causal path between them, not because of any cancelling out of effects. Essentially, it assumes that the independencies observed in the data reflect the structure of the graph, rather than a coincidental cancellation of effects.

Formally, if $A$ and $Y$ are independent given a set of variables $L$, then $A$ and $Y$ are d-separated given $L$ in the graph; no open path between $A$ and $Y$ remains after conditioning on $L$.

As with the Causal Markov Condition, Faithfulness is a strong assumption, and it might not hold in the real world. There could be complex causal structures or interactions that lead to apparent independence between variables, even though they are causally related.

General Advice for drawing a causal DAG

  • Define all variables clearly.
  • Define any novel conventions you employ. This could include dotted or coloured arrows to indicate confounding that is induced or unaddressed (as below).
  • Adopt minimalism. Include only those nodes and edges that are needed to clarify the problem. Use diagrams only when they bring more clarity than textual descriptions alone.
  • Chronological order. Where possible, maintain the temporal order of the nodes in the spatial order of the graph, typically from left to right or top to bottom. When depicting repeated measures, index them using time subscripts.
  • Add time-stamps to your nodes. To bring additional clarity, it is almost always useful to time-stamp the nodes of your graph, for example, in schematic form:

$L_{t0} \rightarrow A_{t1} \rightarrow Y_{t2}$

  • Where exposures are not assigned randomly, we should nearly always assume unmeasured confounding. For this reason, your causal DAG should include a description of the sensitivity analyses you will perform to clarify how sensitive your findings are to unmeasured confounding. Where there are known unmeasured confounders, these should be described.

Recall that DAGs are qualitative representations. The time-stamps need not denote clearly defined units of time; rather, they should preserve chronological order.

Elemental confounds

There are four elemental confounds [@mcelreath2020 p.185]. Consider how attention to chronological order assists with understanding both the constraints on our data and the opportunities for confounding control.

1. The problem of confounding by common cause

The problem of confounding by common cause arises when there is a variable, denoted by $L$, that influences both the exposure, denoted by $A$, and the outcome variable, denoted by $Y$. Because $L$ is a common cause of $A$ and $Y$, it may create a statistical association between $A$ and $Y$ that does not reflect a causal association between $A$ and $Y$. Put differently, although intervening on $A$ might not affect $Y$, $A$ and $Y$ may be associated. For example, people who smoke may have yellow fingers. Smoking causes cancer. Because smoking ($L$) is a common cause of yellow fingers ($A$) and cancer ($Y$), $A$ and $Y$ will be associated. However, intervening to change the colour of people’s fingers would not affect cancer. The dashed red arrow in the graph indicates bias arising from the open backdoor path from $A$ to $Y$ that results from the common cause $L$.

Figure 1: Confounding by common cause. The dashed red arrow indicates bias arising from the open backdoor path from A to Y.

Advice: attend to the temporal order of causality

Confounding by a common cause can be addressed by adjusting for it. Typically we adjust through statistical models such as regression, matching, or inverse probability of treatment weighting. Again, it is beyond the scope of this tutorial to describe causal estimation techniques. Figure 2 clarifies that any confounder that is a cause of $A$ and $Y$ will precede $A$ (and so $Y$), because causes precede effects. By time-indexing the nodes on the graph, we can see that confounding control typically requires time-series data. A minimal simulation follows Figure 2.

Figure 2: Solution: adjust for pre-exposure confounder.
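
Here is a minimal simulation of this scenario, assuming hypothetical probabilities; variable names follow the smoking example above:

```python
# Confounding by a common cause (hypothetical numbers): smoking (L)
# causes both yellow fingers (A) and cancer (Y); A has no effect on Y,
# yet A and Y are marginally associated.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

L = rng.binomial(1, 0.3, n)              # smoking
A = rng.binomial(1, 0.05 + 0.7 * L)      # yellow fingers, caused by L only
Y = rng.binomial(1, 0.02 + 0.2 * L)      # cancer, caused by L only

# Marginal association: P(Y=1 | A=1) and P(Y=1 | A=0) differ...
print(Y[A == 1].mean(), Y[A == 0].mean())

# ...but within strata of L the association vanishes, because adjustment
# closes the backdoor path A <- L -> Y.
for l in (0, 1):
    m = L == l
    print(l, Y[m & (A == 1)].mean(), Y[m & (A == 0)].mean())
```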

2. Confounding by collider stratification (conditioning on a common effect)

Conditioning on a common effect occurs when a variable $L$ is affected by both the treatment $A$ and the outcome $Y$.

Suppose $A$ and $Y$ are initially independent, such that $A \coprod Y(a)$. Conditioning on the common effect $L$ opens a non-causal path between $A$ and $Y$, possibly inducing an association. This occurs because $L$ carries information about both $A$ and $Y$. Here’s an example:

Let $A$ denote “exercise”. Let $Y$ denote “heart disease”. Let $L$ denote “weight”. Suppose “exercise” and “heart disease” are not causally linked. However, they both affect “weight”, and if we condition on “weight” in a cross-sectional study, we might find a statistical association between “exercise” and “heart disease” even in the absence of causation.

We denote the observed associations as follows:

  • $P(A = 1)$: Probability of exercising
  • $P(Y = 1)$: Probability of having heart disease
  • $P(L = 1)$: Probability of being overweight

Without conditioning on $L$, we have:

$P(A = 1, Y = 1) = P(A = 1)\,P(Y = 1)$

However, if we condition on $L$ (the common effect of both $A$ and $Y$), we find:

$P(A = 1, Y = 1 \mid L = 1) \neq P(A = 1 \mid L = 1)\,P(Y = 1 \mid L = 1)$

The common effect $L$, once conditioned on, creates a non-causal association between $A$ and $Y$. This can mislead us into believing there is a direct link between exercise and heart disease, which is not the case. In cross-sectional data, if we only observe $A$, $Y$, and $L$ without understanding their causal relationship, we might erroneously conclude that there is a causal relationship between $A$ and $Y$. This is collider stratification bias. A minimal simulation follows Figure 3.

Figure 3: Confounding by conditioning on a collider.
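
Here is a minimal simulation of collider stratification, with hypothetical probabilities following the exercise example above:

```python
# Collider stratification: exercise (A) and heart disease (Y) are
# independent, but both affect weight (L). Conditioning on L induces
# a spurious A-Y association.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

A = rng.binomial(1, 0.5, n)                    # exercise
Y = rng.binomial(1, 0.1, n)                    # heart disease, independent of A
L = rng.binomial(1, 0.6 - 0.3 * A + 0.3 * Y)   # overweight: A lowers, Y raises risk

# Unconditionally, A and Y are independent:
print(Y[A == 1].mean(), Y[A == 0].mean())

# Within the L = 1 stratum, exercisers now show MORE heart disease:
m = L == 1
print(Y[m & (A == 1)].mean(), Y[m & (A == 0)].mean())
```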

Advice: attend to the temporal order of causality

To address the problem of conditioning on a common effect, we should generally ensure that all confounders $L$ that are common causes of the exposure $A$ and the outcome $Y$ are measured before the occurrence of the exposure $A$, and furthermore that the exposure $A$ is measured before the occurrence of the outcome $Y$. If such temporal order is preserved, $L$ cannot be an effect of $A$, and thus cannot be an effect of $Y$. By measuring all relevant confounders before the exposure, researchers can minimise the scope for collider bias induced by conditioning on a common effect. This rule is not absolute. As indicated in Figure 10, it may be useful in certain circumstances to condition on a variable that occurs after the outcome has occurred.

Figure 4: Solution: avoid colliders

M-bias: conditioning on a collider that occurs before the exposure may introduce bias

Typically, confounders should be measured before their exposures. However, researchers should be cautious about conditioning on pre-exposure variables, as doing so can induce confounding. As shown in Figure 5, collider stratification may arise even if $L$ occurs before $A$. This happens when $L$ does not affect $A$ or $Y$, but is a descendant of one unmeasured variable that affects $A$ and another unmeasured variable that affects $Y$. Conditioning on $L$ in this scenario elicits what is called “M-bias.” Note, however, that if $L$ is not a common cause of $A$ and $Y$, $L$ should not be included in our model, because it is not a source of confounding. Here, $A \coprod Y(a)$ and $A \cancel{\coprod} Y(a) \mid L$. The solution: do not condition on the pre-exposure variable $L$. A small simulation after Figure 5 illustrates the problem.

Figure 5: M-bias: conditioning on a pre-exposure collider (for example, a previous measure of the outcome) induces bias.
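
A small simulation of the M-structure, assuming hypothetical continuous variables; U1 and U2 play the role of the unmeasured causes:

```python
# M-bias: U1 and U2 are unmeasured; L is their common effect and
# precedes A. A and Y are unconditionally independent, but adjusting
# for L associates them.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

U1 = rng.normal(size=n)            # unmeasured cause of L and A
U2 = rng.normal(size=n)            # unmeasured cause of L and Y
L = U1 + U2 + rng.normal(size=n)   # pre-exposure collider
A = U1 + rng.normal(size=n)        # exposure (no effect on Y)
Y = U2 + rng.normal(size=n)        # outcome

print(np.corrcoef(A, Y)[0, 1])     # ~0: no marginal association

# Partial correlation of A and Y given L (residualise both on L):
rA = A - np.polyval(np.polyfit(L, A, 1), L)
rY = Y - np.polyval(np.polyfit(L, Y, 1), L)
print(np.corrcoef(rA, rY)[0, 1])   # nonzero: conditioning on L opens the path
```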

3. The problem of conditioning on a mediator

Conditioning on a mediator occurs when $L$ lies on the causal pathway between the treatment $A$ and the outcome $Y$. Conditioning on $L$ can lead to biased estimates by blocking or distorting the total effect of $A$ on $Y$. Where $L$ is a mediator, including $L$ will typically attenuate the effect of $A$ on $Y$. This scenario is presented in Figure 6. Where $L$ is a collider between $A$ and an unmeasured confounder $U$, including $L$ may increase the strength of association between $A$ and $Y$. This scenario is presented in Figure 8.

In either case, unless one is interested in mediation analysis, conditioning on a post-treatment variable is nearly always a bad idea.

Figure 6: Confounding by a mediator.

Advice: attend to the temporal order of causality

To address the problem of mediator bias, when interested in total effects, do not condition on a mediator. This can be done by ensuring that $L$ occurs before $A$ (and $Y$). Again we discover the importance of an explicit temporal ordering for our variables. Note, though, that if $L$ is associated with $Y$ but not with $A$, conditioning on $L$ will improve the efficiency of the causal effect estimate of $A$ on $Y$. However, if $A$ might affect $L$, then $L$ might be a mediator, and including $L$ risks bias. As with so much in causal estimation, we must understand the context. A minimal simulation follows Figure 7.

Figure 7: Ensure confounders occur before exposures.
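
A minimal simulation of mediator bias, assuming hypothetical linear effects:

```python
# L mediates A -> Y: adjusting for the mediator attenuates the
# estimated total effect.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

A = rng.binomial(1, 0.5, n)
L = 2.0 * A + rng.normal(size=n)             # mediator on the A -> Y path
Y = 1.0 * L + 0.5 * A + rng.normal(size=n)   # total effect of A: 2*1 + 0.5 = 2.5

# Unadjusted contrast recovers the total effect (~2.5):
print(Y[A == 1].mean() - Y[A == 0].mean())

# Adjusting for L leaves only the direct effect (~0.5):
X = np.column_stack([np.ones(n), A, L])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
print(beta[1])
```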

4. Conditioning on a descendant

Say $X$ is a cause of $X'$. If we condition on $X'$, we partially condition on $X$.

There are both negative and positive implications for causal estimation in real-world scenarios.

First, the negative. Suppose there is a confounder $L$ that is caused by an unobserved variable $U$ and is affected by the treatment $A$. Suppose further that $U$ causes the outcome $Y$. In this scenario, as described in Figure 8, conditioning on $L$, which is a descendant of both $A$ and $U$, can lead to a spurious association between $A$ and $Y$ through the path $A \to L \leftarrow U \to Y$.

Figure 8: Confounding by descent

Advice: attend to the temporal order of causality, and use expert knowledge of all relevant nodes.

Ensuring the confounder ($L$) is measured before the exposure ($A$) has two benefits.

First, if $L$ is a confounder, that is, a variable that will bias the association between treatment and outcome if we fail to condition on it, the strategy of including only pre-treatment indicators of $L$ will reduce bias. Figure 9 presents this strategy.

Figure 9: Solution: again, ensure temporal ordering in all measured variables.

Secondly, note that we may use a descendant of an unmeasured confounder to reduce bias. For example, if an unmeasured confounder $U$ affects $A$, $Y$, and $L'$, then adjusting for $L'$ may help to reduce confounding caused by $U$. This scenario is presented in Figure 10. Note that in this graph, $L'$ may occur after the exposure, and indeed after the outcome. This shows that it would be wrong to infer that merely because causes precede effects, we should only condition on confounders that precede the exposure. A sketch of this strategy follows Figure 10.

Figure 10: Solution: note that conditioning on a confounder proxy that occurs after the exposure and outcome addresses the problem of unmeasured confounding. The dotted paths denote that the effect of U on A and Y is partially adjusted by conditioning on L, even though L occurs after the outcome. The dotted blue path suggests suppression of the biased relationship between A and Y under the null. A genetic factor that affects the exposure and the outcome early in life, and that also expresses a measured indicator late in life, might constitute an example for which post-outcome confounding control is possible.
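
A sketch of proxy adjustment, assuming a hypothetical linear model in which L′ is a post-outcome descendant of the unmeasured confounder U:

```python
# U is an unmeasured confounder of A and Y; Lp is a late-life proxy
# (descendant) of U, measured after Y. Adjusting for Lp partially
# removes the confounding U induces.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

U = rng.normal(size=n)              # e.g. an early-life genetic factor
A = U + rng.normal(size=n)          # exposure (true effect on Y is 0)
Y = U + rng.normal(size=n)          # outcome
Lp = U + 0.5 * rng.normal(size=n)   # noisy proxy of U, measured after Y

def slope(cols, y):
    """OLS coefficient on the first column in `cols`, after an intercept."""
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(slope([A], Y))       # biased upward (~0.5) despite a null effect
print(slope([A, Lp], Y))   # much closer to 0: the proxy soaks up much of U
```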

Causal Interaction?

Applied researchers will often be interested in testing interactions. What is causal interaction and how may we represent it on a causal diagramme?

We must distinguish the concept of causal interaction from the concept of effect modification.

Causal interaction as two independent exposures

Causal interaction concerns the joint effect of two exposures that may occur together, separately, or not at all. We say there is interaction on the scale of interest when the effect of one exposure on an outcome depends on the level of another exposure. For example, the effect of a drug (exposure $A$) on recovery time from a disease (outcome $Y$) might depend on whether or not the patient is also receiving physical therapy (exposure $B$). In terms of causal quantities, if we denote the potential outcomes under different exposure combinations as $Y(a,b)$, a causal interaction on the difference scale would be present if $Y(1,1) - Y(1,0) \neq Y(0,1) - Y(0,0)$.

When drawing a causal diagram, we represent the two exposures as separate nodes and draw edges from them to the outcome, as shown in Figure 11. This is because causal diagrams are non-parametric; they represent the qualitative aspects of causal relationships without making specific assumptions about the functional form of these relationships.

Figure 11: Causal interaction: the two exposures are causally independent of each other.

Effect measures for causal interaction

On the difference scale, the total causal effect of an exposure $A$ on an outcome $Y$ is typically quantified as $Y(1) - Y(0)$, where $Y(a)$ represents the potential outcome under exposure level $a$. If there is another exposure $B$, the causal interaction effect on the difference scale would be quantified as $[Y(1,1) - Y(1,0)] - [Y(0,1) - Y(0,0)]$.

Note that causal interaction effects might differ on the ratio scale. For instance, the total causal effect on the ratio scale would be $Y(1)/Y(0)$, and the interaction effect would be $[Y(1,1)/Y(1,0)] / [Y(0,1)/Y(0,0)]$. A worked example follows.
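
A small worked example, with hypothetical expected potential outcomes, makes the scale-dependence concrete:

```python
# Interaction can be absent on the difference scale yet present on the
# ratio scale (numbers are hypothetical E[Y(a, b)] values).
po = {(1, 1): 0.40, (1, 0): 0.30, (0, 1): 0.20, (0, 0): 0.10}

additive = (po[1, 1] - po[1, 0]) - (po[0, 1] - po[0, 0])
multiplicative = (po[1, 1] / po[1, 0]) / (po[0, 1] / po[0, 0])

print(additive)        # 0.0   -> no interaction on the difference scale
print(multiplicative)  # ~0.67 -> interaction on the ratio scale
```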

Causal interaction as effect modification

Effect modification concerns how the magnitude of the effect of a single exposure on an outcome varies across levels of another variable.

Here we assume independence of the counterfactual outcome conditional on measured confounders, within strata of the covariate $G$:

$Y(a) \coprod A \mid L, G$

Note that here there is only one counterfactual outcome, indexed by the single exposure $a$.

Figure 12: A simple graph for effect-modification.

Advice for causal mediation

  1. No unmeasured exposure-outcome confounders given $L$

    This assumption is denoted by $Y(a,m) \coprod A \mid L$. It implies that when we control for the covariates $L$, there are no unmeasured confounders that influence both the exposure $A$ and the outcome $Y$. For example, if we are studying the effect of a drug (exposure) on recovery time from a disease (outcome), and age and gender are our covariates $L$, this assumption would mean that there are no other factors, not accounted for in $L$, that influence both the decision to take the drug and the recovery time.

  2. No unmeasured mediator-outcome confounders given $L$

    This assumption is denoted by $Y(a,m) \coprod M \mid L$. It implies that when we control for the covariates $L$, there are no unmeasured confounders that influence both the mediator $M$ and the outcome $Y$. For instance, if we are studying the effect of exercise (exposure) on weight loss (outcome) mediated by calorie intake (mediator), and age and gender are our covariates $L$, this assumption would mean that there are no other factors, not accounted for in $L$, that influence both the calorie intake and the weight loss.

  3. No unmeasured exposure-mediator confounders given $L$

    This assumption is denoted by $M(a) \coprod A \mid L$. It implies that when we control for the covariates $L$, there are no unmeasured confounders that influence both the exposure $A$ and the mediator $M$. Using the previous example, this assumption would mean that there are no other factors, not accounted for in $L$, that influence both the decision to exercise and the calorie intake.

  4. No mediator-outcome confounder affected by the exposure (no red arrow)

    This assumption is denoted by $Y(a,m) \coprod M(a^{*}) \mid L$. It implies that there are no variables that confound the relationship between the mediator and the outcome that are affected by the exposure. For example, if we are studying the effect of education (exposure) on income (outcome) mediated by job type (mediator), this assumption would mean that there are no factors that influence both job type and income that are affected by the level of education.

These assumptions are fundamental for the identification of causal mediation effects. If these assumptions are violated, the estimates of the mediation effect can be biased. Importantly, these assumptions cannot be fully tested with observed data. They require substantive knowledge about the underlying causal process. Note that when assumption 4 is violated, natural direct and indirect effects are not identified in the data. [Cite Tyler here]

Figure 13: Assumptions for mediation analysis
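
To make the role of these assumptions concrete, here is a minimal g-computation sketch in a hypothetical linear setting where assumptions 1-4 hold by construction (there is no confounding at all); all names and coefficients are illustrative, not a general mediation estimator.

```python
# Decompose a total effect into natural direct and indirect effects.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

A = rng.binomial(1, 0.5, n)
M = 1.0 * A + rng.normal(size=n)              # mediator
Y = 0.5 * A + 1.0 * M + rng.normal(size=n)    # outcome

# Fit the two structural models by OLS.
bM = np.linalg.lstsq(np.column_stack([np.ones(n), A]), M, rcond=None)[0]
bY = np.linalg.lstsq(np.column_stack([np.ones(n), A, M]), Y, rcond=None)[0]

# In a linear, no-interaction model: NDE = beta_A, NIE = beta_M * alpha_A.
nde = bY[1]
nie = bY[2] * bM[1]
print(nde, nie, nde + nie)  # ~0.5, ~1.0, total ~1.5
```

In this linear, no-interaction setting, g-computation reduces to the familiar product-of-coefficients decomposition.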

Advice for modelling repeated exposures in longitudinal data (confounder-treatment feedback)

Causal mediation is a special case of a more general setting in which we have multiple sequential exposures.

For example, consider temporally fixed multiple exposures. The counterfactual outcomes may be denoted $Y(a_{t1}, a_{t2})$. There are four counterfactual outcomes corresponding to the four fixed “treatment regimes”:

  1. Always treat ($Y(1,1)$): This regime involves providing the treatment at every opportunity.

  2. Never treat ($Y(0,0)$): This regime involves abstaining from providing the treatment at any opportunity.

  3. Treat once first ($Y(1,0)$): This regime involves providing the treatment only at the first opportunity and not at the subsequent one.

  4. Treat once second ($Y(0,1)$): This regime involves abstaining from providing the treatment at the first opportunity, but then providing it at the second one.

There are six causal contrasts that we might compute (see the enumeration sketch after this list).

  1. Always treat vs. Never treat
  2. Always treat vs. Treat once first
  3. Always treat vs. Treat once second
  4. Never treat vs. Treat once first
  5. Never treat vs. Treat once second
  6. Treat once first vs. Treat once second
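
For completeness, a small sketch enumerating these contrasts (regime labels are illustrative); see also the footnote on counting combinations:

```python
# Enumerate the six pairwise contrasts among the four fixed regimes:
# C(4, 2) = 6.
from itertools import combinations

regimes = {"always": (1, 1), "never": (0, 0),
           "first only": (1, 0), "second only": (0, 1)}

for r1, r2 in combinations(regimes, 2):
    print(f"E[Y{regimes[r1]}] - E[Y{regimes[r2]}]")
```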

We might also consider treatment to be a function of the previous outcome. For example, we might treat at the first opportunity and then treat again, or not, depending on the outcome of the previous treatment. Such rules, which depend on evolving history, are called “dynamic treatment regimes.”

Note that to estimate the “effect” of a treatment regime, we must compare the counterfactual quantities of interest. The same conditions that apply for causal identification in mediation analysis apply to causal identification in multiple-treatment settings. And notice: just as mediation opens the possibility of time-varying confounding (condition 4, in which the exposure affects the confounders of the mediator/outcome path), so too time-varying treatments bring the problem of time-varying confounding. Unlike traditional causal mediation analysis, the sequence of treatment regimes that we might consider is indefinitely long.

Temporally organised causal diagrammes help us to discover the problems with traditional multi-level regression analysis and structural equation modelling. Suppose we are interested in the question of whether beliefs in big Gods affect social complexity.

First, consider fixed regimes. Suppose we have a well-defined concept of social complexity and excellent measurements over time. Suppose we want to compare the effects of beliefs in big Gods on social complexity using historical data measured over two centuries. Our question is whether the introduction and persistence of such beliefs differs from having no such beliefs. The treatment strategies are “always believe in big Gods” versus “never believe in big Gods”, compared for their effects on the level of social complexity. A causal diagram illustrates two time points in the study.

Here, $A_{tx}$ represents the cultural belief in “big Gods” at time $x$, and $Y_{tx}$ is the outcome, social complexity, at time $x$. Economic trade, denoted $L_{tx}$, is a time-varying confounder because it varies over time and confounds the effect of $A$ on $Y$ at several time points $x$. To complete our causal diagramme we include an unmeasured confounder $U$, such as geographical constraints, which might influence both the belief in “big Gods” and social complexity.

We know that the level of economic trade at time $0$, $L_{t0}$, influences the belief in “big Gods” at time $1$, $A_{t1}$. We therefore draw an arrow from $L_{t0}$ to $A_{t1}$. But we also know that the belief in “big Gods”, $A_{t1}$, affects the future level of economic trade, $L_{t2}$. This means that we need to add an arrow from $A_{t1}$ to $L_{t2}$. This causal graph represents a feedback process between the time-varying exposure $A$ and the time-varying confounder $L$. This is the simplest graph with exposure-confounder feedback. In real-world settings there could be many more arrows. However, our DAG need only show the minimum number of arrows required to exhibit the problem of exposure-confounder feedback.

What happens if we condition on the time-varying confounder $L_{t2}$? Two things occur. First, we block the backdoor paths between the later exposure $A_{t3}$ and the outcome. We need to block those paths to eliminate confounding, so conditioning on the time-varying confounder would appear to be essential. Second, paths that were previously blocked are now open. For example, the path $A_{t1} \to L_{t2} \leftarrow U \to Y_{t4}$, which was previously closed, is opened, because the time-varying confounder is a common effect of $A_{t1}$ and $U$. The same problem occurs if the time-varying exposure and time-varying confounder share a common cause (without the exposure affecting the confounder). And the problem is only more entrenched when the exposure $A_{t1}$ affects the outcome $Y_{t4}$: because $L_{t2}$ lies along the path from $A_{t1}$ to $Y_{t4}$, conditioning on $L_{t2}$ partially blocks the causal path between the exposure and the outcome. Conditioning on $L_{t2}$ in this setting therefore induces both collider stratification bias and mediator bias. Yet we must condition on $L_{t2}$ to block the open backdoor path between $A_{t3}$ and $Y_{t4}$. The general problem of exposure-confounder feedback is described in detail in [@hernan2023]. This problem presents a serious issue for cultural evolutionary studies. The bad news is that nearly all traditional regression-based methods cannot address this problem. The good news is that g-methods can; a sketch of the parametric g-formula follows Figure 14.

Figure 14: Exposure-confounder feedback is a problem for time-series models. Unfortunately, this problem cannot be addressed with regression-based methods, whatever the combination of Bayesian, multi-level, and phylogenetic sophistication. We may only estimate controlled (simulated) effects in these settings using g-methods. Currently, outside of epidemiology, g-methods are rarely used.
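
To make the remedy concrete, here is a hedged sketch of the parametric g-formula (one of the g-methods) for two exposures with exposure-confounder feedback. The data-generating process, variable names, and coefficients are hypothetical; the point is that standardising over the simulated confounder distribution recovers the joint effect, where a single regression that adjusts for the time-varying confounder does not.

```python
# Mapping to the text: A1 ~ A_{t1}, L2 ~ L_{t2}, A2 ~ A_{t3}, Y ~ Y_{t4}.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Data generation: U is unmeasured; L2 is a time-varying confounder
# affected by the first exposure (feedback: A1 -> L2 <- U).
U  = rng.normal(size=n)
A1 = rng.binomial(1, 0.5, n)
L2 = 0.6 * A1 + U + rng.normal(size=n)
A2 = rng.binomial(1, 1 / (1 + np.exp(-(L2 - 0.5))))  # A2 depends on L2
Y  = 1.0 * A1 + 1.0 * A2 + U + rng.normal(size=n)    # true E[Y(1,1) - Y(0,0)] = 2.0

# Step 1: fit models for L2 given A1, and for Y given (A1, L2, A2), by OLS.
bL = np.linalg.lstsq(np.column_stack([np.ones(n), A1]), L2, rcond=None)[0]
bY = np.linalg.lstsq(np.column_stack([np.ones(n), A1, L2, A2]), Y, rcond=None)[0]
sdL = np.std(L2 - (bL[0] + bL[1] * A1))              # residual sd of L2

# Step 2: g-formula: simulate L2 under each regime, then average Y.
def g_formula(a1, a2):
    L2_sim = bL[0] + bL[1] * a1 + sdL * rng.normal(size=n)
    return np.mean(bY[0] + bY[1] * a1 + bY[2] * L2_sim + bY[3] * a2)

print(g_formula(1, 1) - g_formula(0, 0))  # ~2.0: recovers the joint effect

# By contrast, naively "adjusting for everything" in one regression is
# biased: conditioning on L2 blocks part of A1's effect and opens the
# collider path A1 -> L2 <- U -> Y.
print(bY[1] + bY[3])                      # ~1.7, not 2.0
```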

More about SWIGs…

Footnotes

  1. We may compute the number of pairwise contrasts by $C(n, r) = \frac{n!}{(n-r)!\,r!}$.↩︎

Reuse

MIT