G-computation

Author
Joseph Bulbulia


Published

11/1/22

Causal inference

Suppose we want to learn whether a dichotomous exposure has a causal effect. We must answer three questions:

  1. What would happen with exposure?
  2. What would happen with no exposure?
  3. Do these potential outcomes differ?

Consider a binary exposure $A$ with two levels, $A = 0$ and $A = 1$, and a continuous outcome $Y$. Estimating the causal effect of $A$ on $Y$ requires contrasting two states of the world: the state of the world $Y^{a=1}$ when, perhaps contrary to fact, $A$ is set to $1$, and the state of the world $Y^{a=0}$ when, perhaps contrary to fact, $A$ is set to $0$. We say that $A$ affects $Y$ when $Y^{a=1} - Y^{a=0} \neq 0$.

The fundamental problem of causal inference

Note that, at any given time, at most one level of the exposure $A$ may be realised for any individual. That is, $Y^{a=1}$ and $Y^{a=0}$ are never simultaneously observed in any individual. As such, individual-level causal effects are not typically identifiable from data. This is called the fundamental problem of causal inference (; ). It is for this reason that we refer to $Y^{a=1}$ and $Y^{a=0}$ – the quantities of interest – as “potential” or “counterfactual” outcomes.

Although we cannot generally obtain counterfactual contrasts for individuals, when certain assumptions are satisfied we may identify average or marginal causal effects at the level of groups of individuals who experience different levels of exposure. We say there is an average or marginal causal effect if the difference of the average outcomes of the exposed and unexposed populations does not equal zero: $E(Y^{a=1}) - E(Y^{a=0}) \neq 0$. Equivalently, because the difference of the means is the mean of the differences, there is an average causal effect if $E(Y^{a=1} - Y^{a=0}) \neq 0$ (). This contrast in the expected counterfactual outcomes is the marginal effect. Here our example uses a binary exposure, but we may also contrast the counterfactual outcomes at two levels of a continuous exposure. To do this, we must state the levels of exposure at which we seek comparisons.
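
As a toy illustration (simulated data only, not drawn from any study), we can generate both potential outcomes for every unit, something nature never permits, and confirm that the difference of the averages equals the average of the differences:

```r
# simulated toy example: both potential outcomes are known by construction
set.seed(123)
n  <- 10000
y0 <- rnorm(n, mean = 1)   # potential outcome under no exposure
y1 <- y0 + 0.3             # potential outcome under exposure (true effect = 0.3)

mean(y1) - mean(y0)  # difference of the averages
mean(y1 - y0)        # average of the differences (identical)
```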

Critically, to ensure valid inference, we need the potential outcomes to be independent of the exposure actually received, conditional on the measured confounders $L$, namely:

$$Y^{a} \perp\!\!\!\perp A \mid L$$

Or equivalently:

$$A \perp\!\!\!\perp Y^{a} \mid L$$

If we have only observational data, we cannot generally ensure that the exposures are independent of the counterfactual outcomes. For this reason, we use sensitivity analyses, such as calculating an E-value, to assess the robustness of any result to unmeasured confounding (VanderWeele and Ding 2017).
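
For intuition, the E-value for a risk ratio has a simple closed form, $RR + \sqrt{RR \times (RR - 1)}$ (VanderWeele and Ding 2017). A minimal sketch, assuming the estimate is expressed as a risk ratio (protective estimates are first inverted):

```r
# E-value for a risk ratio: RR + sqrt(RR * (RR - 1))
e_value <- function(rr) {
  rr <- ifelse(rr < 1, 1 / rr, rr)  # invert estimates below 1
  rr + sqrt(rr * (rr - 1))
}

# an unmeasured confounder would need associations of about 3 (risk-ratio scale)
# with both exposure and outcome to fully explain away an observed RR of 1.8
e_value(1.8)
```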

G-computation (or standardisation)

To consistently estimate a causal effect from the statistical associations evident in the data, we must infer the average outcome for the entire population were it subject to different levels of the exposure variable, $A = a$ and $A = a^*$. Suppose we are interested in obtaining the average causal effect for the population:

$$\text{ATE} = E[Y^{a^*}] - E[Y^{a}] = \sum_l E[Y \mid A = a^*, L = l]\Pr[L = l] - \sum_l E[Y \mid A = a, L = l]\Pr[L = l]$$

We need not estimate $\Pr(L = l)$. Rather, we may obtain the weighted mean over the distribution of the confounders in the data by taking the double expectation (Hernán and Robins 2020, page 166):


$$\text{ATE} = E\big[E(Y \mid A = a^*, \boldsymbol{L}) - E(Y \mid A = a, \boldsymbol{L})\big]$$
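
The following minimal sketch (simulated data, hypothetical variable names) computes this contrast both ways: first by summing stratum-specific means weighted by $\Pr[L = l]$, and then via the double expectation, predicting every individual's outcome under each exposure level and averaging the predictions:

```r
# g-computation with a single binary confounder (simulated data, true effect = 0.3)
set.seed(2022)
n <- 5000
d <- data.frame(l = rbinom(n, 1, 0.4))
d$a <- rbinom(n, 1, plogis(-0.5 + d$l))            # exposure depends on l
d$y <- rnorm(n, mean = 1 + 0.3 * d$a + 0.5 * d$l)  # outcome depends on a and l

# (i) standardisation: sum stratum-specific means, weighted by Pr[L = l]
p_l  <- prop.table(table(d$l))
m_a1 <- with(d, tapply(y[a == 1], l[a == 1], mean))  # E[Y | A = 1, L = l]
m_a0 <- with(d, tapply(y[a == 0], l[a == 0], mean))  # E[Y | A = 0, L = l]
sum(m_a1 * p_l) - sum(m_a0 * p_l)

# (ii) double expectation: predict under each exposure level, then average
fit <- glm(y ~ a + l, data = d)   # outcome model E(Y | A, L)
d1  <- transform(d, a = 1)        # everyone exposed
d0  <- transform(d, a = 0)        # everyone unexposed
mean(predict(fit, newdata = d1)) - mean(predict(fit, newdata = d0))
```

Both estimates should be close to the true value of 0.3.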

To obtain marginal contrasts in expected solitary worship for the entire population were it exposed to different levels of congregation size, we used the stdReg package in R (Sjölander 2016).
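
By way of anticipation of the steps listed below, a minimal call might look like the following. The data frame `d` and the variable names (an outcome `y_t2`, an exposure `a_t1`, their baseline measures `a_t0` and `y_t0`, and confounders `l1`, `l2`) are hypothetical, and the interface follows the `stdGlm()`/`summary()` pattern described by Sjölander (2016):

```r
library(stdReg)
library(splines)

# outcome model: exposure modelled with a natural cubic spline,
# baseline exposure, baseline outcome, and other confounders included
fit <- glm(y_t2 ~ ns(a_t1, df = 3) + a_t0 + y_t0 + l1 + l2, data = d)

# standardise over the empirical distribution of the confounders,
# at two (hypothetical) exposure levels we wish to contrast
fit_std <- stdGlm(fit = fit, data = d, X = "a_t1", x = c(1, 4))

# marginal contrast between the two exposure levels,
# with delta-method standard errors and confidence intervals
summary(fit_std, contrast = "difference", reference = 1)
```

For a binary outcome fitted with `family = binomial`, requesting `contrast = "ratio"` would instead return a causal risk ratio.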

The steps for G-computation

  1. First, we fit a regression of each outcome on the exposure $A_{t0}$ and baseline covariates $\boldsymbol{L} = (L_1, L_2, L_3, \dots, L_n)$. To avoid implausible assumptions of linearity, we model the relationship between the exposure and each outcome of interest using a cubic spline. We include in the set of baseline confounders $\boldsymbol{L}$ the baseline measure of the exposure as well as the baseline response (or responses):

$$\{A_{t-1}, \boldsymbol{Y}_{t-1} = (Y_{1, t-1}, Y_{2, t-1}, Y_{3, t-1}, \dots, Y_{n, t-1})\} \subset \boldsymbol{L}$$

This gives us

$$E(Y \mid A, \boldsymbol{L})$$

  2. Second, we use the model in (1) to predict the values of the potential outcome $Y^{a}$ by setting the exposure to the value $A = a$.

This gives us

$$\hat{E}(Y \mid A = a, \boldsymbol{L})$$

  3. Third, we use the model in (1) to predict the values of a different potential outcome $Y^{a^*}$ by setting the exposure to a different value, $A = a^*$.

This gives us 

$$\hat{E}(Y \mid A = a^*, \boldsymbol{L})$$

  4. Fourth, we obtain the focal contrast as the difference in the expected average outcomes under exposure levels $a^*$ and $a$. Where outcomes are continuous, we calculate both the mean of $Y^{a^*}$ and the mean of $Y^{a}$ and then obtain their difference. This gives us:

$$\widehat{\text{ATE}} = \hat{E}\big[\hat{E}(Y \mid A = a^*, \boldsymbol{L}) - \hat{E}(Y \mid A = a, \boldsymbol{L})\big]$$

  5. Where outcomes are binary, we calculate the causal risk ratio for moving between the different levels of $A$. Where a binary outcome occurs in more than 10% of the sample, we use a log-normal model to calculate a causal rate ratio ().

  6. The stdReg package in R calculates standard errors using the delta method, from which we construct confidence intervals under asymptotic assumptions (Sjölander 2016). Additionally, we pool uncertainty arising from the multiple-imputation procedure by applying Rubin's rules.
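
As a rough sketch of that pooling step (the vectors of estimates and variances below are hypothetical values, extracted from the standardisation fitted to each imputed dataset; this helper is not part of the stdReg interface):

```r
# pool a G-computation estimate across m imputed datasets using Rubin's rules
pool_rubin <- function(estimates, variances) {
  m     <- length(estimates)
  q_bar <- mean(estimates)           # pooled point estimate
  w_bar <- mean(variances)           # within-imputation variance
  b     <- var(estimates)            # between-imputation variance
  t_var <- w_bar + (1 + 1 / m) * b   # total variance
  se    <- sqrt(t_var)
  # normal approximation; a t reference with Barnard-Rubin degrees of freedom is more exact
  c(estimate = q_bar, se = se, lower = q_bar - 1.96 * se, upper = q_bar + 1.96 * se)
}

pool_rubin(estimates = c(0.21, 0.25, 0.19, 0.23, 0.22),
           variances = c(0.004, 0.005, 0.004, 0.005, 0.004))
```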

Acknowledgements

We are grateful to Arvid Sjölander for his help in modifying his stdReg package in R to enable G-computation with multiply-imputed datasets.

References

Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge University Press.
Hernán, M. A., and J. M. Robins. 2023. Causal Inference. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis. https://books.google.co.nz/books?id=_KnHIAAACAAJ.
Rubin, D. B. 1976. “Inference and Missing Data.” Biometrika 63 (3): 581–92. https://doi.org/10.1093/biomet/63.3.581.
Sjölander, Arvid. 2016. “Regression Standardization with the R Package stdReg.” European Journal of Epidemiology 31 (6): 563–74. https://doi.org/10.1007/s10654-016-0157-3.
VanderWeele, Tyler J, and Peng Ding. 2017. “Sensitivity Analysis in Observational Research: Introducing the E-Value.” Annals of Internal Medicine 167 (4): 268–74.
VanderWeele, Tyler J, Maya B Mathur, and Ying Chen. 2020. “Outcome-Wide Longitudinal Designs for Causal Inference: A New Template for Empirical Studies.” Statistical Science 35 (3): 437–66.

Reuse

CC BY-NC-SA

Citation

BibTeX citation:
@online{bulbulia2022,
  author = {Joseph Bulbulia},
  title = {G-Computation},
  date = {2022-11-01},
  url = {https://go-bayes.github.io/b-causal-lab/},
  langid = {en}
}
For attribution, please cite this work as:
Joseph Bulbulia. 2022. “G-Computation.” November 1, 2022. https://go-bayes.github.io/b-causal-lab/.