Beyond Correlation: A Practical Introduction to Causal Inference in Observational Social Psychology

SASP 2025 Pre-conference Workshop

Author: Joseph Bulbulia

Affiliation: Victoria University of Wellington, New Zealand

Workshop location

Date: Wednesday, 19 November 2025, 09:00–16:00

Location: University of Melbourne (Parkville campus), Redmond Barry Building (Building 115), Level 5 Room 516

Format: Registration via SASP abstract submission; in-person.

Before lunch we’ll cover average treatment effects. This module is self-contained and will culminate with student presentations. If you just want to get a sense of causal inference and learn about causal graphs, the morning will be sufficient.

If you want to be mighty, stick around. After lunch we’ll get into estimation of heterogeneous treatment effects, and walk through a data exercise that will show you how to do causal inference using causal machine learning. You deserve to be empowered.

Preparation: Bring your curiosity and attention. For the morning session, bring a writing instrument and some paper, or software that allows freehand writing, as we’ll draw graphs together.

If you want to do the data exercise, download R and RStudio (or your favourite IDE) and then download the workshop R package (bottom 👇).

Important: make sure you install the package before the workshop. Ping me if you need help.

Links to Lectures

In the weeks before the workshop, I’ll be simplifying the background lectures and adding some background videos. Check back here about a week before the workshop.

Venue Location

We’ll meet on Level 5 of the Redmond Barry Building (Building 115). See the campus map for detail by clicking the arrow in the centre of the OpenStreetMap. The sessions will start at 9am.

Why is causal inference important?

Suppose you observe that psychology graduate students who own expensive espresso machines publish more papers. Should your department invest in high-end coffee equipment for all incoming students?

Probably not. This is the difference between prognostic and causal knowledge.

Observational psychology typically recovers prognostic knowledge – which factors predict which others. The word ‘predictor’ is engrained in our vocabulary. And we have a great tolerance for it. Should we? Predictions don’t tell us what would happen if we intervened.

Consider: owning an expensive coffee machine predicts publication success, but giving someone an expensive machine would not necessarily increase their productivity. The machine is likely a marker for other factors: being further along in candidature, having more disposable income, or possessing pre-existing caffeinated ambition. Of course the machines might help, but how much?

The distinction between prediction and causation, which is often stated only to be ignored, has serious implications for conduct in psychological science:

We might observe that adolescents with higher self-esteem have better academic outcomes. But if we were to intervene to boost self-esteem directly (through positive affirmations or praise), would grades improve? Or is self-esteem mostly a marker for other causal factors like stable family environments or prior academic success?
We might find that people who practise mindfulness meditation report lower anxiety. But does meditation cause reduced anxiety, or are less anxious people simply more likely to maintain a meditation practice – i.e. to sit, for long periods of time, still?
We might observe that children who attend preschool show better social skills. But before recommending universal preschool, we need to ask: would sending any child to preschool produce these benefits, or do families who can afford and prioritise preschool differ in ways that independently foster social development?

Prognostic models tell us what to expect. Causal models tell us what we can change. For psychologists aiming to improve human welfare, that distinction is everything.
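The espresso example can be made concrete with a small simulation. The sketch below is illustrative only: the variable names and every number in it are invented for this page, and it assumes a single common cause (years into candidature) that drives both machine ownership and publications, with the machine itself having no causal effect at all.

```r
# Simulated illustration: all quantities are hypothetical
set.seed(2025)
n <- 5000

# Assumed common cause: years into candidature
years <- runif(n, 0, 6)

# Owning an espresso machine depends only on years into candidature
machine <- rbinom(n, 1, plogis(-2 + 0.8 * years))

# Publications depend only on years: the machine has zero causal effect
papers <- rpois(n, lambda = 0.5 + 0.6 * years)

# Prognostic contrast: machine owners publish more on average
naive <- unname(coef(lm(papers ~ machine))["machine"])

# Conditioning on the common cause removes the spurious association
adjusted <- unname(coef(lm(papers ~ machine + years))["machine"])

# naive is clearly positive; adjusted sits near the true effect of zero
round(c(naive = naive, adjusted = adjusted), 2)
```

In this toy world the machine is a good predictor of publications and a useless intervention. Real data rarely come with a known common cause, which is why the workshop spends the morning on stating assumptions before touching data.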

The workshop

This full-day, practice-oriented workshop introduces modern causal inference for observational data, with an emphasis on discovering for whom effects are strongest (afternoon).

We will:

Unveil a causal workflow that starts with asking a clearly formulated causal question.
Before answering a causal question we must consider and explicate identification assumptions. We will consider these assumptions and introduce graphical tools for addressing them. You will discover that clarifying assumptions is not a statistical task.
Only after stating our questions and assumptions do we turn to the data and their analysis. By the end of the workshop, you will conduct an analysis using doubly robust machine learning – which has many advantages over standard estimators.
After estimation there is communication. I’ll describe how to use graphical tools to clarify results for audiences with applied interests.
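To give a flavour of what "doubly robust" means in the estimation step above, here is a minimal sketch of an augmented inverse-probability-weighted (AIPW) estimator in base R. It is not the workshop's analysis: the data are simulated, the variable names (`l`, `a`, `y`) are placeholders, and simple parametric models (`glm`, `lm`) stand in for the machine-learning models used on the day.

```r
# Simulated data with a known answer (all values hypothetical)
set.seed(42)
n <- 4000
l <- rnorm(n)                           # measured confounder
a <- rbinom(n, 1, plogis(0.5 * l))      # exposure depends on the confounder
y <- 1 * a + 2 * l + rnorm(n)           # true average treatment effect is 1
d <- data.frame(y, a, l)

# Exposure (propensity score) model
ps <- fitted(glm(a ~ l, family = binomial, data = d))

# Outcome models, fitted separately within each exposure group
m1 <- predict(lm(y ~ l, data = d[d$a == 1, ]), newdata = d)
m0 <- predict(lm(y ~ l, data = d[d$a == 0, ]), newdata = d)

# AIPW: outcome-model prediction plus an inverse-probability-weighted residual
mu1 <- m1 + d$a * (d$y - m1) / ps
mu0 <- m0 + (1 - d$a) * (d$y - m0) / (1 - ps)

ate_aipw <- mean(mu1 - mu0)
se_aipw  <- sd(mu1 - mu0) / sqrt(n)

# Unadjusted contrast, for comparison (biased by the confounder)
ate_naive <- mean(d$y[d$a == 1]) - mean(d$y[d$a == 0])

round(c(naive = ate_naive, aipw = ate_aipw, se = se_aipw), 2)
```

The estimator combines an exposure model and an outcome model, and remains consistent if either one is correctly specified. That property is the source of the name, and it is one reason these estimators pair well with flexible machine-learning models.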

Objectives

You will leave with:

An understanding of how to ask, and answer, causal questions.
Annotated R code for simulating and estimating average and heterogeneous treatment effects.
A curated reading list for self-study.
Examples of how to communicate causal results to academic, policy, and organisational audiences.

My assumptions about you.

The workshop assumes familiarity with regression and R, but no prior training in causal inference.

A note on pace

Please do not worry if the math or code looks intimidating at first glance. We’ll slow everything down, build each idea together from first principles, and keep the focus on intuition before symbols. Bring your curiosity. If you want to do the data exercise, bring a computer pre-loaded with the workshop R package (see 👇). By the end of the day you’ll understand that the causal workflow is manageable, and even fun, when we unpack it step by step.

Setup

Install the current version of R (free) from https://cran.r-project.org/
Install RStudio Desktop (free) from https://posit.co/download/rstudio-desktop/

(Or pick your favourite IDE – I use neovim Lazy – the code doesn’t care.)

Install the workshop packages:

if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

devtools::install_github("go-bayes/causalworkshop")

# optional packages for extended demonstrations
devtools::install_github("go-bayes/margot")

Once installed, run:

library(causalworkshop)
check_workshop_prerequisites()

Download the R scripts:

library(causalworkshop)

get_workshop_scripts()  # copies scripts into ./workshop-scripts
list_workshop_scripts()

Work through scripts 01–05 prior to the workshop (06 is optional).

Workshop materials

The lectures are somewhat technical at the moment. I’ll be simplifying them and posting YouTube videos before the workshop. Stay tuned here for updates.
Student presentations will be posted here after the workshop.
Previous talks and examples remain available for reference; we will link to them from the relevant sections.

Agenda

Table 1: Schedule and Learning Goals

Time	Session	Learning goals & activities
09:00–09:45	How to ask a causal question	State precise causal questions with the potential outcomes framework; focus on the average treatment effect (ATE).
09:45–10:30	The causal workflow: from question to answer	Trace the causal workflow: define populations, build causal diagrams, assess assumptions, analyse data, conduct sensitivity analysis, and communicate results.
10:30–11:00	Causal diagrams and the identification problem	Use causal diagrams (DAGs) to identify bias from confounding, colliders, and mediators.
11:00–11:30	Coffee break & discussion (you might need to bring your own coffee)	Informal space for questions, clarifying assumptions, and networking.
11:30–12:00	Worked example: employer gratitude and worker wellbeing (Zahle Wisely)	Presenter: Zahle Wisely. Apply the causal workflow to estimate the effects of employer gratitude on employee wellbeing; Q&A follows.
12:00–12:30	Worked example: religious service attendance and personality change (Hannah Robinson)	Presenter: Hannah Robinson. Apply the causal workflow to estimate the effects of religious service attendance on personality; Q&A follows.
12:30–13:45	Lunch	Break.
13:45–14:30	Beyond averages: heterogeneous treatment effects and policy trees	Introduce heterogeneous treatment effects (HTE), conditional ATEs, and policy trees; discuss when effects differ across people.
14:30–15:00	Revisiting our examples: policy tree analyses	Analyse heterogeneity in Zahle and Hannah's case studies; interpret tree-based decision rules useful for obtaining policies.
15:00–16:00	Hands-on demonstration: formulating questions and estimating effects in R	R demonstration using causal forests; interpret simulated ATEs and HTEs; graphs, reporting, etc.
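The afternoon sessions in the schedule above turn on the difference between an average treatment effect (ATE) and conditional ATEs within subgroups. The sketch below illustrates that difference with simulated potential outcomes. It is a toy example under stated assumptions: the effect modifier `early` is hypothetical, the exposure is randomised so that simple contrasts are unbiased, and subgroup means stand in for the causal forests used in the hands-on session.

```r
# Simulated potential outcomes (all values hypothetical)
set.seed(7)
n <- 10000

# Assumed effect modifier: 1 = early-career, 0 = established
early <- rbinom(n, 1, 0.5)

# Potential outcomes under no exposure, y0, and under exposure, y1
y0 <- rnorm(n)
y1 <- y0 + ifelse(early == 1, 1.0, 0.2)   # the effect differs by subgroup

# Randomised exposure; only one potential outcome is observed per person
a <- rbinom(n, 1, 0.5)
y <- ifelse(a == 1, y1, y0)

# Average treatment effect: true value is 0.6 by construction
ate_hat <- mean(y[a == 1]) - mean(y[a == 0])

# Conditional ATEs: true values are 0.2 (established) and 1.0 (early-career)
cate_hat <- sapply(c(established = 0, early_career = 1), function(g) {
  mean(y[a == 1 & early == g]) - mean(y[a == 0 & early == g])
})

round(c(ate = ate_hat, cate_hat), 2)
```

The single ATE hides the fact that one subgroup benefits roughly five times as much as the other. Policy trees formalise the next step: turning such differences into simple rules about whom to prioritise.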

Contact

Questions before the workshop? Email me at joseph.bulbulia@vuw.ac.nz.
