Week 1: How to Ask a Question in Psychological Science?
Lab: Git and GitHub
Today we preview the following topics relevant to the test(s):
- Confounding (formally introduced in week 2)
- Internal validity (formally introduced in week 2)
- External validity (formally introduced in week 3)
Today we discuss these concepts informally. We define them formally in the weeks noted above.
Install R and RStudio before week 2. Instructions are in Lab 2: Install R and RStudio.
Lecture: Introduction to the Course
Two foundational questions
Every causal inquiry in psychological science begins with two questions. First: what do I want to know? Second: for which population does this knowledge generalise? These questions sound simple, yet most published studies answer neither one clearly. A regression table reports associations, but associations are not causes. A convenience sample of undergraduates describes that sample, not the population we care about. This course teaches you to state your question precisely before you touch any data.
The first question forces you to specify an intervention. "Does green space affect happiness?" is vague. A sharper version: "What is the causal effect of moving from a neighbourhood with no accessible park to one with an accessible park on self-reported life satisfaction?" Here the treatment is well-defined ($A = 1$ for park access, $A = 0$ for no park access), the outcome is specified ($Y$ = life satisfaction), and someone could, in principle, run an experiment to test it.
The second question forces you to specify a target population. Results obtained from a sample of university students in Wellington may not generalise to elderly residents in rural Southland. The distinction between the sample population (who you studied) and the target population (who you want to learn about) is central to external validity, which we revisit formally in Week 4.
The fundamental problem of causal inference
Consider a person who moves to a neighbourhood with a park. She reports high life satisfaction. Would she have reported the same life satisfaction had she stayed in her old neighbourhood without a park? We cannot know, because she experienced only one of the two conditions. This is the fundamental problem of causal inference: for any individual, we can observe at most one potential outcome.
We formalise this with potential outcomes notation. Let $Y_i(1)$ denote the outcome that person $i$ would experience under treatment ($A_i = 1$), and $Y_i(0)$ the outcome under control ($A_i = 0$). The individual causal effect is:
$$Y_i(1) - Y_i(0)$$
Because we observe only one of $Y_i(1)$ or $Y_i(0)$, the individual causal effect is never directly observable. This is not a limitation of our methods; it is a logical constraint. No amount of data collection, no statistical technique, and no machine learning algorithm can reveal both potential outcomes for the same person at the same time.
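A minimal sketch of this constraint in Python may help. The four people and their life-satisfaction scores below are invented for illustration; the field names (`y1`, `y0`, `a`) and the helper `observe` are hypothetical. The point is only that the data a researcher sees always has one potential outcome missing per person.

```python
# Hypothetical potential outcomes for four people (illustrative numbers only).
# y1 = life satisfaction with park access, y0 = without, a = treatment received.
# In real data, y1 and y0 are never both available for the same person.
people = [
    {"id": 1, "y1": 8, "y0": 6, "a": 1},
    {"id": 2, "y1": 7, "y0": 7, "a": 0},
    {"id": 3, "y1": 5, "y0": 6, "a": 1},
    {"id": 4, "y1": 9, "y0": 5, "a": 0},
]


def observe(person):
    """Return what a researcher actually sees: the outcome under the
    treatment received, with the other potential outcome missing (None)."""
    treated = person["a"] == 1
    return {
        "id": person["id"],
        "a": person["a"],
        "y_obs": person["y1"] if treated else person["y0"],
        "y1": person["y1"] if treated else None,
        "y0": None if treated else person["y0"],
    }


observed = [observe(p) for p in people]
for row in observed:
    print(row)
```

Every printed row contains a `None`, so the difference between the two potential outcomes cannot be computed for anyone from the observed data alone.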
From individuals to populations
If individual effects are unobservable, what can we learn? We can learn about average effects across a population. The Average Treatment Effect (ATE) is defined as:
$$\text{ATE} = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]$$
This is the expected difference in the outcome if everyone in the target population experienced treatment versus if everyone experienced control. The ATE is a population-level quantity. It tells us what would happen on average, not what would happen to any particular person.
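To see what "on average" hides, consider a sketch with an invented, complete table of potential outcomes for five people. Such a table is never available in practice, and the numbers are assumptions chosen for illustration. The sketch computes the average effect as the difference between the two average potential outcomes and compares it with the person-by-person effects.

```python
from statistics import mean

# Hypothetical potential outcomes (life satisfaction, 0-10) for five people.
# A complete table like this is never observed; it is used here only to show
# what the average treatment effect summarises.
y_if_park = [8, 7, 5, 9, 6]     # outcome if given park access
y_if_no_park = [6, 7, 6, 5, 6]  # outcome if not given park access

# Person-by-person effects (unobservable in real data).
individual_effects = [t - c for t, c in zip(y_if_park, y_if_no_park)]

# Average effect: difference between the two average potential outcomes.
ate = mean(y_if_park) - mean(y_if_no_park)

print("Individual effects:", individual_effects)
print("Average effect:", ate)
print("Mean of individual effects:", mean(individual_effects))
```

In this invented table the average effect is positive even though one person would be worse off with park access and two would be unaffected, which is why the average says nothing definite about any particular person.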
The shift from individual to population is not just a concession to practicality. Causal inference contrasts counterfactual states at the population or subpopulation level. When we say "green space causes happiness," we mean that on average, across a defined population, access to green space raises life satisfaction relative to the counterfactual of no access.
How randomisation solves the problem
If we randomly assign people to neighbourhoods with or without parks, treatment assignment is independent of all other characteristics, both observed and unobserved. The healthy, the unhealthy, the wealthy, the poor, the optimists, and the pessimists are, in expectation, balanced across the two groups. Under randomisation, the average outcome among the treated group estimates $\mathbb{E}[Y(1)]$, and the average outcome among the control group estimates $\mathbb{E}[Y(0)]$. Their difference estimates the ATE.
Randomisation works because it satisfies a condition called exchangeability: the potential outcomes are independent of treatment assignment. We write this as $Y(a) \perp\!\!\!\perp A$. In words, the treatment and control groups are exchangeable: had the treated group received control instead, their average outcome would have matched the control group's observed average, and vice versa.
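A small simulation can make exchangeability concrete. The sketch below is written under stated assumptions, not as a description of any real study: everyone gains exactly one point of life satisfaction from park access, and an unmeasured trait (labelled `optimism` here, a hypothetical variable) raises life satisfaction regardless of parks. It compares a coin-flip assignment with a self-selected assignment in which optimists are more likely to live near a park.

```python
import random
import statistics

random.seed(2024)  # arbitrary seed so the sketch is reproducible

N = 20_000
TRUE_EFFECT = 1.0  # assumption: park access adds 1 point for everyone

# Unmeasured trait that raises life satisfaction with or without a park.
optimism = [random.gauss(0, 1) for _ in range(N)]
y0 = [5 + o + random.gauss(0, 1) for o in optimism]  # outcome without a park
y1 = [y + TRUE_EFFECT for y in y0]                   # outcome with a park


def difference_in_means(assignment):
    """Compare observed group averages, using only the potential outcome
    that matches each person's assigned treatment."""
    treated = [y1[i] for i in range(N) if assignment[i] == 1]
    control = [y0[i] for i in range(N) if assignment[i] == 0]
    return statistics.mean(treated) - statistics.mean(control)


# Randomised assignment: a fair coin flip, unrelated to optimism.
randomised = [random.randint(0, 1) for _ in range(N)]

# Self-selected assignment: optimists are more likely to choose a park.
self_selected = [1 if o + random.gauss(0, 1) > 0 else 0 for o in optimism]

true_ate = statistics.mean(y1) - statistics.mean(y0)
est_randomised = difference_in_means(randomised)
est_self_selected = difference_in_means(self_selected)

print(f"True average effect:    {true_ate:.2f}")
print(f"Randomised estimate:    {est_randomised:.2f}")
print(f"Self-selected estimate: {est_self_selected:.2f}")
```

Under the coin flip, the group difference should land close to the true average effect. Under self-selection it should overshoot, because the treated group is more optimistic to begin with, so the two groups are not exchangeable.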
We develop this idea formally in Week 2, where we also introduce the three assumptions that underpin all causal inference (causal consistency, conditional exchangeability, and positivity). For now, the key insight is that randomisation transforms the unobservable population-level counterfactual contrast into an observable comparison between groups.
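As a preview of that formal development, the step from counterfactual to observable quantities can be sketched in two lines. The notation is an assumption consistent with the rest of this lecture: $Y$ is the observed outcome, $A$ the treatment received, and $Y(a)$ the potential outcome under treatment level $a$. The first equality uses causal consistency (the observed outcome of a treated person equals their potential outcome under treatment); the second uses exchangeability.

$$
\begin{aligned}
\mathbb{E}[Y \mid A = 1] &= \mathbb{E}[Y(1) \mid A = 1] = \mathbb{E}[Y(1)] \\
\mathbb{E}[Y \mid A = 0] &= \mathbb{E}[Y(0) \mid A = 0] = \mathbb{E}[Y(0)]
\end{aligned}
$$

Subtracting the second line from the first gives $\mathbb{E}[Y \mid A = 1] - \mathbb{E}[Y \mid A = 0] = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]$: the observable difference in group means equals the average treatment effect.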
What comes next
Most psychological research cannot randomise the variables we care about. We cannot randomly assign people to experience trauma, adopt a religion, or lose a job. This course is about what to do when randomisation is impossible. The answer involves stating our causal assumptions explicitly (using causal diagrams), checking whether those assumptions are sufficient for identification (using the backdoor criterion), and estimating the causal effect using methods that respect the structure of the problem.
The green-space example and counterfactual framework are adapted from:
- Bulbulia JA (2024). "A causal inference framework for cross-cultural research."
The two foundational questions and the formal treatment of causal inference appear in:
- Bulbulia JA (2024). "Methods in causal inference. Part 1: causal diagrams and confounding." Evolutionary Human Sciences.
Lab materials: Lab 1: Git and GitHub