Week 1: How to Ask a Question in Psychological Science

Slides

Lab: Git and GitHub

Causal graph: we refer to this image in the lecture and begin reviewing causal graphs in Week 2

Background readings

None today. Recommended readings are listed at the end of this page.

Key concepts for the test(s)

Today we introduce three problems that recur throughout the course:

  • Defining the question: a causal question requires a clear contrast
  • Specifying the target population: the answer depends on who the question is about
  • Unobservability of causal effects: we never observe both sides of the contrast for one person

Before next week

Bring your laptop to the Week 1 lab. Install R before Week 2. Instructions are in Lab 2: Install R and Set Up Your IDE.


Motivating example: does social media harm adolescent wellbeing?

Orben & Przybylski (2019) report a negative association between time spent on social media and wellbeing among British teenagers. The observed correlation was 0.04 in magnitude, comparable to the association between wearing glasses and wellbeing in the same dataset.

The finding was widely reported as evidence that social media harms young people. Some investigators argued these conclusions understated the harm: the teenagers who engaged most frequently with social media exhibited the lowest wellbeing scores, and the negative association is non-linear (Twenge et al., 2020).

Questions about whether social media use harms young people remain live. On 18 February 2026, CNN reported testimony in ongoing litigation about adolescent social media use (link). Courts, legislators, and parents are making decisions right now on the basis of findings reported in scientific journals.

Yet what do the associational findings really tell us? Can we move from associations to policy-relevant causal conclusions about whether social media use harms young people? If so, what steps would we need to take? And for whom would our conclusions generalise?

These questions will occupy us over the coming weeks. The aim of this course is to provide you with a set of skills that enable you to ask and answer causal questions using observational data, and to identify variability in response across subgroups in the population of interest.

A simple map for week 1

This week gives you a checklist for deciding whether a claim is even asking a causal question yet.

Three questions to ask of any causal claim

  1. What are the two states of the world being compared?
  2. For which population is the comparison meant to hold?
  3. What part of the contrast is necessarily missing from observation?

If a study claim does not answer the first two questions, it is still too vague. If it forgets the third question, it will slip from causal language into loose talk about associations.

Psychology begins with a question

Before we can answer whether social media harms teenagers, we must ask a question that is clear enough to be answered. "Does social media harm wellbeing?" is not yet a causal question because it does not specify what is being compared with what. A causal question compares two states of the world. This question names only one.

We will use two words that are easy to confuse. An association asks whether two variables co-occur. A causal effect, on the other hand, asks what would happen if we changed something about the world.

Consider the difference. "Is time on social media associated with lower wellbeing?" asks whether two variables co-occur. "Would adolescent wellbeing improve if we replaced two hours of nightly doom-scrolling with two hours of study?" asks what would happen under a specific comparison. The second question states a contrast (scrolling versus studying), a population (adolescents), an outcome (wellbeing), and a time horizon (nightly, over some stated period). The first question does not.

The comparison between two states is what we call a causal contrast, or contrast for short. A contrast is the simplest structure a causal question can have: state A versus state B, for a defined group, measured on a defined outcome, over a defined time horizon.

A practical template is: for population, what is the effect of intervention versus control on outcome, measured by measure after an exposure period of time horizon? The arrow of time is built in: the intervention comes first, then we measure the outcome.

Five parts of a usable causal question

  • Population: who is the question about?
  • Intervention: what state of the world are we interested in?
  • Control: what is the comparison condition?
  • Outcome: what do we measure?
  • Timing: when do we measure it?
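
One way to keep all five parts explicit is to treat the template as a structure to fill in. The sketch below is a hypothetical illustration in Python (not course software); the example question it renders is the scrolling-versus-studying contrast from this lecture.

```python
from dataclasses import dataclass

@dataclass
class CausalQuestion:
    population: str    # who the question is about
    intervention: str  # the state of the world we want to evaluate
    control: str       # the comparison condition
    outcome: str       # the construct and its measure
    timing: str        # the exposure period before the outcome is measured

    def render(self) -> str:
        # Follows the course template: for population, what is the effect of
        # intervention versus control on outcome, after the exposure period?
        return (f"For {self.population}, what is the effect of "
                f"{self.intervention} versus {self.control} on "
                f"{self.outcome}, measured after {self.timing}?")

q = CausalQuestion(
    population="adolescents aged 13 to 18",
    intervention="two hours of nightly social media scrolling",
    control="two hours of nightly study",
    outcome="life satisfaction on a 0 to 10 scale",
    timing="three months",
)
print(q.render())
```

If any field is hard to fill in, the question is not yet a causal question; that is the point of the checklist.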

Everything in this course follows from the demand that psychological questions specify their contrasts. This lecture introduces three problems that make specifying a contrast harder than it first appears.

Problem 1: both sides of the contrast must be precisely defined

When investigators evaluate "time on social media," what do they mean? Passive scrolling, direct messaging, and creative content production may differ in their consequences. We need an interval over which the behaviour occurs: one week, one month, one year. We need to specify what the comparison condition is: passive scrolling versus studying, versus socialising in person, versus something else. Without precise specification, the question has no answer because it has not yet been asked.

The two sides of a contrast have names. The condition whose consequences we want to evaluate is the intervention. The state we compare it against is the control. "Intervention" and "control" are placeholders: neither implies a medical procedure or a laboratory setting. They simply label the two states of the world that define our comparison.

Precision extends to what we measure. "Wellbeing" aggregates self-esteem, life satisfaction, anxiety symptoms, and depressive symptoms. Each is a distinct construct, and they do not always move together. We must define the outcome, its measure (for example, life satisfaction on a 0 to 10 scale), and the time frame over which we assess it. The consequences of scrolling for a teenager's wellbeing in five hundred years are zero because life ends.

Notice that specifying interventions and outcomes forces us to order events along a timeline. For one state to influence another, it must precede it. There must be a contrast condition and a stated time horizon, because timing affects the magnitude of interest. The effects of scrolling for five minutes for three weeks (contrasted with no social media) might differ from the effects of scrolling for five hours every day for five years.

In later weeks we extend this idea to more complex questions with more than two states of the world, or with sequences of actions over time. The same demand applies: name the states, the population, the outcome, and the time horizon.

Problem 2: the answer to a causal question depends on the population

The teenagers that Orben & Przybylski (2019) studied were a convenience sample from one country at one moment in time. Would the association of 0.04 hold in other countries? Would it hold today? The concept of "teenager" is itself vague. It lumps thirteen-year-olds, essentially children, with nineteen-year-olds, essentially adults. The answer to a causal question may systematically differ by age, gender, socioeconomic background, or parental attention.

Before we can evaluate whether social media influences wellbeing, we must specify the target population. The answer to a contrast for one population may differ from, or reverse for, another. There is no abstract answer to a causal question without reference to both the contrast conditions and a population.

The distinction between the sample population (who you studied) and the target population (who you want to learn about) is central to external validity, which we formalise in Week 4. We return to population specificity when we discuss variation in responses across subgroups (Weeks 6, 8, and 9) and transportability (Week 4).

Problem 3: no more than one side of the contrast can be measured for each individual

Consider Alice, who spends two hours doom-scrolling each night before bed. Suppose she enrolled in an experiment and was randomised to the doom-scrolling condition. The comparison condition is studying mathematics for two hours each night. At the end of the trial Alice reports high life satisfaction. Can we say that doom-scrolling caused Alice's high life satisfaction?

We cannot, because no more than one side of the contrast can be measured for Alice in a given period. Alice followed the doom-scrolling protocol. She did not follow the mathematics protocol. We observe only one state of the world for Alice, never both.

This is the central logical problem in causal inference. A causal effect compares two possible futures for the same person, but we only ever observe one future.

We formalise this with potential outcomes notation. Let $Y_i(1)$ denote the outcome that person $i$ would experience under the intervention ($A = 1$), and $Y_i(0)$ the outcome under the control condition ($A = 0$). The individual causal effect is:

$$\delta_i = Y_i(1) - Y_i(0)$$

This quantity, $\delta_i$, is the contrast at the level of a single person: the difference between what would happen to person $i$ under treatment and what would happen under control. We observe only one of $Y_i(1)$ or $Y_i(0)$ for any individual. The individual causal effect $\delta_i$ is therefore never directly observable. This is not a limitation of our methods; it is a logical constraint. No amount of data collection, no statistical technique, and no machine learning algorithm can reveal both potential outcomes for the same person at the same time.
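
The logic can be made concrete with a small simulation. The sketch below invents both potential outcomes for five hypothetical people (numbers are made up for illustration), then shows that the data reveal only one of them per person, so $\delta_i$ is computable in the simulation but never in practice.

```python
import random

random.seed(1)

# Hypothetical potential outcomes for five people: life satisfaction (0-10)
# under the intervention, Y(1), and under the control, Y(0). In real data we
# never see both; here we invent both to illustrate the logic.
people = [
    {"id": i, "y1": random.randint(3, 9), "y0": random.randint(3, 9)}
    for i in range(5)
]

for p in people:
    p["delta"] = p["y1"] - p["y0"]   # individual causal effect, delta_i
    p["a"] = random.randint(0, 1)    # condition actually experienced
    # The observed outcome reveals only the potential outcome that matches
    # the condition received; the other remains counterfactual.
    p["y_obs"] = p["y1"] if p["a"] == 1 else p["y0"]

for p in people:
    print(f"person {p['id']}: A = {p['a']}, observed Y = {p['y_obs']}, "
          f"delta_i = {p['delta']} (known only because we simulated it)")
```

Deleting the unobserved column from this table is exactly what nature does: every dataset we collect is the table with one potential outcome missing per row.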

Pair exercise: formulating a contrast

  1. Take the headline "Screen time linked to poor sleep in teenagers."
  2. Write a causal question that specifies both sides of the contrast (screen time versus what?), a defined outcome (which aspect of sleep?), a target population, and a time horizon.
  3. Swap with your partner and critique: is the other side of the contrast well defined? Is the population specific enough?

From individuals to populations

If individual causal effects are unobservable, what can we learn? We can learn about average effects across a population. The average treatment effect (ATE) is:

$$\text{ATE} = \mathbb{E}[Y(1) - Y(0)]$$

This is the expected difference in the outcome if everyone in the target population experienced the intervention versus if everyone experienced the control condition. The ATE is a population-level quantity. It tells us what would happen on average, not what would happen to any particular person.
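
A simulation can show why the ATE, unlike $\delta_i$, is learnable. The sketch below (hypothetical numbers: an average effect of one point of life satisfaction, with individual effects that vary) computes the true ATE from both potential outcomes, then shows that when each person reveals only one outcome under random assignment, the difference in group means comes close to it.

```python
import random

random.seed(0)

n = 100_000
# Hypothetical population: Y(0) is life satisfaction under control;
# the intervention adds 1 point on average, varying across individuals.
y0 = [random.gauss(6.0, 1.0) for _ in range(n)]
y1 = [y + random.gauss(1.0, 0.5) for y in y0]

# True ATE: the mean of the individual effects, E[Y(1) - Y(0)].
ate = sum(b - a for a, b in zip(y0, y1)) / n

# Random assignment: each person reveals one potential outcome only.
assign = [random.randint(0, 1) for _ in range(n)]
treated = [y1[i] for i in range(n) if assign[i] == 1]
control = [y0[i] for i in range(n) if assign[i] == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

print(f"true ATE            = {ate:.3f}")
print(f"difference in means = {diff_in_means:.3f}")  # close to the true ATE
```

The difference in means recovers the ATE here only because assignment was random; Week 2 takes up why, and what happens when it is not.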

Causal inference contrasts counterfactual states at the population or subpopulation level. When we say "social media influences wellbeing," we mean that on average, across a defined population, one pattern of use changes life satisfaction relative to the counterfactual of another pattern. We must specify the contrast conditions and the population for this statement to have content.

A short memory aid

  • A causal question needs a contrast.
  • A causal answer is always population-specific.
  • Individual causal effects are not directly observed.

What we learned

Return to the social media question. Orben & Przybylski (2019) found a negative association of magnitude 0.04 between social media use and wellbeing. Courts and legislators are treating this as evidence of harm. We now see three reasons why the leap from association to causation fails.

First, "the influence of social media on wellbeing" is undefined until we specify the interventions (scrolling versus what?), the outcomes (which dimension of wellbeing?), and the time frame. Second, the answer to a causal question depends on the target population, and the populations that matter (thirteen-year-olds in Aotearoa New Zealand today) may differ from the population studied (British teenagers before 2019). Third, the individual causal effect is never observable; we can only recover average effects under assumptions we have not yet stated.

The lesson is that before answering a question we must ask it. Psychology begins with a clearly defined question. A well-defined causal question requires a contrast between at least two interventions, a specified outcome and time horizon, and a target population. In later weeks we add the further question of whether the observed data can identify that contrast.

Most psychological research cannot randomise the variables we care about. We cannot randomly assign people to experience trauma, adopt a religion, or lose a job. Week 2 introduces the randomised experiment as the benchmark for causal inference and the graphical tools (causal diagrams) that allow us to reason about causation when randomisation is impossible.

Pair exercise: three problems in one claim

  1. Take the claim "Religion improves mental health."
  2. Specify the contrast by naming a concrete intervention and control condition (religion versus what, exactly?).
  3. Specify the population (for whom, where, and when?).
  4. Specify the outcome, the measure, and the timing (what do we measure, and when do we measure it after the exposure period?).
  5. Rewrite the claim using the course template: for population, what is the effect of intervention versus control on outcome, measured by measure after an exposure period of time horizon?
  6. Swap with your partner and critique: is the contrast precise, is the population defensible, and does the timing make sense (intervention first, outcome later)?

Further reading

For an accessible introduction to causal inference and its history, see Pearl & Mackenzie (2018). The two core causal questions and the formal treatment of causal inference appear in Bulbulia (2024).


Lab materials: Lab 1: Git and GitHub

Bulbulia, J. A. (2024). Methods in causal inference part 1: Causal diagrams and confounding. Evolutionary Human Sciences, 6, e40. https://doi.org/10.1017/ehs.2024.35

Orben, A., & Przybylski, A. K. (2019). The association between adolescent well-being and digital technology use. Nature Human Behaviour, 3(2), 173–182. https://doi.org/10.1038/s41562-018-0506-1

Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. Basic Books.

Twenge, J. M., Haidt, J., Joiner, T. E., et al. (2020). Underestimating digital media harm. Nature Human Behaviour, 4, 346–348. https://doi.org/10.1038/s41562-020-0839-4