Lab 10: Measurement Invariance

R script

Download the R script for this lab (right-click → Save As)

This lab introduces measurement invariance testing, a prerequisite for meaningful cross-group comparisons. You will conduct exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and multigroup CFA on simulated distress scale data with known non-invariance built in.

What you will learn

  1. How to assess factorability (KMO, Bartlett's test) and extract factors using EFA
  2. How to fit a CFA in lavaan and evaluate model fit (CFI, RMSEA, SRMR)
  3. How to test configural, metric, and scalar invariance across groups
  4. How to discover and release non-invariant items (partial invariance)
  5. Why measurement invariance matters for causal inference across groups

Connection to the lecture

This lab aligns with Week 10's lecture on classical measurement theory from a causal perspective. The lecture covers measurement error in DAGs, EFA, CFA, and VanderWeele's model linking measurement to causal identification. This lab focuses on the practical skills: fitting and comparing models.

Setup

library(causalworkshop)
library(psych)
library(lavaan)
library(tidyverse)

Install packages

If you haven't installed these packages, run:

install.packages(c("psych", "lavaan"))
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("go-bayes/causalworkshop@v0.2.1")

Generate data

The simulate_measurement_items() function generates a 6-item psychological distress scale (modelled on the Kessler-6) with a known factor structure and built-in measurement non-invariance across two groups:

# simulate measurement data
d <- simulate_measurement_items(n = 2000, seed = 2026)

# check structure
dim(d)
names(d)

The six items correspond to:

  • item_1: nervous
  • item_2: hopeless
  • item_3: restless
  • item_4: depressed
  • item_5: effort (everything was an effort)
  • item_6: worthless

# check the true factor loadings
attr(d, "true_loadings")

# check the true intercepts (they differ for items 3 and 5 across groups)
attr(d, "true_intercepts_group0")
attr(d, "true_intercepts_group1")

What is non-invariance?

Items 3 (restless) and 5 (effort) have different intercepts across groups. This means that at the same level of true distress, group 1 members score higher on these two items. If we ignore this, cross-group comparisons of mean distress scores will be biased.
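A quick base-R sketch makes this concrete (synthetic numbers for illustration, not the lab's data-generating process): two groups share the same latent distress distribution and the same loading, but one item's intercept is shifted by +0.5 in group 1, mirroring items 3 and 5 here.

```r
# toy illustration: identical latent distress, shifted intercept in group 1
set.seed(1)
n <- 5000
true_distress <- rnorm(n)

# same loading (0.8) in both groups; only the intercept differs
item_g0 <- 0.0 + 0.8 * true_distress + rnorm(n, sd = 0.5)
item_g1 <- 0.5 + 0.8 * true_distress + rnorm(n, sd = 0.5)

# observed item means differ even though true distress does not
mean(item_g1) - mean(item_g0)  # approximately 0.5
```

The 0.5 gap in observed scores is pure measurement artefact: any naive comparison of group means would mistake it for a real difference in distress.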

Exploratory factor analysis (EFA)

Before fitting a CFA, we check whether the data are factorable and how many factors to extract.

Factorability

# select just the items
items <- d |> select(item_1:item_6)

# Kaiser-Meyer-Olkin measure of sampling adequacy
psych::KMO(items)

# Bartlett's test of sphericity
psych::cortest.bartlett(cor(items), n = nrow(items))

Interpreting KMO

KMO values above 0.60 are considered adequate for factor analysis. Values above 0.80 are good. Bartlett's test should be significant (p < 0.05), indicating that correlations between items are sufficiently large for factor analysis.
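You can also check these thresholds programmatically. The sketch below assumes the `items` object from above and relies on `psych::KMO()` returning a list with the overall MSA (`$MSA`) and per-item values (`$MSAi`):

```r
# extract the overall KMO and per-item values for a quick screen
kmo <- psych::KMO(items)
kmo$MSA             # overall sampling adequacy; want > 0.60
round(kmo$MSAi, 2)  # per-item values; flag any item below 0.60
```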

Extract factors

# one-factor solution
fa_1 <- psych::fa(items, nfactors = 1, fm = "ml", rotate = "none")
print(fa_1$loadings, cutoff = 0.3)

# two-factor solution (for comparison)
fa_2 <- psych::fa(items, nfactors = 2, fm = "ml", rotate = "oblimin")
print(fa_2$loadings, cutoff = 0.3)

How many factors?

The one-factor solution should show all six items loading substantially on a single factor (consistent with the data-generating process). The two-factor solution should not improve fit meaningfully. Compare the proportion of variance explained and check whether the two-factor loadings make theoretical sense.
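A common complement to eyeballing the loadings is parallel analysis, which compares observed eigenvalues against those from random data of the same dimensions. A sketch using `psych::fa.parallel()`, assuming the `items` object from above:

```r
# parallel analysis: retain factors whose eigenvalues exceed
# those expected from random data
pa <- psych::fa.parallel(items, fm = "ml", fa = "fa")
pa$nfact  # suggested number of factors (should be 1 here)
```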

Confirmatory factor analysis (CFA)

Now we specify the one-factor model and fit it using lavaan:

# specify one-factor CFA model
model <- "
  distress =~ item_1 + item_2 + item_3 + item_4 + item_5 + item_6
"

# fit CFA on full sample
fit_cfa <- cfa(model, data = d)

# summary with fit measures
summary(fit_cfa, fit.measures = TRUE, standardized = TRUE)

Evaluate model fit:

# extract key fit indices
fit_indices <- fitmeasures(fit_cfa, c("cfi", "rmsea", "srmr"))
print(round(fit_indices, 3))

Fit index guidelines

  • CFI (Comparative Fit Index): > 0.95 is excellent, > 0.90 is acceptable
  • RMSEA (Root Mean Square Error of Approximation): < 0.06 is excellent, < 0.08 is acceptable
  • SRMR (Standardised Root Mean Square Residual): < 0.08 is acceptable
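These guidelines can be wrapped in a small helper for quick screening. This function is illustrative only (not part of the lab packages) and simply encodes the cutoffs above:

```r
# label fit indices against the guideline cutoffs above
assess_fit <- function(cfi, rmsea, srmr) {
  c(
    cfi   = if (cfi > 0.95) "excellent" else if (cfi > 0.90) "acceptable" else "poor",
    rmsea = if (rmsea < 0.06) "excellent" else if (rmsea < 0.08) "acceptable" else "poor",
    srmr  = if (srmr < 0.08) "acceptable" else "poor"
  )
}

# example: a well-fitting model
assess_fit(cfi = 0.97, rmsea = 0.04, srmr = 0.03)
```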

Multigroup CFA: invariance testing

We now test whether the factor structure is equivalent across the two groups. Invariance testing proceeds through a hierarchy of progressively stricter constraints:

Step 1: Configural invariance

The same factor structure holds in both groups, but loadings and intercepts are freely estimated:

fit_configural <- cfa(model, data = d, group = "group")
summary(fit_configural, fit.measures = TRUE)

Step 2: Metric invariance

Factor loadings are constrained to be equal across groups:

fit_metric <- cfa(model, data = d, group = "group",
                  group.equal = "loadings")
summary(fit_metric, fit.measures = TRUE)

Compare configural and metric models:

lavTestLRT(fit_configural, fit_metric)

Interpreting the comparison

A non-significant chi-square difference (p > 0.05) means the metric model fits no worse than the configural model, so equal loadings across groups are supported. Because chi-square is sensitive to large sample sizes, also inspect $\Delta$CFI, $\Delta$RMSEA, and $\Delta$SRMR below.

Step 3: Scalar invariance

Both loadings and intercepts are constrained to be equal:

fit_scalar <- cfa(model, data = d, group = "group",
                  group.equal = c("loadings", "intercepts"))
summary(fit_scalar, fit.measures = TRUE)

Compare metric and scalar models:

lavTestLRT(fit_metric, fit_scalar)

Expected result

Full scalar invariance should fail. In this lab, the chi-square difference test is typically significant, and the fit indices worsen noticeably once intercepts are constrained. This is by design: items 3 and 5 have different intercepts across groups in the data-generating process.

Discover partial non-invariance

When full scalar invariance fails, we can release constraints on specific items to achieve partial scalar invariance. Based on modification indices or theory, we free the intercepts of items 3 and 5:
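If you did not already know which intercepts differ, score tests on the constrained model can point to them. A sketch using lavaan's `lavTestScore()` and `modindices()`, assuming the `fit_scalar` object from Step 3:

```r
# score (modification) tests for each equality constraint in the
# scalar model; large statistics flag constraints worth releasing
lavTestScore(fit_scalar)

# alternatively, modification indices restricted to intercepts
modindices(fit_scalar, op = "~1", sort. = TRUE)
```

Any constraint released this way should also be defensible on theoretical grounds, since purely data-driven freeing of parameters risks capitalising on chance.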

# partial scalar invariance: free intercepts for items 3 and 5
model_partial <- "
  distress =~ item_1 + item_2 + item_3 + item_4 + item_5 + item_6
  item_3 ~ c(i3a, i3b) * 1
  item_5 ~ c(i5a, i5b) * 1
"

fit_partial <- cfa(model_partial, data = d, group = "group",
                   group.equal = c("loadings", "intercepts"))
summary(fit_partial, fit.measures = TRUE)

Compare partial scalar with metric invariance:

lavTestLRT(fit_metric, fit_partial)

Calculate fit-index changes between adjacent models:

delta_fit <- tibble(
  comparison = c("Metric - Configural", "Scalar - Metric", "Partial Scalar - Metric"),
  delta_cfi = c(
    fitmeasures(fit_metric, "cfi") - fitmeasures(fit_configural, "cfi"),
    fitmeasures(fit_scalar, "cfi") - fitmeasures(fit_metric, "cfi"),
    fitmeasures(fit_partial, "cfi") - fitmeasures(fit_metric, "cfi")
  ),
  delta_rmsea = c(
    fitmeasures(fit_metric, "rmsea") - fitmeasures(fit_configural, "rmsea"),
    fitmeasures(fit_scalar, "rmsea") - fitmeasures(fit_metric, "rmsea"),
    fitmeasures(fit_partial, "rmsea") - fitmeasures(fit_metric, "rmsea")
  ),
  delta_srmr = c(
    fitmeasures(fit_metric, "srmr") - fitmeasures(fit_configural, "srmr"),
    fitmeasures(fit_scalar, "srmr") - fitmeasures(fit_metric, "srmr"),
    fitmeasures(fit_partial, "srmr") - fitmeasures(fit_metric, "srmr")
  )
) |>
  mutate(across(starts_with("delta_"), \(x) round(x, 3)))

print(delta_fit)

Expected result

Partial scalar invariance should hold once you judge the chi-square test and the fit-index changes jointly. As a rule of thumb, evidence against invariance is often flagged by $\Delta$CFI < -0.01, $\Delta$RMSEA > 0.015, or $\Delta$SRMR > 0.01. By releasing items 3 and 5, you should see improved fit relative to the full scalar model.

Compare all models

# summary table of fit indices
models <- list(
  Configural = fit_configural,
  Metric = fit_metric,
  Scalar = fit_scalar,
  "Partial Scalar" = fit_partial
)

fit_table <- map_dfr(names(models), function(name) {
  fm <- fitmeasures(models[[name]], c("cfi", "rmsea", "srmr", "chisq", "df"))
  tibble(
    model = name,
    cfi = round(fm["cfi"], 3),
    rmsea = round(fm["rmsea"], 3),
    srmr = round(fm["srmr"], 3),
    chisq = round(fm["chisq"], 1),
    df = fm["df"]
  )
})

print(fit_table)

Connection to causal inference

Why measurement invariance matters for causal inference

If a scale measures the same construct differently across groups (non-invariance), then cross-group comparisons of treatment effects may be biased. In DAG terms, the measured outcome $Y^*$ is a function of both the true outcome $Y$ and group membership $G$:

$$Y^* = f(Y, G)$$

If $f$ differs by group (non-invariance), then even if the treatment has the same causal effect on $Y$ in both groups, the observed effect on $Y^*$ will differ. This is measurement bias in the causal inference framework.
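A toy simulation (synthetic, self-contained, not the lab data) illustrates this. The treatment raises true distress $Y$ by 0.5 in both groups, but the measured score $Y^*$ is assumed to be more sensitive to $Y$ in group 1:

```r
# same true treatment effect in both groups, non-invariant measurement
set.seed(2)
n <- 10000
g <- rbinom(n, 1, 0.5)   # group membership
a <- rbinom(n, 1, 0.5)   # randomised treatment
y <- 0.5 * a + rnorm(n)  # identical true effect (0.5) in both groups

# non-invariant measurement: group 1's scale reacts more strongly to Y
y_star <- ifelse(g == 1, 1.3 * y, y)

# observed treatment effects on Y* diverge despite equal true effects
effect_g0 <- coef(lm(y_star ~ a, subset = g == 0))["a"]  # approx 0.50
effect_g1 <- coef(lm(y_star ~ a, subset = g == 1))["a"]  # approx 0.65
c(group0 = unname(effect_g0), group1 = unname(effect_g1))
```

The apparent group difference in treatment effects (0.65 vs 0.50) is entirely an artefact of measurement, which is exactly the bias invariance testing is designed to catch.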

Establishing measurement invariance before estimating treatment effects is therefore a prerequisite for valid cross-group comparisons.

Exercises

Lab diary

Complete at least two of the following exercises for your lab diary.

  1. Group by sex. Repeat the invariance testing using male as the grouping variable instead of group. Does the invariance pattern change? (Since the data-generating process only introduces non-invariance by group, invariance by male should hold.)

  2. Two-factor model. Fit a two-factor CFA where items 1-3 load on factor 1 and items 4-6 load on factor 2. Compare fit with the one-factor model. Does the data support a two-factor structure?

  3. Interpretation. In one paragraph, explain what you would conclude about using a single distress score to compare psychological wellbeing across two demographic groups, given that items 3 and 5 show non-invariance. What practical steps would you take before reporting cross-group differences?