Lab 10: Measurement Invariance
R script
Download the R script for this lab (right-click → Save As)
This lab introduces measurement invariance testing, a prerequisite for meaningful cross-group comparisons. You will conduct exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and multigroup CFA on simulated distress scale data with known non-invariance built in.
What you will learn
- How to assess factorability (KMO, Bartlett's test) and extract factors using EFA
- How to fit a CFA in lavaan and evaluate model fit (CFI, RMSEA, SRMR)
- How to test configural, metric, and scalar invariance across groups
- How to discover and release non-invariant items (partial invariance)
- Why measurement invariance matters for causal inference across groups
Connection to the lecture
This lab aligns with Week 10's lecture on classical measurement theory from a causal perspective. The lecture covers measurement error in DAGs, EFA, CFA, and VanderWeele's model linking measurement to causal identification. This lab focuses on the practical skills: fitting and comparing models.
Setup
library(causalworkshop)
library(psych)
library(lavaan)
library(tidyverse)
Install packages
If you haven't installed these packages, run:
install.packages(c("psych", "lavaan"))
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("go-bayes/causalworkshop@v0.2.1")
Generate data
The simulate_measurement_items() function generates a 6-item
psychological distress scale (modelled on the Kessler-6) with a known
factor structure and built-in measurement non-invariance across two
groups:
# simulate measurement data
d <- simulate_measurement_items(n = 2000, seed = 2026)
# check structure
dim(d)
names(d)
The six items correspond to:
- item_1: nervous
- item_2: hopeless
- item_3: restless
- item_4: depressed
- item_5: effort (everything was an effort)
- item_6: worthless
# check the true factor loadings
attr(d, "true_loadings")
# check the true intercepts (they differ for items 3 and 5 across groups)
attr(d, "true_intercepts_group0")
attr(d, "true_intercepts_group1")
What is non-invariance?
Items 3 (restless) and 5 (effort) have different intercepts across groups. This means that at the same level of true distress, group 1 members score higher on these two items. If we ignore this, cross-group comparisons of mean distress scores will be biased.
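A quick descriptive check of this in the simulated data is to compare raw item means by group. (This conflates intercept differences with any true latent-mean difference, so treat it as illustrative only; the formal test comes in the multigroup CFA below.)

```r
# raw item means by group: items 3 and 5 should sit higher in group 1
d |>
  group_by(group) |>
  summarise(across(item_1:item_6, mean))
```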
Exploratory factor analysis (EFA)
Before fitting a CFA, we check whether the data are factorable and how many factors to extract.
Factorability
# select just the items
items <- d |> select(item_1:item_6)
# Kaiser-Meyer-Olkin measure of sampling adequacy
psych::KMO(items)
# Bartlett's test of sphericity
psych::cortest.bartlett(cor(items), n = nrow(items))
Interpreting KMO
KMO values above 0.60 are considered adequate for factor analysis. Values above 0.80 are good. Bartlett's test should be significant (p < 0.05), indicating that correlations between items are sufficiently large for factor analysis.
Extract factors
# one-factor solution
fa_1 <- psych::fa(items, nfactors = 1, fm = "ml", rotate = "none")
print(fa_1$loadings, cutoff = 0.3)
# two-factor solution (for comparison)
fa_2 <- psych::fa(items, nfactors = 2, fm = "ml", rotate = "oblimin")
print(fa_2$loadings, cutoff = 0.3)
How many factors?
The one-factor solution should show all six items loading substantially on a single factor (consistent with the data-generating process). The two-factor solution should not improve fit meaningfully. Compare the proportion of variance explained and check whether the two-factor loadings make theoretical sense.
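Beyond comparing the two solutions directly, parallel analysis is a conventional check on the number of factors: it retains factors whose observed eigenvalues exceed those from random data. For these data it should suggest a single factor.

```r
# parallel analysis: compare observed eigenvalues with random-data eigenvalues
psych::fa.parallel(items, fm = "ml", fa = "fa")
```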
Confirmatory factor analysis (CFA)
Now we specify the one-factor model and fit it using lavaan:
# specify one-factor CFA model
model <- "
distress =~ item_1 + item_2 + item_3 + item_4 + item_5 + item_6
"
# fit CFA on full sample
fit_cfa <- cfa(model, data = d)
# summary with fit measures
summary(fit_cfa, fit.measures = TRUE, standardized = TRUE)
Evaluate model fit:
# extract key fit indices
fit_indices <- fitmeasures(fit_cfa, c("cfi", "rmsea", "srmr"))
print(round(fit_indices, 3))
Fit index guidelines
- CFI (Comparative Fit Index): > 0.95 is excellent, > 0.90 is acceptable
- RMSEA (Root Mean Square Error of Approximation): < 0.06 is excellent, < 0.08 is acceptable
- SRMR (Standardised Root Mean Square Residual): < 0.08 is acceptable
Multigroup CFA: invariance testing
We now test whether the factor structure is equivalent across the two groups. Invariance testing proceeds through a hierarchy of progressively stricter constraints:
Step 1: Configural invariance
The same factor structure holds in both groups, but loadings and intercepts are freely estimated:
fit_configural <- cfa(model, data = d, group = "group")
summary(fit_configural, fit.measures = TRUE)
Step 2: Metric invariance
Factor loadings are constrained to be equal across groups:
fit_metric <- cfa(model, data = d, group = "group",
group.equal = "loadings")
summary(fit_metric, fit.measures = TRUE)
Compare configural and metric models:
lavTestLRT(fit_configural, fit_metric)
Interpreting the comparison
A non-significant chi-square difference (p > 0.05) means the metric model fits no worse than the configural model, so equal loadings across groups are supported. Because chi-square is sensitive to large sample sizes, also inspect $\Delta$CFI, $\Delta$RMSEA, and $\Delta$SRMR below.
Step 3: Scalar invariance
Both loadings and intercepts are constrained to be equal:
fit_scalar <- cfa(model, data = d, group = "group",
group.equal = c("loadings", "intercepts"))
summary(fit_scalar, fit.measures = TRUE)
Compare metric and scalar models:
lavTestLRT(fit_metric, fit_scalar)
Expected result
Full scalar invariance should fail. In this lab, the chi-square difference test is usually significant, and the fit indices also worsen noticeably once intercepts are constrained. This is by design: items 3 and 5 have different intercepts across groups in the data-generating process.
Discover partial non-invariance
When full scalar invariance fails, we can release constraints on specific items to achieve partial scalar invariance. Based on modification indices or theory, we free the intercepts of items 3 and 5:
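Before specifying the partial model, you can let the data point to the offending constraints. A sketch using lavaan's score test: each row of the univariate output tests releasing one equality constraint from fit_scalar, and the largest statistics should correspond to the item_3 and item_5 intercepts.

```r
# univariate score tests, one per equality constraint in fit_scalar
lavTestScore(fit_scalar)

# map the .p*. labels in the score-test output back to parameters
parTable(fit_scalar)
```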
# partial scalar invariance: free intercepts for items 3 and 5
model_partial <- "
distress =~ item_1 + item_2 + item_3 + item_4 + item_5 + item_6
item_3 ~ c(i3a, i3b) * 1
item_5 ~ c(i5a, i5b) * 1
"
fit_partial <- cfa(model_partial, data = d, group = "group",
group.equal = c("loadings", "intercepts"))
summary(fit_partial, fit.measures = TRUE)
Compare partial scalar with metric invariance:
lavTestLRT(fit_metric, fit_partial)
Calculate fit-index changes between adjacent models:
delta_fit <- tibble(
comparison = c("Metric - Configural", "Scalar - Metric", "Partial Scalar - Metric"),
delta_cfi = c(
fitmeasures(fit_metric, "cfi") - fitmeasures(fit_configural, "cfi"),
fitmeasures(fit_scalar, "cfi") - fitmeasures(fit_metric, "cfi"),
fitmeasures(fit_partial, "cfi") - fitmeasures(fit_metric, "cfi")
),
delta_rmsea = c(
fitmeasures(fit_metric, "rmsea") - fitmeasures(fit_configural, "rmsea"),
fitmeasures(fit_scalar, "rmsea") - fitmeasures(fit_metric, "rmsea"),
fitmeasures(fit_partial, "rmsea") - fitmeasures(fit_metric, "rmsea")
),
delta_srmr = c(
fitmeasures(fit_metric, "srmr") - fitmeasures(fit_configural, "srmr"),
fitmeasures(fit_scalar, "srmr") - fitmeasures(fit_metric, "srmr"),
fitmeasures(fit_partial, "srmr") - fitmeasures(fit_metric, "srmr")
)
) |>
mutate(across(starts_with("delta_"), \(x) round(x, 3)))
print(delta_fit)
Expected result
Partial scalar invariance should hold after jointly considering chi-square tests and fit-index changes. As a rule of thumb, evidence against invariance is often flagged by $\Delta$CFI < -0.01, $\Delta$RMSEA > 0.015, or $\Delta$SRMR > 0.01. By releasing items 3 and 5, you should see improved fit relative to full scalar constraints.
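These cutoffs can be checked programmatically against the delta_fit table built above. (A convenience sketch; the thresholds are heuristics from the invariance literature, not formal tests.)

```r
# flag comparisons whose fit-index changes exceed the rule-of-thumb cutoffs
delta_fit |>
  mutate(
    flagged = delta_cfi < -0.01 | delta_rmsea > 0.015 | delta_srmr > 0.01
  )
```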
Compare all models
# summary table of fit indices
models <- list(
Configural = fit_configural,
Metric = fit_metric,
Scalar = fit_scalar,
"Partial Scalar" = fit_partial
)
fit_table <- map_dfr(names(models), function(name) {
fm <- fitmeasures(models[[name]], c("cfi", "rmsea", "srmr", "chisq", "df"))
tibble(
model = name,
cfi = round(fm["cfi"], 3),
rmsea = round(fm["rmsea"], 3),
srmr = round(fm["srmr"], 3),
chisq = round(fm["chisq"], 1),
df = fm["df"]
)
})
print(fit_table)
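One payoff of establishing at least partial scalar invariance is that latent means become comparable. When intercepts are constrained across groups, lavaan fixes the latent mean to zero in the first group and frees it in the second, so the group-2 estimate is the latent mean difference.

```r
# latent mean difference under partial scalar invariance
# (op "~1" selects intercept/mean parameters; lhs "distress" keeps the factor)
parameterEstimates(fit_partial) |>
  filter(op == "~1", lhs == "distress")
```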
Connection to causal inference
Why measurement invariance matters for causal inference
If a scale measures the same construct differently across groups (non-invariance), then cross-group comparisons of treatment effects may be biased. In DAG terms, the measured outcome $Y^*$ is a function of both the true outcome $Y$ and group membership $G$:
$$Y^* = f(Y, G)$$
If $f$ differs by group (non-invariance), then even if the treatment has the same causal effect on $Y$ in both groups, the observed effect on $Y^*$ will differ. This is measurement bias in the causal inference framework.
Establishing measurement invariance before estimating treatment effects is therefore a prerequisite for valid cross-group comparisons.
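A toy base-R simulation (separate from the lab data; the numbers are illustrative) makes this concrete. Here $f$ rescales the true outcome in group 1, so the same causal effect on $Y$ appears as a different effect on $Y^*$:

```r
set.seed(2026)
n <- 10000
g <- rbinom(n, 1, 0.5)          # group membership
a <- rbinom(n, 1, 0.5)          # randomised treatment
y <- 0.5 * a + rnorm(n)         # true outcome: effect = 0.5 in both groups
ystar <- (1 + 0.5 * g) * y      # non-invariant measurement: f depends on g

# same causal effect on y in both groups, but the observed
# effect on ystar is inflated in group 1 (roughly 0.75 vs 0.5)
coef(lm(y ~ a, subset = g == 0))["a"]
coef(lm(y ~ a, subset = g == 1))["a"]
coef(lm(ystar ~ a, subset = g == 0))["a"]
coef(lm(ystar ~ a, subset = g == 1))["a"]
```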
Exercises
Lab diary
Complete at least two of the following exercises for your lab diary.
- Group by sex. Repeat the invariance testing using male as the grouping variable instead of group. Does the invariance pattern change? (Since the data-generating process only introduces non-invariance by group, invariance by male should hold.)
- Two-factor model. Fit a two-factor CFA where items 1-3 load on factor 1 and items 4-6 load on factor 2. Compare fit with the one-factor model. Do the data support a two-factor structure?
- Interpretation. In one paragraph, explain what you would conclude about using a single distress score to compare psychological wellbeing across two demographic groups, given that items 3 and 5 show non-invariance. What practical steps would you take before reporting cross-group differences?