Lab 10: Measurement Invariance
Download the R script for this lab (right-click → Save As)
This lab introduces measurement invariance testing, a prerequisite for meaningful cross-group comparisons. You will conduct exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and multigroup CFA on simulated distress-scale data with known non-invariance built in. In particular, you will learn:
- How to assess factorability (KMO, Bartlett's test) and extract factors using EFA
- How to fit a CFA in lavaan and evaluate model fit (CFI, RMSEA, SRMR)
- How to test configural, metric, and scalar invariance across groups
- How to identify non-invariant items and release their constraints (partial invariance)
- Why measurement invariance matters for causal inference across groups
This lab aligns with Week 10's lecture on classical measurement theory from a causal perspective. The lecture covers measurement error in DAGs, EFA, CFA, and VanderWeele's model linking measurement to causal identification. This lab focuses on the practical skills: fitting and comparing models.
Setup
library(causalworkshop)
library(psych)
library(lavaan)
library(tidyverse)
If you haven't installed psych or lavaan, run install.packages(c("psych", "lavaan")) first.
Generate data
The simulate_measurement_items() function generates a 6-item psychological distress scale (modelled on the Kessler-6) with a known factor structure and built-in measurement non-invariance across two groups:
# simulate measurement data
d <- simulate_measurement_items(n = 2000, seed = 2026)
# check structure
dim(d)
names(d)
The six items correspond to:
- item_1: nervous
- item_2: hopeless
- item_3: restless
- item_4: depressed
- item_5: effort (everything was an effort)
- item_6: worthless
# check the true factor loadings
attr(d, "true_loadings")
# check the true intercepts (they differ for items 3 and 5 across groups)
attr(d, "true_intercepts_group0")
attr(d, "true_intercepts_group1")
Items 3 (restless) and 5 (effort) have different intercepts across groups. This means that at the same level of true distress, group 1 members score higher on these two items. If we ignore this, cross-group comparisons of mean distress scores will be biased.
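To see the problem concretely, here is a quick sketch comparing naive sum scores by group. With non-invariant intercepts, part of any gap between groups is a measurement artefact rather than a true difference in distress:

# naive comparison: raw sum scores by group
# (with shifted intercepts, part of any gap is measurement, not substance)
d |>
  mutate(sum_score = item_1 + item_2 + item_3 + item_4 + item_5 + item_6) |>
  group_by(group) |>
  summarise(mean_sum = mean(sum_score), .groups = "drop")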
Exploratory factor analysis (EFA)
Before fitting a CFA, we check whether the data are factorable and how many factors to extract.
Factorability
# select just the items
items <- d |> select(item_1:item_6)
# Kaiser-Meyer-Olkin measure of sampling adequacy
psych::KMO(items)
# Bartlett's test of sphericity
psych::cortest.bartlett(cor(items), n = nrow(items))
KMO values above 0.60 are considered adequate for factor analysis. Values above 0.80 are good. Bartlett's test should be significant (p < 0.05), indicating that correlations between items are sufficiently large for factor analysis.
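The lab compares one- and two-factor solutions directly, but if you want a data-driven suggestion for how many factors to extract, parallel analysis from psych is a common choice (optional; not required for the rest of the lab):

# parallel analysis: compares observed eigenvalues with those from random data
psych::fa.parallel(items, fm = "ml")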
Extract factors
# one-factor solution
fa_1 <- psych::fa(items, nfactors = 1, fm = "ml", rotate = "none")
print(fa_1$loadings, cutoff = 0.3)
# two-factor solution (for comparison)
fa_2 <- psych::fa(items, nfactors = 2, fm = "ml", rotate = "oblimin")
print(fa_2$loadings, cutoff = 0.3)
The one-factor solution should show all six items loading substantially on a single factor (consistent with the data-generating process). The two-factor solution should not improve fit meaningfully. Compare the proportion of variance explained and check whether the two-factor loadings make theoretical sense.
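One way to make this comparison concrete is to inspect the variance-accounted-for table and the information criteria that the psych::fa object exposes (a sketch, assuming your psych version stores these as $Vaccounted and $BIC):

# proportion of variance explained by each solution
fa_1$Vaccounted
fa_2$Vaccounted
# lower BIC favours the more parsimonious model
c(one_factor = fa_1$BIC, two_factor = fa_2$BIC)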
Confirmatory factor analysis (CFA)
Now we specify the one-factor model and fit it using lavaan:
# specify one-factor CFA model
model <- "
distress =~ item_1 + item_2 + item_3 + item_4 + item_5 + item_6
"
# fit CFA on full sample
fit_cfa <- cfa(model, data = d)
# summary with fit measures
summary(fit_cfa, fit.measures = TRUE, standardized = TRUE)
Evaluate model fit:
# extract key fit indices
fit_indices <- fitmeasures(fit_cfa, c("cfi", "rmsea", "srmr"))
print(round(fit_indices, 3))
- CFI (Comparative Fit Index): > 0.95 is excellent, > 0.90 is acceptable
- RMSEA (Root Mean Square Error of Approximation): < 0.06 is excellent, < 0.08 is acceptable
- SRMR (Standardised Root Mean Square Residual): < 0.08 is acceptable
Multigroup CFA: invariance testing
We now test whether the factor structure is equivalent across the two groups. Invariance testing proceeds through a hierarchy of progressively stricter constraints:
Step 1: Configural invariance
The same factor structure holds in both groups, but loadings and intercepts are freely estimated:
fit_configural <- cfa(model, data = d, group = "group")
summary(fit_configural, fit.measures = TRUE)
Step 2: Metric invariance
Factor loadings are constrained to be equal across groups:
fit_metric <- cfa(model, data = d, group = "group",
group.equal = "loadings")
summary(fit_metric, fit.measures = TRUE)
Compare configural and metric models:
lavTestLRT(fit_configural, fit_metric)
A non-significant chi-square difference (p > 0.05) means the metric model fits no worse than the configural model, so equal loadings across groups are supported. This is expected because the data-generating process uses the same loadings for both groups.
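Alongside the likelihood ratio test, applied work often reports the change in CFI between adjacent models; a drop smaller than 0.01 is a widely used (if rough) rule of thumb supporting invariance:

# change in CFI from configural to metric (|delta CFI| < 0.01 supports invariance)
fitmeasures(fit_configural, "cfi") - fitmeasures(fit_metric, "cfi")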
Step 3: Scalar invariance
Both loadings and intercepts are constrained to be equal:
fit_scalar <- cfa(model, data = d, group = "group",
group.equal = c("loadings", "intercepts"))
summary(fit_scalar, fit.measures = TRUE)
Compare metric and scalar models:
lavTestLRT(fit_metric, fit_scalar)
Full scalar invariance should fail (significant chi-square difference, p < 0.05). This is by design: items 3 and 5 have different intercepts across groups in the data-generating process. The model is telling us that something is wrong with assuming equal intercepts for all items.
Discover partial non-invariance
When full scalar invariance fails, we can release constraints on specific items to achieve partial scalar invariance. In this lab we know which items are responsible from the data-generating process, but in real data you would choose based on theory together with model diagnostics, as sketched below.
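A diagnostic sketch using lavaan's score test, which asks which equality constraints the scalar model objects to (modification indices via modindices() are an alternative):

# univariate score tests for each equality constraint in the scalar model;
# large test statistics flag constraints worth releasing
lavTestScore(fit_scalar)

Having identified the intercepts of items 3 and 5, we free them by giving them group-specific labels: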
# partial scalar invariance: free intercepts for items 3 and 5
model_partial <- "
distress =~ item_1 + item_2 + item_3 + item_4 + item_5 + item_6
item_3 ~ c(i3a, i3b) * 1
item_5 ~ c(i5a, i5b) * 1
"
fit_partial <- cfa(model_partial, data = d, group = "group",
group.equal = c("loadings", "intercepts"))
summary(fit_partial, fit.measures = TRUE)
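An equivalent and often tidier approach is to keep the original model and name the freed parameters with lavaan's group.partial argument:

# alternative: list the parameters exempt from the equality constraints
fit_partial_alt <- cfa(model, data = d, group = "group",
                       group.equal = c("loadings", "intercepts"),
                       group.partial = c("item_3 ~ 1", "item_5 ~ 1"))

Both specifications estimate the same model; use whichever you find more readable.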
Compare partial scalar with metric invariance:
lavTestLRT(fit_metric, fit_partial)
Partial scalar invariance should hold (non-significant chi-square difference). By releasing the intercepts for items 3 and 5, we account for the known non-invariance in the data. This means we can compare latent factor means across groups, but only after accounting for differential item functioning on items 3 and 5.
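Once (partial) scalar invariance holds, lavaan identifies the latent mean difference by fixing the latent mean to zero in the first group and estimating it freely in the second. A sketch for reading it off (the relevant rows have lhs distress and op ~1):

# estimated latent mean of distress in group 2, relative to group 1
parameterEstimates(fit_partial) |>
  filter(lhs == "distress", op == "~1")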
Compare all models
# summary table of fit indices
models <- list(
Configural = fit_configural,
Metric = fit_metric,
Scalar = fit_scalar,
"Partial Scalar" = fit_partial
)
fit_table <- map_dfr(names(models), function(name) {
fm <- fitmeasures(models[[name]], c("cfi", "rmsea", "srmr", "chisq", "df"))
tibble(
model = name,
cfi = round(fm["cfi"], 3),
rmsea = round(fm["rmsea"], 3),
srmr = round(fm["srmr"], 3),
chisq = round(fm["chisq"], 1),
df = fm["df"]
)
})
print(fit_table)
Connection to causal inference
If a scale measures the same construct differently across groups (non-invariance), then cross-group comparisons of treatment effects may be biased. In DAG terms, the measured outcome Y* is a function of both the true outcome Y and group membership G: Y* = f(Y, G).
If f differs by group (non-invariance), then even if the treatment A has the same causal effect on the true outcome Y in both groups, the observed effect on the measured outcome Y* will differ. This is measurement bias in the causal inference framework.
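A toy numeric illustration of this point (entirely hypothetical numbers, separate from the lab's simulated scale): give the measurement function a group-specific slope and intercept, and the same true treatment effect shows up differently in the measured outcome.

# toy example: identical true effect, non-invariant measurement
set.seed(1)
n <- 10000
g <- rbinom(n, 1, 0.5)                      # group membership
a <- rbinom(n, 1, 0.5)                      # randomised treatment
y <- 0.5 * a + rnorm(n)                     # true outcome: same effect in both groups
y_star <- ifelse(g == 1, 1.5 * y + 0.3, y)  # group 1 measured on a shifted, stretched scale
# observed "treatment effects" on the measured outcome diverge by group
coef(lm(y_star ~ a, subset = g == 0))["a"]
coef(lm(y_star ~ a, subset = g == 1))["a"]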
Establishing measurement invariance before estimating treatment effects is therefore a prerequisite for valid cross-group comparisons.
Exercises
- Group by sex. Repeat the invariance testing using male as the grouping variable instead of group. Does the invariance pattern change? (Since the data-generating process only introduces non-invariance by group, invariance by male should hold.)
- Two-factor model. Fit a two-factor CFA in which items 1-3 load on factor 1 and items 4-6 load on factor 2. Compare its fit with the one-factor model. Do the data support a two-factor structure?
- Interpretation. In one paragraph, explain what you would conclude about using a single distress score to compare psychological wellbeing across two demographic groups, given that items 3 and 5 show non-invariance. What practical steps would you take before reporting cross-group differences?