Psych 447: Mixed-effects/Multi-level models (continued), with excursion into metanalysis

Joseph Bulbulia

Show code

knitr::include_graphics("ml.png")

Show code

### Libraries
library("tidyverse")
library("ggplot2")
library("patchwork")
library("lubridate")
library("kableExtra")
library("gtsummary")
library("lubridate")
library("equatiomatic")
library("ggdag")
library("brms")
library("rstan")
library("rstanarm")
library("bayesplot")
library("easystats")
library("kableExtra")
library("broom")
library("tidybayes")
library("bmlm")
library("metafor")
library("scales")
library("kableExtra")
library("knitr")
if (!require(metafor)) {
  install.packages("metafor")
}
# rstan options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores ())
theme_set(theme_classic())

Resources

The following is useful tutorial for explaining what multi-level modelling is doing under the hood (only read the first five pages). The tutorial would be especially useful for experimentalists: here: (Sorensen and Vasishth 2015) The published version is less clear, but still useful (Vasishth et al. 2018)
Matti Vuorre provides and explanation for how to do meta-analysis using the BRMS package here.
In my view, the clearest account of multilevel modelling is Richard McElreath’s Statistical Rethinking(McElreath 2020).
Solomon Kurz’s BRMS companion to Statistical Rethinking is really great, and has lots excellent code for making illuminating graphs: (Kurz 2020)
There is no single multi-level model. To understand how model syntax, heat over the GLMM Faq page here; For model specification see especially this
To understand how to write multi-level models in BRMS, in addition to the Kurz book, see Paul Bürkner’s tutorial here

Objectives

To unify meta-analysis within the framework of multilevel modeling.
To extend multilevel modelling to included group-varying slopes (as well as intercepts).
To understand how to interpret results.

Terminological confusions

There is much debate about terminology in multi-level modelling. In the circles I travel, the term “mixed-effects model” is preferred. There’s some discussion about the terminological issues on the GLMMFAQ page here. In part, this is because there are many different multi-level models. In part, the confusion owes to the evolution of nomenclature among different groups. And in part, the confusion owes to people being genuinely confused. Our task here is to walk you through the steps of relatively simple multi-level models; these will be powerful assets to your work. Howeer, we want to do more than to teach you how to write and interpret multilevel models; we want to foster basic intuitions that will help you to understand what these models are doing under the hood.

Regarding nomenclature, we prefer the terms “group-varying intercept” and “group-varying slope” to the terms “random effect” and “random slope.” The term “random” tends to invite confusion – sampling theory requires “random” draws from various populations, etc. However, when we use the terms “group-varying” it will be important to remember that a “group” can include an individual who has been repeatedly measured over time.

Meta-analysis

Matti Vuorre provides an explanation for how to do metanalysis using the BRMS package here. See also, Matti’s brmstools package here. We will follow Matti’s approach.

The metafor package has data prepared for meta-analysis. We import the data:

Show code

# call data
# data("dat.bangertdrowns2004", package = "metafor")
# 
# # obtain first 15 studies and rename columns
# dat <- dat.bangertdrowns2004 %>%
#   mutate(study = paste0(author, " (", year, ")"), sei = sqrt(vi)) %>%
#   select(study, yi, sei) 
# 
# # Truncate
# dat <- dat[,-c(5:10)] %>% as_tibble()
# 
# # Remove commas in study names
# dat$study <-
#   str_replace(dat$study, ",", "")  
# 

dat <- escalc(measure="ZCOR", ri=ri, ni=ni, data=dat.molloy2014)
dat <- dat[,-c(5:10)] %>% as_tibble()
dat$yi <- as.numeric(dat$yi)
dat$vi <- as.numeric(dat$vi)
dat <- as_tibble(dat)
names(dat) <- c("study", "year", "ni", "ri", "yi", "vi")
dat$study <- paste0(dat$study, " (", dat$year, ")")
dat$sei <- as.numeric(sqrt(dat$vi))
dat$study <- as.character(dat$study)
# Major pain in reordering
dat$study[5] <- "Christensen et al. (1995)"
tmp <- dat[5,]
dat[5,] <- dat[4,]
dat[4,] <- tmp

Here, yi is the observed outcome in effect size units, and sei is the observed sampling standard deviation (\(\sigma_i\)). (You can read more about the metafor package here).

Model

In metanalysis we assume:

\[y_i \sim N(\theta_i, \sigma_i^2)\] Where \(\theta_i\) is the unknown true effect-size corresponding to \(y_i\), where \(\theta\) has a standard deviation that is equal to the obeserved standard error \(\sigma_i\).

In metanalysis, we assume that there is an underlying parameter which identifies the effect size of the hypothetical population of studies from which the sample of studies collected was drawn. Recall, a parameter is an abstract concept – it is the property of population that we never observe. As such,

\[\theta_i \sim N(\mu, \tau^2)\]

Where \(\mu\) is the population average effect size, and \(\tau^2\), the population variance, or between-study heterogeneity.

We can write the meta-analytic model as a random intercept model:

\[y_i \sim N(\mu + \theta_i, \sigma_i^2)\\ \sigma_i \sim N(0,\tau^2)\]

Code for meta-analytic model

The code for a simple meta-analytic model in BRMS is as follows:

m_meta <- brm(
  yi | se(sei) ~ 1 + (1 | study),
  data = dat,
  control = list(adapt_delta = .99),
  file = here::here("models", "metanalyis")
)

Show code

options(width = 120)

# Shows the prior
summary(m_meta, prior = TRUE)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: yi | se(sei) ~ 1 + (1 | study) 
   Data: dat (Number of observations: 16) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Priors: 
Intercept ~ student_t(3, 0.2, 2.5)
sd ~ student_t(3, 0, 2.5)

Group-Level Effects: 
~study (Number of levels: 16) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.10      0.04     0.04     0.18 1.01     1045     1100

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.15      0.04     0.08     0.22 1.00     1491     2200

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.00      0.00     0.00     0.00 1.00     4000     4000

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Test hypothesis of a greater than .2 effect size:

Probably that the effect size is great than .2 is .08 (or 8%)

Show code

options(width = 120)
hypothesis(m_meta, "Intercept > 0.2")

Hypothesis Tests for class b:
             Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Intercept)-(0.2) > 0    -0.05      0.04    -0.11     0.01       0.09      0.08     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

Forest plot

The forest plot graphs the posterior distribution of each estimated \(\theta_i\).

The graph shows the meta-analytic effect size \(\mu\) in the bottom row.

Here is Matti Vuorre’s code for making a graph. Note that Matti indicates the observed effect size with the little ‘o.’ A metanalysis uses partial pooling to shrink estimates towards to the population estimate (\(\mu\)), which is the weighted average of the group means.

Show code

library(tidybayes)
library(ggdist)
# Study-specific effects are deviations + average
out_r <- spread_draws(m_meta, r_study[study,term], b_Intercept) %>% 
  mutate(b_Intercept = r_study + b_Intercept) 
# Average effect
out_f <- spread_draws(m_meta, b_Intercept) %>% 
  mutate(study = "Average")

out_f

# A tibble: 4,000 x 5
   .chain .iteration .draw b_Intercept study  
    <int>      <int> <int>       <dbl> <chr>  
 1      1          1     1      0.183  Average
 2      1          2     2      0.182  Average
 3      1          3     3      0.123  Average
 4      1          4     4      0.110  Average
 5      1          5     5      0.0619 Average
 6      1          6     6      0.0612 Average
 7      1          7     7      0.0742 Average
 8      1          8     8      0.0730 Average
 9      1          9     9      0.115  Average
10      1         10    10      0.0639 Average
# … with 3,990 more rows

Show code

# Combine average and study-specific effects' data frames
out_all <- bind_rows(out_r, out_f) %>% 
  ungroup() %>%
  # Ensure that Average effect is on the bottom of the forest plot
  mutate(study = fct_relevel(study, "Average")) %>% 
  # tidybayes garbles names so fix here
  mutate(study = str_replace_all(study, "\\.", " "))


# Data frame of summary numbers
out_all_sum <- group_by(out_all, study) %>% 
  mean_qi(b_Intercept)
# Draw plot
out_all %>%   
  ggplot(aes(b_Intercept, study)) +
  # Zero!
  geom_vline(xintercept = 0, size = .25, lty = 2) +
  stat_halfeye(.width = c(.8, .95), fill = "dodgerblue") +
  # Add text labels
  geom_text(
    data = mutate_if(out_all_sum, is.numeric, round, 2),
    aes(label = str_glue("{b_Intercept} [{.lower}, {.upper}]"), x = 1.75),
    hjust = "inward"
  ) +
  # Observed as empty points
  geom_point(
    data = dat %>% mutate(study = str_replace_all(study, "\\.", " ")), 
    aes(x=yi), position = position_nudge(y = -.2), shape = 1 
  )

Group varying slopes

Last week, we considered how intercepts may vary between groups, and how modelling this variation enables a model to efficiently pool information within and between clusters. In line with this approach, we have just demonstrated how meta-analysis turns on the princible of modelling group-level intercepts. Next we consider how to model variability in slopes of regression coefficients across groups. This model group-varying slopes and intercepts together requires estimating both the slopes and intercepts as drawing from a multivariate distribution in which the two paramters (intercepts and slopes) may be correlated. In a varying intercept/varying slope multi-level model, then, we estimate the covariance of group-level intercepts and group-level slopes in the same model in which we estimate the other parameters (including, of course, the group-level intercept variances and group-level slope variances, else what would there be to co-vary).

\[\begin{pmatrix} \text{variances of the intercepts} & \text{co-variances of the intercepts and slopes} \\ \text{co-variances of the intercepts and slopes} & \text{variances of the slopes} & \end{pmatrix}\]

Using mathematical notation for the matrix:

\[\begin{pmatrix} \sigma_\alpha^2 & \sigma_\alpha\sigma_\beta \rho \\ \sigma_\alpha\sigma_\beta \rho & \sigma_\beta^2 & \end{pmatrix}\]

The varianceses for the intercepts and slopes are along the diagonal, and their co-variances are along the off-diagonal. The co-variances are the products of the standard deviations \(\times\) their correlation

Here we folllow a simulation from Richard McElreath’s Statistical Rethinking, which simulates a dataset in which the average wait times for a coffee in the morning at a cafe are correlated with the average wait times for a coffee in the afternoon at the same cafe.

First the basic set up of the code:

a <- 3.5 # average morning wait time
b <- (-1) # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- (-0.7) #correlation between intercepts and slopes”

# mean 

Mu <- c( a , b )

# Covariance of a and b
cov_ab <- sigma_a * sigma_b * rho

# Create a matrix with variances along the diagnols and the covariances on the off diagnols: 

Sigma <- matrix( c(sigma_a^2,cov_ab,cov_ab,sigma_b^2) , ncol=2 )


## Note that we can treat the correlations and the covariances separately, which is useful for assigning priors. We won't need to assign priors today, but it is useful to have the code.
sigmas <- c(sigma_a,sigma_b) # standard deviations
Rho <- matrix( c(1,rho,rho,1) , nrow=2 ) # correlation matrix

# nmatrix multiply to get covariance matrix -- this is the same Sigma matrix as above:

Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)

Next we simulate the data

Simulate cafes and their average properties:

Show code

n_cafes <- 20 

# For generating multivariate normal distributions
library(MASS)
set.seed(5) # used to replicate example

# Matrix with 20 rows and 2 columns

vary_effects <- mvrnorm( n_cafes , Mu , Sigma )
head(vary_effects)

         [,1]       [,2]
[1,] 4.223962 -1.6093565
[2,] 2.010498 -0.7517704
[3,] 4.565811 -1.9482646
[4,] 3.343635 -1.1926539
[5,] 1.700971 -0.5855618
[6,] 4.134373 -1.1444539

Each row is a cafe. The first column contains the intercepts and the second column contains the slopes. We can graph the correlation

Show code

library(rethinking)
a_cafe <- vary_effects[,1] # intercept
b_cafe <- vary_effects[,2] # slope


plot( a_cafe , 
      b_cafe , 
      col=rangi2 ,
      xlab= "intercepts (a_cafe)" , 
      ylab = "slopes (b_cafe)" 
      )

# overlay population distribution
library(ellipse)
for ( l in c(0.1,0.3,0.5,0.8,0.99) )
lines(
  ellipse(Sigma,
              centre = Mu,
              level=l),
      col=col.alpha("black",0.2)
      )

Next we simulate sampling from these cafes


set.seed(22)
n_visits <- 10

afternoon <- rep(0:1, n_visits * n_cafes/2)

cafe_id <- rep( 1:n_cafes , each = n_visits )

mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon

sigma <- 0.5 # std dev within cafes

wait <- rnorm( n_visits * n_cafes , mu , sigma )

d <- data.frame( cafe=cafe_id , afternoon = afternoon , wait = wait )

head(d)

  cafe afternoon     wait
1    1         0 3.967893
2    1         1 3.857198
3    1         0 4.727875
4    1         1 2.761013
5    1         0 4.119483
6    1         1 3.543652

We can write out the math:

\[wait_i \sim N(\mu_i, \sigma) \\ \mu_i = \alpha_{\text{cafe}_i} + \beta_{\text{cafe}_i}A_i\\ \begin{bmatrix} \alpha_{\text{cafe}} \\ \beta_{\text{cafe}} \end{bmatrix} \sim \text{MVNormal} \begin{pmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \boldsymbol{S}, \end{pmatrix} \\ \boldsymbol{S} = \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta}\end{pmatrix} \boldsymbol{R} \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix}\\ \boldsymbol{R} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\]

Each cafe has an intercept \(\alpha_i\) and a slope \(\beta_i\) that samples from a two dimensional Gaussian distribution with means \(\alpha\) and \(\beta\) and a covariance matrix \(\boldsymbol{S}\). The \(\boldsymbol{S}\) matrix factors into the a matrix for the separate standard deviations of \(\alpha\) and \(\beta\), \(\sigma_{alpha}\) and \(\sigma_{beta}\) and the correlation matrix \(\boldsymbol{R}\)

Recall that we assume our paramters are sampling from distributions. In bayesian estimation we can be explicit about this:

\[ \alpha \sim N(5,2) \\ \beta \sim N(-1,5) \\ \sigma \sim exp(1) \\ \sigma_a \sim exp(1) \\ \sigma_b \sim exp(1) \\ \boldsymbol{R} \sim LKJcorr(1) \\ \]

The LKJcorr prior assumes:

\[\boldsymbol{R} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\]

Here are some correlations the LKJ distributions

LJK = 1

Show code

R <- rlkjcorr( 1e4 , K=2 , eta=1)
dens( R[,1,2] , xlab= "correlation" )

LJK = 2

Show code

R <- rlkjcorr( 1e4 , K=2 , eta=2)
dens( R[,1,2] , xlab= "correlation" )

LJK = 4

Show code

library(rethinking)
R <- rlkjcorr( 1e4 , K=2 , eta=4)
dens( R[,1,2] , xlab= "correlation" )

Code for varying intercept/slope model

We can estimate the model (at last):

m_vslopes  <-
  brm(
    wait ~ 1 + afternoon + (1 + afternoon | cafe),
    prior = c(
      prior(normal(5, 2), class = Intercept),
      prior(normal(-1, 10), class = b),
      prior(exponential(1), class = sd),
      prior(exponential(1), class = sigma),
      prior(lkj(2), class = cor)
    ),
    file = here::here("models", "multi-level-var-slopes"),
    data = d,
    family = gaussian
  )

Show code

options(width = 120)
# Shows the prior
summary(m_vslopes, prior = TRUE)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: wait ~ 1 + afternoon + (1 + afternoon | cafe) 
   Data: d (Number of observations: 200) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Priors: 
b ~ normal(-1, 10)
Intercept ~ normal(5, 2)
L ~ lkj_corr_cholesky(2)
sd ~ exponential(1)
sigma ~ exponential(1)

Group-Level Effects: 
~cafe (Number of levels: 20) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                1.00      0.17     0.72     1.37 1.00     1166     1826
sd(afternoon)                0.36      0.12     0.13     0.60 1.00     1366     1072
cor(Intercept,afternoon)    -0.48      0.23    -0.84     0.04 1.00     2972     2553

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     3.70      0.23     3.23     4.15 1.00      596     1013
afternoon    -1.19      0.11    -1.41    -0.97 1.00     2031     2542

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.52      0.03     0.46     0.58 1.00     3389     2711

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Show code

sjPlot::tab_model(m_vslopes)

	wait
Predictors	Estimates	CI (95%)
Intercept	3.70	3.23 – 4.15
afternoon	-1.18	-1.41 – -0.97
Random Effects
σ²	0.27
τ₀₀ _cafe	1.02
τ₁₁ _{cafe.afternoon}	0.14
ρ₀₁
ρ₀₁
ICC	0.77
N _cafe	20
Observations	200
Marginal R² / Conditional R²	0.254 / 0.810

p1 <- brms::mcmc_plot(m_vslopes, 
               type = "areas",
               prob = .89)
p1

Compare to a model with no prior:

m_vslopes_np  <-
  brm(
    wait ~ 1 + afternoon + (1 + afternoon | cafe),
    file = here::here("models", "multi-level-var-slopes-no-prior"),
    data = d,
    family = gaussian
  )

Show code

options(width = 120)
# Shows the prior
summary(m_vslopes_np, prior = TRUE)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: wait ~ 1 + afternoon + (1 + afternoon | cafe) 
   Data: d (Number of observations: 200) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Priors: 
Intercept ~ student_t(3, 3, 2.5)
L ~ lkj_corr_cholesky(1)
sd ~ student_t(3, 0, 2.5)
sigma ~ student_t(3, 0, 2.5)

Group-Level Effects: 
~cafe (Number of levels: 20) 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                1.05      0.20     0.75     1.52 1.00     1004     1757
sd(afternoon)                0.38      0.13     0.14     0.65 1.00     1192     1507
cor(Intercept,afternoon)    -0.57      0.23    -0.92    -0.04 1.00     2426     1801

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     3.68      0.24     3.20     4.15 1.01      572      669
afternoon    -1.18      0.11    -1.40    -0.96 1.00     1657     2478

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.52      0.03     0.47     0.58 1.00     2434     2753

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

p2 <- brms::mcmc_plot(m_vslopes_np, 
               type = "areas",
               prob = .89)

# Comparison graph
p1 + p2 + plot_annotation(title = "Priors (a) vs Default Priors (b)", tag_levels = "a")

Just the syntax please

See: glmmFAQ

E.g. intercept varying among crossed grouping effects (e.g. site, year)

lme4::lmer(y ~ intercept + beta1 + beta2  + (1|groupA) + (1|groupB))

Correlated intercepts and slopes varying between crossed grouping effects

lme4::lmer(y ~ intercept + beta1 + beta2  + (beta1|groupA) + (beta2|groupB))

Recall that we have the equatiomatic package:

Show code

library(lme4)
library(equatiomatic)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
equatiomatic::extract_eq(fm1)

\[ \begin{aligned} \operatorname{Reaction}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{Days}), \sigma^2 \right) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\mu_{\alpha_{j}} \\ &\mu_{\beta_{1j}} \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for Subject j = 1,} \dots \text{,J} \end{aligned} \]

Within and between group estimation

Consider the following example from the parameters package here

And see references on that page.

We first simulate data on the relationship between typing speed and typing accuracy.

Btw, this is an interesting method for simulation, which differs frm the approach above: there is no corrlation of intercept and slope. We can discuss this approach in lab, but I wanted to flag that there are many ways to simulate data.

Show code

library(ggplot2)
library(parameters)
library(dplyr)
library(see)

set.seed(123)
n <- 5
b <- seq(1, 1.5, length.out = 5)
x <- seq(2, 2 * n, 2)

di <- do.call(rbind, lapply(1:n, function(i) {
  data.frame(
    x = seq(1, n, by = .2),
    y = 2 * x[i] + b[i] * seq(1, n, by = .2) + rnorm(21),
    grp = as.factor(2 * i)
  )
}))

dp <- di %>%
  group_by(grp) %>%
  mutate(x = rev(15 - (x + 1.5 * as.numeric(grp)))) %>%
  ungroup()

labs <- c("very slow", "slow", "average", "fast", "very fast")
levels(dp$grp) <- rev(labs)

# Within and between group demeaning. 
dp <- cbind(dp, 
           parameters::demean(dp, 
                  c("x", "y"),
                  group = "grp")
           )
# I center the between group variable to recover an interpretable intercept

d <- dp %>%
  dplyr::mutate(x_c = scale(x, center = TRUE, scale = FALSE))%>%
  dplyr::mutate(x_between_c = scale(x_between, center = TRUE, scale = FALSE))

Graph the response at the individual level, ignoring group-level structure: typing faster predicts fewer errors:

Show code

ggplot2::ggplot(data = d,
                aes(x = x,
                    y = y)) +
  geom_point() +
  geom_smooth(method = "lm") +
  xlab("speed") +
  ylab("errors")

Here’s an ordinary linear model

Show code

m00 <- lm(y ~ x_c, data = d)
parameters::model_parameters(m00)

Parameter   | Coefficient |   SE |         95% CI | t(103) |      p
-------------------------------------------------------------------
(Intercept) |       15.82 | 0.44 | [14.95, 16.69] |  36.09 | < .001
x_c         |       -1.92 | 0.18 | [-2.27, -1.56] | -10.69 | < .001

However when we assess the relationship within a group – one model for each group – we see the opposite response

Show code

ggplot2::ggplot(data = d,
                aes(x = x,
                    y = y,
                    colour = grp)) +
  geom_point() +
  geom_smooth(method = "lm")

We can combine the graphs:

Show code

ggplot2::ggplot(data = d,
                aes(x = x,
                    y = y)) +
  geom_smooth(method = "lm") +
  geom_smooth(aes(x = x,
                  y = y,
                  colour = grp),
              method = "lm")  +
  geom_point(aes(x = x,
                 y = y,
                 colour = grp)) + 
  xlab("speed")  + 
  ylab("accuracy") + 
  scale_colour_viridis_d(option="magma") + theme_classic()

We can use the “demeaned indicators” to produce a model that distinguishes within/beween

head(d)

    x        y       grp x_between y_between x_within
1 8.5 4.439524 very fast      10.5  7.084031     -2.0
2 8.7 4.969823 very fast      10.5  7.084031     -1.8
3 8.9 6.958708 very fast      10.5  7.084031     -1.6
4 9.1 5.670508 very fast      10.5  7.084031     -1.4
5 9.3 5.929288 very fast      10.5  7.084031     -1.2
6 9.5 7.715065 very fast      10.5  7.084031     -1.0
    y_within x_c x_between_c
1 -2.6445067 1.0           3
2 -2.1142086 1.2           3
3 -0.1253227 1.4           3
4 -1.4135227 1.6           3
5 -1.1547433 1.8           3
6  0.6310339 2.0           3

m0 <- lme4::lmer(y ~ x_between_c + x_within + (1|grp), data = d)

parameters::model_parameters(m0)

# Fixed Effects

Parameter   | Coefficient |   SE |         95% CI | t(100) |      p
-------------------------------------------------------------------
(Intercept) |       15.82 | 0.09 | [15.64, 15.99] | 175.85 | < .001
x_between_c |       -2.93 | 0.04 | [-3.02, -2.85] | -69.22 | < .001
x_within    |        1.20 | 0.07 | [ 1.06,  1.35] |  16.22 | < .001

# Random Effects

Parameter           | Coefficient
---------------------------------
SD (Intercept: grp) |        0.00
SD (Residual)       |        0.96

#equatiomatic::extract_eq(m1)

Show code

plot(
  ggeffects::ggpredict(m0,
                       terms = c("grp"),
                       type = "random"), 
  add.data = TRUE
)

We can estimate whether the within-group speedslopes vary by groups:

head(d)

    x        y       grp x_between y_between x_within
1 8.5 4.439524 very fast      10.5  7.084031     -2.0
2 8.7 4.969823 very fast      10.5  7.084031     -1.8
3 8.9 6.958708 very fast      10.5  7.084031     -1.6
4 9.1 5.670508 very fast      10.5  7.084031     -1.4
5 9.3 5.929288 very fast      10.5  7.084031     -1.2
6 9.5 7.715065 very fast      10.5  7.084031     -1.0
    y_within x_c x_between_c
1 -2.6445067 1.0           3
2 -2.1142086 1.2           3
3 -0.1253227 1.4           3
4 -1.4135227 1.6           3
5 -1.1547433 1.8           3
6  0.6310339 2.0           3

m1 <- brms::brm(y ~ x_between_c + x_within +  (1 + x_within| grp), 
                data = d,
                file = here::here("models","ml_within"))

Show code

options(width = 120)
# Shows the prior
parameters::model_parameters(m1, prior = TRUE, effects = "all")

# Fixed effects

Parameter   | Median |         89% CI |   pd | % in ROPE |  Rhat |    ESS |                            Prior
------------------------------------------------------------------------------------------------------------
(Intercept) |  15.81 | [15.59, 16.03] | 100% |        0% | 1.003 | 651.00 | Student_t (df=3) (15.70 +- 8.80)
x_between_c |  -2.94 | [-3.14, -2.83] | 100% |        0% | 1.048 |  79.00 |                          Uniform
x_within    |   1.22 | [ 0.99,  1.46] | 100% |     0.40% | 1.009 | 593.00 |                          Uniform

# Fixed effects sigma

Parameter | Median |       89% CI |   pd | % in ROPE |  Rhat |    ESS |                        Prior
----------------------------------------------------------------------------------------------------
sigma     |   0.92 | [0.82, 1.02] | 100% |        0% | 1.006 | 870.00 | Student_t (df=3) (0 +- 8.80)

# Random effects SD/Cor: grp

Parameter            | Median |        89% CI |     pd | % in ROPE |  Rhat |    ESS | Prior
-------------------------------------------------------------------------------------------
(Intercept)          |   0.18 | [ 0.00, 0.68] |   100% |    88.25% | 1.092 |  35.00 |      
x_within             |   0.21 | [ 0.00, 0.46] |   100% |    96.17% | 1.006 | 680.00 |      
Intercept ~ x_within |  -0.21 | [-1.00, 0.72] | 59.38% |    60.95% | 1.004 | 621.00 |

color_scheme_set("red")
p_m1 <- bayesplot::mcmc_intervals(m1, 
              regex_pars  = c("b_x_between","b_x_within"))
p_m1

Note that we are adjusting for whether within-group speed differs by group, and if so weather these differences are correlated with the group-level intercepts. My head is exploding here…

In any case, graphing these parameters shows nothing is going on, which is to be expected because the simulation did not engineer anything to be going on:

Show code

plot(
  ggeffects::ggpredict(m1, effects= c("b_x_between","b_x_within"),type = "random")[[1]], 
  add.data = TRUE
)

Show code

plot(
  ggeffects::ggpredict(m1, effects = c("b_x_between","b_x_within"),type = "random")[[2]], 
  add.data = TRUE
)

color_scheme_set("red")
p_m1 <- bayesplot::mcmc_intervals(m1, 
              regex_pars  = c("r_"))
p_m1

What if we were to ignored between and within and between indicators and run an ordinary multi-level model? We would then write:


# ordinary multi-level model wih group-varying intercept and slope
m2 <- brms::brm(y ~ x_c +  (1 + x_c| grp), 
                data = d,
                file = here::here("models","ml_no_within"))

Which produces:

Show code

options(width = 120)
# Shows the prior
summary(m2, prior = TRUE)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x_c + (1 + x_c | grp) 
   Data: d (Number of observations: 105) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Priors: 
Intercept ~ student_t(3, 15.7, 8)
L ~ lkj_corr_cholesky(1)
sd ~ student_t(3, 0, 8)
sigma ~ student_t(3, 0, 8)

Group-Level Effects: 
~grp (Number of levels: 5) 
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)         10.53      3.51     5.87    19.21 1.00     1500     1796
sd(x_c)                0.34      0.25     0.05     0.91 1.00      900      959
cor(Intercept,x_c)     0.56      0.37    -0.37     0.99 1.00     2075     2251

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    15.91      4.05     7.95    24.08 1.00     1212     1704
x_c           1.19      0.20     0.82     1.54 1.00     1714     1519

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.01      0.07     0.87     1.16 1.00     3706     2300

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Note that the multi-level model pooled information efficiently, and we have reversed the speed\(\rightarrow\)accuracy illusion that the ordinary regression model gave us above.

Plotting the results tells the story:

color_scheme_set("red")
p_m2 <- bayesplot::mcmc_intervals(m2, 
              regex_pars  = c("r_","b_"))
p_m2

Although the “average” response falls around 15 errors at the centered speed, there’s huge group level variability. In the very fast intercept there are many few errors at this level than there are at the very slow intercept, and we find a gradient in between.

We can generate expectation plots for the groups and this tells the story:

Show code

library(brms)
out_f <- conditional_effects(
  m2,
  "x_c",
    point_args = list(alpha = 0.1)
  )[[1]]
out_r <- conditional_effects(
  m2,
  "x_c", 
  conditions = distinct(d, grp),
  re_formula = NULL,
   points = TRUE,
  point_args = list(alpha = 0.1)
  )[[1]]
out_r %>% 
  ggplot(aes(x_c, estimate__)) +
  geom_ribbon(
    data = out_f,
    aes(ymin = lower__, ymax = upper__),
    alpha = .1
  ) +
  geom_line(data = out_f, size = 2) +
  geom_line(aes(group = grp), alpha =.5) + 
  xlab("speed") + 
  ylab("error") +
  ggtitle("Typing speed predicts fewer errors but faster typers make fewer errors")

Another approach to graphing;

Show code

# uncertainty of the posterior distribution
pp <- posterior_predict(m2)

# graph
plot(
  ggeffects::ggpredict(m2, , 
                       terms = c("x_c","grp"),
                       type = "random"), 
  add.data = pp
)

Show code


# this also
sl <- ggeffects::ggpredict(
  m2,
  ,
  terms = c("x_c", "grp"),
  type = "random",
  add.data = pp
)

plot(sl, add.data = TRUE, ci = TRUE)

Comment on group mean centering: use with caution.

I’m not sure I’m sold yet on within and between group “demeaning” because it would appear to lose the benefits of pooling and shrinkage that we get from multi-level models (the benefits of “regularisation”). I imagine there will be increasing discussion of this approach in the years ahead. For this reason I thought you should know about it.

Meantime, some quotations about group-mean centering:

Unfortunately, group mean centering changes the meaning of the entire regression model in a complicated way. If we use grand mean centering, we get different regression slopes for variables that are involved in an interaction, and different variance components, but we have an equivalent model, with identical deviance and residual errors. Another way to describe this situation is to state that we have specified a different parameterization for our model, meaning that we have essentially the same model, but transformed in a way that makes it easier to interpret. Using straightforward algebra, we can transform the grand mean centered estimates back to the values we would have found by analyzing the raw scores. Group mean centering, on the contrary, is not a simple reparameterization of a model, but a completely different model. We will find a different deviance, and transforming the estimated parameters back to the corresponding raw score estimates is not possible. The reason is that we have subtracted not one single value, but a collection of different values from the raw scores.

Excerpt From: Joop J. Hox, Mirjam Moerbeek & Rens van de Schoot. “Multilevel Analysis.” Apple Books.

“Group mean centering of an explanatory variable implies less effective modeling than using raw scores, simply because all information concerning differences between groups is removed from that variable. It would seem reasonable to restore this information by adding the aggregated group mean again as an additional group-level explanatory variable. But this adds extra information about the group structure, which is not present in the raw scores, and therefore we will obtain a model that fits better than the raw score model. If the effect of group mean centering is analyzed in detail, it appears that group mean centering is an implicit, complicated way to change the meaning of individual- and group-level effects, including the interpretation of cross-level interactions. My recommendation is that novice users of multilevel models should apply a great deal of caution in using group mean centering. If the theory behind the modeling strongly suggests ‘frog pond’ or similar effects (see Enders & Tofighi, 2007; Hofmann & Gavin, 1998), group centering is an attractive approach, but the users should be keenly aware of the issues they are facing in interpretation.”

Excerpt From: Joop J. Hox, Mirjam Moerbeek & Rens van de Schoot. “Multilevel Analysis.” Apple Books.

Kurz, A. Solomon. 2020. Zenodo. https://doi.org/10.5281/zenodo.4080013.

McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in r and Stan. CRC press.

Sorensen, Tanner, and Shravan Vasishth. 2015. “Bayesian Linear Mixed Models Using Stan: A Tutorial for Psychologists, Linguists, and Cognitive Scientists.” arXiv Preprint arXiv:1506.06201.

Vasishth, Shravan, Bruno Nicenboim, Mary E Beckman, Fangfang Li, and Eun Jong Kong. 2018. “Bayesian Data Analysis in the Phonetic Sciences: A Tutorial Introduction.” Journal of Phonetics 71: 147–61.

Mixed-effects/Multi-level models (continued), with excursion into metanalysis

Resources

Objectives

Terminological confusions

Meta-analysis

Model

Code for meta-analytic model

Test hypothesis of a greater than .2 effect size:

Forest plot

Group varying slopes

Code for varying intercept/slope model

Just the syntax please

Within and between group estimation

Comment on group mean centering: use with caution.

References

Corrections

Reuse

Citation