Mixed effects models

Mixed effects models are a way of doing repeated measures regression. You use random effects to separate out between-subject variance you're not interested in, so you can focus on, e.g., differences within subjects, between conditions. You can also easily eliminate (subject-invariant) variance in the degree to which items induce an effect, which is particularly useful if you have a load of different stimuli (sentences, faces) that superficially look different but have (you hope, with respect to your experimental hypothesis) the same underlying form.

More concretely, here's linear regression with one predictor:

(1)
\begin{align} y_i = b_0 + b_1 x_{i1} + \epsilon_i \end{align}

where the i's are observations and $\epsilon_i \sim N(0,\sigma_\epsilon^2)$.

Here's the extension with a random intercept, $c_{0j} \sim N(0,\sigma_c^2)$, per subject, j. The i's are still observations.

(2)
\begin{align} y_{ij} = b_0 + b_1 x_{ij1} + c_{0j} + \epsilon_{ij} \end{align}

Note how the residual term is at the level of individual observations whereas the random intercept is at the level of people (hence the term "multilevel model" that's often used). If you've got a heap of between-subject variation in the intercept, it disappears into $c_{0j}$, reducing $\epsilon_{ij}$ and possibly improving fit. The fixed effects are the b's; they're fixed in the sense that they're group means.

With an additional random slope, $c_{1j}$, per subject, j:

(3)
\begin{align} y_{ij} = b_0 + (b_1 + c_{1j}) x_{ij1} + c_{0j} + \epsilon_{ij} \end{align}

You'd use this if there's between-subject variation in effect size and you're mainly interested in whether there is an effect at all. Funny things can happen here if some folk have a positive slope and some folk a negative one.
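
These models map directly onto lmer's random-effects syntax (introduced below). A minimal sketch, assuming a hypothetical data frame d with outcome y, predictor x, and grouping factor subject:

require(lme4)

# hypothetical data frame d with columns y, x, and subject
m2 <- lmer(y ~ x + (1|subject), d)  # equation (2): random intercept per subject
m3 <- lmer(y ~ x + (x|subject), d)  # equation (3): random intercept and slope per subject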

One cool thing about mixed effects models is that you can extract estimates of these random effects, analogously to how you obtain residuals in standard linear regression.
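
For instance, lme4's fixef and ranef functions give you the fixed effects and the estimated per-subject deviations, respectively; a quick sketch using the sleepstudy data that appears below:

require(lme4)

fm <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
fixef(fm)  # the b's: group mean intercept and slope
ranef(fm)  # estimated per-subject deviations from those means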

Relationship with ANOVA

See here.

Which function to use

lmer in lme4.

Example from the helpfile (type ?lmer):

require(lme4)

# random intercept and random slope for Days, allowed to correlate:
(fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
# random intercept and random slope for Days, forced to be uncorrelated:
(fm2 <- lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), sleepstudy))
# compare the two fits:
anova(fm1, fm2)

Model fm1 regresses Reaction on Days, with a random intercept for Subject and a random slope for Days grouped by Subject. So you get the group mean slope for Days, the group mean intercept, plus subject deviations from these means. Model fm2 fits the same effects but constrains the random intercept and slope to be uncorrelated.

Note: (1|A/B) expands to (1|A) + (1|A:B), i.e., a random intercept for A plus a random intercept for B nested within A; see the sketch below.
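
A sketch of the expansion, assuming a hypothetical data frame d with pupils nested within schools; these two calls specify the same model:

# hypothetical data frame d with columns score, school, and pupil
m.nested   <- lmer(score ~ 1 + (1|school/pupil), d)
m.expanded <- lmer(score ~ 1 + (1|school) + (1|school:pupil), d)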

See over here for further hints on specifying random effects.

Models with multinomial outcome variables may be fitted using the drm package.

Assessing fit

  • Compare nested models using the log-likelihood ratio test (confusingly, via the anova function); see the sketch after this list. This feels new at first, but recall that in linear regression the overall test you usually do compares the null model (intercept only) with your particular model of interest; SPSS just hides this from you. Sometimes you see $\Delta F$'s reported in SPSS-land; the associated p-values test the effect of individual predictors.
  • Look at the slopes. Are they significantly different from zero?
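
A sketch of such a comparison using the sleepstudy example. The models are fitted with ML rather than REML, which is what you want when comparing models that differ in their fixed effects:

require(lme4)

# (older lme4 versions used method="ML" instead of REML=FALSE)
full    <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy, REML=FALSE)
reduced <- lmer(Reaction ~ 1 + (Days|Subject), sleepstudy, REML=FALSE)
anova(reduced, full)  # log-likelihood ratio test for the Days effect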

Things that will annoy you

P-values

lmer doesn't give you them for the t-tests in gaussian models, as the statisticians aren't sure how many degrees of freedom to use and Doug Bates refuses to copy what SAS does. One option is the pvals.fnc function from the languageR package, which uses Markov chain Monte Carlo (MCMC) sampling.
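
A sketch, assuming a version of lme4 that languageR's pvals.fnc still supports (it relies on lme4's mcmcsamp):

require(lme4)
require(languageR)

fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
pvals.fnc(fm1)  # MCMC-sampled p-values for the fixed effects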

Or you can use the log-likelihood ratio test: compare the models with and without the predictor of interest (as sketched under "Assessing fit" above).

Categorical predictors with more than 2 levels

By default they're given a dummy (treatment) coding. Use relevel to change the base category. You can also supply your own coding; see a good text on ANOVA for the properties a coding scheme should ideally have.
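
A sketch, assuming a hypothetical data frame d with a three-level factor condition (levels "a", "b", "c"):

d$condition <- factor(d$condition)            # dummy coding by default; "a" is the base
d$condition <- relevel(d$condition, ref="b")  # make "b" the base category instead
contrasts(d$condition) <- contr.sum(3)        # or supply your own coding (sum coding here)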

Whether to include a random effect

  • The supposed structure of the data can be used to decide whether to include a random effect. (By analogy, you'd use a repeated measures ANOVA for a repeated measures design even if the between-subject variation turned out to be minimal.)
  • If you want a formal test, use the log-likelihood ratio test (LRT), not MCMC-based methods, since none of the MCMC samples will have exactly zero variance.
  • A significant p from the LRT means you should definitely leave the random effect in.
  • A non-significant p might mean you can leave it out.
  • Though you can't really trust the LRT statistic to have a $\chi^2$ distribution here, since the null value (zero variance) lies on the boundary of the parameter space; you'd be better off using a bootstrapping approach.

Bootstrapping approach

An example from Faraway (2006):

require(faraway)
require(lme4)

# the model with the random effect, fitted with ML
# (method="ML" is older lme4 syntax; newer versions use REML=FALSE)
smod    <- lmer(bright ~ 1 + (1|operator), pulp, method="ML")
# the null model without it
nullmod <- lm(bright ~ 1, pulp)

# observed likelihood ratio statistic
lr <- as.numeric(2*(logLik(smod) - logLik(nullmod)))

# parametric bootstrap: simulate response data from the null model,
# refit both models, and collect the likelihood ratio statistics
lrstat <- numeric(1000)
for(i in 1:1000){
  y <- unlist(simulate(nullmod))
  bnull <- lm(y ~ 1)
  balt <- lmer(y ~ 1 + (1|operator), pulp, method="ML")
  lrstat[i] <- as.numeric(2*(logLik(balt) - logLik(bnull)))
}

# p-value for the test: proportion of simulated statistics
# at least as large as the observed one
mean(lrstat > lr)

If you're using anything other than a gaussian GLM (e.g., binomial), then you're snookered: simulate won't work for you :(

Other resources
