ANOVA vs. mixed effects models

# The basics

This is a lightly edited version of a mailing list posting. Nobody corrected the basic logic… (It's worth also following the thread of the discussion on the list.)

Often it's argued that ANOVA is just regression; clearly this is not true for a repeated measures ANOVA, unless "regression" is interpreted broadly. I think Andrew Gelman argues this somewhere [probably here]. I don't see how to get aov to give me a (regression) formula, and lm doesn't fit models with an Error() term, but if it could, logically I would expect the formula to closely resemble the sort of thing you get with a mixed effects model.

The closest analogy I can find is that doing this…

> aov1 = aov(yield ~  N+P+K + Error(block), npk)
> summary(aov1)

Error: block
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  5    343      69

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
N          1  189.3   189.3   11.82 0.0037 **
P          1    8.4     8.4    0.52 0.4800
K          1   95.2    95.2    5.95 0.0277 *
Residuals 15  240.2    16.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


[I believe the F's and p's here come from a sequential comparison of nested models (i.e., N versus intercept only; N+P versus N; and N+P+K versus N+P).]
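One way to check this reading is sketched below. It treats block as a fixed factor so that lm can fit the model; that is an assumption made for illustration, not a claim about what aov does internally. The object names (fit_full, fit_block, fit_N) are made up for the example. Because the main effects in npk are balanced within blocks, the sequential table from lm should agree with the Within stratum above.

```r
# Sketch: block entered as a fixed factor (an illustrative assumption),
# since lm() cannot take an Error() term.
fit_full <- lm(yield ~ block + N + P + K, data = npk)

# anova() on a single lm gives sequential (Type I) sums of squares:
# each term is tested after the terms listed before it.
seq_tab <- anova(fit_full)
print(seq_tab)

# The F for N by hand: the extra sum of squares from adding N,
# divided by the residual mean square of the FULL model.
fit_block <- lm(yield ~ block, data = npk)
fit_N     <- lm(yield ~ block + N, data = npk)
ss_N   <- deviance(fit_block) - deviance(fit_N)       # deviance() is the RSS for lm
ms_res <- deviance(fit_full) / df.residual(fit_full)  # residual mean square, 15 df
F_N    <- ss_N / ms_res
p_N    <- pf(F_N, df1 = 1, df2 = df.residual(fit_full), lower.tail = FALSE)
```

Note the denominator: the table uses the residual mean square of the largest model for every row, so calling anova(fit_block, fit_N) on just that pair would give a different F for N, because its denominator would still contain the P and K variation.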

… is a bit like doing this with lmer…

> require(lme4)
> lmer0 = lmer(yield ~  1 + (1|block), npk)
> lmer1 = lmer(yield ~  N + (1|block), npk)
> lmer2 = lmer(yield ~  N+P + (1|block), npk)
> lmer3 = lmer(yield ~  N+P+K + (1|block), npk)
> anova(lmer0,lmer1)
Data: npk
Models:
lmer0: yield ~ 1 + (1 | block)
lmer1: yield ~ N + (1 | block)
      Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
lmer0  2 157.5 159.8  -76.7
lmer1  3 151.5 155.1  -72.8  7.93      1     0.0049 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> anova(lmer1,lmer2)
Data: npk
Models:
lmer1: yield ~ N + (1 | block)
lmer2: yield ~ N + P + (1 | block)
      Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
lmer1  3 151.5 155.1  -72.8
lmer2  4 153.0 157.8  -72.5  0.47      1       0.49
> anova(lmer2,lmer3)
Data: npk
Models:
lmer2: yield ~ N + P + (1 | block)
lmer3: yield ~ N + P + K + (1 | block)
      Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
lmer2  4 153.0 157.8  -72.5
lmer3  5 149.0 154.9  -69.5  6.02      1      0.014 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Namely, fitting the models and comparing them with likelihood ratio tests. Okay, the likelihood ratio statistic (twice the difference in log-likelihood) is asymptotically $\chi^2$ distributed, with df equal to the difference in the number of parameters, whereas F follows a different distribution with both numerator and denominator df. I always get confused with the df and what bits of the residuals plug in where.
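The first of those comparisons can be rebuilt by hand, which makes it clearer where the df come from. This is a minimal sketch, assuming the models are fitted by maximum likelihood (REML = FALSE), since REML likelihoods are not comparable across different fixed effects; the names m0, m1, lr and df_diff are made up for the example. Recent versions of lme4 refit REML fits with ML inside anova() by default, so the table should agree with the hand calculation.

```r
library(lme4)

# Fit the two nested models by maximum likelihood rather than REML.
m0 <- lmer(yield ~ 1 + (1 | block), data = npk, REML = FALSE)
m1 <- lmer(yield ~ N + (1 | block), data = npk, REML = FALSE)

# Likelihood ratio statistic: twice the gain in log-likelihood.
lr <- as.numeric(2 * (logLik(m1) - logLik(m0)))

# df for the chi-squared reference distribution: the difference in the
# number of estimated parameters (here one extra fixed effect for N).
df_diff <- attr(logLik(m1), "df") - attr(logLik(m0), "df")

# Upper-tail p-value from the asymptotic chi-squared distribution.
p <- pchisq(lr, df = df_diff, lower.tail = FALSE)

# Compare with the table that anova() prints.
tab <- anova(m0, m1)
print(tab)
```

Unlike the F test, nothing here uses a residual mean square or residual df: the only df involved is the parameter count difference, which is why the test is only asymptotically justified and can be optimistic in a data set as small as npk.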

# On logistic, Poisson, etc

page revision: 9, last edited: 03 Apr 2008 12:17