ANOVA vs. mixed effects models

The basics

This is a lightly edited version of a mailing list posting. Nobody corrected the basic logic… (It's worth also following the thread of the discussion on the list.)

Often it's argued that ANOVA is just regression; clearly this is not true when it's a repeated measures ANOVA, unless "regression" is interpreted broadly. I think Andrew Gelman argues this somewhere [probably here]. I don't see how to get aov to give me a (regression) formula, and lm doesn't fit stuff with an Error() term, but if it could, logically I would expect the formula to resemble closely the sort of thing you get with a mixed effects models.

The closest analogy I can find is that doing this…

> aov1 = aov(yield ~  N+P+K + Error(block), npk)
> summary(aov1)

Error: block
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals  5    343      69

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
N          1  189.3   189.3   11.82 0.0037 **
P          1    8.4     8.4    0.52 0.4800
K          1   95.2    95.2    5.95 0.0277 *
Residuals 15  240.2    16.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[I believe the F's and p's here come from a linear construction of nested models (i.e., N versus intercept only; N+P versus N; plus N+P+K versus N+P).]

… is a bit like doing this with lmer…

> require(lme4)
> lmer0 = lmer(yield ~  1 + (1|block), npk)
> lmer1 = lmer(yield ~  N + (1|block), npk)
> lmer2 = lmer(yield ~  N+P + (1|block), npk)
> lmer3 = lmer(yield ~  N+P+K + (1|block), npk)
> anova(lmer0,lmer1)
Data: npk
Models:
lmer0: yield ~ 1 + (1 | block)
lmer1: yield ~ N + (1 | block)
       Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
lmer0  2 157.5 159.8  -76.7
lmer1  3 151.5 155.1  -72.8  7.93      1     0.0049 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 > anova(lmer1,lmer2)
Data: npk
Models:
lmer1: yield ~ N + (1 | block)
lmer2: yield ~ N + P + (1 | block)
       Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
lmer1  3 151.5 155.1  -72.8
lmer2  4 153.0 157.8  -72.5  0.47      1       0.49
 > anova(lmer2,lmer3)
Data: npk
Models:
lmer2: yield ~ N + P + (1 | block)
lmer3: yield ~ N + P + K + (1 | block)
       Df   AIC   BIC logLik Chisq Chi Df Pr(>Chisq)
lmer2  4 153.0 157.8  -72.5
lmer3  5 149.0 154.9  -69.5  6.02      1      0.014 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Namely, fitting the models, and making comparisons using likelihood ratio tests. Okay, the LLR is asymptotically $\chi^2$ distributed with df the difference in number of parameters whereas F is a different distribution - I always get confused with the df and what bits of the residuals plug in where.

On balanced designs, etc

On logistic, Poisson, etc

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License