Generalized Linear Models (GLMs)

[FIXME: needs pointers to documentation + additional hints from here and there]

# Assessing fit

Useful discussion of hypothesis testing over here on the R-list.

## Proportion of deviance explained (logistic regression)

Takes a binomial glm fit object.

`d2 <- function(model) { round(1-(model$deviance/model$null.deviance),4) }`

($D^2$, analogue of $R^2$. I found it in Rossiter and Loza (2008).)

[FIXME: add the other pseudo-$R^2$ measures.]

## Classification accuracy of binomial models

If you run a logistic regression in SPSS, one of the outputs is a "Classification Table" showing how accurately (binary) outcomes are predicted by the model. **This is not particularly useful statistically**, but for the record, here's how to do something similar in R. The code shows how to muck about with object internals (which you can look at using `str(object)`). It will also work for lmer logistic regressions.

Function definition:

```
binomial.accuracy <- function(model,criterion=0.5) {
if (class(model)[1] == "glm") {
if (model$family$family != "binomial") {
stop ("argument must be a binomial glm")
}
# here we have a binomial glm
depvar <- as.character(names(model$model)[1])
df <- as.data.frame(model$data)
} else {
if (class(model)[1] != "glmer") {
stop ("model must be glm or lmer")
}
if (model@family$family != "binomial") {
stop ("argument must be a binomial lmer")
}
# but here we have a binomial lmer
depvar <- as.character(formula(attr(model@frame, "terms")))[2]
df <- as.data.frame(model@frame)
# in the lmer case, the fitted() function gives beta, so convert probability
# to beta
criterion=log(criterion/(1-criterion))
}
# everything else is the same, whatever type of model
# score '1' if value is greater than criterion value, zero otherwise
predicted <- as.numeric(fitted(model) > criterion)
# things are in sequence, so it's OK to remove the NA's from the depvar
observed <- (df[,depvar])[!(is.na(df[,depvar]))]
# the observed score may be a factor, so convert to zeroes and ones
observed <- as.numeric(as.factor(observed))-1
# then score 1 for each time predictions match
matches <- as.numeric(predicted == observed)
# and report matching rate as percentage
round((mean(matches))*100,2)
}
```

Example usage:

```
# data frame with likely relation between predictor and (binary) outcome
testme <- data.frame(subject = gl(10,5,100), predictor=c(rnorm(50,25,15),rnorm(50,35,15)),outcome=c(rbinom(50,1,.2),rbinom(50,1,.8)))
# for glm
glm.bin <- glm(outcome~predictor,data=testme,family="binomial")
binomial.accuracy(glm.bin)
# for lmer
lmer.bin <- lmer(outcome~predictor+(1|subject),data=testme,family="binomial")
binomial.accuracy(lmer.bin)
```

# Useful resources

- Analyzing land cover change with logistic regression in R by Rossiter and Loza (PDF)

page revision: 6, last edited: 14 Sep 2009 08:09