Generalized Linear Models (GLMs)
[FIXME: needs pointers to documentation + additional hints from here and there]
Assessing fit
Useful discussion of hypothesis testing over here on the R-list.
Proportion of deviance explained (logistic regression)
Takes a binomial glm fit object.
d2 <- function(model) { round(1-(model$deviance/model$null.deviance),4) }
($D^2$, analogue of $R^2$. I found it in Rossiter and Loza (2008).)
[FIXME: add the other pseudo-$R^2$ measures.]
Classification accuracy of binomial models
If you run a logistic regression in SPSS, one of the outputs is a "Classification Table" showing how accurately (binary) outcomes are predicted by the model. This is not particularly useful statistically, but for the record, here's how to do something similar in R. The code shows how to muck about with object internals (which you can look at using str(object)). It will also work for lmer logistic regressions.
Function definition:
binomial.accuracy <- function(model,criterion=0.5) {
if (class(model)[1] == "glm") {
if (model$family$family != "binomial") {
stop ("argument must be a binomial glm")
}
# here we have a binomial glm
depvar <- as.character(names(model$model)[1])
df <- as.data.frame(model$data)
} else {
if (class(model)[1] != "glmer") {
stop ("model must be glm or lmer")
}
if (model@family$family != "binomial") {
stop ("argument must be a binomial lmer")
}
# but here we have a binomial lmer
depvar <- as.character(formula(attr(model@frame, "terms")))[2]
df <- as.data.frame(model@frame)
# in the lmer case, the fitted() function gives beta, so convert probability
# to beta
criterion=log(criterion/(1-criterion))
}
# everything else is the same, whatever type of model
# score '1' if value is greater than criterion value, zero otherwise
predicted <- as.numeric(fitted(model) > criterion)
# things are in sequence, so it's OK to remove the NA's from the depvar
observed <- (df[,depvar])[!(is.na(df[,depvar]))]
# the observed score may be a factor, so convert to zeroes and ones
observed <- as.numeric(as.factor(observed))-1
# then score 1 for each time predictions match
matches <- as.numeric(predicted == observed)
# and report matching rate as percentage
round((mean(matches))*100,2)
}
Example usage:
# data frame with likely relation between predictor and (binary) outcome
testme <- data.frame(subject = gl(10,5,100), predictor=c(rnorm(50,25,15),rnorm(50,35,15)),outcome=c(rbinom(50,1,.2),rbinom(50,1,.8)))
# for glm
glm.bin <- glm(outcome~predictor,data=testme,family="binomial")
binomial.accuracy(glm.bin)
# for lmer
lmer.bin <- lmer(outcome~predictor+(1|subject),data=testme,family="binomial")
binomial.accuracy(lmer.bin)
Useful resources
- Analyzing land cover change with logistic regression in R by Rossiter and Loza (PDF)
page revision: 6, last edited: 14 Sep 2009 08:09