Exploring your data

The basics

You can get quite far with hist, mean, sd, summary, and boxplot.
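
As a quick sketch of those functions in action, the following uses a made-up vector of scores. The name scores and the simulated values are assumptions for illustration, not data from this page.

```r
set.seed(1)                              # make the simulated data reproducible
scores <- rnorm(50, mean = 10, sd = 3)   # hypothetical data

mean(scores)      # arithmetic mean
sd(scores)        # sample standard deviation
summary(scores)   # min, quartiles, median, mean, max

# the plots are only drawn when R is run interactively
if (interactive()) {
  hist(scores)      # shape of the distribution
  boxplot(scores)   # median, quartiles, and any outliers
}
```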

Standard error

Here's a function for the standard error of the mean, found somewhere online [1]:

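st.err <- function(x)
  {
    # standard error of the mean for a single vector, ignoring NAs
    se <- function(v)
      {
        v <- na.omit(v)
        sd(v)/sqrt(length(v))
      }
    if (is.matrix(x))
      apply(x, 2, se)       # one value per column
    else if (is.data.frame(x))
      sapply(x, se)         # one value per column (columns must be numeric)
    else if (is.list(x))
      stop("Hmm, this won't work with a list...")
    else if (is.vector(x))
      se(x)
    else
      stop("x must be a vector, matrix or data frame")
  }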
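
To see what the function boils down to, here is the same calculation done by hand for a single vector: the standard error of the mean is the standard deviation divided by the square root of the number of non-missing values. The vector x is made up for illustration.

```r
x <- c(4, 8, 6, NA, 2)        # hypothetical data with one missing value

n  <- length(na.omit(x))      # number of non-missing values
s  <- sd(x, na.rm = TRUE)     # sample standard deviation, ignoring the NA
se <- s / sqrt(n)             # standard error of the mean
se
```

For a plain vector like this, st.err(x) should give the same value.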

Formatting things

The rround function rounds x to d decimal places and returns the result as a string that keeps trailing zeros (round alone would print 2.5 rather than 2.50):

rround = function(x,d) format(round(x, d), nsmall = d)
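
A short sketch of why the format step matters: round on its own drops trailing zeros when the number is printed, whereas rround returns a character string with exactly d decimals. The definition is repeated so the snippet runs on its own.

```r
rround = function(x,d) format(round(x, d), nsmall = d)

round(2.5, 2)        # prints 2.5
rround(2.5, 2)       # "2.50"
rround(3.14159, 2)   # "3.14"
rround(7, 1)         # "7.0"

# the result is text, so do any arithmetic before calling rround
is.character(rround(7, 1))
```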

An example use of this is a function which takes a vector, x, and gives you back a string containing its mean followed by its sd in parentheses, e.g. "5.07 (2.62)":

meansd = function(x) paste(rround(mean(x),2)," (", rround(sd(x),2),")", sep="")

Note also the helpful paste function there, which can come in handy for combining factors and other useful things.
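
As a minimal sketch of combining factors with paste, the following builds one grouping variable out of two. The factors sex and hand here are small made-up examples.

```r
sex  <- factor(c("M", "M", "F", "F"))   # hypothetical factors
hand <- factor(c("L", "R", "L", "R"))

# paste works element by element and returns character strings
group <- paste(sex, hand, sep = ".")
group                  # "M.L" "M.R" "F.L" "F.R"

# wrap the result in factor() if a factor is needed later
group <- factor(group)
levels(group)          # "F.L" "F.R" "M.L" "M.R"
```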

Tables

Suppose you have data like this:

ID  score1  score2  sex  hand
1   5.19    4.09    M    L
2   1.66    14.91   M    R
3   8.98    14.23   F    L
4   4.09    5.35    F    R
5   2.67    13.38   M    L
6   1.62    5.52    M    R

And you want a table of means and SDs (in parentheses) for each combination of sex and hand, like:

sex     F            F            M             M
hand    L            R            L             R
score1  5.07 (2.62)  4.91 (3.12)  4.68 (3.30)   4.90 (2.82)
score2  9.99 (2.57)  9.72 (2.70)  10.02 (3.58)  10.55 (2.90)

Handy way to do it (the beef here is summarize, llist, merge, and t):

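library(Hmisc)  # provides summarize and llist

# helpers from the Formatting things section
rround = function(x,d) format(round(x, d), nsmall = d)
meansd = function(x) paste(rround(mean(x),2)," (", rround(sd(x),2),")", sep="")

# simulated data: 100 people, two scores, sex and handedness
d = data.frame(ID = 1:100
              , score1 = rnorm(100,5,3)
              , score2 = rnorm(100,10,3)
              , sex = rep(c("M","M","F","F"),25)
              , hand = rep(c("L","R"),50))

# one "mean (sd)" string per sex/hand combination, for each score
t1 = with(d, summarize(score1, llist(sex,hand), meansd))
t2 = with(d, summarize(score2, llist(sex,hand), meansd))
t3 = merge(t1,t2)  # join on the shared sex and hand columns
t(t3)              # flip so that each sex/hand combination is a column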

The t function is matrix transpose. Have a nosy at t3 versus t(t3) to make sense of what it's doing.
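
A tiny sketch of what t does to a data frame, using a made-up stand-in for t3. The data frame small is hypothetical, and its values are copied from the example table rather than computed.

```r
# hypothetical stand-in for t3: one row per sex/hand combination
small <- data.frame(sex    = c("F", "F"),
                    hand   = c("L", "R"),
                    score1 = c("5.07 (2.62)", "4.91 (3.12)"))

dim(small)            # 2 rows, 3 columns
flipped <- t(small)
dim(flipped)          # 3 rows, 2 columns
rownames(flipped)     # the old column names: sex, hand, score1
flipped               # a character matrix, no longer a data frame
```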

Graphs

Means with error bars

Simple example of a barchart with error bars:

[Figure: barcharteg.png]

bardata <- 1:3 # create some data
error <- c(.2,.4,.7) # and error
bar <- barplot(bardata, ylim = c(0,4), col="lightblue", names.arg=c("A","B","C"))
arrows(bar, bardata + error, bar, bardata - error,
       length = 0.10, # width of the arrowhead
       angle = 90,
       code = 3
       )

(Remember to describe what kind of bars they are.)
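
One way to get from raw data to the error values, and to make the kind of bar explicit on the plot, is sketched below. It assumes the bars show ±1 standard error of the mean. The data frame dat, its columns, and the simulated values are made up for illustration.

```r
set.seed(42)
# hypothetical raw data: 20 scores in each of three groups
dat <- data.frame(group = rep(c("A", "B", "C"), each = 20),
                  score = rnorm(60, mean = rep(c(1, 2, 3), each = 20)))

# per-group mean and standard error of the mean
means <- tapply(dat$score, dat$group, mean)
ses   <- tapply(dat$score, dat$group,
                function(v) sd(v) / sqrt(length(v)))

# the plot is only drawn when R is run interactively
if (interactive()) {
  bar <- barplot(means, ylim = c(0, max(means + ses) * 1.2),
                 col = "lightblue",
                 ylab = "Mean score (bars: +/- 1 SE)")
  arrows(bar, means + ses, bar, means - ses,
         length = 0.10, angle = 90, code = 3)
}
```

Putting the bar type in the axis label (or the figure caption) saves the reader from guessing whether they are looking at SDs, SEs, or confidence intervals.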

Another graph

[Figure: errorbarline.png]

library(gplots)  # provides plotmeans

n = 60
g1 = rnorm(n/2, 100, 40)  # group A: mean 100, sd 40
g2 = rnorm(n/2, 60, 40)   # group B: mean 60, sd 40

ys = c(g1,g2)
group = c(rep("A",n/2),rep("B",n/2))
plotmeans(ys ~ group, xlab="", ylim=c(0,150), main = "", connect=TRUE)

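By default plotmeans draws 95% confidence intervals around each mean. Check the help (?plotmeans) for more options, such as the p argument for a different confidence level.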
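
For reference, a 95% confidence interval for a single mean can be computed by hand as sketched below. This assumes a t-based interval (mean ± critical t value × standard error), which is what t.test reports. Whether plotmeans uses exactly this formula is best confirmed in its help page. The data are simulated in the same way as g1 above.

```r
set.seed(123)
g1 <- rnorm(30, 100, 40)          # simulated group, as in the example above

n     <- length(g1)
se    <- sd(g1) / sqrt(n)         # standard error of the mean
tcrit <- qt(0.975, df = n - 1)    # two-sided 95% critical value
ci    <- mean(g1) + c(-1, 1) * tcrit * se
ci                                # lower and upper limits

t.test(g1)$conf.int               # the same interval from base R
```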

Error bars for within-subjects manipulations

See Dan Wright's page on the subject.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License