Exploratory Factor Analysis

Function to use?

factanal performs maximum likelihood (not full information) exploratory factor analysis and can return several kinds of factor scores (useful, e.g., for g-mongers ;-)). The GPArotation package provides additional rotations that are often seen in the literature.
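As a minimal sketch of the factanal call described above, the following simulates six variables driven by two factors and fits the model with the two rotations that ship with base R. The simulated data, the variable names (a1 to b3), and the loading values are assumptions made purely for illustration.

```r
# Simulated data: a1-a3 load on factor f1, b1-b3 on factor f2 (hypothetical values).
set.seed(1)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- data.frame(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  a2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  a3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.6),
  b2 = 0.7 * f2 + rnorm(n, sd = 0.7),
  b3 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Maximum likelihood EFA with an orthogonal (varimax) rotation and regression scores.
fit <- factanal(x, factors = 2, rotation = "varimax", scores = "regression")
print(fit$loadings, cutoff = 0.3)  # suppress small loadings for readability
head(fit$scores)                   # one row of factor scores per observation

# The same model with an oblique (promax) rotation and Bartlett scores.
fit_oblique <- factanal(x, factors = 2, rotation = "promax", scores = "Bartlett")
print(fit_oblique$loadings, cutoff = 0.3)
```

The rotation argument takes the name of a rotation function, which is how rotations other than varimax and promax are selected.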

Non-continuous variables - beware

Don't use factanal on binary or ordinal data, especially if they come from a test that was designed using an IRT-type approach. See, e.g., van der Ven and Ellis (2000, p. 47):

Ordinary factor analysis and other classical test theory methods, if applied to binary variables, may entail too many factors, some of which are related to the item difficulties (Hattie, 1985; Green, Lissitz & Mulaik, 1977; McDonald & Ahlawat, 1974; McDonald, 1981). This problem has a history dating back to Spearman (1927) and Hertzman (1936).

You can compute heterogeneous correlations (Pearson, polyserial, or polychoric, depending on the variable types) with the polycor::hetcor() function and use them as input.
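A rough sketch of that workflow, assuming the polycor package is installed: five binary items are simulated from a single latent trait with different difficulties, and the resulting correlation matrix is passed to factanal through its covmat argument. The item names, thresholds, and loading of 0.8 are hypothetical values chosen for the illustration.

```r
# Simulate five binary items that all measure one latent trait (hypothetical setup).
set.seed(42)
n <- 1000
theta <- rnorm(n)                          # single latent trait
thresholds <- c(-1.5, -0.75, 0, 0.75, 1.5) # item "difficulties"
items <- as.data.frame(lapply(thresholds, function(b) {
  y <- 0.8 * theta + rnorm(n, sd = 0.6)    # continuous response underlying the item
  factor(as.integer(y > b), levels = 0:1, ordered = TRUE)
}))
names(items) <- paste0("item", seq_along(thresholds))

# Pearson correlations of the 0/1 codes, shown only for comparison.
r_pearson <- cor(sapply(items, function(v) as.integer(v) - 1L))
print(round(r_pearson, 2))

# Heterogeneous correlations: ordered factors are treated as ordinal,
# so every pair here gets a polychoric correlation.
if (requireNamespace("polycor", quietly = TRUE)) {
  het <- polycor::hetcor(items)
  print(round(het$correlations, 2))
  # factanal accepts a correlation matrix via covmat; n.obs is needed for the fit test.
  fit <- factanal(covmat = het$correlations, factors = 1, n.obs = n)
  print(fit$loadings)
}
```

Because only a correlation matrix is supplied, factanal cannot return factor scores in this setup.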

Simulation stuff here

How many factors?

You can use the paran package to do Horn's parallel analysis, and variations thereof, to help you decide. The paran function gives you a plot, to which you might want to add a legend. Complete example below (using PCA).

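library(paran)  # Horn's parallel analysis
# Compare the observed eigenvalues with the 95th centile from 5000 random data sets
paran(USArrests, iterations = 5000, centile = 95, graph = TRUE)
# paran() draws no legend, so add one at x = 3, y = 2.5 in plot coordinates
legend(3, 2.5, c("Unadjusted", "Adjusted", "Random"), lty = 1, col = c("red", "black", "blue"))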
[Figure: efa.png, parallel analysis plot for USArrests]

BEWARE. Running the parallel analysis with factor analysis rather than PCA can take a very long time (I tried it: 6 hours vs. a few seconds). You can instead run the EFA with the number of factors suggested by the PCA parallel analysis.
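To make the idea concrete, here is a base-R sketch of a PCA-based parallel analysis followed by an EFA with the retained number of factors. It is a simplified illustration, not paran's own code: the retention rule (keep the leading components whose eigenvalue exceeds the 95th centile of eigenvalues from random normal data) and the iteration count of 500 are assumptions chosen to keep the example short.

```r
set.seed(123)
x <- USArrests
n <- nrow(x)
p <- ncol(x)
iterations <- 500  # hypothetical; fewer than the 5000 used with paran above

# Observed eigenvalues of the correlation matrix (the PCA eigenvalues).
observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues for uncorrelated random normal data with the same n and p.
# The result is a p x iterations matrix, one column per simulated data set.
random_ev <- replicate(iterations, {
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(z), symmetric = TRUE, only.values = TRUE)$values
})

# 95th centile of the random eigenvalues at each position.
threshold <- apply(random_ev, 1, quantile, probs = 0.95)

# Retain leading components until the first one that fails to beat its threshold.
retained <- sum(cumprod(observed > threshold))
print(data.frame(observed = observed, threshold = threshold))
print(retained)

# Fit the EFA with that many factors, if any were retained.
if (retained >= 1) {
  fit <- factanal(x, factors = retained, scores = "regression")
  print(fit)
}
```

This runs in seconds because each iteration needs only an eigendecomposition of a small correlation matrix rather than a full maximum likelihood factor analysis fit.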

See the references for some guides on how to decide the number of factors to retain.

References

  1. Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53, 605-634.
  2. Borsboom, D., Mellenbergh, G. J. & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110, 203-219.
  3. Ehrenberg, A. S. C. (1962). Some questions about factor analysis. The Statistician, 12, 191-208.
  4. Ferrando, P. J. & Lorenzo-Seva, U. (2000). Unrestricted versus restricted factor analysis of multidimensional test items: Some aspects of the problem and some suggestions. Psicológica, 21, 301-323.
  5. Glorfeld, L. W. (1995). An improvement on Horn's parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55, 377-393.
  6. Horn, J. L. (1965). A rationale and a test for the number of factors in factor analysis. Psychometrika, 30, 179-185.
  7. Loehlin, J. C. (1990). Component analysis versus common factor analysis: A case of disputed authorship. Multivariate Behavioral Research, 25, 29-31.