Structural Equation Modeling

The SEM package

To do SEM in R, you need to load either OpenMx or John Fox's sem package. The former is extremely powerful and flexible; the latter allows a range of fairly simple SEM designs to be specified with an easy-to-learn syntax.


Judea Pearl has changed how people think about causality over the last decade, placing causal reasoning on a firm mathematical footing. Directed acyclic graphs (DAGs) are key to this change, and SEM uses DAGs.

Because of this, the common stand-off between economics and statistics is undergoing a re-think. That stand-off is exemplified in comments such as this one, from John Fox on SEM (from a useful Appendix):

A cynical view of SEMs is that their popularity in the social sciences reflects the legitimacy that the models appear to lend to causal interpretation of observational data, when in fact such interpretation is no less problematic than for other kinds of regression models applied to observational data.

A key change is the recognition that a correlation is not merely "not causation", but rather reflects an unresolved causal structure.

Hypothesis testing

We can choose between models by comparing how well they fit. Thus, without knowing where the truth is, we can move towards it by moving to better-fitting models.
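One common way to compare two nested models is a chi-square difference (likelihood-ratio) test. The text above does not name a specific procedure, so treat this as one assumed option. The Python sketch below is plain arithmetic on chi-square values that a fitting program reports, not the sem package's API; the function names `chi2_sf` and `chi2_difference_test` and the example numbers are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma
    function. Adequate for the moderate values typical of SEM output; very
    small p-values lose precision because they are computed as 1 - P.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    for n in range(1, 10000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Compare a restricted model with a more general model that nests it.

    Returns (delta_chi2, delta_df, p). A small p suggests the restrictions
    make the fit significantly worse, favoring the more general model.
    """
    delta_df = df_restricted - df_general
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    delta_chi2 = chi2_restricted - chi2_general
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical values: a restricted model (chi2 = 45, df = 22) against a
# more general model (chi2 = 30, df = 20).
delta_chi2, delta_df, p_value = chi2_difference_test(45.0, 22, 30.0, 20)
print(delta_chi2, delta_df, p_value)
```

The test only applies when one model is a restricted version of the other; for non-nested models, an information criterion such as the BIC is the usual fallback.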

Fit statistics

"There is a veritable cottage industry in ad-hoc fit indices and their evaluation," says John Fox.

From R's sem package you get:

  • Model $\chi^2$, with a test of the likelihood of this value given $H_0$. It represents a comparison of the model-implied covariance matrix with the original sample covariance matrix. You want a low $\chi^2$ and a high p. For sufficiently large samples, it's argued that p will always be less than 0.05 even when the model is a good fit.
  • $\chi^2$ (null model). This is the same as above, except the implied matrix is just the diagonal of the sample matrix, with other cells set to zero.
  • Goodness-of-fit index (GFI). See AGFI.
  • Adjusted goodness-of-fit index (AGFI; "it is probably fair to say that the GFI and AGFI are of little practical value," says John Fox's Appendix)
  • RMSEA index, with 90% CIs ("perhaps more attractive than the others," says John Fox.) It compares the model to a saturated population model; $\le 0.05$ is good
  • Bentler-Bonett NFI
  • Tucker-Lewis NNFI
  • Bentler CFI
  • SRMR
  • Bayesian Information Criterion (BIC) ("In contrast with ad-hoc fit indices […] has a sound statistical basis," says Fox)
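
Several of the indices in this list are simple functions of the model and null-model $\chi^2$ values. The Python sketch below uses the conventional textbook formulas, which are an assumption here: the sem package's exact computations may differ in detail (for example, in using $N$ or $N-1$). The function name `fit_indices` and the input numbers are hypothetical. GFI, AGFI and SRMR are omitted because they need the covariance matrices themselves.

```python
import math


def fit_indices(chi2_model, df_model, chi2_null, df_null, n_obs):
    """Textbook fit indices from model and null-model chi-square statistics."""
    if df_model <= 0 or df_null <= 0 or n_obs <= 1:
        raise ValueError("need positive degrees of freedom and n_obs > 1")

    # Excess of chi-square over its expected value (df) under a correct model.
    excess_model = max(chi2_model - df_model, 0.0)
    excess_null = max(chi2_null - df_null, 0.0)

    # RMSEA: misfit per degree of freedom, scaled by sample size.
    rmsea = math.sqrt(excess_model / (df_model * (n_obs - 1)))

    # NFI: proportional reduction in chi-square relative to the null model.
    nfi = (chi2_null - chi2_model) / chi2_null

    # NNFI (Tucker-Lewis): like NFI, but on chi-square/df ratios.
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    nnfi = (ratio_null - ratio_model) / (ratio_null - 1.0)

    # CFI: one minus the ratio of model to null-model excess misfit.
    denom = max(excess_null, excess_model)
    cfi = 1.0 - excess_model / denom if denom > 0 else 1.0

    # BIC in its chi-square-based form; negative values favor the model
    # over the saturated model.
    bic = chi2_model - df_model * math.log(n_obs)

    return {"RMSEA": rmsea, "NFI": nfi, "NNFI": nnfi, "CFI": cfi, "BIC": bic}


# Hypothetical output from a fitted model and its null model.
indices = fit_indices(chi2_model=30.0, df_model=20,
                      chi2_null=500.0, df_null=28, n_obs=201)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```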

Non-Gaussian variables

You can fit models to ordinal variables using polychoric correlations, with bootstrapping to obtain respectable estimates of the standard errors. See Fox (2006, p. 481).
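
The bootstrapping step can be sketched generically: resample cases with replacement, recompute the statistic on each resample, and take the standard deviation of the replicates as the standard error. The Python sketch below swaps in a plain Pearson correlation for the polychoric correlation, which needs a maximum-likelihood fit and is not implemented here. It is not the procedure from Fox (2006); the names `pearson` and `bootstrap_se` and the data are hypothetical.

```python
import math
import random
import statistics


def pearson(pairs):
    """Pearson correlation of (x, y) pairs; None if either variable is constant."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_se(pairs, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error of statistic(pairs), resampling whole cases."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    replicates = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]
        value = statistic(resample)
        if value is not None:  # skip degenerate resamples
            replicates.append(value)
    return statistics.stdev(replicates)


# Hypothetical responses to two 5-point ordinal items, one pair per respondent.
ordinal_pairs = [
    (1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3), (3, 4),
    (4, 3), (4, 4), (4, 5), (5, 4), (5, 5), (3, 3), (2, 2), (4, 4),
    (1, 1), (5, 5), (3, 2), (2, 3),
]

estimate = pearson(ordinal_pairs)
std_error = bootstrap_se(ordinal_pairs, pearson)
print(f"r = {estimate:.3f}, bootstrap SE = {std_error:.3f}")
```

Resampling whole respondents, rather than each variable separately, keeps the pairing between items intact, which is what the correlation depends on.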

Useful literature

  1. John Fox's Appendix to An R and S-PLUS Companion to Applied Regression
  2. Fox (2006). Structural Equation Modeling With the sem Package in R. Structural Equation Modeling, 13:465-486
  3. Special issue of Personality and Individual Differences (PAID) on SEM.
  4. Arne Henningsen, Jeff D. Hamann (2007). systemfit: A Package for Estimating Systems of Simultaneous Equations in R. Journal of Statistical Software, 23(4).
  5. Notes by William Revelle. Some more, with good examples of model selection for CFA type models.
  6. A helpful website by David A. Kenny
  7. Ed Rigdon's SEM FAQ