Cox regression analysis

(A Note By Alex Weiss On A Problem He Solved)

I recently have been working on proportional hazards (or survival) modeling on the VES data set. Among other things, the study involves modeling time to death as a function of cognitive ability (a latent variable measured by four IQ tests), MMPI-derived Neuroticism, and their interaction. Given the latent variable component to it all, I used Mplus to do the main analysis.

When we got the reviews back, one reviewer asked us to test whether the proportionality assumption had been violated. This is a common request from reviewers in these types of studies. While I could have just done tested for the violation of the assumption in SPSS or SAS, it's not a very straightforward process. You have to create variables, etc. Given that R has a means of testing proportionality in the "survival" library, I decided to try it in R.

The first thing I did was to ask Mplus to output the data, including the estimated latent variable scores and interaction term. Mplus is nice enough to dump out the data in a clean format if you just ask nicely by including the following at the end of your input file:

SAVEDATA:
FILE IS name.dat;
SAVE=fscores;

Note: fscores = factor scores

Mplus outputs this in plain text and gives the ordering and formatting of the variables at the end of the output file. In my case, the format for each variable was F10.3. Also, missing data is denoted as a *.

My first task was to read this data into R. Fortunately, there is a function that lets one read in ye olde FORTRAN formatting and assign column names.

Oddly enough, F10.3 gave me values that were wrong in terms of where the decimal place should be, etc. I fixed this by just using F10.0, instead. A check indicated that this read the data in correctly.

# Reading in VES data set output from Mplus
data<-read.fortran(file="O:/data/Vietnam\ Experience\ Study/g_scores.dat",na.strings="*",c("F10.0","F10.0","F10.0","F10.0","F10.0","F10.0","F10.0","F10.0","F10.0"))

# Assigning column names
colnames(data)<-c("days","zverbal","zarithmetic","zwais1","zwais2","zn","status","f1","f1xzn")

The next step was to recode the status variable so that the occurrence of the event (death) was equal to 1 instead of 0.

# Assigning column names
colnames(data)<-c("days","zverbal","zarithmetic","zwais1","zwais2","zn","status","f1","f1xzn")

I then ran separate Cox regressions for each of the measured variables (the four IQ tests and Neuroticism), the latent cognitive ability variable (F1), and the interaction f1xzn. I also ran a model that included f1, Neuroticism, and f1xzn. I made sure to output the results of each model into a data frame.
You will note that the R code is quite straightforward.

# Running Cox regressions
model_verbal<-coxph(Surv(days,death) ~ zverbal,data=data)
model_arithmetic<-coxph(Surv(days,death) ~ zarithmetic,data=data)
model_wais_1<-coxph(Surv(days,death) ~ zwais1,data=data)
model_wais_2<-coxph(Surv(days,death) ~ zwais2,data=data)
model_neuroticism<-coxph(Surv(days,death) ~ zn,data=data)
model_f1<-coxph(Surv(days,death) ~ f1,data=data)
model_f1xzn<-coxph(Surv(days,death) ~ f1xzn,data=data)
model_full<-coxph(Surv(days,death) ~ f1+zn+f1xzn,data=data)

Finally, I used the cox.zph function to test for violations of the proportionality assumption. If any variables or the whole model is significant (p < .05), it has violated proportionality.

# Testing proportionality assumption for each variable
cox.zph(model_verbal)
cox.zph(model_arithmetic)
cox.zph(model_wais_1)
cox.zph(model_wais_2)
cox.zph(model_neuroticism)
cox.zph(model_f1)
cox.zph(model_f1xzn)
cox.zph(model_full)

The end results were good, i.e., none of these effects violated the proportionality assumption. I will just present the results of the full model.

> cox.zph(model_full)
           rho chisq     p
f1     -0.0508 0.647 0.421
zn      0.0547 0.654 0.419
f1xzn  -0.0327 0.255 0.614
GLOBAL      NA 3.537 0.316
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License