I am trying to build a survival model on a dataset that has a large number of covariates (~250). I fit a parametric survival model with survreg (log-logistic distribution), using the following call:
param <- survreg(enrlSurv ~ X, dist = "loglogistic", data = train_df)
I created X by applying as.matrix() to the train_df data frame after excluding some columns with select. Is this the correct way to define the formula, or is there a better way to do this?
I also noticed in the summary of the param object that each covariate name has an 'X' prefix.
I am also getting unexpected output when I run the predict function:
pct <- seq(.0,.99,by=.01)
predOv <- predict(param, newdata=test_df, type = "quantile", p = pct)
The predict function returns the same number of rows as the train_df instead of the test_df. Any help would be greatly appreciated.
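For comparison, here is a minimal sketch of the formula-based alternative, assuming the covariates stay as columns of the data frame instead of being packed into a matrix. The lung dataset from the survival package and the names d, fit and pred are stand-ins for illustration only, not the data or objects from the question. With a matrix term, predict appears to find no column called X in newdata and to fall back to the X in the workspace, which would explain the row count; with named columns the lookup happens in newdata itself.

```r
library(survival)

# Stand-in data (assumption): a few columns of the lung dataset, complete cases only
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
train_df <- d[1:150, ]
test_df  <- d[151:nrow(d), ]

# Build the response inside the formula; "." expands to every other column of `data`
fit <- survreg(Surv(time, status) ~ ., data = train_df, dist = "loglogistic")

# Quantiles strictly inside (0, 1)
pct <- seq(0.01, 0.99, by = 0.01)

# Covariates are looked up by name in newdata, so there is one row per test observation
pred <- predict(fit, newdata = test_df, type = "quantile", p = pct)
dim(pred)
```

Columns that should not enter the model can be dropped from the data frame before fitting, so the select step still applies; only the as.matrix() conversion goes away. The coefficient names then match the column names without the 'X' prefix.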