r - Is there a way to get the partial likelihood of a Cox PH model with new data and fixed coefficients?

Question

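I am doing cross-validation on a competing-risks proportional hazards model. With the help of the mstate package, I have prepared my data and am fitting it with survival::coxph. I get a fitted Cox model object for my training data, but I want to evaluate the partial likelihood of my training coefficients on my test data.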

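If I need to, I will write the partial likelihood function myself, but I would rather not (though it would probably be good for me). The survival package computes it in its C code, but the likelihood calculation is embedded in the fitting function. Perhaps there is a way to fix the parameters, or some other tool that makes it easy to get the partial likelihood?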
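For reference, the quantity being asked about can be written out as follows. This is a sketch in the Breslow form (no special handling of tied event times); the notation is an assumption introduced here, not taken from the post: $s$ indexes strata (here, transitions), $\delta_i$ is the event indicator, $x_i$ the covariate vector, and $R_s(t_i)$ the set of rows in stratum $s$ still at risk at time $t_i$. Note that coxph uses the Efron approximation by default, so its value differs slightly when ties are present.

```latex
\ell(\beta) \;=\; \sum_{s} \; \sum_{i \in s:\, \delta_i = 1}
  \left[ x_i^\top \beta \;-\; \log \sum_{j \in R_s(t_i)} \exp\!\left( x_j^\top \beta \right) \right]
```

Evaluating on test data with fixed coefficients means plugging the training estimate $\hat\beta$ into this expression while taking the events and risk sets from the test rows.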

Minimal working example

# Adapted from examples in the mstate vignette
# http://cran.r-project.org/web/packages/mstate/vignettes/Tutorial.pdf
# beginning at the bottom of page 28

library(mstate)
library(survival)

# Get data. I add a second explanatory variable (badx) for illustration
# Also divide the data by subject into training and test sets.
data(aidssi)
si <- aidssi # Just a shorter name
si$badx <- sample(c("A", "B"), size = nrow(si), replace = TRUE)
si$fold <- sample(c("train", "test"), size = nrow(si), replace = TRUE, prob = c(0.7, 0.3))
tmat <- trans.comprisk(2, names = c("event-free", "AIDS", "SI"))
si$stat1 <- as.numeric(si$status == 1)
si$stat2 <- as.numeric(si$status == 2)

# Convert the data to a long competing risks format
silong <- msprep(time = c(NA, "time", "time"), 
                 status = c(NA,"stat1", "stat2"),
                 data = si, keep = c("ccr5", "badx", "fold"), trans = tmat)
silong <- na.omit(silong)
silong <- expand.covs(silong, c("ccr5", "badx"))
train.dat <- subset(silong, fold == "train")
test.dat <- subset(silong, fold == "test")

The data look like this:

> head(silong)
An object of class 'msdata'

Data:
  id from to trans Tstart  Tstop   time status ccr5 badx  fold ccr5WM.1 ccr5WM.2 badxB.1 badxB.2
1  1    1  2     1      0  9.106  9.106      1   WW    A train        0        0       0       0
2  1    1  3     2      0  9.106  9.106      0   WW    A train        0        0       0       0
3  2    1  2     1      0 11.039 11.039      0   WM    B train        1        0       1       0
4  2    1  3     2      0 11.039 11.039      0   WM    B train        0        1       0       1
5  3    1  2     1      0  2.234  2.234      1   WW    B train        0        0       1       0
6  3    1  3     2      0  2.234  2.234      0   WW    B train        0        0       0       1

Now, the ccr5 variable can be modeled as transition-specific, or as having an equal proportional effect on all transitions. The models are:

train.mod.equal <- coxph(Surv(time, status) ~ ccr5 + badx + strata(trans),
                         data = train.dat)
train.mod.specific <- coxph(Surv(time, status) ~ ccr5WM.1 + ccr5WM.2 + badx + strata(trans),
                            data = train.dat)

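Now I would like to use the test data to evaluate the variable-selection question of whether ccr5 should be transition-specific. I have a large data set and many variables, mostly but not all categorical, that could go either way. The evaluation is where I am stuck.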

# We can fit the same models to the test data,
# this yields new parameter estimates of course,
# but the model matrices might be useful
test.mod.equal <- coxph(Surv(time, status) ~ ccr5 + badx + strata(trans),
                         data = test.dat)
test.mod.specific <- coxph(Surv(time, status) ~ ccr5WM.1 + ccr5WM.2 + badx + strata(trans),
                            data = test.dat)
test.eq.mm <- model.matrix(test.mod.equal)
test.sp.mm <- model.matrix(test.mod.specific)

# We can use these to get the first part of the sum of the partial likelihood:
xbeta.eq <- test.eq.mm[test.dat$status == 1, ] %*% coef(train.mod.equal)
xbeta.sp <- test.sp.mm[test.dat$status == 1, ] %*% coef(train.mod.specific)

# We can also get linear predictors
lp.eq <- predict(train.mod.equal, newdata = test.dat, type = "lp")
lp.sp <- predict(train.mod.specific, newdata = test.dat, type = "lp")

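What I would like is to compute the partial likelihood of each model on the test data using the training coefficient estimates. Perhaps I should move the question to Cross Validated and ask whether the sum of the linear predictors (or the sum of the linear predictors excluding censored cases) is a close enough equivalent measure.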
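On the last point: the sum of the linear predictors over the events is only the first term of the partial log-likelihood; the risk-set term is missing. A minimal sketch of writing the function by hand is below. The helper `partial_loglik` is hypothetical (it is not part of survival or mstate), it uses the Breslow form for ties, and the built-in `lung` data stands in for the real data purely for illustration.

```r
library(survival)

# Hypothetical helper: Breslow partial log-likelihood at a fixed beta.
# `status` must be coded 1 = event, 0 = censored.
partial_loglik <- function(time, status, X, beta,
                           strata = rep(1, length(time))) {
  eta <- drop(X %*% beta)                # linear predictor for every row
  total <- 0
  for (s in unique(strata)) {            # risk sets are formed within strata
    in.s <- strata == s
    t.s <- time[in.s]; d.s <- status[in.s]; e.s <- eta[in.s]
    for (i in which(d.s == 1)) {
      at.risk <- t.s >= t.s[i]           # rows still at risk at this event time
      total <- total + e.s[i] - log(sum(exp(e.s[at.risk])))
    }
  }
  total
}

# Illustrative data only: a random train/test split of `lung`.
set.seed(1)
dat <- na.omit(lung[, c("time", "status", "age", "sex")])
dat$event <- as.numeric(dat$status == 2)  # lung codes death as 2
idx <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
train <- dat[idx, ]
test <- dat[-idx, ]

fit.train <- coxph(Surv(time, event) ~ age + sex, data = train, ties = "breslow")
beta <- coef(fit.train)

# Training coefficients, test data
pl.hand <- partial_loglik(test$time, test$event,
                          as.matrix(test[, c("age", "sex")]), beta)

# Cross-check: let coxph evaluate the likelihood at beta without iterating
fit.check <- coxph(Surv(time, event) ~ age + sex, data = test, ties = "breslow",
                   init = beta, control = coxph.control(iter.max = 0))
pl.coxph <- fit.check$loglik[2]

all.equal(pl.hand, pl.coxph)
```

For the competing-risks layout in the question, the `strata` argument would take `test.dat$trans` so that risk sets are formed per transition.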

score 3 · Accepted Answer

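This is what I was suggesting when I wrote: 'Can you calculate a "new model" (using [the new data] and a formula that includes an offset built [with] the beta estimates [from the original fit]) and then use summary(mdl) to do the heavy lifting for you? You could even calculate the offset with predict.coxph.' It turns out I did not need summary.coxph, since print.coxph already gives the likelihood ratio test (LRT) statistic.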

lp.eq <- predict(train.mod.equal, newdata = test.dat, type = "lp")
eq.test.mod <- coxph(Surv(time, status) ~ ccr5 + badx + strata(trans) + offset(lp.eq),
                     data = test.dat)
eq.test.mod

Call:
coxph(formula = Surv(time, status) ~ ccr5 + badx + strata(trans) + 
    offset(lp.eq), data = test.dat)


           coef exp(coef) se(coef)       z    p
ccr5WM -0.20841     0.812    0.323 -0.6459 0.52
badxB  -0.00829     0.992    0.235 -0.0354 0.97

Likelihood ratio test=0.44  on 2 df, p=0.804  n= 212, number of events= 74

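I would interpret this as saying that a similar model, fit to the new data with the predictions from the first model as an offset, is not significantly different from the null model, and that on the log-likelihood scale it is 0.44 "distant" from an exact fit.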

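As @Gregor pointed out, one can access the "loglik" component of the coxph object, but I would advise against attaching too much meaning to the individual values. To get at his LRT statistic, one could compute the difference in log-likelihoods (the statistic itself is twice this difference):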

> diff(eq.test.mod$loglik)
[1] 0.399137

For interest, also look at the result without the offset:

> coxph(Surv(time, status) ~ ccr5 + badx + strata(trans), 
+       data=test.dat)
Call:
coxph(formula = Surv(time, status) ~ ccr5 + badx + strata(trans), 
    data = test.dat)


          coef exp(coef) se(coef)      z      p
ccr5WM -0.8618     0.422    0.323 -2.671 0.0076
badxB  -0.0589     0.943    0.235 -0.251 0.8000

Likelihood ratio test=8.42  on 2 df, p=0.0148  n= 212, number of events= 74

And you do get the expected result when testing against the original data:

> lp.eq2 <- predict(train.mod.equal, newdata = train.dat, type = "lp")
> coxph(Surv(time, status) ~ ccr5 + badx + strata(trans)+offset(lp.eq2), 
+       data=train.dat)
Call:
coxph(formula = Surv(time, status) ~ ccr5 + badx + strata(trans) + 
    offset(lp.eq2), data = train.dat)


            coef exp(coef) se(coef)         z p
ccr5WM -4.67e-12         1    0.230 -2.03e-11 1
badxB   2.57e-14         1    0.168  1.53e-13 1

Likelihood ratio test=0  on 2 df, p=1  n= 436, number of events= 146
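Building on the offset fit and the "loglik" component discussed above, here is a minimal sketch of two ways to read off the partial log-likelihood of the test data at the training coefficients. It assumes the built-in `lung` data and a random split as stand-ins for the real data; the object names are illustrative. In the offset model, the first element of `loglik` is evaluated with the new coefficients at zero, so only the offset (the training linear predictor) contributes. The second route passes the training coefficients through `init` and sets `iter.max = 0` so that coxph evaluates the likelihood without updating them.

```r
library(survival)

# Illustrative data only: a random train/test split of `lung`.
set.seed(1)
dat <- na.omit(lung[, c("time", "status", "age", "sex")])
idx <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
train <- dat[idx, ]
test <- dat[-idx, ]

fit.train <- coxph(Surv(time, status) ~ age + sex, data = train)

# Route 1: offset model. loglik[1] is the value at coef = 0 plus the offset,
# i.e. at the training coefficients. The centering that predict() applies to
# the linear predictor is a constant and cancels in the partial likelihood.
test$lp <- predict(fit.train, newdata = test, type = "lp")
fit.off <- coxph(Surv(time, status) ~ age + sex + offset(lp), data = test)
pl.offset <- fit.off$loglik[1]

# Route 2: start at the training coefficients and take zero iterations.
fit.fixed <- coxph(Surv(time, status) ~ age + sex, data = test,
                   init = coef(fit.train),
                   control = coxph.control(iter.max = 0))
pl.fixed <- fit.fixed$loglik[2]

all.equal(pl.offset, pl.fixed)  # the two routes should agree up to tolerance
```

For the models in the question, the same pattern would apply with `strata(trans)` kept in the formula and `test.dat` as the data. Comparing the resulting values for the equal-effect and transition-specific models gives an out-of-sample criterion for choosing between them.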
