是的,您可以将两组的对数似然相加(如果它们是单独计算的)。就像您将观察向量的对数似然相加一样,其中每个观察具有不同的生成参数。
我更喜欢考虑一个大向量(即形状参数),它包含根据协变量结构(即组成员资格)而变化的值。在线性模型上下文中,该向量可以等于线性预测变量(一旦通过链接函数适当转换):设计矩阵的点积和回归系数的向量。
这是一个(非功能化)示例:
## setup true values
nobs = 50 ## number of observations
a1 = 1 ## shape for first group
b1 = 2 ## scale parameter for both groups
beta = c(a1, a1 * 1.5) ## vector of linear coefficients (group shapes)
## model matrix for full, null models
mm_full = cbind(grp1 = rep(c(1,0), each = nobs), grp2 = rep(c(0,1), each = nobs))
mm_null = cbind(grp1 = rep(1, nobs*2))
## shape parameter vector for the full, null models
shapes_full = mm_full %*% beta ## different shape parameters by group (full model)
shapes_null = mm_null %*% beta[1] ## same shape parameter for all obs
scales = rep(b1, length(shapes_full)) ## scale parameters the same for both groups
## simulate response from full model
response = rweibull(length(shapes_full), shapes_full, scales)
## the log likelihood for the full, null models:
LL_full = sum(dweibull(response, shapes_full, scales, log = T))
LL_null = sum(dweibull(response, shapes_null, scales, log = T))
## likelihood ratio test
LR_test = function(LL_null, LL_full, df) {
LR = -2 * (LL_null - LL_full) ## test statistic
pchisq(LR, df = df, ncp = 0, lower = F) ## probability of test statistic under central chi-sq distribution
}
LR_test(LL_null, LL_full, 1) ## 1 degrees freedom (1 parameter added)
要编写对数似然函数来查找 Weibull 模型的 MLE,其中形状参数是协变量的一些线性函数,您可以使用相同的方法:
## (negative) log-likelihood function
LL_weibull = function(par, data, mm, inv_link_fun = function(.) .){
P = ncol(mm) ## number of regression coefficients
N = nrow(mm) ## number of observations
shapes = inv_link_fun(mm %*% par[1:P]) ## shape vector (possibly transformed)
scales = rep(par[P+1], N) ## scale vector
-sum(dweibull(data, shape = shapes, scale = scales, log = T)) ## negative log likelihood
}
那么您的功率模拟可能如下所示:
## function to simulate data, perform LRT
weibull_sim = function(true_shapes, true_scales, mm_full, mm_null){
## simulate response
response = rweibull(length(true_shapes), true_shapes, true_scales)
## find MLE
mle_full = optim(par = rep(1, ncol(mm_full)+1), fn = LL_weibull, data = response, mm = mm_full)
mle_null = optim(par = rep(1, ncol(mm_null)+1), fn = LL_weibull, data = response, mm = mm_null)
## likelihood ratio test
df = ncol(mm_full) - ncol(mm_null)
return(LR_test(-mle_null$value, -mle_full$value, df))
}
## run simulations
nsim = 1000
pvals = sapply(1:nsim, function(.) weibull_sim(shapes_full, scales, mm_full, mm_null) )
## calculate power
alpha = 0.05
power = sum(pvals < alpha) / nsim
在上面的示例中,身份链接可以正常工作,但对于更复杂的模型,可能需要进行某种转换。
而且您不必在对数似然函数中使用线性代数——显然,您可以以任何您认为合适的方式构造形状向量(只要您在向量中明确索引适当的生成参数par
)。
区间删失数据
Weibull 分布(在 R 中)的累积分布函数F(T)pweibull
给出了时间T之前的失效概率。因此,如果观察在时间T[0]和T[1]之间被删失,则对象在T[0]和T[1]之间失败的概率是F(T[1]) - F(T[0 ]):对象在T[1]之前失败的概率减去它在T[0]之前失败的概率( T[0]和T[1]之间的 PDF 的积分)。因为 Weibull CDF 已经在 R 中实现,所以修改上面的似然函数很简单:
LL_ic_weibull <- function(par, data, mm){
## 'data' has two columns, left and right times of censoring interval
P = ncol(mm) ## number of regression coefficients
shapes = mm %*% par[1:P]
scales = par[P+1]
-sum(log(pweibull(data[,2], shape = shapes, scale = scales) - pweibull(data[,1], shape = shapes, scale = scales)))
}
或者,如果您不想使用模型矩阵等,而只是限制自己按组索引形状参数向量,您可以执行以下操作:
LL_ic_weibull2 <- function(par, data, nobs){
## 'data' has two columns, left and right times of censoring interval
## 'nobs' is a vector that contains the num. observations for each group (grp1, grp2, ...)
P = length(nobs) ## number of regression coefficients
shapes = rep(par[1:P], nobs)
scales = par[P+1]
-sum(log(pweibull(data[,2], shape = shapes, scale = scales) - pweibull(data[,1], shape = shapes, scale = scales)))
}
测试两个函数是否给出相同的解决方案:
## generate intervals from simulated response (above)
left = ifelse(response - 0.2 < 0, 0, response - 0.2)
right = response + 0.2
response_ic = cbind(left, right)
## find MLE w/ first LL function (model matrix)
mle_ic_full = optim(par = c(1,1,3), fn = LL_ic_weibull, data = response_ic, mm = mm_full)
mle_ic_null = optim(par = c(1,3), fn = LL_ic_weibull, data = response_ic, mm = mm_null)
## find MLE w/ second LL function (groups only)
nobs_per_group = apply(mm_full, 2, sum) ## just contains number of observations per group
nobs_one_group = nrow(mm_null) ## one group so only one value
mle_ic_full2 = optim(par = c(1,1,3), fn = LL_ic_weibull2, data = response_ic, nobs = nobs_per_group)
mle_ic_null2 = optim(par = c(1,3), fn = LL_ic_weibull2, data = response_ic, nobs = nobs_one_group)