我想ggplot2
通过创建新的stats
函数和ggproto
对象来实现对 Cox 比例风险模型的诊断。这个想法是从分组(按color
,facet_grid
等)中受益,以对所需统计数据(例如,马丁格尔残差)进行条件计算。在下面的示例中,对空模型进行了重新拟合,并为每个辅因子水平计算了鞅。问题是:
- 拟合是基于'方法
data
内部提供的参数完成的,而不是实际数据集。这意味着,该数据框被剥离了未明确定义的列,因此每次由用户至少提供“事件时间”和“事件指示器”列。(除非它们在功能包装器中硬编码)。可以以某种方式访问原始数据集中的相应行吗?ggproto
compute_group
aes
stat_...
compute_group
compute_group
的行名data
是唯一的PANEL
,这意味着它们不反映提供给 的原始数据集的实际行名ggplot
,并且我们不能再排除完整模型省略的不完整情况,除非明确指定/硬编码某些 id 变量。问题和上面一样。geom
图层可以访问由另一个自定义统计数据计算的'y
,该统计数据也被绘制吗?考虑一个更平滑的残差散点图,例如。ggplot(data, aes(x = covariate, time = time, event = event)) + stat_funcform() + geom_smooth(aes(y = ..martingales..))
, 其中..martingales..
实际由 计算stat_funcform
。
.
# Load & Set data ---------------------------------------------------------
require(data.table)
require(dplyr)
require(ggplot2)
require(survival)
set.seed(1)
dt <- data.table(factor = sample(c(0, 2), 1e3, T),
factor2 = sample(c(0, 2, NA), 1e3, T),
covariate = runif(1e3, 0, 1))
dt[, `:=`(etime = rexp(1e3, .01 * (.5 + factor + covariate)),
ctime = runif(1e3, 0, 1e2))]
dt[, `:=`(event = ifelse(etime < ctime, 1, 0),
time = ifelse(etime < ctime, etime, ctime))]
survfit(Surv(time, event) ~ factor + I(covariate > .5), data = dt) %>%
autoplot(fun = 'cloglog')
# Fit coxph model ---------------------------------------------------------
fit <- coxph(Surv(time, event) ~ factor + covariate, data = dt)
# Define custom stat ------------------------------------------------------
StatFuncform <- ggproto(
"StatFuncform", Stat,
compute_group = function(data, scales, params) {
head(data) %>% print
data$y <- update(params$fit, ~ 1, data = data) %>%
resid(type = 'martingale')
return(data)
}
)
stat_funcform <- function(mapping = NULL, data = NULL, geom = "point",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
layer(
stat = StatFuncform, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
# Plot --------------------------------------------------------------------
# computation unsuccessfull - no 'time' and 'event' variables
ggplot(data = dt,
mapping = aes(x = covariate,
color = factor(event))) +
stat_funcform(params = list(fit = fit)) +
facet_grid(~ factor, labeller = label_both, margins = TRUE)
# ok - 'time' and 'event' specified explicitly
ggplot(data = dt,
mapping = aes(x = covariate, time = time, event = event,
factor2 = factor2,
color = factor(event))) +
stat_funcform(params = list(fit = fit)) +
facet_grid(~ factor, labeller = label_both, margins = TRUE)
我知道已经有一些包实现了生存分析(例如survminer
)或ggplot2
's autoplot
方法的诊断,但它们宁愿提供包装器而不是利用默认ggplot2() + stat_
语义。