r - Plotting a dose-response curve from survival data

Question

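I want to make a dose-response curve with library(drc), and I'm stuck on how to properly prepare my dataset for the plot. In particular, I'm struggling with how to set up my y-axis.

I made a data frame (df) to help clarify what I want to do.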

df <- read.table("https://pastebin.com/raw/TZdjp2JX", header=T)

Load the necessary libraries for today's exercise

library(drc)

library(ggplot2)

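Suppose I love hummingbirds and run an experiment with different sugar concentrations to see which concentration suits the birds best. I ran the experiment in enclosed environments (the "room" column) with 4 different sugar concentrations (the "Concentration" column) and 10 individual birds per concentration. I also ran each experiment with 4 parallel replicates, which is why there are 4 "rooms". After 36 hours (the "time" column), I entered the rooms and checked how many birds had survived, creating a yes/no variable coded as 1 and 0 (my "Status" column), where 1 == survived and 0 == dead.

In this dataset, I deliberately set things up so that most birds survive at concentration 0, 50% survive at concentration 1, 25% at concentration 2, and only 10% at concentration 3.

The first problem I ran into is: how do I convert the y-axis derived from my "Status" column into a percentage? I have done this for Kaplan-Meier survival curves, but unfortunately that approach does not work here. Obviously, this column should run from 0% to 100% (we could call the column "mortality"). Once that works, I would like to make a dose-response curve like the one below (I found this example online and copy it here directly; it uses the ryegrass dataset that comes with drc)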

ryegrass.LL.4 <- drm(rootl ~ conc, data = ryegrass, fct = LL.3())

I have to admit that the next steps of the code are a bit confusing to me.

# new dose levels as support for the line
newdata <- expand.grid(conc=exp(seq(log(0.5), log(100), length=100)))
# predictions and confidence intervals
pm <- predict(ryegrass.LL.4, newdata=newdata, interval="confidence")
# new data with predictions
newdata$p <- pm[,1]
newdata$pmin <- pm[,2]
newdata$pmax <- pm[,3]

# plot curve

# need to shift conc == 0 a bit up, otherwise there are problems with coord_trans
ryegrass$conc0 <- ryegrass$conc
ryegrass$conc0[ryegrass$conc0 == 0] <- 0.5
# plotting the curve
ggplot(ryegrass, aes(x = conc0, y = rootl)) +
  geom_point() +
  geom_ribbon(data=newdata, aes(x=conc, y=p, ymin=pmin, ymax=pmax), alpha=0.2) +
  geom_line(data=newdata, aes(x=conc, y=p)) +
  coord_trans(x="log") +
  xlab("Ferulic acid (mM)") + ylab("Root length (cm)")

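In the end, I want to produce a similar curve, but with mortality on the y-axis, running from 0 to 100 (starting low and rising), and with the confidence interval shown as a gray shaded band around the regression line. That means my first line of code should look something like this:

model <- drm(mortality ~ Concentration, data=df, fct = LL.3())

but I'm lost on how to create the "mortality" part, and also somewhat on the ggplot steps that follow.

Can anyone help me achieve this? Looking at the ryegrass example, I'm confused about how to translate it to my made-up dataset. I hope someone here can help me with this! Many thanks, and I would also appreciate any suggestions for other ways to structure my dataset.

-Andy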

score 1 · Accepted Answer

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(drc)
#> Loading required package: MASS
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
#> 
#> 'drc' has been loaded.
#> Please cite R and 'drc' if used for a publication,
#> for references type 'citation()' and 'citation('drc')'.
#> 
#> Attaching package: 'drc'
#> The following objects are masked from 'package:stats':
#> 
#>     gaussian, getInitial

df <- read.table("https://pastebin.com/raw/sH5hCr2J", header=T)

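Computing mortality, or survival as I did here, is easy to do with the dplyr package, which helps with many calculations of this kind. You seem to be interested in the percent survival for each concentration in each of the four rooms (or replicates). So the first step is to group the data by these columns and then compute the statistic we want.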

df_calc <- df %>%
  group_by(Concentration, room) %>%
  summarise(surv = sum(Status)/n())
#> `summarise()` has grouped output by 'Concentration'. You can override using the `.groups` argument.
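Since the question asks for mortality on a 0 to 100 scale rather than survival as a proportion, the same grouping idea can be sketched in base R as follows. This is only an illustration: the data frame `toy`, the probabilities in `surv_prob`, and the output column name `mortality` are made up here and are not the pastebin data.

```r
# Hypothetical data: 4 concentrations x 4 rooms x 10 birds,
# Status 1 = alive, 0 = dead (simulated, not the real dataset).
set.seed(1)
toy <- expand.grid(bird = 1:10, room = 1:4, Concentration = 0:3)
surv_prob <- c(0.9, 0.5, 0.25, 0.1)  # assumed survival probability per concentration
toy$Status <- rbinom(nrow(toy), size = 1, prob = surv_prob[toy$Concentration + 1])

# Percent mortality per Concentration and room:
# the share of birds with Status == 0, multiplied by 100.
mort <- aggregate(Status ~ Concentration + room, data = toy,
                  FUN = function(s) 100 * mean(s == 0))
names(mort)[names(mort) == "Status"] <- "mortality"
mort <- mort[order(mort$Concentration, mort$room), ]
head(mort)
```

Equivalently, if a survival proportion such as `surv` has already been computed, mortality in percent is `100 * (1 - surv)`.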

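I don't know whether Concentration represents arbitrary concentration levels, so I proceeded with the following assumptions:

1 == higher sugar content, 2 == lower sugar content
Concentration is encoded in log space, so I convert it to linear space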

df_calc <- mutate(df_calc, conc = exp(-Concentration))

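To be clear, this conc variable is just my attempt to get something close to the real, known concentrations of the experiment. If your data contain the true concentrations, disregard this calculation.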

df_calc
#> # A tibble: 12 x 4
#> # Groups:   Concentration [3]
#>    Concentration  room  surv   conc
#>            <int> <int> <dbl>  <dbl>
#>  1             1     1   0.5 0.368 
#>  2             1     2   0.4 0.368 
#>  3             1     3   0.5 0.368 
#>  4             1     4   0.6 0.368 
#>  5             2     1   0   0.135 
#>  6             2     2   0.4 0.135 
#>  7             2     3   0.2 0.135 
#>  8             2     4   0.4 0.135 
#>  9             3     1   0.2 0.0498
#> 10             3     2   0   0.0498
#> 11             3     3   0   0.0498
#> 12             3     4   0.2 0.0498

mod <- drm(surv ~ conc, data =  df_calc, fct = LL.3())
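For readers unfamiliar with LL.3(): it is the three-parameter log-logistic curve, with the lower limit fixed at 0. A minimal base R sketch of that curve shape is given here so the role of each parameter is visible. The helper name `ll3` and the parameter values are made up for illustration; they are not estimates from the model above.

```r
# Three-parameter log-logistic curve (lower limit fixed at 0):
#   f(x) = d / (1 + exp(b * (log(x) - log(e))))
# b = slope, d = upper limit, e = dose at which the response is d / 2.
ll3 <- function(x, b, d, e) d / (1 + exp(b * (log(x) - log(e))))

# Made-up parameter values, chosen only for illustration.
x <- c(0.01, 0.1, 1, 10)
ll3(x, b = -1.5, d = 0.6, e = 0.2)    # negative b: response rises with dose
ll3(0.2, b = -1.5, d = 0.6, e = 0.2)  # at x == e the response is d / 2
```

With a positive `b` the same curve falls with increasing dose, which is the more common shape for toxicity data.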

Make new conc data points

newdata <- data.frame(conc = exp(seq(log(0.01), log(10), length = 100)))

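Edit

In response to your comment, I'll explain the code block above. Again, the conc variable is expected to be in units of concentration. In this hypothetical case we have three concentration levels, c(0.049, 0.135, 0.368). For brevity, assume the units are mg of sugar/ml of water. Our model is fit on these three dose levels, each with 4 data points. If we wanted, we could plot the curve only between these levels, over c(0.049, 0.368), but in this example I chose c(0.01, 10) mg/ml as the domain to plot. This is just so that we can see where the curve ends up according to the model fit. In short, you choose the range you are most interested in. As I show later, although we can pick points outside the range of the experimental data, the confidence interval there is very wide, indicating that the model is not informative for those points.

The reason for transforming these values with log() is to make sure we sample points that look evenly spaced on a log10 scale (most response curves are plotted with this transformation). Once we have the sequence of 100 points, we apply exp() to return to linear space (the space in which our model was fit). These values are then used, together with the fitted model, as the new dose levels in the predict function.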
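To make the spacing argument concrete, here is a small base R sketch comparing a log-spaced grid with a linearly spaced grid over the same range. The variable names `grid_log`, `grid_lin`, and `ratios` are introduced here only for illustration.

```r
# Log-spaced grid from 0.01 to 10, built the same way as newdata$conc.
grid_log <- exp(seq(log(0.01), log(10), length.out = 100))
# Linearly spaced grid over the same range, for comparison.
grid_lin <- seq(0.01, 10, length.out = 100)

# On the log grid, consecutive points differ by a constant ratio,
# so they appear evenly spaced on a log axis.
ratios <- grid_log[-1] / grid_log[-length(grid_log)]
range(ratios)

# Number of grid points that fall below 0.1 (the lowest decade of the range).
sum(grid_log < 0.1)
sum(grid_lin < 0.1)
```

The linear grid puts almost all of its points in the top decade, so a curve drawn from it would look coarse at the low end of a log-scaled x-axis.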

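All of this is saved to the newdata variable, which is what lets us plot the line and the confidence interval.

Use the model and the generated data points to predict new surv values along with their upper and lower bounds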

newdata <- cbind(newdata,
                 suppressWarnings(predict(mod, newdata = newdata, interval="confidence")))

Plot with ggplot2

ggplot(df_calc, aes(conc)) +
  geom_point(aes(y = surv)) +
  geom_ribbon(aes(ymin = Lower, ymax = Upper), data = newdata, alpha = 0.2) +
  geom_line(aes(y = Prediction), data = newdata) +
  scale_x_log10() +
  coord_cartesian(ylim = c(0, 1))

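You may notice that the confidence interval widens considerably when we try to predict in ranges where we have no data.

Created on 2021-10-27 by the reprex package (v1.0.0)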
