r - Plotting various cases in R using ggplot2

Question

I'm trying to visualize health insurance benefit options for my company to help others make a decision. I have a table like this:

| plan |        ded |  oop | exp_oop |
|------+------------+------+---------|
|    a |        400 | 2100 | 17400   |
|    b |       1300 | 2600 | 14300   |
|    c |       2600 | 5200 | 28600   |

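ded = deductible; the expense level at which 90% coinsurance begins
oop = out-of-pocket maximum
exp_oop = the amount of medical expenses at which oop is reached

I want to plot the employee's cost against the medical expenses incurred. Health insurance works in ranges...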

cost = expenses for 0 < expenses < ded
cost = deductible + (0.10 x (expenses - ded)) for ded <= expenses < exp_oop
cost = oop for oop <= expenses <= infinity

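How do I plot each range? Basically, each plan has a line with slope = 1 up to the deductible, then a line with slope = 0.1 from x = deductible to x = oop, then a line with slope = 0 from oop upward.

I'm not sure how to plot conditionally with ggplot2. If you want to use the above, here is reproducible code for these cutoff values: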

dat <- data.frame(plan = c("a", "b", "c"), ded = c(400, 1300, 2600), oop = c(2100, 2600, 5200), exp_oop = c(17400, 14300, 28600))

Do I have to create the x/y values myself? In other words, an intermediate table like this?

| plan |     x |    y |
|------+-------+------|
|    1 |     0 |    0 |
|    1 |   400 |  400 |
|    1 | 17400 | 2100 |
|    2 |     0 |    0 |
|    2 |  1300 | 1300 |
|    2 | 14300 | 2600 |
|    3 |     0 |    0 |
|    3 |  2600 | 2600 |
|    3 | 28600 | 5200 |

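I'm doing this for several variants (employee only, employee + spouse, etc.), so it would be great if I didn't need a separate data table for each plan but could instead use the deductible and out-of-pocket maximum values I already have defined in the data frame...

Thanks for any suggestions!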

score 1 · Accepted Answer

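Write a vectorized function that computes the employee's cost as a function of the expenses incurred. It has to be vectorized so that you can feed it to ddply.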

costFinder <- function(df, oopActual) {
  #df is your 'dat'; we will throw away exp_oop
  #oopActual should be a vector; it is the x axis of your plot
  ded <- df$ded
  oopMax <- df$oop
  cost <- rep(NA, length(oopActual)) #preallocating with NAs will help ID mistakes
  cost[oopActual<ded] <- oopActual[oopActual<ded]
  cost[ded <= oopActual & oopActual < oopMax] <- 0.1 * (oopActual[ded <= oopActual & oopActual < oopMax] - ded) + ded
  cost[oopMax <= oopActual] <- oopMax
  return(cost)
}
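
As a cross-check on the tiers, here is a minimal sketch of an alternative formulation, not the function above. Taking the smallest of the three tier formulas with pmin gives a continuous curve that flattens exactly where the 10% line reaches oop (the exp_oop column in the table). By contrast, costFinder as written switches to the flat tier as soon as expenses reach oop, so its curve jumps at that point. The name costPmin is hypothetical.

```r
# Hypothetical helper (not part of the answer above): the cost is the smallest
# of the three tier formulas, so no explicit range tests are needed.
costPmin <- function(expenses, ded, oop) {
  pmin(expenses,                      # below the deductible: pay everything
       ded + 0.1 * (expenses - ded),  # coinsurance tier: deductible + 10%
       oop)                           # capped at the out-of-pocket maximum
}

# Plan a from the question: ded = 400, oop = 2100
costPmin(c(0, 200, 400, 1000, 17400, 30000), ded = 400, oop = 2100)
# expected: 0 200 400 460 2100 2100
```

Because the cap is applied by pmin, exp_oop never has to be passed in; it falls out of ded and oop.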

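Then define a sequence of expense values (not too many data points, or the computation gets heavy) and, for each plan, compute the actual out-of-pocket cost at each expense value: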

library(plyr) #provides ddply
expense <- seq(0, 50000, by=200)
allCosts <- ddply(dat, .(plan), costFinder, expense)
names(allCosts)[2:ncol(allCosts)] <- expense

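Now melt the result so that you can use it with ggplot. Here I used the shady trick of renaming the columns of the allCosts data frame with numeric values. This is probably a bad idea, and I'd love to see a better way to do it.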

library(reshape2) #provides melt
costsM <- melt(allCosts, id.vars="plan")
names(costsM)[2:3] <- c("expense", "actualOOP")
#melt() interprets the column names as a factor. We have to turn them back into numeric,
#    by turning them into characters first and then numerics.
costsM$expense <- as.character(costsM$expense)
costsM$expense <- as.numeric(costsM$expense)

#Plot the data
library(ggplot2)
p <- ggplot() + geom_line(data=costsM, aes(x=expense, y=actualOOP, colour=plan))
print(p)


#Add vertical lines for the expected OOP, if you like - arguably it makes things more confusing.
p + geom_vline(data=dat, aes(xintercept=exp_oop, colour=plan))

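
On the wish above for a better way than renaming columns to numbers: one option is to build the long-format data frame directly, so nothing has to be melted or converted back from a factor. The following is a minimal base-R sketch under two assumptions that differ from the code above: a cross join via merge(), and a pmin()-based cost rule in place of costFinder (so the curve flattens where the 10% line reaches oop rather than at expense = oop). The name costsLong is hypothetical.

```r
dat <- data.frame(plan = c("a", "b", "c"),
                  ded = c(400, 1300, 2600),
                  oop = c(2100, 2600, 5200),
                  exp_oop = c(17400, 14300, 28600))

expense <- seq(0, 50000, by = 200)

# merge() with no shared column names returns every plan/expense combination
costsLong <- merge(dat, data.frame(expense = expense))

# Assumed cost rule: the smallest of the three tier formulas
costsLong$actualOOP <- with(costsLong,
                            pmin(expense, ded + 0.1 * (expense - ded), oop))

# expense is already numeric, so the result can go straight to ggplot:
# ggplot(costsLong, aes(x = expense, y = actualOOP, colour = plan)) + geom_line()
```

The same pattern extends to other variants (employee only, employee + spouse, and so on) by adding rows to dat.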

score 1 · Accepted Answer

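My approach basically follows Drew's, just with different steps. I start with a function that takes plan, ded, oop, and exp_oop and returns a function that gives the cost for a given expense (based on those parameters). [Note: I assume the break between the second and third tiers is exp_oop, not oop as originally stated in the question.]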

cost_generator <- function(ded, oop, exp_oop, ...) {
  function(expenses) {
    ifelse(expenses < ded, 
           expenses, 
           ifelse(expenses < exp_oop, 
                  ded + (0.1 * (expenses-ded)),
                  oop))
  }
}
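
To see what the generator returns, here is a small usage sketch with plan a's values from the question. The definition is repeated only so the snippet runs on its own; the name cost_a is illustrative.

```r
# Same definition as above, repeated so this snippet is self-contained
cost_generator <- function(ded, oop, exp_oop, ...) {
  function(expenses) {
    ifelse(expenses < ded,
           expenses,
           ifelse(expenses < exp_oop,
                  ded + (0.1 * (expenses - ded)),
                  oop))
  }
}

# Plan a: ded = 400, oop = 2100, exp_oop = 17400; extra arguments such as
# plan are absorbed by `...`
cost_a <- cost_generator(ded = 400, oop = 2100, exp_oop = 17400, plan = "a")

# One point inside each tier, plus the two break points
cost_a(c(200, 400, 1000, 17400, 20000))
# expected: 200 400 460 2100 2100
```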

Now, using plyr, I can create a list of functions that map expenses to cost, one per plan.

library("plyr")
funs <- mlply(dat, cost_generator)

For each function, determine the cost over a given range of expenses. Here I chose a range from $0 to $50,000 in increments of $100.

pts <- ldply(funs, function(f) {
  expenses <- seq(0, 50000, 100)
  data.frame(expenses=expenses, cost=f(expenses))
})

This gives a long-format data frame that is easy to plot.

library("ggplot2")
ggplot(pts, aes(expenses, cost, colour=plan)) +
  geom_line()


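Of course, this isn't really the cost, but the amount paid out of pocket at a given level of expenses. The total cost would include additional things (premiums, at the very least).

Edit:

If you want to make sure every change point is included (rather than relying on rounding to the nearest $100), you can extract the points from dat and use those: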

library("reshape2")
exps <- melt(dat, id.var="plan")$value
exps <- c(0, exps, 1.1*max(exps))

pts <- ldply(funs, function(f) {
  data.frame(expenses=exps, cost=f(exps))
})

I added 0 and a value larger than the maximum in the table to make the result sensible.

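
Because each curve is piecewise linear, the break points alone determine it, so the intermediate table sketched in the question can also be generated straight from dat. The following is a minimal base-R sketch, assuming the flat tier starts at exp_oop as in this answer; the names x_max and breaks are hypothetical, and the final point per plan is only there to draw the flat tier.

```r
dat <- data.frame(plan = c("a", "b", "c"),
                  ded = c(400, 1300, 2600),
                  oop = c(2100, 2600, 5200),
                  exp_oop = c(17400, 14300, 28600))

# Right-hand end of the plot: a little beyond the largest break point
x_max <- 1.1 * max(dat$exp_oop)

# Four points per plan: origin, deductible, where oop is reached, flat tail
breaks <- do.call(rbind, lapply(seq_len(nrow(dat)), function(i) {
  with(dat[i, ], data.frame(plan = plan,
                            expenses = c(0, ded, exp_oop, x_max),
                            cost = c(0, ded, oop, oop)))
}))

# ggplot(breaks, aes(expenses, cost, colour = plan)) + geom_line()
```

Unlike the pooled exps vector, each plan here carries only its own break points, which keeps the table small when more variants are added.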
