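I have a data frame with three columns. Site and Provider are strings, and LOS is continuous. My dataset has over 1500 rows, so I sampled it with a seed to make the analysis easier to follow. Using this data frame: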

HECCSV:

Site Provider  LOS
SFH    ASAND  259
SFH    ASAND  343
SJH    LEI,   203
SFH    HARME  182
SJH    KHAN,  303
EMH    DELUH  145
SJH    VISSA  317
EMH    SURIA  266
SJH    MAKOW  113
SJH    HERNA  263
FRK    SMUKO  262
EMH    BRUCE  309
SFH    CESAR  197
FRK    SMUKO  217
SFH    SHALL  200
SJH    NASH,  258
SJH    DOUGH  391
SJH    LACRO  368
FRK    RICHB  196
SJH    DONIE  208
SJH    MONAW  245
SJH    STULL  307
SJH    SCHIL  330
SFH    ABERN  340
SJH    DELUH  420
SJH    DELUH  160
SFH    KORTB  328
SJH    FRAZI  150
FRK    BLANK  281
SJH    KRAUS  109
SJH    ROSIE  279
SJH    MURPH  200
SFH    ENGLI  231
SFH    SMUKO  205
SJH    JOHNS  360
SJH    DONIE  346
EMH    MAURE  102
SJH    MANTH  205
SJH    FRAZI  289
SJOC   MORAN  172
SFH    CESAR  112
SJH    HERNA  111
SJH    LACRO  211
SJH    HARME  343
SJH    DIXON   89
SFH    CULLE  165
SJH    WILSO  239
SJH    CULLE  200
SFH    SMUKO  178
SFH    BINDA   98
SJH    ABERN  178 
EMH    MAURE  352
SFH    BERGS  201
SJH    ANDER  255
SJH    HUBBA  107
SFH    ASAND 1102
EMH    MANTH  143
SJH    DELUH  213
EMH    RUVAL  258
SJH    VISSA  350
SJH    FRAZI  364
FRK    PILLA  228
SFH    WENNI  335
SFH    WILSO  214
SJH    CULLE  248
SJH    LACRO  298
TWHH   PARRI  135
SJH    SURIA  234
SFH    ABERN  317
FRK    KRAUS  223
SJH    SURIA  310
EMH    GLINS  318
SJH    ADAR,  308
SJH    MAKOW  253
SJH    MURPH  257
SFH    ABERN  262
SJH    STULL  514
SJH    ANDER  324
SJH    KHAN,  117
SJH    LACRO  151
EMH    PILLA  150
SJH    MUELL  295
SFH    RICHB  149
SFH    MANTH  315
FRK    HERNA  218
FRK    ASAND  167
EMH    DONOF  161
EMH    SWART  243
SJH    FRAZI  392
FRK    DONIE  213
SFH    SMUKO  276
SJH    MAURE  531
FRK    MAURE  241
SJH    LACRO  127
EMH    RUVAL  349
EMH    DONOF  346
SJH    CULLE  399
EMH    ANDER  243
SJH    MAKOW  175
SFH    HONNO  285

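I want to plot the median LOS for each Provider within each Site, and I also want to label the outliers. I want to use geom_point with jitter. Following comments from others online, I created a second data frame called datJit. In the new data frame, LOS is now the median LOS grouped by Site and Provider. I added SM, a per-Site summary of those medians (the code below takes their 0.96 quantile); xj is the jittered x-axis position for ggplot; and out is TRUE for the rows I want to label on the plot. The code looks like this: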

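library(plyr)

# Median LOS for each Site/Provider pair (the result column is named V1)
data <- ddply(HECCSV, .(Site, Provider), function(x) median(x$LOS))
# Per-Site value SM: the 0.96 quantile of the provider medians in that Site
data <- ddply(data, .(Site), function(g) {g$SM <- quantile(g$V1, 0.96); g})
colnames(data) <- c("Site", "Provider", "LOS", "SM")
datJit <- data
# Numeric x position: the Site level code plus a small random jitter
datJit$xj <- jitter(as.numeric(factor(data$Site)))
# Flag rows whose median LOS is at or above the Site's SM
datJit <- ddply(datJit, .(LOS), .fun = function(g) {g$out <- g$LOS >= g$SM; g})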
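
To check the logic on something small, the same three steps (group median, per-Site quantile, outlier flag) can be sketched in base R. This is only an illustration: the `toy` data frame and its values are made up, and `aggregate` and `ave` stand in for the `ddply` calls above.

```r
# Hypothetical toy data, not rows from HECCSV
toy <- data.frame(
  Site     = c("A", "A", "A", "B", "B"),
  Provider = c("P1", "P1", "P2", "P3", "P4"),
  LOS      = c(100, 200, 300, 50, 70)
)

# Median LOS per Site/Provider pair
med <- aggregate(LOS ~ Site + Provider, data = toy, FUN = median)

# Per-Site value: 0.96 quantile of the provider medians within each Site
med$SM <- ave(med$LOS, med$Site, FUN = function(v) quantile(v, 0.96))

# Flag providers whose median is at or above their Site's SM
med$out <- med$LOS >= med$SM
print(med)
```

With only a few providers per Site, the 0.96 quantile sits just below the largest median, so the top provider in each Site is the one that gets flagged.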

The datJit data frame now looks like this:

Site Provider   LOS     SM        xj   out
EMH    MANTH  36.0 312.60 0.8989241 FALSE
EMH    ANDER  62.0 312.60 1.1421376 FALSE
SJH    MAKOW  92.0 402.00 4.1620511 FALSE
FRK    BANIU 101.0 296.08 1.8130307 FALSE
EMH    HARME 104.0 312.60 0.9778059 FALSE
SJH    SMUKO 110.0 402.00 4.0529616 FALSE
SFH    ARBIT 117.5 571.20 2.9281353 FALSE
SJH    DIXON 122.0 402.00 4.0077163 FALSE
SFH    SHALL 124.0 571.20 2.8466912 FALSE
SFH    FELIC 135.0 571.20 3.0444518 FALSE
EMH    DELUH 145.0 312.60 1.0192006 FALSE
EMH    PILLA 150.0 312.60 0.8234848 FALSE
SJOC   SCERP 151.0 206.68 5.0967039 FALSE
SFH    ADAR, 155.0 571.20 3.1976121 FALSE
SJH    STULL 159.5 402.00 3.8986343 FALSE
SJH    DONIE 165.0 402.00 4.1175304 FALSE
EMH    STULL 167.0 312.60 0.8981766 FALSE
FRK    SURIA 177.0 296.08 1.8701017 FALSE
EMH    ROSIE 181.0 312.60 0.8141137 FALSE
FRK    GEDAN 182.0 296.08 1.8771275 FALSE
FRK    HONNO 186.0 296.08 2.1625728 FALSE
SJH    HUBBA 191.5 402.00 3.9899187 FALSE
SJH    SURIA 199.5 402.00 3.9287032 FALSE
FRK    DIXON 207.0 296.08 1.8887039 FALSE
SFH    CHESK 209.0 571.20 2.9543299 FALSE
SJOC   DONIE 209.0 206.68 5.0137046  TRUE
SFH    HICKS 210.0 571.20 2.8389316 FALSE
FRK    DONIE 213.0 296.08 2.1347646 FALSE
FRK    HERNA 218.0 296.08 2.1701933 FALSE
SJH    WENNI 219.0 402.00 4.0968437 FALSE
SJH    HARME 220.0 402.00 4.1628144 FALSE
SFH    ABERN 221.0 571.20 2.9650992 FALSE
SJH    GLINS 222.5 402.00 4.1643052 FALSE
SJH    HERNA 228.0 402.00 4.1503148 FALSE
SJH    KORTB 231.0 402.00 3.8981447 FALSE
SJH    MURPH 237.0 402.00 4.0250026 FALSE
SJH    WILSO 239.0 402.00 3.9719906 FALSE
FRK    MAURE 241.0 296.08 1.8193441 FALSE
SJH    FELIC 243.0 402.00 3.8967263 FALSE
SJH    ANDER 247.0 402.00 3.8460451 FALSE
EMH    RUVAL 247.5 312.60 1.0059763 FALSE
SJH    MANTH 254.5 402.00 4.1880197 FALSE
SJH    CLINE 260.0 402.00 4.0725797 FALSE
SFH    MANTH 262.0 571.20 3.1527713 FALSE
SFH    ROSIE 270.0 571.20 3.1446932 FALSE
SJH    FULA, 271.5 402.00 4.0326154 FALSE
EMH    ABERN 273.0 312.60 1.1911376 FALSE
SJH    DELUH 273.0 402.00 4.1119299 FALSE
EMH    BRUCE 276.0 312.60 1.1240272 FALSE
SJH    WOLF, 280.0 402.00 4.1061772 FALSE
SJH    BERGS 296.0 402.00 3.8062060 FALSE
SFH    NASH, 302.0 571.20 2.8628445 FALSE
SFH    KHAN, 306.0 571.20 3.0914843 FALSE
FRK    SMUKO 322.0 296.08 1.9476877  TRUE
SFH    DELUH 333.0 571.20 3.0930637 FALSE
SJH    FRAZI 333.0 402.00 4.1636525 FALSE
EMH    DONOF 337.0 312.60 1.1003078  TRUE
SFH    FULA, 343.5 571.20 2.9226700 FALSE
SJH    PILLA 361.0 402.00 3.8357858 FALSE
SFH    CESAR 373.0 571.20 3.0936832 FALSE
TWHH   KUMPR 398.5 398.50 5.8299727  TRUE
SJH    CULLE 402.0 402.00 4.0666768  TRUE
SJH    LACRO 411.0 402.00 3.8389552  TRUE
SFH    DONIE 500.0 571.20 2.9377945 FALSE
SFH    BLUM, 678.0 571.20 2.8202060  TRUE
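
For reference, here is a minimal sketch of where the xj values in the table come from, using a made-up vector of site labels (`site` and `codes` are illustrative names, not part of my real code). It shows that xj is a numeric variable, while Site itself is a character (discrete) variable, which may matter when both end up mapped to x in the same plot.

```r
# Hypothetical site labels, not the full data
site <- c("EMH", "FRK", "SFH", "SFH", "SJH")

# factor() orders the levels alphabetically; as.numeric() returns the level codes
codes <- as.numeric(factor(site))

set.seed(1)            # seed only so the jitter is repeatable
xj <- jitter(codes)    # each code plus a small uniform random offset

print(data.frame(site, codes, xj))
print(is.numeric(xj))      # xj is continuous
print(is.character(site))  # site is discrete
```

With the default settings of `jitter` and integer codes one unit apart, each xj value stays within 0.2 of its code, which matches the spread of xj in the table above.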

When I plot without stat_summary the plot looks fine, but with stat_summary I keep getting an error message:

p <- ggplot(HECCSV,aes(x=Site,y=LOS)) + 
geom_point(data = datJit, alpha=0.8, aes(x=xj,colour=Site), size=4)+
stat_summary(fun.y = mean, fun.ymin = min, fun.ymax = max,colour = "red") +
geom_text(data = subset(datJit,out), aes(x=xj, label = Provider) ,hjust=1.1, size=4) +
labs(title="LOS by Site for Provider", y = "Median LOS (mins)" ) +
ggtitle(expression(atop("LOS by Site for Provider", atop(italic("December 2012"), ""))))
print(p)

Error message:

Error: Discrete value supplied to continuous scale

What am I doing wrong?
