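I have a data frame with three columns. Site and Provider are strings, and LOS is continuous. My full dataset has more than 1,500 rows, so I drew a seeded sample from it to make the analysis easier to follow. Using this data frame: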
HECCSV:
Site Provider LOS
SFH ASAND 259
SFH ASAND 343
SJH LEI, 203
SFH HARME 182
SJH KHAN, 303
EMH DELUH 145
SJH VISSA 317
EMH SURIA 266
SJH MAKOW 113
SJH HERNA 263
FRK SMUKO 262
EMH BRUCE 309
SFH CESAR 197
FRK SMUKO 217
SFH SHALL 200
SJH NASH, 258
SJH DOUGH 391
SJH LACRO 368
FRK RICHB 196
SJH DONIE 208
SJH MONAW 245
SJH STULL 307
SJH SCHIL 330
SFH ABERN 340
SJH DELUH 420
SJH DELUH 160
SFH KORTB 328
SJH FRAZI 150
FRK BLANK 281
SJH KRAUS 109
SJH ROSIE 279
SJH MURPH 200
SFH ENGLI 231
SFH SMUKO 205
SJH JOHNS 360
SJH DONIE 346
EMH MAURE 102
SJH MANTH 205
SJH FRAZI 289
SJOC MORAN 172
SFH CESAR 112
SJH HERNA 111
SJH LACRO 211
SJH HARME 343
SJH DIXON 89
SFH CULLE 165
SJH WILSO 239
SJH CULLE 200
SFH SMUKO 178
SFH BINDA 98
SJH ABERN 178
EMH MAURE 352
SFH BERGS 201
SJH ANDER 255
SJH HUBBA 107
SFH ASAND 1102
EMH MANTH 143
SJH DELUH 213
EMH RUVAL 258
SJH VISSA 350
SJH FRAZI 364
FRK PILLA 228
SFH WENNI 335
SFH WILSO 214
SJH CULLE 248
SJH LACRO 298
TWHH PARRI 135
SJH SURIA 234
SFH ABERN 317
FRK KRAUS 223
SJH SURIA 310
EMH GLINS 318
SJH ADAR, 308
SJH MAKOW 253
SJH MURPH 257
SFH ABERN 262
SJH STULL 514
SJH ANDER 324
SJH KHAN, 117
SJH LACRO 151
EMH PILLA 150
SJH MUELL 295
SFH RICHB 149
SFH MANTH 315
FRK HERNA 218
FRK ASAND 167
EMH DONOF 161
EMH SWART 243
SJH FRAZI 392
FRK DONIE 213
SFH SMUKO 276
SJH MAURE 531
FRK MAURE 241
SJH LACRO 127
EMH RUVAL 349
EMH DONOF 346
SJH CULLE 399
EMH ANDER 243
SJH MAKOW 175
SFH HONNO 285
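For anyone who wants to reproduce the sampling step, here is a minimal sketch of drawing a seeded 100-row sample. The `full` data frame, its values, and the seed `2012` are placeholders I made up for illustration; they are not the real data or the seed I actually used.

```r
# Hypothetical stand-in for the full dataset (the real one has 1,500+ rows)
set.seed(42)
full <- data.frame(
  Site     = sample(c("SFH", "SJH", "EMH", "FRK"), 1500, replace = TRUE),
  Provider = sample(c("ASAND", "SMUKO", "DELUH"), 1500, replace = TRUE),
  LOS      = round(runif(1500, 80, 500))
)

# Fixing the seed means the same 100 rows are drawn on every run
set.seed(2012)
HECCSV <- full[sample(nrow(full), 100), ]
```

Re-running the last two lines with the same seed should return the identical set of rows.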
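I am interested in plotting the median LOS for each provider within each site. I would also like to label the outliers. I want to use geom_point with jitter. Based on comments from others online, I created a second data frame called datJit. In the new data frame, LOS now refers to the median LOS grouped by site and provider. I added SM as the site-level cutoff (in the code it is the 0.96 quantile of the provider medians within each site), xj is the jittered x-axis position for ggplot, and out is TRUE for the rows I want to label on the plot. The code looks like this: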
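library(plyr)  # provides ddply() and the .() quoting helper

# Median LOS for each Site/Provider pair (ddply names the result column V1)
data <- ddply(HECCSV, .(Site, Provider), function(x) median(x$LOS))
# Per-site cutoff: 0.96 quantile of the provider medians
data <- ddply(data, .(Site), function(g) { g$SM <- quantile(g$V1, 0.96); g })
colnames(data) <- c("Site", "Provider", "LOS", "SM")
datJit <- data
# Jittered numeric x position derived from the site factor
datJit$xj <- jitter(as.numeric(factor(data$Site)))
# Flag rows whose median LOS is at or above the site cutoff
datJit <- ddply(datJit, .(LOS), .fun = function(g) { g$out <- g$LOS >= g$SM; g })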
The datJit data frame now looks like this:
Site Provider LOS SM xj out
EMH MANTH 36.0 312.60 0.8989241 FALSE
EMH ANDER 62.0 312.60 1.1421376 FALSE
SJH MAKOW 92.0 402.00 4.1620511 FALSE
FRK BANIU 101.0 296.08 1.8130307 FALSE
EMH HARME 104.0 312.60 0.9778059 FALSE
SJH SMUKO 110.0 402.00 4.0529616 FALSE
SFH ARBIT 117.5 571.20 2.9281353 FALSE
SJH DIXON 122.0 402.00 4.0077163 FALSE
SFH SHALL 124.0 571.20 2.8466912 FALSE
SFH FELIC 135.0 571.20 3.0444518 FALSE
EMH DELUH 145.0 312.60 1.0192006 FALSE
EMH PILLA 150.0 312.60 0.8234848 FALSE
SJOC SCERP 151.0 206.68 5.0967039 FALSE
SFH ADAR, 155.0 571.20 3.1976121 FALSE
SJH STULL 159.5 402.00 3.8986343 FALSE
SJH DONIE 165.0 402.00 4.1175304 FALSE
EMH STULL 167.0 312.60 0.8981766 FALSE
FRK SURIA 177.0 296.08 1.8701017 FALSE
EMH ROSIE 181.0 312.60 0.8141137 FALSE
FRK GEDAN 182.0 296.08 1.8771275 FALSE
FRK HONNO 186.0 296.08 2.1625728 FALSE
SJH HUBBA 191.5 402.00 3.9899187 FALSE
SJH SURIA 199.5 402.00 3.9287032 FALSE
FRK DIXON 207.0 296.08 1.8887039 FALSE
SFH CHESK 209.0 571.20 2.9543299 FALSE
SJOC DONIE 209.0 206.68 5.0137046 TRUE
SFH HICKS 210.0 571.20 2.8389316 FALSE
FRK DONIE 213.0 296.08 2.1347646 FALSE
FRK HERNA 218.0 296.08 2.1701933 FALSE
SJH WENNI 219.0 402.00 4.0968437 FALSE
SJH HARME 220.0 402.00 4.1628144 FALSE
SFH ABERN 221.0 571.20 2.9650992 FALSE
SJH GLINS 222.5 402.00 4.1643052 FALSE
SJH HERNA 228.0 402.00 4.1503148 FALSE
SJH KORTB 231.0 402.00 3.8981447 FALSE
SJH MURPH 237.0 402.00 4.0250026 FALSE
SJH WILSO 239.0 402.00 3.9719906 FALSE
FRK MAURE 241.0 296.08 1.8193441 FALSE
SJH FELIC 243.0 402.00 3.8967263 FALSE
SJH ANDER 247.0 402.00 3.8460451 FALSE
EMH RUVAL 247.5 312.60 1.0059763 FALSE
SJH MANTH 254.5 402.00 4.1880197 FALSE
SJH CLINE 260.0 402.00 4.0725797 FALSE
SFH MANTH 262.0 571.20 3.1527713 FALSE
SFH ROSIE 270.0 571.20 3.1446932 FALSE
SJH FULA, 271.5 402.00 4.0326154 FALSE
EMH ABERN 273.0 312.60 1.1911376 FALSE
SJH DELUH 273.0 402.00 4.1119299 FALSE
EMH BRUCE 276.0 312.60 1.1240272 FALSE
SJH WOLF, 280.0 402.00 4.1061772 FALSE
SJH BERGS 296.0 402.00 3.8062060 FALSE
SFH NASH, 302.0 571.20 2.8628445 FALSE
SFH KHAN, 306.0 571.20 3.0914843 FALSE
FRK SMUKO 322.0 296.08 1.9476877 TRUE
SFH DELUH 333.0 571.20 3.0930637 FALSE
SJH FRAZI 333.0 402.00 4.1636525 FALSE
EMH DONOF 337.0 312.60 1.1003078 TRUE
SFH FULA, 343.5 571.20 2.9226700 FALSE
SJH PILLA 361.0 402.00 3.8357858 FALSE
SFH CESAR 373.0 571.20 3.0936832 FALSE
TWHH KUMPR 398.5 398.50 5.8299727 TRUE
SJH CULLE 402.0 402.00 4.0666768 TRUE
SJH LACRO 411.0 402.00 3.8389552 TRUE
SFH DONIE 500.0 571.20 2.9377945 FALSE
SFH BLUM, 678.0 571.20 2.8202060 TRUE
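To make the construction of these columns easier to check, here is a rough base-R sketch of the same steps (median per Site/Provider, per-site 0.96 quantile, jittered x, outlier flag) on a tiny made-up data frame. The `toy` data and the `med` name are hypothetical, and this uses `aggregate()` and `ave()` instead of my actual ddply code, so treat it as an illustration only.

```r
# Toy data (hypothetical values, not the real HECCSV sample)
toy <- data.frame(
  Site     = c("A", "A", "A", "A", "B", "B", "B"),
  Provider = c("P1", "P1", "P2", "P3", "P1", "P4", "P4"),
  LOS      = c(100, 200, 300, 900, 150, 250, 350),
  stringsAsFactors = FALSE
)

# Median LOS for each Site/Provider pair
med <- aggregate(LOS ~ Site + Provider, data = toy, FUN = median)

# 0.96 quantile of those medians within each site, repeated on every row
med$SM <- ave(med$LOS, med$Site, FUN = function(v) quantile(v, 0.96))

# Jittered numeric x position derived from the site factor
set.seed(1)
med$xj <- jitter(as.numeric(factor(med$Site)))

# Flag providers whose median is at or above their site's cutoff
med$out <- med$LOS >= med$SM
```

Because SM is already on every row, the flag here is a plain vectorised comparison and does not need a second grouping step.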
When I plot without stat_summary the plot looks fine, but I keep getting an error message as soon as I add stat_summary:
p <- ggplot(HECCSV,aes(x=Site,y=LOS)) +
geom_point(data = datJit, alpha=0.8, aes(x=xj,colour=Site), size=4)+
stat_summary(fun.y = mean, fun.ymin = min, fun.ymax = max,colour = "red") +
geom_text(data = subset(datJit,out), aes(x=xj, label = Provider) ,hjust=1.1, size=4) +
labs(title="LOS by Site for Provider", y = "Median LOS (mins)" ) +
ggtitle(expression(atop("LOS by Site for Provider", atop(italic("December 2012"), ""))))
print(p)
Error message:
Error: Discrete value supplied to continuous scale
What am I doing wrong?
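One possible cause, offered as a guess rather than a confirmed diagnosis: the base plot maps x to the discrete Site column, while geom_point maps x to the numeric xj, so the layers ask for two different kinds of x scale. The sketch below tries using a numeric x in every layer and restoring the site names with scale_x_continuous. The data frames `raw` and `jit`, the `xs` column, and the `sites` vector are hypothetical stand-ins, and the `fun`, `fun.min`, and `fun.max` arguments assume ggplot2 3.3.0 or later (older versions used `fun.y`, `fun.ymin`, `fun.ymax` as in the code above).

```r
library(ggplot2)

# Toy stand-ins (hypothetical values) for HECCSV and datJit
raw <- data.frame(
  Site = c("A", "A", "A", "B", "B", "B"),
  LOS  = c(100, 200, 300, 150, 250, 350)
)
sites <- sort(unique(raw$Site))
raw$xs <- match(raw$Site, sites)   # numeric site position: A = 1, B = 2

jit <- data.frame(
  Site     = c("A", "A", "B"),
  Provider = c("P1", "P2", "P3"),
  LOS      = c(150, 300, 250),
  out      = c(FALSE, TRUE, FALSE)
)
set.seed(1)
jit$xj <- jitter(match(jit$Site, sites))

# Every layer now uses a numeric x, so only a continuous x scale is needed
p <- ggplot(raw, aes(x = xs, y = LOS)) +
  geom_point(data = jit, aes(x = xj, colour = Site), alpha = 0.8, size = 4) +
  stat_summary(fun = mean, fun.min = min, fun.max = max, colour = "red") +
  geom_text(data = subset(jit, out), aes(x = xj, label = Provider),
            hjust = 1.1, size = 4) +
  scale_x_continuous(breaks = seq_along(sites), labels = sites) +
  labs(x = "Site", y = "Median LOS (mins)")
```

Calling `print(p)` draws the plot. Whether the same change resolves the error on the real data is something I have not verified.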