Thanks to @Frank and my earlier post (which has more details), I am able to answer some questions about a dataset of people's drinking patterns at bars:

bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE
Quark’s Bar,Bender,Black Hole,1,TRUE
Quark’s Bar,Bender,water,1,FALSE
Quark’s Bar,Bender,unspecified,1,TRUE
Quark’s Bar,Homer,Black Hole,2,FALSE
Quark’s Bar,Guinan,unspecified,2,TRUE
Quark’s Bar,Guinan,Black Hole,1,TRUE
Quark’s Bar,Krusty,Black Hole,1,FALSE
Quark’s Bar,Krusty,water,2,FALSE
Quark’s Bar,Rebecca,unspecified,1,FALSE
Maz’s Tavern,Krusty,water,1,TRUE
Maz’s Tavern,Rebecca,water,1,FALSE
Maz’s Tavern,Homer,water,1,TRUE
Maz’s Tavern,Bender,water,2,FALSE

Specifically, @Frank suggested the following code:

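library(dplyr)

DF %>%
  # sort first so each person's values are concatenated in a consistent order
  arrange(drink_ordered, times_ordered, liked_it) %>%
  group_by(bar_name, person) %>%
  # one row per bar and person: drinks, drinks + counts, drinks + counts + liked_it
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep="_"),
    Ldtl = paste(Ldt, toString(liked_it), sep="_")
  ) %>%
  group_by(bar_name) %>%
  # per bar: number of distinct persons and of distinct strings in each column
  summarise_each(funs(n_distinct)) %>%
  # TRUE when every person at the bar produced the same string
  mutate_each(funs(. == 1), -person, -bar_name)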

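This produces a per-bar summary showing whether all patrons at a bar ordered the same drinks (Ld), ordered them the same number of times (Ldt), and gave the same liked_it answers (Ldtl):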

#        bar_name person    Ld   Ldt  Ldtl
#           (chr)  (int) (lgl) (lgl) (lgl)
# 1        Cheers      3  TRUE  TRUE FALSE
# 2  Moe’s Tavern      3  TRUE FALSE FALSE
# 3    Mos Eisley      3  TRUE  TRUE  TRUE
# 4   Ten Forward      3 FALSE FALSE FALSE
# 5 The Hip Joint      4 FALSE FALSE FALSE
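
In case it helps anyone on a current dplyr release: summarise_each(), funs() and mutate_each() are deprecated there. Below is a rough equivalent using across(), which I am assuming needs dplyr 1.0.0 or later. The small DF is just the Cheers and Moe's Tavern rows from the data above (apostrophe typed as plain ASCII), and the name result is my own.

```r
library(dplyr)

# Subset of the data above: Cheers and Moe's Tavern only.
DF <- data.frame(
  bar_name      = rep(c("Cheers", "Moe's Tavern"), each = 6),
  person        = c("Rebecca", "Rebecca", "Bender", "Bender", "Krusty", "Krusty",
                    "Homer", "Homer", "Guinan", "Guinan", "Rebecca", "Rebecca"),
  drink_ordered = c(rep(c("Budweiser", "Black Hole"), 3),
                    rep(c("Romulan ale", "Scotch whiskey"), 3)),
  times_ordered = c(1, 1, 1, 1, 1, 1, 2, 1, 1, 3, 2, 4),
  liked_it      = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE,
                    TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

result <- DF %>%
  arrange(drink_ordered, times_ordered, liked_it) %>%
  group_by(bar_name, person) %>%
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep = "_"),
    Ldtl = paste(Ldt, toString(liked_it), sep = "_"),
    .groups = "drop"
  ) %>%
  group_by(bar_name) %>%
  # across() skips the grouping column, so this covers person, Ld, Ldt, Ldtl
  summarise(across(everything(), n_distinct)) %>%
  # turn the distinct counts into "everyone matched" flags
  mutate(across(c(Ld, Ldt, Ldtl), ~ .x == 1))

print(result)
```

On these two bars this should give the same rows as in the table above for Cheers and Moe's Tavern.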

For this post, however, I have an additional problem: some people's drink orders are "unspecified" (in Quark's Bar), and some people ordered "water".

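  1. For "unspecified", I would like it to act as a "wildcard" drink, so that it is not counted as a different drink (if other drinks were ordered at that bar). For example, at Quark's Bar I want the result to be TRUE for everyone having ordered the same drinks. Of course, if everyone at a bar ordered only "unspecified", the result would also be TRUE.

  2. For "water", I generally want it to be ignored (e.g., because it is not an alcoholic drink!), so at first I thought I could simply use dplyr's filter() to drop the rows where the order is "water". The complication is that I want the result to be TRUE when "water" is the only thing people ordered, e.g. at Maz's Tavern. So I don't think I can simply drop the "water" rows; I want them to be considered! In other words, I don't want "water" to count unless it is the only thing ever ordered at that bar_name.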
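
To pin down the two rules, here is a rough sketch of the kind of filtering I have in mind, for the Ld column only. It relies on several assumptions of mine: that filter() evaluates all() within each bar_name group, that applying rule 1 before rule 2 is acceptable, that a person who ordered only "unspecified" can simply drop out of the comparison, and that dplyr is recent enough for the .groups argument. The small DF is a subset of the data above (apostrophes typed as plain ASCII), and the name same_drinks is made up.

```r
library(dplyr)

# Subset of the data above: Quark's Bar, Maz's Tavern and part of The Hip Joint.
DF <- data.frame(
  bar_name      = rep(c("Quark's Bar", "Maz's Tavern", "The Hip Joint"),
                      times = c(9, 4, 4)),
  person        = c("Bender", "Bender", "Bender", "Homer", "Guinan", "Guinan",
                    "Krusty", "Krusty", "Rebecca",
                    "Krusty", "Rebecca", "Homer", "Bender",
                    "Homer", "Homer", "Krusty", "Krusty"),
  drink_ordered = c("Black Hole", "water", "unspecified", "Black Hole",
                    "unspecified", "Black Hole", "Black Hole", "water",
                    "unspecified",
                    "water", "water", "water", "water",
                    "Scotch whiskey", "Corona", "Romulan ale", "Corona"),
  stringsAsFactors = FALSE
)

same_drinks <- DF %>%
  group_by(bar_name) %>%
  # Rule 1: drop "unspecified" unless it is the only drink ordered at the bar
  filter(drink_ordered != "unspecified" | all(drink_ordered == "unspecified")) %>%
  # Rule 2: of what is left, drop "water" unless it is the only drink at the bar
  filter(drink_ordered != "water" | all(drink_ordered == "water")) %>%
  group_by(bar_name, person) %>%
  # one string per person listing the distinct drinks that survived the filters
  summarise(Ld = toString(sort(unique(drink_ordered))), .groups = "drop") %>%
  group_by(bar_name) %>%
  # TRUE when every remaining person at the bar has the same drink list
  summarise(Ld = n_distinct(Ld) == 1)

print(same_drinks)
```

On this subset, Quark's Bar and Maz's Tavern come out TRUE and The Hip Joint FALSE, which is what I want. What it does not do is keep the person count (anyone whose rows are all filtered out disappears), and I have not tried extending it to Ldt and Ldtl.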

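Is there a way to conditionally (is that the right term?) handle "special" items like "water" or "unspecified"? I would prefer a dplyr-based (i.e. Hadley-verse) solution that produces a table like the one from @Frank's code above, with these two items taken into account, though anything you can come up with would be appreciated. Thanks!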
