This is my data frame:
Colour = c("red", "blue", "red", "blue", "yellow", "green", "red", "blue", "green", "red", "yellow", "blue")
Volume = c(46,46,57,57,57,57,99,99,99,111,111,122)
Cases = c(7,2,4,2,3,5,1,2,3,2,4,1)
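df = data.frame(Colour, Volume, Cases)
I want to sum up the Cases if the Colour is "red" OR "blue", but only if the Volume is the same. Colours that are not specified should be kept. If red and blue cannot be summed up because their Volume differs, they should be kept as well.
The result should look like this: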
Colour = c("red_or_blue","red_or_blue","yellow","green","red_or_blue","green","red","yellow","blue")
Volume = c(46,57,57,57,99,99,111,111,122)
Cases = c(9,6,3,5,3,3,2,4,1)
df_agg = data.frame(Colour, Volume, Cases)
I already found a way: create another column that assigns "red_or_blue" to the rows that are red or blue, and "x" to the remaining rows. Then I used aggregate:
df$test = ifelse(df$Colour %in% c("red", "blue"),"red_or_blue","x")
df_agg = aggregate(df$Cases, list(df$Volume, df$test), sum)
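It works, but I find it a bit cumbersome. Is there a more convenient way that skips creating the extra column? In the future I will need to sum up the Cases for red/blue and Volume 57/99, and having the extra column seems to make that trickier.
Also, I did not manage to carry over the original colour when it is neither red nor blue. I tried it this way, but it does not work: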
df$test = ifelse(df$Colour %in% c("red", "blue"),"red_or_blue",df$Colour)
Cheers, Paul
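
One possible approach, as a minimal base R sketch rather than a definitive answer: build the grouping label as a free-standing vector and pass it to aggregate through `by`, so no column is added to df. The names `colour`, `is_target`, `n_target` and `group` are made up for illustration. Two assumptions are baked in: rows are relabelled only when at least two red/blue rows share a Volume (inferred from the expected result, where the lone red at 111 and the lone blue at 122 keep their names), and the failing ifelse is caused by Colour being a factor (the data.frame default before R 4.0), which is why `as.character()` is applied first.

```r
# Data from the question
Colour <- c("red", "blue", "red", "blue", "yellow", "green",
            "red", "blue", "green", "red", "yellow", "blue")
Volume <- c(46, 46, 57, 57, 57, 57, 99, 99, 99, 111, 111, 122)
Cases  <- c(7, 2, 4, 2, 3, 5, 1, 2, 3, 2, 4, 1)
df <- data.frame(Colour, Volume, Cases)

# Guard against Colour being a factor: ifelse() drops factor attributes,
# so the "no" branch would otherwise yield level codes instead of names
colour <- as.character(df$Colour)
is_target <- colour %in% c("red", "blue")

# For every row, count the red/blue rows that share its Volume
n_target <- ave(as.integer(is_target), df$Volume, FUN = sum)

# Assumption: relabel only when there is something to merge (count > 1);
# every other row keeps its original colour
group <- ifelse(is_target & n_target > 1, "red_or_blue", colour)

# Aggregate on the label vector directly; df itself gets no extra column
df_agg <- aggregate(df["Cases"],
                    by = list(Colour = group, Volume = df$Volume),
                    FUN = sum)
print(df_agg)
```

The row order of df_agg follows aggregate's own sorting, so it may differ from the order in the expected result. To restrict the merge to particular volumes later, extending the condition to something like `is_target & n_target > 1 & df$Volume %in% c(57, 99)` should work under the same assumptions.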