r - How to create a summary data frame with all possibilities in R

Question

Suppose I have vectors of the possible hours and the possible items:

possible.items = c(12,13,14,15,16)
possible.hours = 0:23

And some data about customers who bought those items, along with the hour of each purchase:

frame = data.frame(id=101:105, hour=c(0,0,0,1,1), item=c(12,14,12,12,15))

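How would I create a summary data frame that has one row for every possible hour and item combination, filled in with the number of matching rows in my dataset?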

I know how to create a summary data frame, but not one that includes rows that are absent from my original dataset "frame":

summary = aggregate(id~hour+item, data=frame, FUN=length)

I have also seen a way to get all possible combinations:

poss = merge(data.frame(hour=possible.hours), data.frame(item=possible.items), all=TRUE)

I'm not sure how to combine the two, or whether I'm even on the right track.

I would like to end up with a data frame that looks like this:

hour item count
   0   12     2
   0   13     0
   0   14     1
   0   15     0
   0   16     0
   1   12     1
...
  23   16     0

score 3 · Accepted Answer

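You're almost there. Merging on hour and item gives you what you want.

With poss and summary defined as in your question:

# Merge the counts onto the full grid; combinations absent from summary get NA
result <- merge(poss, summary, by=c('hour','item'), all=TRUE)
# The aggregated column is still named 'id', so rename it
names(result)[3] <- 'count'
# Combinations with no purchases get a count of 0
result$count[is.na(result$count)] <- 0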

> head(result)
  hour item count
1    0   12     2
2    0   13     0
3    0   14     1
4    0   15     0
5    0   16     0
6    1   12     1

As suggested in the comments (and in Brandon's answer), expand.grid is the appropriate way to generate all combinations:

poss <- expand.grid(list(hour=0:23, item=12:16))
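
Put together, the steps above can be sketched end to end as follows. This is a minimal sketch that reuses the data from the question; the name `counts` is my own (it stands in for `summary`, to avoid masking the base function of that name), and `all.x=TRUE` is used in place of `all=TRUE`, which should be equivalent here because the grid already contains every combination.

```r
# Data from the question
frame <- data.frame(id = 101:105, hour = c(0, 0, 0, 1, 1), item = c(12, 14, 12, 12, 15))

# Every hour/item combination (24 hours x 5 items)
poss <- expand.grid(list(hour = 0:23, item = 12:16))

# Count rows per observed hour/item pair; the count column is named 'id'
counts <- aggregate(id ~ hour + item, data = frame, FUN = length)

# Keep every row of the grid; pairs with no purchases come back as NA
result <- merge(poss, counts, by = c("hour", "item"), all.x = TRUE)
names(result)[names(result) == "id"] <- "count"
result$count[is.na(result$count)] <- 0

head(result)
```

Renaming the column by name rather than by position is a small design choice: it keeps working if the column order ever changes.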

score 1 · Accepted Answer

Here is how I would do it with plyr:

require(plyr)
purchases <- data.frame(id = 101:105, hour = c(0,0,0,1,1), item = c(12,14,12,12,15))
results.table <- merge(expand.grid(list(hour = 0:23, item = 12:16)), purchases, by = c('hour', 'item'), all = TRUE)
summary.table <- ddply(results.table, c("hour", "item"), summarise, count = length(na.omit(id)))

This way you don't need to create the possible.* vectors and the summary table first, which saves a few steps.
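
For comparison, a base R variant that skips the merge entirely is sketched below. It is not one of the approaches proposed in the answers: it assumes that converting hour and item to factors with explicit levels makes table() report zero counts for unobserved combinations. The names `tab` and `summary.table2` are illustrative only.

```r
# Data from the question
purchases <- data.frame(id = 101:105, hour = c(0, 0, 0, 1, 1), item = c(12, 14, 12, 12, 15))

# Factors with explicit levels force table() to include unobserved combinations
tab <- table(hour = factor(purchases$hour, levels = 0:23),
             item = factor(purchases$item, levels = 12:16))

# Flatten the contingency table into one row per hour/item pair
summary.table2 <- as.data.frame(tab, responseName = "count", stringsAsFactors = FALSE)

# The level labels come back as character strings; convert them to integers
summary.table2$hour <- as.integer(summary.table2$hour)
summary.table2$item <- as.integer(summary.table2$item)

# Order by hour, then item, to match the layout requested in the question
summary.table2 <- summary.table2[order(summary.table2$hour, summary.table2$item), ]
```

The trade-off is that the possible values must be known up front and passed as factor levels, which is the case in the question.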
