r - Efficient use of R data.table and unique()

Question

Is there a more efficient query than the following?

DT[, list(length(unique(OrderNo)) ),customerID]

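The table is in LONG format and details customer ID, order number, and product line items. This means that if a customer bought more than one item in a transaction, there will be duplicate rows with the same order ID.

I am trying to work out the number of unique purchases. length() alone counts all order IDs per customer ID, including duplicates, and I only want the count of unique order numbers.

Edit:

Here is some dummy code. Ideally, the output I am looking for is that of the query using unique().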

df <- data.frame(
             customerID=as.factor(c(rep("A",3),rep("B",4))),
             product=as.factor(c(rep("widget",2),rep("otherstuff",5))),
             orderID=as.factor(c("xyz","xyz","abd","qwe","rty","yui","poi")),
             OrderDate=as.Date(c("2013-07-01","2013-07-01","2013-07-03","2013-06-01","2013-06-02","2013-06-03","2013-07-01"))
             )

library(data.table)  # provides as.data.table()
DT.eg <- as.data.table(df)
#Gives unique order counts
DT.eg[, list(orderlength = length(unique(orderID)) ),customerID]
#Gives counts of all orders by customer
DT.eg[,.SD, keyby=list(orderID, customerID)][, .N, by=customerID]

         ^
         |
  This should be .N, not .SD  ~ R.S.

score 13 · Accepted Answer

If you want to count the number of unique purchases per customer, use

 DT[, .N, keyby=list(customerID, OrderNo)][, .N, by=customerID]
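To make the two steps of that chained query easier to follow, here is a minimal sketch on a cut-down version of the question's dummy data. It assumes the order column is called orderID (as in DT.eg) rather than OrderNo; the names per_order and per_customer are only illustrative.

```r
library(data.table)

# Dummy data mirroring the question's example (orderID stands in for OrderNo)
DT.eg <- data.table(
  customerID = c("A", "A", "A", "B", "B", "B", "B"),
  orderID    = c("xyz", "xyz", "abd", "qwe", "rty", "yui", "poi")
)

# Step 1: collapse to one row per (customerID, orderID) pair.
# N here is the number of line items within each order.
per_order <- DT.eg[, .N, keyby = list(customerID, orderID)]

# Step 2: count the remaining rows per customer,
# which is the number of distinct orders for that customer.
per_customer <- per_order[, .N, by = customerID]
print(per_customer)
```

The first grouping removes the duplicated order rows, so the second .N no longer counts line items but orders.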

score 1 · Answer

As of version 1.9.6 (on CRAN 19 Sep 2015), data.table has gained the helper function uniqueN(), which is equivalent to length(unique(x)) but faster (according to the data.table NEWS).

With this,

DT.eg[, list(orderlength = length(unique(orderID)) ),customerID]

and

DT.eg[,.N, keyby=list(orderID, customerID)][, .N, by=customerID]

can be rewritten as

DT.eg[, .(orderlength = uniqueN(orderID)), customerID]

   customerID orderlength
1:          A           2
2:          B           4
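If you want to convince yourself that the three variants agree before switching, a rough check along these lines may help. This is only a sketch: DT.big, its size, and the ID formats are made up for illustration, and it assumes data.table 1.9.6 or later so that uniqueN() is available.

```r
library(data.table)

set.seed(1)
# Hypothetical larger table: 1e5 line items spread over made-up customers and orders
n <- 1e5
DT.big <- data.table(
  customerID = sample(sprintf("C%04d", 1:1000), n, replace = TRUE),
  orderID    = sample(sprintf("O%05d", 1:20000), n, replace = TRUE)
)

# Variant 1: length(unique()) per group
res_base <- DT.big[, .(orders = length(unique(orderID))), keyby = customerID]

# Variant 2: chained grouping, first per order, then per customer
res_chain <- DT.big[, .N, keyby = .(customerID, orderID)][
  , .(orders = .N), keyby = customerID]

# Variant 3: uniqueN() per group
res_uniqueN <- DT.big[, .(orders = uniqueN(orderID)), keyby = customerID]

# All three should give the same count for every customer
stopifnot(isTRUE(all.equal(res_base$orders, res_chain$orders)))
stopifnot(isTRUE(all.equal(res_base$orders, res_uniqueN$orders)))
```

Wrapping each variant in system.time() on your own data is the simplest way to see whether the speed difference matters for your table.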
