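I am trying to use the arulesSequences package in R. I have run into a problem that many people seem to hit but that has no good answer: going from a data frame or matrix to the transactions data type.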

As the documentation clearly describes, I have managed this for arules:

a_df3 <- data.frame(TID = c(1,1,2,2,2,3), item=c("a","b","a","b","c", "b"))
a_df3
trans4 <- as(split(a_df3[,"item"], a_df3[,"TID"]), "transactions")

This works fine. However, if I try to do the same with a 3-column data frame, everything falls apart:

a_df4<-data.frame(SEQUENCEID=c("1","1","1","2","2","3","3"),
                  EVENTID=c("1","2","3","1","2","1","2"),
                  ITEM=c("a","b","a","c","a","a","b"))
a_df4
   SEQUENCEID EVENTID ITEM
1    1         1      a
2    1         2      b
3    1         3      a
4    2         1      c
5    2         2      a
6    3         1      a
7    3         2      b

Yes, there are repeated items, but that is exactly the point, isn't it? (Finding frequent sequences.)

So now I coerce it like this:

seqt<-as(split(a_df4[,"ITEM"],a_df4[,"SEQUENCEID"],a_df4[,"EVENTID"]),"transactions")

I get:

Error in asMethod(object) : 
   can not coerce list with transactions with duplicated items
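
For reference, the likely source of this error can be reproduced in base R alone. split() has the signature split(x, f, drop = FALSE, ...), so a third positional argument lands in `drop` rather than acting as a second grouping variable. Grouping ITEM by SEQUENCEID alone puts "a" twice into sequence 1, which is the kind of duplicate the coercion rejects. The sketch below assumes the a_df4 data frame shown above; the names by_seq, has_dup and by_event are illustrative only.

```r
# Rebuild the three-column data frame from the question.
a_df4 <- data.frame(
  SEQUENCEID = c("1", "1", "1", "2", "2", "3", "3"),
  EVENTID    = c("1", "2", "3", "1", "2", "1", "2"),
  ITEM       = c("a", "b", "a", "c", "a", "a", "b")
)

# Grouping by SEQUENCEID only: sequence "1" collects a, b, a.
by_seq  <- split(a_df4$ITEM, a_df4$SEQUENCEID)
has_dup <- sapply(by_seq, function(items) any(duplicated(items)))
print(has_dup)

# Grouping by both keys needs a list of factors as `f`;
# drop = TRUE discards (sequence, event) pairs that never occur.
by_event <- split(a_df4$ITEM,
                  list(a_df4$SEQUENCEID, a_df4$EVENTID),
                  drop = TRUE)
print(length(by_event))
```

With one group per (SEQUENCEID, EVENTID) pair, no group in this data contains a repeated item.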

I have been trying to get past this simple obstacle by:

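  1. Changing the order of the split
  2. Converting everything to factors
  3. Converting everything to a matrix
  4. Feeding the data frame as is directly into the arules functions
  5. Exporting to .txt and importing with read.transactions
  6. Exporting to .txt and importing as "basket"
  7. Trying the "solutions" here, here, and here (is read_baskets a function?)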

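Every attempt ends either in the error above or, when there is no error, in a transactions object with two columns, which arulesSequences of course cannot read because it needs three: SEQUENCE-ID, EVENT-ID, and ITEMS.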

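I don't think my data structure could be any clearer. The sequence ID is the customer number, the event ID is the purchase number, and the item is, well, the item.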

Any help is appreciated, including the structure that as() expects to see in order to perform the coercion correctly.

3 Answers

Try this:

# a_df4 is the three-column data frame from the question
trans4 <- as(a_df4[,"ITEM"], "transactions")
trans4@itemsetInfo$sequenceID = a_df4$SEQUENCEID
trans4@itemsetInfo$eventID = a_df4$EVENTID

transSeq = as(trans4, "timedsequences")
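
A more explicit variant of the same idea, as a sketch only: build one transaction per (SEQUENCEID, EVENTID) pair and attach the two IDs through the transactionInfo() accessor instead of writing to the slot directly. It assumes the arules package is installed and uses integer IDs; the names key, baskets, first and trans are illustrative and not taken from the thread.

```r
library(arules)

# The question's data, with integer IDs (an assumption made for this sketch).
a_df4 <- data.frame(
  SEQUENCEID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  EVENTID    = c(1L, 2L, 3L, 1L, 2L, 1L, 2L),
  ITEM       = c("a", "b", "a", "c", "a", "a", "b")
)

# Keep rows ordered by sequence, then by event.
a_df4 <- a_df4[order(a_df4$SEQUENCEID, a_df4$EVENTID), ]

# One basket per (sequence, event) pair; factor levels follow row order
# so the baskets stay aligned with the IDs extracted below.
key     <- paste(a_df4$SEQUENCEID, a_df4$EVENTID, sep = "_")
key     <- factor(key, levels = unique(key))
baskets <- split(a_df4$ITEM, key)

trans <- as(baskets, "transactions")

# Attach the IDs of the first row of each basket.
first <- !duplicated(key)
transactionInfo(trans)$sequenceID <- a_df4$SEQUENCEID[first]
transactionInfo(trans)$eventID    <- a_df4$EVENTID[first]

inspect(trans)
```

An object shaped like this (transactions carrying sequenceID and eventID, ordered by both) is what arulesSequences::cspade() is documented to take, for example cspade(trans, parameter = list(support = 0.5)); that call is not run here.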
answered 2017-06-20T22:13:50.260

arules treats transactions as sets, not as sequences.

It can detect frequent itemsets but probably not sequences.

Checking for duplicates is a safeguard against using it incorrectly: arules ignores multiplicity and order, so the information carried by repeated items of the same kind would be lost.

The transactions class represents transaction data used for mining itemsets or rules. It is a direct extension of class itemMatrix to store a binary incidence matrix, item labels, and optionally transaction IDs and user IDs.

(from the documentation, emphasis added)
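
To make the point about sets concrete, here is a small base R sketch (no arules needed) that mimics one row of a binary incidence matrix. The helper to_incidence and the two example sequences are made up for illustration.

```r
# Item universe for the toy example.
items <- c("a", "b", "c")

# One row of a binary incidence matrix: is each item present at all?
to_incidence <- function(x) setNames(items %in% x, items)

seq_aba <- c("a", "b", "a")  # "a" occurs twice, order a-b-a
seq_ba  <- c("b", "a")       # different order, no repeat

# Both collapse to the same row, so order and multiplicity are gone.
same_row <- identical(to_incidence(seq_aba), to_incidence(seq_ba))
print(to_incidence(seq_aba))
print(same_row)
```

Both inputs map to a = TRUE, b = TRUE, c = FALSE, which is why a set-based representation cannot distinguish them.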

answered 2015-02-24T08:44:03.483

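It has been a while since this question was asked, but I will try to answer anyway. The error seems to occur because there are identical records of the following kind: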

  SEQUENCEID EVENTID ITEM
1    1         1      a
3    1         1      a
4    2         1      c 

If you reduce the data to distinct records before splitting and converting to transactions, that may solve the problem.
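
A minimal base R sketch of that check, assuming a data frame with the same three columns; dup_df is a made-up example containing one fully repeated row. Note that unique() only removes rows that are identical in every column, so items that repeat across different events, as in the question's a_df4, are left untouched.

```r
# Made-up data with one fully duplicated record (rows 1 and 2).
dup_df <- data.frame(
  SEQUENCEID = c(1, 1, 2),
  EVENTID    = c(1, 1, 1),
  ITEM       = c("a", "a", "c")
)

# Flag and drop rows that repeat an earlier row in every column.
print(duplicated(dup_df))
distinct_df <- unique(dup_df)
print(distinct_df)

# The question's data has no fully identical rows, so nothing is dropped.
a_df4 <- data.frame(
  SEQUENCEID = c("1", "1", "1", "2", "2", "3", "3"),
  EVENTID    = c("1", "2", "3", "1", "2", "1", "2"),
  ITEM       = c("a", "b", "a", "c", "a", "a", "b")
)
rows_removed <- nrow(a_df4) - nrow(unique(a_df4))
print(rows_removed)
```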

answered 2015-08-27T18:01:48.453