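I have a list of roughly 100,000 pairs of items that were ordered together. I have pasted each pair into a single column so that I can count how many times each combination occurs.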
4845 Curly Fries California Burger 1
4846 French Fries California Burger 1
4847 Hamburger California Burger 1
4848 $1 Fountain Drinks Curly Fries 1
4849 $1 Fountain Drinks Curly Fries 1
4850 California Burger Curly Fries 1
4851 Curly Fries Curly Fries 1
I tried the aggregate function, which gave me the following error:
aggregate(t1$count,list(t1$pc), sum)
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
I also tried a variation using ddply:
ddply(t1,t1$pc,transform,occurances=sum(t1$count))
but I get this error:
Error in UseMethod("as.quoted") :
no applicable method for 'as.quoted' applied to an object of class "c('matrix', 'list')"
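I assume I am getting this because I am essentially trying to "group by" a character value. I have also explored tapply and recast, based on answers to similar questions, but to no avail.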
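Both error messages mention a list ("'x' must be atomic", class "c('matrix', 'list')"), so it may be that t1$pc is stored as a list rather than as a plain character vector. The following is only a minimal sketch under that assumption; the small t1 built here is made-up stand-in data, not the real order data.

```r
# Hypothetical stand-in for t1: pc is deliberately a list column
# (non-atomic), the situation the error messages seem to hint at.
t1 <- data.frame(count = c(1, 1, 1, 1))
t1$pc <- list("Curly Fries California Burger",
              "French Fries California Burger",
              "Curly Fries California Burger",
              "Hamburger California Burger")

# Inspect the column type; grouping functions expect an atomic vector.
class(t1$pc)

# Flatten the list column to a character vector, then sum counts per value.
t1$pc <- unlist(t1$pc)
totals <- aggregate(count ~ pc, data = t1, FUN = sum)
totals
```

If class(t1$pc) already reports "character" or "factor" on the real data, this assumption does not hold and the cause lies elsewhere.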
How can I get this combination count?
For reference, here is a sample of the items listed separately (apologies again for the formatting):
Var1 Var2 Var3
>2 Onion Rings Onion Rings 1
>3 Pineapple Cheddar Burger Onion Rings 1
>4 Onion Rings Pineapple Cheddar Burger 1
>5 Pineapple Cheddar Burger Pineapple Cheddar Burger 1
>5 Onion Rings Onion Rings 1
>6 Pineapple Cheddar Burger Onion Rings 1
>7 Onion Rings Pineapple Cheddar Burger 1
>8 Pineapple Cheddar Burger Pineapple Cheddar Burger 1
>9 Fountain Soda Fountain Soda 1
>10 French Fries Fountain Soda 1
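To make the desired result concrete, here is a rough sketch of the kind of count I am after, using a few rows modelled on the sample above. It assumes the column names Var1, Var2 and Var3 from the sample, and it also assumes that "A with B" and "B with A" should count as the same combination, which may or may not be what is wanted. The names items, combo and combo_counts are made up for illustration.

```r
# Illustrative data modelled on the sample rows (not the full data set).
items <- data.frame(
  Var1 = c("Onion Rings", "Pineapple Cheddar Burger", "Onion Rings",
           "Pineapple Cheddar Burger", "Fountain Soda", "French Fries"),
  Var2 = c("Onion Rings", "Onion Rings", "Pineapple Cheddar Burger",
           "Pineapple Cheddar Burger", "Fountain Soda", "Fountain Soda"),
  Var3 = 1,
  stringsAsFactors = FALSE
)

# Assumption: order within a pair does not matter, so put the
# alphabetically smaller item first before building a single key.
items$combo <- paste(pmin(items$Var1, items$Var2),
                     pmax(items$Var1, items$Var2), sep = " + ")

# Sum the counts per combination and list the most frequent first.
combo_counts <- aggregate(Var3 ~ combo, data = items, FUN = sum)
combo_counts[order(-combo_counts$Var3), ]
```

If order should matter instead, pasting Var1 and Var2 directly (without pmin and pmax) would keep the two directions separate.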