
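I have a list of roughly 100,000 sets of items that were ordered together. I've pasted each pair into a single column so that I can count how many times each combination occurs.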

4845   Curly Fries California Burger   1
4846   French Fries California Burger  1
4847   Hamburger California Burger     1
4848   $1 Fountain Drinks Curly Fries  1
4849   $1 Fountain Drinks Curly Fries  1
4850   California Burger Curly Fries   1
4851   Curly Fries Curly Fries         1

I tried the aggregate function, but it gave me the following error:

aggregate(t1$count,list(t1$pc), sum)
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

I also tried a variation using ddply:

ddply(t1,t1$pc,transform,occurances=sum(t1$count))

but I get this error:

Error in UseMethod("as.quoted") : 
no applicable method for 'as.quoted' applied to an object of class "c('matrix', 'list')"

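I assume I'm getting this because I'm essentially trying to "group by" a character value. Based on answers to similar questions, I also looked into tapply and recast, but to no avail.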

How can I get this combination count?

For reference, here is a sample with the items listed in separate columns (again, apologies for the formatting):

                      Var1                      Var2 Var3
2               Onion Rings               Onion Rings    1
3  Pineapple Cheddar Burger               Onion Rings    1
4               Onion Rings  Pineapple Cheddar Burger    1
5  Pineapple Cheddar Burger  Pineapple Cheddar Burger    1
5               Onion Rings               Onion Rings    1
6  Pineapple Cheddar Burger               Onion Rings    1
7               Onion Rings  Pineapple Cheddar Burger    1
8  Pineapple Cheddar Burger  Pineapple Cheddar Burger    1
9             Fountain Soda             Fountain Soda    1
10             French Fries             Fountain Soda    1

2 Answers


The table() function is helpful here:

with(t1, table(pc)) ## or equivalently table(t1$pc)

This assumes that pc is the factor variable whose occurrences you want to count. (If it isn't a factor, table() will coerce it to one.)
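A minimal, self-contained sketch of the table() approach, assuming each combination is stored as a plain character string in a column named pc. The data frame t1 below is a hypothetical stand-in built from a few rows of the question's sample. It also shows that aggregate() works once the grouping column is an atomic vector; the question's error messages suggest pc was a list there, which is an assumption, not something the question confirms.

```r
# Hypothetical stand-in for t1: one row per order, with `pc` holding the
# combined items as a plain character string (an atomic vector).
t1 <- data.frame(
  pc = c("Curly Fries California Burger",
         "French Fries California Burger",
         "$1 Fountain Drinks Curly Fries",
         "$1 Fountain Drinks Curly Fries",
         "Curly Fries Curly Fries"),
  count = 1,
  stringsAsFactors = FALSE
)

# table() counts how often each distinct string occurs.
counts <- table(t1$pc)

# Convert to a data frame and sort from most to least frequent.
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_df) <- c("pc", "n")
counts_df <- counts_df[order(-counts_df$n), ]
print(counts_df)

# aggregate() also works once the grouping variable is atomic;
# summing the `count` column gives the same totals here.
agg <- aggregate(count ~ pc, data = t1, FUN = sum)
print(agg)
```

Sorting the converted table by frequency is usually what you want with ~100,000 rows, since the most common combinations end up at the top.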

answered 2013-02-26T19:54:28.697

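Your original approach is very close to what I think you want. Combining the items into a single factor will certainly work, provided you always combine them in the same order, so that you don't end up with both "Fries, Burger" and "Burger, Fries".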

There may be a simpler way to do what you want, but I can't think of one offhand. Still, I think this does what you need:

# Let's assume your data looks like this:
> df
                       Var1                      Var2 Var3
1               Onion Rings               Onion Rings    1
2  Pineapple Cheddar Burger               Onion Rings    1
3               Onion Rings  Pineapple Cheddar Burger    1
4  Pineapple Cheddar Burger  Pineapple Cheddar Burger    1
5               Onion Rings               Onion Rings    1
6  Pineapple Cheddar Burger               Onion Rings    1
7               Onion Rings  Pineapple Cheddar Burger    1
8  Pineapple Cheddar Burger  Pineapple Cheddar Burger    1
9             Fountain Soda             Fountain Soda    1
10             French Fries             Fountain Soda    1

# Now, for each row
#     1. sort the Var1 and Var2,
#     2. combine the sorted vars, and
#     3. convert them back into a factor

df$sortcomb <- as.factor(apply(df[,1:2], 1, function(x) paste(sort(x), collapse=", ")))

table(df$sortcomb) # then use table as per normal

library(plyr)  # ddply() comes from the plyr package
ddply(df, .(sortcomb), summarize, count=length(sortcomb)) # or ddply
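For exactly two item columns, the row-by-row apply() can likely be replaced by a vectorised key built with pmin() and pmax(), which put the alphabetically smaller item first, just as sort() does. This is a sketch of a swapped-in alternative under that two-column assumption, not the method used above; the df below is a hypothetical subset of the sample data. It also sums Var3 with aggregate() instead of counting rows, which would matter if Var3 could be greater than 1.

```r
# Hypothetical data in the same shape as `df` above.
df <- data.frame(
  Var1 = c("Onion Rings", "Pineapple Cheddar Burger", "Onion Rings",
           "Pineapple Cheddar Burger", "Fountain Soda", "French Fries"),
  Var2 = c("Onion Rings", "Onion Rings", "Pineapple Cheddar Burger",
           "Pineapple Cheddar Burger", "Fountain Soda", "Fountain Soda"),
  Var3 = 1,
  stringsAsFactors = FALSE
)

# Vectorised, order-independent key: pmin()/pmax() compare the two columns
# element-wise, so "A, B" and "B, A" both become the same string.
df$sortcomb <- paste(pmin(df$Var1, df$Var2), pmax(df$Var1, df$Var2), sep = ", ")

# Count each combination with table() ...
combo_counts <- table(df$sortcomb)
print(combo_counts)

# ... or sum Var3 per combination with aggregate().
combo_sums <- aggregate(Var3 ~ sortcomb, data = df, FUN = sum)
print(combo_sums)
```

If an order can contain more than two items, the apply()/sort() version generalizes more naturally, because pmin()/pmax() only split a pair into a smaller and a larger element.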
answered 2013-02-27T01:26:50.973