Problem

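I can't work out how to convert a list into transactions for further processing with the apriori algorithm. I have a synthetic example that works, while the real one (well, a subset of the Foodmart database) does not; at the system level they look the same to me. Please help me convert the list into a transactions object.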

System setup

> version
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          0.2                         
year           2013                        
month          09                          
day            25                          
svn rev        63987                       
language       R                           
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing        

Code to reproduce

Code that works

> a_list <- list(
    c("a","b","c"),
    c("a","b"),
    c("a","b","d"),
    c("c","e"),
    c("c","e"),
    c("a","b","d","e")
)

> a_trans <- as(a_list,"transactions")

> summary(a_trans)
transactions as itemMatrix in sparse format with
6 rows (elements/itemsets/transactions) and
5 columns (items) and a density of 0.5333333 
... and so on ...
2      b
3      c

> a_rules <- apriori(a_trans)

parameter specification:
confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
... and so on ...
writing ... [17 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Code that does not work

> b_list <- list(
    c("PigTail Frozen Pepperoni Pizza","Bird Call Childrens Cold Remedy","Steady Silky Smooth Hair Conditioner","CDR Regular Coffee"),
    c("Horatio Graham Crackers","Excellent Apple Drink","Blue Medal Small Eggs","Cormorant Copper Cleaner","High Quality Copper Cleaner","Fast Apple Fruit Roll"),
    c("Toucan Canned Mixed Fruit","Landslide Salt","Gorilla Sour Cream","Hermanos Firm Tofu"),
    c("Swell Canned Mixed Fruit","Washington Diet Soda","Super Apple Jam","Plato Strawberry Preserves","Steady Whitening Toothpast","Steady Whitening Toothpast","Better Beef Soup","Hermanos Squash","Carrington Frozen Cheese Pizza","Fort West Fondue Mix","Best Choice Mini Donuts","Cormorant Copper Pot Scrubber","Ebony Cantelope","Denny D-Size Batteries","Akron Eyeglass Screwdriver"),
    c("Big Time Ice Cream Sandwich","Musial Mints","Portsmouth Imported Beer","CDR Vegetable Oil","Just Right Rice Soup","Carrington Frozen Peas","High Quality 100 Watt Lightbulb","Fort West Dried Dates"),
    c("Consolidated Tartar Control Toothpaste","Plato Tomato Sauce","Quick Seasoned Hamburger")
)

> b_trans <- as(b_list,"transactions")
Error in asMethod(object) : 
    can not coerce list with transactions with duplicated items

> summary(b_trans)
Error in summary(b_trans) : 
   error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'b_trans' not found

Interesting thing

> duplicated(a_list)
[1] FALSE FALSE FALSE FALSE  TRUE FALSE

> duplicated(b_list)
[1] FALSE FALSE FALSE FALSE FALSE FALSE

Any ideas why this bizarre (WTF) thing happens?

1 Answer

As joran and DWin mentioned:

  • The elements of every character vector in a_list are unique.
  • One of the vectors in b_list contains a duplicate.
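
This distinction can be checked directly. `duplicated()` applied to a list compares whole list elements with each other, which is why it flags the repeated `c("c","e")` vector in a_list and nothing in b_list; it does not look inside each vector. A minimal base R sketch, using a small made-up list (`demo_list` is hypothetical, not the Foodmart data), that checks for repeats within each vector instead:

```r
# Hypothetical toy data: the first vector repeats "b".
demo_list <- list(
    c("a", "b", "b", "c"),
    c("a", "b"),
    c("c", "e"),
    c("c", "e")
)

# duplicated() on the list compares whole vectors with each other.
dup_between <- duplicated(demo_list)

# anyDuplicated() returns 0 when a vector has no repeated element,
# otherwise the index of the first repeat.
dup_within <- vapply(demo_list, anyDuplicated, integer(1)) > 0

# Indices of the vectors that contain a repeated item.
offending <- which(dup_within)
```

Running the same `vapply()` call on b_list should point at the vector that needs cleaning before the coercion to transactions.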

Here is how it looks. If I add a second "b" to the first vector, giving a_list2:

> a_list2 <- list(
    c("a","b","b","c"),
    c("a","b"),
    c("a","b","d"),
    c("c","e"),
    c("c","e"),
    c("a","b","d","e")
)

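I get an error on the following attempt to convert the data. (Note that the call misspells the class name as "transaction", so the message shown reports the unknown class; with "transactions" the coercion should fail with the same duplicated-items error as for b_list.)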

> a_trans2 <- as(a_list2,"transaction")
Error in as(a_list2, "transaction") : 
   no method or default for coercing “list” to “transaction”

It turns out that b_list mentions "Steady Whitening Toothpast" twice in its fourth vector. Removing this duplicate manually (giving b_list2) solved the problem.

> b_trans2 <- as(b_list2,"transactions")
> summary(b_trans2)
transactions as itemMatrix in sparse format with
6 rows (elements/itemsets/transactions) and
... and so on ...
2    Best Choice Mini Donuts
3           Better Beef Soup

As for a solution for processing the real data, the following code runs without errors.

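# Group the product names by transaction id.
aggrData <- split(selData$product_name,selData$transaction_id)

# Drop items repeated within a transaction and convert to character.
listData <- list()
for (i in seq_along(aggrData)) {
    listData[[i]] <- as.character(aggrData[[i]][!duplicated(aggrData[[i]])])
}

trnsData <- as(listData,"transactions")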
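
The same de-duplication can probably be written without an explicit loop. The sketch below is one possible variant, not the answer's own code: the `selData` data frame is a hypothetical stand-in built only so the example runs, with column names taken from the code above.

```r
# Hypothetical stand-in for selData; column names follow the code above.
selData <- data.frame(
    transaction_id = c(1, 1, 1, 2, 2, 3),
    product_name   = c("Musial Mints", "Landslide Salt", "Musial Mints",
                       "Landslide Salt", "Better Beef Soup", "Musial Mints"),
    stringsAsFactors = TRUE
)

# Group product names by transaction, then drop repeats within each group.
aggrData <- split(as.character(selData$product_name), selData$transaction_id)
listData <- lapply(aggrData, unique)

# With arules loaded, the result could then be coerced as in the answer:
# trnsData <- as(listData, "transactions")
```

Unlike the loop, `lapply()` keeps the transaction ids as the names of the list elements.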

However, neither the following line nor attempts with other parameters produce any rules.

> rules <- apriori(trnsData)

parameter specification:
... and so on ...
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

That, however, is a completely different story.
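
One thing that may be worth checking, offered as a guess rather than a diagnosis: unless told otherwise, `apriori` in arules uses a minimum support of 0.1 and a minimum confidence of 0.8, and in a large basket data set few products appear in 10% of all transactions. The base R sketch below uses hypothetical toy baskets (not the Foodmart data) to show how the relative support of each item could be inspected before choosing a threshold.

```r
# Hypothetical de-duplicated baskets; not the Foodmart data.
listData <- list(
    c("Musial Mints", "Landslide Salt"),
    c("Landslide Salt", "Better Beef Soup"),
    c("Musial Mints"),
    c("Landslide Salt", "Musial Mints")
)

# Count the baskets containing each item (items are unique within a basket).
item_counts <- table(unlist(listData))

# Relative support: share of baskets that contain the item.
item_support <- sort(item_counts / length(listData), decreasing = TRUE)

# Items that clear a given minimum support (0.1 is the arules default).
min_support <- 0.1
frequent_items <- names(item_support)[item_support >= min_support]
```

If most items fall below the threshold on the real data, a call such as `apriori(trnsData, parameter = list(support = 0.001, confidence = 0.5))` might be worth trying; the values here are illustrative only.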

Answered 2013-12-01T09:19:54.370