我必须在 R 中执行关联规则,我在这里找到了示例
http://www.salemmarafi.com/code/market-basket-analysis-with-r/
在此示例中他们使用data(Groceries)
但他们提供了原始数据集 Groceries.csv
structure(list(chocolate = structure(c(9L, 13L, 1L, 8L, 16L,
2L, 14L, 11L, 7L, 15L, 17L, 5L, 10L, 4L, 3L, 6L, 2L, 18L, 12L
), .Label = c("bottled water", "canned beer", "chicken,citrus fruit,tropical fruit,root vegetables,whole milk,frozen fish,rollsbuns",
"chicken,pip fruit,other vegetables,whole milk,dessert,yogurt,whippedsour cream,rollsbuns,pasta,soda,waffles",
"citrus fruit,pip fruit,root vegetables,other vegetables,whole milk,cream cheese ,domestic eggs,brown bread,margarine,baking powder,waffles",
"frankfurter,citrus fruit,onions,other vegetables,whole milk,rollsbuns,sugar,soda",
"frankfurter,rollsbuns,bottled water,fruitvegetable juice,hygiene articles",
"frankfurter,sausage,butter,whippedsour cream,rollsbuns,margarine,spices",
"fruitvegetable juice", "hamburger meat,other vegetables,whole milk,curd,yogurt,rollsbuns,pastry,semi-finished bread,margarine,bottled water,fruitvegetable juice",
"meat,citrus fruit,berries,root vegetables,whole milk,soda",
"packaged fruitvegetables,whole milk,curd,yogurt,domestic eggs,brown bread,mustard,pickled vegetables,bottled water,misc. beverages",
"pickled vegetables,coffee", "root vegetables", "tropical fruit,margarine,rum",
"tropical fruit,pip fruit,onions,other vegetables,whole milk,domestic eggs,sugar,soups,tea,soda,hygiene articles,napkins",
"tropical fruit,root vegetables,herbs,whole milk,butter milk,whippedsour cream,flour,hygiene articles",
"turkey,pip fruit,salad dressing,pastry"), class = "factor")), .Names = "chocolate", class = "data.frame", row.names = c(NA,
-19L))
我加载这些数据
g=read.csv("g.csv",sep=";")
所以我必须将其转换为 arule 要求的交易
#'@importClassesFrom arules transactions
trans = as(g, "transactions")
让我们检查数据(杂货)
> str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 3 variables:
.. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
.. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
.. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
>
和我从原始 csv 转换的数据
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
.. .. ..@ p : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
.. .. ..@ Dim : int [1:2] 7011 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 7011 obs. of 3 variables:
.. ..$ labels : chr [1:7011] "tr=abrasive cleaner" "tr=abrasive cleaner,napkins" "tr=artif. sweetener" "tr=artif. sweetener,coffee" ...
.. ..$ variables: Factor w/ 1 level "tr": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ levels : Factor w/ 7011 levels "abrasive cleaner",..: 1 2 3 4 5 6 7 8 9 10 ...
..@ itemsetInfo:'data.frame': 9835 obs. of 1 variable:
.. ..$ transactionID: chr [1:9835] "1" "2" "3" "4" ...
>
我们在数据中看到(杂货)
transactions in sparse format with
9835 transactions (rows) and
169 items (columns)
在我的反式数据中
9835 transactions (rows) and
7011 items (columns)
即我从 Groceries.csv 获得 7011 列,同时在嵌入式示例中(169 列)
为什么会这样?此文件如何正确转换。我必须理解它,因为,我无法处理我的文件
我尝试找到类似的主题,但这两个帖子对我没有帮助 How to prep transaction data into basket for arules R (arules) Convert dataframe into transactions and remove NA