R Basket analysis using arules package with unique order number but duplicate order combinations
Just learning R. I'm trying to do a basket analysis using the arules package (but I'm totally open to any other package suggestions!) to compare all possible combinations of 6 different item types being purchased.
My original data set looked like this:
OrderNo, ItemType, ItemCount
111, Health, 1
111, Leisure, 2
111, Sports, 1
222, Health, 3
333, Food, 7
333, Clothing, 1
444, Clothing, 2
444, Health, 1
444, Accessories, 2
. . .
The list goes on for about 3,000 observations.
I collapsed the data into a matrix with one row per unique order and one column per ItemType, where each cell holds the count of that ItemType in the order:
OrderNo, Accessories, Clothing, Food, Health, Leisure, Sports
111, 0, 0, 0, 1, 2, 1
222, 0, 0, 0, 3, 0, 0
333, 0, 1, 7, 0, 0, 0
444, 2, 2, 0, 1, 0, 0
. . .
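For reference, a minimal sketch of one way to do this kind of collapse in base R (not necessarily the exact code used here). It assumes the long data sits in a data frame called orders, which is a made-up name, and rebuilds only the nine sample rows shown above.

```r
# Sample rows from the original long-format data (first nine observations only)
orders <- data.frame(
  OrderNo   = c(111, 111, 111, 222, 333, 333, 444, 444, 444),
  ItemType  = c("Health", "Leisure", "Sports", "Health", "Food",
                "Clothing", "Clothing", "Health", "Accessories"),
  ItemCount = c(1, 2, 1, 3, 7, 1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per OrderNo, one column per ItemType,
# each cell holding the summed ItemCount (0 where the type is absent)
wide <- xtabs(ItemCount ~ OrderNo + ItemType, data = orders)
print(wide)

# A presence/absence version, in case only "was it bought" matters
bought <- wide > 0
```

The columns come out in alphabetical order, which matches the header of the collapsed matrix above.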
Every time I try to read in the transactions using the following command (and a million attempted variations of it):
tr <- read.transactions("dataset.csv", rm.duplicates=FALSE, format="basket", sep=",")
I get the error message: Error in asMethod(object): can not coerce list with transactions with duplicated items.
I'm assuming this is because I have 3,000 observations and inevitably certain combinations are going to show up more than once (e.g., more than one person purchasing only Clothing and nothing else: OrderNo, 0, 1, 0, 0, 0, 0). I know I could collapse the data set to counts of unique combinations, but I'm worried that if I do that, there will be no weights to show which combinations are most frequent.
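To make the "counts of unique combinations" idea concrete, here is a minimal base R sketch that labels each order by its set of item types and then tabulates the labels, so the count itself acts as the weight. The data frame name orders is made up, and order 555 is a hypothetical extra row added only so that one combination repeats.

```r
# Sample rows from the long-format data, plus one hypothetical order (555)
orders <- data.frame(
  OrderNo  = c(111, 111, 111, 222, 333, 333, 444, 444, 444, 555),
  ItemType = c("Health", "Leisure", "Sports", "Health", "Food",
               "Clothing", "Clothing", "Health", "Accessories", "Health"),
  stringsAsFactors = FALSE
)

# Build one label per order: the sorted, de-duplicated item types joined by " + "
combo <- tapply(orders$ItemType, orders$OrderNo,
                function(x) paste(sort(unique(x)), collapse = " + "))

# Count how many orders share each exact combination, most frequent first
combo_counts <- sort(table(combo), decreasing = TRUE)
print(head(combo_counts, 5))
```

This only counts orders whose whole basket is identical. It does not count a pair such as Clothing + Health that appears inside a larger basket, which is what frequent-itemset mining in arules would add.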
I thought that using format="basket" would account for different orders containing the same item combinations, but apparently that's not the case. I'm so lost. All the documentation I've read implies that this is possible but I can't find any examples or advice on how to approach the problem.
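For comparison, a sketch of the two formats as I read the arules documentation: format="basket" expects one transaction per line with the item labels separated by sep, while format="single" expects one (transaction ID, item) pair per line, which is the shape of my original long data. Under that reading, a row of the collapsed matrix such as 111, 0, 0, 0, 1, 2, 1 would be taken as a basket containing the item "0" three times, which may be what triggers the duplicated-items error. The sketch assumes a reasonably recent arules (one where read.transactions has the header argument) and writes the sample rows to a temporary file rather than using my real dataset.csv.

```r
library(arules)  # assumes the arules package is installed

# Write the sample long-format rows to a temporary CSV (stand-in for the real file)
long_file <- tempfile(fileext = ".csv")
writeLines(c(
  "OrderNo,ItemType,ItemCount",
  "111,Health,1", "111,Leisure,2", "111,Sports,1",
  "222,Health,3",
  "333,Food,7", "333,Clothing,1",
  "444,Clothing,2", "444,Health,1", "444,Accessories,2"
), long_file)

# format = "single": column 1 is the transaction ID, column 2 is the item label.
# The ItemCount column is ignored; transactions only record presence.
tr <- read.transactions(long_file, format = "single", sep = ",",
                        header = TRUE, cols = c(1, 2))

summary(tr)
inspect(tr)
```

Several orders with the same combination of item types are kept as separate transactions here, so frequent combinations still carry their weight through support.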
Any advice would be so appreciated! My head is spinning on this one.
Extra info: For my end result, I'm looking to get the top five most significant purchase combinations. I don't know if that helps.
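A rough sketch of what I mean by that end result, taking "top" to mean highest support (an assumption; lift or another measure might be a better definition of "significant"). It builds the transactions directly from the sample rows, and the support threshold of 0.01 is an arbitrary illustrative value.

```r
library(arules)  # assumes the arules package is installed

# Sample rows from the long-format data
orders <- data.frame(
  OrderNo  = c(111, 111, 111, 222, 333, 333, 444, 444, 444),
  ItemType = c("Health", "Leisure", "Sports", "Health", "Food",
               "Clothing", "Clothing", "Health", "Accessories"),
  stringsAsFactors = FALSE
)

# One character vector of item types per order, coerced to transactions
tr <- as(split(orders$ItemType, orders$OrderNo), "transactions")

# Frequent itemsets with at least two item types (minlen = 2)
sets <- eclat(tr,
              parameter = list(support = 0.01, minlen = 2),
              control   = list(verbose = FALSE))

# Keep the five itemsets with the highest support
top5 <- head(sort(sets, by = "support"), 5)
inspect(top5)
```

With only four sample orders every combination has the same support, so the ranking only becomes meaningful on the full 3,000 observations.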