I am trying to generate all k-itemsets for the Apriori algorithm, and I am following this pseudocode:
L1 = {frequent items};
for (k = 2; Lk-1 != ∅; k++) do begin
    Ck = candidates generated from Lk-1 (that is: cartesian product Lk-1 x Lk-1 and
         eliminating any k-1 size itemset that is not frequent);
    for each transaction t in database do
        increment the count of all candidates in Ck that are contained in t
    Lk = candidates in Ck with min_sup
end
return ∪k Lk;
Here is my code so far:
import Data.List (nub)

-- d transactions, threshold
kItemSets d thresh = kItemSets' 2 $ frequentItems d thresh
  where
    kItemSets' _ [] = [[]]
    kItemSets' k t = ck ++ (kItemSets' (k+1) ck)
      where
        -- those k-length sets that meet the threshold of being a subset of the transactions in d
        ck = filter (\x -> countSubsets x d >= thresh) $ combinations k t
-- length n combinations that can be made from xs
combinations 0 _ = [[]]
combinations _ [] = []
combinations n xs@(y:ys)
  | n < 0     = []
  | otherwise = case drop (n-1) xs of
      [ ] -> []
      [_] -> [xs]
      _   -> [y:c | c <- combinations (n-1) ys]
               ++ combinations n ys
-- those items with frequency of at least o in the dataset
frequentItems xs o = [y | y <- nub cs, x <- [count y cs], x >= o]
  where cs = concat xs
isSubset a b = not $ any (`notElem` b) a
-- Count how many times the list y appears as a subset of a list of lists xs
countSubsets y xs = length $ filter (isSubset y) xs
count :: Eq a => a -> [a] -> Int
count _ [] = 0
count x (y:ys)
  | x == y    = 1 + count x ys
  | otherwise = count x ys
transactions = [["Butter", "Biscuits", "Cream", "Newspaper", "Bread", "Chocolate"],
                ["Cream", "Newspaper", "Tea", "Oil", "Chocolate"],
                ["Chocolate", "Cereal", "Bread"],
                ["Chocolate", "Flour", "Biscuits", "Newspaper"],
                ["Chocolate", "Biscuits", "Newspaper"]]
Loading this in GHCi fails with:
Occurs check: cannot construct the infinite type: a0 = [a0]
  Expected type: [a0]
    Actual type: [[a0]]
  In the second argument of kItemSets', namely `ck'
  In the second argument of `(++)', namely `(kItemSets' (k + 1) ck)'
Failed, modules loaded: none.
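Running the filtering step by hand for k = 2 does work:
*Main> mapM_ print $ filter (\x->(countSubsets x transactions ) >= 2 ) $ combinations 2 $ frequentItems transactions 2
The output is correct: it lists exactly the 2-itemsets that meet the occurrence threshold in the transaction set. But I also need the 3-itemsets, which are
[["Biscuits", "Chocolate", "Newspaper"],
 ["Chocolate", "Cream", "Newspaper"]]
and I want to append them to the list of 2-itemsets. How would I change my current code to achieve this? I know the 3-itemsets can be built from the 2-itemsets, but I don't know how to go about it.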
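For reference, here is a rough sketch of how the level-wise loop from the pseudocode might look when each level is built from the previous one rather than from the single items. It is only one possible reading, not a definitive implementation. The names `support`, `frequentSingles`, `candidates` and `aprioriFrom2` are made up for this sketch and do not appear in the code above. It also assumes every itemset is kept as a sorted list, and it joins two (k-1)-itemsets only when they share their first k-2 items, which is a common shortcut for the full cartesian product.

```haskell
import Data.List (nub, sort)

type Item    = String
type Itemset = [Item]   -- assumption: always sorted and duplicate-free

-- Number of transactions that contain every item of the itemset.
support :: [[Item]] -> Itemset -> Int
support db s = length (filter (\t -> all (`elem` t) s) db)

-- L1: the frequent single items, each wrapped as a 1-itemset.
frequentSingles :: [[Item]] -> Int -> [Itemset]
frequentSingles db minSup =
  [ [i] | i <- sort (nub (concat db)), support db [i] >= minSup ]

-- Ck from L(k-1): join two (k-1)-itemsets that agree on all but their
-- last item, then drop any candidate with an infrequent (k-1)-subset.
candidates :: [Itemset] -> [Itemset]
candidates prev =
  [ c
  | a <- prev, b <- prev
  , init a == init b, last a < last b
  , let c = a ++ [last b]
  , all (`elem` prev) (dropOne c)
  ]
  where
    -- every sublist obtained by deleting exactly one element
    dropOne xs = [ take i xs ++ drop (i + 1) xs | i <- [0 .. length xs - 1] ]

-- Concatenation of L2, L3, ... up to the first empty level.
aprioriFrom2 :: [[Item]] -> Int -> [Itemset]
aprioriFrom2 db minSup = concat (takeWhile (not . null) levels)
  where
    levels    = tail (iterate next (frequentSingles db minSup))
    next prev = filter (\c -> support db c >= minSup) (candidates prev)
```

The main difference from `kItemSets'` is that every level has the same type, a list of itemsets, so the result of one level can be passed straight into the next. With the `transactions` list from above and a threshold of 2, `aprioriFrom2 transactions 2` should give the six frequent 2-itemsets followed by the two 3-itemsets listed in the question.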