algorithm - Frequent Itemsets & Association Rules - Apriori Algorithm

Question

I'm trying to understand the fundamentals of the Apriori (Basket) Algorithm for use in data mining.

It's best I explain the complication I'm having with an example:

Here is a transactional dataset:

t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes

The minsup for the above is 0.5 or 50%.

Taking from the above, my number of transactions is clearly 7, so for an itemset to be "frequent" it must appear in at least 4 of the 7 transactions (0.5 × 7 = 3.5, rounded up to a whole count). As such, this was my set of frequent 1-itemsets:

F1:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4

I then created my candidates for the second refinement (C2) and narrowed it down to:

F2:

{Milk, Beer} = 4

This is where I get confused: if I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? To me, the entries in F1 aren't "sets".

I am then asked to create association rules for the frequent itemsets I have just defined and calculate their "confidence" figures. I get this:

Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence

It seems superfluous to put F1's itemsets in here, as they would all have a confidence of 100% regardless and don't actually "associate" anything, which is why I am now questioning whether the itemsets in F1 are indeed "frequent".

score 2 · Accepted Answer

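Itemsets of size 1 are considered frequent if their support is high enough. But here you have to consider the minimum threshold. For example, if the minimum threshold in your example is 2, F1 will not be considered. But if the minimum threshold is 1, then you have to include it.

You can find more ideas and examples here and here.

I hope this helps.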

score 0 · Accepted Answer

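If the minimum support threshold (minsup) is 4/7, then single items belong in the list of frequent itemsets whenever they appear in at least 4 of the 7 transactions. So in your example, you should include them:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4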
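
As a quick check of those counts, here is a minimal Python sketch that recounts support for single items and for pairs built from the frequent single items. The names `support_count` and `frequent` are illustrative helpers, not part of any library, and this brute-force count covers only the first two levels of Apriori rather than the full algorithm.

```python
from itertools import combinations

# Transactions copied from the question.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
MINSUP = 0.5  # relative minimum support from the question


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent(candidates, transactions, minsup):
    """Keep the candidates whose relative support is at least `minsup`."""
    n = len(transactions)
    counts = {c: support_count(c, transactions) for c in candidates}
    return {c: k for c, k in counts.items() if k / n >= minsup}


# Level 1: every distinct item is a candidate 1-itemset.
items = sorted(set().union(*transactions))
f1 = frequent([frozenset([i]) for i in items], transactions, MINSUP)

# Level 2: candidates are pairs built only from frequent single items.
f1_items = sorted(i for s in f1 for i in s)
c2 = [frozenset(p) for p in combinations(f1_items, 2)]
f2 = frequent(c2, transactions, MINSUP)

for level in (f1, f2):
    for itemset, count in sorted(level.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

Under these assumptions, `f1` holds the four single items listed above and `f2` holds only {Milk, Beer}, so the complete list of frequent itemsets is F1 together with F2.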

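As for association rules, they have the form X ==> Y, where X and Y are disjoint itemsets, and it is usually assumed that neither X nor Y is the empty set (this is what Apriori assumes). So you need at least two items to generate an association rule.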
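
To illustrate why single-item itemsets produce no rules, here is a small Python sketch that splits each frequent itemset into a non-empty X and a non-empty Y and computes confidence as count(X ∪ Y) / count(X). The `support` dictionary is typed in by hand from the counts in the question, and `rules_from` is a hypothetical helper name, not a library function.

```python
from itertools import combinations

# Frequent itemsets and their counts, as found in the question (F1 and F2).
support = {
    frozenset({"Milk"}): 4,
    frozenset({"Chicken"}): 4,
    frozenset({"Beer"}): 5,
    frozenset({"Cheese"}): 4,
    frozenset({"Milk", "Beer"}): 4,
}


def rules_from(itemset, support):
    """Yield (X, Y, confidence) for every split of `itemset` into
    non-empty, disjoint X and Y."""
    # For a 1-itemset this range is empty, so no rule is produced.
    for size in range(1, len(itemset)):
        for x in combinations(sorted(itemset), size):
            x = frozenset(x)
            y = itemset - x
            # Every subset of a frequent itemset is itself frequent,
            # so support[x] is always present in the dictionary.
            yield x, y, support[itemset] / support[x]


rules = [r for s in support for r in rules_from(s, support)]
for x, y, conf in rules:
    print(sorted(x), "->", sorted(y), f"{conf:.0%}")
```

Only {Milk, Beer} yields rules here: Milk ==> Beer and Beer ==> Milk, with the same confidence figures as in the question. The four single items remain frequent itemsets; they simply have no way to be split into two non-empty sides.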
