Questions tagged [apriori]

ruby - Find all pairs of frequent itemsets that differ only in the last item

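I'm trying to implement the Apriori algorithm and am having trouble writing the method that generates candidate itemsets. Here is a screenshot of that function.

The main problem is lines 2-5. I don't know how to obtain f1 and f2. f1 and f2 are arrays that differ only in their last item, and the last item of f1 is smaller than the last item of f2.

Does anyone know how to write this in Ruby?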
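
The screenshot is not available here, so the following is only a minimal sketch of the join step as the question describes it: pair up itemsets that agree on everything except the last item, with f1's last item smaller than f2's. The function name `generate_candidates` and the sample data are hypothetical, and Python is used purely for illustration; the same prefix comparison carries over to Ruby arrays.

```python
from itertools import combinations


def generate_candidates(frequent_itemsets):
    """Join step of Apriori candidate generation (hypothetical helper).

    Each itemset is a sorted tuple of length k-1. Two itemsets f1 and f2 are
    joined when they share the first k-2 items and the last item of f1 is
    smaller than the last item of f2.
    """
    # Sorting guarantees that in every pair yielded by combinations(),
    # f1 comes lexicographically before f2.
    ordered = sorted(set(frequent_itemsets))
    candidates = []
    for f1, f2 in combinations(ordered, 2):
        if f1[:-1] == f2[:-1] and f1[-1] < f2[-1]:
            candidates.append(f1 + (f2[-1],))
    return candidates


# Made-up frequent 2-itemsets, for illustration only.
frequent_2 = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
print(generate_candidates(frequent_2))
# [(1, 2, 3), (1, 2, 4), (1, 3, 4)]
```

This covers only the join. A full implementation would normally also prune candidates that contain an infrequent subset; here (1, 2, 4) would be dropped because (2, 4) is not in the input.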

python - Algorithms for Mining Tuples of Data on huge sample space

I read that the Apriori algorithm is used to extract association rules from a dataset such as a set of tuples. It helps us find the most frequent 1-itemsets, 2-itemsets, and so on. My problem is a bit different. I have a dataset that is a set of tuples of varying size, as follows:

(1, 234, 56, 32)
(25, 4575, 575, 464, 234, 32)
. . . (tuples of different sizes)

The domain of the entries is huge, which means I cannot keep a binary vector for each tuple that tells me whether item 'x' is present in it. Hence, I do not see the Apriori algorithm fitting here.

My goal is to answer questions like:

  1. Give me a ranked list of the 5 numbers that most often occur together with 234
  2. Give me the top 5 subsets of size 'k' that occur together most frequently

Requirements: exact representation of the numbers in the output (not approximate). The domain of the numbers can be thought of as 1 to 1 billion.

I plan to use simple counting methods if no standard algorithm fits here. But if you know of an algorithm that can help me, please let me know.
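
As a rough sketch of the "simple counting" approach mentioned above: dictionaries keyed by the actual numbers avoid any binary vector over the 1-billion domain, and the counts are exact. The function names `top_cooccurring` and `top_subsets` are hypothetical, and the third sample tuple is made up so that the output is not a tie.

```python
from collections import Counter
from itertools import combinations


def top_cooccurring(transactions, target, n=5):
    """Rank the numbers that appear most often in tuples containing `target`."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        if target in items:
            counts.update(items - {target})
    return counts.most_common(n)


def top_subsets(transactions, k, n=5):
    """Rank the size-k subsets that occur together most often."""
    counts = Counter()
    for t in transactions:
        # Sorting makes each subset a canonical tuple, so the same
        # subset is always counted under the same key.
        counts.update(combinations(sorted(set(t)), k))
    return counts.most_common(n)


data = [
    (1, 234, 56, 32),
    (25, 4575, 575, 464, 234, 32),
    (234, 32, 7),  # made-up tuple, for illustration only
]
print(top_cooccurring(data, 234, n=1))  # [(32, 3)]
print(top_subsets(data, 2, n=1))        # [((32, 234), 3)]
```

Enumerating every size-k subset grows combinatorially with the tuple length, so this brute-force count is only practical for short tuples or small k.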

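data-mining - Decision tree vs. Naive Bayes vs. Apriori algorithm and multiple regression model

What is the difference between these algorithms? Decision tree - Naive Bayes - Apriori algorithm - multiple regression model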

algorithm - Apriori algorithm: difference between the A->B and B->A association rules

What is the difference between the A->B and B->A association rules?


T1 Bread, Jelly, Butter

T2 Bread, Butter

T3 Bread, Butter, Milk

T4 Beer, Bread

T5 Beer, Milk
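
One way to see the difference on the five transactions above is to compute support and confidence in both directions for a single pair, here Bread and Butter (the choice of pair and the helper names are assumptions for illustration). Support is symmetric, but confidence divides by the count of the antecedent, so the two rules can score differently.

```python
transactions = [
    {"bread", "jelly", "butter"},  # T1
    {"bread", "butter"},           # T2
    {"bread", "butter", "milk"},   # T3
    {"beer", "bread"},             # T4
    {"beer", "milk"},              # T5
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(itemset):
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return count(both) / count(antecedent)


print(support({"bread", "butter"}))       # 0.6 (same for both rules)
print(confidence({"bread"}, {"butter"}))  # 0.75: 3 of the 4 bread baskets
print(confidence({"butter"}, {"bread"}))  # 1.0: all 3 butter baskets
```

So Butter -> Bread holds in every basket that contains butter, while Bread -> Butter fails for T4.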



algorithm - Frequent Itemsets & Association Rules - Apriori Algorithm

I'm trying to understand the fundamentals of the Apriori (Basket) Algorithm for use in data mining.

It's best if I explain the complication I'm having with an example:

Here is a transactional dataset:

The minsup for the above is 0.5 or 50%.

Taking from the above, my number of transactions is clearly 7, meaning that for an itemset to be "frequent" it must appear in at least 4 of the 7 transactions. As such, this was my frequent itemset 1 (F1):


I then created my candidates for the second refinement (C2) and narrowed it down to:


This is where I get confused. If I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? To me, the members of F1 aren't "sets".

I am then asked to create association rules for the frequent itemsets I have just defined and calculate their "confidence" figures, I get this:

It seems superfluous to put F1's itemsets in here, as they will all have a confidence of 100% regardless and don't actually "associate" anything, which is why I am now questioning whether the F1 itemsets are indeed "frequent".

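weka - Weka - How do I remove attributes whose values are all missing?

I have a CSV file containing data for market basket analysis. I have successfully imported the file into Weka, but I found that some attributes have no values at all, i.e. all of their values are missing. Weka won't let me run the Apriori algorithm on this data, so I'd like to know whether there is a way to remove these attributes from the imported data.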
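
This does not answer the question inside Weka itself, but one possible workaround is to drop the empty columns from the CSV before importing it. The sketch below is only that, under two loudly stated assumptions: missing cells appear as an empty string or `?`, and the first row is a header. The helper name `drop_empty_columns` and the sample data are hypothetical.

```python
import csv
import io

MISSING = {"", "?"}  # assumption: how missing cells appear in the CSV


def drop_empty_columns(rows):
    """Remove columns in which every data cell is missing (hypothetical helper)."""
    header, *body = rows
    keep = [
        i for i in range(len(header))
        if any(i < len(r) and r[i].strip() not in MISSING for r in body)
    ]
    # Short rows are padded with empty cells so every row keeps its shape.
    return [[r[i] if i < len(r) else "" for i in keep] for r in rows]


# Made-up sample: the "beer" column has no values at all.
sample = "milk,beer,bread\nt,,t\n,,t\n"
rows = list(csv.reader(io.StringIO(sample)))
cleaned = drop_empty_columns(rows)
print(cleaned)
# [['milk', 'bread'], ['t', 't'], ['', 't']]
```

The cleaned rows can then be written back out with `csv.writer` and imported as before.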


algorithm - Data mining: Apriori problem. Minimum support



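My small test dataset is 5 transactions and 10 products.

My large test dataset is 11 million transactions and about 2,700 products.

The problem: minimum support and filtering out infrequent items. Suppose we are interested in items with a frequency of 60% or higher. frequency = 0.60;

When I compute Min-support for the small dataset at a frequency of 60%, the algorithm removes all items that were purchased fewer than 3 times. Min-support = numberOfTransactions * frequency;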


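So I started lowering that bar further and further, running the algorithm many times. But not even 5% gave the expected results. I had to lower the frequency to 0.0005 to get at least 50% of the items involved in the first iteration.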
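
To make the arithmetic behind the threshold concrete, here is a minimal sketch of the first Apriori pass using the formula from the question, Min-support = numberOfTransactions * frequency. The function names are hypothetical, and rounding the threshold up with `math.ceil` is an assumption, since the question does not say how fractional counts are handled.

```python
import math
from collections import Counter


def min_support_count(num_transactions, frequency):
    """Absolute count an item needs in order to be considered frequent."""
    return math.ceil(num_transactions * frequency)


def frequent_items(transactions, frequency):
    """First pass: keep only items that reach the minimum support count."""
    threshold = min_support_count(len(transactions), frequency)
    # set(t) counts an item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= threshold}


print(min_support_count(5, 0.60))           # small dataset
print(min_support_count(11_000_000, 0.60))  # large dataset
```

On the small dataset the threshold is 3 transactions. On the large one, 60% means a single product would have to appear in 6.6 million of the 11 million baskets, whereas 0.0005 corresponds to roughly 5,500 baskets. With about 2,700 products, a relative threshold that works on 5 transactions is therefore far too strict for the large dataset.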




algorithm - How do I generate the following sequence?


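In general, given a set of n numbers, I have to find all possible subsets of (n-1) numbers, with the restriction that they are in lexicographic order (the numbers within each subset are in ascending order).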
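
The sequence from the question is not shown here, so the following is only a sketch of the general statement above: every (n-1)-element subset of n numbers, each one in ascending order and the subsets themselves in lexicographic order. The function name `subsets_missing_one` is hypothetical.

```python
from itertools import combinations


def subsets_missing_one(numbers):
    """All subsets of size n-1, each sorted, listed in lexicographic order."""
    ordered = sorted(numbers)
    # combinations() yields tuples in lexicographic order of the input,
    # so sorting the input first gives the required ordering.
    return list(combinations(ordered, len(ordered) - 1))


print(subsets_missing_one([3, 1, 2, 4]))
# [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
```

Each result is the full set with exactly one element left out, so there are always n of them.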


machine-learning - Given a list of items, predict the items to be sold



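Suppose we need to recommend products for customer cx. We have data on what cx has purchased from the above set, and we run Apriori to find recommendations, but on a large dataset it is very slow. What can be done?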


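python - Python: DIY generalizing this "all_subsets" function to subsets of any size

While implementing a toy Apriori algorithm for association rule mining, I need a function that returns all subsets.

The length of the subsets is given by the parameter i, and I need this to work for any i. The cases where i is 1 or 2 are trivial, and the general pattern is visible: a list of tuples of length i, with an ordering imposed to prevent duplicates.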
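
The original function is not shown, so here is one possible "DIY" generalization as a sketch: pick the first element, then recurse on the elements that come after it, which imposes the ordering that prevents duplicates. The signature `all_subsets(items, i)` is an assumption based on the description above.

```python
def all_subsets(items, i):
    """Return every subset of `items` with exactly `i` elements, as tuples."""
    items = list(items)
    if i == 0:
        return [()]  # exactly one subset of size 0: the empty tuple
    result = []
    for pos, first in enumerate(items):
        # Only elements after `first` may follow it, so each subset
        # is produced once, in the order of the input.
        for rest in all_subsets(items[pos + 1:], i - 1):
            result.append((first,) + rest)
    return result


print(all_subsets(["a", "b", "c", "d"], 2))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

If hand-rolling is not a requirement, `list(itertools.combinations(items, i))` from the standard library returns the same tuples in the same order.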

