Question:
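The apriori function of the arules package infers association rules from the input transactions and reports the support, confidence, and lift of each rule. Association rules are derived from frequent itemsets. I would like to obtain the most frequent itemsets in the input transactions. Specifically, I would like to obtain all itemsets that have a given minimum support. The support of an itemset is the number of transactions that contain the itemset divided by the total number of transactions.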
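To make the support definition concrete, here is a brute-force sketch in base R. It hard-codes the two-transaction example input given below, enumerates every non-empty itemset over the distinct items, and computes each itemset's support directly from the definition. It is exponential in the number of distinct items, so it only illustrates the definition and is not a substitute for apriori. The names itemset_support, min_support, and frequent are illustrative assumptions.

```r
# Brute-force illustration of the support definition (exponential in the
# number of distinct items, so only suitable for tiny inputs like this one).

# Assumption: the two transactions from the example input, hard-coded.
transactions <- list(c("a", "b"), c("a", "b", "c"))
min_support <- 0.001

# Support = (number of transactions containing every item of the itemset)
#           / (total number of transactions)
itemset_support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Enumerate every non-empty subset of the distinct items, size 1 upwards.
items <- sort(unique(unlist(transactions)))
candidates <- unlist(
  lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
  recursive = FALSE
)

support <- vapply(candidates, itemset_support, numeric(1),
                  transactions = transactions)
itemset_labels <- vapply(candidates,
                         function(s) paste0("{", paste(s, collapse = ","), "}"),
                         character(1))

# Keep only the itemsets that reach the minimum support.
frequent <- data.frame(itemset = itemset_labels, support = support,
                       stringsAsFactors = FALSE)[support >= min_support, ]
print(frequent, row.names = FALSE)
```

For the example input every one of the seven candidate itemsets passes the threshold; the rows come out in enumeration order (by size), not in the requested sort order.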
Requirements:
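- I strongly prefer to obtain the most frequent itemsets from the intermediate results of the apriori function. That is, I don't want to write a program from scratch just to compute the most frequent itemsets, because the apriori function already computes them as an intermediate step. That said, if there really is no reasonable way to access the intermediate results of the apriori function, I am open to other solutions.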
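- I would rather not perform string manipulation on the results of the apriori function, because that approach would depend too heavily on the string representation of those results. Again, if it turns out there is no better option, I may fall back on this approach.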
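- I am aware of the function itemFrequency provided by the arules package. Unfortunately, this function only reports itemsets containing a single item, whereas I am interested in all itemsets, of any length, that have the minimum support.
- I would like the output sorted by support in decreasing order, then lexicographically by itemset.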
Example input:
a,b
a,b,c
Program:
# The following is how I'm using apriori to infer the association rules.
library(package = "arules")
transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001))
WRITE(rules, file = "", sep = ",", quote = TRUE, col.names = NA)
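For comparison, apriori's parameter list also accepts target = "frequent itemsets", which makes it return the frequent itemsets themselves instead of rules. The sketch below is a minimal example under a few assumptions: a reasonably recent arules version, the example input built in memory with as(..., "transactions") instead of read from stdin, and illustrative names item_lists, keys, and result. It reads the item vectors through items() and the supports through quality(), so no parsing of printed strings is involved.

```r
library(arules)

# Assumption: the example input, built in memory instead of read from stdin.
transactions <- as(list(c("a", "b"), c("a", "b", "c")), "transactions")

# Same support threshold as the rules call above, but mining itemsets.
itemsets <- apriori(transactions,
                    parameter = list(supp = 0.001, minlen = 1,
                                     target = "frequent itemsets"),
                    control = list(verbose = FALSE))

# Item vectors and supports come from the itemsets object, not from strings.
item_lists <- as(items(itemsets), "list")
support <- quality(itemsets)$support
keys <- vapply(item_lists, paste, character(1), collapse = ",")

# Decreasing support, then lexicographic by items (C-locale radix ordering).
ord <- order(-support, keys, method = "radix")
result <- data.frame(itemset = paste0("{", keys[ord], "}"),
                     support = support[ord], stringsAsFactors = FALSE)
write.csv(result, row.names = FALSE)
```

Whether this reuses exactly the same internal computation as the rules run is not something this sketch establishes; it only shows that the itemsets can be requested directly.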
Current output:
"","rules","support","confidence","lift"
"1","{} => {c}",0.5,0.5,1
"2","{} => {b}",1,1,1
"3","{} => {a}",1,1,1
"4","{c} => {b}",0.5,1,1
"5","{b} => {c}",0.5,0.5,1
"6","{c} => {a}",0.5,1,1
"7","{a} => {c}",0.5,0.5,1
"8","{b} => {a}",1,1,1
"9","{a} => {b}",1,1,1
"10","{b,c} => {a}",0.5,1,1
"11","{a,c} => {b}",0.5,1,1
"12","{a,b} => {c}",0.5,0.5,1
Desired output:
"itemset","support"
"{a}",1
"{a,b}",1
"{b}",1
"{a,b,c}",0.5
"{a,c}",0.5
"{b,c}",0.5
"{c}",0.5
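The ordering requirement is subtle: "lexicographic" here means {a} < {a,b} < {b}, and locale-aware string collation may not agree with that once braces and commas are involved. The sketch below is one way to express the ordering in base R, assuming the itemsets are available as character vectors with their items already sorted, and assuming item names contain no characters that sort below ",". The itemsets and supports are copied from the desired output above in a deliberately scrambled order; the names keys, ord, and sorted are illustrative.

```r
# Assumption: itemsets and supports from the desired output, scrambled.
itemsets <- list(c("c"), c("a", "b"), c("b", "c"), c("a"), c("a", "b", "c"),
                 c("b"), c("a", "c"))
support <- c(0.5, 1, 0.5, 1, 0.5, 1, 0.5)

# Sort key: the items joined by "," (no braces), e.g. "a,b,c".
keys <- vapply(itemsets, paste, character(1), collapse = ",")

# Decreasing support first, then lexicographic by key. method = "radix"
# compares strings in the C locale, so a shorter prefix ("a") sorts before
# its extensions ("a,b") independently of the session's locale.
ord <- order(-support, keys, method = "radix")

sorted <- data.frame(itemset = paste0("{", keys[ord], "}"),
                     support = support[ord], stringsAsFactors = FALSE)
write.csv(sorted, row.names = FALSE)
```

With write.csv's default quoting, this should print the same header and rows as the desired output.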