mysql - Is it possible to run the Apriori association rule in a MySQL statement?

Question

Database:

Transaction#    Items List
T1              butter
T1              jam
T2              butter
T3              bread
T3              ice cream
T4              butter
T4              jam

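Given the table above, is it possible to run the Apriori association rule in a MySQL statement?

For example, the support of buys(T, butter) --> buys(T, jam) = 50%

because there are 4 transactions and two of them, T1 and T4, satisfy the rule.

Can I find such a result using just one SQL statement?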

score 1 · Accepted Answer

Yes, you can use SQL to find the support of a single item. But if you want to find itemsets that contain several items, it becomes much harder.
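As a minimal sketch of the single-item case, the query below computes the support of every item in one statement. It assumes a table `transactions(trans_no, item)` with one row per transaction/item pair, like the test table in the other answer; the column aliases are illustrative.

```sql
-- Support of each single item: the share of all transactions that contain it.
-- Assumes transactions(trans_no, item) with one row per transaction/item pair.
select item
      ,count(distinct trans_no) as num_trans
      ,100 * count(distinct trans_no)
           / (select count(distinct trans_no) from transactions) as support_pct
  from transactions
 group by item;
```

On the sample data in the question, 'butter' appears in 3 of the 4 transactions (75%) and 'jam' in 2 of 4 (50%).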

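For example, if your transactions contain several items and you want to find the support of "jam" appearing together with "milk" and "bread", it is better to use an algorithm such as Apriori, or a faster one such as FPGrowth.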
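To illustrate where plain SQL stops being convenient, here is a sketch that measures the support of one fixed itemset, {milk, bread, jam}. It assumes the same hypothetical `transactions(trans_no, item)` table. The statement handles only a single hand-picked candidate; generating and pruning all candidate itemsets is the part that Apriori or FPGrowth does for you.

```sql
-- Support of one fixed itemset {milk, bread, jam}.
-- The inner query keeps only transactions that contain all three items.
select count(*) as itemset_trans
      ,100 * count(*)
           / (select count(distinct trans_no) from transactions) as support_pct
  from (select trans_no
          from transactions
         where item in ('milk', 'bread', 'jam')
         group by trans_no
        having count(distinct item) = 3
       ) full_itemset;
```

The sample data in the question contains no 'milk', so no transaction qualifies there and the count is 0.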

score 0 · Accepted Answer

For the sample data you provided, I get 66%? There are 3 transactions with "butter", and only 2 of them also include "jam".

I used the following test table.

create table transactions(
   trans_no     varchar(5)  not null
  ,item         varchar(20) not null
  ,primary key(trans_no, item)
);

insert into transactions(trans_no, item)
values ('T1', 'butter')
      ,('T1', 'jam')
      ,('T2', 'butter')
      ,('T3', 'bread')
      ,('T3', 'ice cream')
      ,('T4', 'butter')
      ,('T4', 'jam');

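Below is my attempt at an answer. The inner select finds all transactions that include "butter". For each such transaction it also sets a flag (bought_jam) that says whether the transaction also includes "jam". (The having clause excludes transactions that include "jam" but not "butter".)
In the outer select, I count all the rows (the count corresponds to the number of transactions that include butter) and sum the jam flag, which corresponds to the number of transactions that include both butter and jam.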

select sum(bought_jam) as jams_bought
      ,count(*) as num_trans
      ,100 * sum(bought_jam) / count(*) as correlation_pct
  from (select trans_no
              ,max(case when item = 'jam' then 1 else 0 end) as bought_jam
          from transactions
         where item in('butter', 'jam')
         group 
            by trans_no
        having min(case when item = 'butter' then item end) = 'butter'
       ) butter_trans;

The query above gives the following result:

+-------------+-----------+-----------------+
| jams_bought | num_trans | correlation_pct |
+-------------+-----------+-----------------+
|           2 |         3 |         66.6667 |
+-------------+-----------+-----------------+
1 row in set (0.00 sec)

Let me know how this works for you.

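Edit:
The following query gives the same result but is easier to read. However, if the transactions table is very large and item = x is not very selective (returns many rows), this query will almost certainly be slower.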

select count(t2.trans_no) as jams_bought
      ,count(*) as num_trans
      ,100 * count(t2.trans_no) / count(*) as correlation_pct
  from transactions t1
  left join transactions t2 on(t2.trans_no = t1.trans_no and t2.item = 'jam')
 where t1.item = 'butter';
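The 66% here and the 50% in the question measure different things: the question divides by all 4 transactions (support), while the queries above divide by the 3 transactions that contain butter (usually called confidence). As a hedged sketch, both figures can be returned by one statement against the same test table; the aliases `both_items`, `support_pct`, and `confidence_pct` are my own names.

```sql
-- Support and confidence of the rule butter --> jam in a single statement.
-- The join matches at most one 'jam' row per transaction because
-- (trans_no, item) is the primary key.
select count(t2.trans_no) as both_items
      ,100 * count(t2.trans_no)
           / (select count(distinct trans_no) from transactions) as support_pct
      ,100 * count(t2.trans_no) / count(*) as confidence_pct
  from transactions t1
  left join transactions t2 on(t2.trans_no = t1.trans_no and t2.item = 'jam')
 where t1.item = 'butter';
```

On the test data, two transactions contain both items, so support is 2 of 4 (50%) and confidence is 2 of 3 (about 66.67%).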
