sql - SQL query to find the most prevalent associated value in a table

Question

I have a simple SQL table that associates two values, like this:

table1(column1 varchar(32), column2 varchar(32));

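For each distinct value in column1, I want to find, among the column2 values it is associated with, the one that occurs most often in the table as a whole.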

An example to clarify:

Suppose I have the following values:

a1, b1
a2, b2
a3, b3
a4, b1
a3, b1
a3, b2
a5, b1
a6, b2

The result I want is:

a1, b1
a2, b2
a3, b1
a4, b1
a5, b1
a6, b2

Because b1 and b2 are the values that occur most often in the table.

score 3 · Accepted Answer

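This is a good application of window functions. There is more than one way to approach it; here is one. First, compute for each row how often its column2 value occurs in the table. Then rank those frequencies within each column1 value using row_number():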

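select column1, column2
from (select t.*,
             -- rank each column1's rows, most frequent column2 first
             row_number() over (partition by column1 order by col2cnt desc) as seqnum
      from (select t.*,
                   -- number of rows in the whole table sharing this column2
                   count(*) over (partition by column2) as col2cnt
            from table1 t
           ) t
     ) t
where seqnum = 1;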

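The final step (done by the outermost query) is to keep only the top-ranked row for each column1 value, that is, the one with the highest count.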
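To check the query against the sample data from the question, here is a minimal sketch using Python's standard-library sqlite3 module. The choice of SQLite is an assumption (the question does not name a database engine), it needs SQLite 3.25 or later for window functions, and the helper name most_prevalent is hypothetical. An outer order by is added only so the output order is stable.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b1"),
    ("a3", "b1"), ("a3", "b2"), ("a5", "b1"), ("a6", "b2"),
]

# The answer's query, run against table1, with a final ORDER BY for stable output.
QUERY = """
select column1, column2
from (select t.*,
             row_number() over (partition by column1 order by col2cnt desc) as seqnum
      from (select t.*, count(*) over (partition by column2) as col2cnt
            from table1 t
           ) t
     ) t
where seqnum = 1
order by column1
"""


def most_prevalent(rows):
    """Load rows into an in-memory table1 and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table table1 (column1 varchar(32), column2 varchar(32))"
        )
        conn.executemany("insert into table1 values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for column1, column2 in most_prevalent(ROWS):
        print(column1, column2)
```

On the sample data, b1 occurs four times, b2 three times, and b3 once, so a3 is paired with b1 and the output matches the six rows requested in the question.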

If there is a tie (for example, if b2 occurred as often as b1), this version picks one of the tied values arbitrarily.
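If the arbitrary choice on ties matters, two common variations are possible; neither is part of the original answer. One adds column2 as a secondary sort key so the pick is deterministic, the other swaps row_number() for rank() so every tied value is returned. The sketch below runs both through Python's sqlite3 module (again an assumption, requiring SQLite 3.25 or later); the names TEMPLATE, FIRST_BY_NAME, ALL_TIED, and run are hypothetical.

```python
import sqlite3

# Same shape as the answer's query; only the ranking expression varies.
# "distinct" guards against duplicate input rows when rank() keeps ties.
TEMPLATE = """
select distinct column1, column2
from (select t.*,
             {ranking} as seqnum
      from (select t.*, count(*) over (partition by column2) as col2cnt
            from table1 t
           ) t
     ) t
where seqnum = 1
order by column1, column2
"""

# Deterministic: on equal counts, prefer the alphabetically first column2.
FIRST_BY_NAME = (
    "row_number() over (partition by column1 order by col2cnt desc, column2)"
)

# Inclusive: rank() gives tied counts the same rank, so all of them survive.
ALL_TIED = "rank() over (partition by column1 order by col2cnt desc)"


def run(ranking, rows):
    """Create an in-memory table1, load rows, and run the chosen variant."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table table1 (column1 varchar(32), column2 varchar(32))"
        )
        conn.executemany("insert into table1 values (?, ?)", rows)
        return conn.execute(TEMPLATE.format(ranking=ranking)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # b1 and b2 each occur twice, so a1 has a genuine tie.
    tied = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a3", "b2")]
    print(run(FIRST_BY_NAME, tied))
    print(run(ALL_TIED, tied))
```

With the tied data in the usage example, the first variant returns a single row for a1 (paired with b1), while the second returns both a1 rows.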
