sql - SQL function to return the "most common value" for multiple columns in a GROUP BY

Question

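I'm looking for the simplest way to return the most common value for each of several columns in a grouped SELECT statement. Everything I have found online points either to ranking a single item in the select, or to handling each column separately outside of the GROUP BY.

Sample data: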

SELECT 100 as "auser", 
'A' as "instance1", 'M' as "instance2" 
union all select 100, 'B', 'M' 
union all select 100,'C', 'N' 
union all select 100, 'B', 'O'
union all select 200,'D', 'P' 
union all select 200, 'E', 'P' 
union all select 200,'F', 'P' 
union all select 200, 'F', 'Q'

Sample data result:

auser   instance1   instance2
100     A           M
100     B           M
100     C           N
100     B           O
200     D           P
200     E           P
200     F           P
200     F           Q

Query logic (how I picture it in my head):

SELECT auser, most_common(instance1), most_common(instance2)
FROM datasample
GROUP BY auser;

Desired result:

100     B           M
200     F           P

score 4 · Accepted Answer

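This approach uses window functions in nested subqueries. The innermost subquery calculates the count for each column's value within each auser. The next subquery ranks those counts (using row_number()). The outer query then uses conditional aggregation to pick out the top-ranked value of each column: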

select auser, MAX(case when seqnum1 = 1 then instance1 end),
       MAX(case when seqnum2 = 1 then instance2 end)
from (select t.*,
             ROW_NUMBER() over (partition by auser order by cnt1 desc) as seqnum1,
             ROW_NUMBER() over (partition by auser order by cnt2 desc) as seqnum2
      from (select t.*,
                   count(*) over (partition by auser, instance1) as cnt1,
                   COUNT(*) over (partition by auser, instance2) as cnt2
            from datasample t
           ) t
     ) t
group by auser
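
To try the query above end to end, here is a minimal sketch that loads the question's sample rows into a table and runs the same three-level query. The table name datasample comes from the question. The column types, the output aliases, and the tie-breaker (ordering by the value itself when two counts are equal) are my assumptions, not part of the original answer. Without a tie-breaker, row_number() may pick any one of several equally common values.

```sql
-- Load the sample rows from the question (column types are assumed).
create table datasample (auser int, instance1 varchar(10), instance2 varchar(10));

insert into datasample (auser, instance1, instance2) values (100, 'A', 'M');
insert into datasample (auser, instance1, instance2) values (100, 'B', 'M');
insert into datasample (auser, instance1, instance2) values (100, 'C', 'N');
insert into datasample (auser, instance1, instance2) values (100, 'B', 'O');
insert into datasample (auser, instance1, instance2) values (200, 'D', 'P');
insert into datasample (auser, instance1, instance2) values (200, 'E', 'P');
insert into datasample (auser, instance1, instance2) values (200, 'F', 'P');
insert into datasample (auser, instance1, instance2) values (200, 'F', 'Q');

select auser,
       max(case when seqnum1 = 1 then instance1 end) as most_common_instance1,
       max(case when seqnum2 = 1 then instance2 end) as most_common_instance2
from (select c.*,
             -- rank rows per auser by how often their value occurs;
             -- the second sort key makes ties deterministic (assumption)
             row_number() over (partition by auser order by cnt1 desc, instance1) as seqnum1,
             row_number() over (partition by auser order by cnt2 desc, instance2) as seqnum2
      from (select d.*,
                   -- how many rows share this row's value, per auser
                   count(*) over (partition by auser, instance1) as cnt1,
                   count(*) over (partition by auser, instance2) as cnt2
            from datasample d
           ) c
     ) r
group by auser
order by auser;
```

On the sample data this should give B and M for auser 100, and F and P for auser 200, matching the desired result in the question.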

score 1 · Accepted Answer

I'm not sure you'll find anything more elegant, but if you are on SQL Server 2005+ (since I'm using ranking functions and CTEs), this might do it:

with instance1 as (
    select auser, instance1
        , row_number() over (partition by auser order by count(*) desc, instance1) as row_num
    from datasample
    group by auser, instance1
), instance2 as (
    select auser, instance2
        , row_number() over (partition by auser order by count(*) desc, instance2) as row_num
    from datasample
    group by auser, instance2
)
select a.auser, a.instance1, b.instance2
from instance1 as a 
    join instance2 as b on a.auser = b.auser
where a.row_num = 1
    and b.row_num = 1
order by a.auser;

I'm not sure how you want NULLs handled. Also, moving the row_num equality tests into the join condition doesn't change the execution plan on my box.
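
Both points can be sketched in one variant of the query. Filtering out NULLs inside each CTE is only an assumption about what might be wanted, since the question does not say. The row_num tests are moved into the ON clause, which is logically equivalent to the WHERE version for an inner join.

```sql
with instance1 as (
    select auser, instance1
        , row_number() over (partition by auser order by count(*) desc, instance1) as row_num
    from datasample
    where instance1 is not null   -- assumption: NULL should never count as "most common"
    group by auser, instance1
), instance2 as (
    select auser, instance2
        , row_number() over (partition by auser order by count(*) desc, instance2) as row_num
    from datasample
    where instance2 is not null   -- same assumption for the second column
    group by auser, instance2
)
select a.auser, a.instance1, b.instance2
from instance1 as a
    join instance2 as b
        on a.auser = b.auser
       and a.row_num = 1          -- row_num tests moved from WHERE into the join
       and b.row_num = 1
order by a.auser;
```

One consequence of this sketch: an auser whose values in one column are all NULL has no row in that CTE and so disappears from the inner join. A left join from a list of distinct auser values would keep such users, if that matters.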

If you are on SQL Server 2000, you could replace these CTEs with derived tables and fake row_number() using count and a "triangular join".
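
A rough sketch of what that might look like, which I have not run on an actual SQL Server 2000 instance. Each derived table of per-value counts is joined to itself so that every row is paired with the rows that rank at or above it; a value whose only such partner is itself has a faked row number of 1. The aliases (a, b, i1, i2, cnt) and the tie-breaker on the value are assumptions for illustration.

```sql
select i1.auser, i1.instance1, i2.instance2
from (
    -- most common instance1 per auser
    select a.auser, a.instance1
    from (select auser, instance1, count(*) as cnt
          from datasample
          group by auser, instance1) as a
    join (select auser, instance1, count(*) as cnt
          from datasample
          group by auser, instance1) as b
      on b.auser = a.auser
     -- triangular join: b ranks at or above a (higher count, or same count and
     -- a value that sorts no later); NULL values are not handled here
     and (b.cnt > a.cnt
          or (b.cnt = a.cnt and b.instance1 <= a.instance1))
    group by a.auser, a.instance1
    having count(*) = 1           -- only a itself ranks this high: row number 1
) as i1
join (
    -- most common instance2 per auser, same technique
    select a.auser, a.instance2
    from (select auser, instance2, count(*) as cnt
          from datasample
          group by auser, instance2) as a
    join (select auser, instance2, count(*) as cnt
          from datasample
          group by auser, instance2) as b
      on b.auser = a.auser
     and (b.cnt > a.cnt
          or (b.cnt = a.cnt and b.instance2 <= a.instance2))
    group by a.auser, a.instance2
    having count(*) = 1
) as i2
  on i2.auser = i1.auser
order by i1.auser;
```

The self-join compares every value with every other value for the same auser, so the work grows roughly quadratically with the number of distinct values per user. That is usually acceptable for small groups but is a reason to prefer row_number() where it is available.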

score 0 · Accepted Answer

Just simply:

Select auser, instance1, instance2 FROM datasample GROUP BY auser,instance1, instance2 ;
