
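Suppose we have a table with two columns: one holds people's names, the other holds values associated with each person. A person can have several values, and every value is numeric. The task is to select the top 3 values for each person. If a person has fewer than 3 values, we select all of that person's values.

If the table had no duplicates, this could be solved with the query from the article "Select top 3 values from each group in a table with SQL". But what is the solution when there are duplicates?

For example, suppose the name John has 5 values associated with it: 20, 7, 7, 7, 4. I need to return name/value pairs ordered by value descending within each name, like this: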

+----------+-------+
| name     | value |
+----------+-------+
| John     |    20 |
| John     |     7 |
| John     |     7 |
+----------+-------+

Only 3 rows should be returned for John, even though John has three 7s.

6 Answers

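In many modern DBMSs (e.g. Postgres, Oracle, SQL Server, DB2 and many others), the following works. It uses a CTE and the ranking function ROW_NUMBER(), both of which are part of the SQL standard (window functions were added in SQL:2003):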

WITH cte AS
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  )
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;
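
To try the CTE query without installing a database server, one option is SQLite through Python's sqlite3 module. This is only a sketch: it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and the two Jane rows are made-up sample data added to show a group with fewer than three values.

```python
import sqlite3

# Assumption: the SQLite library bundled with Python is 3.25+ (window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT NOT NULL, value INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO t (name, value) VALUES (?, ?)",
    [
        ("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4),
        ("Jane", 30), ("Jane", 21),  # hypothetical extra group with only 2 rows
    ],
)

# Same query as above: number the rows of each name by value descending,
# then keep the first three row numbers per name.
TOP3_WINDOW_SQL = """
WITH cte AS
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             ) AS rn
    FROM t
  )
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn
"""

top3_window = conn.execute(TOP3_WINDOW_SQL).fetchall()
for row in top3_window:
    print(row)
```

Because ROW_NUMBER() assigns a distinct number to each of the tied 7s, John contributes exactly three rows (20, 7, 7) and Jane contributes both of hers.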

Without a CTE, using only ROW_NUMBER():

SELECT name, value, rn
FROM 
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  ) tmp 
WHERE rn <= 3
ORDER BY name, rn ; 



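In MySQL (before version 8.0, which added window functions) and other DBMSs that lack ranking functions, one has to use derived tables, correlated subqueries, or a self-join with GROUP BY.

The column tid is assumed to be the table's primary key: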

SELECT t.tid, t.name, t.value,              -- self join and GROUP BY
       COUNT(*) AS rn                       -- rows ranked at or above this one
FROM t
  JOIN t AS t2
    ON  t2.name = t.name
    AND ( t2.value > t.value
        OR ( t2.value = t.value             -- ties are broken by the primary key
         AND t2.tid <= t.tid )
        )
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY t.name, rn ;


SELECT t.tid, t.name, t.value, rn
FROM
  ( SELECT t.tid, t.name, t.value,
           ( SELECT COUNT(*)                -- inline, correlated subquery
             FROM t AS t2
             WHERE t2.name = t.name
              AND ( t2.value > t.value
                 OR ( t2.value = t.value    -- ties are broken by the primary key
                  AND t2.tid <= t.tid )
                  )
           ) AS rn
    FROM t
  ) AS t
WHERE rn <= 3
ORDER BY name, rn ;
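
The correlated-subquery variant can be checked the same way with SQLite from Python, since it needs no window functions at all. The sketch below assumes a table t whose tid is an auto-assigned integer primary key; the Jack row is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (tid INTEGER PRIMARY KEY, name TEXT NOT NULL, value INTEGER NOT NULL)"
)
# tid is assigned automatically in insertion order: 1, 2, 3, ...
conn.executemany(
    "INSERT INTO t (name, value) VALUES (?, ?)",
    [
        ("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4),
        ("Jack", 5),  # hypothetical extra group with a single row
    ],
)

# For every row, count the rows of the same name that sort at or before it:
# a larger value wins, and equal values are ordered by tid.
TOP3_SUBQUERY_SQL = """
SELECT name, value, rn
FROM
  ( SELECT t.name, t.value,
           ( SELECT COUNT(*)
             FROM t AS t2
             WHERE t2.name = t.name
               AND ( t2.value > t.value
                  OR ( t2.value = t.value AND t2.tid <= t.tid ) )
           ) AS rn
    FROM t
  ) AS ranked
WHERE rn <= 3
ORDER BY name, rn
"""

top3_subquery = conn.execute(TOP3_SUBQUERY_SQL).fetchall()
for row in top3_subquery:
    print(row)
```

The tie-break on tid is what keeps the three 7s from sharing a rank: they get 2, 3 and 4, so only two of them pass the rn <= 3 filter.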

Tested in MySQL.

answered 2013-05-23T18:35:14.970

Try this:

CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);

WITH cte
AS (
SELECT NAME
    ,value
    ,ROW_NUMBER() OVER (
        PARTITION BY NAME ORDER BY (value) DESC
        ) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC

answered 2013-05-24T01:34:24.307

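I was going to downvote this question. However, I realized it might really be asking for a cross-database solution.

Assuming you are looking for a database-independent way to do this, the only approach I can think of uses a correlated subquery (or a non-equijoin). Here is an example (it ranks distinct values, so tied values share one rank and collapse into a single row):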

select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3)

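However, every database you mention (Hadoop, I note, is not a database) has a better way to do this. Unfortunately, none of them is standard SQL.

Here is an example of it working in SQL Server: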

with t as (
      select 1 as personid, 5 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 7 as val union all
      select 1 as personid, 8 as val
     )
select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3);
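
How COUNT(DISTINCT ...) treats tied values can be checked with a small SQLite script in Python, using the same five sample rows. This is a sketch under assumptions: the table is created as a real table rather than a CTE, and the alias rnk replaces rank purely as a precaution against name clashes in other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (personid INTEGER NOT NULL, val INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO t (personid, val) VALUES (?, ?)",
    [(1, 5), (1, 6), (1, 6), (1, 7), (1, 8)],
)

# Rank = number of distinct values of the same person that are >= this value,
# i.e. a dense rank. DISTINCT then merges rows that share value and rank.
DENSE_RANK_SQL = """
SELECT DISTINCT personid, val, rnk
FROM ( SELECT t.personid, t.val,
              ( SELECT COUNT(DISTINCT t2.val)
                FROM t AS t2
                WHERE t2.personid = t.personid
                  AND t2.val >= t.val
              ) AS rnk
       FROM t
     ) AS ranked
WHERE rnk IN (1, 2, 3)
ORDER BY personid, rnk
"""

top3_distinct = conn.execute(DENSE_RANK_SQL).fetchall()
for row in top3_distinct:
    print(row)
```

The two rows with value 6 both get rank 3 and come back as one row, so this form yields the three largest distinct values per person rather than three rows.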

answered 2013-05-23T17:50:51.487

This works in MS SQL. It should work in any other SQL dialect that can assign row numbers in a GROUP BY or OVER clause (or an equivalent).

if object_id('tempdb..#Data') is not null drop table #Data;
GO

create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Jack', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO

with D as (
SELECT
     name
    ,Value
    ,row_number() over (partition by name order by value desc) rn
From
    #Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc

Name    Value
Jack    5
Jane    30
Jane    21
Jane    18
John    20
John    7
John    7

answered 2013-05-24T02:05:57.007

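If your result set is not too large, you could write a stored procedure (or an anonymous PL/SQL block) that iterates over the result set and finds the three largest values for each name with a simple comparison algorithm.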
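
The iteration idea can be sketched outside the database as well. The Python below is not PL/SQL and not a stored procedure; it only illustrates the comparison logic on rows already fetched as (name, value) pairs, and the function name top_n_per_name is made up for this example.

```python
from collections import defaultdict


def top_n_per_name(rows, n=3):
    """Return {name: up to n largest values, in descending order}."""
    best = defaultdict(list)
    for name, value in rows:
        kept = best[name]
        kept.append(value)       # tentatively keep the new value
        kept.sort(reverse=True)  # the list never holds more than n + 1 items
        del kept[n:]             # drop the smallest when there are too many
    return dict(best)


rows = [
    ("John", 20), ("John", 7), ("John", 7), ("John", 7), ("John", 4),
    ("Jack", 5),  # hypothetical extra person with a single value
]
result = top_n_per_name(rows)
print(result)
```

Duplicates need no special handling here: each row is considered separately, so John keeps 20, 7 and 7, and a person with fewer than three rows keeps all of them.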

answered 2013-05-23T18:57:24.543

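You can do this in MySQL with GROUP_CONCAT and FIND_IN_SET. Note that when one of a name's top three values is duplicated, every row carrying that value matches, so more than three rows can come back for that name: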

SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
                             SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
                           FROM tbl t1
                           WHERE t1.name = t.name
                           GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc

answered 2013-05-23T18:42:33.173