我需要找出客户的等级。在这里,我为我的要求添加了相应的 ANSI 标准 SQL 查询。请帮我把它转换成 MySQL 。
SELECT RANK() OVER (PARTITION BY Gender ORDER BY Age) AS [Partition by Gender],
FirstName,
Age,
Gender
FROM Person
是否有任何功能可以在 MySQL 中找出排名?
一种选择是使用排名变量,例如:
SELECT first_name,
age,
gender,
@curRank := @curRank + 1 AS rank
FROM person p, (SELECT @curRank := 0) r
ORDER BY age;
该(SELECT @curRank := 0)
部分允许变量初始化而不需要单独的SET
命令。
测试用例:
CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));
INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
结果:
+------------+------+--------+------+
| first_name | age | gender | rank |
+------------+------+--------+------+
| Kathy | 18 | F | 1 |
| Jane | 20 | F | 2 |
| Nick | 22 | M | 3 |
| Bob | 25 | M | 4 |
| Anne | 25 | F | 5 |
| Jack | 30 | M | 6 |
| Bill | 32 | M | 7 |
| Steve | 36 | M | 8 |
+------------+------+--------+------+
8 rows in set (0.02 sec)
这是一个通用解决方案,将分区上的密集等级分配给行。它使用用户变量:
CREATE TABLE person (
id INT NOT NULL PRIMARY KEY,
firstname VARCHAR(10),
gender VARCHAR(1),
age INT
);
INSERT INTO person (id, firstname, gender, age) VALUES
(1, 'Adams', 'M', 33),
(2, 'Matt', 'M', 31),
(3, 'Grace', 'F', 25),
(4, 'Harry', 'M', 20),
(5, 'Scott', 'M', 30),
(6, 'Sarah', 'F', 30),
(7, 'Tony', 'M', 30),
(8, 'Lucy', 'F', 27),
(9, 'Zoe', 'F', 30),
(10, 'Megan', 'F', 26),
(11, 'Emily', 'F', 20),
(12, 'Peter', 'M', 20),
(13, 'John', 'M', 21),
(14, 'Kate', 'F', 35),
(15, 'James', 'M', 32),
(16, 'Cole', 'M', 25),
(17, 'Dennis', 'M', 27),
(18, 'Smith', 'M', 35),
(19, 'Zack', 'M', 35),
(20, 'Jill', 'F', 25);
SELECT person.*, @rank := CASE
WHEN @partval = gender AND @rankval = age THEN @rank
WHEN @partval = gender AND (@rankval := age) IS NOT NULL THEN @rank + 1
WHEN (@partval := gender) IS NOT NULL AND (@rankval := age) IS NOT NULL THEN 1
END AS rnk
FROM person, (SELECT @rank := NULL, @partval := NULL, @rankval := NULL) AS x
ORDER BY gender, age;
请注意,变量赋值被放置在CASE
表达式内。这(理论上)负责评估问题的顺序。添加它IS NOT NULL
以处理数据类型转换和短路问题。
PS:通过删除所有检查平局的条件,可以轻松地将其转换为分区上的行号。
| id | firstname | gender | age | rank |
|----|-----------|--------|-----|------|
| 11 | Emily | F | 20 | 1 |
| 20 | Jill | F | 25 | 2 |
| 3 | Grace | F | 25 | 2 |
| 10 | Megan | F | 26 | 3 |
| 8 | Lucy | F | 27 | 4 |
| 6 | Sarah | F | 30 | 5 |
| 9 | Zoe | F | 30 | 5 |
| 14 | Kate | F | 35 | 6 |
| 4 | Harry | M | 20 | 1 |
| 12 | Peter | M | 20 | 1 |
| 13 | John | M | 21 | 2 |
| 16 | Cole | M | 25 | 3 |
| 17 | Dennis | M | 27 | 4 |
| 7 | Tony | M | 30 | 5 |
| 5 | Scott | M | 30 | 5 |
| 2 | Matt | M | 31 | 6 |
| 15 | James | M | 32 | 7 |
| 1 | Adams | M | 33 | 8 |
| 18 | Smith | M | 35 | 9 |
| 19 | Zack | M | 35 | 9 |
虽然最受好评的答案排名,但它不会分区,您也可以进行自我加入以将整个事情分区:
SELECT a.first_name,
a.age,
a.gender,
count(b.age)+1 as rank
FROM person a left join person b on a.age>b.age and a.gender=b.gender
group by a.first_name,
a.age,
a.gender
用例
CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));
INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
答案:
Bill 32 M 4
Bob 25 M 2
Jack 30 M 3
Nick 22 M 1
Steve 36 M 5
Anne 25 F 3
Jane 20 F 2
Kathy 18 F 1
Daniel 版本的调整,用于计算百分位数和排名。两个分数相同的人也将获得相同的排名。
set @totalStudents = 0;
select count(*) into @totalStudents from marksheets;
SELECT id, score, @curRank := IF(@prevVal=score, @curRank, @studentNumber) AS rank,
@percentile := IF(@prevVal=score, @percentile, (@totalStudents - @studentNumber + 1)/(@totalStudents)*100),
@studentNumber := @studentNumber + 1 as studentNumber,
@prevVal:=score
FROM marksheets, (
SELECT @curRank :=0, @prevVal:=null, @studentNumber:=1, @percentile:=100
) r
ORDER BY score DESC
样本数据的查询结果 -
+----+-------+------+---------------+---------------+-----------------+
| id | score | rank | percentile | studentNumber | @prevVal:=score |
+----+-------+------+---------------+---------------+-----------------+
| 10 | 98 | 1 | 100.000000000 | 2 | 98 |
| 5 | 95 | 2 | 90.000000000 | 3 | 95 |
| 6 | 91 | 3 | 80.000000000 | 4 | 91 |
| 2 | 91 | 3 | 80.000000000 | 5 | 91 |
| 8 | 90 | 5 | 60.000000000 | 6 | 90 |
| 1 | 90 | 5 | 60.000000000 | 7 | 90 |
| 9 | 84 | 7 | 40.000000000 | 8 | 84 |
| 3 | 83 | 8 | 30.000000000 | 9 | 83 |
| 4 | 72 | 9 | 20.000000000 | 10 | 72 |
| 7 | 60 | 10 | 10.000000000 | 11 | 60 |
+----+-------+------+---------------+---------------+-----------------+
丹尼尔和萨勒曼答案的结合。但是排名不会给出,因为存在并列的序列。相反,它会跳过排名到下一个。所以最大值总是达到行数。
SELECT first_name,
age,
gender,
IF(age=@_last_age,@curRank:=@curRank,@curRank:=@_sequence) AS rank,
@_sequence:=@_sequence+1,@_last_age:=age
FROM person p, (SELECT @curRank := 1, @_sequence:=1, @_last_age:=0) r
ORDER BY age;
架构和测试用例:
CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));
INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
INSERT INTO person VALUES (9, 'Kamal', 25, 'M');
INSERT INTO person VALUES (10, 'Saman', 32, 'M');
输出:
+------------+------+--------+------+--------------------------+-----------------+
| first_name | age | gender | rank | @_sequence:=@_sequence+1 | @_last_age:=age |
+------------+------+--------+------+--------------------------+-----------------+
| Kathy | 18 | F | 1 | 2 | 18 |
| Jane | 20 | F | 2 | 3 | 20 |
| Nick | 22 | M | 3 | 4 | 22 |
| Kamal | 25 | M | 4 | 5 | 25 |
| Anne | 25 | F | 4 | 6 | 25 |
| Bob | 25 | M | 4 | 7 | 25 |
| Jack | 30 | M | 7 | 8 | 30 |
| Bill | 32 | M | 8 | 9 | 32 |
| Saman | 32 | M | 8 | 10 | 32 |
| Steve | 36 | M | 10 | 11 | 36 |
+------------+------+--------+------+--------------------------+-----------------+
从 MySQL 8 开始,您终于可以在 MySQL 中使用窗口函数: https ://dev.mysql.com/doc/refman/8.0/en/window-functions.html
您的查询可以以完全相同的方式编写:
SELECT RANK() OVER (PARTITION BY Gender ORDER BY Age) AS `Partition by Gender`,
FirstName,
Age,
Gender
FROM Person
@Sam,你的观点在概念上很好,但我认为你误解了 MySQL 文档在引用页面上所说的内容——或者我误解了:-)——我只是想添加这个,以便如果有人对 @ 感到不舒服丹尼尔的回答他们会更放心,或者至少会更深入一点。
你看"@curRank := @curRank + 1 AS rank"
里面SELECT
不是“一个语句”,它是语句的一个“原子”部分,所以它应该是安全的。
您引用的文档继续显示在语句的 2 个(原子)部分中存在相同用户定义变量的示例,例如"SELECT @curRank, @curRank := @curRank + 1 AS rank"
.
有人可能会争辩说@curRank
,@Daniel 的回答中使用了两次:(1) the"@curRank := @curRank + 1 AS rank"
和 (2) the"(SELECT @curRank := 0) r"
但是由于第二种用法是该FROM
子句的一部分,我很确定它肯定会首先被评估;本质上使它成为第二个和前面的声明。
事实上,在您引用的同一个 MySQL 文档页面上,您会在评论中看到相同的解决方案——这可能是 @Daniel 的来源;是的,我知道这是评论,但它是官方文档页面上的评论,确实有一定的分量。
确定给定值排名的最直接解决方案是计算它之前的值的数量。假设我们有以下值:
10 20 30 30 30 40
30
值都被认为是第三40
值都被视为第 6(等级)或第 4(密集等级)现在回到最初的问题。以下是一些示例数据,按照 OP 中的描述进行排序(在右侧添加了预期排名):
+------+-----------+------+--------+ +------+------------+
| id | firstname | age | gender | | rank | dense_rank |
+------+-----------+------+--------+ +------+------------+
| 11 | Emily | 20 | F | | 1 | 1 |
| 3 | Grace | 25 | F | | 2 | 2 |
| 20 | Jill | 25 | F | | 2 | 2 |
| 10 | Megan | 26 | F | | 4 | 3 |
| 8 | Lucy | 27 | F | | 5 | 4 |
| 6 | Sarah | 30 | F | | 6 | 5 |
| 9 | Zoe | 30 | F | | 6 | 5 |
| 14 | Kate | 35 | F | | 8 | 6 |
| 4 | Harry | 20 | M | | 1 | 1 |
| 12 | Peter | 20 | M | | 1 | 1 |
| 13 | John | 21 | M | | 3 | 2 |
| 16 | Cole | 25 | M | | 4 | 3 |
| 17 | Dennis | 27 | M | | 5 | 4 |
| 5 | Scott | 30 | M | | 6 | 5 |
| 7 | Tony | 30 | M | | 6 | 5 |
| 2 | Matt | 31 | M | | 8 | 6 |
| 15 | James | 32 | M | | 9 | 7 |
| 1 | Adams | 33 | M | | 10 | 8 |
| 18 | Smith | 35 | M | | 11 | 9 |
| 19 | Zack | 35 | M | | 11 | 9 |
+------+-----------+------+--------+ +------+------------+
RANK() OVER (PARTITION BY Gender ORDER BY Age)
要计算Sarah,您可以使用以下查询:
SELECT COUNT(id) + 1 AS rank, COUNT(DISTINCT age) + 1 AS dense_rank
FROM testdata
WHERE gender = (SELECT gender FROM testdata WHERE id = 6)
AND age < (SELECT age FROM testdata WHERE id = 6)
+------+------------+
| rank | dense_rank |
+------+------------+
| 6 | 5 |
+------+------------+
RANK() OVER (PARTITION BY Gender ORDER BY Age)
要计算所有行,您可以使用此查询:
SELECT testdata.id, COUNT(lesser.id) + 1 AS rank, COUNT(DISTINCT lesser.age) + 1 AS dense_rank
FROM testdata
LEFT JOIN testdata AS lesser ON lesser.age < testdata.age AND lesser.gender = testdata.gender
GROUP BY testdata.id
这是结果(在右侧添加了连接值):
+------+------+------------+ +-----------+-----+--------+
| id | rank | dense_rank | | firstname | age | gender |
+------+------+------------+ +-----------+-----+--------+
| 11 | 1 | 1 | | Emily | 20 | F |
| 3 | 2 | 2 | | Grace | 25 | F |
| 20 | 2 | 2 | | Jill | 25 | F |
| 10 | 4 | 3 | | Megan | 26 | F |
| 8 | 5 | 4 | | Lucy | 27 | F |
| 6 | 6 | 5 | | Sarah | 30 | F |
| 9 | 6 | 5 | | Zoe | 30 | F |
| 14 | 8 | 6 | | Kate | 35 | F |
| 4 | 1 | 1 | | Harry | 20 | M |
| 12 | 1 | 1 | | Peter | 20 | M |
| 13 | 3 | 2 | | John | 21 | M |
| 16 | 4 | 3 | | Cole | 25 | M |
| 17 | 5 | 4 | | Dennis | 27 | M |
| 5 | 6 | 5 | | Scott | 30 | M |
| 7 | 6 | 5 | | Tony | 30 | M |
| 2 | 8 | 6 | | Matt | 31 | M |
| 15 | 9 | 7 | | James | 32 | M |
| 1 | 10 | 8 | | Adams | 33 | M |
| 18 | 11 | 9 | | Smith | 35 | M |
| 19 | 11 | 9 | | Zack | 35 | M |
+------+------+------------+ +-----------+-----+--------+
如果您只想对一个人进行排名,您可以执行以下操作:
SELECT COUNT(Age) + 1
FROM PERSON
WHERE(Age < age_to_rank)
此排名对应于 oracle RANK 功能(如果您有相同年龄的人,他们将获得相同的排名,并且之后的排名是非连续的)。
它比在子查询中使用上述解决方案之一并从中选择来获得一个人的排名要快一点。
这可以用来对每个人进行排名,但它比上述解决方案慢。
SELECT
Age AS age_var,
(
SELECT COUNT(Age) + 1
FROM Person
WHERE (Age < age_var)
) AS rank
FROM Person
为了避免 Erandac 的答案中的“但是”结合 Daniel 和 Salman 的答案,可以使用以下“分区解决方法”之一
SELECT customerID, myDate
-- partition ranking works only with CTE / from MySQL 8.0 on
, RANK() OVER (PARTITION BY customerID ORDER BY dateFrom) AS rank,
-- Erandac's method in combination of Daniel's and Salman's
-- count all items in sequence, maximum reaches row count.
, IF(customerID=@_lastRank, @_curRank:=@_curRank, @_curRank:=@_sequence+1) AS sequenceRank
, @_sequence:=@_sequence+1 as sequenceOverAll
-- Dense partition ranking, works also with MySQL 5.7
-- remember to set offset values in from clause
, IF(customerID=@_lastRank, @_nxtRank:=@_nxtRank, @_nxtRank:=@_nxtRank+1 ) AS partitionRank
, IF(customerID=@_lastRank, @_overPart:=@_overPart+1, @_overPart:=1 ) AS partitionSequence
, @_lastRank:=customerID
FROM myCustomers,
(SELECT @_curRank:=0, @_sequence:=0, @_lastRank:=0, @_nxtRank:=0, @_overPart:=0 ) r
ORDER BY customerID, myDate
此代码片段中的第 3 个变体中的分区排名将返回连续的排名数字。这将导致类似于rank() over partition by
结果的数据结构。例如,请参见下文。特别是,对于每个新的 partitionRank,partitionSequence 将始终从 1 开始,使用以下方法:
customerID myDate sequenceRank (Erandac)
| sequenceOverAll
| | partitionRank
| | | partitionSequence
| | | | lastRank
... lines ommitted for clarity
40 09.11.2016 11:19 1 44 1 44 40
40 09.12.2016 12:08 1 45 1 45 40
40 09.12.2016 12:08 1 46 1 46 40
40 09.12.2016 12:11 1 47 1 47 40
40 09.12.2016 12:12 1 48 1 48 40
40 13.10.2017 16:31 1 49 1 49 40
40 15.10.2017 11:00 1 50 1 50 40
76 01.07.2015 00:24 51 51 2 1 76
77 04.08.2014 13:35 52 52 3 1 77
79 15.04.2015 20:25 53 53 4 1 79
79 24.04.2018 11:44 53 54 4 2 79
79 08.10.2018 17:37 53 55 4 3 79
117 09.07.2014 18:21 56 56 5 1 117
119 26.06.2014 13:55 57 57 6 1 119
119 02.03.2015 10:23 57 58 6 2 119
119 12.10.2015 10:16 57 59 6 3 119
119 08.04.2016 09:32 57 60 6 4 119
119 05.10.2016 12:41 57 61 6 5 119
119 05.10.2016 12:42 57 62 6 6 119
...
select id,first_name,gender,age,
rank() over(partition by gender order by age) rank_g
from person
CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));
INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
INSERT INTO person VALUES (9,'AKSH',32,'M');
SELECT FirstName,Age,Gender, RANK() OVER (partition by Gender order by Age desc) AS 'Partition by Gender' FROM Person