背景:
原来的情况很简单。从最高收入到最低计算每个用户的总收入:
CREATE TABLE t(Customer INTEGER NOT NULL PRIMARY KEY
,"User" VARCHAR(5) NOT NULL
,Revenue INTEGER NOT NULL);
INSERT INTO t(Customer,"User",Revenue) VALUES
(001,'James',500),(002,'James',750),(003,'James',450),
(004,'Sarah',100),(005,'Sarah',500),(006,'Sarah',150),
(007,'Sarah',600),(008,'James',150),(009,'James',100);
询问:
SELECT *,
1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY "User") AS percentage,
1.0 * SUM(Revenue) OVER(PARTITION BY "User" ORDER BY Revenue DESC)
/SUM(Revenue) OVER(PARTITION BY "User") AS running_percentage
FROM t;
输出:
╔════╦═══════╦═════════╦════════════╦════════════════════╗
║ ID ║ User ║ Revenue ║ percentage ║ running_percentage ║
╠════╬═══════╬═════════╬════════════╬════════════════════╣
║ 2 ║ James ║ 750 ║ 0.38 ║ 0.38 ║
║ 1 ║ James ║ 500 ║ 0.26 ║ 0.64 ║
║ 3 ║ James ║ 450 ║ 0.23 ║ 0.87 ║
║ 8 ║ James ║ 150 ║ 0.08 ║ 0.95 ║
║ 9 ║ James ║ 100 ║ 0.05 ║ 1 ║
║ 7 ║ Sarah ║ 600 ║ 0.44 ║ 0.44 ║
║ 5 ║ Sarah ║ 500 ║ 0.37 ║ 0.81 ║
║ 6 ║ Sarah ║ 150 ║ 0.11 ║ 0.93 ║
║ 4 ║ Sarah ║ 100 ║ 0.07 ║ 1 ║
╚════╩═══════╩═════════╩════════════╩════════════════════╝
它可以使用特定的窗口函数进行不同的计算。
现在让我们假设我们不能使用 windowedSUM
并重写它:
SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c3.s,0) AS percentage
,1.0 * c2.s / NULLIF(c3.s,0) AS running_percentage
FROM t c
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM t c2
WHERE c."User" = c2."User"
AND c2.Revenue >= c.Revenue) AS c2
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM t c2
WHERE c."User" = c2."User") AS c3
ORDER BY "User", Revenue DESC;
我使用过CROSS APPLY
是因为我不喜欢SELECT
列列表中的相关子查询并且c3
被使用了两次。
一切正常。但是当我们仔细观察时c2
,又c3
是非常相似的。那么为什么不将它们结合起来并使用简单的条件聚合:
SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
(SELECT SUM(Revenue) AS sum_total,
SUM(CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END)
AS sum_running
FROM t c2
WHERE c."User" = c2."User") AS c2
ORDER BY "User", Revenue DESC;
不幸的是,这是不可能的。
在包含外部引用的聚合表达式中指定了多个列。如果要聚合的表达式包含外部引用,则该外部引用必须是表达式中引用的唯一列。
当然我可以用另一个子查询来绕过它,但它变得有点“丑陋”:
SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
( SELECT SUM(Revenue) AS sum_total,
SUM(running_revenue) AS sum_running
FROM (SELECT Revenue,
CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END
AS running_revenue
FROM t c2
WHERE c."User" = c2."User") AS sub
) AS c2
ORDER BY "User", Revenue DESC
Postgresql
版本。唯一的区别是LATERAL
代替CROSS APPLY
.
SELECT c.Customer, c."User", c.Revenue
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.running_sum / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
,LATERAL (SELECT SUM(Revenue) AS sum_total,
SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
AS running_sum
FROM t c2
WHERE c."User" = c2."User") c2
ORDER BY "User", Revenue DESC;
它工作得很好。
SQLite
/MySQL
版本(这就是我喜欢的原因LATERAL/CROSS APPLY
):
SELECT c.Customer, c."User", c.Revenue,
1.0 * Revenue / (SELECT SUM(Revenue)
FROM t c2
WHERE c."User" = c2."User") AS percentage,
1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
FROM t c2
WHERE c."User" = c2."User") /
(SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
SQLFiddleDemo-SQLite
SQLFiddleDemo-MySQL
我读过带有外部参考的聚合:
限制的来源在
SQL-92
标准中,并SQL Server
从Sybase
代码库继承。问题是 SQL Server 需要确定哪个查询将计算聚合。
我不寻找仅显示如何规避它的答案。
问题是:
- 标准的哪一部分不允许或干扰它?
- 为什么其他 RDBMS 对这种外部依赖没有问题?
- 它们是否按照应有的方式扩展
SQL Standard
和SQL Server
运行,或者SQL Server
没有完全实现(正确?)?
我将非常感谢参考:
ISO standard
(92 或更新)- SQL Server 标准支持
- 来自任何 RDBMS 的官方文档来解释它(
SQL Server/Postgresql/Oracle/...
)。
编辑:
我知道SQL-92
没有LATERAL
. 但是带有子查询的版本(如 in SQLite/MySQL
)也不起作用。
编辑2:
为了简化一点,让我们只检查相关子查询:
SELECT c.Customer, c."User", c.Revenue,
1.0*(SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
FROM t c2
WHERE c."User" = c2."User")
/ (SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
上面的版本在MySQL/SQLite/Postgresql
.
在SQL Server
我们得到错误。在用子查询包装它以将其“展平”到一个级别后,它可以工作:
SELECT c.Customer, c."User", c.Revenue,
1.0 * (
SELECT SUM(CASE WHEN r1 >= r2 THEN r1 ELSE 0 END)
FROM (SELECT c2.Revenue AS r1, c.Revenue r2
FROM t c2
WHERE c."User" = c2."User") AS S) /
(SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
这个问题的重点是如何SQL standard
规范它。