Background:

The original case was very simple: calculate a running total per user, from the highest revenue to the lowest:

CREATE TABLE t(Customer INTEGER  NOT NULL PRIMARY KEY 
              ,"User"   VARCHAR(5) NOT NULL
              ,Revenue  INTEGER  NOT NULL);

INSERT INTO t(Customer,"User",Revenue) VALUES
(001,'James',500),(002,'James',750),(003,'James',450),
(004,'Sarah',100),(005,'Sarah',500),(006,'Sarah',150),
(007,'Sarah',600),(008,'James',150),(009,'James',100);

Query:

SELECT *,
    1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY "User") AS percentage,
    1.0 * SUM(Revenue) OVER(PARTITION BY "User" ORDER BY Revenue DESC)
         /SUM(Revenue) OVER(PARTITION BY "User") AS running_percentage
FROM t;

LiveDemo

Output:

╔══════════╦═══════╦═════════╦════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║ percentage ║ running_percentage ║
╠══════════╬═══════╬═════════╬════════════╬════════════════════╣
║        2 ║ James ║     750 ║ 0.38       ║ 0.38               ║
║        1 ║ James ║     500 ║ 0.26       ║ 0.64               ║
║        3 ║ James ║     450 ║ 0.23       ║ 0.87               ║
║        8 ║ James ║     150 ║ 0.08       ║ 0.95               ║
║        9 ║ James ║     100 ║ 0.05       ║ 1                  ║
║        7 ║ Sarah ║     600 ║ 0.44       ║ 0.44               ║
║        5 ║ Sarah ║     500 ║ 0.37       ║ 0.81               ║
║        6 ║ Sarah ║     150 ║ 0.11       ║ 0.93               ║
║        4 ║ Sarah ║     100 ║ 0.07       ║ 1                  ║
╚══════════╩═══════╩═════════╩════════════╩════════════════════╝

It could be calculated differently using specific window functions.
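As one illustration of such a variation (a sketch, not part of the original query), the window frame can be written out explicitly. It assumes table t from above and an engine that supports window frames. When ORDER BY is present and no frame is given, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so this should return the same running_percentage as the query above:

```sql
-- Sketch: the running percentage with the default window frame spelled out.
-- Assumes table t as defined above.
SELECT Customer, "User", Revenue,
    1.0 * SUM(Revenue) OVER(PARTITION BY "User" ORDER BY Revenue DESC
                            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        / SUM(Revenue) OVER(PARTITION BY "User") AS running_percentage
FROM t;
```

With RANGE, rows that tie on Revenue share the same running total, which is the behaviour the `>=` comparison in the rewrites below reproduces. A ROWS frame would treat ties differently.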


Now let's assume that we cannot use a windowed SUM and rewrite it:

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue / NULLIF(c3.s,0) AS percentage
    ,1.0 * c2.s    / NULLIF(c3.s,0) AS running_percentage
FROM t c
CROSS APPLY
        (SELECT SUM(Revenue) AS s
        FROM t c2
        WHERE c."User" = c2."User"
            AND c2.Revenue >= c.Revenue) AS c2
CROSS APPLY
        (SELECT SUM(Revenue) AS s
        FROM t c2
        WHERE c."User" = c2."User") AS c3
ORDER BY "User", Revenue DESC;

LiveDemo

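I used CROSS APPLY because I do not like correlated subqueries in the SELECT column list, and because c3 is used twice.

Everything works as it should. But on closer inspection, c2 and c3 are very similar. So why not combine them and use simple conditional aggregation: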

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage
    ,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
        (SELECT SUM(Revenue) AS sum_total,
                SUM(CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END) 
                AS sum_running
        FROM t c2
        WHERE c."User" = c2."User") AS c2
ORDER BY "User", Revenue DESC;

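Unfortunately, this is not possible; SQL Server raises:

Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

Of course, I could circumvent it with another subquery, but it becomes a bit "ugly":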

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage
    ,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
(   SELECT SUM(Revenue) AS sum_total,
           SUM(running_revenue) AS sum_running
     FROM (SELECT Revenue,
                  CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END 
                  AS running_revenue
           FROM t c2
           WHERE c."User" = c2."User") AS sub
) AS c2
ORDER BY "User", Revenue DESC

LiveDemo


PostgreSQL version. The only difference is LATERAL instead of CROSS APPLY.

SELECT c.Customer, c."User", c.Revenue
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage 
    ,1.0 * c2.running_sum / NULLIF(c2.sum_total,0) AS running_percentage 
FROM t c
,LATERAL (SELECT SUM(Revenue) AS sum_total,
                 SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END) 
                 AS running_sum
        FROM t c2
        WHERE c."User" = c2."User") c2
ORDER BY "User", Revenue DESC;

SqlFiddleDemo

It works nicely.


SQLite/MySQL version (this is why I prefer LATERAL/CROSS APPLY):

SELECT c.Customer, c."User", c.Revenue,
    1.0 * Revenue / (SELECT SUM(Revenue)
                     FROM t c2
                     WHERE c."User" = c2."User") AS percentage,
    1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
           FROM t c2
          WHERE c."User" = c2."User")  / 
          (SELECT SUM(c2.Revenue)
           FROM t c2
           WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

SQLFiddleDemo-SQLite SQLFiddleDemo-MySQL


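I've read "Aggregates with an Outer Reference":

The source for the restriction is in the SQL-92 standard, and SQL Server inherited it from the Sybase codebase. The problem is that SQL Server needs to figure out which query will be computing the aggregate.

I am not looking for answers that only show how to circumvent it.

The questions are:

  1. Which part of the standard disallows or interferes with it?
  2. Why don't other RDBMSes have a problem with this kind of outer dependency?
  3. Do they extend the SQL Standard while SQL Server behaves as it should, or does SQL Server not implement it fully (correctly?)?

I would be very grateful for references to:

  • ISO standard (92 or newer)
  • SQL Server Standards Support
  • Official documentation from any RDBMS that explains it (SQL Server/PostgreSQL/Oracle/...).

EDIT:

I know that SQL-92 does not have the concept of LATERAL. But the version with subqueries (as in SQLite/MySQL) does not work either.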

LiveDemo

EDIT 2:

To simplify it a bit, let's check only the correlated subquery:

SELECT c.Customer, c."User", c.Revenue,
       1.0*(SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
              FROM t c2
              WHERE c."User" = c2."User") 
       / (SELECT SUM(c2.Revenue)
          FROM t c2
          WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

The version above works fine in MySQL/SQLite/PostgreSQL.

In SQL Server we get the error. After wrapping it in a subquery to "flatten" it to one level, it works:

SELECT c.Customer, c."User", c.Revenue,
      1.0 * (
              SELECT SUM(CASE WHEN r1 >= r2 THEN r1 ELSE 0 END)
              FROM (SELECT c2.Revenue AS r1, c.Revenue r2
                    FROM t c2
                    WHERE c."User" = c2."User") AS S)  / 
             (SELECT SUM(c2.Revenue)
              FROM t c2
              WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

The point of this question is how the SQL standard regulates it.

LiveDemo

2 Answers

There is an easier solution:

SELECT c.Customer, c."User", c."Revenue",
       1.0 * Revenue/ NULLIF(c2.sum_total, 0) AS percentage,
       1.0 * c2.sum_running / NULLIF(c2.sum_total, 0) AS running_percentage
FROM t c CROSS APPLY
     (SELECT SUM(c2.Revenue) AS sum_total,
             SUM(CASE WHEN c2.Revenue >= x.Revenue THEN c2.Revenue ELSE 0 END) 
                 as sum_running
      FROM t c2 CROSS JOIN
           (SELECT c.REVENUE) x
      WHERE c."User" = c2."User"
     ) c2
ORDER BY "User", Revenue DESC;

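I am not sure why, or whether, this limitation is in the SQL '92 standard. I did have it pretty well memorized 20 or so years ago, but I don't recall that particular limitation.

I should note:

  • At the time of the SQL 92 standard, lateral joins were not really on the radar. Sybase definitely had no such concept.
  • Other databases do have problems with outer references. In particular, they often limit the scoping to one level deep.
  • The SQL Standard itself tends to be highly political (that is, vendor-driven) rather than driven by actual database user requirements. Well, over time, it does move in the right direction.

answered 2016-04-07T16:29:47.910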

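There is no such limitation in the SQL standard for LATERAL. CROSS APPLY is a vendor-specific extension from Microsoft (Oracle adopted it later for compatibility), and its limitations are obviously not owed to the ISO/IEC SQL standard, since the MS feature predates the standard.

LATERAL in standard SQL is basically just a modifier for joins that allows lateral references in the join tree. There is no limit on the number of columns that can be referenced.

I don't see a reason for the odd restriction to begin with. Maybe it is because CROSS APPLY was originally intended to allow table-valued functions and was later extended to allow sub-SELECTs.

The Postgres manual explains LATERAL like this:

The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)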
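To make the quoted rule concrete, here is a minimal sketch for PostgreSQL against the table t from the question. The aliases m and max_revenue are made up for illustration. The sub-SELECT refers to c, which appears before it in the FROM list, and that reference is only accepted because of LATERAL:

```sql
-- Sketch (PostgreSQL): a lateral reference from a sub-SELECT to an earlier FROM item.
-- Assumes table t from the question; m and max_revenue are illustrative names.
SELECT c.Customer, c."User", c.Revenue, m.max_revenue
FROM   t c
CROSS  JOIN LATERAL (
   SELECT MAX(c2.Revenue) AS max_revenue
   FROM   t c2
   WHERE  c2."User" = c."User"   -- lateral reference to c
   ) AS m;
```

Removing the LATERAL key word should make Postgres reject the query, because the sub-SELECT would then be evaluated independently of c.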

The Postgres version of your query (without the more elegant window functions) can be simpler:

SELECT c.*
     , round(revenue        / c2.sum_total, 2) AS percentage
     , round(c2.running_sum / c2.sum_total, 2) AS running_percentage
FROM   t c, LATERAL (
   SELECT NULLIF(SUM(revenue), 0)::numeric AS sum_total  -- NULLIF, cast once
        , SUM(revenue) FILTER (WHERE revenue >= c.revenue) AS running_sum
   FROM   t
   WHERE  "User" = c."User"
   ) c2
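ORDER  BY c."User", c.revenue DESC;

  • Postgres 9.4+ has the more elegant aggregate FILTER for conditional aggregates.

  • NULLIF makes sense; I only suggest a minor simplification.

  • Cast sum_total to numeric once.

  • Round the result to match your desired output.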
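As a side-by-side sketch of the first point (the threshold 500 and the column aliases are arbitrary choices for illustration), the same conditional sum over table t can be spelled with FILTER or with CASE:

```sql
-- Sketch (PostgreSQL 9.4+): two spellings of a conditional sum over table t.
-- The threshold 500 is an arbitrary value chosen for illustration.
SELECT "User"
     , SUM(revenue) FILTER (WHERE revenue >= 500)            AS sum_filter
     , SUM(CASE WHEN revenue >= 500 THEN revenue ELSE 0 END) AS sum_case
FROM   t
GROUP  BY "User";
```

With the sample data both columns give 1250 for James and 1100 for Sarah. The two forms differ when no row matches: FILTER yields NULL, while the CASE form with ELSE 0 yields 0. In the query above the current row always satisfies revenue >= c.revenue, so running_sum is never NULL there.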

answered 2016-04-11T15:32:37.350