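Update, 8 October 2019:

@Gordon Linoff: I tried to apply your solution, but I realized it does not work as expected. I have added an example with the expected results here (https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=1b486476d6aeab25997f25e66ee455e9); I would be grateful if you could help.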

--

I have a Transactions table with the following schema:

CREATE TABLE Transactions (Id int IDENTITY, SessionId int, TransactionType varchar(50), DateTimeEnd datetime, DateStart datetime, Rank int);

Here are some sample rows:

INSERT INTO Transactions (Id, SessionId, TransactionType, DateTimeEnd, DateStart, Rank)
VALUES
 (1, 1, 'Deposit',    '2017-01-20T11:16:33Z', '2017-01-20T11:16:33Z', 600),
 (2, 1, 'Withdrawal', '2017-01-21T11:16:33Z', '2017-01-20T11:16:33Z', 100),
 (3, 2, 'Deposit',    '2017-02-23T11:16:33Z', '2017-02-23T11:16:33Z', 500),
 (4, 1, 'Withdrawal', '2017-01-24T11:16:33Z', '2017-01-21T11:16:33Z', 150),
 (5, 1, 'Withdrawal', '2017-01-26T11:16:33Z', '2017-01-24T11:16:33Z', 150),
 (6, 2, 'Withdrawal', '2017-02-27T11:16:33Z', '2017-02-23T11:16:33Z', 200),
 (7, 1, 'Withdrawal', '2017-01-28T11:16:33Z', '2017-01-26T11:16:33Z', 10),
 (8, 1, 'Withdrawal', '2017-01-30T11:16:33Z', '2017-01-28T11:16:33Z', 10),
 (9, 1, 'Withdrawal', '2017-01-31T11:16:33Z', '2017-01-30T11:16:33Z', 10);

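What I want is a T-SQL query that groups rows by SessionId and TransactionType and merges the consecutive rows of each group, keeping only the row with the minimum DateTimeEnd. In addition, the Rank value of the kept row must be the sum of the Rank values of the rows in its group. The query needs to run on MS SQL Server in Microsoft Azure SQL Data Warehouse.

Expected result: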

| Id | SessionId | TransactionType | DateTimeEnd          | DateStart            | Rank |
|----|-----------|-----------------|----------------------|----------------------|------|
| 1  | 1         | Deposit         | 2017-01-20T11:16:33Z | 2017-01-20T11:16:33Z | 600  |
| 2  | 1         | Withdrawal      | 2017-01-21T11:16:33Z | 2017-01-20T11:16:33Z | 100  |
| 4  | 1         | Withdrawal      | 2017-01-24T11:16:33Z | 2017-01-21T11:16:33Z | 300  |
| 7  | 1         | Withdrawal      | 2017-01-28T11:16:33Z | 2017-01-26T11:16:33Z | 30   |
| 3  | 2         | Deposit         | 2017-02-23T11:16:33Z | 2017-02-23T11:16:33Z | 500  |
| 6  | 2         | Withdrawal      | 2017-02-27T11:16:33Z | 2017-02-23T11:16:33Z | 200  |

I have tried many approaches but could not get it to work.

2 Answers

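As GMB pointed out, this is a gaps-and-islands problem. Because you want to keep the first row of each island, I would suggest an approach based on lag() rather than a difference of row numbers: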

SELECT SessionId, TransactionType, DateTimeEnd, DateStart, sumRank
FROM (SELECT t.*,
             SUM(Rank) OVER (PARTITION BY SessionId, TransactionType, grp) as sumRank
      FROM (SELECT t.*,
                   SUM(CASE WHEN prev_st_id = prev_id THEN 0 ELSE 1 END) OVER (ORDER BY id) as grp
            FROM (SELECT t.*,
                         LAG(id) OVER (PARTITION BY SessionId, TransactionType ORDER BY id) as prev_st_id,
                         LAG(id) OVER (PARTITION BY SessionId ORDER BY id) as prev_id
                  FROM Transactions t
                 ) t
           ) t
     ) t
WHERE prev_st_id <> prev_id OR prev_st_id IS NULL;

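What does this do?

  • The innermost subquery computes the lag of id both overall and within each session/transaction type. It uses id because it looks more stable than the date/time columns (one of them contains duplicate date/time values).
  • Wherever the two lagged ids differ, a new island starts. A cumulative sum of those starts then identifies the group (grp).
  • The next level uses a window function to total Rank over each whole group.
  • Finally, the outer query filters down to just the first row of each group.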
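
The steps in the list above can be traced outside the database. This is only a rough Python sketch, not the query itself: it assumes the "overall" lag means the previous row by id across the whole table (the query as written partitions that second LAG by SessionId, so its output can differ), and `ROWS` and `first_rows_with_sum` are names made up for illustration.

```python
# Sample rows from the question: (Id, SessionId, TransactionType, Rank).
ROWS = [
    (1, 1, "Deposit", 600),
    (2, 1, "Withdrawal", 100),
    (3, 2, "Deposit", 500),
    (4, 1, "Withdrawal", 150),
    (5, 1, "Withdrawal", 150),
    (6, 2, "Withdrawal", 200),
    (7, 1, "Withdrawal", 10),
    (8, 1, "Withdrawal", 10),
    (9, 1, "Withdrawal", 10),
]


def first_rows_with_sum(rows):
    """Return (first Id, SessionId, TransactionType, summed Rank) per island.

    Assumption: an island is a run of rows that are adjacent in Id order
    and share the same SessionId and TransactionType.
    """
    islands = []
    prev_id = None       # lag of Id over the whole table
    last_id_by_key = {}  # lag of Id within (SessionId, TransactionType)
    for row_id, session_id, tx_type, rank in sorted(rows):
        key = (session_id, tx_type)
        prev_st_id = last_id_by_key.get(key)
        # A new island starts when this key has no earlier row, or when the
        # previous row overall belongs to a different key.
        if prev_st_id is None or prev_st_id != prev_id:
            islands.append([row_id, session_id, tx_type, rank])
        else:
            islands[-1][3] += rank  # same island: add Rank to its running total
        prev_id = row_id
        last_id_by_key[key] = row_id
    return [tuple(island) for island in islands]


for island in first_rows_with_sum(ROWS):
    print(island)
```

On the sample rows this keeps ids 1, 2, 3, 4, 6 and 7 with summed ranks 600, 100, 500, 300, 200 and 30, which is the same set of rows as the expected result in the question.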

Here is a db<>fiddle.

answered 2019-09-19T14:26:43.793

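This is a variant of the gaps-and-islands problem.

I would approach it as follows:

1) First, identify and merge the groups of records. The following query gives you the minimum DateTimeEnd of each group, along with the sum of its ranks: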

SELECT 
    SessionId, 
    TransactionType, 
    SUM(Rank) SumRank, 
    MIN(DateTimeEnd) MinDateTimeEnd
FROM (
    SELECT 
        t.*,
        ROW_NUMBER() OVER(ORDER BY DateTimeEnd) rn1,
        ROW_NUMBER() OVER(PARTITION BY SessionId, TransactionType ORDER BY DateTimeEnd) rn2
    FROM Transactions t
 ) x
GROUP BY SessionId, TransactionType, rn1 - rn2

Returns:

SessionId | TransactionType | SumRank | MinDateTimeEnd
--------: | :-------------- | ------: | :------------------
        1 | Deposit         |     600 | 20/01/2017 11:16:33
        1 | Withdrawal      |     430 | 21/01/2017 11:16:33
        2 | Deposit         |     500 | 23/02/2017 11:16:33
        2 | Withdrawal      |     200 | 27/02/2017 11:16:33
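
To see why the sample data collapses to four rows here, the row-number difference can be reproduced by hand. Below is a rough Python sketch of the same idea (Python only so it can be run without a SQL Server instance); `ROWS` and `merge_by_row_number_difference` are illustrative names, and the timestamps are kept as ISO 8601 strings on the assumption that they sort chronologically as text.

```python
from collections import defaultdict

# Sample rows from the question: (SessionId, TransactionType, DateTimeEnd, Rank).
ROWS = [
    (1, "Deposit", "2017-01-20T11:16:33Z", 600),
    (1, "Withdrawal", "2017-01-21T11:16:33Z", 100),
    (2, "Deposit", "2017-02-23T11:16:33Z", 500),
    (1, "Withdrawal", "2017-01-24T11:16:33Z", 150),
    (1, "Withdrawal", "2017-01-26T11:16:33Z", 150),
    (2, "Withdrawal", "2017-02-27T11:16:33Z", 200),
    (1, "Withdrawal", "2017-01-28T11:16:33Z", 10),
    (1, "Withdrawal", "2017-01-30T11:16:33Z", 10),
    (1, "Withdrawal", "2017-01-31T11:16:33Z", 10),
]


def merge_by_row_number_difference(rows):
    """Group rows by (SessionId, TransactionType, rn1 - rn2)."""
    ordered = sorted(rows, key=lambda r: r[2])  # ORDER BY DateTimeEnd
    seen_per_key = defaultdict(int)
    groups = {}
    for rn1, (session_id, tx_type, end, rank) in enumerate(ordered, start=1):
        key = (session_id, tx_type)
        seen_per_key[key] += 1
        rn2 = seen_per_key[key]  # row number within the partition
        group_key = (session_id, tx_type, rn1 - rn2)
        total, min_end = groups.get(group_key, (0, end))
        groups[group_key] = (total + rank, min(min_end, end))
    return sorted(
        (session_id, tx_type, total, min_end)
        for (session_id, tx_type, _), (total, min_end) in groups.items()
    )


for group in merge_by_row_number_difference(ROWS):
    print(group)
```

Ordered by DateTimeEnd, all six withdrawals of session 1 come one after another (the session 2 rows are dated a month later), so they share one rn1 - rn2 value and their ranks add up to 430.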

2) Then, join the result of the above query back to the original table to pull in the remaining columns:

SELECT 
    t.id,
    t.SessionId,
    t.TransactionType,
    t.DateTimeEnd,
    t.DateStart,
    x.SumRank
FROM Transactions t
INNER JOIN (
    SELECT 
        SessionId, 
        TransactionType, 
        SUM(Rank) SumRank, 
        MIN(DateTimeEnd) MinDateTimeEnd
    FROM (
        SELECT 
            t.*,
            ROW_NUMBER() OVER(ORDER BY DateTimeEnd) rn1,
            ROW_NUMBER() OVER(PARTITION BY SessionId, TransactionType ORDER BY DateTimeEnd) rn2
        FROM Transactions t
    ) x
    GROUP BY SessionId, TransactionType, rn1 - rn2
) x 
    ON  x.SessionId = t.SessionId
    AND x.TransactionType = t.TransactionType
    AND x.MinDateTimeEnd = t.DateTimeEnd

Yields:

id | SessionId | TransactionType | DateTimeEnd         | DateStart           | SumRank
-: | --------: | :-------------- | :------------------ | :------------------ | ------:
 1 |         1 | Deposit         | 20/01/2017 11:16:33 | 20/01/2017 11:16:33 |     600
 2 |         1 | Withdrawal      | 21/01/2017 11:16:33 | 20/01/2017 11:16:33 |     430
 3 |         2 | Deposit         | 23/02/2017 11:16:33 | 23/02/2017 11:16:33 |     500
 6 |         2 | Withdrawal      | 27/02/2017 11:16:33 | 23/02/2017 11:16:33 |     200
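
The join in step 2 is an ordinary lookup on (SessionId, TransactionType, DateTimeEnd). Here is a small Python sketch of that step, with the step 1 output hard-coded from the table shown earlier; `ROWS`, `GROUPS` and `join_back` are hypothetical names used only for this illustration.

```python
# Rows from the question: (Id, SessionId, TransactionType, DateTimeEnd, DateStart).
ROWS = [
    (1, 1, "Deposit", "2017-01-20T11:16:33Z", "2017-01-20T11:16:33Z"),
    (2, 1, "Withdrawal", "2017-01-21T11:16:33Z", "2017-01-20T11:16:33Z"),
    (3, 2, "Deposit", "2017-02-23T11:16:33Z", "2017-02-23T11:16:33Z"),
    (4, 1, "Withdrawal", "2017-01-24T11:16:33Z", "2017-01-21T11:16:33Z"),
    (5, 1, "Withdrawal", "2017-01-26T11:16:33Z", "2017-01-24T11:16:33Z"),
    (6, 2, "Withdrawal", "2017-02-27T11:16:33Z", "2017-02-23T11:16:33Z"),
    (7, 1, "Withdrawal", "2017-01-28T11:16:33Z", "2017-01-26T11:16:33Z"),
    (8, 1, "Withdrawal", "2017-01-30T11:16:33Z", "2017-01-28T11:16:33Z"),
    (9, 1, "Withdrawal", "2017-01-31T11:16:33Z", "2017-01-30T11:16:33Z"),
]

# Output of step 1: (SessionId, TransactionType, SumRank, MinDateTimeEnd).
GROUPS = [
    (1, "Deposit", 600, "2017-01-20T11:16:33Z"),
    (1, "Withdrawal", 430, "2017-01-21T11:16:33Z"),
    (2, "Deposit", 500, "2017-02-23T11:16:33Z"),
    (2, "Withdrawal", 200, "2017-02-27T11:16:33Z"),
]


def join_back(rows, groups):
    """Attach SumRank to the row whose DateTimeEnd is the group's minimum."""
    # Assumption: each (SessionId, TransactionType, MinDateTimeEnd) is unique,
    # so a dict lookup behaves like the INNER JOIN for this data.
    sum_by_key = {(s, t, end): total for s, t, total, end in groups}
    joined = []
    for row_id, session_id, tx_type, end, start in rows:
        total = sum_by_key.get((session_id, tx_type, end))
        if total is not None:  # rows without a matching group are dropped
            joined.append((row_id, session_id, tx_type, end, start, total))
    return joined


for row in join_back(ROWS, GROUPS):
    print(row)
```

Only ids 1, 2, 3 and 6 have a DateTimeEnd equal to their group's minimum, so those are the rows that survive the join.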

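Demo on DB Fiddle

Note: as mentioned in the comments, I think there is a glitch in the expected results you show. The rows with ids 4 and 7 should not appear in the output, because the row with id 2 has the same SessionId and TransactionType and an earlier DateTimeEnd.

answered 2019-09-19T14:00:16.973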