sql-server - How to use Row_Number() (partitioning) correctly for my data pool

Question

We have the following table (the output is already sorted and separated for readability):

| PK | FK1 | FK2 |   ActionCode |         CreationTS  | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
|  6 | 100 | 500 |       Create | 2011-01-02 00:00:00 |                  H |
----------------------------------------------------------------------------
|  3 | 100 | 500 |       Change | 2011-01-01 02:00:00 |                  Z |
|  2 | 100 | 500 |       Change | 2011-01-01 01:00:00 |                  X |
|  1 | 100 | 500 |       Create | 2011-01-01 00:00:00 |                  Y |
----------------------------------------------------------------------------
|  4 | 100 | 510 |       Create | 2011-01-01 00:30:00 |                  T |
----------------------------------------------------------------------------
|  5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 |                  A |
----------------------------------------------------------------------------

What is ActionCode? We use it in C#, where it represents an enum value.

What do I want to achieve?

Well, I need the following output:

| FK1 | FK2 |   ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 |       Create |                  H |
| 100 | 500 |       Create |                  Z |
| 100 | 510 |       Create |                  T |
| 100 | 520 | CreateSystem |                  A |
-------------------------------------------------

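So, what is the actual logic? We have logical groups defined by the composite key (FK1 + FK2). Each of these groups can be divided into partitions, and each partition begins with Create or CreateSystem. Each partition ends with Create, CreateSystem, or Change. The effective SomeAttributeValue of each partition should be the value from the last row of that partition.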
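The rule above can be sketched outside the database to make the expected result concrete. This is only an illustration in Python, not the T-SQL being asked for; the function name `collapse_partitions` and the tuple layout are assumptions made for this example, and the rows are copied from the first table.

```python
from itertools import groupby

# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue), copied from the question
ROWS = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]
STARTERS = {"Create", "CreateSystem"}


def collapse_partitions(rows):
    """Return one (FK1, FK2, ActionCode, SomeAttributeValue) tuple per partition."""
    result = []
    # ISO-style timestamps sort correctly as plain strings.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[4]))
    for (fk1, fk2), group in groupby(ordered, key=lambda r: (r[1], r[2])):
        current = None  # [ActionCode of the starting row, latest SomeAttributeValue]
        for _pk, _fk1, _fk2, action, _ts, value in group:
            if action in STARTERS:
                if current is not None:
                    result.append((fk1, fk2, *current))  # close the previous partition
                current = [action, value]
            elif current is not None:
                current[1] = value  # a Change overrides the value within its partition
        if current is not None:
            result.append((fk1, fk2, *current))
    return result


for row in collapse_partitions(ROWS):
    print(row)
```

The sketch emits partitions in ascending time order within each group, so the two rows for FK2 = 500 come out in the opposite order from the desired output table; the set of rows is the same.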

It is not possible to have the following data pool:

| PK | FK1 | FK2 |   ActionCode |         CreationTS  | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
|  7 | 100 | 500 |       Change | 2011-01-02 02:00:00 |                  Z |
|  6 | 100 | 500 |       Create | 2011-01-02 00:00:00 |                  H |
|  2 | 100 | 500 |       Change | 2011-01-01 01:00:00 |                  X |
|  1 | 100 | 500 |       Create | 2011-01-01 00:00:00 |                  Y |
----------------------------------------------------------------------------

and then expect PK 7 to affect PK 2, or PK 6 to affect PK 1.

I don't even know how or where to start... How can I achieve this? We are running on MSSQL 2005+.

Edit:
There is a dump available:

instanceId: my PK
tenant ID: FK 1
campaign ID: FK 2
callId: FK 3
refillCounter: FK 4
ticketType: ActionCode (1, 4 and 6 are Create, 5 is Change, 3 must be ignored)
ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (the first row in the partition)
callState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last row: either the Create (which is then a "single-row partition", i.e. the same row as above) or the Change (the last row in the partition)
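The first-row/last-row split described above can be pictured with a small Python sketch. It is illustrative only: the helper `merge_partition`, the shortened column lists, and the sample values are all hypothetical, and the rows of one partition are assumed to be already ordered by creationTS.

```python
# Shortened, illustrative subsets of the columns listed above
FROM_FIRST = ("ticketType", "creatorId", "creationTS")      # taken from the Create row
FROM_LAST = ("callState", "reasonId", "callingAttempts")     # taken from the last row


def merge_partition(partition_rows):
    """Combine one partition into a single record.

    partition_rows must be ordered by creationTS; the first row is the Create.
    """
    first, last = partition_rows[0], partition_rows[-1]
    merged = {col: first[col] for col in FROM_FIRST}
    merged.update({col: last[col] for col in FROM_LAST})
    return merged


# Hypothetical partition: one Create (ticketType 1) followed by one Change (ticketType 5)
partition = [
    {"ticketType": 1, "creatorId": 7, "creationTS": "2011-01-01 00:00:00",
     "callState": 10, "reasonId": None, "callingAttempts": 0},
    {"ticketType": 5, "creatorId": 9, "creationTS": "2011-01-01 01:00:00",
     "callState": 20, "reasonId": 3, "callingAttempts": 2},
]
print(merge_partition(partition))
```

For a single-row partition, `first` and `last` are the same row, which matches the "single-row partition" case in the description.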

score 2 · Accepted Answer

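I'm assuming that each partition can contain only a single Create or CreateSystem; otherwise your requirements are ill-defined. The following is untested, since I have neither a sample table nor sample data in an easy-to-use format: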

;With Partitions as (
     Select
         t1.FK1,
         t1.FK2,
         t1.ActionCode, -- needed by PartitionRows below
         t1.CreationTS as StartTS,
         t2.CreationTS as EndTS
     From
         Table t1
             left join
         Table t2
             on
                  t1.FK1 = t2.FK1 and
                  t1.FK2 = t2.FK2 and
                  t1.CreationTS < t2.CreationTS and
                  t2.ActionCode in ('Create','CreateSystem')
             left join
         Table t3
             on
                  t1.FK1 = t3.FK1 and
                  t1.FK2 = t3.FK2 and
                  t1.CreationTS < t3.CreationTS and
                  t3.CreationTS < t2.CreationTS and
                  t3.ActionCode in ('Create','CreateSystem')
       where
           t1.ActionCode in ('Create','CreateSystem') and
           t3.FK1 is null
), PartitionRows as (
     SELECT
         t1.FK1,
         t1.FK2,
         t1.ActionCode,
         t2.SomeAttributeValue,
         ROW_NUMBER() OVER (PARTITION BY t1.FK1,T1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
     from
         Partitions t1
             inner join
         Table t2
             on
                t1.FK1 = t2.FK1 and
                t1.FK2 = t2.FK2 and
                t1.StartTS <= t2.CreationTS and
                (t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1

(Note that I'm using various reserved names here.)

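The basic logic is: the Partitions CTE defines each partition in terms of FK1, FK2, an inclusive start timestamp, and an exclusive end timestamp. It does this with a triple join against the base table. Rows from t2 are selected to occur after the row from t1, and rows from t3 are then selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude from the result set any row where a match on t3 occurred. The result is that the row from t1 and the row from t2 represent the starts of two adjacent partitions.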
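To see what the triple join produces, here is a rough Python rendering of the same idea, using the sample rows from the question. It is a sketch for illustration rather than a translation of the exact query plan; the names `partition_bounds` and `same_group` are made up for this example.

```python
STARTERS = {"Create", "CreateSystem"}

# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue), copied from the question
ROWS = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]


def same_group(a, b):
    """True when two rows share FK1 and FK2."""
    return (a[1], a[2]) == (b[1], b[2])


def partition_bounds(rows):
    """Return (FK1, FK2, StartTS, EndTS) per partition; EndTS is None for the last one."""
    starts = [r for r in rows if r[3] in STARTERS]
    bounds = []
    for t1 in starts:
        # LEFT JOIN t2: later partition starts in the same group, or None if there are none
        candidates = [t2 for t2 in starts if same_group(t1, t2) and t1[4] < t2[4]] or [None]
        for t2 in candidates:
            # LEFT JOIN t3 ... WHERE t3.FK1 IS NULL:
            # reject this t2 if some other start row lies strictly between t1 and t2
            blocked = t2 is not None and any(
                same_group(t1, t3) and t1[4] < t3[4] < t2[4] for t3 in starts
            )
            if not blocked:
                bounds.append((t1[1], t1[2], t1[4], t2[4] if t2 else None))
    return bounds


for bound in partition_bounds(ROWS):
    print(bound)
```

Each start row ends up paired with the nearest later start row in its group, or with no end at all when it opens the final partition of the group.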

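The second CTE then retrieves all rows from Table for each partition, but assigns each one a ROW_NUMBER() within its partition, ordered by CreationTS descending, so that the row with ROW_NUMBER() = 1 in each partition is the one that occurred last.

Finally, in the select, we pick those rows that occurred last in their respective partitions.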

This does assume that the CreationTS values within each partition are distinct. If that assumption doesn't hold, I could rework it to use PK as well.
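One way such a rework might look is to add PK as a secondary sort key, so that rows sharing a timestamp still get a deterministic order. The Python sketch below only illustrates that tie-break; it assumes that a higher PK means a later row, which the question does not state, and the rows and the helper `number_rows_desc` are hypothetical.

```python
def number_rows_desc(partition_rows):
    """Number rows like ROW_NUMBER() ... ORDER BY CreationTS DESC, PK DESC."""
    ordered = sorted(
        partition_rows,
        key=lambda r: (r["CreationTS"], r["PK"]),  # PK breaks ties between equal timestamps
        reverse=True,
    )
    return [(rn, row) for rn, row in enumerate(ordered, start=1)]


# Hypothetical partition in which two Change rows share the same timestamp
partition = [
    {"PK": 10, "ActionCode": "Create", "CreationTS": "2011-01-01 00:00:00"},
    {"PK": 11, "ActionCode": "Change", "CreationTS": "2011-01-01 01:00:00"},
    {"PK": 12, "ActionCode": "Change", "CreationTS": "2011-01-01 01:00:00"},
]
numbered = number_rows_desc(partition)
last_row = numbered[0][1]  # the row that would receive rn = 1
print(last_row["PK"])
```

Without the second sort key, which of the two tied rows gets rn = 1 would be arbitrary.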

score 0 · Accepted Answer

It can be solved with a recursive CTE. Here it is (assuming that the rows in a partition are ordered by CreationTS):

WITH partitioned AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
  FROM data
),
subgroups AS (
  SELECT
    PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
    Subgroup = 1,
    Subrank  = 1
  FROM partitioned
  WHERE rn = 1
  UNION ALL
  SELECT
    p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
    Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
    Subrank  = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
  FROM partitioned p
    INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
      AND p.rn = s.rn + 1
),
finalranks AS (
  SELECT
    PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
    Subgroup, Subrank,
    rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
    /* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
  FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1
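The Subgroup and Subrank bookkeeping in the recursive part can be traced by hand with a small Python sketch. It walks the rows of one (FK1, FK2) group in CreationTS order and applies the same two CASE rules; the function names are made up for this illustration, and the rows are the FK2 = 500 group from the question.

```python
def assign_subgroups(group_rows):
    """Annotate rows of one (FK1, FK2) group, already ordered by CreationTS."""
    annotated = []
    subgroup, subrank = 1, 1  # anchor member: the row with rn = 1
    for i, row in enumerate(group_rows):
        if i > 0:
            if row["ActionCode"] == "Change":
                subrank += 1   # same subgroup, next position within it
            else:
                subgroup += 1  # a Create/CreateSystem opens a new subgroup
                subrank = 1
        annotated.append({**row, "Subgroup": subgroup, "Subrank": subrank})
    return annotated


def last_per_subgroup(annotated):
    """Keep the row with the highest Subrank in each subgroup (rank = 1 in the query)."""
    best = {}
    for row in annotated:
        key = row["Subgroup"]
        if key not in best or row["Subrank"] > best[key]["Subrank"]:
            best[key] = row
    return [best[key] for key in sorted(best)]


group_500 = [
    {"PK": 1, "ActionCode": "Create", "SomeAttributeValue": "Y"},
    {"PK": 2, "ActionCode": "Change", "SomeAttributeValue": "X"},
    {"PK": 3, "ActionCode": "Change", "SomeAttributeValue": "Z"},
    {"PK": 6, "ActionCode": "Create", "SomeAttributeValue": "H"},
]
for row in last_per_subgroup(assign_subgroups(group_500)):
    print(row["PK"], row["ActionCode"], row["SomeAttributeValue"])
```

Note that, like the final SELECT in the query, this returns the last row of each subgroup as-is, so its ActionCode is that of the last row (Change for PK 3) rather than that of the Create row that opened the subgroup.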
