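What I want to do is total the time of each "episode" for an ID with a group of times, and then get the first episode time for NC to C and for C to NC, along with the last episode time for NC to C and for C to NC. In the table below I added the GRP_Time column by hand. I have also added the final result table.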

This is the metric I am trying to generate systematically.

ID    ASSign_ID  GRP      Time            GRP_Time   
11    1788       NC       6             
11    1802       NC       1               7
11    2995       C        7               7
11    5496       NC       11              11
11    6077       C        2 
11    6216       C        2
11    6226       C        4               8  
11    6790       NC       5               5
12    1234       C        6               6
12    2345       NC       1               
12    3456       NC       8               9
12    4567       C        11              11
14    6789       C        2 
14    7890       C        2
14    8900       C        4               8  
14    8904       NC       5               5

Result table

ID   First_ET_NC_C    First_ET_C_NC   LAST_ET_NC_C   LAST_ET_C_NC
11   7                7               11             8
12   9                6               9              6
14   -                8               -              8

3 Answers

Try this:

with seq as 
(    
  select tbl.*, 
      row_number() over(order by assign_id) rn  -- naturalized the order
  from tbl
),
grp as 
(
  select cr.*,          
     sum(case when cr.grp = pr.grp or pr.grp is null then 0 else 1 end)
     over(order by cr.rn) gn      
  from seq cr -- current row
  left join seq pr -- previous row
  on pr.rn = cr.rn - 1
)
,run as
(
  select grp.*,
      sum(time) over(partition by gn order by rn) as run_tot
  from grp
)
select 
   id, assign_id, grp, time,
   case when max(rn) over(partition by gn) <> rn then 
      null
   else
      run_tot
   end as run_total
from run r;

Output:

ID        ASSIGN_ID GRP       TIME      RUN_TOTAL
11        1788      NC        6         (null)
11        1802      NC        1         7
11        2995      C         7         7
11        5496      NC        11        11
11        6077      C         2         (null)
11        6216      C         2         (null)
11        6226      C         4         8
11        6790      NC        5         5

Live test: http://www.sqlfiddle.com/#!4/faacc/1


How it works:

ID        ASSIGN_ID GRP       TIME      RN        GN
11        1788      NC        6         1         0
11        1802      NC        1         2         0
11        2995      C         7         3         1
11        5496      NC        11        4         2
11        6077      C         2         5         3
11        6216      C         2         6         3
11        6226      C         4         7         3
11        6790      NC        5         8         4

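We basically need to assign a group number (the GN column) to each run of consecutive rows that share the same grp. Then we can compute a running sum partitioned by GN.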

You can see the progression of the query here: http://www.sqlfiddle.com/#!4/faacc/1

Each step builds on the previous one. Just scroll down to see how the solution progresses.
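To make the group-numbering idea easier to follow outside of SQL, here is a minimal Python sketch of the same steps, using the ID 11 rows from the question. The function names `number_groups` and `episode_totals` are made up for this illustration and do not appear in the query; the sketch assumes the rows are already ordered by ASSIGN_ID.

```python
# Rows for ID 11 from the question, already ordered by ASSIGN_ID:
# (assign_id, grp, time)
rows = [
    (1788, "NC", 6), (1802, "NC", 1), (2995, "C", 7), (5496, "NC", 11),
    (6077, "C", 2), (6216, "C", 2), (6226, "C", 4), (6790, "NC", 5),
]

def number_groups(rows):
    """Return one group number per row; it increases whenever grp changes."""
    gn = 0
    prev_grp = None
    numbers = []
    for _, grp, _ in rows:
        # Same rule as the CASE in the grp CTE:
        # no previous row, or same grp as the previous row -> add 0, else add 1
        if prev_grp is not None and grp != prev_grp:
            gn += 1
        numbers.append(gn)
        prev_grp = grp
    return numbers

def episode_totals(rows):
    """Total time per group, reported only on the last row of each group."""
    gns = number_groups(rows)
    totals = {}
    for gn, (_, _, time) in zip(gns, rows):
        totals[gn] = totals.get(gn, 0) + time
    # Later rows overwrite earlier ones, so this keeps the last index per group
    last_index = {gn: i for i, gn in enumerate(gns)}
    return [totals[gn] if last_index[gn] == i else None
            for i, gn in enumerate(gns)]

print(number_groups(rows))   # [0, 0, 1, 2, 3, 3, 3, 4]
print(episode_totals(rows))  # [None, 7, 7, 11, None, None, 8, 5]
```

The first list corresponds to the GN column and the second to the RUN_TOTAL column shown in the output.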


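EDIT

The query can be shortened. Your report does not show a running total on every row, only on the last row of each group, so instead of sum(time) over(partition by gn order by rn) as run_tot we can use sum(time) over(partition by gn) as run_tot, i.e. we drop the order by rn. Then we detect whether the row is the last one in its group: if it is, we show the sum over the group, otherwise we show null.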

Final query:

with seq as 
(

  select
  
     tbl.*, 
     row_number() over(order by assign_id) rn  -- naturalized the order
  from tbl
),
grp as 
(
  select 
   
     cr.*,
      
     sum(case when cr.grp = pr.grp or pr.grp is null then 0 else 1 end)
     over(order by cr.rn) gn
  
  from seq cr -- current row
  left join seq pr -- previous row
  on pr.rn = cr.rn - 1
)
select
     grp.*,

     case when max(rn) over(partition by gn) <> rn then -- if not last row
        null
     else -- if last row
        sum(time) over(partition by gn) 
     end as running_total
from grp;

Live test: http://www.sqlfiddle.com/#!4/faacc/7



EDIT

Regarding an ASSIGN_ID that appears under more than one ID, e.g. 6790:

ID        ASSIGN_ID GRP       TIME
11        1788      NC        6
11        1802      NC        1
11        2995      C         7
11        5496      NC        11
11        6077      C         2
11        6216      C         2
11        6226      C         4
11        6790      NC        5
12        6790      NC        1
12        6791      NC        3
12        6792      NC        1
12        6793      NC        4
12        6794      C         1
12        6795      C         6
12        6797      C         8
13        6793      C         1
13        6794      C         4
13        6795      C         3

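There are two rows with the same ASSIGN_ID, e.g. 6790, but each belongs to a larger group (defined by ID, 11 and 12 respectively), so to keep the two groups separate we must partition by ID.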
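A small Python sketch, using the sample rows above, shows what goes wrong when episodes are not separated by ID. It is only an illustration: the helper `episode_sums` is a hypothetical name, and the sketch assumes the rows are ordered by ID and then ASSIGN_ID.

```python
from itertools import groupby

# (id, assign_id, grp, time), ordered by id and then assign_id
rows = [
    (11, 1788, "NC", 6), (11, 1802, "NC", 1), (11, 2995, "C", 7),
    (11, 5496, "NC", 11), (11, 6077, "C", 2), (11, 6216, "C", 2),
    (11, 6226, "C", 4), (11, 6790, "NC", 5),
    (12, 6790, "NC", 1), (12, 6791, "NC", 3), (12, 6792, "NC", 1),
    (12, 6793, "NC", 4), (12, 6794, "C", 1), (12, 6795, "C", 6),
    (12, 6797, "C", 8),
    (13, 6793, "C", 1), (13, 6794, "C", 4), (13, 6795, "C", 3),
]

def episode_sums(rows, key):
    """Sum time over each run of consecutive rows that share the same key."""
    return [sum(r[3] for r in run) for _, run in groupby(rows, key=key)]

# Runs keyed on grp only: ID 11's trailing NC row merges with ID 12's NC rows,
# and ID 12's C rows merge with ID 13's C rows.
unpartitioned = episode_sums(rows, key=lambda r: r[2])

# Runs keyed on (id, grp): a change of ID always starts a new episode.
partitioned = episode_sums(rows, key=lambda r: (r[0], r[2]))

print(unpartitioned)  # [7, 7, 11, 8, 14, 23]
print(partitioned)    # [7, 7, 11, 8, 5, 9, 15, 8]
```

Only the second list keeps the totals of different IDs apart, which is what adding partition by id achieves in the query.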

Here is the final query; note the "added this" comments: http://www.sqlfiddle.com/#!4/83789/2

with seq as 
(    
  select tbl.*, 

     -- added this: partition by id
     -- naturalized the order: rn       
     row_number() over(partition by id order by assign_id) rn  
  from tbl
)
,grp as 
(
  select cr.*,        

     -- added this: partition by cr.id
     sum(case when cr.grp = pr.grp then 0 else 1 end)
     over(partition by cr.id order by cr.rn) gn      
  from seq cr -- current row
  left join seq pr -- previous row
  on 
    pr.id = cr.id -- added this
    and pr.rn = cr.rn - 1
)
select id, assign_id, grp, time, 

     -- added this: partition by id
     case when max(rn) over(partition by id,gn) <> rn then 
        null
     else
        -- added this: partition by id
        sum(time) over(partition by id,gn) 
     end as running_total
from grp
order by id, rn;

Output:

ID        ASSIGN_ID GRP       TIME      RUNNING_TOTAL
11        1788      NC        6         (null)
11        1802      NC        1         7
11        2995      C         7         7
11        5496      NC        11        11
11        6077      C         2         (null)
11        6216      C         2         (null)
11        6226      C         4         8
11        6790      NC        5         5
12        6790      NC        1         (null)
12        6791      NC        3         (null)
12        6792      NC        1         (null)
12        6793      NC        4         9
12        6794      C         1         (null)
12        6795      C         6         (null)
12        6797      C         8         15
13        6793      C         1         (null)
13        6794      C         4         (null)
13        6795      C         3         8

How this works; note the ID and GN columns:

ID        ASSIGN_ID GRP       TIME      RN        GN        RUNNING_TOTAL
11        1788      NC        6         1         1         (null)
11        1802      NC        1         2         1         7
11        2995      C         7         3         2         7
11        5496      NC        11        4         3         11
11        6077      C         2         5         4         (null)
11        6216      C         2         6         4         (null)
11        6226      C         4         7         4         8
11        6790      NC        5         8         5         5
12        6790      NC        1         1         1         (null)
12        6791      NC        3         2         1         (null)
12        6792      NC        1         3         1         (null)
12        6793      NC        4         4         1         9
12        6794      C         1         5         2         (null)
12        6795      C         6         6         2         (null)
12        6797      C         8         7         2         15
13        6793      C         1         1         1         (null)
13        6794      C         4         2         1         (null)
13        6795      C         3         3         1         8

See the progression of the query here: http://www.sqlfiddle.com/#!4/83789/2


UPDATE: Try this one instead, it is more concise and readable: https://stackoverflow.com/a/10629498

answered 2012-05-16T17:55:36.287

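The following query was written for Oracle; it relies on the LAG() analytic function, so it only works on databases that provide it.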

SELECT Table1.ID, 
       Table1.ASSign_ID, 
       Table1.GRP, 
       Table1.TIME, 
       grpSum.GRP_TIME 
FROM   Table1 
       left join (SELECT ID, 
                         MAX(ASSIGN_ID) ASSIGN_ID, 
                         SUM(TIME)      GRP_TIME 
                  FROM   (SELECT ID, 
                                 ASSIGN_ID, 
                                 GRP, 
                                 TIME, 
                                 SUM(GC) over (PARTITION BY GRP ORDER BY ID, ASSIGN_ID ) g 
                          FROM   (SELECT ID, 
                                         ASSIGN_ID, 
                                         GRP, 
                                         TIME, 
                                         CASE 
                                           WHEN GRP = Lag(GRP) over (ORDER BY ID, ASSIGN_ID) 
                                               THEN  0 
                                               ELSE 1 
                                         END gc 
                                  FROM   TABLE1) a) b
                  GROUP  BY ID, 
                            GRP, 
                            g) grpSum 
         ON table1.ID = grpSum.ID 
            AND table1.ASSIGN_ID = grpSum.ASSIGN_ID 
ORDER BY Table1.ID, 
         Table1.ASSign_ID

Demo

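Gaps-and-islands solutions are a bit hard to describe, but here is what each part does:

  • The innermost query "a" uses LAG to assign 1 to the first row of an "episode" and 0 to every following member.

  • The next query "b" uses SUM OVER to assign the same identifier to every member of an "episode". Note that because the sum is partitioned by GRP, the same identifier can be used for different episodes when their GRP differs.

  • The grpSum query simply sums the time of each "episode" and identifies the largest ASSIGN_ID as the last row of the "episode".

  • Then we join back to the original table on ID and ASSIGN_ID and do the projection.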
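The steps above can be sketched in Python as follows. This is an illustration rather than a translation of the query: `episodes` is a hypothetical helper, the rows are the ID 11 and ID 12 rows from the question, and they are assumed to be ordered by ID and then ASSIGN_ID.

```python
from collections import defaultdict

# (id, assign_id, grp, time) for IDs 11 and 12 from the question,
# ordered by id and then assign_id
rows = [
    (11, 1788, "NC", 6), (11, 1802, "NC", 1), (11, 2995, "C", 7),
    (11, 5496, "NC", 11), (11, 6077, "C", 2), (11, 6216, "C", 2),
    (11, 6226, "C", 4), (11, 6790, "NC", 5),
    (12, 1234, "C", 6), (12, 2345, "NC", 1), (12, 3456, "NC", 8),
    (12, 4567, "C", 11),
]

def episodes(rows):
    """Map (id, grp, g) to (last assign_id, summed time) for each episode."""
    running = defaultdict(int)  # running sum of gc, kept separately per grp
    prev_grp = None
    totals = {}
    for id_, assign_id, grp, time in rows:
        gc = 0 if grp == prev_grp else 1     # query "a": compare with LAG(grp)
        running[grp] += gc                   # query "b": SUM(gc) partitioned by grp
        key = (id_, grp, running[grp])       # grpSum: GROUP BY id, grp, g
        entry = totals.setdefault(key, [assign_id, 0])
        entry[0] = max(entry[0], assign_id)  # grpSum: MAX(assign_id)
        entry[1] += time                     # grpSum: SUM(time)
        prev_grp = grp
    return {key: tuple(value) for key, value in totals.items()}

for key, (last_assign_id, grp_time) in episodes(rows).items():
    print(key, last_assign_id, grp_time)
```

The first two keys are (11, 'NC', 1) and (11, 'C', 1): both episodes carry the identifier 1, which is why GRP has to be part of the grouping key.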

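I am stealing Michael's idea of a progression demo.

You can see the progression of the subqueries here (scroll down).

Note: you can also use the CASE with MAX OVER and SUM OVER technique from Michael's answer to get rid of the LEFT JOIN and the grpSum query: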

SELECT ID, 
       ASSIGN_ID, 
       GRP, 
       TIME, 
       CASE 
         WHEN Max(ASSIGN_ID) OVER (partition BY ID, GRP, G) = ASSIGN_ID THEN 
         SUM (TIME) OVER (partition BY ID, GRP, G) 
         ELSE NULL 
       END GRP_TIME 
FROM   (SELECT ID, 
               ASSIGN_ID, 
               GRP, 
               TIME, 
               Sum(GC) OVER (partition BY GRP ORDER BY ID, ASSIGN_ID ) g 
        FROM   (SELECT ID, 
                       ASSIGN_ID, 
                       GRP, 
                       TIME, 
                       CASE 
                         WHEN GRP = Lag(GRP) OVER (ORDER BY ID, ASSIGN_ID) THEN 
                         0 
                         ELSE 1 
                       END gc 
                FROM   TABLE1) a) b
    ORDER BY ID, 
             ASSign_ID

Demo

answered 2012-05-16T17:26:32.287

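I was too fixated on my previous answer, which uses row_number. I had assumed there was no good candidate for ordering the rows (in fact, I used ORDER BY NULL in the first iteration of the code to make it explicit that it depended on the physical order of the rows). When I noticed that your data has a natural order, I should have started over from scratch.

id + assign_id is a good fit for partitioning and ordering, so we can write a simpler query that uses only LAG.

This is the shortest and simplest query: http://www.sqlfiddle.com/#!4/b6c14/3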

with hm as -- headers marked
(
    select tbl.*,

      case when lag(grp) over(partition by id order by assign_id) = grp then 
        0 
      else 
        1 
      end mark_header

    from tbl
)
,grp as -- grouping
(
    select 
      hm.*,

      -- gn: group number
      sum(mark_header) over(partition by id order by assign_id) as gn
    from hm
)
select -- final query
    id, assign_id, grp, time,

    case when max(assign_id) over(partition by id,gn) = assign_id then
       sum(time) over(partition by id,gn) 
    else
       null
    end as running_total
from grp
order by id, assign_id;

Output:

ID        ASSIGN_ID GRP       TIME      RUNNING_TOTAL
11        1788      NC        6         (null)
11        1802      NC        1         7
11        2995      C         7         7
11        5496      NC        11        11
11        6077      C         2         (null)
11        6216      C         2         (null)
11        6226      C         4         8
11        6790      NC        5         5
12        6790      NC        1         (null)
12        6791      NC        3         (null)
12        6792      NC        1         (null)
12        6793      NC        4         9
12        6794      C         1         (null)
12        6795      C         6         (null)
12        6797      C         8         15
13        6793      C         1         (null)
13        6794      C         4         (null)
13        6795      C         3         8

How this works; note the ID and GN columns, which are what the running total is built on:

ID        ASSIGN_ID GRP       TIME      MARK_HEADER  GN
11        1788      NC        6         1            1
11        1802      NC        1         0            1
11        2995      C         7         1            2
11        5496      NC        11        1            3
11        6077      C         2         1            4
11        6216      C         2         0            4
11        6226      C         4         0            4
11        6790      NC        5         1            5
12        6790      NC        1         1            1
12        6791      NC        3         0            1
12        6792      NC        1         0            1
12        6793      NC        4         0            1
12        6794      C         1         1            2
12        6795      C         6         0            2
12        6797      C         8         0            2
13        6793      C         1         1            1
13        6794      C         4         0            1
13        6795      C         3         0            1

See the progression here: http://www.sqlfiddle.com/#!4/b6c14/3
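For readers who want to check the MARK_HEADER and GN columns by hand, here is a minimal Python sketch of the two CTEs. `mark_and_number` is a hypothetical name; the per-ID grouping stands in for partition by id, and the rows are assumed to be ordered by ID and then ASSIGN_ID.

```python
from itertools import accumulate, groupby

# (id, assign_id, grp, time), ordered by id and then assign_id
rows = [
    (11, 1788, "NC", 6), (11, 1802, "NC", 1), (11, 2995, "C", 7),
    (11, 5496, "NC", 11), (11, 6077, "C", 2), (11, 6216, "C", 2),
    (11, 6226, "C", 4), (11, 6790, "NC", 5),
    (12, 6790, "NC", 1), (12, 6791, "NC", 3), (12, 6792, "NC", 1),
    (12, 6793, "NC", 4), (12, 6794, "C", 1), (12, 6795, "C", 6),
    (12, 6797, "C", 8),
    (13, 6793, "C", 1), (13, 6794, "C", 4), (13, 6795, "C", 3),
]

def mark_and_number(rows):
    """Return (id, assign_id, mark_header, gn) for every row."""
    out = []
    for id_, part in groupby(rows, key=lambda r: r[0]):  # partition by id
        part = list(part)
        grps = [r[2] for r in part]
        # LAG(grp): the previous grp in the partition, None for the first row
        lagged = [None] + grps[:-1]
        marks = [0 if prev == cur else 1 for prev, cur in zip(lagged, grps)]
        gns = accumulate(marks)  # running sum of mark_header within the ID
        out.extend((id_, r[1], m, g) for r, m, g in zip(part, marks, gns))
    return out

for row in mark_and_number(rows):
    print(row)
```

Because the lagged value is None on the first row of each ID, every ID starts with mark_header = 1 and gn = 1, matching the table above.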


It can also be written in a more Oracle-specific way (using DECODE instead of CASE WHEN): http://www.sqlfiddle.com/#!4/b6c14/6

with hm as -- headers marked
(
    select tbl.*,

      decode(lag(grp) over(partition by id order by assign_id), grp, 0, 1)
         as mark_header

    from tbl
)
,grp as -- grouping
(
    select 
      hm.*,

      sum(mark_header) over(partition by id order by assign_id) as gn
    from hm
)
select -- final query
    id, assign_id, grp, time,

    decode(max(assign_id) over(partition by id,gn), assign_id,           
       sum(time) over(partition by id,gn), null) as running_total 

from grp
order by id, assign_id;
answered 2012-05-17T03:36:42.357