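Does anyone have a suggestion for this problem? I am trying to use Oracle SQL to merge the effective_dt / expiration_dt ranges where the values in col_a, col_b, and col_c stay the same, but only for consecutive records in which none of those 3 columns changes.

If it helps, it is safe to assume that the effective date of the next record (per employee) equals the expiration date of the previous record plus 1 day.

I tried min() and max() with group by, but the problem is that the scenario below would return 12/1-12/31. I then tried the lead() function, but the problem is that I don't know in advance how many records I need to merge.

Assume I can get the data into the following form: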

+----------+--------------+---------------+---------+---------+---------+
| employee | effective_dt | expiration_dt |  col_a  |  col_b  |  col_c  |
+----------+--------------+---------------+---------+---------+---------+
|     0001 | 12/1/2012    | 12/4/2012     | value_a | value_a | value_a |
|     0001 | 12/5/2012    | 12/6/2012     | value_a | value_a | value_a |
|     0001 | 12/7/2012    | 12/10/2012    | value_a | value_a | value_a |
|     0001 | 12/11/2012   | 12/17/2012    | value_a | value_b | value_a |
|     0001 | 12/18/2012   | 12/31/2012    | value_a | value_a | value_a |
+----------+--------------+---------------+---------+---------+---------+    

Expected result:

+----------+--------------+---------------+---------+---------+---------+
| employee | effective_dt | expiration_dt |  col_a  |  col_b  |  col_c  |
+----------+--------------+---------------+---------+---------+---------+
|     0001 | 12/1/2012    | 12/10/2012    | value_a | value_a | value_a |
|     0001 | 12/11/2012   | 12/17/2012    | value_a | value_b | value_a |
|     0001 | 12/18/2012   | 12/31/2012    | value_a | value_a | value_a |
+----------+--------------+---------------+---------+---------+---------+

Attempt 1:

SELECT employee,
  MIN(effective_dt),
  MAX(expiration_dt),
  col_a,
  col_b,
  col_c
FROM
  (SELECT employee, effective_dt, ... FROM table_x, table_y, ... where...
  ) table_a
GROUP BY employee,
  col_a,
  col_b,
  col_c;
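
To make the problem with attempt 1 concrete, here is a minimal sketch that inlines the sample rows through DUAL. The inline `table_a` is only a stand-in for the real subquery, and the DATE literals are an assumption about the column types.

```sql
-- Sketch only: sample rows inlined via DUAL, standing in for the real table_a subquery.
with table_a as (
  select '0001' employee, date '2012-12-01' effective_dt, date '2012-12-04' expiration_dt,
         'value_a' col_a, 'value_a' col_b, 'value_a' col_c from dual union all
  select '0001', date '2012-12-05', date '2012-12-06', 'value_a', 'value_a', 'value_a' from dual union all
  select '0001', date '2012-12-07', date '2012-12-10', 'value_a', 'value_a', 'value_a' from dual union all
  select '0001', date '2012-12-11', date '2012-12-17', 'value_a', 'value_b', 'value_a' from dual union all
  select '0001', date '2012-12-18', date '2012-12-31', 'value_a', 'value_a', 'value_a' from dual
)
select employee, min(effective_dt), max(expiration_dt), col_a, col_b, col_c
  from table_a
 group by employee, col_a, col_b, col_c;
-- GROUP BY ignores row order, so the first three rows and the last row fall into
-- one (value_a, value_a, value_a) group spanning 12/1 through 12/31, which
-- swallows the 12/11-12/17 period where col_b was value_b.
```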

Attempt 2:

SELECT employee,
  effective_dt,
  lead(expiration_dt, 1) over (partition BY employee, col_a, col_b, col_c order by effective_dt) expiration_dt,
  col_a,
  col_b,
  col_c
FROM
  (SELECT employee, effective_dt, ... FROM table_x, table_y, ... where...
  ) table_a;

Thank you!

1 Answer

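As you said, if we can safely assume that the effective date of the next record equals the expiration date of the previous record + 1 day, then we can chain those records together:

SQL Fiddle

Oracle 11g R2 Schema Setup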

CREATE TABLE t
    (employee int, effective_dt timestamp, expiration_dt timestamp, col_a varchar2(7), col_b varchar2(7), col_c varchar2(7))
;

INSERT ALL 
    INTO t (employee, effective_dt, expiration_dt, col_a, col_b, col_c)
         VALUES (0001, '01-Dec-2012 12:00:00 AM', '04-Dec-2012 12:00:00 AM', 'value_a', 'value_a', 'value_a')
    INTO t (employee, effective_dt, expiration_dt, col_a, col_b, col_c)
         VALUES (0001, '05-Dec-2012 12:00:00 AM', '06-Dec-2012 12:00:00 AM', 'value_a', 'value_a', 'value_a')
    INTO t (employee, effective_dt, expiration_dt, col_a, col_b, col_c)
         VALUES (0001, '07-Dec-2012 12:00:00 AM', '10-Dec-2012 12:00:00 AM', 'value_a', 'value_a', 'value_a')
    INTO t (employee, effective_dt, expiration_dt, col_a, col_b, col_c)
         VALUES (0001, '11-Dec-2012 12:00:00 AM', '17-Dec-2012 12:00:00 AM', 'value_a', 'value_b', 'value_a')
    INTO t (employee, effective_dt, expiration_dt, col_a, col_b, col_c)
         VALUES (0001, '18-Dec-2012 12:00:00 AM', '19-Dec-2012 12:00:00 AM', 'value_a', 'value_b', 'value_a')
    INTO t (employee, effective_dt, expiration_dt, col_a, col_b, col_c)
         VALUES (0001, '20-Dec-2012 12:00:00 AM', '31-Dec-2012 12:00:00 AM', 'value_a', 'value_a', 'value_a')
SELECT * FROM dual
;

Query 1

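select employee, min(effective_dt), max(expiration_dt), col_a, col_b, col_c 
  from ( select t.*, 
                -- 0 = same col_a/col_b/col_c as this employee's previous record, 1 = a new chain starts here
                case when
                     col_a = lag(col_a) over (partition by employee order by expiration_dt asc)    
                 and col_b = lag(col_b) over (partition by employee order by expiration_dt asc) 
                 and col_c = lag(col_c) over (partition by employee order by expiration_dt asc) 
                then 0 else 1 end start_of_chain 
           from t
  )
 -- link each record to the one that starts the day after it expires, within the same employee
 connect by employee = prior employee
        and effective_dt = prior expiration_dt + 1
        and start_of_chain = 0  
   start with start_of_chain = 1
   -- every row in a chain shares the same root effective_dt, so grouping by it yields one row per chain
   group by connect_by_root(effective_dt), employee, col_a, col_b, col_c
order by 2 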

Results

| EMPLOYEE |               MIN(EFFECTIVE_DT) |              MAX(EXPIRATION_DT) |   COL_A |   COL_B |   COL_C |
--------------------------------------------------------------------------------------------------------------
|        1 | December, 01 2012 00:00:00+0000 | December, 10 2012 00:00:00+0000 | value_a | value_a | value_a |
|        1 | December, 11 2012 00:00:00+0000 | December, 19 2012 00:00:00+0000 | value_a | value_b | value_a |
|        1 | December, 20 2012 00:00:00+0000 | December, 31 2012 00:00:00+0000 | value_a | value_a | value_a |
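
As an alternative, the same merge can probably be done without CONNECT BY by numbering the chains with a running total of the start flags. This is a minimal sketch rather than a tested replacement: it assumes the table `t` from the schema setup above and non-NULL values in col_a, col_b, and col_c (a NULL on either side would make the `=` comparison fail and start a new chain). The names `flagged` and `chain_no` are made up for illustration.

```sql
-- Sketch: number the chains with a running sum of start flags, then group by that number.
select employee,
       min(effective_dt)  as effective_dt,
       max(expiration_dt) as expiration_dt,
       col_a, col_b, col_c
  from ( select flagged.*,
                -- running count of chain starts = chain number within the employee
                sum(start_of_chain) over (partition by employee
                                          order by effective_dt) as chain_no
           from ( select t.*,
                         -- 0 = continues the previous record (same values, starts the next day)
                         case when col_a = lag(col_a) over (partition by employee order by effective_dt)
                               and col_b = lag(col_b) over (partition by employee order by effective_dt)
                               and col_c = lag(col_c) over (partition by employee order by effective_dt)
                               and effective_dt = lag(expiration_dt) over (partition by employee order by effective_dt)
                                                  + interval '1' day
                              then 0 else 1 end as start_of_chain
                    from t
                ) flagged
       )
 group by employee, chain_no, col_a, col_b, col_c
 order by 1, 2;
```

Against the sample data above this should produce the same three ranges as Query 1. Because the date-contiguity check sits inside the flag, a gap between two otherwise identical records also starts a new chain.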
Answered 2012-12-12T04:14:07.007