
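This question is closely related to another one I posted recently, but I am posting it separately because it adds more complexity to the problem. I am looking for help from some Oracle ninjas and rock stars; I think it makes a good challenge and exercise for their expertise.

Basically I have two tables, TableA and TableB.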

-- For TableA
CREATE TABLE TableA
(
  ID          VARCHAR2(10),
  LOCN        VARCHAR2(10),
  START_DATE  DATE,
  END_DATE    DATE
)
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
NOMONITORING
/


-- Populate TableA
INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1',   '01',   TO_DATE('02/04/1996', 'MM/DD/YYYY'),  TO_DATE('02/22/1996', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1',   '01',   TO_DATE('02/23/1996', 'MM/DD/YYYY'),  TO_DATE('05/28/2002', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1',   '01',   TO_DATE('05/29/2002', 'MM/DD/YYYY'),  TO_DATE('05/03/2005', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P1',   '01',   TO_DATE('05/04/2005', 'MM/DD/YYYY'),  TO_DATE('05/04/2005', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2',   '30',   TO_DATE('01/31/1996', 'MM/DD/YYYY'),  TO_DATE('02/06/1996', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2',   '02',   TO_DATE('02/07/1996', 'MM/DD/YYYY'),  TO_DATE('02/13/1996', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P2',   '02',   TO_DATE('02/14/1996', 'MM/DD/YYYY'),  TO_DATE('01/01/2099', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P3',   '03',   TO_DATE('02/07/1996', 'MM/DD/YYYY'),  TO_DATE('02/13/1996', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1P3',   '03',   TO_DATE('02/14/1996', 'MM/DD/YYYY'),  TO_DATE('01/01/2099', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('1S4',   '42',   TO_DATE('11/06/2001', 'MM/DD/YYYY'),  TO_DATE('01/01/2099', 'MM/DD/YYYY'));


INSERT INTO TableA(ID, LOCN, START_DATE, END_DATE)
VALUES('3S4',   '42',   TO_DATE('11/06/2001', 'MM/DD/YYYY'),  TO_DATE('01/01/2099', 'MM/DD/YYYY'));



-- For TableB
CREATE TABLE TableB
(
  ID           VARCHAR2(10),
  POSTING      VARCHAR2(20),
  DESCRIPTION  VARCHAR2(100),
  OTHER_ID     VARCHAR2(10),
  START_DATE   DATE,
  END_DATE     DATE
)
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
NOMONITORING
/


-- Populate TableB (explicit TO_DATE so the inserts do not depend on NLS_DATE_FORMAT)
INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P1', 'PROFESSOR', 'Sch 1 Quad 1 Area', 'P1', TO_DATE('02/04/1996', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P2', 'PROFESSOR', 'Sch 1 Quad 2 Area', 'P2', TO_DATE('01/31/1996', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1P3', 'PROFESSOR', 'Sch 1 Quad 3 Area', 'P3', TO_DATE('02/05/1996', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERVISOR', 'Sch 1 CO Supervisor 4', '1S4', TO_DATE('02/05/1996', 'MM/DD/YYYY'), TO_DATE('03/18/2002', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERINTENDENT', 'Sch 1 CD Superintendent', '1S4', TO_DATE('03/19/2002', 'MM/DD/YYYY'), TO_DATE('06/09/2009', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('1S4', 'SUPERVISOR', 'Sch 1 CO Supervisor 4', '1S4', TO_DATE('06/10/2009', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERVISOR', 'Sch 2 CAO Supervisor 5', '2S5', TO_DATE('10/26/2002', 'MM/DD/YYYY'), TO_DATE('06/09/2009', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERINTENDENT', 'Sch 2 CAO Superintendent 5', '2S5', TO_DATE('06/10/2009', 'MM/DD/YYYY'), TO_DATE('07/14/2009', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('2S5', 'SUPERINTENDENT', 'Sch 2 CAO Superintendent 5', 'S5', TO_DATE('07/15/2009', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERVISOR', 'Sch 3 CO Supervisor 4', '3S4', TO_DATE('02/05/1996', 'MM/DD/YYYY'), TO_DATE('03/18/2002', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERINTENDENT', 'Sch 3 CD Superintendent', '3S4', TO_DATE('03/19/2002', 'MM/DD/YYYY'), TO_DATE('06/09/2009', 'MM/DD/YYYY'));

INSERT INTO TableB(ID, POSTING, DESCRIPTION, OTHER_ID, START_DATE, END_DATE)
VALUES('3S4', 'SUPERVISOR', 'Sch 3 CO Supervisor 4', '3S4', TO_DATE('06/10/2009', 'MM/DD/YYYY'), TO_DATE('01/01/2099', 'MM/DD/YYYY'));

The process is as follows. In TableA, all records that share the same ID and LOCN and have contiguous START_DATE and END_DATE ranges are to be merged:

ID  LOCN    START_DATE  END_DATE
1P1 01      02/04/1996  05/04/2005
1P2 30      01/31/1996  02/06/1996
1P2 02      02/07/1996  01/01/2099
1P3 03      02/07/1996  01/01/2099
1S4 42      11/06/2001  01/01/2099
3S4 42      11/06/2001  01/01/2099

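In TableB, all records that share the same ID, POSTING and OTHER_ID and have contiguous START_DATE and END_DATE ranges are to be merged as well. (I believe the sample data in this table has nothing that can actually be combined.)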

ID  POSTING         DESCRIPTION                 OTHER_ID    START_DATE  END_DATE
1P1 PROFESSOR       Sch 1 Quad 1 Area           P1          02/04/1996  01/01/2099
1P2 PROFESSOR       Sch 1 Quad 2 Area           P2          01/31/1996  01/01/2099
1P3 PROFESSOR       Sch 1 Quad 3 Area           P3          02/05/1996  01/01/2099
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         02/05/1996  03/18/2002
1S4 SUPERINTENDENT  Sch 1 CD Superintendent     1S4         03/19/2002  06/09/2009
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         06/10/2009  01/01/2099
2S5 SUPERVISOR      Sch 2 CAO Supervisor 5      2S5         10/26/2002  06/09/2009
2S5 SUPERINTENDENT  Sch 2 CAO Superintendent 5  2S5         06/10/2009  07/14/2009
2S5 SUPERINTENDENT  Sch 2 CAO Superintendent 5  S5          07/15/2009  01/01/2099
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         02/05/1996  03/18/2002
3S4 SUPERINTENDENT  Sch 3 CD Superintendent     3S4         03/19/2002  06/09/2009
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         06/10/2009  01/01/2099

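Then the records in TableA and TableB are merged on ID. The LOCN column is added to the TableB data, and it is only carried over for the date ranges covered in TableA. The resulting data should look like this: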

ID  UNIT_TYPE       DESCRIPTION                 OTHER_ID    START_DATE  END_DATE    LOCN
1P1 PROFESSOR       Sch 1 Quad 1 Area           P1          02/04/1996  05/04/2005  01
1P1 PROFESSOR       Sch 1 Quad 1 Area           P1          05/05/2005  01/01/2099  {NULL}
1P2 PROFESSOR       Sch 1 Quad 2 Area           P2          01/31/1996  02/06/1996  30
1P2 PROFESSOR       Sch 1 Quad 2 Area           P2          02/07/1996  01/01/2099  02
1P3 PROFESSOR       Sch 1 Quad 3 Area           P3          02/05/1996  02/06/1996  {NULL}
1P3 PROFESSOR       Sch 1 Quad 3 Area           P3          02/07/1996  01/01/2099  03
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         02/05/1996  11/05/2001  {NULL}
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         11/06/2001  03/18/2002  42
1S4 SUPERINTENDENT  Sch 1 CD Superintendent     1S4         03/19/2002  06/09/2009  42
1S4 SUPERVISOR      Sch 1 CO Supervisor 4       1S4         06/10/2009  01/01/2099  42
2S5 SUPERVISOR      Sch 2 CAO Supervisor 5      2S5         10/26/2002  06/09/2009  {NULL}
2S5 SUPERINTENDENT  Sch 2 CAO Superintendent 5  2S5         06/10/2009  07/14/2009  {NULL}
2S5 SUPERINTENDENT  Sch 2 CAO Superintendent 5  S5          07/15/2009  01/01/2099  {NULL}
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         02/05/1996  11/05/2001  {NULL}
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         11/06/2001  03/18/2002  42
3S4 SUPERINTENDENT  Sch 3 CD Superintendent     3S4         03/19/2002  06/09/2009  42
3S4 SUPERVISOR      Sch 3 CO Supervisor 4       3S4         06/10/2009  01/01/2099  42

I would love to hear any workable approach to this. Many thanks.

Addendum: this is the query I have written so far to collapse the records in TableA:

SELECT ID, LOCN, TO_CHAR(MIN(START_DATE), 'MM/DD/YYYY') START_DATE, TO_CHAR(MAX(END_DATE), 'MM/DD/YYYY') END_DATE
        FROM
             (
              SELECT ID, LOCN, START_DATE, END_DATE, MAX(GRP) OVER (ORDER BY ID, START_DATE) GRP
              FROM
                  (
                   SELECT ID, LOCN,
                          CASE WHEN START_DATE - LAG(END_DATE) OVER (PARTITION BY ID, LOCN ORDER BY START_DATE ASC) <= 1 THEN
                            NULL
                          ELSE
                            ROWNUM
                          END GRP,
                          START_DATE,
                          NVL(END_DATE, SYSDATE) END_DATE
                   FROM TableA
                   ORDER BY ID ASC, START_DATE ASC
                  )
             )
        GROUP BY ID, LOCN, GRP
        ORDER BY ID ASC, START_DATE ASC;

1 Answer


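Since the rock stars are busy living a debauched (if hard-earned) lifestyle, and the ninjas look like they will be tied up for a while, I'll have a go...

The way you have laid it out, you want to collapse the contiguous records in TableA first and then use that result against the (possibly also collapsed) TableB. I'm not sure that doing it as a separate step is ideal for the overall problem, but I'll stick with it for now. The simplest general approach I have found for collapsing rows is: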

select id, locn, max(start_date) as start_date, max(end_date) as end_date
from (
    select id, locn,
        case when start_date = lag_end_date  + interval '1' day then null
            else start_date end as start_date,
        case when end_date = lead_start_date - interval '1' day then null
            else end_date end as end_date,
        row_number() over (partition by id order by start_date)
            - row_number() over (partition by id, locn
                order by start_date) as chain
    from (
        select id, locn, start_date, end_date,
            lead(start_date) over (partition by id, locn
                order by start_date) as lead_start_date,
            lag(end_date) over (partition by id, locn
                order by start_date) as lag_end_date
        from TableA
    )   
)
group by id, locn, chain
order by 1, 3, 2;

ID         LOCN       START_DATE END_DATE
---------- ---------- ---------- ----------
1P1        01         02/04/1996 05/04/2005
1P2        02         02/07/1996 01/01/2099
1P2        30         01/31/1996 02/06/1996
1P3        03         02/07/1996 01/01/2099
1S4        42         11/06/2001 01/01/2099
3S4        42         11/06/2001 01/01/2099

The innermost select uses lead and lag to peek at the adjacent rows (which you hinted at in your previous question).

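The next layer sets the contiguous values to null, that is, wherever one row's start date is the day after the previous row's end date; if you run just that part you will see only the starts and ends of the contiguous ranges appear. It also adds a chain pseudo-column, which handles an id switching back to a previously used locn; say 1P2 going back to locn=30. (This is an approach I first saw here, and there is more to read about it under the name gaps and islands.) Without it, all the "islands" for an id/locn would be treated as one block, and you would end up with overlapping date ranges.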
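
To see what chain does, here is a small sketch that prints the two row numbers and their difference. The last row, where 1P2 returns to locn 30, is invented for illustration and is not in your sample data (the end dates are shortened to make room for it); the rn_id and rn_locn column names are mine as well.

```sql
-- Hypothetical data: 1P2 moves 30 -> 02 -> 02 -> back to 30.
-- The final row is made up to show an id returning to an earlier locn.
with t as (
    select '1P2' as id, '30' as locn,
        date '1996-01-31' as start_date, date '1996-02-06' as end_date
    from dual union all
    select '1P2', '02', date '1996-02-07', date '1996-02-13' from dual union all
    select '1P2', '02', date '1996-02-14', date '1996-02-20' from dual union all
    select '1P2', '30', date '1996-02-21', date '2099-01-01' from dual
)
select id, locn, start_date, end_date,
    -- position of the row within the id
    row_number() over (partition by id order by start_date) as rn_id,
    -- position of the row within the id/locn pair
    row_number() over (partition by id, locn order by start_date) as rn_locn,
    -- the difference stays constant while the locn does not change
    row_number() over (partition by id order by start_date)
        - row_number() over (partition by id, locn
            order by start_date) as chain
from t
order by start_date;
```

With this data the first locn=30 row gets chain 0 and the invented one gets chain 2, while the two locn=02 rows share chain 1, so grouping by id, locn, chain keeps the two locn=30 periods as separate islands.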

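The outer layer uses aggregates (max here, which ignores nulls) to remove the nulls and produce the final result.

Using that, and if you are on 11gR2, you can use a recursive CTE to join recursively and get all the combinations. This is only my second real attempt at one of these, so others may well point out flaws or improvements, if they can tear themselves away from their M&Ms... it might give you some pointers though.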

with a as (
    select id, locn, max(start_date) as start_date, max(end_date) as end_date
    from (
        select id, locn,
            case when start_date = lag_end_date  + interval '1' day then null
                else start_date end as start_date,
            case when end_date = lead_start_date - interval '1' day then null
                else end_date end as end_date,
            row_number() over (partition by id order by start_date)
                - row_number() over (partition by id, locn
                    order by start_date) as chain
        from (
            select id, locn, start_date, end_date,
                lead(start_date) over (partition by id, locn
                    order by start_date) as lead_start_date,
                lag(end_date) over (partition by id, locn
                    order by start_date) as lag_end_date
            from TableA
        )
    )
    group by id, locn, chain
),
b as (
    select id, posting, description, other_id, start_date, end_date,
        row_number() over (partition by id, posting, description,
            other_id order by start_date, end_date) as rn
    from TableB
),
r (id, posting, description, other_id, rn, start_date, end_date, locn) as (
    select b.id, b.posting, b.description, b.other_id, b.rn,
        b.start_date,
        case
            when not (a.start_date > b.end_date or a.end_date < b.start_date)
                and a.start_date <= b.end_date and a.end_date < b.end_date
                then a.end_date
            when not (a.start_date > b.end_date or a.end_date < b.start_date)
                and a.start_date <= b.end_date and a.start_date > b.start_date
                then a.start_date - interval '1' day
            else b.end_date
        end as end_date,
        case
            when a.start_date <= b.start_date and a.end_date >= b.start_date
                then a.locn
        end
    from b
    left join (
        select id, locn, start_date, end_date,
            row_number() over (partition by id order by start_date) as rn
        from a
    ) a on a.id = b.id
        and a.rn = 1
    union all
    select b.id, b.posting, b.description, b.other_id, b.rn,
        case
            when a.start_date is null then r.end_date + interval '1' day
            else a.start_date
        end as start_date,
        case
            when a.start_date is null then b.end_date
            when not (a.start_date > r.end_date or a.end_date < r.start_date)
                then least(a.end_date, b.end_date)
            when a.end_date < b.end_date then a.start_date - interval '1' day
            else b.end_date
        end as end_date,
        a.locn
    from b
    join r on r.id = b.id
        and r.posting = b.posting
        and r.description = b.description
        and r.other_id = b.other_id
        and r.rn = b.rn
        and r.start_date = b.start_date
        and r.end_date < b.end_date
    left join a on a.id = r.id
        and a.start_date > r.end_date
) 
select id, posting as unit_type, description, other_id,
    start_date, end_date, locn
from r
order by id, start_date;

I believe this gets the result you wanted:

ID         UNIT_TYPE            DESCRIPTION                    OTHER_ID   START_DATE END_DATE   LOCN
---------- -------------------- ------------------------------ ---------- ---------- ---------- ----------
1P1        PROFESSOR            Sch 1 Quad 1 Area              P1         02/04/1996 05/04/2005 01
1P1        PROFESSOR            Sch 1 Quad 1 Area              P1         05/05/2005 01/01/2099
1P2        PROFESSOR            Sch 1 Quad 2 Area              P2         01/31/1996 02/06/1996 30
1P2        PROFESSOR            Sch 1 Quad 2 Area              P2         02/07/1996 01/01/2099 02
1P3        PROFESSOR            Sch 1 Quad 3 Area              P3         02/05/1996 02/06/1996
1P3        PROFESSOR            Sch 1 Quad 3 Area              P3         02/07/1996 01/01/2099 03
1S4        SUPERVISOR           Sch 1 CO Supervisor 4          1S4        02/05/1996 11/05/2001
1S4        SUPERVISOR           Sch 1 CO Supervisor 4          1S4        11/06/2001 03/18/2002 42
1S4        SUPERINTENDENT       Sch 1 CD Superintendent        1S4        03/19/2002 06/09/2009 42
1S4        SUPERVISOR           Sch 1 CO Supervisor 4          1S4        06/10/2009 01/01/2099 42
2S5        SUPERVISOR           Sch 2 CAO Supervisor 5         2S5        10/26/2002 06/09/2009
2S5        SUPERINTENDENT       Sch 2 CAO Superintendent 5     2S5        06/10/2009 07/14/2009
2S5        SUPERINTENDENT       Sch 2 CAO Superintendent 5     S5         07/15/2009 01/01/2099
3S4        SUPERVISOR           Sch 3 CO Supervisor 4          3S4        02/05/1996 11/05/2001
3S4        SUPERVISOR           Sch 3 CO Supervisor 4          3S4        11/06/2001 03/18/2002 42
3S4        SUPERINTENDENT       Sch 3 CD Superintendent        3S4        03/19/2002 06/09/2009 42
3S4        SUPERVISOR           Sch 3 CO Supervisor 4          3S4        06/10/2009 01/01/2099 42

17 rows selected.

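This uses three CTEs. a is the collapsed version of TableA, as above. b is TableB, but with a row-number column added, which I thought I would need later to keep the records in sync during the recursion. r is where the fun starts.

The first part of r generates the initial data for each TableB entry, with matching values from TableA where appropriate, but only from the first matching record if there might be more than one. The tricky bit is working out what end_date should be. If there is no overlapping TableA record at all, it can simply be the TableB end date; if there is one but it starts after the TableB record starts, then this row needs to end immediately before the TableA record starts. Otherwise it depends on whether the TableB record ends before or after the TableA record.

Running just that part: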

with a as (...), b as (...)
select b.id, b.posting, b.description, b.other_id, b.rn,
    b.start_date,
    case
        when not (a.start_date > b.end_date or a.end_date < b.start_date)
            and a.start_date <= b.end_date and a.end_date < b.end_date
            then a.end_date
        when not (a.start_date > b.end_date or a.end_date < b.start_date)
            and a.start_date <= b.end_date and a.start_date > b.start_date
            then a.start_date - interval '1' day
        else b.end_date
    end as end_date,
    case
        when a.start_date <= b.start_date and a.end_date >= b.start_date
            then a.locn
    end
from b
left join (
    select id, locn, start_date, end_date,
        row_number() over (partition by id order by start_date) as rn
    from a
) a on a.id = b.id
    and a.rn = 1
order by id, start_date;

...gives this (with the description suppressed for readability):

ID         UNIT_TYPE            OTHER_ID   START_DATE END_DATE   LOCN
---------- -------------------- ---------- ---------- ---------- ----------
1P1        PROFESSOR            P1         02/04/1996 05/04/2005 01
1P2        PROFESSOR            P2         01/31/1996 02/06/1996 30
1P3        PROFESSOR            P3         02/05/1996 02/06/1996
1S4        SUPERVISOR           1S4        02/05/1996 11/05/2001
1S4        SUPERINTENDENT       1S4        03/19/2002 06/09/2009 42
1S4        SUPERVISOR           1S4        06/10/2009 01/01/2099 42
2S5        SUPERVISOR           2S5        10/26/2002 06/09/2009
2S5        SUPERINTENDENT       2S5        06/10/2009 07/14/2009
2S5        SUPERINTENDENT       S5         07/15/2009 01/01/2099
3S4        SUPERVISOR           3S4        02/05/1996 11/05/2001
3S4        SUPERINTENDENT       3S4        03/19/2002 06/09/2009 42
3S4        SUPERVISOR           3S4        06/10/2009 01/01/2099 42

12 rows selected.

For 1P3 there is initially no matching TableA record, but notice that end_date is set to the day before the later matching record starts.
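
The end_date rule from the first part of r can also be tried on its own. The sketch below copies that case expression and feeds it three (b, a) date pairs taken from the sample data; the pairs CTE and its b_start, b_end, a_start, a_end column names are mine, and 2S5 has null a values because it has no TableA rows.

```sql
-- Each row pairs one TableB range (b_*) with the first collapsed
-- TableA range for the same id (a_*); the layout is illustrative only.
with pairs as (
    select '1P1' as id,
        date '1996-02-04' as b_start, date '2099-01-01' as b_end,
        date '1996-02-04' as a_start, date '2005-05-04' as a_end
    from dual union all
    select '1P3', date '1996-02-05', date '2099-01-01',
        date '1996-02-07', date '2099-01-01' from dual union all
    select '2S5', date '2002-10-26', date '2009-06-09',
        cast(null as date), cast(null as date) from dual
)
select id, b_start as start_date,
    case
        -- a overlaps b and finishes first: stop where a stops
        when not (a_start > b_end or a_end < b_start)
            and a_start <= b_end and a_end < b_end
            then a_end
        -- a overlaps b but starts later: stop the day before a starts
        when not (a_start > b_end or a_end < b_start)
            and a_start <= b_end and a_start > b_start
            then a_start - interval '1' day
        -- no usable overlap, including no a row at all (nulls fall through)
        else b_end
    end as end_date
from pairs
order by id;
```

This should give end dates of 05/04/2005 for 1P1, 02/06/1996 for 1P3 and 06/09/2009 for 2S5, matching the first row shown for each of those ids in the output above.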

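The second part of r, after the union all, is the recursive part. For each TableB record it joins back to r, looking for generated records whose end_date is earlier than the original record's, as was the case for 1P3, which means there is still a period left to fill in. It then looks for a suitable TableA record and generates suitable values for start_date and end_date, again depending on whether and how the records overlap. It is entirely possible I have missed some edge cases here.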
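
Given the possibility of missed edge cases, one cheap sanity check is to look for rows that do not start the day after the previous row for the same id ends. This is only a sketch: res stands in for the final result (you would select from r instead), it is seeded here with the four 1S4 rows from the output above, and it assumes the TableB ranges for an id are themselves gap-free, as they are in the sample data.

```sql
-- res is a stand-in for the recursive result; replace it with r.
with res as (
    select '1S4' as id,
        date '1996-02-05' as start_date, date '2001-11-05' as end_date
    from dual union all
    select '1S4', date '2001-11-06', date '2002-03-18' from dual union all
    select '1S4', date '2002-03-19', date '2009-06-09' from dual union all
    select '1S4', date '2009-06-10', date '2099-01-01' from dual
)
select id, prev_end, start_date, end_date
from (
    select id, start_date, end_date,
        -- end date of the preceding range for the same id
        lag(end_date) over (partition by id order by start_date) as prev_end
    from res
)
-- a gap or an overlap shows up as a start that is not prev_end + 1 day
where prev_end is not null
    and start_date <> prev_end + 1
order by id, start_date;
```

For these four rows the query returns nothing, since every range starts the day after the previous one ends; any row it does return marks a gap or an overlap worth investigating.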

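You mentioned that there might be contiguous ranges that could be collapsed in TableB too, and you could give it the same treatment I showed for TableA. I'm not sure that doing it this way is necessarily the best or clearest approach, even when only one table needs it; I only really did it that way because that is how you described the process.

If you modified the recursive CTE to work against the base tables (perhaps simplifying it a bit in the process), you could apply the gaps-and-islands method to that result set rather than to the individual tables, so it would not matter which table a gap came from.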
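
As a rough sketch of that last idea, the query below collapses contiguous rows in a combined result set. It uses a start-of-island flag plus a running sum, which is a different gaps-and-islands variant from the chain column above; res, its three rows, and the is_start and island names are made up to stand in for an uncollapsed recursive result.

```sql
-- Hypothetical uncollapsed result: two contiguous locn=01 rows for 1P1,
-- followed by a row with no locn.
with res as (
    select '1P1' as id, 'PROFESSOR' as posting, '01' as locn,
        date '1996-02-04' as start_date, date '1996-02-22' as end_date
    from dual union all
    select '1P1', 'PROFESSOR', '01',
        date '1996-02-23', date '2005-05-04' from dual union all
    select '1P1', 'PROFESSOR', cast(null as varchar2(10)),
        date '2005-05-05', date '2099-01-01' from dual
)
select id, posting, locn,
    min(start_date) as start_date, max(end_date) as end_date
from (
    select id, posting, locn, start_date, end_date,
        -- running count of island starts = island number
        sum(is_start) over (partition by id, posting, locn
            order by start_date) as island
    from (
        select id, posting, locn, start_date, end_date,
            -- 0 when the row continues the previous one, 1 when it starts a new island
            case when start_date = lag(end_date) over (
                    partition by id, posting, locn order by start_date) + 1
                then 0 else 1 end as is_start
        from res
    )
)
group by id, posting, locn, island
order by 1, 4;
```

Here the two locn=01 rows should come out as a single 02/04/1996 to 05/04/2005 row and the null-locn row is left as it is; a real version would also need description and other_id in the partition and group by lists.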

answered 2012-11-11T18:48:54.170