I need to split overlapping records in a table that has two date fields.

This is my input table, TableT:

ID EFFECTIVE_DATE END_DATE
JKL 2016-01-01 2016-12-31
JKL 2016-04-01 2016-12-31
JKL 2016-01-01 2016-03-04
JKL 2016-04-01 2016-12-31
JKL 2016-01-01 2016-12-31

I want the output to look like the table below. I need this to work in SQL Server as well as Oracle/DB2, so I am looking for a generic solution.

ID EFFECTIVE_DATE END_DATE
JKL 2016-01-01 2016-03-04
JKL 2016-03-05 2016-03-31
JKL 2016-04-01 2016-12-31

This is what I have tried:

With EndDates as (
    select END_DATE as END_DATE,TRIM(ID) as ID  FROM TableT  
    union all
    select ADD_DAYS(EFFECTIVE_DATE, -1) as END_DATE,TRIM(ID) as ID FROM TableT 
), Periods as (
    select ID as ID,MIN(EFFECTIVE_DATE) as EFFECTIVE_DATE,
                (select MIN(END_DATE) from EndDates e
                 where e.ID = t.ID and
                 e.END_DATE >= MIN(EFFECTIVE_DATE)) as END_DATE
    from
        TableT t  
    group by ID),
    EXTN_PERIOD as (select p.ID as ID, ADD_DAYS(p.END_DATE, 1) as EFFECTIVE_DATE,e.END_DATE as END_DATE
    from
        Periods p
            inner join
        EndDates e
            on
                p.ID = e.ID and
                p.END_DATE < e.END_DATE
    where
        not exists (select * from EndDates e2 where
                e2.ID = p.ID and
                e2.END_DATE > p.END_DATE and
                e2.END_DATE < e.END_DATE)
)
select * from EXTN_PERIOD
union
select * from PERIODS

It partially works, but the last period is missing.

This is the output I get when I run the query above:

ID EFFECTIVE_DATE END_DATE
JKL 2016-01-01 2016-03-04
JKL 2016-03-05 2016-03-31

Thanks in advance!

2 Answers

WITH 
/*
MY_TAB (ID, EFFECTIVE_DATE, END_DATE) AS
(
VALUES
  ('JKL', DATE('2016-01-01'), DATE('2016-12-31'))
, ('JKL', DATE('2016-04-01'), DATE('2016-12-31'))
, ('JKL', DATE('2016-01-01'), DATE('2016-03-04'))
, ('JKL', DATE('2016-04-01'), DATE('2016-12-31'))
, ('JKL', DATE('2016-01-01'), DATE('2016-12-31'))
)
, 
*/ 
A AS 
(
SELECT DISTINCT T.ID, DECODE(V.I, 1, T.EFFECTIVE_DATE, 2, T.END_DATE + 1) DT
FROM MY_TAB T, (VALUES 1, 2) V(I)
)
, INTL AS 
(
SELECT 
  ID
, LAG(DT) OVER (PARTITION BY ID ORDER BY DT) AS EFF_DT
, DT AS END_DT
FROM A
)
SELECT ID, EFF_DT, END_DT - 1 AS END_DT
FROM INTL
WHERE EFF_DT IS NOT NULL
ORDER BY 1, 2;

This is almost universal. The main part that needs adapting per DBMS is how the two-row "virtual" table with the correlation name V (holding the INTEGERS 1 and 2) is generated. In SQL Server, DECODE and the END_DATE + 1 date arithmetic would also need rewriting (with CASE and DATEADD).
The idea is to first convert your data to [inclusive, exclusive) form, which simplifies the calculations. Then we merge all effective and end dates into one distinct list and construct intervals from consecutive dates using the OLAP LAG function. Finally we convert back to your [inclusive, inclusive] form.
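
To make the three steps concrete outside of any particular DBMS, here is a minimal Python sketch of the same idea. It is only an illustration, not part of the query above: the function name split_overlaps and the tuple layout are made up for this example, and the pairing of consecutive boundary dates stands in for what LAG does in SQL.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def split_overlaps(rows):
    """Split overlapping (id, effective_date, end_date) rows, both dates inclusive.

    Hypothetical helper for illustration only.
    """
    # Step 1: switch to [inclusive, exclusive) form by shifting each end date
    # forward one day, and collect the distinct boundary dates per id.
    boundaries = {}
    for rec_id, eff, end in rows:
        boundaries.setdefault(rec_id, set()).update((eff, end + ONE_DAY))

    result = []
    for rec_id in sorted(boundaries):
        points = sorted(boundaries[rec_id])
        # Step 2: pair each boundary with the next one (the job LAG/LEAD does in SQL).
        for start, next_start in zip(points, points[1:]):
            # Step 3: convert back to [inclusive, inclusive] form.
            result.append((rec_id, start, next_start - ONE_DAY))
    return result


rows = [
    ("JKL", date(2016, 1, 1), date(2016, 12, 31)),
    ("JKL", date(2016, 4, 1), date(2016, 12, 31)),
    ("JKL", date(2016, 1, 1), date(2016, 3, 4)),
    ("JKL", date(2016, 4, 1), date(2016, 12, 31)),
    ("JKL", date(2016, 1, 1), date(2016, 12, 31)),
]
segments = split_overlaps(rows)
for rec_id, eff, end in segments:
    print(rec_id, eff, end)
```

One thing to keep in mind: because only boundary dates are used, a stretch of days that no input record covers would also come out as a segment. With the sample data there is no such gap, so this does not show up here.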

db<>fiddle link to test.

answered 2020-12-23T10:12:42.280

In Oracle you could do something like this:

with
  tablet (id, effective_date, end_date) as (
    select 'JKL',   date '2016-01-01', date '2016-12-31' from dual union all
    select 'JKL',   date '2016-04-01', date '2016-12-31' from dual union all
    select 'JKL',   date '2016-01-01', date '2016-03-04' from dual union all
    select 'JKL',   date '2016-04-01', date '2016-12-31' from dual union all
    select 'JKL',   date '2016-01-01', date '2016-12-31' from dual
  )
, prep (id, dt) as (
    select  distinct id, case col when 'EFF' then val else val + 1 end
    from    tablet
    unpivot (val for col in (effective_date as 'EFF', end_date as 'END'))
  )
, almost_done (id, effective_date, end_date) as (
    select id, dt, lead(dt) over (partition by id order by dt) - 1
    from   prep
  )
select id, effective_date, end_date
from   almost_done
where  end_date is not null
;

ID  EFFECTIVE_DATE END_DATE  
--- -------------- ----------
JKL 2016-01-01     2016-03-04
JKL 2016-03-05     2016-03-31
JKL 2016-04-01     2016-12-31

Note the first CTE (tablet): it only generates test data, so you don't need it in your real-life case. The first step is to unpivot the data, shifting each end date forward by one day. I don't know how well SQL Server supports unpivoting; in the worst case you can do it manually with a cross join (NOT with UNION ALL, which is inefficient because it typically reads the table twice). Then you remove duplicates, and the rest is easy with the LEAD analytic function, which SQL Server should support too.
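
For the "manual unpivot with a cross join" variant, here is a rough sketch that can be run anywhere Python is installed. It uses Python's built-in sqlite3 module purely as a test bench, which is an assumption of this example and not one of the target databases: dates are stored as ISO text, SQLite's date() modifier replaces the + 1 / - 1 arithmetic, and window functions need SQLite 3.25 or later. The CTE name two_rows is made up; on SQL Server, Oracle, or DB2 the date arithmetic and the two-row helper table would need the dialect's own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablet (id TEXT, effective_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO tablet VALUES (?, ?, ?)",
    [
        ("JKL", "2016-01-01", "2016-12-31"),
        ("JKL", "2016-04-01", "2016-12-31"),
        ("JKL", "2016-01-01", "2016-03-04"),
        ("JKL", "2016-04-01", "2016-12-31"),
        ("JKL", "2016-01-01", "2016-12-31"),
    ],
)

QUERY = """
WITH two_rows (i) AS (VALUES (1), (2)),   -- hypothetical helper: one row per date column
prep (id, dt) AS (
    -- manual unpivot: each source row is emitted twice, once per date column;
    -- the end date is shifted forward one day, and DISTINCT removes duplicates
    SELECT DISTINCT t.id,
           CASE v.i WHEN 1 THEN t.effective_date
                    ELSE date(t.end_date, '+1 day') END
    FROM tablet t CROSS JOIN two_rows v
),
almost_done (id, effective_date, end_date) AS (
    -- the next boundary minus one day closes the current interval
    SELECT id, dt,
           date(LEAD(dt) OVER (PARTITION BY id ORDER BY dt), '-1 day')
    FROM prep
)
SELECT id, effective_date, end_date
FROM almost_done
WHERE end_date IS NOT NULL
ORDER BY id, effective_date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(*row)
```

The cross join reads tablet once and doubles each row in memory, which is the reason to prefer it over two UNION ALL branches.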

answered 2020-12-23T07:44:37.530