
I have a table of person IDs and date ranges (a start date and a stop date). Each person can have multiple rows, each with its own start and end date.

create table #DateRanges (
   tableID   int not null,
   personID  int not null,
   startDate date,
   endDate   date
);
insert #DateRanges (tableID, personID, startDate, endDate)
values (1, 100, '2011-01-01', '2011-01-31') -- Just January
     , (2, 100, '2011-02-01', '2011-02-28') -- Just February
     , (3, 100, '2011-04-01', '2011-04-30') -- April - Skipped March
     , (4, 100, '2011-05-01', '2011-05-31') -- May
     , (5, 100, '2011-06-01', '2011-12-31') -- June through December

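I need a way to collapse adjacent date ranges, meaning rows where the previous row's end date is exactly one day before the next row's start date. The collapse has to absorb every consecutive range and split only where the next start date is more than one day after the previous end date. The data above needs to be condensed into: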

+-----------+----------+--------------+------------+
| SomeNewID | PersonID | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
|        1  |     100  |   2011-01-01 | 2011-02-28 |
+-----------+----------+--------------+------------+
|        2  |     100  |   2011-04-01 | 2011-12-31 |
+-----------+----------+--------------+------------+

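That gives only two rows, because the only missing range is March. If all of March were present, whether as one row or several, the result would collapse to a single row. But if only two days in mid-March were present, we would get a third row for those March dates.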
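To pin down the expected output before writing any SQL, here is a minimal sort-and-merge sketch in Python. It is deliberately a loop, so it is not the set-based answer being asked for; it only encodes the rule. The helper name `collapse_ranges` and the `(personID, startDate, endDate)` tuple layout are hypothetical, and it assumes that overlapping ranges should also merge, which the question does not state.

```python
from datetime import date, timedelta
from typing import List, Tuple

Range = Tuple[int, date, date]  # (personID, startDate, endDate)


def collapse_ranges(rows: List[Range]) -> List[Range]:
    """Merge a person's ranges when a start falls on or before the day after the previous end."""
    merged: List[Range] = []
    for person, start, end in sorted(rows):  # ORDER BY personID, startDate
        if merged and merged[-1][0] == person and start <= merged[-1][2] + timedelta(days=1):
            # adjacent (or, by assumption, overlapping): extend the current island
            merged[-1] = (person, merged[-1][1], max(merged[-1][2], end))
        else:
            # first row for this person, or at least one missing day before this start
            merged.append((person, start, end))
    return merged


if __name__ == "__main__":
    d = date.fromisoformat
    rows = [
        (100, d("2011-01-01"), d("2011-01-31")),
        (100, d("2011-02-01"), d("2011-02-28")),
        (100, d("2011-04-01"), d("2011-04-30")),
        (100, d("2011-05-01"), d("2011-05-31")),
        (100, d("2011-06-01"), d("2011-12-31")),
    ]
    print(len(collapse_ranges(rows)))  # 2: Jan-Feb and Apr-Dec
    # two days in mid-March become a third island
    print(len(collapse_ranges(rows + [(100, d("2011-03-14"), d("2011-03-15"))])))  # 3
```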

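I have been trying to use the LEAD and LAG functions in SQL Server 2016 to do this as a set-based operation, but so far I've come up empty. I'd like to do it without loops or RBAR (row-by-agonizing-row processing), but I don't see a solution.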


2 Answers


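After working on this for a few days, I think I have a solution worth sharing in case anyone else needs something similar. I used several CTEs to find the lead, the lag, and the gap, reduced the rows to only the significant start and stop dates, and then used more LEAD/LAG logic to find the collapsed start and stop dates. There may be a simpler way, but I think this handles day-level resolution well.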

with LeadAndLagAndGap as (
   select
      personID,
      startDate,
      endDate,
      lag(endDate)    over (partition by personID order by startDate) as previousEnd,
      lead(startDate) over (partition by personID order by startDate) as nextStart
   from
      #DateRanges
), PreCollapseReaggregate as (
   select
      personID,
      startDate,
      case  -- first row for the person, or more than one day after the previous end: starts a range
         when previousEnd is null
           or datediff(day, previousEnd, startDate) > 1 then startDate
      end as DefiniteStart,
      case  -- last row for the person, or more than one day before the next start: ends a range
         when nextStart is null
           or datediff(day, endDate, nextStart) > 1 then endDate
      end as DefiniteEnd
   from
      LeadAndLagAndGap
), OnlyStartAndEndRows as (
   select
      personID,
      startDate,
      DefiniteStart,
      DefiniteEnd
   from
      PreCollapseReaggregate
   where
      DefiniteStart is not null  -- row starts a range
      or DefiniteEnd is not null -- row ends a range (a single row can do both)
), Collapsed as (
   select
      personID,
      DefiniteStart as NewStartDate,
      -- a start row that does not also end its range is closed by the next boundary row
      coalesce(DefiniteEnd,
               lead(DefiniteEnd) over (partition by personID order by startDate)) as NewEndDate
   from
      OnlyStartAndEndRows
)
select personID, NewStartDate, NewEndDate
from Collapsed
where NewStartDate is not null
order by personID, NewStartDate;
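The boundary-row idea (flag rows that start or end a range using the neighboring rows, keep only those rows, then pair each start with its own end or the next boundary row's end) can be modeled in plain Python to check edge cases, such as a first row that is immediately followed by a gap. This is a minimal sketch, not the answer's SQL: `collapse_by_boundaries` is a hypothetical helper, lists stand in for the CTEs, and neighbor lookups within the same person stand in for LAG/LEAD.

```python
from datetime import date
from typing import List, Optional, Tuple

Range = Tuple[int, date, date]  # (personID, startDate, endDate)


def collapse_by_boundaries(rows: List[Range]) -> List[Range]:
    """Hypothetical helper mirroring the boundary-row approach."""
    rows = sorted(rows)  # ORDER BY personID, startDate
    boundaries: List[Tuple[int, Optional[date], Optional[date]]] = []
    for i, (person, start, end) in enumerate(rows):
        # like LAG(endDate) / LEAD(startDate) ... PARTITION BY personID
        prev_end = rows[i - 1][2] if i > 0 and rows[i - 1][0] == person else None
        next_start = rows[i + 1][1] if i + 1 < len(rows) and rows[i + 1][0] == person else None
        definite_start = start if prev_end is None or (start - prev_end).days > 1 else None
        definite_end = end if next_start is None or (next_start - end).days > 1 else None
        if definite_start is not None or definite_end is not None:  # keep boundary rows only
            boundaries.append((person, definite_start, definite_end))

    collapsed: List[Range] = []
    for i, (person, definite_start, definite_end) in enumerate(boundaries):
        if definite_start is None:
            continue  # pure end row: it closes the range begun on an earlier boundary row
        if definite_end is None:
            # like LEAD(DefiniteEnd): the next boundary row is this range's end row
            definite_end = boundaries[i + 1][2]
        collapsed.append((person, definite_start, definite_end))
    return collapsed


if __name__ == "__main__":
    d = date.fromisoformat
    # January, then a gap, then March and April back to back
    edge = [
        (100, d("2011-01-01"), d("2011-01-31")),
        (100, d("2011-03-01"), d("2011-03-31")),
        (100, d("2011-04-01"), d("2011-04-30")),
    ]
    for row in collapse_by_boundaries(edge):
        print(row)
```

A range that is a single row is both a start and an end on the same boundary row, so it never needs the look-ahead.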
answered 2017-05-08T17:50:19.453

You can use LAG to get the right buckets and then group by them, as follows:

;with cte1 as (
    select *,
           dtdiff = datediff(day, lag(endDate) over (partition by personID order by startDate), startDate) -- days from the previous row's end to this row's start
    from #DateRanges
),
cte2 as (
    select *,
           grp = sum(case when dtdiff is null or dtdiff > 1 then 1 else 0 end)
                 over (partition by personID order by startDate rows unbounded preceding) -- running count of islands = bucket
    from cte1
)
select SomeNewId    = row_number() over (order by personID, min(startDate)),
       PersonId,
       NewStartDate = min(startDate), -- min/max per bucket
       NewEndDate   = max(endDate)
from cte2
group by personID, grp
order by SomeNewId;

Output for your data:

+-----------+----------+--------------+------------+
| SomeNewId | Personid | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
|         1 |      100 | 2011-01-01   | 2011-02-28 |
|         2 |      100 | 2011-04-01   | 2011-12-31 |
+-----------+----------+--------------+------------+

My test input:

insert #DateRanges (tableID, personID, startDate, endDate)
values (1, 100, '2011-01-01', '2011-01-31') -- Just January
     , (2, 100, '2011-02-01', '2011-02-28') -- Just February
     , (3, 100, '2011-04-01', '2011-04-30') -- April - Skipped March
     , (4, 100, '2011-05-01', '2011-05-31') -- May
     , (5, 100, '2011-06-01', '2011-06-30') -- June
     , (6, 100, '2011-07-01', '2011-07-31') -- July
     , (7, 100, '2011-08-01', '2011-08-31') -- August
     , (8, 100, '2011-10-01', '2011-10-31') -- October - Skipped September
     , (9, 100, '2011-11-01', '2011-11-30') -- November

Output for my test data:

+-----------+----------+--------------+------------+
| SomeNewId | Personid | NewStartDate | NewEndDate |
+-----------+----------+--------------+------------+
|         1 |      100 | 2011-01-01   | 2011-02-28 |
|         2 |      100 | 2011-04-01   | 2011-08-31 |
|         3 |      100 | 2011-10-01   | 2011-11-30 |
+-----------+----------+--------------+------------+
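The bucket technique (flag each row that starts a new island with LAG, turn the flags into bucket numbers with a running SUM, then GROUP BY the bucket) can be checked with a small Python model. This sketch is an assumption-laden stand-in, not the answer's SQL: `collapse_by_buckets` is a hypothetical helper, it compares each start with the previous row's end date, and it restarts the running count per person, which is what PARTITION BY personID does in SQL.

```python
from datetime import date
from itertools import groupby
from typing import List, Optional, Tuple

Range = Tuple[int, date, date]  # (personID, startDate, endDate)


def collapse_by_buckets(rows: List[Range]) -> List[Tuple[int, int, date, date]]:
    """Hypothetical mirror of LAG + running SUM + GROUP BY.

    Returns (SomeNewId, personID, NewStartDate, NewEndDate) tuples.
    """
    tagged = []
    for person, person_rows in groupby(sorted(rows), key=lambda r: r[0]):  # PARTITION BY personID
        grp = 0
        prev_end: Optional[date] = None
        for _, start, end in person_rows:  # ORDER BY startDate
            # dtdiff: days from the previous row's end date to this row's start date
            dtdiff = None if prev_end is None else (start - prev_end).days
            if dtdiff is None or dtdiff > 1:
                grp += 1  # running SUM of the "starts a new island" flags
            tagged.append((person, grp, start, end))
            prev_end = end

    result = []
    buckets = groupby(tagged, key=lambda t: (t[0], t[1]))  # GROUP BY personID, grp
    for new_id, ((person, _grp), members) in enumerate(buckets, start=1):
        members = list(members)
        result.append((new_id, person, min(m[2] for m in members), max(m[3] for m in members)))
    return result


if __name__ == "__main__":
    d = date.fromisoformat
    # a long range followed by an adjacent one still merges, with no tuning threshold
    rows = [
        (100, d("2011-06-01"), d("2011-12-31")),
        (100, d("2012-01-01"), d("2012-01-31")),
    ]
    print(collapse_by_buckets(rows))
```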
answered 2017-05-04T19:29:34.283