sql - SQL - Transposing intervals

Question

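I am trying to solve an SQL problem, and I don't even know whether it is possible. Let me try to explain.

We want to transpose a "range" of date-based records (an interval) from one table into another table, where the range is stored as a FROM/TO structure.

For example, we have the following starting table structure: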

ID DATE
100  11-08-2012
100  12-08-2012
100  13-08-2012
100  17-08-2012
100  18-08-2012
101  01-09-2012
...

We want the following table as the result:

ID   FROM_DATE   TO_DATE
100  11-08-2012  13-08-2012
100  17-08-2012  18-08-2012
...

The interval is stored in the FROM/TO fields; for a single-date interval, the same date is stored in both fields.

Is there a way to do this using SQL?

score 2 · Accepted Answer

This is very doable in pure SQL (no procedures or user-defined functions) in any database that supports ROW_NUMBER(). Here is a SQL Server 2008 implementation with a SQL Fiddle.

-- Create a virtual table with 2 rows that is used to convert a single row
-- into 2 rows when the range is only a single day
with events as (
  select 'start' event 
  union all 
  select 'stop' event
),
-- Sort the data by date, partitioning by ID, and assign a row number
sorted_dates as ( 
  select id, 
         dt, 
         row_number() over(partition by id order by dt) sorted_rownum
    from t
),
-- Find the dates that begin and end the ranges. Assign new row numbers
-- so that the START and STOP row numbers are always consecutive.
-- Convert a date that both starts and ends the range into two rows.
pruned_dates as (
  select d1.id, 
         e.event, 
         d1.dt,
         row_number() over(partition by d1.id order by d1.sorted_rownum, e.event) pruned_rownum
    from sorted_dates d1
    -- Look for a previous date that is the same day or 1 day earlier
    left outer join sorted_dates d0
      on d1.id=d0.id
     and d1.sorted_rownum  = d0.sorted_rownum+1
     and datediff(d, d0.dt, d1.dt)<=1
    -- Look for a next date that is the same day or 1 day later.
    left outer join sorted_dates d2
      on d1.id=d2.id
     and d1.sorted_rownum = d2.sorted_rownum-1
     and datediff(d, d1.dt, d2.dt)<=1
    -- Identify the record as a START date if there does not exist a prior date
    -- that is the same date or 1 day earlier.
    -- Identify the record as a STOP date if there does not exist a subsequent
    -- date that is the same date or 1 day later.
    left outer join events e
      on (d0.id is null and e.event='start')
      or (d2.id is null and e.event='stop')
   -- Ignore records that have not been identified as START or STOP records.
   where e.event is not null
)
-- Pair the START and STOP records and report the results
select d1.id,
       d1.dt from_date,
       d2.dt to_date
  from pruned_dates d1
  join pruned_dates d2
    on d1.id=d2.id
   and d1.pruned_rownum = d2.pruned_rownum-1
 where d1.event='start'
;
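
To make the START/STOP logic of the query above easier to follow, here is a minimal Python sketch of the same idea, assuming the rows are available as `(id, date)` tuples. The function name `ranges_by_start_stop` is hypothetical and not part of the answer; it is an illustration, not a replacement for the SQL.

```python
from datetime import date, timedelta


def ranges_by_start_stop(rows):
    """rows: iterable of (id, date). Returns a list of (id, from_date, to_date)."""
    one_day = timedelta(days=1)
    # Counterpart of sorted_dates: order by id, then by date.
    ordered = sorted(rows)
    events = []
    for i, (rid, dt) in enumerate(ordered):
        prev = ordered[i - 1] if i > 0 else None
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        # START: no previous row for the same id on the same day or 1 day earlier.
        is_start = not (prev and prev[0] == rid and dt - prev[1] <= one_day)
        # STOP: no next row for the same id on the same day or 1 day later.
        is_stop = not (nxt and nxt[0] == rid and nxt[1] - dt <= one_day)
        # A single-day range yields both events, like the join to "events".
        if is_start:
            events.append((rid, "start", dt))
        if is_stop:
            events.append((rid, "stop", dt))
    # Events alternate start, stop, start, stop, ... so pair neighbours.
    return [(s[0], s[2], e[2]) for s, e in zip(events[0::2], events[1::2])]
```

Called with the rows from the question, this should give one tuple per range, with a single-day range repeating the same date in both positions.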

With a database that supports LEAD() and LAG(), the solution is simpler and more efficient. Here is a SQL Server 2012 implementation with a SQL Fiddle.

-- Create a virtual table with 2 rows that is used to convert a single row
-- into 2 rows when the range is only a single day
with events as(
  select 'start' event
  union all
  select 'stop' event
),
-- Use LAG() to get the previous date and LEAD() to get the next date.
-- The previous and/or next date may not exist, or it may be more than 
-- one day away.
dates as(
  select id,
         dt,
         lag(dt,1,'01/01/1900')  over(partition by id order by dt) prev_dt,
         lead(dt,1,'12/31/9999') over(partition by id order by dt) next_dt
    from t
),
-- Discard rows where both the previous and next dates are <= 1 day away.
-- Identify the remaining rows as either START or STOP.
-- Convert any date that both starts and stops a range into 2 rows.
-- For each remaining row, use LEAD() to get the subsequent remaining row.
-- At this point there are valid rows that have START in FROM and STOP in TO,
-- but also invalid rows that have STOP in FROM and NULL or START in TO. But
-- the invalid rows are required for LEAD() to give the correct value.
pruned_dates as(
  select id,
         event,
         dt from_date,
         lead(dt,1) over(partition by id order by dt, event) to_date
    from dates d
    join events e
      on (e.event='start' and datediff(d,prev_dt,dt)>1)
      or (e.event='stop'  and datediff(d,dt,next_dt)>1)
)
-- Filter out the unwanted rows, preserving the rows with START in FROM
-- and STOP in TO.
select id,
       from_date,
       to_date
  from pruned_dates
 where event='start'
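
For anyone without SQL Server at hand, the LAG()/LEAD() idea can be tried out with Python's built-in `sqlite3` module. This is only a sketch under some assumptions: it targets SQLite 3.25 or later (needed for window functions), stores dates as ISO `YYYY-MM-DD` text, and uses `julianday()` in place of `datediff`. It also pairs the n-th start with the n-th stop by row number rather than using the `events` table from the answer, so it is a variation, not a port. The names `QUERY` and `find_ranges` are hypothetical.

```python
import sqlite3

QUERY = """
with dates as (
  select id,
         dt,
         lag(dt)  over (partition by id order by dt) as prev_dt,
         lead(dt) over (partition by id order by dt) as next_dt
    from t
),
-- Rows whose previous date is missing or more than one day away start a range.
starts as (
  select id, dt, row_number() over (partition by id order by dt) as rn
    from dates
   where prev_dt is null or julianday(dt) - julianday(prev_dt) > 1
),
-- Rows whose next date is missing or more than one day away end a range.
stops as (
  select id, dt, row_number() over (partition by id order by dt) as rn
    from dates
   where next_dt is null or julianday(next_dt) - julianday(dt) > 1
)
-- The n-th start of an id belongs with the n-th stop of the same id.
select s.id, s.dt as from_date, e.dt as to_date
  from starts s
  join stops e on e.id = s.id and e.rn = s.rn
 order by s.id, s.dt
"""


def find_ranges(rows):
    """rows: iterable of (id, 'YYYY-MM-DD'). Returns (id, from_date, to_date) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id integer, dt text)")
    con.executemany("insert into t values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Pairing by row number works because every range contributes exactly one start and one stop, in the same date order.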

score 0 · Accepted Answer

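It is possible. You need to use a nested query.

If you explain the logic for choosing the dates, I can try to give you a better example.

For example: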

SELECT A.ID, MIN(A.DATE) AS FROM_DATE, MAX(A.DATE) AS TO_DATE FROM (SELECT ID, DATE FROM sourceTable) AS A GROUP BY A.ID
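
Grouping by ID alone yields a single FROM/TO pair per ID, so the gaps in the question's data would be lost. One common way to make a MIN/MAX grouping work is to add a second grouping key that stays constant within a run of consecutive days: the date minus the row's position in the sorted list. The sketch below illustrates that technique in Python; it is a different approach from the query above, and the function name `ranges_by_island_key` is hypothetical.

```python
from datetime import date
from itertools import groupby


def ranges_by_island_key(rows):
    """rows: iterable of (id, date). Returns a list of (id, from_date, to_date)."""
    # The key below only works on distinct dates, so drop duplicate rows first.
    ordered = sorted(set(rows))
    result = []
    for rid, group in groupby(ordered, key=lambda r: r[0]):
        # Number the dates of this id: 0, 1, 2, ...
        numbered = enumerate(r[1] for r in group)
        # Within a run of consecutive days, (day ordinal - row number) is constant.
        for _, run in groupby(numbered, key=lambda p: p[1].toordinal() - p[0]):
            days = [d for _, d in run]
            # MIN and MAX of the run are its first and last elements.
            result.append((rid, days[0], days[-1]))
    return result
```

In SQL terms, the inner key plays the role of an extra GROUP BY column next to the ID.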

score 0 · Accepted Answer

Well, this works, but it's a bit messy.

SELECT id, date1 AS 'StartDate', 
    MAX(CASE WHEN date < ISNULL(date2,'1/1/2050') THEN date END) AS 'EndDate'
FROM table1
JOIN (
    SELECT *
    FROM (
        -- Number the range starts per id, so rn1 = rn2 - 1 pairs each start with the next start of the same id
        SELECT ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1.date) AS rn1, t1.id AS 'id1', t1.date AS 'date1'
        FROM table1 t1
        LEFT OUTER JOIN table1 t2
            ON t1.id = t2.id AND DATEDIFF(dd,t1.date,t2.date) = -1
        WHERE t2.date IS NULL
        ) AS sub1

    LEFT OUTER JOIN (
        SELECT ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1.date) AS rn2, t1.id AS 'id2', t1.date AS 'date2'
        FROM table1 t1
        LEFT OUTER JOIN table1 t2
            ON t1.id = t2.id AND DATEDIFF(dd,t1.date,t2.date) = -1
        WHERE t2.date IS NULL
        ) AS sub2 ON id1 = id2 AND rn1 = rn2 - 1

    ) AS sub ON id=id1
GROUP BY id, date1

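Basically, I join the table to itself and keep only the dates that have no corresponding preceding consecutive date. That gives me the start date of each range. Then I join that query to itself, but on row number - 1, to offset the second date by one, so each start date ends up on the same row as the next start date. Finally, for each start date I find the maximum date that is less than the next start date.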
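
The three steps described here (find the range starts, line each start up with the next start, take the largest date below that next start) can be sketched in Python as follows. This is only an illustration of the reasoning, assuming `(id, date)` tuples as input; the function name `ranges_from_starts` is hypothetical.

```python
from bisect import bisect_left
from datetime import date, timedelta


def ranges_from_starts(rows):
    """rows: iterable of (id, date). Returns a list of (id, from_date, to_date)."""
    one_day = timedelta(days=1)
    by_id = {}
    for rid, dt in rows:
        by_id.setdefault(rid, set()).add(dt)
    result = []
    for rid in sorted(by_id):
        present = by_id[rid]
        days = sorted(present)
        # Step 1: a start is a date with no date one day earlier for the same id.
        starts = [d for d in days if d - one_day not in present]
        for i, start in enumerate(starts):
            if i + 1 < len(starts):
                # Steps 2 and 3: the end is the largest date below the next start.
                end = days[bisect_left(days, starts[i + 1]) - 1]
            else:
                # The last start has no next start, so the end is the overall maximum.
                end = days[-1]
            result.append((rid, start, end))
    return result
```

The `else` branch corresponds to the ISNULL fallback in the query, where a missing next start is replaced by a far-future date.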

Here is the code to create the test table. You will need to put some data into it:

CREATE TABLE [dbo].[table1](
    [pk] [int] IDENTITY(1,1) NOT NULL,
    [id] [varchar](10) NULL,
    [date] [datetime] NULL
) ON [PRIMARY]

score 0 · Accepted Answer

I assume SQL Server.

with 
    pairs(id,start,finish) as (
        select
            id1.Id as ID,id1.[Date] as start,id2.date as finish 
        from
            IdDate as id1 
            inner join IdDate id2 
            on id1.id=id2.id and DATEADD(DAY,1,id1.Date)=id2.date),
    starters(id,start) as (
        select
            pair1.id,pair1.start
        from
            pairs as pair1
        where
            pair1.start not in (select finish from pairs)),
    finishers(id,finish) as (
        select
            pair1.id,pair1.finish
        from 
            pairs as pair1
        where
            pair1.finish not in (select start from pairs))
select 
    s.id,s.start,finishers.finish 
from 
    starters as s, finishers 
where 
    finishers.finish > s.start and 
    (finishers.finish < (select MIN(start) from starters where start>s.start) or 
     (s.start=(select max(start) from starters) and 
      finishers.finish > (select MAX(start) from starters where start=s.start)))

Input

100 2012-08-11 00:00:00.000
100 2012-08-12 00:00:00.000
100 2012-08-13 00:00:00.000
100 2012-08-17 00:00:00.000
100 2012-08-18 00:00:00.000
100 2012-09-01 00:00:00.000
100 2012-09-02 00:00:00.000
100 2012-09-03 00:00:00.000
100 2012-09-04 00:00:00.000
100 2012-09-05 00:00:00.000

Output

id  start   finish
100 2012-08-11 00:00:00.000 2012-08-13 00:00:00.000
100 2012-08-17 00:00:00.000 2012-08-18 00:00:00.000
100 2012-09-01 00:00:00.000 2012-09-05 00:00:00.000

score 0 · Accepted Answer

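I don't think this can be done directly in a query. You would need to write some code in a high-level language, or write a procedure for it.

In this case you simply need to fetch the rows for a particular ID, sort them by date (?), and take the first and last rows of the result. You can then populate FROM_DATE and TO_DATE using this logic.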
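
As a rough sketch of what such high-level code might look like: taking only the first and last row per ID would merge everything into one range, so the version below also closes a range whenever it meets a gap of more than one day, as the question requires. It assumes `(id, date)` tuples as input, and the function name `ranges_single_pass` is hypothetical.

```python
from datetime import date, timedelta


def ranges_single_pass(rows):
    """rows: iterable of (id, date). Returns a list of (id, from_date, to_date)."""
    one_day = timedelta(days=1)
    result = []
    current = None  # [id, from_date, to_date] of the range being built
    for rid, dt in sorted(rows):
        if current and current[0] == rid and dt - current[2] <= one_day:
            # Same id and no gap: extend the open range.
            current[2] = dt
        else:
            # New id or a gap: close the open range and start a new one.
            if current:
                result.append(tuple(current))
            current = [rid, dt, dt]
    if current:
        result.append(tuple(current))
    return result
```

Each returned tuple maps directly to one row of the target table: ID, FROM_DATE, TO_DATE.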
