
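I am writing a script to pull data from a database with millions of rows, and I have run into a problem with gaps between periods. We have decided that a gap of fewer than 10 days should not count as a gap at all, so such gaps should be removed. In the example below, the first two rows for ID 1 are only 3 days apart and should become one period (2008-10-10 to 2009-05-13), and the last two rows for ID 2 are only 2 days apart and should become one period (2009-05-30 to 2010-12-31). The remaining rows are already "true" periods of interest.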

  • ID InDate OutDate
  • 1 2008-10-10 2009-02-05
  • 1 2009-02-08 2009-05-13
  • 1 2011-01-01 2011-05-20
  • 2 2007-03-17 2008-10-19
  • 2 2009-05-30 2010-10-12
  • 2 2010-10-14 2010-12-31
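
To reproduce the problem, the sample rows could be loaded into the `#t4` temp table that the queries below read from. This is only a sketch: the `date` column type and the `NOT NULL` constraints are assumptions, since the real table definition is not shown.

```sql
-- Sample data from the list above.
-- Column types are an assumption; the real #t4 may be defined differently.
CREATE TABLE #t4 (
    Id      int  NOT NULL,
    InDate  date NOT NULL,
    OutDate date NOT NULL
);

INSERT INTO #t4 (Id, InDate, OutDate) VALUES
    (1, '2008-10-10', '2009-02-05'),
    (1, '2009-02-08', '2009-05-13'),  -- 3 days after the previous OutDate
    (1, '2011-01-01', '2011-05-20'),
    (2, '2007-03-17', '2008-10-19'),
    (2, '2009-05-30', '2010-10-12'),
    (2, '2010-10-14', '2010-12-31');  -- 2 days after the previous OutDate
```

With this data the desired output has four rows, two per ID.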

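This raises several problems. The first is determining which OutDates and InDates are close enough to each other to be turned into a single period. The next is moving the OutDate from the higher row number to the lower row number (in the table, that is). The last is identifying and removing the rows that are now duplicates.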

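My attempt is below. Table #t4a solves the first two problems. The strategy in table #t4aa is to eliminate the duplicates by flagging the problematic duplicate rows in a new (dummy) variable and then removing all rows with that value (the 1s) at a later stage. However, it does not work! Every row is flagged 0, even the ones that should be flagged 1. Any suggestions?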

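--This temp table measures the gaps and creates a new variable, OutDate2, which holds the next row's OutDate instead of the original value when the gap is small (less than 11 days).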

WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4)  
SELECT cur.Rownum, cur.Id, cur.InDate CurInDate, cur.OutDate, nxt.InDate NxtInDate, DATEDIFF(day, cur.OutDate, nxt.InDate) Number_of_days,   
  CASE WHEN DATEDIFF(day, cur.OutDate, nxt.InDate)<11 AND DATEDIFF(day, cur.OutDate, nxt.InDate)>0 THEN nxt.OutDate ELSE cur.OutDate END AS OutDate2  
INTO #t4a  
FROM C cur  
LEFT OUTER JOIN C nxt ON (nxt.rownum=cur.rownum+1 AND nxt.Id=cur.Id)

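--This temp table creates a dummy variable that identifies overlapping rows so that they can be eliminated in a later temp table. This is the table that does not work.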

WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) rownum FROM #t4a)  
SELECT cur.Id, cur.InDate, nxt.OutDate2,   
  CASE WHEN cur.OutDate2 < nxt.InDate THEN 1.0 ELSE 0.0
  END AS Overlap  
INTO #t4aa  
FROM C cur  
LEFT OUTER JOIN C nxt on (cur.rownum=nxt.rownum+1 AND cur.Id=nxt.Id)

2 Answers


This is somewhat conceptual, but it may give you some ideas.

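WITH C AS 
(SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4) 

    -- Conceptual only: a row returned as a period start can also be merged
    -- with its successor below, so the result still needs a clean-up pass.

    -- rows that start a new period: the gap to the previous row is 11 days or more
    select Cgood.Id, Cgood.InDate, Cgood.OutDate 
    from C 
    join C as Cgood 
      on Cgood.Id = C.Id 
     and Cgood.Rownum = C.Rownum + 1
     and DATEDIFF(day, C.OutDate, Cgood.InDate) >= 11
    union 
    -- the first row, when the gap to the second row is 11 days or more
    select Cgood.Id, Cgood.InDate, Cgood.OutDate
    from C  
    join C as Cgood 
      on Cgood.Id = C.Id 
     and Cgood.Rownum = 1 
     and C.Rownum = 2 
     and DATEDIFF(day, Cgood.OutDate, C.InDate) >= 11
    union
    -- adjacent rows with a small gap, merged into one period
    select cMerge.Id, C.InDate, cMerge.OutDate
    from C
    join C as cMerge 
      on cMerge.Id = C.Id 
     and cMerge.Rownum = C.Rownum + 1
     and DATEDIFF(day, C.OutDate, cMerge.InDate) < 11
    union
    -- the first pair of rows, merged the same way
    -- (UNION removes it as a duplicate of the branch above)
    select cMerge.Id, C.InDate, cMerge.OutDate
    from C
    join C as cMerge 
      on cMerge.Id = C.Id 
     and C.Rownum = 1 
     and cMerge.Rownum = 2
     and DATEDIFF(day, C.OutDate, cMerge.InDate) < 11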
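
The same merge can also be sketched with window functions instead of self-joins, which may cope better when three or more rows are chained together by short gaps. This is a rough sketch under assumptions, not something tested against the real data: it assumes SQL Server 2012 or later (for `LAG` and an ordered window `SUM`) and the `#t4` table from the question. The names `Flagged`, `Grouped`, `IsStart` and `PeriodNo` are made up for illustration. Unlike the question's code, it also merges rows whose gap is zero or negative.

```sql
-- Assumes SQL Server 2012+ and the #t4 table from the question.
WITH Flagged AS (
    SELECT Id, InDate, OutDate,
           -- 0 when the previous row of the same Id ended fewer than 11 days
           -- before this row starts; 1 otherwise (including the first row,
           -- where LAG returns NULL and the comparison is not true).
           CASE WHEN DATEDIFF(day,
                              LAG(OutDate) OVER (PARTITION BY Id ORDER BY InDate),
                              InDate) < 11
                THEN 0 ELSE 1 END AS IsStart
    FROM #t4
),
Grouped AS (
    SELECT Id, InDate, OutDate,
           -- Running count of period starts = period number within the Id.
           SUM(IsStart) OVER (PARTITION BY Id ORDER BY InDate
                              ROWS UNBOUNDED PRECEDING) AS PeriodNo
    FROM Flagged
)
SELECT Id, MIN(InDate) AS InDate, MAX(OutDate) AS OutDate
FROM Grouped
GROUP BY Id, PeriodNo
ORDER BY Id, MIN(InDate);
```

Each row is compared only with the row immediately before it, so a long period that fully contains several later short ones would need a different comparison (for example against the running maximum of OutDate).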
answered 2012-10-26T21:12:49.977

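I solved my own problem yesterday. I got rid of the last temp table and created the dummy variable in the first temp table instead. The core of the solution is joining both backward and forward.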

WITH C AS (SELECT Id, InDate, OutDate, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY InDate) Rownum FROM #t4 t4)  
SELECT cur.Rownum, cur.Id, cur.InDate CurInDate, cur.OutDate, nxt.InDate NxtInDate, DATEDIFF(day, cur.OutDate, nxt.InDate) Number_of_days,  
CASE  
WHEN DATEDIFF(day, prv.OutDate, cur.InDate)<11  
AND DATEDIFF(day, prv.OutDate, cur.InDate)>0  
THEN 1.0  
ELSE 0.0  
END AS Overlap,      
CASE  
WHEN DATEDIFF(day, cur.OutDate, nxt.InDate)<11  
AND DATEDIFF(day, cur.OutDate, nxt.InDate)>0  
THEN nxt.OutDate  
ELSE cur.OutDate  
END AS OutDate2  
INTO #t4a  
FROM C cur  
LEFT OUTER JOIN C prv ON (prv.rownum=cur.rownum-1 AND prv.Id=cur.Id)  
LEFT OUTER JOIN C nxt ON (nxt.rownum=cur.rownum+1 AND nxt.Id=cur.Id)
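
The remaining step, dropping the flagged rows, could look roughly like this. It is a minimal sketch that assumes the `#t4a` columns created by the query above; the output column names `InDate` and `OutDate` are only illustrative.

```sql
-- Keep only the rows that start a period; for those rows OutDate2
-- already carries the end date taken from the merged following row.
SELECT Id,
       CurInDate AS InDate,
       OutDate2  AS OutDate
FROM #t4a
WHERE Overlap = 0
ORDER BY Id, CurInDate;
```

On the six sample rows from the question this leaves two periods per ID. Note that OutDate2 only looks one row ahead, so three or more rows chained together by short gaps would lose the last OutDate and need another pass.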
answered 2012-10-28T07:04:55.640