
I have a table with 3 columns: id (int), date (date), and Status (bool).

It looks like this:

id  date        Status
1   2012-10-18  1
1   2012-10-19  1
1   2012-10-20  0
1   2012-10-21  0
1   2012-10-22  0
1   2012-10-23  0
1   2012-10-24  1
1   2012-10-25  0
1   2012-10-26  0
1   2012-10-27  0
1   2012-10-28  1
2   2012-10-19  0
2   2012-10-20  0
2   2012-10-21  0
2   2012-10-22  1
2   2012-10-23  1

Assume that the dates are sequential, with no gaps between them.

How can I find every run of 3 consecutive zeros (in the Status column), together with the status of the day that follows it?

Like this:

id  startDate     endDate       NextDayStatus
1   2012-10-20    2012-10-22         0
1   2012-10-21    2012-10-23         1
1   2012-10-25    2012-10-27         1
2   2012-10-19    2012-10-21         1

Table creation script and sample data:

CREATE TABLE [Table1](
    [ID] [smallint] NOT NULL,
    [Date] [date] NOT NULL,
    [Status] [bit] NULL,
 CONSTRAINT [PK_table1] PRIMARY KEY CLUSTERED  (  [ID] ASC,   [Date] ASC ) )

INSERT INTO [Table1]([ID], [Date], [Status])     
SELECT 1, '2012-10-18', 1    UNION ALL
SELECT 1, '2012-10-19', 1    UNION ALL
SELECT 1, '2012-10-20', 0    UNION ALL
SELECT 1, '2012-10-21', 0    UNION ALL
SELECT 1, '2012-10-22', 0    UNION ALL
SELECT 1, '2012-10-23', 0    UNION ALL
SELECT 1, '2012-10-24', 1    UNION ALL 
SELECT 1, '2012-10-25', 0    UNION ALL
SELECT 1, '2012-10-26', 0    UNION ALL
SELECT 1, '2012-10-27', 0    UNION ALL
SELECT 1, '2012-10-28', 1    UNION ALL
SELECT 2, '2012-10-19', 0    UNION ALL
SELECT 2, '2012-10-20', 0    UNION ALL
SELECT 2, '2012-10-21', 0    UNION ALL
SELECT 2, '2012-10-22', 1    UNION ALL
SELECT 2, '2012-10-23', 1

Update:

  • If it matters, after this step I only need to filter out the days that are the 1st, 10th, or 20th of the month.
  • With many thanks to Tomalak and gnb: in my real task the number of consecutive zeros is 9 rather than the 3 in this sample, so using 9 inner joins or CROSS APPLY seems inefficient.

3 Answers


Edit: updated to partition by ID.

This also works if the dates are not contiguous.

SELECT        T1.id, T1.[Date] AS startDate, MAX(X.[Date]) AS endDate, Y.[Status] AS NextDayStatus
FROM     Table1 T1       
   CROSS APPLY
   (  SELECT TOP 3 *
   FROM            Table1 T2
   WHERE           T2.id = T1.id AND T2.[Date] >= T1.Date
   ORDER BY        T2.[Date]
   ) X
   CROSS APPLY
   ( SELECT TOP 4 *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY T3.[Date]) AS rn
   FROM            Table1 T3
   WHERE           T3.id = T1.id AND T3.[Date] >= T1.Date
   ORDER BY        T3.[Date]
   ) Y
WHERE        y.rn = 4
GROUP BY     T1.id, T1.[Date], Y.[Status]
HAVING       SUM(CAST(X.[Status] AS tinyint)) = 0;

For completeness, here is the more elegant SQL Server 2012 solution.
The same approach should carry over to any RDBMS with proper window/analytic function support.

SELECT
    X.id, X.startDate, X.endDate, x.nextStatus
FROM
    ( SELECT        T1.id, T1.[Date] AS startDate,
        LEAD(T1.[Date], 2) OVER (PARTITION BY T1.id ORDER BY T1.[Date]) AS endDate,
        LEAD(T1.[Status], 3) OVER (PARTITION BY T1.id ORDER BY T1.[Date]) AS nextStatus,
        SUM(CAST(T1.[Status] AS tinyint)) OVER (PARTITION BY T1.id ORDER BY T1.[Date] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS SumNext3
    FROM            Table1 T1
    ) X
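WHERE        X.SumNext3 = 0
  AND        X.endDate IS NOT NULL;  -- ignore windows cut short at the end of an id (fewer than 3 rows left)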
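
The question's update mentions that the real run length is 9 rather than 3. As a rough sketch of how the windowed query might generalize, the version below takes the run length n as a parameter. It is a port to SQLite driven from Python's sqlite3 module, not the answerer's T-SQL. The helper names (sample_rows, load_sample, zero_runs) are made up for illustration, dates are stored as ISO text, and a SQLite build with window functions (3.25 or later) is assumed. Dropping windows that are cut short at the end of an id is also an assumption about the desired behavior.

```python
import sqlite3
from datetime import date, timedelta

# Sample data from the question, written compactly as
# id -> (first date, one status digit per consecutive day).
SAMPLE = {1: ("2012-10-18", "11000010001"), 2: ("2012-10-19", "00011")}


def sample_rows():
    """Expand SAMPLE into (ID, Date, Status) tuples."""
    rows = []
    for id_, (start, flags) in SAMPLE.items():
        first = date.fromisoformat(start)
        for offset, flag in enumerate(flags):
            day = first + timedelta(days=offset)
            rows.append((id_, day.isoformat(), int(flag)))
    return rows


def load_sample():
    """Create an in-memory SQLite copy of Table1 (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 (ID INTEGER NOT NULL, Date TEXT NOT NULL, "
        "Status INTEGER, PRIMARY KEY (ID, Date))"
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", sample_rows())
    return conn


def zero_runs(conn, n):
    """Return (ID, startDate, endDate, nextStatus) for every window of
    n consecutive rows whose Status values are all 0."""
    n = int(n)
    if n < 1:
        raise ValueError("n must be at least 1")
    # n is a validated int, so formatting it into the SQL text is safe here.
    sql = f"""
        SELECT ID, startDate, endDate, nextStatus
        FROM (
            SELECT ID,
                   Date AS startDate,
                   LEAD(Date, {n - 1}) OVER (PARTITION BY ID ORDER BY Date) AS endDate,
                   LEAD(Status, {n}) OVER (PARTITION BY ID ORDER BY Date) AS nextStatus,
                   SUM(Status) OVER (PARTITION BY ID ORDER BY Date
                       ROWS BETWEEN CURRENT ROW AND {n - 1} FOLLOWING) AS sumNext
            FROM Table1
        )
        WHERE sumNext = 0
          AND endDate IS NOT NULL  -- drop windows with fewer than n rows
        ORDER BY ID, startDate
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in zero_runs(load_sample(), 3):
        print(row)
```

With n = 3 on the sample data this should reproduce the four rows shown in the question. In T-SQL the same idea amounts to replacing the offsets 2 and 3 with n - 1 and n, so the query does not grow with the run length.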
answered 2013-06-20T07:48:09.980

SELECT
  z1.id, z1.[date] AS startDate ,z3.[date] AS endDate, zn.status AS NextDayStatus
FROM 
  Table1 z1
  INNER JOIN Table1 z2 ON z2.[date] = (
    SELECT MIN([date]) FROM Table1 WHERE [date] > z1.[date] AND id = z1.id
  )
  INNER JOIN Table1 z3 ON z3.Date = (
    SELECT MIN([date]) FROM Table1 WHERE [date] > z2.[date] AND id = z1.id
  )
  INNER JOIN Table1 zn ON zn.Date = (
    SELECT MIN([date]) FROM Table1 WHERE [date] > z3.[date] AND id = z1.id
  )
WHERE 
  z1.status = 0
  AND z2.status = 0 AND z2.id = z1.id
  AND z3.status = 0 AND z3.id = z1.id
  AND zn.id = z1.id
ORDER BY
  z1.id, z1.[date]

An index on Table1 over (date, status, id) would be optimal.

answered 2013-06-20T07:51:44.080

Here is another solution that also works in many SQL products (those that support window functions), and in particular in SQL Server 2005 and later:

WITH partitioned AS (
  SELECT
    *,
    grp = DATEDIFF(DAY, 0, Date)
        - ROW_NUMBER() OVER (PARTITION BY ID, Status ORDER BY Date)
  FROM Table1
),
grouped AS (
  SELECT
    ID,
    SD = MIN(Date),
    ED = MAX(Date)
  FROM partitioned
  WHERE Status = 0
  GROUP BY
    ID,
    grp
  HAVING COUNT(*) >= 3
)
SELECT
  t.ID,
  StartDate     = t.Date,
  EndDate       = DATEADD(DAY, 2, t.Date),
  NextDayStatus = CASE t.Date WHEN DATEADD(DAY, -2, g.ED) THEN 1 ELSE 0 END
FROM Table1 t
INNER JOIN grouped g
ON t.ID = g.ID AND t.Date BETWEEN g.SD AND DATEADD(DAY, -2, g.ED)
;

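The idea is to detect all "islands" of Status = 0, keep those that have at least 3 rows, and then join the aggregated island set back to the original table to get the rows that qualify as the start of a required subset of 3 consecutive Status = 0 rows.

One caveat, though: this solution assumes that any 3 consecutive status-0 rows are followed by at least one other row with the same ID. In other words, the last matching set of status-0 rows should be followed by a status-1 row, because that is what the result set will indicate anyway.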
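
To make the island detection step concrete, here is a small sketch of just the grouping part. It is a SQLite port run through Python's sqlite3 module rather than the T-SQL above: julianday() stands in for DATEDIFF(DAY, 0, Date), the helper names (sample_rows, load_sample, zero_islands) are hypothetical, and it assumes SQLite 3.25 or later with dates stored as ISO text.

```python
import sqlite3
from datetime import date, timedelta

# Sample data from the question, written compactly as
# id -> (first date, one status digit per consecutive day).
SAMPLE = {1: ("2012-10-18", "11000010001"), 2: ("2012-10-19", "00011")}


def sample_rows():
    """Expand SAMPLE into (ID, Date, Status) tuples."""
    rows = []
    for id_, (start, flags) in SAMPLE.items():
        first = date.fromisoformat(start)
        for offset, flag in enumerate(flags):
            day = first + timedelta(days=offset)
            rows.append((id_, day.isoformat(), int(flag)))
    return rows


def load_sample():
    """Create an in-memory SQLite copy of Table1 (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 (ID INTEGER NOT NULL, Date TEXT NOT NULL, "
        "Status INTEGER, PRIMARY KEY (ID, Date))"
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", sample_rows())
    return conn


def zero_islands(conn, min_len=3):
    """Return (ID, SD, ED, days) for each island of Status = 0 rows
    that spans at least min_len consecutive days."""
    # Within one (ID, Status) partition, each consecutive day raises both
    # the day number and the row number by 1, so their difference (grp)
    # stays constant. A break in the run raises the day number by more,
    # which starts a new grp value.
    sql = """
        WITH partitioned AS (
            SELECT ID, Date, Status,
                   CAST(julianday(Date) AS INTEGER)
                     - ROW_NUMBER() OVER (PARTITION BY ID, Status
                                          ORDER BY Date) AS grp
            FROM Table1
        )
        SELECT ID, MIN(Date) AS SD, MAX(Date) AS ED, COUNT(*) AS days
        FROM partitioned
        WHERE Status = 0
        GROUP BY ID, grp
        HAVING COUNT(*) >= ?
        ORDER BY ID, SD
    """
    return conn.execute(sql, (min_len,)).fetchall()


if __name__ == "__main__":
    for row in zero_islands(load_sample()):
        print(row)
```

Each island of length k contains k - 2 qualifying start dates, which is why the final query joins back on the range from SD to two days before ED. Only the last of those starts is followed by a day outside the island.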

answered 2013-06-25T06:48:30.193