I am working with a data set of emergency room visits. For each ID, I only want to keep visits that are 30 days apart, measured from the last visit I kept. As an example, say I have the data below.

If I start with ID=1:

  • Row 2 is only 14 days after Row 1, so I exclude it (or, for now, flag it).
  • I keep using Row 1 as the anchor to evaluate Row 3. It is only 16 days after Row 1, so I exclude Row 3 as well and move on to Row 4.
  • Row 4 is more than 30 days after Row 1, so I keep it and then use Row 4 as the new anchor to evaluate Row 5, and so on.
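The steps above describe a greedy scan. The first visit for each ID becomes the anchor, and a later visit is kept only if it falls far enough after the current anchor, at which point it becomes the new anchor. Here is a minimal Python sketch of that rule, assuming "30 days apart" means a gap of at least 30 days (the sample gives the same result with a strict > 30). The function name `keep_spaced_visits` and the `min_gap_days` parameter are hypothetical, chosen for illustration only:

```python
from datetime import date

# Sample rows from the HAVE table: (row_number, id, visit_date).
HAVE = [
    (1, 1, date(2020, 1, 1)),
    (2, 1, date(2020, 1, 15)),
    (3, 1, date(2020, 1, 17)),
    (4, 1, date(2020, 2, 4)),
    (5, 1, date(2020, 3, 15)),
    (6, 2, date(2020, 1, 15)),
    (7, 2, date(2020, 3, 15)),
    (8, 2, date(2020, 3, 18)),
]


def keep_spaced_visits(rows, min_gap_days=30):
    """Hypothetical helper: greedy anchor scan over (row, id, date) tuples.

    Assumption: "30 days apart" is read as a gap of at least min_gap_days
    from the most recently kept visit for the same ID.
    """
    kept = []
    anchor = {}  # id -> date of the last kept visit (the current anchor)
    for row_no, visit_id, visit_date in sorted(rows, key=lambda r: (r[1], r[2])):
        last = anchor.get(visit_id)
        # Compare with the anchor, not with the row immediately before.
        if last is None or (visit_date - last).days >= min_gap_days:
            kept.append(row_no)
            anchor[visit_id] = visit_date  # this visit becomes the new anchor
    return kept


print(keep_spaced_visits(HAVE))  # [1, 4, 5, 6, 7]
```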

I have been trying to do this with the LAG function, but I can't figure out how to use it when the same "anchor" row has to be used to evaluate several subsequent rows.
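To see the difficulty concretely, here is a minimal sketch that applies a plain previous-row LAG filter to the HAVE sample data. It uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for SQL Server (assumption: SQLite 3.25 or later, which added window functions), with `julianday()` subtraction in place of `DATEDIFF(day, ...)`. The column names `RowNo` and `VisitDate` are hypothetical stand-ins for `Row#` and `DATE`:

```python
import sqlite3

# In-memory SQLite stands in for SQL Server so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmergencyRoom (RowNo INTEGER, ID INTEGER, VisitDate TEXT)")
conn.executemany(
    "INSERT INTO EmergencyRoom VALUES (?, ?, ?)",
    [(1, 1, "2020-01-01"), (2, 1, "2020-01-15"), (3, 1, "2020-01-17"),
     (4, 1, "2020-02-04"), (5, 1, "2020-03-15"), (6, 2, "2020-01-15"),
     (7, 2, "2020-03-15"), (8, 2, "2020-03-18")],
)

# Naive filter: compare each visit only with the visit immediately before it.
NAIVE_LAG_QUERY = """
SELECT RowNo
FROM (
    SELECT RowNo,
           julianday(VisitDate)
             - julianday(LAG(VisitDate) OVER (PARTITION BY ID ORDER BY VisitDate)) AS GapDays
    FROM EmergencyRoom
) AS g
WHERE GapDays IS NULL OR GapDays >= 30
ORDER BY RowNo
"""

naive = [r[0] for r in conn.execute(NAIVE_LAG_QUERY)]
print(naive)  # [1, 5, 6, 7]
```

Row 4 is lost because 2/4/2020 is compared with the previous visit on 1/17/2020 (18 days earlier) instead of with the anchor on 1/1/2020.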

HAVE is the data I have and WANT is the result I want. Any ideas?

I am using Azure Data Studio.

HAVE

Row#  ID  DATE
 1    1   1/1/2020
 2    1   1/15/2020
 3    1   1/17/2020
 4    1   2/4/2020
 5    1   3/15/2020
 6    2   1/15/2020
 7    2   3/15/2020
 8    2   3/18/2020

WANT

Row#  ID  DATE
 1    1   1/1/2020
 4    1   2/4/2020
 5    1   3/15/2020
 6    2   1/15/2020
 7    2   3/15/2020

2 Answers


This tutorial page should get you started with a cursor-based solution.
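A cursor-based solution reads the visits one row at a time in ID and date order (like a T-SQL `FETCH NEXT` loop), carries the current anchor in variables, and flags each row it keeps. The sketch below mimics that pattern with a Python `sqlite3` cursor instead of a T-SQL cursor, assuming SQLite as the database, "at least 30 days" as the threshold, and hypothetical names (`RowNo`, `VisitDate`, `KeepFlag`, `flag_with_cursor`). It is not the tutorial's code:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmergencyRoom ("
    "RowNo INTEGER PRIMARY KEY, ID INTEGER, VisitDate TEXT, KeepFlag INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO EmergencyRoom (RowNo, ID, VisitDate) VALUES (?, ?, ?)",
    [(1, 1, "2020-01-01"), (2, 1, "2020-01-15"), (3, 1, "2020-01-17"),
     (4, 1, "2020-02-04"), (5, 1, "2020-03-15"), (6, 2, "2020-01-15"),
     (7, 2, "2020-03-15"), (8, 2, "2020-03-18")],
)


def flag_with_cursor(conn, min_gap_days=30):
    """Hypothetical helper: walk visits row by row and set KeepFlag = 1
    on each visit at least min_gap_days after the current anchor."""
    cur = conn.execute("SELECT RowNo, ID, VisitDate FROM EmergencyRoom ORDER BY ID, VisitDate")
    anchor_id, anchor_date = None, None
    keep_rows = []
    for row_no, visit_id, visit_text in cur:
        visit_date = date.fromisoformat(visit_text)
        # A new ID always starts a new anchor; otherwise compare with the anchor.
        if visit_id != anchor_id or (visit_date - anchor_date).days >= min_gap_days:
            keep_rows.append((row_no,))
            anchor_id, anchor_date = visit_id, visit_date
    # Write the flags once the read cursor has been fully consumed.
    conn.executemany("UPDATE EmergencyRoom SET KeepFlag = 1 WHERE RowNo = ?", keep_rows)
    conn.commit()


flag_with_cursor(conn)
kept = [r[0] for r in conn.execute(
    "SELECT RowNo FROM EmergencyRoom WHERE KeepFlag = 1 ORDER BY RowNo")]
print(kept)  # [1, 4, 5, 6, 7]
```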

answered 2020-05-26T20:35:33.080

You don't need a loop. Stick with LAG; you were on the right track to begin with.

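;WITH dateLagged AS (
    -- Days since the previous visit for the same ID (0 for the first visit)
    SELECT
        ID
     ,  Date
     ,  Diff = ISNULL(DATEDIFF(day, LAG(Date, 1) OVER(PARTITION BY ID ORDER BY Date), Date), 0)
    FROM dbo.EmergencyRoom
),
DiffCumulated AS (
    -- Running total of those gaps, i.e. days since the ID's first visit
    SELECT
        ID
     ,  Date
     ,  CumDiff = SUM(Diff) OVER(PARTITION BY ID ORDER BY Date)
    FROM dateLagged
),
AnchorsMarked AS (
    -- Marker = 1 for the first visit, for the first visit more than 30 days
    -- after the ID's first visit, and for any visit more than 30 days after
    -- the visit before it. PARTITION BY ID keeps LAG inside the same ID.
    SELECT
        ID
     ,  Date
     ,  Marker = IIF(CumDiff = 0
                  OR (CumDiff > 30 AND LAG(CumDiff, 1) OVER(PARTITION BY ID ORDER BY Date) < 30)
                  OR CumDiff - LAG(CumDiff, 1) OVER(PARTITION BY ID ORDER BY Date) > 30, 1, 0)
    FROM DiffCumulated
)
SELECT
    ID
,   Date
FROM AnchorsMarked
WHERE Marker = 1
ORDER BY ID, Date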

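As a rule of thumb: if you find yourself reaching for a loop in SQL, you have taken a wrong turn somewhere. Very few problems in SQL need a loop, and this is not one of them.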
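To try the query above without SQL Server, here is a sketch of a SQLite port run through Python's `sqlite3` module. It assumes SQLite 3.25 or later for window functions. `julianday()` subtraction replaces `DATEDIFF`, `COALESCE` replaces `ISNULL`, and `CASE` replaces `IIF`. The columns `RowNo` and `VisitDate` are hypothetical stand-ins for `Row#` and `Date`. It only checks the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmergencyRoom (RowNo INTEGER, ID INTEGER, VisitDate TEXT)")
conn.executemany(
    "INSERT INTO EmergencyRoom VALUES (?, ?, ?)",
    [(1, 1, "2020-01-01"), (2, 1, "2020-01-15"), (3, 1, "2020-01-17"),
     (4, 1, "2020-02-04"), (5, 1, "2020-03-15"), (6, 2, "2020-01-15"),
     (7, 2, "2020-03-15"), (8, 2, "2020-03-18")],
)

# SQLite port of the three CTEs (dateLagged, DiffCumulated, AnchorsMarked).
QUERY = """
WITH dateLagged AS (
    SELECT RowNo, ID, VisitDate,
           COALESCE(CAST(julianday(VisitDate)
                         - julianday(LAG(VisitDate, 1) OVER (PARTITION BY ID ORDER BY VisitDate))
                         AS INTEGER), 0) AS Diff
    FROM EmergencyRoom
),
DiffCumulated AS (
    SELECT RowNo, ID, VisitDate,
           SUM(Diff) OVER (PARTITION BY ID ORDER BY VisitDate) AS CumDiff
    FROM dateLagged
),
AnchorsMarked AS (
    SELECT RowNo,
           CASE WHEN CumDiff = 0
                  OR (CumDiff > 30
                      AND LAG(CumDiff, 1) OVER (PARTITION BY ID ORDER BY VisitDate) < 30)
                  OR CumDiff - LAG(CumDiff, 1) OVER (PARTITION BY ID ORDER BY VisitDate) > 30
                THEN 1 ELSE 0 END AS Marker
    FROM DiffCumulated
)
SELECT RowNo FROM AnchorsMarked WHERE Marker = 1 ORDER BY RowNo
"""

answer_rows = [r[0] for r in conn.execute(QUERY)]
print(answer_rows)  # [1, 4, 5, 6, 7]
```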

answered 2020-05-26T21:57:42.053