6

我一直在做以下工作,但没有得到任何结果,而且截止日期很快就要到了。此外,还有超过一百万行,如下所示。感谢您对以下内容的帮助。

目标:按成员对结果进行分组,并通过组合各个日期范围来为每个成员构建连续覆盖范围,这些日期范围可以重叠或彼此连续运行,在范围的开始和结束日期之间没有中断。

我有以下格式的数据:

MemberCode  -----   ClaimID   -----       StartDate   -----       EndDate
00001   -----       012345   -----       2010-01-15   -----       2010-01-20
00001   -----       012350   -----       2010-01-19   -----       2010-01-22
00001   -----       012352   -----       2010-01-20   -----       2010-01-25
00001   -----       012355   -----       2010-01-26   -----       2010-01-30
00002   -----       012357   -----       2010-01-20   -----       2010-01-25
00002   -----       012359   -----       2010-01-30   -----       2010-02-05
00002   -----       012360   -----       2010-02-04   -----       2010-02-15
00003   -----       012365   -----       2010-02-15   -----       2010-02-30

...

在上面的成员(00001)是一个有效的成员,因为有一个从2010-01-152010-01-30的连续日期范围(没有间隙)。请注意,该成员的索赔 ID 012355紧邻索赔 ID 012352的结束日期开始。这仍然有效,因为它形成了一个连续的范围。

但是,成员 ( 00002 ) 应该是无效成员,因为在声明 ID 012357 的结束日期和声明 ID 012359的开始日期之间有 5 天的间隔

我想要做的是只列出那些在连续日期范围(每个成员)的每一天都有声明的成员的列表,每个成员的 MIN(开始日期)和 Max(结束日期)之间没有间隙与众不同的成员。有差距的成员被丢弃。

提前致谢。

更新:

我已经到了这里。笔记:FILLED_DT = Start Date & PresCoverEndDT = End Date

SELECT PresCoverEndDT, FILLED_DT 

FROM 

(

    SELECT DISTINCT FILLED_DT, ROW_NUMBER() OVER (ORDER BY FILLED_DT) RN

    FROM Temp_Claims_PRIOR_STEP_5 T1

    WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

            WHERE T1.FILLED_DT > T2.FILLED_DT AND T1.FILLED_DT< T2.PresCoverEndDT 

            AND T1.MBR_KEY = T2.MBR_KEY )

) T1

    JOIN (SELECT DISTINCT PresCoverEndDT, ROW_NUMBER() OVER (ORDER BY PresCoverEndDT) RN

        FROM Temp_Claims_PRIOR_STEP_5 T1

        WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

             WHERE T1.PresCoverEndDT > T2.FILLED_DT AND T1.PresCoverEndDT < T2.PresCoverEndDT AND T1.MBR_KEY = T2.MBR_KEY )
) T2

     ON T1.RN - 1 = T2.RN

WHERE   PresCoverEndDT < FILLED_DT 

上面的代码似乎有错误,因为我只得到一行,而且它也不正确。我想要的输出只有 1 列,如下所示:

Valid_Member_Code

00001

00007

00009

... ETC。,

4

3 回答 3

5

试试这个:http ://www.sqlfiddle.com/#!3/c3365/20

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

在此处查看查询进度:http ://www.sqlfiddle.com/#!3/c3365/20


它是如何工作的,将当前结束日期与其下一个开始日期进行比较并检查日期间隔:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1;

输出:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE | GAP |
--------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |  -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |  -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |   1 |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |   5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |  -1 |

然后检查成员是否具有相同的索赔计数,并且其总索赔没有差距:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode, count(*) as count, sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode;

输出:

| MEMBERCODE | COUNT | GAPLESS_COUNT |
--------------------------------------
|          1 |     3 |             3 |
|          2 |     2 |             1 |

最后,过滤他们,在他们的声明中没有空白的成员:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

输出:

| MEMBERCODE |
--------------
|          1 |

请注意,您无需COUNT(*) > 1检测具有 2 个或更多声明的成员。我们不使用LEFT JOIN,而是使用JOIN,这将自动丢弃尚未获得第二次声明的成员。如果您选择使用,这是版本(更长)LEFT JOIN(与上面相同的输出):

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
and count(*) > 1; -- members who have two ore more claims only

以下是在过滤之前如何查看上述查询的数据:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select * from gaps;

输出:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE |    GAP |
-----------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |     -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |     -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |      1 |
|          1 | 2010-01-26 | 2010-01-30 |        (null) | (null) |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |      5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |     -1 |
|          2 | 2010-02-04 | 2010-02-15 |        (null) | (null) |
|          3 | 2010-02-15 | 2010-03-02 |        (null) | (null) |

编辑要求澄清:

在您的澄清中,您还想包括尚未获得第二次索赔的成员,请改为:http ://sqlfiddle.com/#!3/c3365/22

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
-- members who have yet to have a second claim are valid too
or count(nextstartdate) = 0; 

输出:

| MEMBERCODE |
--------------
|          1 |
|          3 |

该技术是计算成员的nextstartdate,如果他们没有下一个开始日期日期(即count(nextstartdate) = 0),那么他们只是单一的声明并且也有效,那么只需附加这个OR条件:

or count(nextstartdate) = 0; 

实际上,下面的条件也足够了,不过我想让查询更加自我记录,因此我建议依靠成员的 nextstartdate。这是计算尚未获得第二次索赔的成员的另一种条件:

or count(*) = 1;

顺便说一句,我们还必须改变比较:

sum(case when gap <= 1 then 1 end) = count(*)

对此(正如我们现在使用的那样LEFT JOIN):

sum(case when gap <= 1 then 1 end) = count(gap)
于 2012-08-29T13:19:55.157 回答
1

试试这个,它对行进行分区MemberCode并给它们序号。然后将行与后续num值进行比较,如果一行的结束日期和下一行的开始日期之间的差异大于一天,则它是无效成员:

DECLARE @t TABLE (MemberCode  VARCHAR(100), ClaimID   
    INT,StartDate   DATETIME,EndDate DATETIME)
INSERT @t
VALUES
('00001'   ,       012345   ,        '2010-01-15'   ,       '2010-01-20')
,('00001'   ,       012350   ,       '2010-01-19'   ,       '2010-01-22')
,('00001'   ,       012352   ,       '2010-01-20'   ,       '2010-01-25')
,('00001'   ,       012355   ,       '2010-01-26'   ,       '2010-01-30')
,('00002'   ,       012357   ,       '2010-01-20'   ,       '2010-01-25')
,('00002'   ,       012359   ,       '2010-01-30'   ,       '2010-02-05')
,('00002'   ,       012360   ,       '2010-02-04'   ,       '2010-02-15')
,('00003'   ,       012365   ,       '2010-02-15'   ,       '2010-02-28')
,('00004'   ,       012366   ,       '2010-03-18'   ,       '2010-03-23')
,('00005'   ,       012367   ,       '2010-03-19'   ,       '2010-03-25')
,('00006'   ,       012368   ,       '2010-03-20'   ,       '2010-03-21')

;WITH tbl AS (

    SELECT  *,
            ROW_NUMBER() OVER (PARTITION BY MemberCode ORDER BY StartDate) 
                AS num
    FROM    @t
), invalid AS (

    SELECT  tbl.MemberCode
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1
    AND     tbl.MemberCode = _tbl.MemberCode
    WHERE   DATEDIFF(DAY, tbl.EndDate, _tbl.StartDate) > 1  
)

SELECT  MemberCode
FROM    tbl
EXCEPT
SELECT  MemberCode
FROM    invalid
于 2012-08-29T13:26:25.627 回答
0

我认为您的查询会返回误报,因为它只检查连续行之间的时间间隔。在我看来,差距有可能被前面的一条线所补偿。让我举个例子吧:

第 l 行:2010-01-01 | 2010-01-31
第 2 行:2010-01-10 | 2010-01-15
第 3 行:2010-01-20 | 2010-01-25

您的代码将报告第 2 行和第 3 行之间的间隙,而它正在由第 1 行填充。您的代码不会检测到这一点。您应该在 DATEDIFF 函数中使用所有先前行的MAX(EndDate) 。

DECLARE @t TABLE (PersonID  VARCHAR(100), StartDate DATETIME, EndDate DATETIME)
INSERT @t VALUES('00001'   ,       '2010-01-01'   ,       '2010-01-17')
INSERT @t VALUES('00001'   ,       '2010-01-19'   ,       '2010-01-22')
INSERT @t VALUES('00001'   ,       '2010-01-20'   ,       '2010-01-25')
INSERT @t VALUES('00001'   ,       '2010-01-26'   ,       '2010-01-31')
INSERT @t VALUES('00002'   ,       '2010-01-20'   ,       '2010-01-25')
INSERT @t VALUES('00002'   ,       '2010-02-04'   ,       '2010-02-05')
INSERT @t VALUES('00002'   ,       '2010-02-04'   ,       '2010-02-15')
INSERT @t VALUES('00003'   ,       '2010-02-15'   ,       '2010-02-28')
INSERT @t VALUES('00004'   ,       '2010-03-18'   ,       '2010-03-23')
INSERT @t VALUES('00005'   ,       '2010-03-19'   ,       '2010-03-25')
INSERT @t VALUES('00006'   ,       '2010-01-01'   ,       '2010-04-20')
INSERT @t VALUES('00006'   ,       '2010-01-20'   ,       '2010-01-21')
INSERT @t VALUES('00006'   ,       '2010-01-25'   ,       '2010-01-26')

;WITH tbl AS (
    SELECT  
        *, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY StartDate) AS num
    FROM    @t
), invalid AS (
    SELECT  tbl.PersonID 
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1 AND tbl.PersonID = _tbl.PersonID
    WHERE DATEDIFF(DAY, (SELECT MAX(tbl3.EndDate) FROM tbl tbl3 WHERE tbl3.num <= tbl.num AND tbl3.PersonID = tbl.PersonID), _tbl.StartDate) > 1  
)

SELECT  PersonID
FROM    tbl
EXCEPT
SELECT  PersonID
FROM    invalid
于 2013-09-17T15:02:55.323 回答