使用窗口函数而不是递归 CTE 的替代解决方案
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT
employmentid,
startdate,
enddate,
DATEADD(
DAY,
-COALESCE(
SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0
),
startdate
) as grp
FROM @t
) withGroup
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
这通过计算grp
对于所有连续行都相同的值来工作。这是通过以下方式实现的:
- 确定跨度占用的总天数(+1,因为包括日期)
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
- 每个工作的累计天数,按开始日期排序。这为我们提供了所有先前工作跨度的总天数
- 我们与 0 合并以确保在我们的累积天数总和中没有 NULL
- 我们在累积总和中不包括当前行,这是因为我们将使用该值
startdate
而不是enddate
(我们不能使用它,enddate
因为 NULL)
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
- 从 中减去累计天数
startdate
得到我们的grp
. 这是解决方案的关键。
- 如果开始日期的增长速度与所跨越的天数相同,那么这些天数是连续的,减去这两个天数将得到相同的值。
- 如果 startdate 的增长速度快于所跨越的天数,则存在差距,我们将获得一个
grp
大于前一个值的新值。
- 虽然
grp
是日期,但日期本身是没有意义的,我们只是将其用作分组值
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
随着结果
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| employmentid | startdate | enddate | daysSpanned | cumulativeDaysSpanned | grp |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363 | 0 | 2007-12-03 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL | NULL | 1363 | 2009-08-08 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568 | 0 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574 | 1568 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 30 | 2012-08-13 00:00:00.000 | NULL | NULL | 2142 | 2006-10-02 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL | NULL | 0 | 2007-09-24 00:00:00.000 |
+--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
- 最后我们可以
GROUP BY grp
摆脱连续的日子。
- 使用
MIN
andMAX
获取新的startdate
andendate
- 为了处理 NULL
enddate
,我们给它们一个很大的值来获取它们,然后再将MAX
它们转换回NULL
SELECT
employmentid,
MIN(startdate) as startdate,
NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
FROM (
SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
FROM (
SELECT *, COALESCE(
SUM(daysSpanned) OVER (
PARTITION BY employmentid
ORDER BY startdate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
,0
) as cumulativeDaysSpanned
FROM (
SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
) inner1
) inner2
) inner3
GROUP BY employmentid, grp
ORDER BY employmentid, startdate
为了得到想要的结果
+--------------+-------------------------+-------------------------+
| employmentid | startdate | enddate |
+--------------+-------------------------+-------------------------+
| 5 | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
+--------------+-------------------------+-------------------------+
| 5 | 2013-05-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 30 | 2006-10-02 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
| 66 | 2007-09-24 00:00:00.000 | NULL |
+--------------+-------------------------+-------------------------+
- 我们可以结合内部查询来获取此答案开头的查询。哪个更短,但更难解释
所有这一切的限制要求
- 就业的开始日期和结束日期没有重叠。这可能会在我们的
grp
.
- 开始日期不为 NULL。然而,这可以通过用小日期值替换 NULL 开始日期来克服
- 未来的开发者可以破译你执行的窗口黑魔法