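This is a fairly difficult problem to solve in SQL, because it requires finding transitions and ranges in the data, which is not trivial. I have broken the problem down into a series of steps, written as consecutive CTEs that build on one another and lead to a final working solution:
First, I add a per-user row index to the data, so that ([user], ix) gives a simple key for identifying unique rows: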
with NumberedAtt as (
    select
        row_number() over (partition by [user] order by date, time, reader) as ix,
        att.[user],
        att.[department],
        att.[date] + att.[time] as dt,
        att.[reader]
    from att
)
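To try the CTE above locally, a table along the following lines should be enough. This is only a sketch: the column types are an assumption (both [date] and [time] are declared as datetime so that `att.[date] + att.[time]` is valid in SQL Server), the reader codes are read as 1 = entry and 2 = exit, and the middle B1 row is a made-up duplicate read, since the outputs shown later only reveal B1's first and last timestamps.

```sql
-- Hypothetical schema and sample rows; not taken from the original question.
create table att (
    [user]       varchar(10) not null,
    [department] varchar(10) not null,
    [date]       datetime    not null,  -- midnight of the calendar day
    [time]       datetime    not null,  -- time of day on the 1900-01-01 base date
    [reader]     int         not null   -- assumed: 1 = entry reader, 2 = exit reader
);

insert into att ([user], [department], [date], [time], [reader]) values
    ('A1', 'IT', '20140301', '11:12:00', 1),
    ('A1', 'IT', '20140301', '13:12:00', 2),
    ('A1', 'IT', '20140302', '07:42:15', 1),
    ('A1', 'IT', '20140302', '16:16:15', 2),
    ('B1', 'IT', '20140301', '12:28:06', 1),
    ('B1', 'IT', '20140301', '12:58:06', 2),  -- assumed duplicate exit read
    ('B1', 'IT', '20140301', '13:28:06', 2);
```

The multi-row `values` list needs SQL Server 2008 or later; on older versions each row would need its own insert statement.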
Next, I get the first and last index values for each user, which serve as the outermost boundaries of that user's entry/exit ranges:
, MinMax as (
    select [user], min(ix) ixMin, max(ix) ixMax
    from NumberedAtt N group by [user]
)
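Then I put these together to generate a list of all exit/entry transitions, which are the points where the value of Reader changes from 2 to 1. These are the specific points that identify exactly when one time range ends and the next one begins (and they cleanly handle consecutive duplicate entry/exit reads). Combining them with each user's first entry and last exit produces the list of all entry/exit transitions, with 0 used as a placeholder on the side that does not apply to a boundary row: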
, Transitions as (
    select N.[User], 0 as exitIx, M.ixMin as entryIx
    from NumberedAtt N
    join MinMax M on N.[User] = M.[User]
    where N.ix = M.ixMin
    union
    select N.[User], M.ixMax as exitIx, 0 as entryIx
    from NumberedAtt N
    join MinMax M on N.[User] = M.[User]
    where N.ix = M.ixMax
    union
    select A1.[User], A1.ix as exitIx, A2.ix as entryIx
    from NumberedAtt A1
    join NumberedAtt A2 on A1.ix + 1 = A2.ix and A1.[user] = A2.[user]
    where A1.[reader] = 2 and A2.[reader] = 1
)
Here is the output at this point:
| USER | EXITIX | ENTRYIX |
|------|--------|---------|
| A1 | 0 | 1 |
| A1 | 2 | 3 |
| A1 | 4 | 0 |
| B1 | 0 | 1 |
| B1 | 3 | 0 |
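Note that we have now neatly captured all the row indexes where time ranges begin and end. However, they are offset by one: the entry index in one row pairs with the exit index in the following row. So we need one more transformation that stitches the ranges back together, by adding a row index to this table and joining each row to the one after it: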
, NumberedTransitions as (
    select
        row_number() over (partition by [User] order by exitIx) tix,
        T.*
    from Transitions T
), EntryExit as (
    select
        aEntry.ix as ixEntry,
        aExit.ix as ixExit,
        aEntry.[user],
        aEntry.[department],
        aEntry.[dt] as entryDT,
        aExit.[dt] as exitDT
    from NumberedTransitions tEntry
    join NumberedAtt aEntry on tEntry.entryIx = aEntry.ix and tEntry.[user] = aEntry.[user]
    join NumberedTransitions tExit on tEntry.tix + 1 = tExit.tix and tEntry.[user] = tExit.[user]
    join NumberedAtt aExit on tExit.exitIx = aExit.ix and tExit.[user] = aExit.[user]
)
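After joining consecutive transitions together, I also pull the original detail data back in, since up to this point I have been working only with row index values. By the end of this subquery we have identified every entry/exit range for each user and "swallowed" any repeated reads: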
| IXENTRY | IXEXIT | USER | DEPARTMENT | ENTRYDT | EXITDT |
|---------|--------|------|------------|------------------------------|------------------------------|
| 1 | 2 | A1 | IT | March, 01 2014 11:12:00+0000 | March, 01 2014 13:12:00+0000 |
| 3 | 4 | A1 | IT | March, 02 2014 07:42:15+0000 | March, 02 2014 16:16:15+0000 |
| 1 | 3 | B1 | IT | March, 01 2014 12:28:06+0000 | March, 01 2014 13:28:06+0000 |
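The pairing step can be checked in isolation. The sketch below is not part of the solution itself: it feeds the five transition rows from the earlier output in as literal values (the inline `values` list stands in for the real Transitions CTE) and keeps only the index columns, to show how joining each row to the next one turns offset transitions into entry/exit pairs.

```sql
-- Stand-in for the Transitions CTE, using the rows shown in its output.
with Transitions as (
    select v.[user], v.exitIx, v.entryIx
    from (values
        ('A1', 0, 1),
        ('A1', 2, 3),
        ('A1', 4, 0),
        ('B1', 0, 1),
        ('B1', 3, 0)
    ) as v([user], exitIx, entryIx)
), NumberedTransitions as (
    select
        row_number() over (partition by [user] order by exitIx) as tix,
        T.*
    from Transitions T
)
-- The entry index comes from row tix, the exit index from row tix + 1.
select
    tEntry.[user],
    tEntry.entryIx as ixEntry,
    tExit.exitIx   as ixExit
from NumberedTransitions tEntry
join NumberedTransitions tExit
    on tExit.[user] = tEntry.[user]
   and tExit.tix = tEntry.tix + 1
order by tEntry.[user], tEntry.entryIx;
-- Returns (A1, 1, 2), (A1, 3, 4), (B1, 1, 3)
```

The last transition row for each user has no following row, so the inner join drops it, which is why its placeholder entryIx of 0 never reaches the result.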
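The only thing left now is to group the data to report total hours per user per month, with each range counted in the month of its entry time. Computing the total hours is slightly tricky, but it can be done by summing the minutes across the ranges and then converting the result back into a time-style string: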
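, Hours as (
    select
        [User],
        [Department],
        Year(EntryDT) Year,
        Month(EntryDT) Month,
        -- Pad the hours only when below 10, so totals of 100 hours or more are not truncated.
        case when SUM(DATEDIFF(Minute, EntryDT, ExitDT)) / 60 < 10 then '0' else '' end +
        CAST(SUM(DATEDIFF(Minute, EntryDT, ExitDT)) / 60 as varchar(10)) + ':' +
        -- Totals are computed in whole minutes, so the seconds part is always 00.
        RIGHT('0' + CAST(SUM(DATEDIFF(Minute, EntryDT, ExitDT)) % 60 as varchar(2)), 2) + ':00' as TotalHours
    from EntryExit EE
    group by [User], [Department], Year(EntryDT), Month(EntryDT)
)
select * from Hours
order by [User], [Year], [Month]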
This gives a final result that is very close to the expected one:
| USER | DEPARTMENT | YEAR | MONTH | TOTALHOURS |
|------|------------|------|-------|------------|
| A1 | IT | 2014 | 3 | 10:34:00 |
| B1 | IT | 2014 | 3 | 01:00:00 |
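As a worked check of the arithmetic behind these totals: A1's two ranges are 120 minutes (11:12:00 to 13:12:00) and 514 minutes (07:42:15 to 16:16:15), 634 minutes in all, and B1's single range is 60 minutes. The sketch below formats those totals on their own. The third value, 9630 minutes, is a made-up figure added to show a month above 99 hours, and the padding expression is one possible way to write it rather than the only one.

```sql
-- 634 and 60 are the minute totals behind the table above; 9630 is hypothetical.
select
    m.TotalMinutes,
    m.TotalMinutes / 60 as WholeHours,        -- integer division
    m.TotalMinutes % 60 as RemainingMinutes,
    case when m.TotalMinutes / 60 < 10 then '0' else '' end
        + cast(m.TotalMinutes / 60 as varchar(10)) + ':'
        + right('0' + cast(m.TotalMinutes % 60 as varchar(2)), 2)
        + ':00' as TotalHours
from (values (634), (60), (9630)) as m(TotalMinutes);
-- 634  -> 10:34:00
-- 60   -> 01:00:00
-- 9630 -> 160:30:00
```

Padding the hour part with a two-character `RIGHT` would cut 160 down to 60, which is why the sketch only adds a leading zero when the hour count is below 10.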
Some adjustments to the formatting may be needed, but they should be easy to build on top of this framework.
Here is a working demo: http://www.sqlfiddle.com/#!3/f3f37/7