sql - Count repeated occurrences after a time period or based on a secondary column

Question

I currently have an access log table that looks like this:

LogID  UserID  BuildingID  Date/Time
===========================================
1      1       1           2013-01-01 10:00
2      2       1           2013-01-01 10:00
3      3       1           2013-01-01 10:30
4      3       2           2013-01-01 11:00
5      2       1           2013-01-01 11:00
6      4       1           2013-01-01 11:30
7      5       1           2013-01-01 11:30
8      5       1           2013-01-01 11:31
9      1       3           2013-01-01 12:00
10     1       3           2013-01-01 12:03
11     1       2           2013-01-01 12:05

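What I need to do is create a query that counts the number of repeated user records based on the following two conditions:

A time difference greater than X minutes, where X is a user-specified parameter
Or each distinct building for the user

For example, if I set the time difference to 5 minutes, my results would be: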

UserID   AccessCount
====================
1        3            <-- +1 for timediff (ID 1,10) +1 for building (ID 11)
2        2            <-- +1 for timediff (ID 2,5)
3        2            <-- +1 for building (ID 3,4)
4        1
5        1            <-- duplicate ignored because DateDiff < 5min

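Hopefully this makes sense.

To give some background, this is for swipe access to some of our buildings, and a business requirement has come down for some analytical security reports. Essentially, we want to check for repeated accesses within a given time period (usually done on weekends), but we need to account for the fact that some swipe points fail and require the user to swipe several times. That is why I want the datediff: a swipe error usually means the user swipes multiple times within a very short period.

Any help is greatly appreciated, thanks in advance!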

score 3 · Accepted Answer

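You can rephrase your logic by thinking about when you do not count a row rather than when you do. You do not count a row when it is at the same building as, and within the given time period of, the previous date/time at that building.

I think this might be what you want: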

-- TIMEDIFF stands for the user-specified time window
select userId, count(*) as AccessCount
from (select LogID, UserID, BuildingID, dt,
             lag(dt) over (partition by userid, buildingid order by dt) as prevdt
      from t
     ) t
where dt > prevdt + TIMEDIFF or prevdt is NULL
group by userId
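
The same "compare with the previous swipe at the same building" idea can be checked outside the database. The following is only a minimal sketch in Python, run against the sample rows from the question; `ROWS` and `count_accesses` are hypothetical names introduced for illustration, not part of the answer.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample rows from the question: (LogID, UserID, BuildingID, swipe time).
ROWS = [
    (1, 1, 1, "2013-01-01 10:00"),
    (2, 2, 1, "2013-01-01 10:00"),
    (3, 3, 1, "2013-01-01 10:30"),
    (4, 3, 2, "2013-01-01 11:00"),
    (5, 2, 1, "2013-01-01 11:00"),
    (6, 4, 1, "2013-01-01 11:30"),
    (7, 5, 1, "2013-01-01 11:30"),
    (8, 5, 1, "2013-01-01 11:31"),
    (9, 1, 3, "2013-01-01 12:00"),
    (10, 1, 3, "2013-01-01 12:03"),
    (11, 1, 2, "2013-01-01 12:05"),
]


def count_accesses(rows, minutes):
    """Count rows per user, skipping a row whose previous swipe by the same
    user at the same building is `minutes` or less before it."""
    window = timedelta(minutes=minutes)
    parsed = [
        (user, building, datetime.strptime(ts, "%Y-%m-%d %H:%M"))
        for _, user, building, ts in rows
    ]
    parsed.sort()  # orders by user, then building, then time
    counts = Counter()
    prev = {}  # (user, building) -> previous swipe time, playing the role of lag(dt)
    for user, building, dt in parsed:
        prevdt = prev.get((user, building))
        if prevdt is None or dt > prevdt + window:
            counts[user] += 1
        prev[(user, building)] = dt
    return dict(counts)


print(count_accesses(ROWS, 5))  # {1: 3, 2: 2, 3: 2, 4: 1, 5: 1}
```

With a 5-minute window this reproduces the AccessCount values the question expects.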

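In SQL Server, a numeric constant added to a datetime value is interpreted as a number of days. So 5 minutes is (5.0/60)/24.

There is no example of this in your data, but if you had three rows: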
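
As a quick sanity check of that arithmetic, here is a small Python sketch (not SQL) that converts 5 minutes into a fraction of a day and compares it with a 5-minute interval. The variable names are illustrative only.

```python
from datetime import timedelta

minutes = 5
days_fraction = (minutes / 60) / 24  # same arithmetic as (5.0/60)/24

# A timedelta built from that fraction of a day matches a 5-minute
# timedelta, up to floating-point rounding.
delta = timedelta(days=days_fraction)
difference = abs(delta - timedelta(minutes=minutes))

print(days_fraction)
print(difference < timedelta(milliseconds=1))
```

The fraction is the same as 5/1440, since a day has 1440 minutes.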

1   1   1   11:30
2   1   2   11:31
3   1   1   11:32

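then the third row would not be counted, because it is at the same building as row 1 and within the time period of it (the first condition).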
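
To make this three-row case concrete, here is a minimal Python sketch that walks the rows in time order and applies the same rule with a 5-minute window. The 5-minute value and the names `rows`, `window`, `last_seen`, and `counted` are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# The three hypothetical rows: (LogID, UserID, BuildingID, time of day).
rows = [
    (1, 1, 1, "11:30"),
    (2, 1, 2, "11:31"),
    (3, 1, 1, "11:32"),
]
window = timedelta(minutes=5)  # assumed time window

counted = []
last_seen = {}  # (user, building) -> time of the previous swipe there
for log_id, user, building, hhmm in sorted(rows, key=lambda r: r[3]):
    dt = datetime.strptime(hhmm, "%H:%M")
    prevdt = last_seen.get((user, building))
    # Count the row only if there is no previous swipe at this building,
    # or the previous one is more than `window` earlier.
    if prevdt is None or dt > prevdt + window:
        counted.append(log_id)
    last_seen[(user, building)] = dt

print(counted)  # [1, 2]
```

Row 2 is counted because it is at a different building. Row 3 is skipped because it comes only two minutes after row 1 at building 1.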

score 0 · Accepted Answer

Here is one approach:

declare @duplicateMinutes int = 5

select UserID, AccessCount = count(1)
from AccessLogs a
where not exists
  (
    select 1
    from AccessLogs d
    where a.LogID < d.LogID -- add this to try and avoid duplicate times cancelling each other
      and a.UserID = d.UserID
      and a.BuildingID = d.BuildingID
      and a.SwipeTime >= dateadd(mi, -@duplicateMinutes, d.SwipeTime)
      and a.SwipeTime <= d.SwipeTime
  )
group by UserID
order by UserID

SQL Fiddle with demo - gives the expected results for your data.
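
For readers without a database at hand, the NOT EXISTS logic can be sketched in Python against the sample rows from the question. This is only an illustration of the rule, under the assumption that the table matches the question's data; `ROWS` and `count_not_exists` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample rows from the question: (LogID, UserID, BuildingID, SwipeTime).
ROWS = [
    (1, 1, 1, "2013-01-01 10:00"),
    (2, 2, 1, "2013-01-01 10:00"),
    (3, 3, 1, "2013-01-01 10:30"),
    (4, 3, 2, "2013-01-01 11:00"),
    (5, 2, 1, "2013-01-01 11:00"),
    (6, 4, 1, "2013-01-01 11:30"),
    (7, 5, 1, "2013-01-01 11:30"),
    (8, 5, 1, "2013-01-01 11:31"),
    (9, 1, 3, "2013-01-01 12:00"),
    (10, 1, 3, "2013-01-01 12:03"),
    (11, 1, 2, "2013-01-01 12:05"),
]


def count_not_exists(rows, minutes):
    """Count a row unless a later row (higher LogID) for the same user and
    building has a swipe time within `minutes` after it."""
    window = timedelta(minutes=minutes)
    parsed = [
        (log_id, user, building, datetime.strptime(ts, "%Y-%m-%d %H:%M"))
        for log_id, user, building, ts in rows
    ]
    counts = Counter()
    for a_id, a_user, a_bld, a_time in parsed:
        superseded = any(
            a_id < d_id
            and a_user == d_user
            and a_bld == d_bld
            and d_time - window <= a_time <= d_time
            for d_id, d_user, d_bld, d_time in parsed
        )
        if not superseded:
            counts[a_user] += 1
    return dict(sorted(counts.items()))


print(count_not_exists(ROWS, 5))  # {1: 3, 2: 2, 3: 2, 4: 1, 5: 1}
```

Note the design choice: this rule keeps the last swipe of each burst (a row is dropped when a later near-duplicate exists), whereas a lag-based comparison keeps the first. For the sample data both give the same counts.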
