sql - SQL help: query to return rows where a later row exists

Question

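I am trying to count the number of items in a particular component of a pipeline. Each time an item moves through the pipeline, an entry is created in this particular table.

It stores something like this: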

ID:int-pk, ObjectId:varchar(25), EventType:int, Time:DateTime

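For example, say I am looking at time = 10:00.

So, if object 1 has event A at 9 AM and event 2 at 10 AM, then I want to get ObjectId (1).

Characteristics

ObjectId is the unique ID of an item passing through the pipeline, so there are actually very few rows per object (1 entry for each pipeline component, of which there are about 10)
About 10K inserts per day are expected
Performance is a requirement (so EXISTS(...) may not be an option)
The hardware is solid; it is a datacenter SQL machine, but it is shared with many other teams/processes.

Problems I'm having / what I'm trying:

This is a design, so I don't have actual data. I should have a proof-of-concept database to test against
Here are some of the attempts I have been making: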

select  objectid, time, eventtype
from        objects
where       -- can't use time < @t because I won't get the later events
group by    objectid
having      --

or

select        objectid as oid, time, eventtype
from          objects
where     eventtype = 1
and       time < @t
and       exists (select  objectid, eventtype, time
              where   objectid = oid -- not sure if this is legal
              and     eventtype = 2
              and     time > @t)

As you can probably tell, I don't write a lot of SQL, so I'm a bit rusty.

Example

ID  objectid    eventtype   time
1   12345   1   09:00 AM
2   12345   2   10:00 AM


eventtypeid     description 
1           "enter house"
2           "leave house"
3           "enter work"

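So, object 4 enters the house at 9 AM and leaves at 11 AM, and I want to see whether they were in the house at 10 AM. 12345 is the "name/number" of the subject.

In this example, I am trying to query whether the object was in the house at 10:00 AM. It is entirely possible for an object to have entered the house and never left; I don't want those for this query.

Questions

Am I on the right track?
How can I estimate the expected performance of the second query (assuming it works)?
Pointers? Suggestions? Examples?

Everything is appreciated.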

score 1 · Accepted Answer

For a given subject and a given time, you can do the following:

select top 1 o.*
from objects o
where eventtime < @t and
      objectid = @objectid
order by eventtime desc;

Extending this to multiple objects is easiest with window functions:

select o.*
from (select o.*,
             row_number() over (partition by objectid order by eventtime desc) as seqnum
      from objects o
      where eventtime < @t
     ) o
where seqnum = 1;

Both of these give you information about the last event before (strictly before) the given time.
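As a rough sketch of how the window-function query could be combined with the question's requirement (the last event before the time is an "enter" and a "leave" follows later), the snippet below runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, the sample rows and dates are made up for illustration, and the parameter `:t` plays the role of `@t`. It keeps `EXISTS` for readability, so it is not a statement about performance.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, objectid TEXT, "
    "eventtype INTEGER, eventtime TEXT)"
)
# Made-up sample rows: eventtype 1 = enter house, 2 = leave house.
conn.executemany(
    "INSERT INTO objects (objectid, eventtype, eventtime) VALUES (?, ?, ?)",
    [
        ("12345", 1, "2024-01-01 09:00"),  # enters, leaves after t
        ("12345", 2, "2024-01-01 11:00"),
        ("67890", 1, "2024-01-01 09:30"),  # enters, never leaves
        ("24680", 1, "2024-01-01 08:00"),  # enters, leaves before t
        ("24680", 2, "2024-01-01 09:15"),
    ],
)

query = """
SELECT o.objectid
FROM (SELECT objectid, eventtype, eventtime,
             ROW_NUMBER() OVER (PARTITION BY objectid
                                ORDER BY eventtime DESC) AS seqnum
      FROM objects
      WHERE eventtime < :t) AS o
WHERE o.seqnum = 1          -- last event strictly before t
  AND o.eventtype = 1       -- ... and it was an "enter"
  AND EXISTS (SELECT 1      -- ... and a "leave" happens after t
              FROM objects AS later
              WHERE later.objectid = o.objectid
                AND later.eventtype = 2
                AND later.eventtime > :t)
ORDER BY o.objectid
"""
in_house = [row[0] for row in conn.execute(query, {"t": "2024-01-01 10:00"})]
print(in_house)
```

Only object 12345 qualifies in this sample: 67890 never leaves, and 24680 has already left before the chosen time.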

score 0 · Accepted Answer

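I am a little confused by your SQL, but from what you are stating it seems you want the most recent object in a given time frame that may already exist based on an object reference. I may be barking up the wrong tree since your SQL confused me, but from what you are asking, you want an object that is grouped by having a common objectid, which is not always unique, and a generic type.

This may help you, but it may not. It basically always accounts for dupes by a count of their objectId, but I was not sure whether you were also limiting scope by type, so I left that in just in case. Then, in the second iteration, the dupes are partitioned by obj, and the scope is potentially limited by type to a variable if you only care about one type. You could do this in the first iteration as well. If the variable is null, the type is assumed to mean "everything". I have used similar methods in production environments, so it should be reliable as long as you have indexes in the right places, i.e. indexes on the type and datetime fields. The example is self-contained, will run in SQL Management Studio 2008 and higher, and uses a table variable that populates itself.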

declare @Object Table ( objectId int , typ varchar(2), obj varchar(8), dt datetime);

insert into @Object values (1, 'A', 'Brett', getdate() - 0.8) ,(1,'A','Sean', getdate() - 0.4),(1,'A','Brett', getdate() - 0.08),(2,'A','Michael', getdate() - 0.04)
,(2,'B','Ray', getdate() - 0.008),(3, 'B', 'Erik', getdate() - 0.004),(3, 'C', 'Ray', getdate() - 0.0001);

-- objects as they are
Select *
from @Object
;

-- Find dupe objects by two distinctions
select 
    obj
,   count(objectId) over(partition by obj) as rowOccurencesByTyp
,   count(objectId) over(partition by typ, obj) as rowOccurencesByTypAndObj
from @Object
;


-- limit scope by type
declare 
-- CHANGE LINE AS NEEDED TO TEST HOW IT WORKS FOR 'A', 'B' OR NULL
    @Type varchar(2) = NULL  
-- Scope range of datetime too if you want
,   @dt datetime
;


-- Find dupes first
with dupes as 
    (
        select 
        obj
    ,   typ
    ,   dt
    ,   count(objectId) over(partition by obj) as rowOccurencesByTyp
    ,   count(objectId) over(partition by typ, obj) as rowOccurencesByTypAndObj
    -- I made the Ray occurrence be in DIFFERENT types, so this is an edge case you may not want
    from @Object
    -- WHERE CLAUSE WOULD BE HERE WITH DATE RANGE.  I was lazy in my example and made it small but you could 
    -- easily limit scope of dupes by a date range of 'dt between @Start and @End' or 'dt < @dt' or 'dt > @dt'
    )
-- if you merely want to get the most recent objects you can do a windowed function to get them quite easily
, a as 
    (
    select 
        *
    ,   row_number() over(partition by obj order by dt desc) as rwn
    -- I am finding the ranking by shared obj and then ordering by date descending (most current first). 
    -- You may wish to also add 'typ' before obj in the partition, as I was not sure
    from dupes
    where typ = isnull(@Type, typ)  -- limit scope by type potentially
        and rowOccurencesByTyp > 1
    -- you may set up other rowOccurrences here if that suits you better.
    )
select *
from a
where rwn = 1  
-- the most recently inserted double is a dupe; the scope of the dupe is determined by
-- the most recent 'rwn' finding a repeat insert of a row from part 1, 
-- ordered by date descending and grouped by its object

score 0 · Accepted Answer

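It's hard to follow exactly what you're after, but to return rows where a later row exists, simply put, in SQL Server 2012 this can be done with the LEAD function: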

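-- Drop the temp table only if it already exists, so the script can be re-run
IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;
CREATE TABLE #test (Value CHAR(25));
INSERT INTO #test VALUES ('abcde'), ('asaf'), ('dogs'), (NULL), ('');

-- Each row alongside the value from the next row (ordered by Value)
SELECT Value, LEAD(Value, 1, 'Last Record') OVER (ORDER BY Value) AS Last_Flag
FROM #test;

-- Keep only the rows that have a later row
SELECT *
FROM (SELECT Value, LEAD(Value, 1, 'Last Record') OVER (ORDER BY Value) AS Last_Flag
      FROM #test
     ) sub
WHERE Last_Flag <> 'Last Record';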

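Using the test table created above, the next query pulls a value from the following row (the second parameter, 1, defines the offset, i.e. how many rows ahead you want to look) and seeds 'Last Record' as the default value when there is no next row (the default is NULL, which is fine unless your data contains NULLs; I like to seed a value just in case). The last query then selects everything except the row that has no row after it.

If you want to know this per objectID, you can add a PARTITION BY clause: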

LEAD(Value,1,'Last Record') OVER (PARTITION BY objectID ORDER BY Value)
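To make the partitioned form concrete, here is a minimal sketch that applies it to a table shaped like the one in the question. It runs from Python against an in-memory SQLite database, which is an assumption made only so the example is runnable (SQLite supports `LEAD` from version 3.25). The column name `eventtime` and the sample rows are illustrative, and the default value is left as NULL rather than seeded.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server 2012+; LEAD behaves the same here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (objectid TEXT, eventtype INTEGER, eventtime TEXT)"
)
# Made-up sample rows for illustration.
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        ("12345", 1, "2024-01-01 09:00"),
        ("12345", 2, "2024-01-01 10:00"),
        ("67890", 1, "2024-01-01 09:30"),  # no later row for this object
    ],
)

query = """
SELECT objectid, eventtype, eventtime, next_eventtype
FROM (SELECT objectid, eventtype, eventtime,
             -- next event type for the same object; NULL when none follows
             LEAD(eventtype) OVER (PARTITION BY objectid
                                   ORDER BY eventtime) AS next_eventtype
      FROM objects) AS sub
WHERE next_eventtype IS NOT NULL
ORDER BY objectid, eventtime
"""
rows_with_successor = conn.execute(query).fetchall()
print(rows_with_successor)
```

Only the first row of object 12345 is returned, because it is the only row followed by a later row for the same object.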
