sql - SQL - Determine the number of records active at a given time

Question

I have a table with the following information:

Conf_Start_Time
Part_Start_Time
Part_End_Time

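A record is considered active at time t if t falls between Part_Start_Time and Part_End_Time.

What I want to do is analyze all the records to determine how many were active on a given day. The solution I came up with is to loop over every minute of the day (say, from 6 AM to 9 PM) and check each of that day's records to determine whether the user was active at the given time t.

Is there a solution in SQL, or should I stick with a code solution?

In code, I would pull all the records into memory, iterate over the time range (6 AM to 9 PM), and test each record for the given day to determine whether it was active at the current time. If it was active, I would increment a counter; if not, I would move on to the next record. For the next time step, I would reinitialize the counter and continue looping through the day.

We are using SQL Server 2005.

Update: the output I'm looking for is an array of the maximum concurrent usage from 6 AM to 9 PM.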

Record  Conf_Start_Time    Part_Start_Time     Part_End_Time
1.      6/5/2012 13:40:00  6/5/2012 13:41:23   6/5/2012 13:45:27
2.      6/5/2012 13:40:00  6/5/2012 13:40:23   6/5/2012 13:47:29
3.      6/5/2012 13:40:00  6/5/2012 13:42:55   6/5/2012 13:44:17

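So at 13:40:00, 0 records are active; at 13:41:00, 1 record is active; at 13:42:00, 2 records are active; at 13:43:00, 3 records are active.

I need this data for every minute of the day, and then for every day of the month. Can this kind of looping even be done in SQL?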

score 2 · Accepted Answer

For example, if you want all the records that were active on 2012-08-07, do this:

select * from your_table
where '2012-08-07' between Part_Start_Time and Part_End_Time

score 0 · Accepted Answer

Try this:

DECLARE @auxDate datetime
SET @auxDate = '2012-06-05T13:43:00'  -- example value; use the time you want to test

SELECT *
  FROM your_table
 WHERE @auxDate BETWEEN Part_Start_Time AND Part_End_Time

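The BETWEEN clause is inclusive at both ends. If you do not want to include one of the boundary times, write the comparisons out explicitly and make the relevant one strict (> or <). As written, the form below is equivalent to BETWEEN:

DECLARE @auxDate datetime
SET @auxDate = '2012-06-05T13:43:00'  -- example value; use the time you want to test

SELECT *
  FROM your_table
 WHERE @auxDate >= Part_Start_Time   -- use > to exclude the start boundary
   AND @auxDate <= Part_End_Time     -- use < to exclude the end boundary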

score 0 · Accepted Answer

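The following uses correlated subqueries to get the numbers you want. The idea is to count the cumulative number of starts and the cumulative number of ends up to each time: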

with alltimes as
    (select t.*
     from ((select part_start_time as thetime, 1 as IsStart, 0 as IsEnd
            from t
           ) union all
           (select part_end_time, 0 as IsStart, 1 as IsEnd
            from t
           )
          ) t
     )
select t.*,
       (cumStarts - cumEnds) as numactive
from (select alltimes.thetime,
             (select sum(a2.IsStart)          -- starts at or before this time
              from alltimes a2 where a2.thetime <= alltimes.thetime
             ) as cumStarts,
             (select sum(a2.IsEnd)            -- ends at or before this time
              from alltimes a2 where a2.thetime <= alltimes.thetime
             ) as cumEnds
      from alltimes
     ) t

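The output is based on each time present in the data.

As a rule of thumb, you don't want to do heavy data work on the application side. Where possible, it is better done in the database.

This query will return duplicates when several starts and ends occur at the same time. In that case, you need to decide how to handle the situation. The idea is the same, however. The outer select would be: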

select t.thetime, max(cumstarts - cumends) as numactives

And you need a group by clause:

group by t.thetime

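"max" gives priority to starts (meaning that at the same timestamp, starts are treated as happening first, so you get the maximum activity at that time). "min" would give priority to ends. And if you use the average, remember to convert to floating point: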

select t.thetime, avg(cumstarts*1.0 - cumends) as avgnumactives
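
Putting the pieces above together, a minimal sketch of the whole query might look like the following. The table variable @t and its three rows are an assumption that mirrors the sample data in the question, and the aliases a, b, and x are purely illustrative. Single-row inserts are used because SQL Server 2005 does not support multi-row VALUES lists.

```sql
-- Hypothetical sample data mirroring the three rows in the question.
declare @t table (part_start_time datetime, part_end_time datetime)
insert @t values ('2012-06-05T13:41:23', '2012-06-05T13:45:27')
insert @t values ('2012-06-05T13:40:23', '2012-06-05T13:47:29')
insert @t values ('2012-06-05T13:42:55', '2012-06-05T13:44:17')

-- One row per start or end event, flagged by type.
;with alltimes as
    (select part_start_time as thetime, 1 as IsStart, 0 as IsEnd from @t
     union all
     select part_end_time, 0, 1 from @t
    )
select x.thetime,
       max(x.cumStarts - x.cumEnds) as numactives
from (select a.thetime,
             -- number of starts at or before this event time
             (select sum(b.IsStart) from alltimes b where b.thetime <= a.thetime) as cumStarts,
             -- number of ends at or before this event time
             (select sum(b.IsEnd)   from alltimes b where b.thetime <= a.thetime) as cumEnds
      from alltimes a
     ) x
group by x.thetime
order by x.thetime
```

With this sample data the count should climb to 3 at 13:42:55 and fall back to 0 at the last end time, 13:47:29. Note that the result has one row per event time, not one row per minute.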

score 0 · Accepted Answer

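This is how I would solve the problem.

The first thing I would do is create a sequence table like the one below. For all sorts of reasons, it can be very useful to have an essentially infinite (or at least large) sequence of numbers in SQL. Something like this: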

create table dbo.sequence
(
  seq_no int not null primary key clustered
)

declare @v int
set @v = -100000
while @v <= 100000
begin
  insert dbo.sequence values ( @v )
  set @v = @v+1
end

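In practice, I would seed the table differently, using bulk copy, or even write a CLR table-valued function to generate the required range. The loop above will... not exhibit ideal performance characteristics when loading the table.

Once I had something like that, I would write a query like the one below. It gives you a full report, listing every reporting bucket required for each day in the specified reporting period. Everything can be adjusted by setting the appropriate variables. If you want a sparse report, change the final left join to a standard inner join.

Disclaimer: this code has not been tested, but it is similar to code I have written to do the same thing. The approach is sound, though the code itself may well contain bugs.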
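
As one possible set-based alternative to the loop (a sketch, not the bulk-copy or CLR approach mentioned above), the same range can be generated in a single insert by cross joining a ten-row digits set with itself. It assumes dbo.sequence already exists and is empty; the CTE names digits and numbers are made up for this example.

```sql
-- Assumes dbo.sequence ( seq_no int primary key ) exists and is empty.
;with digits as (
    select 0 as d union all select 1 union all select 2 union all select 3
    union all select 4 union all select 5 union all select 6
    union all select 7 union all select 8 union all select 9
),
numbers as (
    -- six digit positions give every integer from 0 through 999999
    select d1.d + 10*d2.d + 100*d3.d + 1000*d4.d + 10000*d5.d + 100000*d6.d as n
    from digits d1
    cross join digits d2
    cross join digits d3
    cross join digits d4
    cross join digits d5
    cross join digits d6
)
insert dbo.sequence ( seq_no )
select n - 100000          -- shift 0..200000 down to -100000..100000
from numbers
where n <= 200000
```

This covers the same -100000 through 100000 range as the loop, but as one statement rather than 200,001 individual inserts.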

-----------------------------------------------------------------------------
-- define the range of days in which we are interested
-- it might well be more than 1, but for this example, we'll define the start
-- and end days as the same, so we are interested in just one day.
-----------------------------------------------------------------------------
declare @dateFrom datetime
declare @dateThru datetime

set @dateFrom = '2012-06-01'
set @dateThru = '2012-06-01'

------------------------------------------------------------------------------
-- the next thing in which we are interested in are the boundaries of
-- the time period in which we are interested, and the interval length
-- of each reporting bucket, in minutes.
--
-- For this example, we're interested in the time period
-- running from 6am through 9pm, such that 6am <= x < 9pm.
--
-- We also need a value defining start-of-day (midnight).
--
-- Setting a datetime value to '' will give you the epoch: 1900-01-01 00:00:00.000
-- Setting a datetime value to just a time-of-day string literal will
-- give you the epoch day at the desired time, so '06:00:00' converts to
-- '1900-01-01 06:00:00'. Crazy, but that's SQL Server.
--
------------------------------------------------------------------------------
declare @start_of_day               datetime
declare @timeFrom                   datetime
declare @timeThru                   datetime
declare @interval_length_in_minutes int

set @start_of_day               = '00:00:00'
set @timeFrom                   = '06:00:00'
set @timeThru                   = '21:00:00'
set @interval_length_in_minutes = 15

------------------------------------------------------------------------------
--
-- On to the meat of the matter. This query has three parts to it.
--
-- 1. Generate the set of reporting days, using our sequence table
-- 2. Generate the set of reporting buckets for each day, again, using our sequence table
--
-- The join of these two virtual tables produces the set of all reporting periods
-- that we will use to match up to the source data that will fill the report.
--
-- 3. Finally, assign each row to 0 or more reporting buckets.
--    A given record has a time range in which it was 'active'.
--    Consequently, it may fall into multiple reporting buckets, and hence,
--    the comparison is a little wonky: A record is assigned to a reporting bucket
--    if both of these are true for the data record:
--
--    * Its active period ended *on or after* the start of the reporting period/bucket.
--    * Its active period began *before* the end of the reporting period.
--
--    It takes a while to get your head around that, but it works.
--
--  When all that is in place, we use GROUP BY and the aggregate function SUM()
--  to collapse each reporting bucket into a single row and compute the active count.
--  We use SUM() over a CASE expression in preference to COUNT(*) as we want a full report,
--  so we use left joins. COUNT(*) counts every row, including the all-null row
--  a left join produces for an empty bucket; COUNT(column) would skip it.
-- 
--  There you go. Easy!
--
-----------------------------------------------------------------------------------
select timeFrom = dateadd(minute, times.offset                               , days.now ) ,
       timeThru = dateadd(minute, times.offset + @interval_length_in_minutes , days.now ) ,
       N        = sum( case when t.Part_Start_Time is null then 0 else 1 end ) -- we sum() here rather than count(*) since we don't want missing rows from dbo.myFunkyTable to increment the count
from       ( select now = dateadd(day, seq_no , @dateFrom )                   -- get the set of 'interesting' days
             from dbo.sequence                                                -- via our sequence table
             where seq_no >= 0                                                --
               and seq_no <= datediff(day,@dateFrom,@dateThru)                -- inclusive, so from = thru still yields one day
           ) days                                                             --
cross join ( select offset = seq_no                                           -- get the set of time buckets
             from dbo.sequence                                                -- each bucket is defined by its offset
             where seq_no >= datediff(minute,@start_of_day,@timeFrom)         -- as defined in minutes-since-start-of-day
               and seq_no <  datediff(minute,@start_of_day,@timeThru)         -- and is @interval_length_in_minutes long
               and 0      =  seq_no % @interval_length_in_minutes             --
           ) times                                                            -- every bucket is paired with every day
left join dbo.myFunkyTable t on t.Part_Start_Time <  dateadd(minute, times.offset + @interval_length_in_minutes , days.now )
                            and t.Part_End_Time   >= dateadd(minute, times.offset                               , days.now )
group by dateadd(minute, times.offset                               , days.now ) ,
         dateadd(minute, times.offset + @interval_length_in_minutes , days.now )
order by 1 , 2
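
The report above counts records that overlap each bucket. If what is wanted is the count at each exact minute, as in the question's 13:40 / 13:41 / 13:42 example, a simpler variant along the same lines could be sketched as follows. It assumes the dbo.sequence table from above and uses dbo.myFunkyTable as a stand-in for the real table with the question's column names; @day and the aliases s and r are hypothetical.

```sql
declare @day datetime
set @day = '2012-06-05'            -- the day to report on (example value)

select dateadd(minute, s.seq_no, @day) as sample_time ,
       count(r.Part_Start_Time)        as active_count   -- count(column) skips the nulls from unmatched minutes
from dbo.sequence s
left join dbo.myFunkyTable r
       on dateadd(minute, s.seq_no, @day) between r.Part_Start_Time and r.Part_End_Time
where s.seq_no >= 6  * 60          -- 06:00, in minutes since midnight
  and s.seq_no <= 21 * 60          -- 21:00, inclusive
group by s.seq_no
order by s.seq_no
```

Each output row is one minute between 06:00 and 21:00 with the number of records active at that instant. Wrapping this as a derived table and taking max(active_count) would give a single peak figure for the day.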
