I have a table that holds rows loaded from generated log files. Each row has a timestamp (in epoch format). Below is a sample of the data; there are currently about 1.5 million rows.

EpochTime               Date                    Dbm Source
1370732265.373915000    2013-06-17 11:36:39.477 -85 1
1370732265.376506000    2013-06-17 11:36:39.487 -76 2
1370732265.398012000    2013-06-17 11:37:39.503 -81 1
1370732265.463492000    2013-06-17 11:37:39.520 -94 3
1370732265.692144000    2013-06-17 11:37:39.533 -77 2
1370732265.845195000    2013-06-17 11:38:39.550 -84 4
1370732265.933283000    2013-06-17 11:38:39.580 -84 4
1370732265.935863000    2013-06-17 11:38:39.597 -84 5
1370732265.939143000    2013-06-17 11:39:39.597 -84 2
1370732265.939858000    2013-06-17 11:39:39.613 -84 4
1370732265.965481000    2013-06-17 11:40:39.627 -82 5
1370732266.049712000    2013-06-17 11:40:39.677 -82 3
1370732266.110457000    2013-06-17 11:41:39.690 -84 4
1370732266.110457000    2013-06-17 11:41:39.690 -84 6
1370732266.110457000    2013-06-17 11:42:39.690 -84 3
1370732266.110457000    2013-06-17 11:42:39.690 -84 4
1370732266.110457000    2013-06-17 11:42:39.690 -84 6
1370732266.110457000    2013-06-17 11:43:39.690 -84 2
1370732266.110457000    2013-06-17 11:44:39.690 -84 1

What I need to do is find the start and end time for each source. However, a source may go unseen for at most 5 minutes before it is counted again as a new visit. For example, in the sample above source 1 would be logged twice. Every other source continues in the same visit until it has not been seen for 5 minutes. The results should go into a table that looks like the one below.

ID  Duration    Store   Start                   End                     MacID  Dbm
7   31          1       2013-06-08 07:46:10.000 2013-06-08 08:17:00.000 1      -84
4   2           1       2013-06-08 18:42:53.000 2013-06-08 18:44:06.000 2      -83
2   1           1       2013-06-08 14:31:20.000 2013-06-08 14:32:08.000 3      -89
11  213         1       2013-06-08 12:43:55.000 2013-06-08 16:16:11.000 4      -86
6   585         1       2013-06-08 14:03:58.000 2013-06-08 23:48:44.000 5      -75
28  287         1       2013-06-08 07:15:40.000 2013-06-08 12:02:10.000 6      -88
28  287         1       2013-06-08 07:15:40.000 2013-06-08 12:02:10.000 1      -81

Preferably I'm looking for a pure SQL solution, because looping through this much data performs poorly. I have had a go, but everything I have tried so far only counts each source once per period (currently set to a day's worth of data).

The database is running on SQL Server 2012.

EDIT: One thing I did not mention is that the highest Dbm value for each 'visit' needs to be logged with the processed data.


1 Answer


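You are using SQL Server 2012. Good: that version supports lag() and cumulative sums with an ordered window, which is what this needs.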

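Here is the idea. Find where each sequence for a source starts. That is either the first time the source is seen, or a row that arrives after the source has not been seen for 5 minutes. You can use lag() to compare each row with the previous row for the same source.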

Next, take a cumulative sum of the NewStart flag. Every row in a sequence ends up with the same value, so that value can be used for grouping.
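
As a rough illustration of these two steps, the sketch below runs them on the three source 1 rows from the question's sample. The inline `t` CTE, the `[date]` and `source` column names, and the use of `DATEDIFF(second, ...) < 300` for the 5-minute test are assumptions made for this example, not something the question specifies.

```sql
-- Hypothetical stand-in for the real table: the three source 1 rows
-- from the question's sample data.
WITH t AS (
    SELECT CAST(v.[date] AS datetime) AS [date], v.source
    FROM (VALUES ('2013-06-17T11:36:39.477', 1),
                 ('2013-06-17T11:37:39.503', 1),
                 ('2013-06-17T11:44:39.690', 1)) AS v([date], source)
),
flagged AS (
    SELECT t.*,
           -- LAG returns NULL on the first row of a source, so the
           -- comparison is unknown and the ELSE branch flags a new start.
           CASE WHEN DATEDIFF(second,
                              LAG([date]) OVER (PARTITION BY source ORDER BY [date]),
                              [date]) < 300
                THEN 0
                ELSE 1
           END AS NewStart
    FROM t
)
SELECT f.[date], f.source, f.NewStart,
       -- Running total of the flags: constant within a sequence.
       SUM(f.NewStart) OVER (PARTITION BY f.source ORDER BY f.[date]) AS Grp
FROM flagged AS f
ORDER BY f.source, f.[date];
-- NewStart comes out as 1, 0, 1 and Grp as 1, 1, 2:
-- the 7-minute gap before the 11:44 row opens a second sequence.
```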

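The final result comes from an aggregation. Not all of the fields in your output are clear to me, but here is SQL that does most of the work: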

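select count(*) as numIds,
       datediff(minute, min(date), max(date)) as duration,  -- whole minutes, as in the sample output
       1 as store,
       min(date) as start, max(date) as [end],              -- end is a reserved word, so it is bracketed
       source as MacId
from (select t.*,
             sum(NewStart) over (partition by source order by date) as Grp
      from (select t.*,
                   -- 0 when the previous row for this source is less than 5 minutes old;
                   -- 1 for the first row of a source or after a longer gap
                   (case when datediff(second, lag(date) over (partition by source order by date), date) < 300 then 0
                         else 1
                    end) as NewStart
            from t
           ) t
     ) t
group by source, grp   -- Grp restarts at 1 for every source, so group by both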
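
The edit to the question asks for the highest Dbm per visit. One way to sketch that, under the same assumptions as above (a table `t` with `date`, `Dbm` and `source` columns), is to add `MAX(Dbm)` to the aggregate. The inline sample rows are a subset of the question's data for sources 1 and 2, used only to keep the example self-contained.

```sql
-- Hypothetical stand-in for the real table: a few rows from the question's sample.
WITH t AS (
    SELECT CAST(v.[date] AS datetime) AS [date], v.Dbm, v.source
    FROM (VALUES ('2013-06-17T11:36:39.477', -85, 1),
                 ('2013-06-17T11:37:39.503', -81, 1),
                 ('2013-06-17T11:44:39.690', -84, 1),
                 ('2013-06-17T11:36:39.487', -76, 2),
                 ('2013-06-17T11:37:39.533', -77, 2)) AS v([date], Dbm, source)
),
flagged AS (
    SELECT t.*,
           CASE WHEN DATEDIFF(second,
                              LAG([date]) OVER (PARTITION BY source ORDER BY [date]),
                              [date]) < 300
                THEN 0
                ELSE 1
           END AS NewStart
    FROM t
),
grouped AS (
    SELECT f.*,
           SUM(f.NewStart) OVER (PARTITION BY f.source ORDER BY f.[date]) AS Grp
    FROM flagged AS f
)
SELECT source AS MacId,
       MIN([date]) AS [start],
       MAX([date]) AS [end],
       DATEDIFF(minute, MIN([date]), MAX([date])) AS duration,
       MAX(Dbm) AS Dbm          -- highest reading seen during the visit
FROM grouped
GROUP BY source, Grp
ORDER BY MacId, [start];
-- Source 1 yields two visits (Dbm -81, then -84); source 2 yields one (Dbm -76).
```

To store the result, the final `SELECT` could be placed after an `INSERT INTO` for the target table, with the column list matched to that table.
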
answered 2013-06-18T16:00:18.910