I have a table of historical bus positions at given times, recorded once per second. The schema looks like this:

BusID        int         not null,
BreadcrumbID int         not null identity (1, 1),
BusStopID    int         null,
Timestamp    datetime    not null

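I want to generate a bus stop schedule from the historical trips. A bus is "at a stop" if its BusStopID corresponds to that stop, and is not "at a stop" if BusStopID is null.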

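I need to generate the average time at which the bus is at each stop. So basically, I need to do the following:

  • Identify the times at which the bus is at a stop: a simple where clause will do
  • Determine the average time at which the bus is at the stop. For my purposes, I define a discrete "stop time" as a window of plus or minus 10 minutes: if a bus stops from 10:04 to 10:08 one day, from 10:06 to 10:08 another day, and from 10:14 to 10:18 a third day, those would all be the same stop, but if it stopped from 10:45 to 10:48, that would be a different stop event.
  • Filter out "noise", i.e. stop times that occurred only a few times and never again

I am completely at a loss as to how to accomplish the second and third bullets. Please help!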

4 Answers

This post, which I just came across, may help you. (SQL Server Central)

Answered 2010-12-07T14:59:58.873

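I have done something similar on a number of occasions. Essentially, it is grouping based on separations within a complex ordering. The basis of the approach I use for this problem is as follows: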

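  1. Build a table of all time ranges of interest.
  2. Find the start time for each group of time ranges of interest.
  3. Find the end time for each group of time ranges of interest.
  4. Join the start and end times to the list of time ranges, and group.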

Or, in more detail: (each of these steps could be part of one big CTE, but I have broken them out into temp tables for readability...)

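Step 1: Find the list of all time ranges of interest (I used a method similar to the one @Brad linked to). Note: as @Manfred Sorg pointed out, this assumes there are no "missing seconds" in the bus data. If there is a break in the timestamps, this code will interpret a single range as two (or more) distinct ranges.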

;with stopSeconds as (
  select BusID, BusStopID, TimeStamp,
         [date] = cast(datediff(dd,0,TimeStamp) as datetime),
         [grp] = dateadd(ss, -row_number() over(partition by BusID order by TimeStamp), TimeStamp)
  from #test
  where BusStopID is not null
)
select BusID, BusStopID, date,
       [sTime] = dateadd(ss,datediff(ss,date,min(TimeStamp)), 0),
       [eTime] = dateadd(ss,datediff(ss,date,max(TimeStamp)), 0),
       [secondsOfStop] = datediff(ss, min(TimeStamp), max(Timestamp)),
       [sOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss,date,min(TimeStamp))),
       [eOrd] = row_number() over(partition by BusID, BusStopID order by datediff(ss,date,max(TimeStamp)))
into #ranges
from stopSeconds
group by BusID, BusStopID, date, grp
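
The grouping trick in Step 1 may be easier to see outside SQL. The following is a minimal Python sketch, not part of the original answer, for a single bus at a single stop: subtracting a running row number (in seconds) from each timestamp gives the same key for every sample in an unbroken run of one-second readings, so grouping by that key recovers each visit. The function name `stop_ranges` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby


def stop_ranges(timestamps):
    """Collapse per-second timestamps into (start, end) runs.

    Assumption: one reading per second with no duplicates, as in the
    question. A missing second splits a visit into two runs.
    """
    ordered = sorted(timestamps)
    # timestamp minus row number (in seconds) is constant within a run
    keyed = [(ts - timedelta(seconds=i), ts)
             for i, ts in enumerate(ordered, start=1)]
    ranges = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        run = [ts for _, ts in group]
        ranges.append((run[0], run[-1]))
    return ranges


# Hypothetical data: four consecutive seconds, a gap, then two more.
base = datetime(2010, 1, 1, 10, 4, 0)
samples = [base + timedelta(seconds=s) for s in (0, 1, 2, 3, 10, 11)]
ranges = stop_ranges(samples)
```

With this data the gap after the fourth reading produces two separate ranges, which is the "missing seconds" behaviour described in the note above.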

Step 2: Find the earliest time for each stop

select this.BusID, this.BusStopID, this.sTime minSTime,
       [stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.sTime)
into #starts
from #ranges this
  left join #ranges prev on this.BusID = prev.BusID
                        and this.BusStopID = prev.BusStopID
                        and this.sOrd = prev.sOrd+1
                        and this.sTime between dateadd(mi,-10,prev.sTime) and dateadd(mi,10,prev.sTime)
where prev.BusID is null

Step 3: Find the latest time for each stop

select this.BusID, this.BusStopID, this.eTime maxETime,
       [stopOrder] = row_number() over(partition by this.BusID, this.BusStopID order by this.eTime)
into #ends
from #ranges this
  left join #ranges next on this.BusID = next.BusID
                        and this.BusStopID = next.BusStopID
                        and this.eOrd = next.eOrd-1
                        and this.eTime between dateadd(mi,-10,next.eTime) and dateadd(mi,10,next.eTime)
where next.BusID is null

Step 4: Join everything together

select r.BusID, r.BusStopID,
       [avgLengthOfStop] = avg(datediff(ss,r.sTime,r.eTime)),
       [earliestStop] = min(r.sTime),
       [latestDepart] = max(r.eTime)
from #starts s
  join #ends e on s.BusID=e.BusID
              and s.BusStopID=e.BusStopID
              and s.stopOrder=e.stopOrder
  join #ranges r on r.BusID=s.BusID
                and r.BusStopID=s.BusStopID
                and r.sTime between s.minSTime and e.maxETime
                and r.eTime between s.minSTime and e.maxETime
group by r.BusID, r.BusStopID, s.stopOrder
having count(distinct r.date) > 1 --filters out the "noise"
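
As a rough illustration of what Steps 2 to 4 achieve, here is a simplified Python sketch using the example times from the question. It is an assumption-laden simplification rather than a translation of the SQL: it chains visits on their start times only (the SQL finds group boundaries separately for start and end times), and it handles one bus at one stop. The names `cluster_visits`, `average_stop_seconds` and `min_days` are hypothetical.

```python
from datetime import date, time

WINDOW_SECONDS = 10 * 60  # the 10-minute window from the question


def seconds_of_day(t):
    return t.hour * 3600 + t.minute * 60 + t.second


def cluster_visits(visits, min_days=2):
    """Group (day, start, end) visits whose start times chain within the window.

    Clusters seen on fewer than min_days distinct dates are dropped,
    mirroring `having count(distinct r.date) > 1`.
    """
    ordered = sorted(visits, key=lambda v: seconds_of_day(v[1]))
    clusters = []
    for visit in ordered:
        start = seconds_of_day(visit[1])
        if clusters and start - seconds_of_day(clusters[-1][-1][1]) <= WINDOW_SECONDS:
            clusters[-1].append(visit)   # close enough to the previous start
        else:
            clusters.append([visit])     # begins a new stop event
    return [c for c in clusters if len({day for day, _, _ in c}) >= min_days]


def average_stop_seconds(cluster):
    lengths = [seconds_of_day(end) - seconds_of_day(start)
               for _, start, end in cluster]
    return sum(lengths) / len(lengths)


# The question's example: three nearby visits plus a one-off at 10:45.
visits = [
    (date(2010, 1, 1), time(10, 4), time(10, 8)),
    (date(2010, 1, 2), time(10, 6), time(10, 8)),
    (date(2010, 1, 3), time(10, 14), time(10, 18)),
    (date(2010, 1, 4), time(10, 45), time(10, 48)),
]
kept = cluster_visits(visits)
```

Here the first three visits end up in one cluster, and the 10:45 visit is discarded as noise because it occurs on only one date.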

Finally, for completeness, clean up:

drop table #ends
drop table #starts
drop table #ranges
Answered 2010-12-07T19:42:18.377

Fresh answer...

Try this:

DECLARE @stopWindowMinutes INT
SET @stopWindowMinutes = 10

--
;
WITH    test_data
          AS ( SELECT   1 [BusStopId]
                       ,'2010-01-01 10:00:04' [Timestamp]
               UNION SELECT   1,'2010-01-01 10:00:05'
               UNION SELECT   1,'2010-01-01 10:00:06'
               UNION SELECT   1,'2010-01-01 10:00:07'
               UNION SELECT   1,'2010-01-01 10:00:08'
               UNION SELECT   1,'2010-01-02 10:00:06'
               UNION SELECT   1,'2010-01-02 10:00:07'
               UNION SELECT   1,'2010-01-02 10:00:08'
               UNION SELECT   2,'2010-01-01 10:00:06'
               UNION SELECT   2,'2010-01-01 10:00:07'
               UNION SELECT   2,'2010-01-01 10:00:08'
               UNION SELECT   2,'2010-01-01 10:00:09'
               UNION SELECT   2,'2010-01-01 10:00:10'
               UNION SELECT   2,'2010-01-01 10:00:09'
               UNION SELECT   2,'2010-01-01 10:00:10'
               UNION SELECT   2,'2010-01-01 10:00:11'
               UNION SELECT   1,'2010-01-02 10:33:43'
               UNION SELECT   1,'2010-01-02 10:33:44'
               UNION SELECT   1,'2010-01-02 10:33:45'
               UNION SELECT   1,'2010-01-02 10:33:46'
             )
    SELECT DISTINCT
            [BusStopId]
           ,[AvgStop]
    FROM    ( SELECT    [a].[BusStopId]
                       ,( SELECT    MIN([b].[Timestamp])
                          FROM      [test_data] b
                          WHERE     [a].[BusStopId] = [b].[BusStopId]
                                    AND CONVERT(VARCHAR(10), [a].[Timestamp], 120) = CONVERT(VARCHAR(10), [b].[Timestamp], 120)
                                    AND [b].[Timestamp] BETWEEN DATEADD(SECOND, -@stopWindowMinutes * 60,
                                                                        [a].[Timestamp])
                                                        AND     DATEADD(SECOND, @stopWindowMinutes * 60, [a].[Timestamp]) -- w/i X minutes

                        ) [MinStop]
                       ,( SELECT    MAX([b].[Timestamp])
                          FROM      [test_data] b
                          WHERE     [a].[BusStopId] = [b].[BusStopId]
                                    AND CONVERT(VARCHAR(10), [a].[Timestamp], 120) = CONVERT(VARCHAR(10), [b].[Timestamp], 120)
                                    AND [b].[Timestamp] BETWEEN DATEADD(SECOND, -@stopWindowMinutes * 60,
                                                                        [a].[Timestamp])
                                                        AND     DATEADD(SECOND, @stopWindowMinutes * 60, [a].[Timestamp]) -- w/i X minutes

                        ) [MaxStop]
                       ,( SELECT    DATEADD(second,
                                            AVG(DATEDIFF(second, CONVERT(VARCHAR(10), [b].[Timestamp], 120),
                                                         [b].[Timestamp])),
                                            CONVERT(VARCHAR(10), MIN([b].[Timestamp]), 120))
                          FROM      [test_data] b
                          WHERE     [a].[BusStopId] = [b].[BusStopId]
                                    AND CONVERT(VARCHAR(10), [a].[Timestamp], 120) = CONVERT(VARCHAR(10), [b].[Timestamp], 120)
                                    AND [b].[Timestamp] BETWEEN DATEADD(SECOND, -@stopWindowMinutes * 60,
                                                                        [a].[Timestamp])
                                                        AND     DATEADD(SECOND, @stopWindowMinutes * 60, [a].[Timestamp]) -- w/i X minutes

                        ) [AvgStop]
              FROM      [test_data] a
              WHERE     CONVERT(VARCHAR(10), [Timestamp], 120) = CONVERT(VARCHAR(10), [Timestamp], 120)
              GROUP BY  [a].[BusStopId]
                       ,[a].[Timestamp]
            ) subset1
Answered 2010-12-07T13:29:30.613

Problems like this are usually easier to solve and maintain when broken into small pieces:

-- Split into Date and minutes-since-midnight
WITH observed(dates,arrival,busstop,bus) AS (
    SELECT
        CONVERT(CHAR(8), TimeStamp, 112),
        DATEPART(HOUR,TimeStamp) * 60 + DATEPART(MINUTE,TimeStamp),
        busstopid,
        busid
    FROM
        History
),
-- Identify times at stop subsequent to arrival at that stop
atstop(dates,stoptime,busstop,bus) AS (
    SELECT
        a.dates,
        a.arrival,
        a.busstop,
        a.bus
    FROM
        observed a 
    WHERE
        EXISTS (
            SELECT 
                *
            FROM
                observed b
            WHERE
                a.dates = b.dates AND
                a.busstop = b.busstop AND
                a.bus = b.bus AND
                a.arrival - b.arrival BETWEEN 1 AND 10
        )
),
-- Isolate actual arrivals at stops, excluding waiting at stops
dailyhalts(dates,arrival,busstop,bus) AS (
    SELECT
        a.dates,a.arrival,a.busstop,a.bus
    FROM
        observed a 
    WHERE
        arrival NOT IN (
            SELECT
                stoptime
            FROM 
                atstop b 
            WHERE
                a.dates = b.dates AND
                a.busstop = b.busstop AND
                a.bus = b.bus 
    )
),
-- Merge arrivals across all dates
timetable(busstop,bus,arrival) AS (
    SELECT
        a.busstop, a.bus, a.arrival
    FROM
        dailyhalts a 
    WHERE
        NOT EXISTS (
            SELECT  
                *
            FROM
                dailyhalts h 
            WHERE
                a.busstop = h.busstop AND
                a.bus = h.bus AND
                a.arrival - h.arrival BETWEEN 1 AND 10
        )
    GROUP BY
        a.busstop, a.bus, a.arrival
)
-- Print timetable for a given day
SELECT
    a.busstop, a.bus, a.arrival, DATEADD(minute,AVG(b.arrival),'2010/01/01')
FROM
    timetable a INNER JOIN
    observed b ON
        a.busstop = b.busstop AND
        a.bus = b.bus AND
        b.arrival BETWEEN a.arrival AND a.arrival + 10
GROUP BY
    a.busstop, a.bus, a.arrival
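
The final SELECT averages minutes-since-midnight values and converts the result back into a time on a fixed anchor date. A small Python sketch of that arithmetic, offered only as an illustration, is shown here; the helper names are hypothetical, and the data are the five stop-1 observations between 10:00 and 10:10 from the sample input.

```python
from datetime import datetime, timedelta


def minutes_since_midnight(ts):
    # Same idea as DATEPART(HOUR, ts) * 60 + DATEPART(MINUTE, ts)
    return ts.hour * 60 + ts.minute


def average_time_of_day(timestamps, anchor=datetime(2010, 1, 1)):
    minutes = [minutes_since_midnight(ts) for ts in timestamps]
    # Integer division, since AVG over int values in T-SQL returns an int
    avg = sum(minutes) // len(minutes)
    return anchor + timedelta(minutes=avg)


observed = [
    datetime(2010, 1, 1, 10, 0),
    datetime(2010, 1, 1, 10, 1),
    datetime(2010, 1, 1, 10, 2),
    datetime(2010, 1, 3, 10, 3),
    datetime(2010, 1, 3, 10, 5),
]
result = average_time_of_day(observed)
```

The minutes are 600, 601, 602, 603 and 605, whose integer average is 602, so `result` is 10:02 on the anchor date.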

Input:

ID  BusID   BusStopID   TimeStamp
1   1   1   2010-01-01 10:00:00.000
2   1   1   2010-01-01 10:01:00.000
3   1   1   2010-01-01 10:02:00.000
4   1   2   2010-01-01 11:00:00.000
5   1   3   2010-01-01 12:00:00.000
6   1   3   2010-01-01 12:01:00.000
7   1   3   2010-01-01 12:02:00.000
8   1   3   2010-01-01 12:03:00.000
9   1   1   2010-01-02 11:00:00.000
10  1   1   2010-01-02 11:03:00.000
11  1   1   2010-01-02 11:07:00.000
12  1   2   2010-01-02 12:00:00.000
13  1   3   2010-01-02 13:00:00.000
14  1   3   2010-01-02 13:01:00.000
15  1   1   2010-01-03 10:03:00.000
16  1   1   2010-01-03 10:05:00.000

Output:

busstop bus arrival (No column name)
1   1   600 2010-01-01 10:02:00.000
1   1   660 2010-01-01 11:03:00.000
2   1   660 2010-01-01 11:00:00.000
2   1   720 2010-01-01 12:00:00.000
3   1   720 2010-01-01 12:01:00.000
3   1   780 2010-01-01 13:00:00.000
Answered 2010-12-08T12:08:16.673