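I have a SQL table containing GPS coordinates from devices, updated every n minutes (the devices are installed in vehicles). Given the nature of GPS, many entries are very similar, but as far as the server is concerned they are completely different. I can easily match things approximately (within ~3.6' or 36') with CAST(lat as decimal(7,4)).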
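
As a minimal sketch of what that cast does in SQL Server, the query below compares two latitude readings that differ only in the fifth decimal place. The literal values and column aliases are chosen for illustration. Converting a decimal to a lower scale rounds rather than truncates, so two nearby readings that sit on opposite sides of a rounding boundary can still end up in different buckets.

```sql
-- Illustration only: the aliases are hypothetical.
SELECT CAST(31.12345 AS DECIMAL(7,4)) AS LatA,
       CAST(31.12346 AS DECIMAL(7,4)) AS LatB,
       CASE WHEN CAST(31.12345 AS DECIMAL(7,4)) = CAST(31.12346 AS DECIMAL(7,4))
            THEN 'same bucket'
            ELSE 'different bucket'
       END AS Comparison;
```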

I would like to take a result set and collapse the near-duplicate entries while still keeping the time-based order. Here is an example:

Row    Lat         Lng        vel Hdg Time
01    31.12345    -88.12345   00  00  12-4-21 01:45:00
02    31.12346    -88.12345   00  00  12-4-21 01:46:00
03    31.12455    -88.12410   10  01  12-4-21 01:47:00
04    31.12495    -88.12480   17  01  12-4-21 01:48:00
05    31.12532    -88.12560   22  01  12-4-21 01:49:00
06    31.12567    -88.12608   25  02  12-4-21 01:50:00
07    31.12638    -88.12672   24  02  12-4-21 01:51:00
08    31.12689    -88.12722   19  02  12-4-21 01:52:00
09    31.12345    -88.12345   00  00  12-4-21 01:53:00
10    31.12346    -88.12346   00  00  12-4-21 01:54:00
11    31.12347    -88.12345   00  00  12-4-21 01:55:00
12    31.12346    -88.12346   00  00  12-4-21 01:56:00
13    31.12689    -88.12788   10  40  12-4-21 01:57:00
14    31.12604    -88.12691   13  39  12-4-21 01:58:00
15    31.12572    -88.12603   15  39  12-4-21 01:59:00

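The end result I want is to collapse rows 1 and 2 into one row, and rows 9 through 12 into one row, containing AVG(Lat), AVG(Lng), and MIN(Time).

Given the data above, this is the result set I would like to receive: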

Row    Lat         Lng        vel Hdg Time
01    31.123455   -88.12345   00  00  12-4-21 01:45:00
02    31.12455    -88.12410   10  01  12-4-21 01:47:00
03    31.12495    -88.12480   17  01  12-4-21 01:48:00
04    31.12532    -88.12560   22  01  12-4-21 01:49:00
05    31.12567    -88.12608   25  02  12-4-21 01:50:00
06    31.12638    -88.12672   24  02  12-4-21 01:51:00
07    31.12689    -88.12722   19  02  12-4-21 01:52:00
08    31.12346    -88.123455  00  00  12-4-21 01:53:00
09    31.12689    -88.12788   10  40  12-4-21 01:57:00
10    31.12604    -88.12691   13  39  12-4-21 01:58:00
11    31.12572    -88.12603   15  39  12-4-21 01:59:00

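The boundary between groups would be motion: velocity > 0, or the GPS coordinates changing by more than some amount x. In this case, x is 0.0001. As described below, the problem is that multiple stops at a given coordinate (at different times) get lumped into a single stop. If I visit coordinate x today at 4 PM, tomorrow at 8 AM, and then again at 6 PM, the only one I see is tomorrow at 6 PM (in the case of MAX(Time)) or today at 4 PM (in the case of MIN(Time)).

If velocity is 0, then heading is 0 as well. What matters, however, is whether the coordinates of rows 1 and 2, and of rows 9 through 12, are similar enough to be treated as the same (i.e., equal when rounded to 4 decimal places).

I have a query that does this: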

SELECT Geography::Point(AVG(dbo.GPSEntries.Latitude), 
                        AVG(dbo.GPSEntries.Longitude),
                        4326 ) as Location,
       dbo.GPSEntries.Velocity,
       dbo.GPSEntries.Heading,
       MAX(dbo.GPSEntries.Time) as maxTime,
       MIN(dbo.GPSEntries.Time) as minTime,
       AVG(dbo.RFDatas.RSSI) as avgRSSI,
       COUNT(1) as samples

FROM dbo.GPSEntries
     INNER JOIN
         dbo.Reports ON
             dbo.GPSEntries.Report_Id = dbo.Reports.Id 
     INNER JOIN
         dbo.RFDatas ON
             dbo.GPSEntries.Report_Id = dbo.RFDatas.Report_Id

GROUP BY CAST(Latitude as Decimal(7,4)),
         CAST(Longitude as Decimal(7,4)),
         Velocity,
         Heading

ORDER BY MAX(Time)

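In other words, if I travel from point A to point B and stay for 30 minutes (30 reports at 1 report per minute), then go to point C and stay for 20 minutes, then return to point B and stay another 20 minutes before heading to point D, I want to be able to see two separate stops at point B.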
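
To make the problem concrete, here is a small self-contained sketch with made-up data: two stops at the same spot on different days, separated by a drive. The table variable `@Visits` and its rows are hypothetical and only mirror the shape of dbo.GPSEntries. Because the grouping key is the rounded position (plus velocity) and ignores time, both stops should fall into a single group whose MIN and MAX times span both days.

```sql
-- Hypothetical data: two separate stops at the same spot (point B),
-- with a drive to somewhere else in between.
DECLARE @Visits TABLE
(
  Latitude  DECIMAL(8,5),
  Longitude DECIMAL(8,5),
  Velocity  TINYINT,
  [Time]    DATETIME
);

INSERT @Visits VALUES
 (34.76324, -86.30726,  0, '2012-06-12 16:00:00'),  -- first stop at B
 (34.76323, -86.30728,  0, '2012-06-12 16:01:00'),
 (34.75035, -86.30575, 38, '2012-06-12 16:02:00'),  -- driving away
 (34.76324, -86.30727,  0, '2012-06-13 08:00:00'),  -- second stop at B
 (34.76322, -86.30728,  0, '2012-06-13 08:01:00');

-- Grouping on rounded position ignores time, so the two stops at B
-- are not kept apart.
SELECT CAST(Latitude  AS DECIMAL(7,4)) AS NormLat,
       CAST(Longitude AS DECIMAL(7,4)) AS NormLong,
       MIN([Time]) AS minTime,
       MAX([Time]) AS maxTime,
       COUNT(*)    AS samples
FROM @Visits
GROUP BY CAST(Latitude  AS DECIMAL(7,4)),
         CAST(Longitude AS DECIMAL(7,4)),
         Velocity
ORDER BY MAX([Time]);
```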

Here is some actual data from my database, sanitized to protect the innocent, or to incriminate someone in northeast Alabama.

Latitude    Longitude   Spd Vel MAX(Time)               MIN(Time)                sig RowCount    
34.747420   -86.302580  68  157 2012-06-13 01:31:37.000 2012-06-13 01:31:37.000  -91   1
34.759140   -86.307620  61  134 2012-06-13 01:33:06.000 2012-06-13 01:33:06.000  -91   2
34.763237   -86.307264  0   0   2012-06-13 01:34:36.000 2012-06-12 01:27:21.000  -97   7
34.763288   -86.307280  0   0   2012-06-13 14:30:44.000 2012-06-12 01:30:21.000  -98 527
34.760220   -86.308200  38  110 2012-06-13 14:33:44.000 2012-06-13 14:33:44.000  -98   1
34.750350   -86.305750  5   90  2012-06-13 14:35:13.000 2012-06-13 14:35:13.000  -83   2
34.737160   -86.298040  70  88  2012-06-13 14:36:43.000 2012-06-13 14:36:43.000  -80   1
34.736420   -86.277270  120 33  2012-06-13 14:38:13.000 2012-06-13 14:38:13.000  -87   2
34.747090   -86.248370  120 37  2012-06-13 14:39:43.000 2012-06-13 14:39:43.000  -93   2
34.755620   -86.240640  70  179 2012-06-13 14:41:13.000 2012-06-13 14:41:13.000  -81   1
34.771240   -86.242760  70  0   2012-06-13 14:42:42.000 2012-06-13 14:42:42.000  -88   2
34.785510   -86.245710  70  6   2012-06-13 14:44:12.000 2012-06-13 14:44:12.000  -99   2
34.800220   -86.239400  70  1   2012-06-13 14:45:42.000 2012-06-13 14:45:42.000  -86   1
34.815070   -86.232180  70  16  2012-06-13 14:47:12.000 2012-06-13 14:47:12.000  -98   2
34.824540   -86.226198  0   0   2012-06-13 14:51:41.000 2012-06-13 00:13:48.000 -101   9
34.824579   -86.226171  0   0   2012-06-14 00:26:19.000 2012-06-12 00:46:57.000  -99 168

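You will notice that the 4th row and the last row have 527 and 168 entries respectively, and that they span 2 days. These entries come from only 1 device, from places where the device stopped at the same location several times, for hours each time.

Here is some zipped CSV data: sample

What I ended up doing

I made a few small modifications to the query provided by Aaron Bertrand, as follows: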

WITH d AS
(
  SELECT Time
        ,Latitude
        ,Longitude
        ,Velocity
        ,Heading
        ,TimeRN = ROW_NUMBER() OVER (ORDER BY [Time])
  FROM dbo.GPSEntries
  GROUP BY Time, Latitude, Longitude, Velocity, Heading
),
y AS (
  SELECT BeginTime  = MIN(Time)
        ,EndTime    = MAX(Time)
        ,Latitude   = AVG(Latitude)
        ,Longitude  = AVG(Longitude)
--      ,[RowCount] = COUNT(*)
        ,GroupNumber
  FROM ( 
    SELECT  Time
           ,Latitude
           ,Longitude
           ,GroupNumber = ( 
              SELECT MIN(d2.TimeRN)
              FROM d AS d2
              WHERE d2.TimeRN >= d.TimeRN AND
              NOT EXISTS ( 
                SELECT 1
                FROM d AS d3    -- Between 250 and 337 feet
                WHERE ABS(d2.Latitude - d.Latitude) <= .0007 AND   
                      ABS(d2.Longitude - d.Longitude) <= .0007 AND
                      d2.Velocity = d.Velocity ) )
    FROM d ) AS x
  GROUP BY GroupNumber
)
SELECT y.Latitude
      ,y.Longitude
      ,d.Velocity
      ,d.Heading
      ,y.BeginTime
--    ,y.EndTime
--    ,y.[RowCount]
--    ,Duration = CONVERT(time(0),DATEADD(SS,DATEDIFF(SS,y.BeginTime, y.EndTime), '0:00:00'), 108)
FROM y INNER JOIN d ON y.BeginTime = d.[Time]
-- FOR STOPS (5 minute):
-- WHERE DATEDIFF(MI, Y.BeginTime, y.EndTime) + 1 > 5
ORDER BY y.BeginTime;
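
A tolerance expressed in degrees covers a different ground distance north-south than east-west, because a degree of longitude shrinks as latitude increases. The following sketch, which is not part of the query above, uses the geography type's STDistance method to measure what a 0.0007 degree offset amounts to near one of the stationary readings in the data. The variable names are made up for illustration.

```sql
-- Sketch: measure what a 0.0007-degree offset means on the ground.
-- The base point is one of the stationary readings from the data above.
DECLARE @base  geography = geography::Point(34.763237, -86.307264, 4326);
DECLARE @north geography = geography::Point(34.763237 + 0.0007, -86.307264, 4326);
DECLARE @east  geography = geography::Point(34.763237, -86.307264 + 0.0007, 4326);

-- STDistance returns meters for SRID 4326; multiply by 3.28084 for feet.
SELECT @base.STDistance(@north) * 3.28084 AS FeetPerLatStep,
       @base.STDistance(@east)  * 3.28084 AS FeetPerLngStep;
```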
1 Answer

Here is some sample data in tempdb:

USE tempdb;
GO

CREATE TABLE dbo.GPSEntries
( 
  Latitude DECIMAL(8,5), 
  Longitude DECIMAL(8,5), 
  Velocity TINYINT, 
  Heading TINYINT, 
  [Time] SMALLDATETIME
);

INSERT dbo.GPSEntries VALUES
 (31.12345,-88.12345,00,00,'2012-04-21 01:45:00'),
 (31.12346,-88.12345,00,00,'2012-04-21 01:46:00'),
 (31.12455,-88.12410,10,01,'2012-04-21 01:47:00'),
 (31.12495,-88.12480,17,01,'2012-04-21 01:48:00'),
 (31.12532,-88.12560,22,01,'2012-04-21 01:49:00'),
 (31.12567,-88.12608,25,02,'2012-04-21 01:50:00'),
 (31.12638,-88.12672,24,02,'2012-04-21 01:51:00'),
 (31.12689,-88.12722,19,02,'2012-04-21 01:52:00'),
 (31.12345,-88.12345,00,00,'2012-04-21 01:53:00'),
 (31.12346,-88.12346,00,00,'2012-04-21 01:54:00'),
 (31.12347,-88.12345,00,00,'2012-04-21 01:55:00'),
 (31.12346,-88.12346,00,00,'2012-04-21 01:56:00'),
 (31.12689,-88.12788,10,40,'2012-04-21 01:57:00'),
 (31.12604,-88.12691,13,39,'2012-04-21 01:58:00'),
 (31.12572,-88.12603,15,39,'2012-04-21 01:59:00');

My attempt at a query that satisfies the requirements:

;WITH d AS
(
    SELECT Time, Latitude, Longitude, Velocity, Heading, 
        NormLat = CONVERT(DECIMAL(7,4), Latitude), 
        NormLong = CONVERT(DECIMAL(7,4), Longitude),
        TimeRN = ROW_NUMBER() OVER (ORDER BY [Time])
    FROM dbo.GPSEntries
    -- you probably want filters:
    -- WHERE DeviceID = @SomeDeviceID
    -- AND [Time] >= @SomeStartDate
    -- AND [Time] <  DATEADD(DAY, 1, @SomeEndDate)
    -- also, your sample CSV file had lots of duplicates, so:
    GROUP BY Time, Latitude, Longitude, Velocity, Heading
),
y AS (
  SELECT MinTime = MIN(Time), MaxTime = MAX(Time), Latitude = AVG(Latitude), 
    Longitude = AVG(Longitude), [RowCount] = COUNT(*) FROM 
    (
      SELECT Time, Latitude, Longitude, GroupNumber = 
      (
        SELECT MIN(d2.TimeRN) 
         FROM d AS d2 WHERE d2.TimeRN >= d.TimeRN 
         AND NOT EXISTS 
         (
           SELECT 1 FROM d AS d3
           WHERE d2.NormLat = d.NormLat
           AND d2.NormLong = d.NormLong
         )
       )
       FROM d
    ) AS x GROUP BY GroupNumber
)
SELECT [Row] = ROW_NUMBER() OVER (ORDER BY y.MinTime),
  y.Latitude, y.Longitude, d.Velocity, d.Heading, 
  y.MinTime, y.MaxTime, y.[RowCount]
FROM y INNER JOIN d ON y.MinTime = d.[Time]
ORDER BY y.MinTime;

Results:

Row Latitude  Longitude  Velocity Heading MinTime          MaxTime          RowCount
---|---------|----------|--------|-------|----------------|----------------|--------
1   31.123455 -88.123450   0        0     2012-04-21 01:45 2012-04-21 01:46   2
2   31.124550 -88.124100   10       1     2012-04-21 01:47 2012-04-21 01:47   1
3   31.124950 -88.124800   17       1     2012-04-21 01:48 2012-04-21 01:48   1
4   31.125320 -88.125600   22       1     2012-04-21 01:49 2012-04-21 01:49   1
5   31.125670 -88.126080   25       2     2012-04-21 01:50 2012-04-21 01:50   1
6   31.126380 -88.126720   24       2     2012-04-21 01:51 2012-04-21 01:51   1
7   31.126890 -88.127220   19       2     2012-04-21 01:52 2012-04-21 01:52   1
8   31.123460 -88.123455   0        0     2012-04-21 01:53 2012-04-21 01:56   4
9   31.126890 -88.127880   10       40    2012-04-21 01:57 2012-04-21 01:57   1
10  31.126040 -88.126910   13       39    2012-04-21 01:58 2012-04-21 01:58   1
11  31.125720 -88.126030   15       39    2012-04-21 01:59 2012-04-21 01:59   1
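
For comparison, here is an alternative sketch that is not part of the query above. It flags each row that starts a new group and then numbers the groups with a running sum, a pattern often called "gaps and islands". It assumes SQL Server 2012 or later (for LAG and the ROWS frame), runs against the sample dbo.GPSEntries table, and treats `@x` as the position tolerance of 0.0001 from the question. The CTE names and the `IsNewGroup` column are made up for illustration. It is intended to produce the same grouping on the sample data, but it compares each row with the previous row instead of with a rounded bucket, so a slow drift could chain many rows into one group.

```sql
-- Alternative sketch (assumes SQL Server 2012+).
-- @x is the position tolerance in degrees, taken from the question.
DECLARE @x DECIMAL(8,5) = 0.0001;

WITH prev AS
(
  SELECT [Time], Latitude, Longitude, Velocity, Heading,
         PrevLat = LAG(Latitude)  OVER (ORDER BY [Time]),
         PrevLng = LAG(Longitude) OVER (ORDER BY [Time]),
         PrevVel = LAG(Velocity)  OVER (ORDER BY [Time])
  FROM dbo.GPSEntries
),
flagged AS
(
  SELECT *,
         IsNewGroup = CASE
           WHEN PrevLat IS NULL                THEN 1  -- first row overall
           WHEN Velocity > 0                   THEN 1  -- this row is moving
           WHEN PrevVel > 0                    THEN 1  -- previous row was moving
           WHEN ABS(Latitude  - PrevLat) > @x  THEN 1  -- position changed
           WHEN ABS(Longitude - PrevLng) > @x  THEN 1
           ELSE 0
         END
  FROM prev
),
grouped AS
(
  -- The running total of the flags gives every row a group number.
  SELECT *,
         GroupNumber = SUM(IsNewGroup)
                       OVER (ORDER BY [Time] ROWS UNBOUNDED PRECEDING)
  FROM flagged
)
SELECT [Row]      = ROW_NUMBER() OVER (ORDER BY MIN([Time])),
       Latitude   = AVG(Latitude),
       Longitude  = AVG(Longitude),
       Velocity   = MIN(Velocity),
       Heading    = MIN(Heading),
       MinTime    = MIN([Time]),
       MaxTime    = MAX([Time]),
       [RowCount] = COUNT(*)
FROM grouped
GROUP BY GroupNumber
ORDER BY MinTime;
```

With more than one device in the table, each window would also need a PARTITION BY on the device column, which the sample table does not have.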
answered 2012-06-14T02:35:15.097