编辑:经过更多的故障排除后,我发现了以下内容 - 我将 lag(event_time) 添加到查询中以查看查询收集的日期并且得到同样奇怪的结果:
SELECT device_id,
event_time,
unix_time,
event_id,
lag(event_time) OVER (PARTITION BY device_id ORDER BY unix_time,event_id) AS lag_time
FROM ios_d_events;
对于某些 device_ids,这会按预期返回,但是某些 device_ids 会返回以下内容:
device_id | event_time | unix_time | event_id | lag_time
--------------------------------------+---------------------+-----------------+--------------------------------------+----------------------------
111111111111111111111111111111111111 | 2015-03-01 10:41:47 | 1425206507.0000 | 4C48BE67-31EB-4432-96EF-0F30B4191340 |
111111111111111111111111111111111111 | 2015-03-01 10:41:48 | 1425206508.0000 | A4AE33A2-6CDC-480C-A8D7-C4024810D236 | 2015-03-01 10:41:47
111111111111111111111111111111111111 | 2015-03-01 10:41:51 | 1425206511.0000 | 997614AE-CE7E-45F6-BCE3-93E70CD46609 | 2015-03-01 10:41:48
111111111111111111111111111111111111 | 2015-03-01 10:41:53 | 1425206513.0000 | 202DA38C-1823-4100-85AB-3DE139FB3CE3 | 2015-03-01 10:41:51
111111111111111111111111111111111111 | 2015-03-01 10:42:11 | 1425206531.0000 | 8DFA7938-2123-4978-A89D-6C92404B504D | 2015-03-01 10:41:53
111111111111111111111111111111111111 | 2015-03-01 10:42:12 | 1425206532.0000 | 463E9833-9526-4E76-A907-4651C13991A0 | 2015-03-01 10:42:11
111111111111111111111111111111111111 | 2015-03-01 10:42:14 | 1425206534.0000 | 8204696E-3DAA-423E-9031-EC80BFA1157E | 2015-03-01 10:42:12
111111111111111111111111111111111111 | 2015-03-01 10:42:20 | 1425206540.0000 | 10DB02E2-2D4F-4611-98D7-966074FBD398 | 2015-03-01 10:42:14
111111111111111111111111111111111111 | 2015-03-01 10:42:20 | 1425206540.0000 | 50535667-A02D-47F7-8320-86EAC4638964 | 2015-03-01 10:42:20
111111111111111111111111111111111111 | 2015-03-01 10:42:27 | 1425206547.0000 | 6C8BC79D-CB3E-4FE2-8EDD-B8421EA237FF | 2015-03-01 10:42:20
111111111111111111111111111111111111 | 2015-03-01 10:42:27 | 1425206547.0000 | A732E59D-2EEE-44AE-BB5B-D016E6E6C597 | 2015-03-01 10:42:27
111111111111111111111111111111111111 | 2015-03-01 10:42:27 | 1425206547.0000 | ABBE4184-65C6-41C1-AC0B-74828DDD8DE8 | 2015-03-01 10:42:27
111111111111111111111111111111111111 | 2015-03-01 10:42:40 | 1425206560.0000 | 03D0B5FF-9E4D-4F14-8169-7D1C93C617B3 | 2015-03-01 10:42:27
111111111111111111111111111111111111 | 2015-03-01 10:42:40 | 1425206560.0000 | 5C1DFE08-8081-4C84-9D8F-E29EBEB1C9D5 | 2015-03-01 10:42:40
111111111111111111111111111111111111 | 2015-03-01 10:42:40 | 1425206560.0000 | 82C50F8E-9790-4C27-8979-9484954934B4 | 2015-03-01 10:42:40
111111111111111111111111111111111111 | 2015-03-01 10:42:42 | 1425206562.0000 | 2D30722E-FB37-4563-B2CD-FA545D95AAB4 | 2015-03-01 10:42:40
111111111111111111111111111111111111 | 2015-03-01 10:42:49 | 1425206569.0000 | 7613F856-763E-4792-904E-8D89E2502710 | 2015-03-01 10:42:42
111111111111111111111111111111111111 | 2015-03-01 10:43:01 | 1425206581.0000 | DB39294A-E133-4A05-B367-210944965FA3 | 2015-03-01 10:42:49
111111111111111111111111111111111111 | 2015-03-01 10:43:02 | 1425206582.0000 | 61B8AE48-D5C8-4809-9C4F-56EEA45E3626 | 2015-03-01 10:43:01
111111111111111111111111111111111111 | 2015-03-01 10:43:02 | 1425206582.0000 | 82870AB0-08F1-403F-B805-836CC1454D1A | 2015-03-01 10:43:02
111111111111111111111111111111111111 | 2015-03-01 10:43:04 | 1425206584.0000 | BA24E540-0F5B-4BC4-B59B-29D729FA5C35 | 2015-03-01 10:43:02
999999999999999999999999999999999999 | 2015-05-13 16:40:19 | 1431535219.0000 | 25E20777-508D-4194-9324-BE8A44CE7B59 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:40:19 | 1431535219.0000 | 72DCEE64-CB3A-43CD-A949-FBE35C0A8873 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:40:21 | 1431535221.0000 | A0062926-C8A0-47ED-A7F0-65716893DDC0 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:40:25 | 1431535225.0000 | 0BCEABD0-DCEB-431C-A6FE-6E1243FC9890 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:40:34 | 1431535234.0000 | 8F72E8D1-A167-460B-9034-C4D1423CAF15 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:41:19 | 1431535279.0000 | 44C0214C-E13B-4CBB-92BB-4809EAD5DF15 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:41:20 | 1431535280.0000 | C8CBB801-E8F3-47FD-B9C9-50BD0DDFA356 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:41:22 | 1431535282.0000 | C3173090-8BB8-407F-BFF5-07B58BCFCA7B | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:41:48 | 1431535308.0000 | 91290A91-2394-4C5F-8687-6A74593F9FFB | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:41:48 | 1431535308.0000 | EF138C8C-93BC-4C29-8438-EB0214B3CFC4 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:41:52 | 1431535312.0000 | 76AA328F-D46D-4235-B780-2B05FCE02855 | 1970-01-03 09:15:58.430244
999999999999999999999999999999999999 | 2015-05-13 16:41:53 | 1431535313.0000 | 563E04CE-7DDB-4543-8F1D-C81625E839F8 | 1970-01-03 09:15:58.430244
看起来当 device_id 的前两个事件具有相同的时间戳时会发生这种情况。希望这能给某人一个线索?
OP
我正在尝试使用 Spark SQL 中的 lag() 函数来确定表中两个后续事件之间的时间长度。重要的列是 device_id、文本列、unix_time、数字时间戳和 event_id,每行都是唯一的。
我正在运行的查询:
SELECT device_id,
unix_time,
event_id,
unix_time - lag(unix_time)
OVER
(PARTITION BY device_id ORDER BY unix_time,event_id)
AS seconds_since_last_event
FROM ios_d_events;
在 Postgres 中,这给出了预期的结果 - 但是当在 Pyspark 中运行时,任何时候有两个事件具有相同的时间戳, seconds_since_last_event 被计算为一个很大的数字,即 -1435151676846888 或 -1431583545415023 或 25534 - 我不知道在哪里这些数字来自。
我尝试在查询中添加一个 if() 语句,如
if(unix_time = lag(unix_time)
OVER (PARTITION BY device_id ORDER BY unix_time,event_id),
0,
unix_time - lag(unix_time)
OVER (PARTITION BY device_id ORDER BY unix_time,event_id))
AS seconds_since_last_event
但我得到了相同的结果。有什么想法可能导致这种情况吗?