sql - SAS - Fuzzy-matching millisecond timestamps "Just Before" or "Just After" a given timestamp

Question

I am working with high-frequency financial data in SAS 9.3, where the timestamps (numeric, format=time12.3) have millisecond precision, for example:

 [h]:mm:ss:000.

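The previous code used a PROC SQL construct that I have seen a few times on various forums. You do a FULL JOIN on the (otherwise unnecessary) ID variable, compute the difference between the timestamps from the two data sets, and then keep only those records where time difference = MIN(time difference) within each group ID (which is otherwise necessary).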

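This also includes either of the constraints

"take the exact match or pull the next closest earlier record"

or

"take the exact match or pull the next farther record"

by group ID (numeric, sequential). The problem is that this is slow, the data are large (millions of records), and we need to do this for about 12 different timestamps.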

I would greatly appreciate it if anyone could point me to a faster way to do this!

Sample data (for a join on A = B or the closest A < B):

ObsID   TimeFromDataA   TimeFromDataB
1       5:21:18:157     5:22:03:291
2       11:04:09:222    11:04:09:223

... and so on ...

score 1 · Accepted Answer

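I don't have SAS in front of me, so I can't test this, but I'm fairly sure it should do the trick.

The example below should keep only the record from the second table that immediately precedes the timestamp in the original table. The main idea is to join everything in the second table that comes before the timestamp (alias B), then join the second table again (alias C) to see whether any record falls between alias B and alias A. If one does, alias B is not the closest record, so we drop that row from the final result with the WHERE clause (we keep a row only when c.groupid is missing).

This can then be modified to find the record that occurs just afterwards.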

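proc sql;
  select *
  from xxx      a
  left join yyy b  on b.groupid = a.groupid
                  and b.datetime < a.datetime
  /* c looks for a yyy record strictly between b and a;        */
  /* BETWEEN is inclusive and would always match b's own row   */
  left join yyy c  on c.groupid = a.groupid
                  and c.datetime > b.datetime
                  and c.datetime < a.datetime
  where c.groupid eq .;   /* nothing lies between b and a */
quit;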
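
The "just after" variant mentioned above can be sketched by mirroring the comparisons. This is an untested sketch that reuses the placeholder names from this answer (xxx, yyy, groupid, datetime). It uses strict inequalities rather than BETWEEN, because BETWEEN is inclusive and alias c would otherwise match the very row that alias b points to.

```sas
/* Sketch only: nearest LATER record in yyy for each record in xxx.      */
/* xxx, yyy, groupid and datetime are the placeholder names used above.  */
proc sql;
  select *
  from xxx      a
  left join yyy b  on b.groupid = a.groupid
                  and b.datetime > a.datetime
  /* c = any yyy record strictly between a and b */
  left join yyy c  on c.groupid = a.groupid
                  and c.datetime > a.datetime
                  and c.datetime < b.datetime
  where c.groupid is missing;   /* keep b only if nothing is closer to a */
quit;
```

Rows of xxx with no later record in yyy are still returned, with missing values in the b columns, because both joins are left joins.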

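I've assumed you are joining 2 different tables, but conceptually this would work even if you joined the same table to itself.

Edit: Oops, I misread the question. I didn't see that you also allow exact matches. I'll revise my answer tomorrow to account for that. In any case, this may be useful in the meantime.

Here is the revised code that takes exact matches into account: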

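proc sql;
  /* COALESCE() keeps the exact match (d) if there is one, else the nearest   */
  /* earlier record (b); the alias matched_datetime is an illustrative name   */
  select a.*,
         coalesce(d.datetime, b.datetime) as matched_datetime format=time12.3
  from xxx      a
  left join yyy b  on b.groupid = a.groupid
                  and b.datetime < a.datetime
  left join yyy c  on c.groupid = a.groupid
                  and c.datetime > b.datetime
                  and c.datetime < a.datetime
  left join yyy d  on d.groupid = a.groupid
                  and d.datetime = a.datetime
  /* one row per a: only the nearest earlier b survives, exact match or not */
  where c.groupid eq .;
quit;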

This is untested...
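
Since the queries are untested, a small data set makes it easier to check them by hand. The following is a minimal sketch that assumes the placeholder table and column names from this answer (xxx, yyy, groupid, datetime). The values are made up for illustration, and the timestamps are written with a decimal point before the milliseconds so that the TIME informat can read them.

```sas
/* Hypothetical test data: names follow the queries in this answer, */
/* the values are invented for illustration only.                   */
data xxx;
  input groupid datetime :time12.;
  format datetime time12.3;
  datalines;
1 5:21:18.157
2 11:04:09.222
;
run;

data yyy;
  input groupid datetime :time12.;
  format datetime time12.3;
  datalines;
1 5:20:00.000
1 5:21:18.100
1 5:22:03.291
2 11:04:09.222
2 11:04:09.223
;
run;
```

For the "exact match or nearest earlier record" rule, the intended match is 5:21:18.100 for group 1 (no exact match, so the closest earlier record) and 11:04:09.222 for group 2 (an exact match).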
