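I'm slowly working my way through a project that uses AWS Athena to process various log files. My goal is to use the log files for event correlation, so I need a way to select and display data from multiple tables, within a given time range, from a single SQL statement. Here is an example of what I'm trying to achieve: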

scada.timestamp             process.eventid                         scada.srcaddr   process.requestid                       scada.action
2017-03-16T07:25:46.000Z    c148e2ce-8500-467a-a970-ef1d43dd4aea    172.31.25.225   032bfafb-e8a3-4c06-a2dc-fa740abc135     ACCEPT
2017-03-16T07:25:46.000Z    8cc8143a-cf55-4db3-b112-0ff7f268edd0    172.31.25.225   f413e138-9445-408f-8124-ee6c33229889    ACCEPT

Here are data samples from the 2 tables:

Table 1:

SELECT eventtime, requestid, eventid FROM process_native limit 10;

        eventtime               requestid                               eventid
        2016-05-07T08:57:37Z    032bfafb-e8a3-4c06-a2dc-fa740abc135c    c148e2ce-8500-467a-a970-ef1d43dd4aea
        2016-05-07T08:57:37Z    f413e138-9445-408f-8124-ee6c33229889    8cc8143a-cf55-4db3-b112-0ff7f268edd0

Table 2:

SELECT tstart, srcaddr, action FROM scada_raw limit 10;

tstart      srcaddr         action
1489509010  139.59.39.211   REJECT
1489509010  172.31.20.111   ACCEPT

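Table 2 stores its time as Unix time, which complicates things a bit, so I need to convert it in order to work with one common time format:

Table 2 with the converted time: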

SELECT to_iso8601(from_unixtime(tstart)) as timestamp, srcaddr, action FROM scada_raw limit 10;

timestamp                   srcaddr         action
2017-03-16T07:25:46.000Z    172.31.25.225   ACCEPT
2017-03-16T07:25:46.000Z    172.31.25.225   ACCEPT
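
The same normalization can also be applied to table 1, so that both sides are compared as timestamps rather than as text. The query below is only a sketch: it assumes `process_native.eventtime` is stored as an ISO 8601 string (varchar), which the sample output suggests but does not confirm, and the `event_ts` alias is made up for illustration.

```sql
-- Assumption: eventtime is a varchar such as '2016-05-07T08:57:37Z'.
-- from_iso8601_timestamp parses it into a timestamp with time zone,
-- which can then be compared against from_unixtime(tstart) from scada_raw.
SELECT
    from_iso8601_timestamp(eventtime) AS event_ts,
    requestid,
    eventid
FROM process_native
LIMIT 10;
```

If `eventtime` is already a timestamp column, this conversion is unnecessary and the column can be used directly.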

Frankly, I have no idea how to do this :) Here is a query I came up with, which just times out:

SELECT process_native.eventid,
         process_native.requestid,
         scada_raw.srcaddr,
         scada_raw.action,
FROM process_native, scada_raw
WHERE scada_rawe.eventtime >= '2017-02-17T00:00:00Z'
        AND scada_raw.eventtime < '2017-03-20T00:00:00Z'

I really don't know where to go next. I've been working with SQL for all of 3 days and this is beyond me. Is my goal even achievable?

Thanks!

1 Answer

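Even though you can't guarantee that the timestamps will match up well enough for a proper join, you can still get the records to appear next to each other. For example: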

eventtime               requestid      eventid        srcaddr       action
2017-03-14 16:30:10.000                               139.59.39.211 REJECT
2017-03-14 16:30:10.000                               172.31.20.111 ACCEPT
2017-03-14 16:30:11.000 032bfafb-e8... c148e2ce-85...  
2017-03-14 16:30:11.000 f413e138-94... 8cc8143a-cf...  

From a query like this:

WITH TimelineRecords AS (
    SELECT 
        eventtime,
        requestid,
        eventid,
        NULL srcaddr,
        NULL action
    FROM
        process_native
    WHERE
        eventtime BETWEEN timestamp '2017-03-14 16:30:00' AND  timestamp '2017-03-14 16:35:00'
    UNION ALL
    SELECT
        from_unixtime(tstart) eventtime,
        NULL requestid,
        NULL eventid,
        srcaddr,
        action
    FROM
        scada_raw
    WHERE
        from_unixtime(tstart) BETWEEN timestamp '2017-03-14 16:30:00' AND  timestamp '2017-03-14 16:35:00'
)
SELECT
    *
FROM
    TimelineRecords
ORDER BY
    eventtime;

Sorry about the two WHERE clauses; Athena didn't like it when I put the filter on the final SELECT statement.
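
If rows that fall within a small tolerance of each other should end up on the same output line, as in the layout the question asks for, a time-window join is one option. The following is a minimal sketch, not a tested solution: it assumes `process_native.eventtime` is an ISO 8601 string and `tstart` is a numeric Unix time, the one-second tolerance is an arbitrary choice, and the CTE names `p` and `s` and the `event_ts` alias are made up for illustration.

```sql
-- Sketch: pair each SCADA record with process events seen within one second of it.
-- Both sides are filtered to a narrow time range BEFORE the join, because a
-- range join without such filters compares every row with every other row.
WITH p AS (
    SELECT
        CAST(from_iso8601_timestamp(eventtime) AS timestamp) AS event_ts,
        requestid,
        eventid
    FROM process_native
    WHERE CAST(from_iso8601_timestamp(eventtime) AS timestamp)
          BETWEEN timestamp '2017-03-14 16:29:59' AND timestamp '2017-03-14 16:35:01'
),
s AS (
    SELECT
        CAST(from_unixtime(tstart) AS timestamp) AS event_ts,
        srcaddr,
        action
    FROM scada_raw
    WHERE CAST(from_unixtime(tstart) AS timestamp)
          BETWEEN timestamp '2017-03-14 16:30:00' AND timestamp '2017-03-14 16:35:00'
)
SELECT
    s.event_ts,
    p.eventid,
    s.srcaddr,
    p.requestid,
    s.action
FROM s
JOIN p
    ON p.event_ts BETWEEN s.event_ts - interval '1' second
                      AND s.event_ts + interval '1' second
ORDER BY s.event_ts;
```

Note that `FROM process_native, scada_raw` with no condition linking the two tables produces every combination of rows from both, which is a likely reason the query in the question times out. A window join can still be expensive on large tables, and one SCADA record may match several process events, so the tolerance and the pre-filters would need tuning against the real data.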

Answered 2017-03-24T03:43:19.943