I'm currently splitting the messages from our web server log into separate columns.

For example, my message (data type string) looks like this:

at=info method=GET path="/v1/..." host=web.com request_id=a3d71fa9-9501-4bfe-8462-54301a976d74 fwd="xxx.xx" dyno=web.1 connect=1ms service=167ms status=200 bytes=1114

I want to split it into columns like this:

path   | service  | connect  | method | status | fwd     | dyno   |
------ | -------  | -------- | ------ | ------ | ------- | ------ | 
/v1/...|  167     |  1       |  GET   | 200    | xxx.xxx | web.1  |

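I'm using the regexp_extract function on Amazon Athena with standard SQL (for the first time) and have already extracted a few columns from the string, but I'm struggling with some of the others.

For example, when I try to extract the dyno from the string, I get more than I need: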

 REGEXP_EXTRACT (message,'dyno=[^,]+[a-z]')AS dyno
 -> dyno=web.2 connect=0ms service=192ms status=200 bytes

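I want dyno=web.1 as the result, which I would then extract from again.

It would be ideal to cut the string from the start ("dyno=") up to the whitespace before "connect=", but I couldn't find the right option on the sites I read.

How do I write the pattern to get the right string?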

2 Answers


Following Sebastian's comment, I agree that \S+ should be the way to go. So the query would look like this:

select REGEXP_EXTRACT (message,'dyno=(\S+)',1) AS dyno
from (
  select
  'at=info method=GET path="/v1/..." host=web.com request_id=a3d71fa9-9501-4bfe-8462-54301a976d74 fwd="xxx.xx" dyno=web.1 connect=1ms service=167ms status=200 bytes=1114' message
)
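
The pattern in the question over-matches because [^,]+ accepts any character except a comma, spaces included, so it greedily runs to the end of the message and then backtracks to the last lowercase letter, which is the "s" in "bytes". Restricting each capture to non-whitespace, digits, or the text between quotes avoids that. As a rough sketch of how the same approach could cover every column in the desired table (the quoted aliases and the inline sample row are only for illustration, and the digit-only patterns assume connect and service always end in "ms"):

```sql
-- Sketch only: one capture group per field; the third argument (1)
-- asks regexp_extract for the first group rather than the whole match.
select regexp_extract(message, 'path="([^"]*)"', 1)  as "path"    -- text between the quotes
      ,regexp_extract(message, 'service=(\d+)ms', 1) as "service" -- digits only, unit dropped
      ,regexp_extract(message, 'connect=(\d+)ms', 1) as "connect"
      ,regexp_extract(message, 'method=(\S+)', 1)    as "method"
      ,regexp_extract(message, 'status=(\d+)', 1)    as "status"
      ,regexp_extract(message, 'fwd="([^"]*)"', 1)   as "fwd"
      ,regexp_extract(message, 'dyno=(\S+)', 1)      as "dyno"    -- stops at the next space
from (
  -- Inline sample row standing in for the real log table
  select
  'at=info method=GET path="/v1/..." host=web.com request_id=a3d71fa9-9501-4bfe-8462-54301a976d74 fwd="xxx.xx" dyno=web.1 connect=1ms service=167ms status=200 bytes=1114' message
)
```

regexp_extract returns NULL when a pattern does not match, so a message that lacks one of the keys should still produce a row.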
answered 2017-03-09T15:18:07.787

If your values contain no spaces (as is the case with these key-value pairs), there is a simple solution.

select  msg['at']         as "at"
       ,msg['method']     as "method"
       ,msg['path']       as "path"
       ,msg['host']       as "host"
       ,msg['request_id'] as "request_id"
       ,msg['fwd']        as "fwd"
       ,msg['dyno']       as "dyno"
       ,msg['connect']    as "connect"
       ,msg['service']    as "service"
       ,msg['status']     as "status"
       ,msg['bytes']      as "bytes"

from   (select  split_to_map (message,' ','=') as msg       
        from    mytable
        )
;

  at  | method |   path    |  host   |              request_id              |   fwd    | dyno  | connect | service | status | bytes
------+--------+-----------+---------+--------------------------------------+----------+-------+---------+---------+--------+-------
 info | GET    | "/v1/..." | web.com | a3d71fa9-9501-4bfe-8462-54301a976d74 | "xxx.xx" | web.1 | 1ms     | 167ms   | 200    | 1114
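
The result above still carries the quotes around path and fwd and the "ms" suffix on connect and service. A possible follow-up, sketched here under the assumption that the quotes and units can simply be stripped, uses replace and try_cast to get closer to the table in the question. The inline sample row is only for illustration. element_at is used instead of msg['key'] because, as far as I know, it returns NULL for a missing key where the subscript form raises an error.

```sql
-- Sketch only: clean up the map values after split_to_map.
select replace(element_at(msg, 'path'), '"', '')                         as "path"    -- drop the quotes
      ,try_cast(replace(element_at(msg, 'service'), 'ms', '') as integer) as "service" -- 167ms -> 167
      ,try_cast(replace(element_at(msg, 'connect'), 'ms', '') as integer) as "connect" -- 1ms -> 1
      ,element_at(msg, 'method')                                         as "method"
      ,try_cast(element_at(msg, 'status') as integer)                    as "status"
      ,replace(element_at(msg, 'fwd'), '"', '')                          as "fwd"
      ,element_at(msg, 'dyno')                                           as "dyno"
from   (select  split_to_map (message,' ','=') as msg
        from    (
          -- Inline sample row standing in for mytable
          select 'at=info method=GET path="/v1/..." host=web.com request_id=a3d71fa9-9501-4bfe-8462-54301a976d74 fwd="xxx.xx" dyno=web.1 connect=1ms service=167ms status=200 bytes=1114' as message
        )
       )
;
```

try_cast yields NULL instead of failing when a value is not numeric. Note that split_to_map expects each space-separated entry to contain the "=" delimiter once, so a path with a query string such as ?a=b would likely need the regexp_extract approach instead.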
answered 2017-03-19T18:39:11.813