sql - How to split a string on every whitespace with regexp_extract (SQL-Athena)

Question

I'm currently trying to split the messages from our webserver log into separate columns.

For example, my message (data type string) looks like this:

at=info method=GET path="/v1/..." host=web.com request_id=a3d71fa9-9501-4bfe-8462-54301a976d74 fwd="xxx.xx" dyno=web.1 connect=1ms service=167ms status=200 bytes=1114

I want to split it into columns like this:

path   | service  | connect  | method | status | fwd     | dyno   |
------ | -------  | -------- | ------ | ------ | ------- | ------ | 
/v1/...|  167     |  1       |  GET   | 200    | xxx.xxx | web.1  |

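I'm using the regexp_extract function on Amazon Athena with standard SQL (for the first time) and have already extracted a few columns from the string, but I'm struggling with some of the others.

For example, when I try to extract the dyno from the string, I get more than I need: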

 REGEXP_EXTRACT (message,'dyno=[^,]+[a-z]')AS dyno
 -> dyno=web.2 connect=0ms service=192ms status=200 bytes

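I want dyno=web.1 as the result, which I would then extract from again.

It would be ideal to cut the string from its start ("dyno=") up to the whitespace before "connect=", but I couldn't find the right option on the sites I've read.

How do I write the pattern to get the right string?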

score 3 · Accepted Answer

Following Sebastian's comment, I agree that \S+ should be the way forward. The query would then look like this:

select REGEXP_EXTRACT (message,'dyno=(\S+)',1) AS dyno
from (
  select
  'at=info method=GET path="/v1/..." host=web.com request_id=a3d71fa9-9501-4bfe-8462-54301a976d74 fwd="xxx.xx" dyno=web.1 connect=1ms service=167ms status=200 bytes=1114' message
)
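The pattern in the question over-matches because [^,]+ accepts any character except a comma, and the message contains no commas, so the match runs on to the last lowercase letter it can end on. The \S+ idea can be extended to the other columns the question asks for. A minimal sketch, assuming the same sample message; the "([^"]*)" form for quoted values and the (\d+)ms form for durations are illustrative additions, not part of the answer above:

```sql
-- Sketch only: one capture group per key, always returning group 1.
SELECT
  REGEXP_EXTRACT(message, 'path="([^"]*)"', 1)  AS "path",    -- text between the quotes
  REGEXP_EXTRACT(message, 'service=(\d+)ms', 1) AS "service", -- digits only, "ms" dropped
  REGEXP_EXTRACT(message, 'connect=(\d+)ms', 1) AS "connect",
  REGEXP_EXTRACT(message, 'method=(\S+)', 1)    AS "method",  -- up to the next whitespace
  REGEXP_EXTRACT(message, 'status=(\d+)', 1)    AS "status",
  REGEXP_EXTRACT(message, 'fwd="([^"]*)"', 1)   AS "fwd",
  REGEXP_EXTRACT(message, 'dyno=(\S+)', 1)      AS "dyno"
FROM (
  SELECT
  'at=info method=GET path="/v1/..." host=web.com request_id=a3d71fa9-9501-4bfe-8462-54301a976d74 fwd="xxx.xx" dyno=web.1 connect=1ms service=167ms status=200 bytes=1114' AS message
)
```

Every column comes back as a string (varchar); wrap a call in CAST(... AS integer) if numeric values are needed. REGEXP_EXTRACT returns NULL when the pattern does not match, so a message that lacks a key yields NULL in that column rather than an error.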

score 0 · Accepted Answer

If your values contain no spaces (as with these key-value pairs), there is a simple solution.

select  msg['at']         as "at"
       ,msg['method']     as "method"
       ,msg['path']       as "path"
       ,msg['host']       as "host"
       ,msg['request_id'] as "request_id"
       ,msg['fwd']        as "fwd"
       ,msg['dyno']       as "dyno"
       ,msg['connect']    as "connect"
       ,msg['service']    as "service"
       ,msg['status']     as "status"
       ,msg['bytes']      as "bytes"

from   (select  split_to_map (message,' ','=') as msg       
        from    mytable
        )
;

  at  | method |   path    |  host   |              request_id              |   fwd    | dyno  | connect | service | status | bytes
------+--------+-----------+---------+--------------------------------------+----------+-------+---------+---------+--------+-------
 info | GET    | "/v1/..." | web.com | a3d71fa9-9501-4bfe-8462-54301a976d74 | "xxx.xx" | web.1 | 1ms     | 167ms   | 200    | 1114
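The output above still carries the quotes around path and fwd and the ms suffix on connect and service. A possible follow-up, sketched under the assumptions that every message contains these keys and that durations always end in ms (the inline sample row stands in for mytable):

```sql
-- Sketch only: clean up the map values to match the layout requested in the question.
SELECT
  replace(msg['path'], '"', '')                      AS "path",    -- strip the double quotes
  CAST(replace(msg['service'], 'ms', '') AS integer) AS "service", -- '167ms' becomes 167
  CAST(replace(msg['connect'], 'ms', '') AS integer) AS "connect",
  msg['method']                                      AS "method",
  CAST(msg['status'] AS integer)                     AS "status",
  replace(msg['fwd'], '"', '')                       AS "fwd",
  msg['dyno']                                        AS "dyno"
FROM (
  SELECT split_to_map(message, ' ', '=') AS msg
  FROM (
    SELECT
    'at=info method=GET path="/v1/..." host=web.com request_id=a3d71fa9-9501-4bfe-8462-54301a976d74 fwd="xxx.xx" dyno=web.1 connect=1ms service=167ms status=200 bytes=1114' AS message
  )
)
```

The msg['key'] subscript raises an error when the key is absent from the map. If some log lines may lack a key, element_at(msg, 'key') returns NULL instead and is likely the safer choice.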
