regex - How to process multiline log entries with logstash filters?

Question

Background:

I have a custom-generated log file that has the following pattern:

[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\xampp\htdocs\test.php|123|subject|The error message goes here ; array (
  'create' => 
  array (
    'key1' => 'value1',
    'key2' => 'value2',
    'key3' => 'value3'
  ),
)
[2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line

The second entry, [2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line, is a dummy line whose only job is to let logstash know that the multiline event has ended; this line is dropped later.

My config file is the following:

input {
  stdin{}
}

filter{
  multiline{
      pattern => "^\["
      what => "previous"
      negate=> true
  }
  grok{
    match => ['message',"\[.+\] - %{IP:ip}\|%{LOGLEVEL:loglevel}"]
  }

  if [loglevel] == "DEBUG"{ # the event flush  line
    drop{}
  }else if [loglevel] == "ERROR"  { # the first line of multievent
    grok{
      match => ['message',".+\|.+\| %{PATH:file}\|%{NUMBER:line}\|%{WORD:tag}\|%{GREEDYDATA:content}"] 
    }
  }else{ # it's a new line (from the multiline event)
    mutate{
      replace => ["content", "%{content} %{message}"] # Supposing each new line will override the message field
    }
  }  
}

output {
  stdout{ debug=>true }
}

The output for the content field is: The error message goes here ; array (

Problem:

My problem is that I want to store the rest of the multiline entry in the content field:

The error message goes here ; array (
  'create' => 
  array (
    'key1' => 'value1',
    'key2' => 'value2',
    'key3' => 'value3'
  ),
)

That way I can remove the message field later.

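The @message field contains the whole multiline event, so I tried the mutate filter with the replace function shown above, but I just can't get it to work :(.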

I don't understand how the Multiline filter works; if someone could shed some light on this, it would be much appreciated.

Thanks,

Abdou.

score 13 · Accepted Answer

I went through the source code and found out that:

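- The multiline filter cancels every event it considers a follow-up of a pending event and appends that line to the original message field, so any filters placed after the multiline filter are never applied to those lines.
- The only events that make it through the filter are the ones it considers new events (in my case, the ones starting with [).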
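To make the two points above concrete, here is a small Python sketch that mimics the grouping rule of pattern => "^\[", negate => true, what => "previous". It is a hypothetical re-creation for illustration only (NEW_EVENT and group_multiline are made-up names), not Logstash's actual source: lines that don't start with [ are folded into the pending event, and that event is only released when the next [ line arrives, which is why the question needs the dummy flush line.

```python
import re

# Hypothetical re-creation of the grouping rule configured by
#   multiline { pattern => "^\[" negate => true what => "previous" }
# A sketch of the behaviour described above, not Logstash's code.
NEW_EVENT = re.compile(r"^\[")

LOG_LINES = [
    r"[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\xampp\htdocs\test.php|123|subject|The error message goes here ; array (",
    "  'create' => ",
    "  array (",
    "    'key1' => 'value1',",
    "    'key2' => 'value2',",
    "    'key3' => 'value3'",
    "  ),",
    ")",
    "[2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line",
]


def group_multiline(lines):
    """Return (emitted_events, still_pending_lines)."""
    events, pending = [], []
    for line in lines:
        if NEW_EVENT.match(line) and pending:
            # A line starting with "[" closes the pending event and opens a new one.
            events.append("\n".join(pending))
            pending = [line]
        else:
            # negate => true, what => "previous": a line NOT starting with "["
            # is appended to the pending event and never emitted on its own.
            # (The very first line also lands here and opens the first event.)
            pending.append(line)
    return events, pending


events, pending = group_multiline(LOG_LINES)
print(len(events))   # 1 -> the ERROR entry, released by the dummy flush line
print(pending)       # ['[2014-03-02 17:34:20] - 127.0.0.1|DEBUG| flush_multi_line']

events_no_flush, pending_no_flush = group_multiline(LOG_LINES[:-1])
print(len(events_no_flush), len(pending_no_flush))  # 0 8
```

In this sketch, leaving out the flush line means nothing is emitted: all eight lines of the ERROR entry stay pending.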

Here is the working code:

input {
   stdin{}
}  

filter {
  if "|ERROR|" in [message] { # if this is the 1st line of a multiline message
    grok {
      match => ['message', "\[.+\] - %{IP:ip}\|%{LOGLEVEL:loglevel}\| %{PATH:file}\|%{NUMBER:line}\|%{WORD:tag}\|%{GREEDYDATA:content}"]
    }

    mutate {
      replace => [ "message", "%{content}" ] # replace the message field with the content field (so the following lines get appended to it)
      remove_field => ["content"] # we no longer need this field
    }
  }

  multiline { # Nothing will pass this filter unless it is a new event ( new [2014-03-02 1.... )
    pattern => "^\["
    what => "previous"
    negate => true
  }

  if "|DEBUG| flush_multi_line" in [message] {
    drop {} # We don't need the dummy line so drop it
  }
}

output {
  stdout{ debug=>true }
}

Cheers,

Abdou

score 12 · Accepted Answer

This issue covers grok and multiline handling: https://logstash.jira.com/browse/LOGSTASH-509

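Just add "(?m)" to the front of your grok regex and you won't need the mutate step. In the regex flavor grok uses (Oniguruma/Ruby), (?m) makes . match newline characters as well, so %{GREEDYDATA} can capture the remaining lines of the event. Example from the issue: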

pattern => "(?m)<%{POSINT:syslog_pri}>(?:%{SPACE})%{GREEDYDATA:message_remainder}"
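A quick way to see what the flag changes, sketched in Python with a hand-written approximation of the question's grok pattern (the group names and character classes are assumptions, not grok's real IP/PATH/WORD definitions). Note the naming mismatch: Oniguruma's (?m) corresponds to Python's (?s) (re.DOTALL), while Python's own (?m) only changes how ^ and $ behave.

```python
import re

# The merged event from the sample log, joined with "\n" as the multiline filter does.
MESSAGE = "\n".join([
    r"[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\xampp\htdocs\test.php|123|subject|The error message goes here ; array (",
    "  'create' => ",
    "  array (",
    "    'key1' => 'value1',",
    "    'key2' => 'value2',",
    "    'key3' => 'value3'",
    "  ),",
    ")",
])

# Hand-written approximation of the grok pattern; GREEDYDATA is ".*".
PATTERN = (r"\[.+\] - (?P<ip>[\d.]+)\|(?P<loglevel>[A-Z]+)\| (?P<file>[^|]+)"
           r"\|(?P<line>\d+)\|(?P<tag>\w+)\|(?P<content>.*)")

plain = re.match(PATTERN, MESSAGE)                   # "." stops at the first "\n"
dotall = re.match("(?s)" + PATTERN, MESSAGE)         # Python's spelling of grok's (?m)
py_multiline = re.match("(?m)" + PATTERN, MESSAGE)   # Python's (?m): only ^ and $ change

print(repr(plain.group("content")))    # 'The error message goes here ; array ('
print(dotall.group("content"))         # the full multiline content
print(py_multiline.group("content") == plain.group("content"))  # True
```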

score 6 · Accepted Answer

The multiline filter adds "\n" to the message. For example:

"[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\\xampp\\htdocs\\test.php|123|subject|The error message goes here ; array (\n  'create' => \n  array (\n    'key1' => 'value1',\n    'key2' => 'value2',\n    'key3' => 'value3'\n  ),\n)"

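However, the grok pattern does not match across "\n" (by default, the . behind %{GREEDYDATA} stops at a line break). Therefore, you need to replace \n with another character, such as a blank space.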

mutate {
    gsub => ['message', "\n", " "]
}

Then the grok pattern can parse the whole message. For example:

 "content" => "The error message goes here ; array (   'create' =>    array (     'key1' => 'value1',     'key2' => 'value2',     'key3' => 'value3'   ), )"
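The same effect can be reproduced outside Logstash. This Python sketch applies the equivalent of the gsub above with re.sub and then runs an approximate version of the grok pattern (an assumption, not grok's actual pattern definitions) with no flags at all; the captured content comes out exactly as shown.

```python
import re

LINES = [
    r"[2014-03-02 17:34:20] - 127.0.0.1|ERROR| E:\xampp\htdocs\test.php|123|subject|The error message goes here ; array (",
    "  'create' => ",
    "  array (",
    "    'key1' => 'value1',",
    "    'key2' => 'value2',",
    "    'key3' => 'value3'",
    "  ),",
    ")",
]
MESSAGE = "\n".join(LINES)  # what the multiline filter hands on

# Equivalent of: mutate { gsub => ['message', "\n", " "] }
flattened = re.sub(r"\n", " ", MESSAGE)

# Approximate grok pattern; no (?s)/(?m) needed once the message is one line.
PATTERN = (r"\[.+\] - (?P<ip>[\d.]+)\|(?P<loglevel>[A-Z]+)\| (?P<file>[^|]+)"
           r"\|(?P<line>\d+)\|(?P<tag>\w+)\|(?P<content>.*)")

match = re.match(PATTERN, flattened)
print(match.group("content"))
```

The trade-off versus the (?m) approach is that the original line breaks are lost from the stored field.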

score 1 · Accepted Answer

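The issue isn't just the ordering of the filters, although order is very important for how the records end up stored. You don't need another line to signal that you have finished outputting the multiline log line. Just make sure the multiline filter comes first, before grok (see below).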

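P.S. I have successfully parsed a multiline log line where XML is appended to the end of the line and spans several lines, and I still get a clean XML object in my content-equivalent variable (named xmlrequest below). Before you say anything about logging XML... I know... it isn't ideal... but that's another debate :)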

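filter {
  multiline { # fold lines that don't start with "[" into the previous event
    pattern => "^\["
    what => "previous"
    negate => true
  }

  mutate { # flatten line feeds so GREEDYDATA reaches the end of the event
    gsub => ['message', "\n", " "]
  }

  mutate { # same for carriage returns
    gsub => ['message', "\r", " "]
  }

  grok {
    match => ['message', "\[%{WORD:ONE}\] \[%{WORD:TWO}\] \[%{WORD:THREE}\] %{GREEDYDATA:xmlrequest}"]
  }

  xml { # parse the captured XML into the "request" field
    source => "xmlrequest"
    remove_field => "xmlrequest"
    target => "request"
  }
}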
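A Python sketch of the same steps, using a made-up log line (the alpha/beta/gamma words and the <request> payload are hypothetical) with CRLF line endings: the two gsub calls become re.sub, the grok pattern is approximated by a regex (%{WORD} taken as \w+), and xml.etree.ElementTree stands in for the xml filter.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical log entry: three bracketed words, then an XML payload
# spread over several CRLF-terminated lines.
RAW = ("[alpha] [beta] [gamma] <request>\r\n"
       "  <id>42</id>\r\n"
       "  <action>save</action>\r\n"
       "</request>")

# The two mutate/gsub steps from the config above.
message = re.sub(r"\n", " ", RAW)
message = re.sub(r"\r", " ", message)

# Approximation of the grok pattern.
match = re.match(
    r"\[(?P<ONE>\w+)\] \[(?P<TWO>\w+)\] \[(?P<THREE>\w+)\] (?P<xmlrequest>.*)",
    message,
)

# Rough stand-in for the xml filter: parse the captured text.
request = ET.fromstring(match.group("xmlrequest"))
print(request.tag, request.find("id").text, request.find("action").text)  # request 42 save
```

Replacing the line breaks only changes whitespace between elements, which the parser ignores here; newlines inside text values would, however, become spaces.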
