
I'm trying to parse a large message with Logstash using a file input, a json filter, and an elasticsearch output. 99% of the time this works fine, but when one of my log messages is too large, I get JSON parse errors because the message is broken up into two partial, invalid JSON streams. Such messages are 40,000+ characters long. I've looked for information on the size of the buffer, or some maximum length that I should stay under, but haven't had any luck. The only answers I found related to the udp input and being able to change its buffer size.

Does Logstash has a limit size for each event-message? https://github.com/elastic/logstash/issues/1505

This could also be similar to this question, but there were never any replies or suggestions: Logstash Json filter behaving unexpectedly for large nested JSONs

As a workaround, I wanted to split my message up into multiple messages, but I can't, because I need all the information to end up in the same record in Elasticsearch. I don't believe there is a way to call the Update API from Logstash. Additionally, most of the data is in an array, so while I can update an Elasticsearch record's array using a script (Elasticsearch upserting and appending to array), I can't do that from Logstash.

The data records look something like this:

{ "variable1":"value1", 
 ......, 
 "variable30": "value30", 
 "attachements": [ {5500 characters of JSON}, 
                   {5500 characters of JSON}, 
                   {5500 characters of JSON}.. 
                   ...
                   {8th dictionary of JSON}]
 }

Does anyone know of a way to have Logstash process these large JSON messages, or a way that I can split them up and have them end up in the same Elasticsearch record (using Logstash)?

Any help is appreciated, and I'm happy to add any information needed!


1 Answer


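If your elasticsearch output has a document_id set, it will update the document (the default action in Logstash is to index the data, which will update the document if it already exists).

In your case, you'd need to include some unique field as part of your JSON messages and then rely on that to do the merge in Elasticsearch. For example: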

{"key":"123455","attachment1":"something big"}
{"key":"123455","attachment2":"something big"}
{"key":"123455","attachment3":"something big"}

Then have an elasticsearch output like:

elasticsearch { 
  host => localhost
  document_id => "%{key}" 
}
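
To make the idea concrete, here is a minimal Python sketch of the producer side: it splits one large record into smaller messages that all carry the same key, then simulates a field-level merge keyed on that id. The function names (split_record, merge_by_key), the sample record, and the attachmentN field naming are illustrative assumptions, not part of Logstash or Elasticsearch. Note also that merge_by_key models an update-style merge; as far as I know, a plain index request replaces the stored document rather than merging fields, so you may need an update/upsert action on the output (newer versions of the elasticsearch output plugin document action => "update" with doc_as_upsert => true; check the docs for your version).

```python
import json


def split_record(record, key_field="key", list_field="attachements"):
    """Split one large record into small messages that share the same key.

    The first message carries every field except the big list; each list
    item then becomes its own message under a numbered field name.
    """
    base = {k: v for k, v in record.items() if k != list_field}
    messages = [base]
    for i, item in enumerate(record.get(list_field, []), start=1):
        messages.append({key_field: record[key_field], f"attachment{i}": item})
    return messages


def merge_by_key(messages, key_field="key"):
    """Simulate a field-level merge (partial update / upsert) per document id."""
    store = {}
    for msg in messages:
        store.setdefault(msg[key_field], {}).update(msg)
    return store


# Hypothetical sample record; real attachments would be ~5500 characters each.
record = {
    "key": "123455",
    "variable1": "value1",
    "attachements": [{"n": 1}, {"n": 2}, {"n": 3}],
}

messages = split_record(record)
lines = [json.dumps(m) for m in messages]  # one short JSON document per log line
merged = merge_by_key(json.loads(line) for line in lines)
```

Each entry in lines is short enough to be written as its own log line, and merged["123455"] ends up holding variable1 together with attachment1 through attachment3, which is the shape the document_id approach is aiming for.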
answered 2015-05-04T13:59:02.113