I am hitting these errors while shipping logs to ES with Fluentd. I use Fluentd for application logging on Kubernetes; we handle about 100M log events (roughly 400 tps) and run into this problem. The ES cluster runs on m6g.2xlarge AWS instances (8 vCPUs, 32 GB RAM): 3 master nodes and 20 data nodes. Everything works fine up to about 200 tps; above that we start seeing these errors, Kibana lags behind, and data is lost in ES.
ES version: 7.15.0
Fluentd version: 1.12.4
My logging pipeline: Fluentd > ES > Kibana
Error log:
2022-02-04 00:37:53 +0530 [warn]: #0 failed to flush the buffer. retry_time=3 next_retry_seconds=2022-02-04 00:37:56 +0530 chunk="5d721d8e59e44f5bbbf4aa5e267f7e3e" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"eslogging-prod.abc.com\", :port=>80, :scheme=>\"http\"}): [429] {\"error\":{\"root_cause\":[{\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of coordinating operation [coordinating_and_primary_bytes=1621708431, replica_bytes=0, all_bytes=1621708431, coordinating_operation_bytes=48222318, max_coordinating_and_primary_bytes=1655072358]\"}],\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of coordinating operation [coordinating_and_primary_bytes=1621708431, replica_bytes=0, all_bytes=1621708431, coordinating_operation_bytes=48222318, max_coordinating_and_primary_bytes=1655072358]\"},\"status\":429}
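For context, the 429 above is Elasticsearch indexing-pressure back-pressure, and the byte counts in the rejection show the node was essentially at its limit when the bulk arrived. Plain arithmetic on the values from the error (assuming the default indexing_pressure.memory.limit of 10% of heap; the cluster may override this):

```python
# Values taken directly from the es_rejected_execution_exception above.
in_flight_bytes = 1_621_708_431   # coordinating_and_primary_bytes
limit_bytes = 1_655_072_358       # max_coordinating_and_primary_bytes
bulk_bytes = 48_222_318           # coordinating_operation_bytes (the rejected bulk)

# The node was already at ~98% of its indexing-pressure limit,
# so the ~46 MiB bulk pushed it over and was rejected with 429.
print(f"utilization: {in_flight_bytes / limit_bytes:.1%}")  # -> utilization: 98.0%
print(f"bulk size: {bulk_bytes / 2**20:.0f} MiB")           # -> bulk size: 46 MiB

# Assuming the default 10%-of-heap limit, the reported max implies
# roughly a 15-16 GiB heap -- consistent with 32 GB RAM nodes.
print(f"implied heap: {limit_bytes * 10 / 2**30:.1f} GiB")  # -> implied heap: 15.4 GiB
```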
Config file:
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    time_key @timestamp
    time_format %Y-%m-%dT%H:%M:%S.%N%z
    keep_time_key true
  </parse>
</source>
<filter kubernetes.**>
  @type kubernetes_metadata
  skip_container_metadata "true"
</filter>
<filter kubernetes.var.log.containers.**prod**>
  @id filter_concat
  @type concat
  key log
  use_first_timestamp true
  multiline_end_regexp /\n$/
  separator ""
</filter>
<filter kubernetes.var.log.containers.**prod**>
  @type record_transformer
  <record>
    log_json ${record["log"]}
  </record>
  remove_keys $.kubernetes.pod_id,$.kubernetes.container_image
</filter>
<filter kubernetes.var.log.containers.**prod**>
  @type parser
  @log_level debug
  key_name log_json
  #reserve_time true
  reserve_data true
  remove_key_name_field true
  emit_invalid_record_to_error true
  <parse>
    @type json
  </parse>
</filter>
<match kubernetes.var.log.containers.**prod**>
  @type elasticsearch
  @log_level info
  include_tag_key true
  suppress_type_name true
  host "eslogging-prod.abc.com"
  port 80
  reload_connections false
  logstash_format true
  logstash_prefix ${$.kubernetes.labels.app}
  reconnect_on_error true
  num_threads 8
  request_timeout 2147483648
  compression_level best_compression
  compression gzip
  include_timestamp true
  utc_index false
  time_key_format "%Y-%m-%dT%H:%M:%S.%N%z"
  time_key time
  reload_on_failure true
  prefer_oj_serializer true
  bulk_message_request_threshold -1
  slow_flush_log_threshold 30.0
  log_es_400_reason true
  <buffer tag, $.kubernetes.labels.app>
    @type file
    path /var/log/fluentd-buffers/kubernetes-apps.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 8
    flush_interval 5s
    retry_forever true
    retry_max_interval 30
    chunk_limit_size 200M
    queue_limit_length 512
    overflow_action throw_exception
  </buffer>
</match>
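One thing I noticed while writing this up (a rough back-of-the-envelope check, assuming chunks fill to chunk_limit_size and all flush threads flush concurrently): 8 flush threads times 200M chunks can put more bytes in flight than the max_coordinating_and_primary_bytes reported in the 429 above.

```python
# Worst-case in-flight bulk bytes from the buffer settings above
# (assumes every chunk reaches chunk_limit_size and all threads flush at once).
chunk_limit = 200 * 2**20     # chunk_limit_size 200M
flush_threads = 8             # flush_thread_count 8
worst_case = chunk_limit * flush_threads

es_limit = 1_655_072_358      # max_coordinating_and_primary_bytes from the 429

print(worst_case)             # 1677721600 (~1.56 GiB)
print(worst_case > es_limit)  # True -- this one Fluentd instance alone can exceed the ES limit
```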