
I am getting these errors while shipping logs to ES with fluentd. I use fluentd for application logging on k8s; we are handling 100M (roughly 400 tps) and running into this issue. The ES cluster runs on M6g.2xlarge AWS instances (8 cores, 32 GB RAM), with 3 master nodes and 20 data nodes. At 200 tps everything is fine; above 200 tps we hit these problems: Kibana lags and there is data loss in ES.

ES version: 7.15.0
Fluentd version: 1.12.4

My logging flow: Fluentd > ES > Kibana

Error log:

2022-02-04 00:37:53 +0530 [warn]: #0 failed to flush the buffer. retry_time=3 next_retry_seconds=2022-02-04 00:37:56 +0530 chunk="5d721d8e59e44f5bbbf4aa5e267f7e3e" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"eslogging-prod.abc.com\", :port=>80, :scheme=>\"http\"}): [429] {\"error\":{\"root_cause\":[{\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of coordinating operation [coordinating_and_primary_bytes=1621708431, replica_bytes=0, all_bytes=1621708431, coordinating_operation_bytes=48222318, max_coordinating_and_primary_bytes=1655072358]\"}],\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of coordinating operation [coordinating_and_primary_bytes=1621708431, replica_bytes=0, all_bytes=1621708431, coordinating_operation_bytes=48222318, max_coordinating_and_primary_bytes=1655072358]\"},\"status\":429}
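The byte counts in the 429 match Elasticsearch's indexing-pressure rejection: by default `indexing_pressure.memory.limit` is 10% of the JVM heap, and writes are rejected once in-flight coordinating + primary bytes reach that limit. A quick sanity check of the numbers above (the 10% default is an assumption about this cluster's settings):

```python
# Figures taken verbatim from the es_rejected_execution_exception above.
max_bytes = 1_655_072_358   # max_coordinating_and_primary_bytes
in_flight = 1_621_708_431   # coordinating_and_primary_bytes already in use

# If the limit is the default 10% of heap, infer the heap size.
heap_gib = max_bytes / 0.10 / 2**30
print(round(heap_gib, 1))                        # implied JVM heap in GiB

# How close the node was to the limit when the bulk arrived.
print(round(in_flight / max_bytes * 100, 1))     # percent of limit in use
```

So the data nodes were already at roughly 98% of their indexing-pressure budget, and the incoming ~46 MB bulk (`coordinating_operation_bytes`) pushed them over it.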

Config file:

<source>
    @type tail
    @id in_tail_container_logs
    path /var/log/containers/*.log
    pos_file /var/log/fluentd-containers.log.pos
    tag kubernetes.*
    read_from_head true
    <parse>
        @type json
        time_key @timestamp
        time_format %Y-%m-%dT%H:%M:%S.%N%z
        keep_time_key true
    </parse>
</source>

<filter kubernetes.**>
    @type kubernetes_metadata
    skip_container_metadata "true"
</filter>

<filter kubernetes.var.log.containers.**prod**>
    @id filter_concat
    @type concat
    key log
    use_first_timestamp true
    multiline_end_regexp /\n$/
    separator ""
</filter>

<filter kubernetes.var.log.containers.**prod**>
    @type record_transformer
    <record>
        log_json ${record["log"]}
    </record>
    remove_keys $.kubernetes.pod_id,$.kubernetes.container_image
</filter>

<filter kubernetes.var.log.containers.**prod**>
    @type parser
    @log_level debug
    key_name log_json
    #reserve_time true
    reserve_data true
    remove_key_name_field true
    emit_invalid_record_to_error true
    <parse>
        @type json
    </parse>
</filter>    

<match kubernetes.var.log.containers.**prod**>
    @type elasticsearch
    @log_level info
    include_tag_key true
    suppress_type_name true
    host "eslogging-prod.abc.com"
    port 80
    reload_connections false
    logstash_format true
    logstash_prefix ${$.kubernetes.labels.app}
    reconnect_on_error true
    num_threads 8
    request_timeout 2147483648
    compression_level best_compression
    compression gzip
    include_timestamp true
    utc_index false
    time_key_format "%Y-%m-%dT%H:%M:%S.%N%z"
    time_key time
    reload_on_failure true
    prefer_oj_serializer true
    bulk_message_request_threshold -1
    slow_flush_log_threshold 30.0
    log_es_400_reason true
    <buffer tag, $.kubernetes.labels.app>
        @type file
        path /var/log/fluentd-buffers/kubernetes-apps.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 8
        flush_interval 5s
        retry_forever true 
        retry_max_interval 30
        chunk_limit_size 200M
        queue_limit_length 512
        overflow_action throw_exception           
    </buffer>
</match>

1 Answer


You are hitting Elasticsearch's indexing rate limiting. Take a look at this post for hints: https://chenriang.me/elasticsearch-bulk-insert-rejection.html I would suggest reducing the chunk size in Fluentd and increasing the retry interval.
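Concretely, that means shrinking `chunk_limit_size` so a single bulk request stays well below the nodes' indexing-pressure limit. A sketch of the asker's `<buffer>` section adjusted along those lines; the 8M chunk size and the switch to `overflow_action block` are illustrative suggestions, not tested settings:

```
<buffer tag, $.kubernetes.labels.app>
    @type file
    path /var/log/fluentd-buffers/kubernetes-apps.system.buffer
    flush_mode interval
    flush_interval 5s
    flush_thread_count 8
    retry_type exponential_backoff
    retry_forever true
    retry_max_interval 120       # back off longer when ES keeps returning 429
    chunk_limit_size 8M          # was 200M; keep each bulk far below the indexing-pressure limit
    queue_limit_length 512
    overflow_action block        # apply backpressure to in_tail instead of raising an exception
</buffer>
```

Smaller chunks mean more bulk requests, but each one is cheap for ES to admit, so transient 429s are retried quickly instead of a single huge bulk being rejected over and over.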

Answered 2022-02-04T06:42:57.773