prometheus - How to use federation to collect Prometheus' metrics from multiple Prometheus instances (each using instance="localhost:9090")

Question

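We have multiple Prometheus instances running in datacenters (I'll refer to them as DC Prometheus instances) and one additional Prometheus instance (called "main" below), which collects the metrics from the DC Prometheus instances using the federation feature.

The main Prometheus scrapes {job='prometheus'} values from itself, but also from the DC Prometheus instances (each of which scrapes itself at localhost:9090).

The problem is that the main Prometheus complains about out-of-order samples: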

WARN[1585] Error on ingesting out-of-order samples numDropped=369 source=target.go:475 target=dc1-prometheus:443

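I found out that this is caused by including {job="prometheus"} in the 'match[]' param.

I tried to solve this with relabeling, but when I tried it with a single DC Prometheus and a constant replacement, I couldn't get it to work (I still got the out-of-order sample error), and I don't even know what to use as the replacement when there are multiple targets.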

  - job_name: 'federate'
    scrape_interval: 15s

    honor_labels: true
    metrics_path: '/prometheus/federate'
    scheme: 'https'

    params:
      'match[]':
        - '{job="some-jobs-here..."}'
        - '{job="prometheus"}'

    relabel_configs:
    - source_labels: ['instance']
      target_label: 'instance'
      regex: 'localhost:9090'
      replacement: '??' # I've tried with 'dc1-prometheus:9090' and single target only.. no luck

    target_groups:
      - targets:
        - 'dc1-prometheus'
        - 'dc2-prometheus'
        - 'dc3-prometheus'

My question is how to use relabel_configs to get rid of the out-of-order error. I am using Prometheus 0.17 everywhere.

score 12 · Accepted Answer

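What you need to do here is specify unique external_labels on each of the datacenter Prometheus servers. This causes them to add those labels on the /federate endpoint and prevents the clashing time series you're running into.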
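A minimal sketch of what that could look like in the configuration of one DC Prometheus server. The label name datacenter and the value dc1 are assumptions chosen for illustration; any label works as long as its value is unique per server (dc2, dc3, and so on for the others).

```yaml
# prometheus.yml on the dc1 Prometheus server (sketch, not a complete config)
global:
  external_labels:
    # Hypothetical label name and value; use a different value on every DC server
    datacenter: 'dc1'
```

With this in place, the self-scraped series from each DC server should reach the main Prometheus carrying a different datacenter label, so they no longer share an identical label set even though they all have instance="localhost:9090".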

My blog post on federating Prometheus has an example for this type of case: http://www.robustperception.io/scaling-and-federating-prometheus/

(I should add that relabel_configs won't help you here, as it only changes target labels. metric_relabel_configs changes what comes back from the scrape. See http://www.robustperception.io/life-of-a-label/)
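To illustrate where the two settings sit, here is a hedged sketch based on the scrape config from the question. The dc target label, both regexes, and the drop rule are made up for illustration and are not a recommended fix; unique external_labels remain the suggested approach.

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/prometheus/federate'
    scheme: 'https'
    params:
      'match[]':
        - '{job="prometheus"}'

    # Runs once per target, before the scrape: it only sees target labels
    # such as __address__, never the labels of the scraped series.
    relabel_configs:
      - source_labels: ['__address__']
        regex: '(dc[0-9]+)-prometheus.*'   # hypothetical pattern
        target_label: 'dc'                 # hypothetical label name
        replacement: '$1'

    # Runs on every sample returned by the scrape, before ingestion.
    metric_relabel_configs:
      - source_labels: ['__name__']
        regex: 'go_.*'                     # hypothetical: drop Go runtime metrics
        action: drop

    # target_groups is the key used in the question's config (Prometheus 0.17)
    target_groups:
      - targets:
        - 'dc1-prometheus'
```

The relabel rule in the question matched on instance="localhost:9090", which is a label of the federated series rather than of the target, so a rule under relabel_configs never sees that value.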
