
I set up Prometheus and the blackbox exporter. Here are the configs.

root@monitor-1:~# cat /etc/prometheus/prometheus.yml
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    scrape_interval: 5s
    static_configs:
      - targets:
        - http://wiki.itsmwork.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.20.202:9115


root@monitor-1:~# cat /etc/prometheus/blackbox.yaml | more
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: "ip4"
      no_follow_redirects: false
      fail_if_ssl: false
      tls_config:
        insecure_skip_verify: true
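The three relabel rules in the 'blackbox' job decide which host Prometheus actually scrapes, and therefore what 'up' describes. Below is a minimal Python sketch of their effect. It assumes a single static target and simplifies Prometheus relabeling to plain label copies. The helpers relabel_blackbox and scrape_url are hypothetical illustrations, not part of Prometheus.

```python
from urllib.parse import urlencode


def relabel_blackbox(address: str, exporter: str) -> dict:
    """Apply the blackbox job's three relabel rules to one static target (simplified)."""
    labels = {"__address__": address}
    # Rule 1: copy __address__ into __param_target (sent as ?target=...).
    labels["__param_target"] = labels["__address__"]
    # Rule 2: copy __param_target into the visible instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: replace __address__ so the scrape goes to the blackbox exporter.
    labels["__address__"] = exporter
    return labels


def scrape_url(labels: dict, metrics_path: str, params: dict) -> str:
    """Build the URL Prometheus would request for this target (simplified)."""
    query = {key: list(values) for key, values in params.items()}
    # __param_<name> labels become query parameters; here only 'target'.
    query["target"] = [labels["__param_target"]]
    return f"http://{labels['__address__']}{metrics_path}?{urlencode(query, doseq=True)}"


labels = relabel_blackbox("http://wiki.itsmwork.com", "192.168.20.202:9115")
print(scrape_url(labels, "/probe", {"module": ["http_2xx"]}))
```

In this sketch the scraped host is the exporter at 192.168.20.202:9115, while instance still shows the website. The 'up' series for this job therefore reports whether the exporter answered, not whether wiki.itsmwork.com is healthy.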

I checked the HTTP site manually, and the probe returns 0 as expected.

root@monitor-1:~# curl "http://localhost:9115/probe?target=wiki.itsmwork.com&module=http_2xx" | grep -v '^#'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2013  100  2013    0     0   294k      0 --:--:-- --:--:-- --:--:--  327k
probe_dns_lookup_time_seconds 0.002698265
probe_duration_seconds 0.00308218
probe_failed_due_to_regex 0
probe_http_content_length 0
probe_http_duration_seconds{phase="connect"} 0
probe_http_duration_seconds{phase="processing"} 0
probe_http_duration_seconds{phase="resolve"} 0
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0
probe_http_redirects 0
probe_http_ssl 0
probe_http_status_code 0
probe_http_uncompressed_body_length 0
probe_http_version 0
probe_ip_addr_hash 0
probe_ip_protocol 0
probe_success 0

But if I check the same target in the Prometheus UI, up{instance="http://wiki.itsmwork.com",job="blackbox"} is always 1.

How can I find out what the problem is?


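When working with the blackbox exporter, be careful not to confuse up and probe_success. The first metric tells you that the exporter itself is reachable. The second describes the target that the blackbox exporter itself probes. So the combination you are seeing means:

  • the blackbox exporter is working fine
  • the system you want to monitor does not respond as expected when probed from the blackbox exporter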
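The distinction between the two metrics can be written as a small decision function. This is only a sketch and not part of any Prometheus API. diagnose is a hypothetical helper, and the two values are assumed to be the latest up and probe_success samples for the same instance. When the scrape fails, no fresh probe_success is available, so the sketch models that case as None.

```python
from typing import Optional


def diagnose(up: int, probe_success: Optional[int]) -> str:
    """Interpret one blackbox target from its up and probe_success samples."""
    if up == 0:
        # Prometheus could not scrape the exporter's /probe endpoint, so
        # there is no fresh probe result to judge the target by.
        return "exporter unreachable: target state unknown"
    if probe_success == 1:
        return "exporter ok, target ok"
    # The scrape worked (up == 1), but the probe itself reported failure.
    return "exporter ok, target failed probe"


print(diagnose(1, 0))  # the situation described in the question
```

With the values from the question (up = 1, probe_success = 0), this lands in the third branch: the exporter is fine, and the probed site is not.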

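This also matches your manual test. The request to the blackbox_exporter instance (your curl command) worked, but it produced a failed probe, as the payload shows (probe_success 0). So if you want to reason about the probed system in your dashboards, always combine the up metric with probe_success. The opposite case can also happen: the system you monitor is running fine, but the blackbox exporter job is not working properly. You can detect that case because the up metric switches to 0.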
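To read the probe result that up does not show, take probe_success from the payload the exporter returns. Below is a minimal stdlib-only sketch, fed with an excerpt of the curl output from the question. It is not a full parser for the Prometheus text exposition format. It assumes one sample per line, no timestamps, and no spaces inside label values. parse_metrics is a hypothetical helper.

```python
def parse_metrics(text: str) -> dict:
    """Map each sample (name plus any labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Excerpt of the /probe response shown in the question.
PAYLOAD = """\
probe_dns_lookup_time_seconds 0.002698265
probe_http_duration_seconds{phase="connect"} 0
probe_http_status_code 0
probe_success 0
"""

samples = parse_metrics(PAYLOAD)
print(samples["probe_success"], samples["probe_http_status_code"])
```

Both values are 0 for this payload. That matches up = 1 together with probe_success = 0: the exporter answered the scrape, but its HTTP probe of the site failed.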

answered 2021-01-19T12:09:43.293