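I have deployed several Flink clusters on k8s in standalone mode, and I export their metrics through a single Prometheus Pushgateway.
The problem:
The metrics reach Prometheus only intermittently, which leaves gaps between data points when they are displayed in Grafana.
Prometheus target: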
monitoring/pushgateway/0 (1/1 up)
Endpoint: http://172.19.88.111:9091/metrics
State : UP
Labels: endpoint="tcp" instance="172.19.88.111:9091" job="pushgateway" namespace="flink-sql" pod="pushgateway-76d64545dd-6prdn" service="pushgateway"
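I queried the Pushgateway directly, but no single request returns all of the metrics; the set of series changes from one request to the next: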
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:17 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:18 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:18 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:19 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
flink_jobmanager_numRegisteredTaskManagers{host="jobmanager",instance="",job="model"} 20
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:20 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:21 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:22 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:22 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:23 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:23 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
flink_jobmanager_numRegisteredTaskManagers{host="jobmanager",instance="",job="model"} 20
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:24 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:24 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="172_19_90_175",instance="",job="model1122"} 8
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:25 UTC 2021
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:26 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
bash-5.0# date && curl -s http://pushgateway.flink-sql:9091/metrics | grep flink_jobmanager_numRegisteredTaskManagers
Mon May 24 07:15:27 UTC 2021
# HELP flink_jobmanager_numRegisteredTaskManagers numRegisteredTaskManagers (scope: jobmanager)
# TYPE flink_jobmanager_numRegisteredTaskManagers gauge
flink_jobmanager_numRegisteredTaskManagers{host="flink_jobmanager",instance="",job="flink-sql"} 0
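To put a number on how often each job's series is missing, repeated scrapes like the ones above can be tallied. The following is only a minimal sketch: it assumes each scrape has been captured as a string, and the names `jobs_in_scrape` and `presence_counts` are hypothetical helpers, not part of any Prometheus or Flink tooling.

```python
import re
from collections import Counter

# Matches one labelled sample line of the text exposition format, e.g.
# flink_jobmanager_numRegisteredTaskManagers{host="x",instance="",job="model"} 20
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
# Picks out the value of the job label (anchored so "flink_jobmanager" in host is ignored).
JOB_RE = re.compile(r'(?:^|,)job="([^"]*)"')


def jobs_in_scrape(text, metric):
    """Return the set of job label values reported for `metric` in one scrape."""
    jobs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        sample = SAMPLE_RE.match(line)
        if sample and sample.group("name") == metric:
            job = JOB_RE.search(sample.group("labels"))
            if job:
                jobs.add(job.group(1))
    return jobs


def presence_counts(scrapes, metric):
    """Count in how many scrapes each job label value appears for `metric`."""
    counts = Counter()
    for text in scrapes:
        counts.update(jobs_in_scrape(text, metric))
    return counts
```

Calling `presence_counts(scrapes, "flink_jobmanager_numRegisteredTaskManagers")` on a list of captured outputs and comparing each count with `len(scrapes)` shows which jobs drop out and how often.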
The relevant configuration in my flink-conf.yaml:
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.flink-sql
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: flink-sql
metrics.reporter.promgateway.randomJobNameSuffix: false
metrics.reporter.promgateway.deleteOnShutdown: false
metrics.reporter.promgateway.interval: 3 SECONDS
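One thing that may be worth checking in a configuration like this is the job name the reporter pushes under, since the Pushgateway groups metrics by job name plus grouping labels, and a PUT to a group replaces everything previously stored in that group. The sketch below only reads the settings and flags the case where `randomJobNameSuffix` is `false`; the helper names are hypothetical, and whether this is actually related to the gaps described here is an assumption, not a confirmed diagnosis.

```python
def parse_flink_conf(text):
    """Parse flat 'key: value' lines from a flink-conf.yaml style file."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def shared_group_warning(conf, reporter="promgateway"):
    """Return a warning if every process pushes under one fixed job name.

    Assumes an unset randomJobNameSuffix behaves as 'true'.
    """
    prefix = f"metrics.reporter.{reporter}."
    job = conf.get(prefix + "jobName", "")
    suffix = conf.get(prefix + "randomJobNameSuffix", "true").lower()
    if suffix == "false":
        return f"all processes using this file push under job name {job!r}"
    return None
```

With the settings above, `shared_group_warning(parse_flink_conf(text))` returns a warning naming `'flink-sql'`, because the JobManager and every TaskManager that reads this file share that one job name.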
I also tried setting both the Prometheus scrape interval and metrics.reporter.promgateway.interval to 1 second, but it had no effect.