In Thanos Query, the Prometheus sidecar is reported as healthy:

10.0.66.140:10901 | UP | prometheus="monitoring/k8s" prometheus_replica="prometheus-k8s-0" | 2021-10-03 01:41:57 |   | 793.000ms ago

But when I run a query, I get this error:

Error executing query: expanding series: proxy Series(): Addr: 10.0.66.140:10901 LabelSets: {prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0"} Mint: 1633225317596 Maxt: 9223372036854775807: receive series from Addr: 10.0.66.140:10901 LabelSets: {prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0"} Mint: 1633225317596 Maxt: 9223372036854775807: rpc error: code = Canceled desc = grpc: the client connection is closing

The Thanos Query pod logs the following errors:

level=warn ts=2021-10-03T06:04:55.23300433Z caller=storeset.go:570 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.0.66.140:10901: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=10.0.66.140:10901
level=info ts=2021-10-03T06:04:55.23316993Z caller=storeset.go:426 component=storeset msg="removing store because it's unhealthy or does not exist" address=10.0.66.140:10901 extLset="{prometheus=\"monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\"}"
level=error ts=2021-10-03T06:04:55.236279545Z caller=proxy.go:307 component=proxy request="min_time:1633240788495 max_time:1633241088495 matchers:<name:\"__name__\" value:\"up\" > aggregates:COUNT aggregates:SUM partial_response_disabled:true " err="Addr: 10.0.66.140:10901 LabelSets: {prometheus=\"monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\"} Mint: 1633225317596 Maxt: 9223372036854775807: receive series from Addr: 10.0.66.140:10901 LabelSets: {prometheus=\"monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\"} Mint: 1633225317596 Maxt: 9223372036854775807: rpc error: code = Canceled desc = grpc: the client connection is closing"
level=info ts=2021-10-03T06:04:56.069001968Z caller=storeset.go:463 component=storeset msg="adding new storeAPI to query storeset" address=10.0.66.140:10901 extLset="{prometheus=\"monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\"}"
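
To relate the numbers in these messages to wall-clock time, here is a small Python sketch. It is not part of my setup, and the helper name `describe_ms` is made up for illustration. It assumes that Mint, Maxt, min_time and max_time are Unix timestamps in milliseconds, and that 9223372036854775807 (the maximum int64 value) means the sidecar advertises no upper time bound.

```python
from datetime import datetime, timezone

# Largest signed 64-bit integer. Assumption: a Maxt equal to this value
# means "no upper bound" rather than a real point in time.
MAX_INT64 = 2**63 - 1


def describe_ms(ts_ms: int) -> str:
    """Render a millisecond Unix timestamp as UTC, or flag the int64 sentinel."""
    if ts_ms == MAX_INT64:
        return "unbounded (max int64)"
    # Integer division drops the milliseconds and avoids float rounding.
    moment = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    # Values copied from the error message and the proxy.go log line above.
    values = {
        "Mint": 1633225317596,
        "Maxt": 9223372036854775807,
        "request min_time": 1633240788495,
        "request max_time": 1633241088495,
    }
    for name, value in values.items():
        print(f"{name}: {describe_ms(value)}")
```

Decoded this way, Mint is 2021-10-03 01:41:57 UTC, which matches the min time in the stores table (assuming the table shows UTC), and the failing request covers 05:59:48 to 06:04:48 UTC, so it falls inside the range the sidecar advertises.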

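I don't know what is causing this. Could anyone suggest where to look? My object storage is working normally, and the Prometheus metrics have already been uploaded to it by the sidecar.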

Thanos Query YAML (irrelevant parts omitted); no resource requests or limits are set:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: thanos-query
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: thanos-query
    spec:
      containers:
      - args:
        - query
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --log.level=info
        - --log.format=logfmt
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=dnssrv+_grpc._tcp.thanos-sidecar-self.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-ruler.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-store.monitoring.svc.cluster.local:10901
        - --store=10.0.66.140:10901
        - --query.auto-downsampling
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.22.0
      dnsPolicy: ClusterFirst
      nodeSelector:
      securityContext:
        fsGroup: 65534
        runAsUser: 65534
      serviceAccount: thanos-query
      serviceAccountName: thanos-query
      terminationGracePeriodSeconds: 120