In Thanos Query, the Prometheus sidecar store shows as healthy:
10.0.66.140:10901 | UP | prometheus="monitoring/k8s" prometheus_replica="prometheus-k8s-0" | 2021-10-03 01:41:57 | | 793.000ms ago
But when I run a query, I get this error:
Error executing query: expanding series: proxy Series(): Addr: 10.0.66.140:10901 LabelSets: {prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0"} Mint: 1633225317596 Maxt: 9223372036854775807: receive series from Addr: 10.0.66.140:10901 LabelSets: {prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0"} Mint: 1633225317596 Maxt: 9223372036854775807: rpc error: code = Canceled desc = grpc: the client connection is closing
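The Mint/Maxt values in this error are Unix timestamps in milliseconds. The small Python sketch below (a helper written for this post, not part of Thanos) decodes them: Mint works out to 2021-10-03 01:41:57 UTC, which matches the time shown in the store status line above (assuming the UI displays UTC), and Maxt is 2^63 - 1, the largest int64, which as far as I know is how the sidecar advertises an open-ended range up to "now". So the advertised time range looks sane; the failure is in the gRPC call itself.

```python
from datetime import datetime, timezone
from typing import Tuple

# Values copied from the error message above (milliseconds since the Unix epoch).
MINT_MS = 1633225317596
MAXT_MS = 9223372036854775807


def describe_time_range(mint_ms: int, maxt_ms: int) -> Tuple[str, bool]:
    """Return the store's MinTime as a UTC string, and whether MaxTime equals
    the largest signed 64-bit integer (i.e. the range has no upper bound)."""
    mint = datetime.fromtimestamp(mint_ms / 1000, tz=timezone.utc)
    return mint.strftime("%Y-%m-%d %H:%M:%S"), maxt_ms == 2**63 - 1


if __name__ == "__main__":
    start, open_ended = describe_time_range(MINT_MS, MAXT_MS)
    print(start, open_ended)  # 2021-10-03 01:41:57 True
```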
The Thanos Query pod logs show the following:
level=warn ts=2021-10-03T06:04:55.23300433Z caller=storeset.go:570 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.0.66.140:10901: rpc error: code = DeadlineExceeded desc = context deadline exceeded" address=10.0.66.140:10901
level=info ts=2021-10-03T06:04:55.23316993Z caller=storeset.go:426 component=storeset msg="removing store because it's unhealthy or does not exist" address=10.0.66.140:10901 extLset="{prometheus=\"monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\"}"
level=error ts=2021-10-03T06:04:55.236279545Z caller=proxy.go:307 component=proxy request="min_time:1633240788495 max_time:1633241088495 matchers:<name:\"__name__\" value:\"up\" > aggregates:COUNT aggregates:SUM partial_response_disabled:true " err="Addr: 10.0.66.140:10901 LabelSets: {prometheus=\"monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\"} Mint: 1633225317596 Maxt: 9223372036854775807: receive series from Addr: 10.0.66.140:10901 LabelSets: {prometheus=\"monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\"} Mint: 1633225317596 Maxt: 9223372036854775807: rpc error: code = Canceled desc = grpc: the client connection is closing"
level=info ts=2021-10-03T06:04:56.069001968Z caller=storeset.go:463 component=storeset msg="adding new storeAPI to query storeset" address=10.0.66.140:10901 extLset="{prometheus=\"monitoring/k8s\", prometheus_replica=\"prometheus-k8s-0\"}"
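Read in order, the storeset lines show the store being dropped after the Info call hits its deadline (06:04:55.233) and re-added about 0.84 s later (06:04:56.069), with the Series call for the query cancelled in between. That would plausibly explain "the client connection is closing", if removing a store closes its gRPC connection while a query is still using it (an inference, not something the logs state directly). A minimal Python sketch for pulling that timeline for one address out of logfmt output, using abbreviated copies of the lines above (the `err` and `extLset` fields are dropped for brevity; `parse_logfmt` and `store_events` are hypothetical helpers):

```python
import shlex
from typing import Dict, List

# Abbreviated copies of the storeset log lines above.
LOG_LINES = [
    'level=warn ts=2021-10-03T06:04:55.23300433Z component=storeset msg="update of store node failed" address=10.0.66.140:10901',
    'level=info ts=2021-10-03T06:04:55.23316993Z component=storeset msg="removing store because it\'s unhealthy or does not exist" address=10.0.66.140:10901',
    'level=info ts=2021-10-03T06:04:56.069001968Z component=storeset msg="adding new storeAPI to query storeset" address=10.0.66.140:10901',
]


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split one logfmt line into key/value pairs; shlex handles quoted values with spaces."""
    pairs = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        pairs[key] = value
    return pairs


def store_events(lines: List[str], address: str) -> List[str]:
    """Return 'ts msg' strings for storeset events that concern one store address."""
    events = []
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("component") == "storeset" and fields.get("address") == address:
            events.append(f'{fields["ts"]} {fields["msg"]}')
    return events


if __name__ == "__main__":
    for event in store_events(LOG_LINES, "10.0.66.140:10901"):
        print(event)
```

Run against the full pod log, repeated remove/add pairs for the same address would suggest the store is flapping rather than failing once.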
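I don't know what is causing this. Can anyone suggest where to look? My object storage is working, and the Prometheus data has already been uploaded from the sidecar to object storage.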
Thanos Query YAML (irrelevant parts omitted; no resource requests or limits are set):
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: thanos-query
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: thanos-query
    spec:
      containers:
      - args:
        - query
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --log.level=info
        - --log.format=logfmt
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=dnssrv+_grpc._tcp.thanos-sidecar-self.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-ruler.monitoring.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-store.monitoring.svc.cluster.local:10901
        - --store=10.0.66.140:10901
        - --query.auto-downsampling
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.22.0
      dnsPolicy: ClusterFirst
      nodeSelector:
      securityContext:
        fsGroup: 65534
        runAsUser: 65534
      serviceAccount: thanos-query
      serviceAccountName: thanos-query
      terminationGracePeriodSeconds: 120
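Since the storeset warning is a DeadlineExceeded while fetching store info from 10.0.66.140:10901, one low-effort check is whether that address is reachable from the query pod's network at all, and how long a connection takes. A minimal Python sketch, assuming you run it from a debug container that shares the network with Thanos Query (the thanos image itself may not ship Python); `tcp_connect_latency` is a hypothetical helper, and it only tests TCP, not gRPC or the StoreAPI Info call:

```python
import socket
import time
from typing import Optional


def tcp_connect_latency(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None on failure/timeout.

    Only checks TCP reachability; it does not speak gRPC or call the StoreAPI
    Info method that the storeset log reports as timing out.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections and socket timeouts
        return None


if __name__ == "__main__":
    # Sidecar gRPC address taken from the --store flag above.
    latency = tcp_connect_latency("10.0.66.140", 10901)
    print("unreachable" if latency is None else f"connected in {latency * 1000:.1f} ms")
```

If TCP connects quickly but Info still times out, the problem is more likely on the gRPC or sidecar side than in basic pod-to-pod networking.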