For kube-state-metrics, I am getting the error message "No nodes are available that match all of the predicates: MatchNodeSelector (7), PodToleratesNodeTaints (1)". Please guide me on how to resolve this issue.
admin@ip-172-20-58-79:~/kubernetes-prometheus$ kubectl describe po -n kube-system kube-state-metrics-747bcc4d7d-kfn7t
Events:
Type     Reason            Age               From               Message
----     ------            ----              ----               -------
Warning  FailedScheduling  3s (x20 over 4m)  default-scheduler  No nodes are available that match all of the predicates: MatchNodeSelector (7), PodToleratesNodeTaints (1).
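As far as I understand, the two predicate names in that message can be checked directly against the node objects. This is only a diagnostic sketch of the commands I would run (I have not confirmed it pinpoints the problem):

kubectl get nodes --show-labels                # MatchNodeSelector: do the nodes carry the label my pod's nodeSelector asks for?
kubectl describe nodes | grep -i -A1 taints    # PodToleratesNodeTaints: which node has a taint the pod does not tolerate?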
Is this problem related to memory on the nodes? If so, how can I confirm it? I checked all the nodes: only one appears to be above 80%, and the remaining nodes are between 45% and 70% memory usage.
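From what I can tell, memory pressure would normally show up as a MemoryPressure node condition or an eviction taint rather than as these two predicates, but this is how I would try to confirm it (kubectl top nodes assumes metrics-server is installed in the cluster):

kubectl describe nodes | grep -A8 "Conditions:"   # look for MemoryPressure=True on any node
kubectl top nodes                                 # actual CPU/memory usage per node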
The following screenshot shows kube-state-metrics (0/1 up):
Also, Prometheus shows kubernetes-pods (0/0 up). Is that because kube-state-metrics is not working, or is it something else? And why is kubernetes-apiservers (0/1 up), as seen in the screenshot above, not up? How can I fix it?
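In case it is relevant, I assume the kubernetes-apiservers scrape job is meant to look roughly like the standard in-cluster example below; I am not sure whether my 0/1 state is a certificate or an RBAC problem. The ca.crt/token paths assume Prometheus runs inside the cluster with its service account mounted:

- job_name: 'kubernetes-apiservers'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https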
admin@ip-172-20-58-79:~/kubernetes-prometheus$ sudo tail -f /var/log/kube-apiserver.log | grep error
I0110 10:15:37.153827 7 logs.go:41] http: TLS handshake error from 172.20.44.75:60828: remote error: tls: bad certificate
I0110 10:15:42.153543 7 logs.go:41] http: TLS handshake error from 172.20.44.75:60854: remote error: tls: bad certificate
I0110 10:15:47.153699 7 logs.go:41] http: TLS handshake error from 172.20.44.75:60898: remote error: tls: bad certificate
I0110 10:15:52.153788 7 logs.go:41] http: TLS handshake error from 172.20.44.75:60936: remote error: tls: bad certificate
I0110 10:15:57.154014 7 logs.go:41] http: TLS handshake error from 172.20.44.75:60992: remote error: tls: bad certificate
E0110 10:15:58.929167 7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58104: write: connection reset by peer
E0110 10:15:58.931574 7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58098: write: connection reset by peer
E0110 10:15:58.933864 7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58088: write: connection reset by peer
E0110 10:16:00.842018 7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58064: write: connection reset by peer
E0110 10:16:00.844301 7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.42.187:58058: write: connection reset by peer
E0110 10:18:17.275590 7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.44.75:37402: write: connection reset by peer
E0110 10:18:17.275705 7 runtime.go:66] Observed a panic: &errors.errorString{s:"kill connection/stream"} (kill connection/stream)
E0110 10:18:17.276401 7 runtime.go:66] Observed a panic: &errors.errorString{s:"kill connection/stream"} (kill connection/stream)
E0110 10:18:17.277808 7 status.go:62] apiserver received an error that is not an metav1.Status: write tcp 172.20.58.79:443->172.20.44.75:37392: write: connection reset by peer
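The repeated "tls: bad certificate" lines seem to mean that whatever connects from 172.20.44.75 presents a client certificate the apiserver does not trust. This is only a sketch of how I plan to compare the certificates on both ends (the IP and port are taken from the log above; the kubelet kubeconfig path is a guess for this kops-style layout):

# what certificate the apiserver itself presents to clients
openssl s_client -connect 172.20.58.79:443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates

# on the node at 172.20.44.75: which client certificate/key the kubelet is configured to use
sudo grep -E 'client-certificate|client-key' /var/lib/kubelet/kubeconfig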
Update after MaggieO's reply:
admin@ip-172-20-58-79:~/kubernetes-prometheus/kube-state-metrics-configs$ cat deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v1.8.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v1.8.0
    spec:
      containers:
      - image: quay.io/coreos/kube-state-metrics:v1.8.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
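Since the scheduler reports MatchNodeSelector (7), one thing I want to rule out is the nodeSelector above not matching any node label. My understanding is that older nodes may only carry beta.kubernetes.io/os rather than kubernetes.io/os, so I would print both label columns to compare (this is just a guess on my part):

kubectl get nodes -L kubernetes.io/os -L beta.kubernetes.io/os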
Also, I want to add the following command to the deployment.yaml above, but I get an indentation error. Please show me where I should add it (see the placement sketch after this snippet).
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
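From the answers so far I gather that command: should sit at the same indentation level as image: and name:, i.e. as a field of the container entry; the sketch below shows that placement only. The /metrics-server flags are copied verbatim from the snippet above, and I am not sure they even apply to the kube-state-metrics image:

    spec:
      containers:
      - image: quay.io/coreos/kube-state-metrics:v1.8.0
        name: kube-state-metrics
        # command/args are per-container fields, indented under the "-" list item
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP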
Update 2: @MaggieO Even after adding the command/args, it still shows the same error and the pod remains in Pending state:
Updated deployment.yaml:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/name":"kube-state-metrics","app.kubernetes.io/version":"v1.8.0"},"name":"kube-state-metrics","namespace":"kube-system"},"spec":{"replicas":1,"selector":{"matchLabels":{"app.kubernetes.io/name":"kube-state-metrics"}},"template":{"metadata":{"labels":{"app.kubernetes.io/name":"kube-state-metrics","app.kubernetes.io/version":"v1.8.0"}},"spec":{"containers":[{"args":["--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname"],"image":"quay.io/coreos/kube-state-metrics:v1.8.0","imagePullPolicy":"Always","livenessProbe":{"httpGet":{"path":"/healthz","port":8080},"initialDelaySeconds":5,"timeoutSeconds":5},"name":"kube-state-metrics","ports":[{"containerPort":8080,"name":"http-metrics"},{"containerPort":8081,"name":"telemetry"}],"readinessProbe":{"httpGet":{"path":"/","port":8081},"initialDelaySeconds":5,"timeoutSeconds":5}}],"nodeSelector":{"kubernetes.io/os":"linux"},"serviceAccountName":"kube-state-metrics"}}}}
  creationTimestamp: 2020-01-10T05:33:13Z
  generation: 4
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v1.8.0
  name: kube-state-metrics
  namespace: kube-system
  resourceVersion: "178851301"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/kube-state-metrics
  uid: b20aa645-336a-11ea-9618-0607d7cb72ed
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: v1.8.0
    spec:
      containers:
      - args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        image: quay.io/coreos/kube-state-metrics:v1.8.0
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
          protocol: TCP
        - containerPort: 8081
          name: telemetry
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-state-metrics
      serviceAccountName: kube-state-metrics
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: 2020-01-10T05:33:13Z
    lastUpdateTime: 2020-01-10T05:33:13Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2020-01-15T07:24:27Z
    lastUpdateTime: 2020-01-15T07:29:12Z
    message: ReplicaSet "kube-state-metrics-7f8c9c6c8d" is progressing.
    reason: ReplicaSetUpdated
    status: "True"
    type: Progressing
  observedGeneration: 4
  replicas: 2
  unavailableReplicas: 2
  updatedReplicas: 1
Update 3: As shown in the screenshot below, nodes cannot be retrieved. Please tell me how to resolve this issue.
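I am not sure whether the "cannot get nodes" error in that screenshot comes from the kube-state-metrics container itself or from Prometheus, so this is only how I plan to narrow it down (the service-account name is taken from the deployment above):

kubectl logs -n kube-system deployment/kube-state-metrics
kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:kube-state-metrics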