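I have set up a liveness probe for a long-running application in a pod. The probe fails a few times a day, which causes the Pod to restart each time. There is no readiness probe.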
livenessProbe:
  httpGet:
    path: /
    port: http
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 20
  periodSeconds: 20
  successThreshold: 1
  failureThreshold: 3
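Further inspection of the application code and the Docker image turned up nothing abnormal. So I disabled the liveness probe and instead probed the NodePort service manually every 10 seconds, using a Python script on a PC connected to the network. Although the manual probe was more frequent and stricter than the liveness probe, it always succeeded and never failed. Each request took about 200~400 ms.
The manual probe is roughly equivalent to a liveness probe configured with: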
timeoutSeconds: 500ms
periodSeconds: 10
successThreshold: 1
failureThreshold: 1
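For reference, a minimal sketch of what such a manual probe script might look like, assuming only the Python standard library. The function names `probe_once` and `run_probe` are hypothetical and this is not the exact script I ran. Note that `urlopen`'s timeout applies to each blocking socket operation rather than to the whole request, so it only approximates the kubelet's `timeoutSeconds`.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_s=0.5):
    """Send one HTTP GET and return (ok, elapsed_seconds).

    Like the kubelet's httpGet probe, any status in [200, 400) counts
    as success; errors and timeouts count as failure.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP 4xx/5xx, refused connections, and socket timeouts.
        ok = False
    return ok, time.monotonic() - start


def run_probe(url, period_s=10.0, timeout_s=0.5, failure_threshold=1,
              max_checks=None, probe=probe_once, sleep=time.sleep):
    """Probe `url` every `period_s` seconds.

    Returns False as soon as `failure_threshold` consecutive probes
    fail, or True if `max_checks` probes complete without that.
    `probe` and `sleep` are injectable so the loop can be exercised
    without a real server.
    """
    consecutive_failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        ok, elapsed = probe(url, timeout_s)
        checks += 1
        consecutive_failures = 0 if ok else consecutive_failures + 1
        print(f"check={checks} ok={ok} elapsed={elapsed * 1000:.0f}ms")
        if consecutive_failures >= failure_threshold:
            return False
        sleep(period_s)
    return True
```

It would be called as `run_probe("http://<node-ip>:<node-port>/")`, where the node IP and NodePort are placeholders for the real values. One difference worth keeping in mind: a script like this reaches the Pod from outside the cluster through the NodePort service, whereas the kubelet sends its probe from the node directly to the Pod IP.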
Why did the manual probe succeed while the liveness probe failed? Does this indicate a Kubernetes networking problem?
Pod manifest:
kind: Pod
apiVersion: v1
metadata:
  name: pypi-pypiserver-74b689df7-rh9bm
  namespace: default
  labels:
    app.kubernetes.io/instance: pypi
    app.kubernetes.io/name: pypiserver
spec:
  volumes:
    - name: secrets
      secret:
        secretName: pypi-pypiserver
        defaultMode: 420
    - name: packages
      persistentVolumeClaim:
        claimName: pypi-pypiserver
    - name: default-token-cx7m7
      secret:
        secretName: default-token-cx7m7
        defaultMode: 420
  containers:
    - name: pypiserver
      image: 'registry.digitalocean.com/evergreen/pypiserver:latest'
      args:
        - run
        - '--passwords=.'
        - '--authenticate=.'
        - '--port=8080'
        - '--welcome=/dev/null'
        - '--server=wsgiref'
        - /data/packages
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      resources:
        limits:
          cpu: 1600m
          memory: 1Gi
        requests:
          cpu: 400m
          memory: 256Mi
      volumeMounts:
        - name: packages
          mountPath: /data/packages
          mountPropagation: None
        - name: secrets
          readOnly: true
          mountPath: /config
        - name: default-token-cx7m7
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /
          port: http
          scheme: HTTP
        initialDelaySeconds: 30
        timeoutSeconds: 10
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  nodeSelector:
    doks.digitalocean.com/node-pool: k8s-node-pool-hive-dev-2
  serviceAccountName: default
  serviceAccount: default
  nodeName: k8s-node-pool-hive-dev-2-8adyc
  securityContext:
    runAsUser: 9898
    runAsGroup: 9898
    fsGroup: 9898
  imagePullSecrets:
    - name: evergreen
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority