
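I have set up a liveness probe for a long-running application in a pod. It fails several times a day, and each time the pod is restarted. There is no readiness probe.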

livenessProbe:
  httpGet:
    path: /
    port: http
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 20
  periodSeconds: 20
  successThreshold: 1
  failureThreshold: 3
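The restart rule implied by these fields can be sketched as a small model. This is a simplified illustration, not the kubelet's actual code. The container is restarted only after `failureThreshold` consecutive probes fail, and any successful probe resets the count. With `periodSeconds: 20` and `failureThreshold: 3`, a restart therefore means the endpoint failed three probes in a row, spread over at least 40 seconds. The helper name `restart_indices` is hypothetical, and `initialDelaySeconds` and restart back-off are ignored.

```python
from typing import Iterable, List


def restart_indices(results: Iterable[bool], failure_threshold: int = 3) -> List[int]:
    """Return the probe indices at which a restart would be triggered.

    Simplified model of a liveness probe: a restart happens once
    `failure_threshold` consecutive probes fail, and any success resets the
    count. The counter also resets after a restart (initialDelaySeconds and
    back-off are not modelled).
    """
    restarts: List[int] = []
    consecutive_failures = 0
    for i, ok in enumerate(results):
        if ok:
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(i)
            consecutive_failures = 0
    return restarts


if __name__ == "__main__":
    # Two isolated failures never reach the threshold of 3.
    print(restart_indices([True, False, True, False, True]))
    # Three failures in a row trigger a restart at the third one (index 3).
    print(restart_indices([True, False, False, False, True]))
```

The manual probe described below uses `failureThreshold: 1`, which in this model turns every single failure into a reported event, so it is the stricter of the two checks.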

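Further inspection of the application code and the Docker image turned up nothing unusual. So I disabled the liveness probe and instead used a Python script on a PC connected to the network to probe the NodePort service manually every 10 seconds. Although the manual probe is more frequent and stricter than the liveness probe, it always succeeded and never failed. Each ping took about 200 to 400 ms.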

The manual probe is roughly equivalent to a liveness probe configured as follows:

timeoutSeconds: 500ms
periodSeconds: 10
successThreshold: 1
failureThreshold: 1
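The original script was not shown, so the following is only a minimal sketch of a manual probe like the one described, assuming a plain `urllib` GET. `NODEPORT_URL` is a placeholder, and `probe_once` and `run` are hypothetical names. Success follows the kubelet's `httpGet` rule: any status from 200 to 399 counts as success. Timeouts, refused connections, and 4xx/5xx responses count as failures, and each failure is reported immediately, as with `failureThreshold: 1`. Note that `urllib` applies the timeout to each socket operation rather than to the whole request, so it only approximates a 500 ms total timeout.

```python
import http.client
import time
import urllib.request
from typing import Tuple

# Placeholder: replace with a real node address and the Service's NodePort.
NODEPORT_URL = "http://NODE_IP:NODE_PORT/"


def probe_once(url: str, timeout: float = 0.5) -> Tuple[bool, float]:
    """Send one HTTP GET and return (success, elapsed_seconds).

    Success mirrors the kubelet's httpGet rule: status 200-399.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), timeouts and refused connections are
        # all OSError subclasses; HTTPException covers malformed responses.
        ok = False
    return ok, time.monotonic() - start


def run(url: str, period: float = 10.0, timeout: float = 0.5) -> None:
    """Probe forever, printing every result with its latency."""
    while True:
        ok, elapsed = probe_once(url, timeout)
        status = "ok" if ok else "FAIL"
        print(f"{time.strftime('%H:%M:%S')} {status} {elapsed * 1000:.0f} ms", flush=True)
        time.sleep(period)


if __name__ == "__main__":
    run(NODEPORT_URL)
```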

Why does the manual probe succeed while the liveness probe fails? Does this indicate a Kubernetes networking problem?

Pod manifest:

kind: Pod
apiVersion: v1
metadata:
  name: pypi-pypiserver-74b689df7-rh9bm
  namespace: default
  labels:
    app.kubernetes.io/instance: pypi
    app.kubernetes.io/name: pypiserver
spec:
  volumes:
    - name: secrets
      secret:
        secretName: pypi-pypiserver
        defaultMode: 420
    - name: packages
      persistentVolumeClaim:
        claimName: pypi-pypiserver
    - name: default-token-cx7m7
      secret:
        secretName: default-token-cx7m7
        defaultMode: 420
  containers:
    - name: pypiserver
      image: 'registry.digitalocean.com/evergreen/pypiserver:latest'
      args:
        - run
        - '--passwords=.'
        - '--authenticate=.'
        - '--port=8080'
        - '--welcome=/dev/null'
        - '--server=wsgiref'
        - /data/packages
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      resources:
        limits:
          cpu: 1600m
          memory: 1Gi
        requests:
          cpu: 400m
          memory: 256Mi
      volumeMounts:
        - name: packages
          mountPath: /data/packages
          mountPropagation: None
        - name: secrets
          readOnly: true
          mountPath: /config
        - name: default-token-cx7m7
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /
          port: http
          scheme: HTTP
        initialDelaySeconds: 30
        timeoutSeconds: 10
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  nodeSelector:
    doks.digitalocean.com/node-pool: k8s-node-pool-hive-dev-2
  serviceAccountName: default
  serviceAccount: default
  nodeName: k8s-node-pool-hive-dev-2-8adyc
  securityContext:
    runAsUser: 9898
    runAsGroup: 9898
    fsGroup: 9898
  imagePullSecrets:
    - name: evergreen
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority

1 Answer


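A NodePort probe only confirms that the Service is reachable on that port. It does not check whether the pod itself is alive. Use the liveness probe to check the availability of the pod's container.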
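One way to follow up on this, sketched under assumptions: the kubelet sends an `httpGet` probe from the node directly to the pod IP and the container port (8080 in the manifest), not through the NodePort Service. Probing both targets side by side can show whether failures appear only on the path the kubelet uses. `POD_IP`, `NODE_IP` and `NODE_PORT` are placeholders, and `check`/`compare` are hypothetical names. The pod IP is usually reachable only from inside the cluster (from a node or another pod), so the script would have to run there. The 10-second timeout mirrors `timeoutSeconds: 10` in the manifest.

```python
import http.client
import time
import urllib.request
from typing import Dict

# Placeholders: fill in real addresses before running.
TARGETS = {
    # Path the kubelet uses: node -> pod IP:containerPort.
    "pod_ip": "http://POD_IP:8080/",
    # Path the manual script used: client -> node:NodePort -> Service -> pod.
    "nodeport": "http://NODE_IP:NODE_PORT/",
}


def check(url: str, timeout: float) -> bool:
    """Return True for a 200-399 response, False on any error or timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        return False


def compare(targets: Dict[str, str], timeout: float = 10.0) -> Dict[str, bool]:
    """Probe every target once, sequentially, and map name -> success."""
    return {name: check(url, timeout) for name, url in targets.items()}


if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), compare(TARGETS), flush=True)
        time.sleep(10)
```

If `pod_ip` fails while `nodeport` succeeds at the same moment, the difference lies in the node-to-pod path or in timing, rather than in the application being down for all clients.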

More details here: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

answered 2021-07-04T09:39:25.477