
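When I roll out a deployment I get downtime. Requests start failing after a variable amount of time (20-40 seconds).

The readiness check for the entry container (haproxy) fails once preStop sends SIGUSR1; preStop then waits 31 seconds and sends SIGTERM. Within that window the pod should be removed from the service, because the readiness check is set to fail after 2 failed attempts at 5-second intervals.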
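As a rough sketch of that expectation, the arithmetic below uses the readinessProbe values from the Deployment further down (periodSeconds: 5, failureThreshold: 2, timeoutSeconds: 1) and the 31-second sleep. It assumes the probe starts failing as soon as SIGUSR1 is delivered and ignores any delay in propagating the change to the load balancer, so it is only an estimate. The variable names are mine.

```shell
#!/bin/sh
# Values copied from the haproxy readinessProbe and preStop hook.
period_seconds=5
failure_threshold=2
timeout_seconds=1
pre_stop_sleep=31

# Assumed worst case: the first failing probe starts almost a full period
# after SIGUSR1, each failing probe may take up to the timeout, and the pod
# is marked unready after failure_threshold consecutive failures.
worst_case=$(( period_seconds * failure_threshold + timeout_seconds ))
margin=$(( pre_stop_sleep - worst_case ))

echo "worst-case seconds until unready: $worst_case"
echo "seconds left before SIGTERM: $margin"
```

By this estimate the pod should be marked unready well before SIGTERM is sent, which is why the failing requests surprise me.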

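How can I see the events for pods being added to and removed from the service, so I can find out what is causing this?

And the events around the readiness checks themselves?

I am using Google Container Engine version 1.2.2 with GCE's network load balancer.

Service: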

apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http
    protocol: TCP
  - name: https
    port: 443
    targetPort: https
    protocol: TCP  
  selector:
    app: myapp

Deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: 1.0.0-61--66-6
    spec:
      containers:
      - name: myapp
        image: ****  
        resources:
          limits:
            cpu: 100m
            memory: 250Mi
          requests:
            cpu: 10m
            memory: 125Mi
        ports:
        - name: http-direct
          containerPort: 5000
        livenessProbe:
          httpGet:
            path: /status
            port: 5000
          initialDelaySeconds: 30
          timeoutSeconds: 1
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["sleep 31;"]
      - name: haproxy
        image: travix/haproxy:1.6.2-r0
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 25Mi
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        env:
        - name: "SSL_CERTIFICATE_NAME"
          value: "ssl.pem"         
        - name: "OFFLOAD_TO_PORT"
          value: "5000"
        - name: "HEALT_CHECK_PATH"
          value: "/status"
        volumeMounts:
        - name: ssl-certificate
          mountPath: /etc/ssl/private
        livenessProbe:
          httpGet:
            path: /status
            port: 443
            scheme: HTTPS
          initialDelaySeconds: 30
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /readiness
            port: 81
          initialDelaySeconds: 0
          timeoutSeconds: 1
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 2
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["kill -USR1 1; sleep 31; kill 1"]
      volumes:
      - name: ssl-certificate
        secret:
          secretName: ssl-c324c2a587ee-20160331

1 Answer


When a probe fails, the prober emits a Warning event with the reason Unhealthy and a message of the form "xx probe errored: xxx".

You should be able to find those events using either kubectl get events or kubectl describe pods -l app=myapp,version=1.0.0-61--66-6 (which filters pods by label).
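A minimal sketch of how that might look in practice. The helper name unhealthy_only is hypothetical and just filters text on stdin; the kubectl invocations are left as comments because they need a live cluster. Watching the service's Endpoints object is my own suggestion for seeing pods added to and removed from the service, on the assumption that only ready pod addresses are listed there.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines mentioning the "Unhealthy" reason,
# which is the reason attached to probe-failure events.
unhealthy_only() {
  grep -F 'Unhealthy'
}

# Usage against a live cluster (not executed here):
#   kubectl get events | unhealthy_only
#   kubectl describe pods -l app=myapp,version=1.0.0-61--66-6
#
# Assumption: watching the Endpoints object of the myapp service shows
# pod addresses appearing and disappearing as readiness changes.
#   kubectl get endpoints myapp --watch
```

Running the watch in one terminal while the rollout proceeds in another should make it possible to compare when a pod leaves the endpoints list against when requests begin to fail.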

Answered 2016-05-03T21:21:45.847