1

就绪探针使应用程序处于非就绪状态。处于此状态时,应用程序无法连接到任何 kubernetes 服务。

我将 Ubuntu 18 用于我的 kubernetes 集群的主节点和节点。(当我在集群中只使用master时问题仍然存在,所以我认为这不是master节点的问题)。

我使用 Spring 应用程序设置了我的 kubernetes 集群,该应用程序使用 hazelcast 来管理缓存。因此,在使用就绪探测时,应用程序无法访问我创建的 kubernetes 服务,以便使用hazelcast-kubernetes插件通过 hazelcast 连接应用程序。

当我取出就绪探针时,应用程序会尽快连接到成功创建 hazelcast 集群的服务,并且一切正常。

就绪探针将连接到一个 rest api,它的唯一响应是一个 200 代码。然而,当应用程序启动时,在进程的中间它会启动 hazelcast 集群,因此,它会尝试连接到 kubernetes hazelcast 服务,该服务将应用程序的缓存与其他 pod 连接起来,而就绪探针没有' 没有被清除,并且 pod 由于探测而处于非就绪状态。这是应用程序无法连接到 kubernetes 服务并且由于我添加的配置而失败或卡住的情况。

服务.yaml:

apiVersion: v1
kind: Service
metadata:
  name: my-app-cluster-hazelcast
spec:
  selector:
    app: my-app
  ports:
  - name: hazelcast
    port: 5701

部署.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app-deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 180
      containers:
      - name: my-app
        image: my-repo:5000/my-app-container
        imagePullPolicy: Always
        ports:
        - containerPort: 5701
        - containerPort: 9080
        readinessProbe:
          httpGet:
            path: /app/api/excluded/sample
            port: 9080
          initialDelaySeconds: 120
          periodSeconds: 15
        securityContext:
          capabilities:
            add:
              - SYS_ADMIN
        env:
          - name: container
            value: docker

hazelcast.xml:

<?xml version="1.0" encoding="UTF-8"?>

<hazelcast
        xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.11.xsd"
        xmlns="http://www.hazelcast.com/schema/config"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <properties>
        <property name="hazelcast.jmx">false</property>
        <property name="hazelcast.logging.type">slf4j</property>
    </properties>

    <network>
        <port auto-increment="false">5701</port>
            <outbound-ports>
                <ports>49000,49001,49002,49003</ports>
            </outbound-ports>
        <join>
            <multicast enabled="false"/>
            <kubernetes enabled="true">
                <namespace>default</namespace>
                <service-name>my-app-cluster-hazelcast</service-name>
            </kubernetes>
        </join>
    </network>
</hazelcast>

hazelcast-client.xml:

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast-client
        xsi:schemaLocation="http://www.hazelcast.com/schema/client-config http://www.hazelcast.com/schema/client-config/hazelcast-client-config-3.11.xsd"
        xmlns="http://www.hazelcast.com/schema/client-config"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <properties>
        <property name="hazelcast.logging.type">slf4j</property>
    </properties>

    <connection-strategy async-start="false" reconnect-mode="ON">
        <connection-retry enabled="true">
            <initial-backoff-millis>1000</initial-backoff-millis>
            <max-backoff-millis>60000</max-backoff-millis>
        </connection-retry>
    </connection-strategy>

    <network>
        <kubernetes enabled="true">
            <namespace>default</namespace>
            <service-name>my-app-cluster-hazelcast</service-name>
        </kubernetes>
    </network>
</hazelcast-client>

预期结果:

该服务能够连接到 pod,在其描述中创建端点。

$ kubectl 描述服务 my-app-cluster-hazelcast

Name:              my-app-cluster-hazelcast
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector:          app=my-app
Type:              ClusterIP
IP:                10.244.28.132
Port:              hazelcast  5701/TCP
TargetPort:        5701/TCP
Endpoints:         10.244.4.10:5701,10.244.4.9:5701
Session Affinity:  None
Events:            <none>

该应用程序运行正常,并在其 hazelcast 集群中显示了两个成员,并且部署显示为就绪,可以完全访问该应用程序:

日志:

2019-08-26 23:07:36,614 TRACE [hz._hzInstance_1_dev.InvocationMonitorThread] (com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor): [10.244.4.10]:5701 [dev] [3.11] Broadcasting operation control packets to: 2 members

$ kubectl 获取部署

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
my-app-deployment   2/2     2            2           2m27s

实际结果:

该服务没有获得任何端点。

$ kubectl 描述服务 my-app-cluster-hazelcast

Name:              my-app-cluster-hazelcast
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector:          app=my-app
Type:              ClusterIP
IP:                10.244.28.132
Port:              hazelcast  5701/TCP
TargetPort:        5701/TCP
Endpoints:
Session Affinity:  None
Events:            <none>

应用程序被 hazelcast-client.xml 中启用的连接策略卡住,并带有以下日志,使其自己的集群保持没有通信并且部署永远处于非就绪状态:

日志:

22:54:11.236 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 57686 ms later, attempt 52 , cap retrytimeout millis 60000
22:55:02.036 [hz._hzInstance_1_dev.cached.thread-4] DEBUG com.hazelcast.internal.cluster.impl.MembershipManager - [10.244.4.8]:5701 [dev] [3.11] Sending member list to the non-master nodes:

Members {size:1, ver:1} [
        Member [10.244.4.8]:5701 - 6a4c7184-8003-4d24-8023-6087d68e9709 this
]

22:55:08.968 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 51173 ms later, attempt 53 , cap retrytimeout millis 60000
22:56:00.184 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 55583 ms later, attempt 54 , cap retrytimeout millis 60000

$ kubectl 获取部署

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
my-app-deployment   0/2     2            0           45m
4

2 回答 2

1

只是为了澄清:

正如OP参考就绪探针所述:

kubelet 使用就绪探测来了解容器何时准备好开始接受流量。当 Pod 的所有 Container 都准备好时,就认为 Pod 准备好了。此信号的一种用途是控制哪些 Pod 用作服务的后端。当 Pod 未准备好时,它会从服务负载均衡器中移除

于 2019-08-27T15:10:52.643 回答
0

在您的服务 yaml 中,您有

spec:
  selector:
    app: my-app

但在部署 yaml 中,标签值不同

metadata:
  name: my-app-deployment
  labels:
    app: my-app-deployment

这有什么原因吗?

于 2019-08-26T23:33:08.273 回答