就绪探针使应用程序处于非就绪状态。处于此状态时,应用程序无法连接到任何 kubernetes 服务。
我将 Ubuntu 18 用于我的 kubernetes 集群的主节点和节点。(当我在集群中只使用master时问题仍然存在,所以我认为这不是master节点的问题)。
我使用 Spring 应用程序设置了我的 kubernetes 集群,该应用程序使用 hazelcast 来管理缓存。因此,在使用就绪探测时,应用程序无法访问我创建的 kubernetes 服务,以便使用hazelcast-kubernetes插件通过 hazelcast 连接应用程序。
当我取出就绪探针时,应用程序会尽快连接到成功创建 hazelcast 集群的服务,并且一切正常。
就绪探针将连接到一个 rest api,它的唯一响应是一个 200 代码。然而,当应用程序启动时,在进程的中间它会启动 hazelcast 集群,因此,它会尝试连接到 kubernetes hazelcast 服务,该服务将应用程序的缓存与其他 pod 连接起来,而就绪探针没有' 没有被清除,并且 pod 由于探测而处于非就绪状态。这是应用程序无法连接到 kubernetes 服务并且由于我添加的配置而失败或卡住的情况。
服务.yaml:
apiVersion: v1
kind: Service
metadata:
name: my-app-cluster-hazelcast
spec:
selector:
app: my-app
ports:
- name: hazelcast
port: 5701
部署.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app-deployment
labels:
app: my-app-deployment
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 2
replicas: 2
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
terminationGracePeriodSeconds: 180
containers:
- name: my-app
image: my-repo:5000/my-app-container
imagePullPolicy: Always
ports:
- containerPort: 5701
- containerPort: 9080
readinessProbe:
httpGet:
path: /app/api/excluded/sample
port: 9080
initialDelaySeconds: 120
periodSeconds: 15
securityContext:
capabilities:
add:
- SYS_ADMIN
env:
- name: container
value: docker
hazelcast.xml:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast
xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.11.xsd"
xmlns="http://www.hazelcast.com/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<properties>
<property name="hazelcast.jmx">false</property>
<property name="hazelcast.logging.type">slf4j</property>
</properties>
<network>
<port auto-increment="false">5701</port>
<outbound-ports>
<ports>49000,49001,49002,49003</ports>
</outbound-ports>
<join>
<multicast enabled="false"/>
<kubernetes enabled="true">
<namespace>default</namespace>
<service-name>my-app-cluster-hazelcast</service-name>
</kubernetes>
</join>
</network>
</hazelcast>
hazelcast-client.xml:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast-client
xsi:schemaLocation="http://www.hazelcast.com/schema/client-config http://www.hazelcast.com/schema/client-config/hazelcast-client-config-3.11.xsd"
xmlns="http://www.hazelcast.com/schema/client-config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<properties>
<property name="hazelcast.logging.type">slf4j</property>
</properties>
<connection-strategy async-start="false" reconnect-mode="ON">
<connection-retry enabled="true">
<initial-backoff-millis>1000</initial-backoff-millis>
<max-backoff-millis>60000</max-backoff-millis>
</connection-retry>
</connection-strategy>
<network>
<kubernetes enabled="true">
<namespace>default</namespace>
<service-name>my-app-cluster-hazelcast</service-name>
</kubernetes>
</network>
</hazelcast-client>
预期结果:
该服务能够连接到 pod,在其描述中创建端点。
$ kubectl 描述服务 my-app-cluster-hazelcast
Name: my-app-cluster-hazelcast
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector: app=my-app
Type: ClusterIP
IP: 10.244.28.132
Port: hazelcast 5701/TCP
TargetPort: 5701/TCP
Endpoints: 10.244.4.10:5701,10.244.4.9:5701
Session Affinity: None
Events: <none>
该应用程序运行正常,并在其 hazelcast 集群中显示了两个成员,并且部署显示为就绪,可以完全访问该应用程序:
日志:
2019-08-26 23:07:36,614 TRACE [hz._hzInstance_1_dev.InvocationMonitorThread] (com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor): [10.244.4.10]:5701 [dev] [3.11] Broadcasting operation control packets to: 2 members
$ kubectl 获取部署
NAME READY UP-TO-DATE AVAILABLE AGE
my-app-deployment 2/2 2 2 2m27s
实际结果:
该服务没有获得任何端点。
$ kubectl 描述服务 my-app-cluster-hazelcast
Name: my-app-cluster-hazelcast
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector: app=my-app
Type: ClusterIP
IP: 10.244.28.132
Port: hazelcast 5701/TCP
TargetPort: 5701/TCP
Endpoints:
Session Affinity: None
Events: <none>
应用程序被 hazelcast-client.xml 中启用的连接策略卡住,并带有以下日志,使其自己的集群保持没有通信并且部署永远处于非就绪状态:
日志:
22:54:11.236 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 57686 ms later, attempt 52 , cap retrytimeout millis 60000
22:55:02.036 [hz._hzInstance_1_dev.cached.thread-4] DEBUG com.hazelcast.internal.cluster.impl.MembershipManager - [10.244.4.8]:5701 [dev] [3.11] Sending member list to the non-master nodes:
Members {size:1, ver:1} [
Member [10.244.4.8]:5701 - 6a4c7184-8003-4d24-8023-6087d68e9709 this
]
22:55:08.968 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 51173 ms later, attempt 53 , cap retrytimeout millis 60000
22:56:00.184 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 55583 ms later, attempt 54 , cap retrytimeout millis 60000
$ kubectl 获取部署
NAME READY UP-TO-DATE AVAILABLE AGE
my-app-deployment 0/2 2 0 45m