我有两个在集群中运行的 vertx 微服务,并使用本地云中的无头服务(链接)相互通信。每当我进行滚动部署时,我都会面临服务中的连接问题。当我分析日志时,我可以看到旧节点/pod 已从集群列表中删除,但事件总线并未将其删除并以循环方式使用它。
下面是部署前的成员组信息
Member [192.168.4.54]:5701 - ace32cef-8cb2-4a3b-b15a-2728db068b80 //pod 1
Member [192.168.4.54]:5705 - f0c39a6d-4834-4b1d-a179-1f0d74cabbce this
Member [192.168.101.79]:5701 - ac0dcea9-898a-4818-b7e2-e9f8aaefb447 //pod 2
开始部署时,pod 2 会从成员列表中删除,
[192.168.4.54]:5701 [dev] [4.0.2] Could not connect to: /192.168.101.79:5701. Reason: SocketException[Connection refused to address /192.168.101.79:5701]
Removing connection to endpoint [192.168.101.79]:5701 Cause => java.net.SocketException {Connection refused to address /192.168.101.79:5701}, Error-Count: 5
Removing Member [192.168.101.79]:5701 - ac0dcea9-898a-4818-b7e2-e9f8aaefb447
并添加了新成员,
Member [192.168.4.54]:5701 - ace32cef-8cb2-4a3b-b15a-2728db068b80
Member [192.168.4.54]:5705 - f0c39a6d-4834-4b1d-a179-1f0d74cabbce this
Member [192.168.94.85]:5701 - 1347e755-1b55-45a3-bb9c-70e07a29d55b //new pod
All migration tasks have been completed. (repartitionTime=Mon May 10 08:54:19 MST 2021, plannedMigrations=358, completedMigrations=358, remainingMigrations=0, totalCompletedMigrations=3348, elapsedMigrationTime=1948ms, totalElapsedMigrationTime=27796ms)
但是,当对已部署的服务发出请求时,虽然旧的 pod 已从成员组中删除,但事件总线正在使用旧的 pod/服务引用(ac0dcea9-898a-4818-b7e2-e9f8aaefb447
),
[vert.x-eventloop-thread-1] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=f9f5cfc9-8ad8-4eb1-b12c-322feb0d1acd Not connected to server ac0dcea9-898a-4818-b7e2-e9f8aaefb447 - starting queuing
我查看了滚动部署的官方文档,我的部署似乎遵循文档中提到的两个关键事项,只删除了一个 pod,然后添加了新的。
never start more than one new pod at once
forbid more than one unavailable pod during the process
我正在使用 vertx 4.0.3 和 hazelcast kubernetes 1.2.2。我的 Verticle 类正在扩展 AbstractVerticle 并使用,
Vertx.clusteredVertx(options, vertx -> {
vertx.result().deployVerticle(verticleName, deploymentOptions);
对不起,很长的帖子,非常感谢任何帮助。