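We use RabbitMQ in our application. Two hours ago, one of our application servers was blocked while trying to connect to RabbitMQ. After checking the RabbitMQ servers, we found that the memory of one node had exceeded the high watermark, and a few minutes later that node went down. After restarting the node, the whole cluster works fine, but I noticed that the management UI still shows many connections in the blocked and blocking states, while running rabbitmqctl list_connections pid name peer_address state
on every node shows no connections in the blocked/blocking state. This really confuses me: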
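- When one node of the cluster exceeded the watermark while the other nodes were working fine, why could my application not connect to the RabbitMQ cluster? PS: we use spring.amqp and spring-rabbit version 1.1.0.RELEASE
- Why would a node shut down after exceeding the watermark?
- Why are there still blocked connections after restarting the node, when rabbitmqctl reports all of them as running?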
Here are some logs from my RabbitMQ server:
=INFO REPORT==== 1-Mar-2013::19:36:21 ===
vm_memory_high_watermark clear. Memory used:1656590680 allowed:1658778419
=INFO REPORT==== 1-Mar-2013::19:36:21 ===
alarm_handler: {clear,{resource_limit,memory,rabbit@cos22}}
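For anyone reading the log line above: the alarm cleared with memory use only just under the limit. A minimal Python sketch, assuming the log format is exactly as pasted (the function name `watermark_headroom` is mine, not part of RabbitMQ), that extracts the two numbers and computes the remaining headroom:

```python
import re

# Log line copied verbatim from the post.
LOG_LINE = "vm_memory_high_watermark clear. Memory used:1656590680 allowed:1658778419"


def watermark_headroom(line):
    """Return (used, allowed, headroom) in bytes, or None if the line does not match."""
    match = re.search(r"Memory used:(\d+) allowed:(\d+)", line)
    if match is None:
        return None
    used, allowed = int(match.group(1)), int(match.group(2))
    return used, allowed, allowed - used


used, allowed, headroom = watermark_headroom(LOG_LINE)
print(f"used={used} allowed={allowed} headroom={headroom} bytes")
```

With the values in this log the headroom is 2187739 bytes, roughly 2 MB, so the node could plausibly cross the watermark again very soon after the alarm cleared.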
When I try to close a blocked connection from the web management UI, I get an error:
=INFO REPORT==== 1-Mar-2013::20:55:24 ===
Closing connection <0.17197.115> because "Closed via management plugin"
=ERROR REPORT==== 1-Mar-2013::20:55:24 ===
webmachine error: path="/api/connections/10.64.13.200%3A45891%20-%3E%2010.64.12.226%3A5672"
{throw,
{error,{not_a_connection_pid,<0.17197.115>}},
[{rabbit_networking,close_connection,2,
[{file,"src/rabbit_networking.erl"},{line,317}]},
{rabbit_mgmt_wm_connection,delete_resource,2,
[{file,"rabbitmq-management/src/rabbit_mgmt_wm_connection.erl"},
{line,52}]},
{webmachine_resource,resource_call,3,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_resource.erl"},
{line,169}]},
{webmachine_resource,do,3,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_resource.erl"},
{line,128}]},
{webmachine_decision_core,resource_call,1,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,48}]},
{webmachine_decision_core,decision,1,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,416}]},
{webmachine_decision_core,handle_request,2,
[{file,
"webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
{line,33}]},
{rabbit_webmachine,'-makeloop/1-fun-0-',3,
[{file,"rabbitmq-mochiweb/src/rabbit_webmachine.erl"},{line,75}]}]}
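The path in the webmachine error is URL-encoded, which makes it hard to compare against other output. A small Python sketch (the helper name `connection_name_from_path` is hypothetical) that decodes it back into a connection name, assuming the name is simply the last path segment after `/api/connections/` as it appears in the log:

```python
from urllib.parse import unquote

# Path copied verbatim from the webmachine error in the log.
PATH = "/api/connections/10.64.13.200%3A45891%20-%3E%2010.64.12.226%3A5672"


def connection_name_from_path(path):
    """Decode the percent-encoded connection name from a management API path."""
    prefix = "/api/connections/"
    if not path.startswith(prefix):
        raise ValueError("not a connections path: " + path)
    return unquote(path[len(prefix):])


name = connection_name_from_path(PATH)
print(name)
```

The decoded name is `10.64.13.200:45891 -> 10.64.12.226:5672`, which is the same connection the management UI lists as having blocked channels.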
rabbitmqctl shows all connections in the running state:
rabbitmqctl list_connections pid name peer_address state
Listing connections ...
<rabbit@cos23.1.1271.51> 10.64.13.197:57321 -> 10.64.12.225:5672 10.64.13.197 running
<rabbit@cos23.1.1100.51> 10.64.13.196:57240 -> 10.64.12.225:5672 10.64.13.196 running
<rabbit@cos23.1.1056.51> 10.64.12.196:58608 -> 10.64.12.225:5672 10.64.12.196 running
<rabbit@cos23.1.1079.51> 10.64.11.235:48962 -> 10.64.12.225:5672 10.64.11.235 running
<rabbit@cos23.1.1419.51> 10.64.13.228:49857 -> 10.64.12.225:5672 10.64.13.228 running
<rabbit@cos23.1.1049.51> 10.64.11.193:36387 -> 10.64.12.225:5672 10.64.11.193 running
<rabbit@cos23.1.1159.51> 10.64.10.123:52017 -> 10.64.12.225:5672 10.64.10.123 running
<rabbit@cos23.1.26289.45> 10.64.12.247:38504 -> 10.64.12.225:5672 10.64.12.247 running
<rabbit@cos23.1.1121.51> 10.64.10.29:51483 -> 10.64.12.225:5672 10.64.10.29 running
<rabbit@cos23.1.1067.51> 10.64.11.234:50244 -> 10.64.12.225:5672 10.64.11.234 running
<rabbit@cos23.1.1149.51> 10.64.11.178:33795 -> 10.64.12.225:5672 10.64.11.178 running
<rabbit@cos23.1.1136.51> 10.64.10.28:39557 -> 10.64.12.225:5672 10.64.10.28 running
<rabbit@cos23.1.1370.51> 10.64.13.233:38766 -> 10.64.12.225:5672 10.64.13.233 running
<rabbit@cos23.1.1388.51> 10.64.13.229:50932 -> 10.64.12.225:5672 10.64.13.229 running
<rabbit@cos23.1.1254.51> 10.64.13.241:49311 -> 10.64.12.225:5672 10.64.13.241 running
<rabbit@cos23.1.1031.51> 10.64.11.195:39455 -> 10.64.12.225:5672 10.64.11.195 running
<rabbit@cos23.1.1038.51> 10.64.10.27:58938 -> 10.64.12.225:5672 10.64.10.27 running
<rabbit@cos23.1.1167.51> 10.64.13.240:37777 -> 10.64.12.225:5672 10.64.13.240 running
<rabbit@cos23.1.1442.51> 10.64.10.130:37251 -> 10.64.12.225:5672 10.64.10.130 running
<rabbit@cos22.3.2659.0> 10.64.13.200:54840 -> 10.64.12.226:5672 10.64.13.200 running
...done.
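To check this output less by eye, here is a rough Python sketch that tallies connections per node and state and tests whether a particular connection name is listed. It assumes the columns are whitespace-separated in the order pid, name, peer_address, state (as requested on the command line) and that node names contain no dots, as with rabbit@cos22 and rabbit@cos23 here. The sample is a shortened copy of the listing above, and `parse_connections` is a hypothetical helper:

```python
import re
from collections import Counter

# Shortened copy of the rabbitmqctl output from the post.
SAMPLE = """\
Listing connections ...
<rabbit@cos23.1.1271.51> 10.64.13.197:57321 -> 10.64.12.225:5672 10.64.13.197 running
<rabbit@cos23.1.1100.51> 10.64.13.196:57240 -> 10.64.12.225:5672 10.64.13.196 running
<rabbit@cos22.3.2659.0> 10.64.13.200:54840 -> 10.64.12.226:5672 10.64.13.200 running
...done.
"""

# pid looks like <node.N.N.N>; name looks like "client:port -> server:port".
ROW = re.compile(
    r"^<(?P<node>[^.>]+)\.[0-9.]+>\s+"
    r"(?P<name>\S+ -> \S+)\s+(?P<peer>\S+)\s+(?P<state>\S+)$"
)


def parse_connections(text):
    """Return one dict per connection row, skipping header and footer lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


rows = parse_connections(SAMPLE)
by_node_state = Counter((row["node"], row["state"]) for row in rows)
for (node, state), count in sorted(by_node_state.items()):
    print(node, state, count)

# Connection name decoded from the management API error path.
wanted = "10.64.13.200:45891 -> 10.64.12.226:5672"
print("listed" if any(row["name"] == wanted for row in rows) else "not listed")
```

In the full listing above, the only connection from 10.64.13.200 uses client port 54840, not 45891, so the connection the management UI complains about does not appear in the rabbitmqctl output at all.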
The management UI also shows a connection with many channels in the blocked state, but I cannot find this connection with rabbitmqctl list_connections:
AMQP 0-9-1  10.64.13.200:45891 -> 10.64.12.226:5672  rabbit@cos22  0B/s (49.2MB total)  0B/s (2.4MB total)  0s  60920
Any help or advice is greatly appreciated.