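We use RabbitMQ in our application. Two hours ago, one of our application servers blocked while trying to connect to RabbitMQ. After checking the RabbitMQ servers, we found that one node's memory had exceeded the high watermark, and a few minutes later that node shut down. After restarting the node, the whole cluster works fine, but I noticed many connections in the blocked and blocking states, yet running rabbitmqctl list_connections pid name peer_address state on every node shows no connection in a blocked/blocking state. This really confuses me: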

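  1. After one node in the cluster exceeded the watermark while the other nodes were working fine, why could my application not connect to the RabbitMQ cluster? PS: we use spring.amqp and spring-rabbit version 1.1.0.RELEASE.
  2. Why would a node shut down when it exceeds the watermark?
  3. Why are there still blocked connections after the node restart, when rabbitmqctl reports them all as running?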

Here are some logs from my RabbitMQ server:

=INFO REPORT==== 1-Mar-2013::19:36:21 ===
vm_memory_high_watermark clear. Memory used:1656590680 allowed:1658778419

=INFO REPORT==== 1-Mar-2013::19:36:21 ===
alarm_handler: {clear,{resource_limit,memory,rabbit@cos22}}
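
To put numbers on that log entry, here is a minimal Python sketch that pulls the two byte counts out of the line and computes how far below the limit the node was when the alarm cleared. The helper name `parse_watermark_line` is made up for illustration, and the regex assumes the exact wording shown above; other RabbitMQ versions may format the message differently.

```python
import re

# Assumption: the log line looks exactly like the one quoted above.
WATERMARK_RE = re.compile(r"Memory used:(\d+) allowed:(\d+)")

def parse_watermark_line(line):
    """Return (used, allowed) in bytes, or None if the line does not match."""
    match = WATERMARK_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

line = "vm_memory_high_watermark clear. Memory used:1656590680 allowed:1658778419"
used, allowed = parse_watermark_line(line)

# Bytes left before the memory alarm would be raised again.
headroom = allowed - used
print(f"headroom: {headroom} bytes ({headroom / allowed:.2%} of the limit)")
```

The alarm cleared with only about 2 MB to spare, so the node was still sitting right at its limit at that point.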

When I try to close the blocked connection from the web management UI, I get an error:

=INFO REPORT==== 1-Mar-2013::20:55:24 ===
Closing connection <0.17197.115> because "Closed via management plugin"

=ERROR REPORT==== 1-Mar-2013::20:55:24 ===
webmachine error: path="/api/connections/10.64.13.200%3A45891%20-%3E%2010.64.12.226%3A5672"
{throw,
{error,{not_a_connection_pid,<0.17197.115>}},
[{rabbit_networking,close_connection,2,
     [{file,"src/rabbit_networking.erl"},{line,317}]},
 {rabbit_mgmt_wm_connection,delete_resource,2,
     [{file,"rabbitmq-management/src/rabbit_mgmt_wm_connection.erl"},
      {line,52}]},
 {webmachine_resource,resource_call,3,
     [{file,
          "webmachine-wrapper/webmachine-git/src/webmachine_resource.erl"},
      {line,169}]},
 {webmachine_resource,do,3,
     [{file,
          "webmachine-wrapper/webmachine-git/src/webmachine_resource.erl"},
      {line,128}]},
 {webmachine_decision_core,resource_call,1,
     [{file,
          "webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
      {line,48}]},
 {webmachine_decision_core,decision,1,
     [{file,
          "webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
      {line,416}]},
 {webmachine_decision_core,handle_request,2,
     [{file,
          "webmachine-wrapper/webmachine-git/src/webmachine_decision_core.erl"},
      {line,33}]},
 {rabbit_webmachine,'-makeloop/1-fun-0-',3,
     [{file,"rabbitmq-mochiweb/src/rabbit_webmachine.erl"},{line,75}]}]}
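
The `path` in that error is the URL-encoded connection name. A small Python sketch, using only the standard library, decodes it so it can be compared with the `name` column of rabbitmqctl list_connections:

```python
from urllib.parse import unquote

# Path copied verbatim from the webmachine error above.
path = "/api/connections/10.64.13.200%3A45891%20-%3E%2010.64.12.226%3A5672"

# The last path segment is the percent-encoded connection name.
connection_name = unquote(path.rsplit("/", 1)[-1])
print(connection_name)
```

The decoded name uses client port 45891, which is the connection the management UI still displays.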

rabbitmqctl shows every connection in the running state:

rabbitmqctl list_connections pid name peer_address state
Listing connections ...
<rabbit@cos23.1.1271.51>        10.64.13.197:57321 -> 10.64.12.225:5672 10.64.13.197    running
<rabbit@cos23.1.1100.51>        10.64.13.196:57240 -> 10.64.12.225:5672 10.64.13.196    running
<rabbit@cos23.1.1056.51>        10.64.12.196:58608 -> 10.64.12.225:5672 10.64.12.196    running
<rabbit@cos23.1.1079.51>        10.64.11.235:48962 -> 10.64.12.225:5672 10.64.11.235    running
<rabbit@cos23.1.1419.51>        10.64.13.228:49857 -> 10.64.12.225:5672 10.64.13.228    running
<rabbit@cos23.1.1049.51>        10.64.11.193:36387 -> 10.64.12.225:5672 10.64.11.193    running
<rabbit@cos23.1.1159.51>        10.64.10.123:52017 -> 10.64.12.225:5672 10.64.10.123    running
<rabbit@cos23.1.26289.45>       10.64.12.247:38504 -> 10.64.12.225:5672 10.64.12.247    running
<rabbit@cos23.1.1121.51>        10.64.10.29:51483 -> 10.64.12.225:5672  10.64.10.29     running
<rabbit@cos23.1.1067.51>        10.64.11.234:50244 -> 10.64.12.225:5672 10.64.11.234    running
<rabbit@cos23.1.1149.51>        10.64.11.178:33795 -> 10.64.12.225:5672 10.64.11.178    running
<rabbit@cos23.1.1136.51>        10.64.10.28:39557 -> 10.64.12.225:5672  10.64.10.28     running
<rabbit@cos23.1.1370.51>        10.64.13.233:38766 -> 10.64.12.225:5672 10.64.13.233    running
<rabbit@cos23.1.1388.51>        10.64.13.229:50932 -> 10.64.12.225:5672 10.64.13.229    running
<rabbit@cos23.1.1254.51>        10.64.13.241:49311 -> 10.64.12.225:5672 10.64.13.241    running
<rabbit@cos23.1.1031.51>        10.64.11.195:39455 -> 10.64.12.225:5672 10.64.11.195    running
<rabbit@cos23.1.1038.51>        10.64.10.27:58938 -> 10.64.12.225:5672  10.64.10.27     running
<rabbit@cos23.1.1167.51>        10.64.13.240:37777 -> 10.64.12.225:5672 10.64.13.240    running
<rabbit@cos23.1.1442.51>        10.64.10.130:37251 -> 10.64.12.225:5672 10.64.10.130    running
<rabbit@cos22.3.2659.0> 10.64.13.200:54840 -> 10.64.12.226:5672 10.64.13.200    running
...done.

There is also a connection with many channels in the blocked state, but I cannot find it using rabbitmqctl list_connections:

AMQP 0-9-1  
10.64.13.200:45891 -> 10.64.12.226:5672
rabbit@cos22    0B/s
(49.2MB total)
0B/s
(2.4MB total)
0s  60920

Any help and suggestions are greatly appreciated.

1 Answer

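I got an answer from the rabbitmq mailing list:

Those connections/channels do not exist. You are seeing a bug in the management plugin, which retains information about connections and channels that were alive when a cluster node crashed.

This bug was fixed in RabbitMQ 3.0.3.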
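
One rough way to confirm that an entry in the management UI is stale is to check whether its name appears in the rabbitmqctl output at all. The sketch below is only an illustration: `connection_names` is a hypothetical helper, the sample listing is two rows copied from the question, and the parsing assumes the column order `pid name peer_address state` used there.

```python
def connection_names(listing):
    """Extract "src -> dst" names from
    `rabbitmqctl list_connections pid name peer_address state` output."""
    names = set()
    for line in listing.splitlines():
        if not line.startswith("<"):
            continue  # skip "Listing connections ..." and "...done."
        # Expected fields: pid, src, "->", dst, peer_address, state
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "->":
            names.add(f"{fields[1]} -> {fields[3]}")
    return names

# Two rows taken from the question, for illustration only.
listing = """Listing connections ...
<rabbit@cos23.1.1271.51>        10.64.13.197:57321 -> 10.64.12.225:5672 10.64.13.197    running
<rabbit@cos22.3.2659.0> 10.64.13.200:54840 -> 10.64.12.226:5672 10.64.13.200    running
...done."""

names = connection_names(listing)

# Name shown in the management UI for the "blocked" connection.
ui_name = "10.64.13.200:45891 -> 10.64.12.226:5672"
is_stale = ui_name not in names
print(f"{ui_name!r} missing from rabbitmqctl output: {is_stale}")
```

A name that appears only in the UI and on no node's rabbitmqctl listing matches the leftover entries described above.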

answered 2013-04-17T09:21:12.253