We are running SonarQube 5.1.2 on an AWS node. After a short period of use, typically a day or two, the Sonar web server becomes unresponsive and spikes the server's CPUs:

top - 01:59:47 up 2 days,  3:43,  1 user,  load average: 1.89, 1.76, 1.11
Tasks:  93 total,   1 running,  92 sleeping,   0 stopped,   0 zombie
Cpu(s): 94.5%us,  0.0%sy,  0.0%ni,  5.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7514056k total,  2828772k used,  4685284k free,   155372k buffers
Swap:        0k total,        0k used,        0k free,   872440k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2328 root      20   0 3260m 1.1g  19m S 188.3 15.5  62:51.79 java
   11 root      20   0     0    0    0 S  0.3  0.0   0:07.90 events/0
 2284 root      20   0 3426m 407m  19m S  0.3  5.5   9:51.04 java
    1 root      20   0 19356 1536 1224 S  0.0  0.0   0:00.23 init

The 188% CPU load is coming from the WebServer process:

$ ps -eF|grep "root *2328"
root      2328  2262  2 834562 1162384 0 Mar01 ?       01:06:24 /usr/java/jre1.8.0_25/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.management.enabled=false -Djruby.compile.invokedynamic=false -Xmx768m -XX:MaxPermSize=160m -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/opt/sonar/temp -cp ./lib/common/*:./lib/server/*:/opt/sonar/lib/jdbc/mysql/mysql-connector-java-5.1.34.jar org.sonar.server.app.WebServer /tmp/sq-process615754070383971531properties
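
A possible next step is to find out which thread inside PID 2328 is burning the CPU. The following is only a minimal sketch, not something taken from SonarQube itself. It assumes a per-thread listing captured with `top -H -b -n 1 -p 2328` (in thread mode the PID column holds the native thread ID) and a HotSpot thread dump captured separately, for example with `jstack` from a JDK or `kill -3` on the process. The function names `hottest_thread_id` and `find_java_thread` are hypothetical helpers made up for this illustration.

```python
import re
from typing import Optional


def hottest_thread_id(top_output: str) -> Optional[int]:
    """Return the thread ID with the highest %CPU in `top -H -b` output."""
    cpu_col = None
    best_tid, best_cpu = None, -1.0
    for line in top_output.splitlines():
        cols = line.split()
        if cpu_col is None:
            # Skip the summary lines until the column header row is found.
            if "PID" in cols and "%CPU" in cols:
                cpu_col = cols.index("%CPU")
            continue
        if len(cols) <= cpu_col or not cols[0].isdigit():
            continue
        cpu = float(cols[cpu_col])
        if cpu > best_cpu:
            best_tid, best_cpu = int(cols[0]), cpu
    return best_tid


def find_java_thread(thread_dump: str, tid: int) -> Optional[str]:
    """Return the thread dump header line whose nid matches the native thread ID.

    HotSpot prints the native thread ID in hexadecimal as `nid=0x...`.
    """
    pattern = re.compile(r"\bnid=0x%x\b" % tid)
    for line in thread_dump.splitlines():
        if pattern.search(line):
            return line.strip()
    return None
```

Feeding the two captured outputs through these helpers would give the name of the Java thread to look for in the dump, and the stack trace printed under that header line shows what it is executing.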

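We originally thought that we were running on too small a node and recently upgraded to an m3-large instance, but we are seeing the same problem (except that it now uses 2 CPUs rather than one).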

The only interesting information in the log is this:

2016.03.04 01:52:38 WARN  web[o.e.transport] [sonar-1456875684135] Received response for a request that has timed out, sent [39974ms] ago, timed out [25635ms] ago, action [cluster:monitor/nodes/info], node [[#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]]], id [43817]
2016.03.04 01:53:19 INFO  web[o.e.client.transport] [sonar-1456875684135] failed to get node info for [#transport#-1][xxxxxxxx-build02-us-west-2b][inet[/127.0.0.1:9001]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[/127.0.0.1:9001]][cluster:monitor/nodes/info] request_id [43817] timed out after [14339ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366) ~[elasticsearch-1.4.4.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_25]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_25]
    at java.lang.Thread.run(Unknown Source) [na:1.8.0_25]

Does anyone know what might be going on here, or have any ideas on how to diagnose this problem further?
