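We have an API written in Scala 2.12.x with Play Framework 2.5.x. The API connects to an AWS Aurora cluster through MariaDB Connector/J 2.5.4, using jdbc:mysql:aurora://some-aurora-cluster as its connection string.
Functionally everything works, except that we noticed high CPU usage even when there is no traffic. Some investigation shows the following: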
[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ top -H
…
6373 root 20 0 4644452 990888 21540 S 14.6 12.6 1:15.68 MariaDb-failove
6374 root 20 0 4644452 990888 21540 S 13.6 12.6 1:16.11 MariaDb-failove
6305 root 20 0 4644452 990888 21540 S 13.3 12.6 1:14.31 MariaDb-failove
6375 root 20 0 4644452 990888 21540 S 12.3 12.6 1:14.59 MariaDb-failove
6372 root 20 0 4644452 990888 21540 S 11.3 12.6 1:15.78 MariaDb-failove
…
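The command above shows several MariaDb-failover threads (their names appear truncated as MariaDb-failove in top). I'm not sure what they do, or why several of them are busy with high CPU usage.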
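To see which threads consume CPU from inside the JVM, without matching top -H thread IDs against a thread dump by hand, the standard java.lang.management.ThreadMXBean exposes per-thread CPU time. The following is a minimal sketch; ThreadCpuReport and cpuMillisByThread are hypothetical names, and since it only sees threads of the JVM it runs in, it would have to be called from within the Play application itself (for example from an internal diagnostic endpoint).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports cumulative CPU time of live threads by name.
final class ThreadCpuReport {

    // Returns thread name -> cumulative CPU time in milliseconds for every
    // live thread whose name starts with the given prefix.
    static Map<String, Long> cpuMillisByThread(String prefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new LinkedHashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM/platform cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id);
            // info is null and cpuNanos is -1 if the thread died in the meantime
            if (info != null && cpuNanos >= 0 && info.getThreadName().startsWith(prefix)) {
                result.put(info.getThreadName(), cpuNanos / 1_000_000);
            }
        }
        return result;
    }
}
```

Calling ThreadCpuReport.cpuMillisByThread("MariaDb-failover") would list the failover threads. The values are cumulative, so sampling twice a few seconds apart and taking the difference gives something comparable to the CPU percentage shown by top.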
[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ netstat -a | less
…
tcp6 0 0 ip-xxx-xxx-xxx-31.:37446 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:37108 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:37648 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:36934 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:36870 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:37254 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6 0 0 ip-xxx-xxx-xxx-31.:37902 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
…
There are many connections in TIME_WAIT. This is also strange, because there was no traffic when I ran this command.
[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n
1 established)
1 Foreign
1 SYN_SENT
6 LISTEN
37 ESTABLISHED
851 TIME_WAIT
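There are hundreds of connections in TIME_WAIT, and the number changes every time I run the command.
Does anyone know whether this is normal, or whether it is something I need to worry about?
Let me know if you need any more information.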
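The pipeline above counts the sixth whitespace-separated column of netstat -nat. The "1 established)" and "1 Foreign" entries are not TCP states: they come from netstat's two header lines ("Active Internet connections (servers and established)" and "Proto Recv-Q Send-Q Local Address Foreign Address State"), whose sixth words are "established)" and "Foreign". A minimal Java sketch of the same count that only looks at tcp/tcp6 rows; TcpStateCounter is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical equivalent of: awk '{print $6}' | sort | uniq -c | sort -n
final class TcpStateCounter {

    // Counts the State column (6th field), skipping netstat's header lines.
    static Map<String, Long> countStates(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(line -> line.startsWith("tcp"))   // tcp and tcp6 rows only
                .map(line -> line.split("\\s+"))
                .filter(fields -> fields.length >= 6)     // State is the 6th column
                .collect(Collectors.groupingBy(
                        fields -> fields[5], TreeMap::new, Collectors.counting()));
    }

    // Usage: netstat -nat | java TcpStateCounter
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        // Print in ascending order of count, like the trailing `sort -n`
        countStates(lines).entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .forEach(e -> System.out.printf("%7d %s%n", e.getValue(), e.getKey()));
    }
}
```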
========== More information
ps -aux | grep java
which gives PID 2655
jstack 2655 > threaddump.log
Here is the content (trimmed):
2020-06-16 16:44:39
Full thread dump OpenJDK 64-Bit Server VM (25.252-b09 mixed mode):
"MariaDb-failover-5" #276 daemon prio=5 os_prio=0 tid=0x00007f4b0400b000 nid=0x18e7 waiting on condition [0x00007f4b0c6d6000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"MariaDb-failover-4" #275 daemon prio=5 os_prio=0 tid=0x00007f4b0400a000 nid=0x18e6 waiting on condition [0x00007f4afbefd000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"MariaDb-failover-3" #274 daemon prio=5 os_prio=0 tid=0x00007f4b04009000 nid=0x18e5 waiting on condition [0x00007f4afbbfa000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"MariaDb-failover-2" #273 daemon prio=5 os_prio=0 tid=0x00007f4b04008000 nid=0x18e4 waiting on condition [0x00007f4b0c5d5000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"MariaDb-failover-1" #265 daemon prio=5 os_prio=0 tid=0x00007f4b2407e800 nid=0x18a1 waiting on condition [0x00007f4b0c2d4000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"VM Thread" os_prio=0 tid=0x00007f4b540db800 nid=0xf37 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f4b54069000 nid=0xf35 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f4b5406a800 nid=0xf36 runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f4b54132800 nid=0xf3e waiting on condition
JNI global references: 2520
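Thread IDs from top -H, their hex values (the nid field in the thread dump), and the matching thread names:
6373 18e5 MariaDb-failover-3
6374 18e6 MariaDb-failover-4
6305 18a1 MariaDb-failover-1
6375 18e7 MariaDb-failover-5
6372 18e4 MariaDb-failover-2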
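top -H reports thread IDs in decimal, while jstack prints the same native thread ID in hex as nid=0x18e5 and so on; converting between the two is what links the busy threads in top to the MariaDb-failover entries in the dump. A small sketch of that conversion; LwpToNid, toNid and fromNid are hypothetical names.

```java
// Hypothetical helper for matching `top -H` thread IDs to jstack "nid" values.
final class LwpToNid {

    // Decimal LWP id (as printed by top -H) -> jstack-style hex nid.
    static String toNid(long lwp) {
        return "0x" + Long.toHexString(lwp);
    }

    // jstack nid ("nid=0x18e5", "0x18e5" or "18e5") -> decimal LWP id.
    static long fromNid(String nid) {
        String hex = nid.startsWith("nid=") ? nid.substring(4) : nid;
        if (hex.startsWith("0x") || hex.startsWith("0X")) {
            hex = hex.substring(2);
        }
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        long[] busyLwps = {6373, 6374, 6305, 6375, 6372}; // IDs from the top -H output
        for (long lwp : busyLwps) {
            System.out.println(lwp + " -> nid=" + toNid(lwp));
        }
    }
}
```

From the shell, printf '%x\n' 6373 gives the same result (18e5).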
========== More information
Our API has 5 different database configurations, i.e. 5 different databases. Each one uses a connection string of the form jdbc:mysql:aurora://some-aurora-cluster
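Note that the aurora mode is used for better failover handling, which explains the 5 lightweight processes (threads). But they are chatty, which causes the many TIME_WAIT connections and the elevated CPU usage.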
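Each failover thread in the dump is running a periodic task on a ScheduledThreadPoolExecutor: FutureTask.runAndReset is the method that executor uses to re-run periodic tasks. Below is a simplified, hypothetical model of that setup, assuming one periodic reconnect check per configured data source as described above. It is not the driver's code; FailoverLoopModel, simulate and the "failover-model" thread-name prefix are made-up names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model only; it does not reproduce MariaDB Connector/J internals.
final class FailoverLoopModel {

    // Names threads "<prefix>-1", "<prefix>-2", ... and marks them as daemons,
    // similar in shape to the "MariaDb-failover-N" daemon threads in the dump.
    static ThreadFactory namedDaemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    // Schedules one periodic "reconnect check" per data source, lets the pool
    // run for runMillis, then stops it. Returns the total number of checks run.
    static int simulate(int dataSources, long periodMillis, long runMillis) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(
                dataSources, namedDaemonFactory("failover-model"));
        AtomicInteger checks = new AtomicInteger();
        try {
            for (int i = 0; i < dataSources; i++) {
                // Periodic tasks are re-run via FutureTask.runAndReset, the frame
                // seen in every failover thread's stack. A real driver would try
                // to reach the database here; this model only counts attempts.
                pool.scheduleWithFixedDelay(
                        checks::incrementAndGet, 0, periodMillis, TimeUnit.MILLISECONDS);
            }
            Thread.sleep(runMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return checks.get();
    }
}
```

In TCP, the endpoint that closes a connection first is the one that holds it in TIME_WAIT, so the client-side TIME_WAIT entries towards port mysql in the netstat output are consistent with the application opening and closing short-lived connections to the cluster, as each check in such a loop would if it connected to the database.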
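Has anyone run into this before, and how did you mitigate it? I'd still like to use aurora mode (or something equivalent) so that we don't have to restart the application when the database fails over.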