
We have an API written in Scala 2.12.x and Play Framework 2.5.x. The API connects to an AWS Aurora cluster using MariaDB Connector/J 2.5.4, with the URL jdbc:mysql:aurora://some-aurora-cluster

Functionally everything works fine, except that we noticed high CPU usage even when there is no traffic. Some investigation shows:

[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ top -H

    …
 6373 root      20   0 4644452 990888  21540 S 14.6 12.6   1:15.68 MariaDb-failove
 6374 root      20   0 4644452 990888  21540 S 13.6 12.6   1:16.11 MariaDb-failove
 6305 root      20   0 4644452 990888  21540 S 13.3 12.6   1:14.31 MariaDb-failove
 6375 root      20   0 4644452 990888  21540 S 12.3 12.6   1:14.59 MariaDb-failove
 6372 root      20   0 4644452 990888  21540 S 11.3 12.6   1:15.78 MariaDb-failove
    …

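The command above shows several MariaDb-failover threads (top truncates thread names to 15 characters, hence "MariaDb-failove"). I'm not sure what they do, or why there are several of them, all busy with high CPU usage.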

[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ netstat -a | less

…
tcp6       0      0 ip-xxx-xxx-xxx-31.:37446 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6       0      0 ip-xxx-xxx-xxx-31.:37108 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6       0      0 ip-xxx-xxx-xxx-31.:37648 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6       0      0 ip-xxx-xxx-xxx-31.:36934 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6       0      0 ip-xxx-xxx-xxx-31.:36870 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6       0      0 ip-xxx-xxx-xxx-31.:37254 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
tcp6       0      0 ip-xxx-xxx-xxx-31.:37902 ip-xxx-xxx-xxx-129:mysql TIME_WAIT
…

There are a lot of TIME_WAIT connections. This is also strange, because there was no traffic when I ran this command.

[ec2-user@ip-xxx-xxx-xxx-xxx ~]$ netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n

      1 established)
      1 Foreign
      1 SYN_SENT
      6 LISTEN
     37 ESTABLISHED
    851 TIME_WAIT

There are hundreds of TIME_WAIT connections, and the number changes every time I run the command.
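A side note on the tally above: the awk pipeline also counts the two netstat header lines, which is where the `established)` and `Foreign` entries come from (`$6` of "Active Internet connections (servers and established)", and of the column header, where "Local Address" is two words). If you want the same per-state count from JVM code, for example in a diagnostics endpoint, a minimal Java sketch that skips those headers could look like this. It assumes the Linux net-tools `netstat -nat` column layout, and the sample lines in `main` are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TcpStateTally {

    /**
     * Counts TCP states in `netstat -nat` output, like
     * awk '{print $6}' | sort | uniq -c, but skipping the two header lines.
     */
    public static Map<String, Integer> tally(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by state name
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Data rows begin with "tcp" or "tcp6" and carry the state in the 6th column.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            counts.merge(cols[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample in the shape of `netstat -nat` output; pass real output to tally() instead.
        List<String> sample = Arrays.asList(
                "Active Internet connections (servers and established)",
                "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
                "tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN",
                "tcp6       0      0 10.0.0.31:37446         10.0.0.129:3306         TIME_WAIT",
                "tcp6       0      0 10.0.0.31:37108         10.0.0.129:3306         TIME_WAIT",
                "tcp6       0      0 10.0.0.31:40000         10.0.0.129:3306         ESTABLISHED");
        System.out.println(tally(sample)); // {ESTABLISHED=1, LISTEN=1, TIME_WAIT=2}
    }
}
```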

Does anyone know whether this is normal, or something I should worry about?

Let me know if you have any other questions.

========== More info

ps -aux | grep java

Got the PID: 2655

jstack 2655 > threaddump.log

Here is the content (trimmed):

2020-06-16 16:44:39
Full thread dump OpenJDK 64-Bit Server VM (25.252-b09 mixed mode):

"MariaDb-failover-5" #276 daemon prio=5 os_prio=0 tid=0x00007f4b0400b000 nid=0x18e7 waiting on condition [0x00007f4b0c6d6000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
    at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
    at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
    at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"MariaDb-failover-4" #275 daemon prio=5 os_prio=0 tid=0x00007f4b0400a000 nid=0x18e6 waiting on condition [0x00007f4afbefd000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
    at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
    at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
    at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"MariaDb-failover-3" #274 daemon prio=5 os_prio=0 tid=0x00007f4b04009000 nid=0x18e5 waiting on condition [0x00007f4afbbfa000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
    at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
    at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
    at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"MariaDb-failover-2" #273 daemon prio=5 os_prio=0 tid=0x00007f4b04008000 nid=0x18e4 waiting on condition [0x00007f4b0c5d5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
    at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
    at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
    at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"MariaDb-failover-1" #265 daemon prio=5 os_prio=0 tid=0x00007f4b2407e800 nid=0x18a1 waiting on condition [0x00007f4b0c2d4000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:269)
    at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:203)
    at org.mariadb.jdbc.internal.failover.thread.FailoverLoop.doRun(FailoverLoop.java:84)
    at org.mariadb.jdbc.internal.failover.thread.TerminableRunnable.run(TerminableRunnable.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"VM Thread" os_prio=0 tid=0x00007f4b540db800 nid=0xf37 runnable 

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f4b54069000 nid=0xf35 runnable 

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f4b5406a800 nid=0xf36 runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f4b54132800 nid=0xf3e waiting on condition 

JNI global references: 2520
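As an alternative to running jstack against the PID, the same failover threads can be listed from inside the running JVM with the standard `Thread.getAllStackTraces()` API, for example to log how many exist at startup (the method is callable from Scala as well). A minimal sketch: it assumes the driver keeps naming its threads `MariaDb-failover-N` as in the dump above, and the `MariaDb-failover-demo` thread in `main` is a hypothetical stand-in so the example runs without a database.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FailoverThreadCheck {

    /** Returns the sorted names of live threads in this JVM whose name starts with the prefix. */
    public static List<String> threadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.startsWith(prefix))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a driver failover thread; it just sleeps.
        Thread demo = new Thread(() -> {
            try {
                Thread.sleep(1_000);
            } catch (InterruptedException ignored) {
                // interrupted by main below; simply exit
            }
        }, "MariaDb-failover-demo");
        demo.setDaemon(true);
        demo.start();

        System.out.println(threadsWithPrefix("MariaDb-failover")); // [MariaDb-failover-demo]

        demo.interrupt();
        demo.join();
    }
}
```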

Thread PIDs from top -H mapped to their hex LWP IDs (the nid values in the thread dump):

6373 18e5
6374 18e6
6305 18a1
6375 18e7
6372 18e4
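The mapping above is what ties the busy threads in `top -H` to the thread dump: jstack prints each thread's Linux LWP ID in hex as `nid=0x...`, so for example 6373 becomes 0x18e5, which is "MariaDb-failover-3", and 6305 becomes 0x18a1, which is "MariaDb-failover-1". On the shell, `printf '%x\n' 6373` does the same conversion; a minimal Java equivalent (the class and method names are just for illustration) is:

```java
public class LwpToNid {

    /** Formats a Linux thread id (LWP, as shown by `top -H`) the way jstack prints it: nid=0x.... */
    public static String toNid(long lwp) {
        return "0x" + Long.toHexString(lwp); // toHexString emits lowercase digits, as jstack does
    }

    public static void main(String[] args) {
        long[] busyLwps = {6373, 6374, 6305, 6375, 6372}; // thread PIDs from the top -H output above
        for (long lwp : busyLwps) {
            System.out.println(lwp + " -> " + toNid(lwp));
        }
        // prints:
        // 6373 -> 0x18e5
        // 6374 -> 0x18e6
        // 6305 -> 0x18a1
        // 6375 -> 0x18e7
        // 6372 -> 0x18e4
    }
}
```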

========== More info

Our API has 5 different database configurations, i.e. 5 different databases. Each one uses a connection string of the form jdbc:mysql:aurora://some-aurora-cluster

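Note that aurora mode is used to get a better failover experience, which explains the 5 lightweight processes. But these threads are chatty, and they are what causes the many TIME_WAIT connections and the elevated CPU usage.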

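Has anyone run into this before, and how did you mitigate it? I would still like to use aurora mode (or something equivalent) so that we don't have to restart the application when the database fails over.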


1 Answer


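After several days of searching and research, I finally dug into the MariaDB driver code, specifically the areas shown in the thread dump, and followed the code all the way down the stack.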

https://github.com/mariadb-corporation/mariadb-connector-j/blob/master/src/main/java/org/mariadb/jdbc/internal/protocol/AuroraProtocol.java

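I found a reference to the retriesAllDown setting, which defaults to 120. Reading further in the MariaDB driver knowledge base, I also found another setting, failoverLoopRetries, which also defaults to 120.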

You can read more about the MariaDB driver settings here: https://github.com/mariadb-corporation/mariadb-connector-j/blob/3bc66153b51aca188afc50ff35a0123f16c099ed/src/main/java/org/mariadb/jdbc/util/DefaultOptions.java

For our team and API, we were comfortable with a value of 12 (10% of the default) and decided to use it for both settings, so here is the modified connection string:

jdbc:mysql:aurora://some-aurora-cluster?retriesAllDown=12&failoverLoopRetries=12
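Since the API has 5 database configurations, the same two options have to be added to every one of the 5 connection strings. If those URLs are assembled in code rather than written out in configuration, a small helper keeps them consistent. A minimal sketch: `AuroraUrl.withOptions` is a hypothetical helper, it does not URL-encode values (fine for the numeric values used here), and opening the connection is left to whatever pool or `DriverManager` call the application already uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class AuroraUrl {

    /** Appends connector options to a JDBC URL as key=value pairs joined with '&'. */
    public static String withOptions(String baseUrl, Map<String, String> options) {
        if (options.isEmpty()) {
            return baseUrl;
        }
        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue()) // values are not URL-encoded
                .collect(Collectors.joining("&"));
        // Reuse an existing query string if the base URL already has one.
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + query;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>(); // keeps insertion order
        options.put("retriesAllDown", "12");
        options.put("failoverLoopRetries", "12");
        System.out.println(withOptions("jdbc:mysql:aurora://some-aurora-cluster", options));
        // prints jdbc:mysql:aurora://some-aurora-cluster?retriesAllDown=12&failoverLoopRetries=12
    }
}
```

Applying the options in one place means a later change to the retry values only has to be made once for all 5 databases.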

This significantly reduced CPU usage while still keeping the failover functionality we need.

Hopefully this answer helps someone. I won't mark it as the answer to my original question until it has helped at least 10 people.

answered 2020-06-18T21:14:28.533