4

我们在使用 RDS Mysql 的 Amazon 托管 Play 应用程序时遇到问题。

有时应用程序最终没有响应。今天我用 jstack -F 检查了线程状态,有多个线程卡在 c3p0 连接的本机代码中:

Thread 14060: (state = IN_NATIVE)
 - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
 - java.net.SocketInputStream.read(byte[], int, int, int) @bci=79, line=150 (Compiled frame)
 - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=121 (Compiled frame)
 - com.mysql.jdbc.util.ReadAheadInputStream.fill(int) @bci=262, line=114 (Compiled frame)
 - com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(byte[], int, int) @bci=176, line=161 (Compiled frame)
 - com.mysql.jdbc.util.ReadAheadInputStream.read(byte[], int, int) @bci=48, line=189 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.readFully(java.io.InputStream, byte[], int, int) @bci=34, line=3036 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.reuseAndReadPacket(com.mysql.jdbc.Buffer, int) @bci=23, line=3489 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.reuseAndReadPacket(com.mysql.jdbc.Buffer) @bci=3, line=3478 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.checkErrorPacket(int) @bci=12, line=4019 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.sendCommand(int, java.lang.String, com.mysql.jdbc.Buffer, boolean, java.lang.String, int) @bci=428, line=2490 (Compiled frame)
 - com.mysql.jdbc.MysqlIO.sqlQueryDirect(com.mysql.jdbc.StatementImpl, java.lang.String, java.lang.String, com.mysql.jdbc.Buffer, int, int, int, boolean, java.lang.String, com.mysql.jdbc.Field[]) @bci=559, line=2651 (Compiled frame)
 - com.mysql.jdbc.ConnectionImpl.execSQL(com.mysql.jdbc.StatementImpl, java.lang.String, int, com.mysql.jdbc.Buffer, int, int, boolean, java.lang.String, com.mysql.jdbc.Field[], boolean) @bci=130, line=2677 (Compiled frame)
 - com.mysql.jdbc.ConnectionImpl.execSQL(com.mysql.jdbc.StatementImpl, java.lang.String, int, com.mysql.jdbc.Buffer, int, int, boolean, java.lang.String, com.mysql.jdbc.Field[]) @bci=17, line=2627 (Compiled frame)
 - com.mysql.jdbc.StatementImpl.executeQuery(java.lang.String) @bci=534, line=1556 (Compiled frame)
 - com.mysql.jdbc.DatabaseMetaData$9.forEach(java.lang.Object) @bci=174, line=5013 (Compiled frame)
 - com.mysql.jdbc.IterateBlock.doForAll() @bci=25, line=51 (Compiled frame)
 - com.mysql.jdbc.DatabaseMetaData.getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[]) @bci=280, line=4991 (Compiled frame)
 - com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnectionNoQuery(java.sql.Connection, java.lang.Throwable[]) @bci=21, line=185 (Compiled frame)
 - com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(java.sql.Connection, java.lang.String, java.lang.Throwable[]) @bci=7, line=62 (Compiled frame)
 - com.mchange.v2.c3p0.AbstractConnectionTester.activeCheckConnection(java.sql.Connection) @bci=4, line=67 (Compiled frame)
 - com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(java.lang.Object) @bci=122, line=368 (Compiled frame)
 - com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(java.lang.Object) @bci=26, line=310 (Compiled frame)
 - com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run() @bci=35, line=1999 (Interpreted frame)
 - com.mchange.v2.async.ThreadPerTaskAsynchronousRunner$TaskThread.run() @bci=41, line=255 (Interpreted frame)

如果没有设置其他连接测试,DefaultConnectionTester.activeCheckConnectionNoQuery 似乎会检查 DatabaseMetaData.getTables。

C3p0 手册有关于配置正确连接测试的说明,这个答案有关于使用 Play 进行 c3p0 设置的说明!但我想知道:

  • 有没有办法检查这是什么问题?
    • 我之前遇到的标准“Mysql 连接在 8 小时后被终止”结果异常
  • 似乎有一些使用 c3p0/Play/Hibernate 的默认连接测试,因为这已经完成了?
    • 我有一种预感,默认情况下 c3p0 不进行连接测试
      • 默认值 idleConnectionTestPeriod = 0,testConnectionOnCheckin = false,testConnectionOnCheckout = false
  • 有没有办法为连接测试查询设置超时?
    • 例如,即使我配置了一些明智的preferredTestQuery,该查询是否也会因为当前的getTables() 调用而挂起?
  • 还有什么我应该考虑的吗?

使用自动测试表更新(不是解决方案)

我们将 c3p0.automaticTestTable 设置为进行更合理的连接测试,但这似乎并没有解决问题,只是稍微更改了线程的堆栈跟踪。

在类似的情况下,我们现在有如下线程堆栈:

"Task-Thread-for-com.mchange.v2.async.ThreadPerTaskAsynchronousRunner@6064cd08" daemon prio=10 tid=0x00007f53d403a000 nid=0x63dd runnable [0x00007f53dce4a000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
        at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
        at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
        - locked <0x00000000dbecd0d8> (a com.mysql.jdbc.util.ReadAheadInputStream)
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3036)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3489)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3478)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4019)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2490)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2651)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2677)
        - locked <0x00000000dbec1a70> (a com.mysql.jdbc.JDBC4Connection)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2627)
        at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1556)
        - locked <0x00000000dbec1a70> (a com.mysql.jdbc.JDBC4Connection)
        - locked <0x00000000d1330078> (a com.mysql.jdbc.StatementImpl)
        at com.mchange.v2.c3p0.impl.DefaultConnectionTester.activeCheckConnection(DefaultConnectionTester.java:73)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:374)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishIdleResource(C3P0PooledConnectionPool.java:310)
        at com.mchange.v2.resourcepool.BasicResourcePool$AsyncTestIdleResourceTask.run(BasicResourcePool.java:1999)
        at com.mchange.v2.async.ThreadPerTaskAsynchronousRunner$TaskThread.run(ThreadPerTaskAsynchronousRunner.java:255)

播放线程正在等待获取数据库连接:

"play-thread-3" prio=10 tid=0x00007f53c400b000 nid=0x5b3e in Object.wait() [0x00007f53ddd57000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000db9f38f8> (a com.mchange.v2.resourcepool.BasicResourcePool)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:579)
        - locked <0x00000000db9f38f8> (a com.mchange.v2.resourcepool.BasicResourcePool)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
        ... (näitä vaihtelevat määrät)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
        - locked <0x00000000db9f38f8> (a com.mchange.v2.resourcepool.BasicResourcePool)
        at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:584)
        - locked <0x00000000db9f38f8> (a com.mchange.v2.resourcepool.BasicResourcePool)
        at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:477)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:525)
        at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:128)
        at org.hibernate.ejb.connection.InjectedDataSourceConnectionProvider.getConnection(InjectedDataSourceConnectionProvider.java:71)
        at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:446)
        at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:167)
        at org.hibernate.jdbc.JDBCContext.connection(JDBCContext.java:160)
        at org.hibernate.transaction.JDBCTransaction.begin(JDBCTransaction.java:81)
        at org.hibernate.impl.SessionImpl.beginTransaction(SessionImpl.java:1473)
        at org.hibernate.ejb.TransactionImpl.begin(TransactionImpl.java:60)
        at play.db.jpa.JPAPlugin.startTx(JPAPlugin.java:377)
        at play.db.jpa.JPAPlugin.beforeInvocation(JPAPlugin.java:345)
        at play.plugins.PluginCollection.beforeInvocation(PluginCollection.java:473)
        at play.Invoker$Invocation.before(Invoker.java:217)
        at play.Invoker$Invocation.run(Invoker.java:277)
        at play.server.PlayHandler$NettyInvocation.run(PlayHandler.java:229)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

一些注意事项:

  • 有可能(至少在理论上)许多连接恰好在读取堆栈转储的同时被测试,即使我发现这不太可能。
  • 另一个问题中提到数据死锁或硬件/网络问题可能会导致这种行为。
    • 至少不应该有任何正在进行的模式修改。

兄弟,图科


回答我自己:

  • 找不到连接挂起的原因:(
    • RDS 有 MySQL 的 error.log,不幸的是那里什么都没有
  • Play 有一些 c3p0 ComboPooledDataSource 的默认设置(在DBPlugin.onApplicationStart()中)
    • idleTimeExcessConnections :来自 application.conf db.pool.maxIdleTimeExcessConnections(默认 0)
    • checkoutTimeout :来自 application.conf db.pool.timeout(默认 5000)
    • 空闲连接测试周期:10
    • setTestConnectionOnCheckin : 真
    • 另外,conf/c3p0.properties 可以用来设置自定义属性
  • 尚未找到为连接测试设置超时的方法(可能不支持简短的DefaultConnectionTester源代码查找)

我们将继续设置 c3p0.automaticTestTable 并检查以后是否会出现任何问题。

更新:不幸的是我们还没有解决问题:(同样使用自动测试表,连接测试似乎卡住了 java.net.SocketInputStream.socketRead0(Native Method)


再次对自己回答:

我们最终使用 c3p0:s maxAdministrativeTaskTime尝试了这种“为连接测试查询设置超时”方法。这有点小技巧,所以必须看看这是否可以成为最终解决方案(如果可行)。

到目前为止还没有问题,我们将看看它是如何工作的。日志中也没有 c3p0 错误/堆栈跟踪,我希望此中断会显示在日志中吗?要么超时尚未发生,要么我们应该调整日志配置以显示 c3p0 日志。

更新(2013-04-04):在此之后问题(可能)没有经常发生,但时不时发生:(

4

1 回答 1

3

再次对自己回答:

我们首先使用为 MySQL 驱动程序指定的 socketTimeout,以便连接测试不会卡住。

似乎底层的 TCP 连接结束了,因此服务器没有回答(TCP ACK 或任何东西)。

更新:最终的解决方案是为 AWS 提供 tcpdump 支持,他们在 AWS 基础设施中发现了一些案例。他们建议我们在 EC2 实例的安全组中打开临时端口范围。AWS 论坛上的更多讨论。

于 2013-04-29T10:29:37.297 回答