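Our production system repeatedly hits APPARENT DEADLOCK warnings that we cannot get to the bottom of. We have found no correlation with the number of users online, and we do not appear to be running out of available connections.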

We have a Java EE application that connects to Oracle through Hibernate and c3p0. Our c3p0 configuration is:

minPoolSize=10
maxPoolSize=300
initialPoolSize=30
acquireIncrement=10
maxIdleTime=1800
maxStatementsPerConnection=0
numHelperThreads=5

The APPARENT DEADLOCK log output always looks more or less like this:

[com.mchange.v2.async.ThreadPoolAsynchronousRunner] (Timer-1) com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@4c9f1b4d -- APPARENT DEADLOCK!!! Complete Status:
    Managed Threads: 5
    Active Threads: 5
    Active Tasks:
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@7fe1ab86 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1)
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@38c42c01 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2)
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@572512c4 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#4)
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@42f32e8e (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0)
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@6b758ef8 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#3)
    Pending Tasks:
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@72fd72e5
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@5d82535d
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@172f2ea1
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@1a9e57eb
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@20ee5a35

The actual pool thread stack traces vary; I have added a few examples below:

Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#4,5,jboss]
            java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
            java.lang.StringBuilder.<init>(StringBuilder.java:80)
            oracle.net.ns.Packet.<init>(Packet.java:513)
            oracle.net.ns.ConnectPacket.<init>(ConnectPacket.java:64)
            oracle.net.ns.NSProtocol.connect(NSProtocol.java:278)
            oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1042)
            oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:301)
            oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:531)
            oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:221)
            oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
            oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:503)
            com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
            com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
            com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)

Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2,5,jboss]
            oracle.jdbc.driver.T4CTTIoauthenticate.processRPA(T4CTTIoauthenticate.java:491)
            oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:295)
            oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
            oracle.jdbc.driver.T4CTTIoauthenticate.doOSESSKEY(T4CTTIoauthenticate.java:390)
            oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:356)
            oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:531)
            oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:221)
            oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
            oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:503)
            com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
            com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
            com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
            com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:1

Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1,5,jboss]
            oracle.net.ns.NSProtocol.connect(NSProtocol.java:346)
            oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1042)
            oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:301)
            oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:531)
            oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:221)
            oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
            oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:503)
            com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
            com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
            com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
            com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
            com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
            com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
            com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
            com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)

Any suggestions on where we should be investigating? Is it c3p0, our queries, our code, or the database?

1 Answer

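So, your helper threads are clearly saturated with Connection acquisition tasks. That means that, over an extended period, multiple attempts to acquire Connections are neither succeeding nor failing with an Exception. That is ultimately the problem you need to debug.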

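The first thing I would do is upgrade to c3p0-0.9.2.1, which uses "scattered" acquisition tasks that economize on pool threads when acquisitions fail. My second suggestion would be to increase the c3p0 config parameter numHelperThreads, perhaps to well above both its default of 3 and the 5 you are using. Your threads appear to be genuinely occupied in various stages of Connection acquisition, so increasing the number of "lanes" through which Connections can be acquired may help. [But see the postscript below!]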

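Ultimately, though, the question comes down to why Connection acquisition completes so slowly (for an APPARENT DEADLOCK to be declared, no acquisition has succeeded over a period of roughly 10 seconds). That could be a database issue or a network issue.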
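
One way to separate pool behaviour from database or network latency is to time raw connection attempts outside c3p0. The following is only a minimal sketch: `AcquireProbe` is a hypothetical helper (not part of c3p0 or the Oracle driver), and the opener is passed in so the same timing code can wrap whatever connection call you use.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of c3p0 or the Oracle JDBC driver.
final class AcquireProbe {

    /** Outcome of one timed attempt: elapsed milliseconds and the failure, if any. */
    static final class Result {
        final long elapsedMillis;
        final Exception failure; // null when the attempt succeeded

        Result(long elapsedMillis, Exception failure) {
            this.elapsedMillis = elapsedMillis;
            this.failure = failure;
        }
    }

    private static long millisSince(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    /** Opens one resource via the opener, records how long opening took, then closes it. */
    static Result timeOnce(Callable<AutoCloseable> opener) {
        long start = System.nanoTime();
        AutoCloseable resource;
        try {
            resource = opener.call();
        } catch (Exception e) {
            // The attempt failed; report how long it took to fail.
            return new Result(millisSince(start), e);
        }
        long elapsed = millisSince(start);
        try {
            if (resource != null) {
                resource.close();
            }
        } catch (Exception ignored) {
            // Close failures are not what this probe measures.
        }
        return new Result(elapsed, null);
    }
}
```

Calling `AcquireProbe.timeOnce(() -> DriverManager.getConnection(url, user, password))` in a loop from the application host, with your own URL and credentials as placeholders, would show whether plain logons to Oracle are themselves taking seconds. If they are, the pool is probably not the culprit.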

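But do try upgrading to 0.9.2 and increasing numHelperThreads. [The default value of numHelperThreads is probably outdated; in the era of multicore machines, a thread pool dedicated to IO-bound tasks should probably be sized at some multiple of the number of cores.] These adjustments may well resolve the problem, or at least give a better idea of where to look next.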
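
As a rough illustration of sizing numHelperThreads from the core count, here is a small sketch. The class `HelperThreadSizing`, the multiplier of 2.0, and the floor of 5 are all assumptions to tune, not c3p0 recommendations; the `c3p0.numHelperThreads` key is the form used in a `c3p0.properties` file or as a system property, which may not be how your Hibernate setup passes its settings.

```java
import java.util.Properties;

// Hypothetical helper; the multiplier and floor are assumptions, not c3p0 guidance.
final class HelperThreadSizing {

    /** Returns cores * multiplier rounded up, but never less than the floor. */
    static int helperThreads(int cores, double multiplier, int floor) {
        int scaled = (int) Math.ceil(cores * multiplier);
        return Math.max(floor, scaled);
    }

    /** Builds an override in the key form used by c3p0.properties and system properties. */
    static Properties overrides(int helperThreads) {
        Properties props = new Properties();
        props.setProperty("c3p0.numHelperThreads", Integer.toString(helperThreads));
        return props;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Print the suggested override for this machine.
        overrides(helperThreads(cores, 2.0, 5))
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The floor keeps the value from dropping below the 5 already in use on machines with few cores. Whatever number comes out is a starting point for experiments, since more helper threads only help if acquisitions eventually complete.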

Good luck!

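P.S. My guess is that the thread stack traces you show were not actually captured during the buildup to the APPARENT DEADLOCK. Capturing them at the right moment is hard, because you do not know a deadlock is coming until c3p0 declares it. I would bet that during the deadlock the stack traces look more alike and are waiting on some kind of IO.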
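
Because the right moment cannot be predicted, one option is to sample the helper threads continuously and keep the recent samples. The sketch below assumes a hypothetical `PoolThreadSampler` class and matches threads by the name fragment visible in the log output above; it uses only the standard `Thread.getAllStackTraces()` call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of c3p0.
final class PoolThreadSampler {

    // Name fragment taken from the thread names in the question's log output.
    static final String POOL_THREAD_MARKER = "ThreadPoolAsynchronousRunner$PoolThread";

    /** Keeps threads whose name contains the marker and maps each name to its top stack frame. */
    static Map<String, String> topFrames(Map<Thread, StackTraceElement[]> traces, String marker) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            String name = entry.getKey().getName();
            if (!name.contains(marker)) {
                continue; // not a c3p0 helper thread
            }
            StackTraceElement[] stack = entry.getValue();
            result.put(name, stack.length == 0 ? "(no frames)" : stack[0].toString());
        }
        return result;
    }

    /** Takes one snapshot of the helper threads in the running JVM. */
    static Map<String, String> sample() {
        return topFrames(Thread.getAllStackTraces(), POOL_THREAD_MARKER);
    }
}
```

Calling `PoolThreadSampler.sample()` every second or so from a scheduled task, and logging the result with a timestamp, would leave a record of what the helper threads were doing in the seconds before c3p0 reports the deadlock. If the top frames all sit in socket reads at that point, that supports the IO explanation.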

Answered 2013-04-17T08:38:12.393