10

我想我发现了一种情况,即 log4j a) 直接和 b) 通过 commons-logging 混合使用会导致某种类加载死锁。我不确定这种情况是否可能发生(JVM 不应该检测到这种情况吗?)以及如何处理它。

问题

在我们的构建系统中,我们目前正在按顺序运行我们的单元测试——为了使构建更快,我们显然可以改变它来并行运行我们的单元测试。但是,如果我们这样做,一些构建会遇到执行超时。在分析这种“挂起构建”的线程转储时,我们发现自己在不同的模块中大部分时间都涉及不同的测试。但它总是归结为尝试初始化 Logger 的两个线程:一个使用Logger.getLogger(直接使用 log4j),另一个使用LogFactory.getLog(使用 commons-logging)。

我的分析

所以我们有一个线程(直接使用 log4j 的那个)在这个地方等待:

"pool-1-thread-3" prio=10 tid=0x00007f6528ca6000 nid=0x6f8f in Object.wait() [0x00007f64d9ca6000]
  java.lang.Thread.State: RUNNABLE
at org.apache.log4j.LogManager.<clinit>(LogManager.java:82)
at org.apache.log4j.Logger.getLogger(Logger.java:117)
at de.is24.platform.contacts.domain.PhoneNumberFormat.<clinit>(PhoneNumberFormat.java:21)

不幸的是,这是一条相当“拥挤”的路线:

Hierarchy h = new Hierarchy(new RootLogger((Level) Level.DEBUG));

另一个线程(使用 commons-logging)在这里等待:

"pool-1-thread-2" prio=10 tid=0x00007f6528bf9800 nid=0x6f8e in Object.wait() [0x00007f64d9da7000]
  java.lang.Thread.State: RUNNABLE
at org.apache.log4j.Priority.<clinit>(Priority.java:45)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.commons.logging.impl.Log4JLogger.class$(Log4JLogger.java:37)
at org.apache.commons.logging.impl.Log4JLogger.<clinit>(Log4JLogger.java:45)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:209)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:351)

这很简单:

final static public Priority FATAL = new Level(FATAL_INT, "FATAL", 0);

所以对我来说,似乎第二个线程正在初始化类Priority并等待加载Level类。第一个线程可能会尝试加载Level类(该行中的其他内容似乎无关),并且随着Level类 extends Priority,等待Priority类被加载。
我们有我们的僵局。

那么,你能告诉我这个分析是否正确吗?还是我错过了什么?

更新我

我写了一些测试用例,你可以在这里找到它们:https ://github.com/sebastiankirsch/classloading

有几个测试用例证明了这个问题:

  1. TestLoadingByClassForName应该很快导致死锁(在我的机器上每运行三次)
  2. TestLoadingMixed表示问题的一个版本被剥离到观察到的堆栈跟踪的最小值;这个失败的频率要低得多(大约是 4 倍)
  3. TestMixedLoggerInstantiation模仿行为:一个类使用 log4j 实例化记录器,另一个使用 commons-logging。在这里很难发现死锁,因为涉及更多代码 - 我需要添加一个随机睡眠,这当然需要适应您的机器

TestMixedLoggerInstantiation这是挂起的测试用例的堆栈跟踪。

Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode):

"UseLog4JLogger" prio=10 tid=0x00007fa1f017d800 nid=0x7bd8 in Object.wait() [0x00007fa1e5ba4000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.log4j.LogManager.<clinit>(LogManager.java:82)
    at org.apache.log4j.Logger.getLogger(Logger.java:117)
    at net.tcc.classloading.UseLog4JLogger.run(UseLog4JLogger.java:23)

"UseCommonsLoggingLogFactory" prio=10 tid=0x00007fa1f00e5000 nid=0x7bd7 in Object.wait() [0x00007fa1e5ca4000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.log4j.Priority.<clinit>(Priority.java:45)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at org.apache.commons.logging.impl.Log4JLogger.class$(Log4JLogger.java:37)
    at org.apache.commons.logging.impl.Log4JLogger.<clinit>(Log4JLogger.java:45)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:209)
    at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:351)
    at net.tcc.classloading.UseCommonsLoggingLogFactory.run(UseCommonsLoggingLogFactory.java:13)

"ReaderThread" prio=10 tid=0x00007fa1f00ed000 nid=0x7bd6 runnable [0x00007fa1e5da6000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
    - locked <0x00000007d7088a00> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:167)
    at java.io.BufferedReader.fill(BufferedReader.java:136)
    at java.io.BufferedReader.readLine(BufferedReader.java:299)
    - locked <0x00000007d7088a00> (a java.io.InputStreamReader)
    at java.io.BufferedReader.readLine(BufferedReader.java:362)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner$ReaderThread.run(RemoteTestRunner.java:140)

"Low Memory Detector" daemon prio=10 tid=0x00007fa1f009d800 nid=0x7bd4 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007fa1f009b800 nid=0x7bd3 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007fa1f0098800 nid=0x7bd2 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007fa1f0096800 nid=0x7bd1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007fa1f007a000 nid=0x7bd0 in Object.wait() [0x00007fa1e6c54000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d7001300> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <0x00000007d7001300> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007fa1f0078000 nid=0x7bcf in Object.wait() [0x00007fa1e6d55000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d70011d8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x00000007d70011d8> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007fa1f000c000 nid=0x7bc5 in Object.wait() [0x00007fa1f50b0000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d730dfd8> (a net.tcc.classloading.UseCommonsLoggingLogFactory)
    at java.lang.Thread.join(Thread.java:1186)
    - locked <0x00000007d730dfd8> (a net.tcc.classloading.UseCommonsLoggingLogFactory)
    at java.lang.Thread.join(Thread.java:1239)
    at net.tcc.classloading.TestMixedLoggerInstantiation.test(TestMixedLoggerInstantiation.java:21)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

"VM Thread" prio=10 tid=0x00007fa1f0071800 nid=0x7bce runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fa1f001f000 nid=0x7bc6 runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fa1f0021000 nid=0x7bc7 runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fa1f0022800 nid=0x7bc8 runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fa1f0024800 nid=0x7bc9 runnable 

"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fa1f0026800 nid=0x7bca runnable 

"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fa1f0028000 nid=0x7bcb runnable 

"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fa1f002a000 nid=0x7bcc runnable 

"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fa1f002c000 nid=0x7bcd runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007fa1f00a8800 nid=0x7bd5 waiting on condition 

JNI global references: 1168

重现死锁

从https://github.com/sebastiankirsch/classloading下载代码。
TestLoadingByClassForName应该很容易为您造成死锁(只需运行几次),这是您的系统/JVM 最终会在复杂场景中陷入死锁的先决条件。

现在运行TestMixedLoggerInstantiation几次。注意平均运行时间,打开UseLog4JLogger并将睡眠定时器总和设置为略小于平均运行时间。这最终会导致死锁,但这种情况很少发生。

因此,像这样在命令行上运行它:

for (( ; ; )) do
  testExectution
done

无需testExecution手动将它们放在一起,只需在测试中设置断点,调试,键入ps -ef | grep TestMixedLoggerInstantiationshell,然后复制 IDE 使用的命令。

4

2 回答 2

7

我终于在Java语言规范中找到了答案,特别是在第12.4.2章详细初始化过程中。
那里说

[...]
2) 如果 C 的 Class 对象指示其他线程正在对 C 进行初始化,则 [...]阻塞当前线程,直到通知正在进行的初始化已完成,[.. .]
7) 接下来,如果 C 是类而不是接口,并且它的超类 SC 尚未初始化,则递归地执行 SC 的整个过程
10) 如果初始化程序的执行正常完成,[...]将 C 的 Class 对象标记为完全初始化,通知所有等待线程,[...]

所以观察到的行为完全符合 JLS 的规定。我仍然有点困惑,没有办法检测到这样的死锁。以及为什么线程被标记为 RUNNABLE - 但我想这不是你要执行的典型字节码,所以谁知道......

如何解决问题

该问题的解决方案是摆脱commons-logging. 正如@Robert Johnson指出的那样,这可以通过使用来轻松org.slf4j:jcl-over-slf4j完成。我还检查了 SLF 代码:它没有“利用”不幸的 log4j 设计。

于 2013-03-26T08:33:50.617 回答
1

你的分析是正确的。
您可以尝试在不同的类加载器中运行并发测试,请参阅此处有关如何执行此操作的讨论。Surefire中有一个未解决的错误, JUnit 组中对此问题进行了讨论。
作为一种解决方法,您可以使用org.apache.myfaces.test.runners.TestPerClassLoaderRunner,如上面链接中所述。

于 2013-03-24T08:05:35.403 回答