4

I'm using the JVM for a scientific application. The first step in my process is to load a lot of data into little double[] arrays (48-element arrays for each node in a large graph). Long before I get to the point where I find out if I have enough memory to load all of them, Java slows down asymptotically, and jvisualvm tells me that this is because nearly all of the CPU time is spent in garbage collection:

[Image: jvisualvm screenshot; the left plot shows CPU and GC activity, the right plot shows heap size and used heap]

The first minute or so is fine: "used heap" (right plot) jumps up and down as it grows because some objects are temporary (I wrote this in Scala) and some objects are permanent. After that, however, the data-loading grinds to a halt because the garbage collector is apparently checking the same objects over and over (left plot). It must be expecting them to go out of scope, but I'm keeping them in scope because I want to use them for my analysis.

I know that the garbage collector puts objects in different generations, based on their likelihood of survival. The first generation contains objects that are recently created and likely to die soon; later generations are progressively more likely to be long-lived. If my objects are wrongly in the first generation, is there any way to tell the garbage collector that they ought to be in a later generation? I know that I'll be keeping them, so how can I tell the garbage collector?

Although I'd like these objects to be in a more permanent generation, PermGen would be going too far: they will die eventually, after tens of minutes of processing. (I want to use this in a Hadoop reducer, which might work on a different chunk of data after this one without a new JVM.)

Note: I'm using the Sun HotSpot VM:

% java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

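Correction (to a previous edit): Changing -Xmx does change the saturation point, but Java apparently ignores the -Xmx command-line argument if it is passed after the -jar argument (everything after the jar file is handed to the program as its own arguments). Note also that -Xmx is read as bytes unless it carries a k, m, or g suffix. That is, run

java -Xmx2048m -jar MyJarFile.jar

rather than

java -jar MyJarFile.jar -Xmx2048m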

Because of this, I misdiagnosed the behavior with respect to the maximum heap, and all the answers pointing to the -Xmx flag are valid.

The saturation point I describe happens when the "heap size" (orange on right plot) reaches the chosen -Xmx limit, and the "heap size" is always about 1.6 times the "used heap" (blue on right plot) unless you explicitly set the size of the "Old" generation with -XX:NewRatio or -XX:OldSize. These also need to be before the -jar argument, and they provide a lot of control.
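
One way to confirm that the flags actually reached the JVM, rather than ending up as program arguments, is to print what the running JVM reports. This is only a minimal sketch: the class name HeapSettingsCheck is made up for illustration, and the value from Runtime.maxMemory() only roughly corresponds to -Xmx (it can come out somewhat lower).

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name is not from the original post.
final class HeapSettingsCheck {

    // Maximum heap the JVM will attempt to use, in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Options the JVM itself received (for example -Xmx...), excluding program arguments.
    static List<String> jvmOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM options:    " + jvmOptions());
        // Anything placed after "-jar MyJarFile.jar" shows up here instead.
        System.out.println("Program args:   " + Arrays.toString(args));
    }
}
```

If -Xmx, -XX:NewRatio, or -XX:OldSize appear under "Program args" rather than "JVM options", they were placed after the jar file and had no effect on the heap.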


3 Answers

5

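The GC should not spiral into calling itself repeatedly unless your heap is close to saturation. You need to increase the maximum heap size (-Xmx); start with close to 2x your expected retained data. You could also use the CMS collector, which can help when there is a large tenured set. You may also need to size the new generation manually, since the old generation does not need to be swept regularly.

You might also consider using NIO direct ByteBuffers. Although they are designed for more efficient I/O operations, they can be a reasonable choice for large, long-lived arrays of data.

answered 2013-09-25T15:50:50.177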
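
As a rough illustration of the direct-buffer idea, the sketch below keeps all the 48-element vectors in a single direct buffer instead of one double[] per node, so the collector sees one small object rather than millions of arrays. The class name NodeVectorStore and its methods are hypothetical and not from the question. It assumes everything fits in one buffer (at most Integer.MAX_VALUE bytes, so a few million nodes); larger graphs would need several buffers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Hypothetical container; names are illustrative only.
final class NodeVectorStore {
    static final int VECTOR_LENGTH = 48;   // elements per node, as in the question
    private static final int BYTES_PER_DOUBLE = 8;

    private final int nodeCount;
    private final DoubleBuffer values;      // view over off-heap memory

    NodeVectorStore(int nodeCount) {
        long bytes = (long) nodeCount * VECTOR_LENGTH * BYTES_PER_DOUBLE;
        if (nodeCount < 0 || bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("node count does not fit in one buffer: " + nodeCount);
        }
        this.nodeCount = nodeCount;
        // Direct buffers are allocated outside the Java heap and start zero-filled.
        this.values = ByteBuffer.allocateDirect((int) bytes)
                                .order(ByteOrder.nativeOrder())
                                .asDoubleBuffer();
    }

    void set(int node, int index, double value) {
        values.put(offset(node, index), value);
    }

    double get(int node, int index) {
        return values.get(offset(node, index));
    }

    // Maps (node, element) to a position in the flat buffer, with bounds checks.
    private int offset(int node, int index) {
        if (node < 0 || node >= nodeCount || index < 0 || index >= VECTOR_LENGTH) {
            throw new IndexOutOfBoundsException("node=" + node + ", index=" + index);
        }
        return node * VECTOR_LENGTH + index;
    }

    public static void main(String[] args) {
        NodeVectorStore store = new NodeVectorStore(1000);
        store.set(42, 7, 3.14);
        System.out.println(store.get(42, 7));
    }
}
```

Direct memory has its own limit on HotSpot (-XX:MaxDirectMemorySize), so moving data off-heap changes which limit you hit rather than removing the limit.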
1

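I think you should check this with the VisualGC plugin for JVisualVM, so that you can see how the different generations are being used. From the screenshot, it looks as if the old generation is full (the heap as a whole is not completely full, yet the GC is working hard), so the GC has a hard time freeing memory. You should increase the heap or adjust the size of the generations with -XX:NewRatio, and you can also try tuning the tenuring threshold to control when objects are considered "old".

answered 2013-09-25T15:55:36.220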
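
If installing the VisualGC plugin is inconvenient, similar per-generation numbers can be logged from inside the program with the standard java.lang.management beans. This is a minimal sketch, not a replacement for VisualGC: the class name GenerationReport is made up, and the pool names it prints depend on the collector in use (for example, names like "PS Old Gen" with the parallel collector).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class; the name is illustrative only.
final class GenerationReport {

    // Formats a byte count in MiB; the beans report -1 when a value is undefined.
    private static String mib(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / (1024L * 1024L)) + " MiB";
    }

    // Builds a text report of heap pool usage and cumulative GC activity.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            sb.append(String.format("%-25s used=%s committed=%s max=%s%n",
                    pool.getName(), mib(usage.getUsed()),
                    mib(usage.getCommitted()), mib(usage.getMax())));
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%-25s collections=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling report() periodically during loading should show whether the old-generation pool is approaching its maximum while the collection time keeps climbing, which is the pattern described above.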
0

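Objects will not be garbage collected while they are still referenced. So just keep a reference to the objects until you want them to be garbage collected.

answered 2013-09-25T15:54:31.140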