I'm using the JVM for a scientific application. The first step in my process is to load a lot of data into little double[] arrays (48-element arrays for each node in a large graph). Long before I find out whether I have enough memory to load all of them, Java slows down asymptotically, and jvisualvm tells me that this is because nearly all of the CPU time is spent in garbage collection:
[jvisualvm screenshot: CPU usage and GC activity (left plot); heap size and used heap (right plot)]
The first minute or so is fine: "used heap" (right plot) jumps up and down as it grows because some objects are temporary (I wrote this in Scala) and some objects are permanent. After that, however, the data-loading grinds to a halt because the garbage collector is apparently checking the same objects over and over (left plot). It must be expecting them to go out of scope, but I'm keeping them in scope because I want to use them for my analysis.
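To put numbers on what jvisualvm shows, here is a minimal sketch that mimics the loading pattern (many 48-element double[] arrays, all kept reachable) and reports how much time the collectors spent during the load. The class name GcTimeProbe, the node count and the placeholder values are hypothetical; only the java.lang.management calls are standard API. It avoids Java 7 syntax so it should compile on Java 6.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcTimeProbe {
    // 48 doubles per node, as in the question.
    static final int VALUES_PER_NODE = 48;

    // Allocate n small arrays and keep every one reachable, like the long-lived node data.
    static List<double[]> loadNodes(int n) {
        List<double[]> nodes = new ArrayList<double[]>(n);
        for (int i = 0; i < n; i++) {
            double[] values = new double[VALUES_PER_NODE];
            for (int j = 0; j < VALUES_PER_NODE; j++) {
                values[j] = i + j * 0.5; // hypothetical placeholder data
            }
            nodes.add(values);
        }
        return nodes;
    }

    // Sum accumulated collection time (ms) over all collectors, young and old.
    // getCollectionTime() returns -1 if a collector does not report it, so skip those.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000000; // hypothetical default
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        List<double[]> nodes = loadNodes(n);
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.printf("loaded %d nodes in %d ms, of which %d ms in GC%n",
                nodes.size(), elapsedMs, gcMs);
        // Per-collector breakdown (names depend on which collector HotSpot selected).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("  %s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

If GC time grows to dominate the elapsed time as the node count approaches the heap limit, that reproduces the left-plot behavior outside jvisualvm.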
I know that the garbage collector puts objects in different generations, based on their likelihood of survival. The first generation contains objects that are recently created and likely to die soon; later generations are progressively more likely to be long-lived. If my objects are wrongly in the first generation, is there any way to tell the garbage collector that they ought to be in a later generation? I know that I'll be keeping them, so how can I tell the garbage collector?
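The generations are visible from inside the JVM as heap memory pools. A minimal sketch that lists them with their current usage (it only observes the pools; it does not move objects between them). The class name GenerationPools is hypothetical, and the pool names it prints, such as "PS Eden Space" or "PS Old Gen", depend on which collector HotSpot is using, so the code does not hard-code them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class GenerationPools {
    // Names of the heap pools, i.e. the young (eden, survivor) and old generations.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage(); // null if the pool is no longer valid
            if (u == null) {
                continue;
            }
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            // getMax() returns -1 when the maximum is undefined for this pool.
            String max = u.getMax() < 0 ? "undefined" : (u.getMax() / 1024) + " KB";
            System.out.printf("%-25s %-9s used=%,d KB committed=%,d KB max=%s%n",
                    pool.getName(), kind, u.getUsed() / 1024, u.getCommitted() / 1024, max);
        }
    }
}
```

Calling this before and after loading the data should show where the arrays end up; objects that survive enough young collections are promoted into the old-generation pool.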
Although I'd like these objects to be in a more permanent generation, PermGen would be too far: they will die eventually, after tens of minutes of processing. (I want to use this in a Hadoop reducer, which might process another chunk of data in the same JVM afterwards.)
Note: I'm using the Sun HotSpot VM:
% java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
Correction (to a previous edit): Changing -Xmx does change the saturation point, but the JVM never sees an -Xmx option that is placed after -jar MyJarFile.jar, because everything after the jar file name is passed to the application's main method as a program argument. That is, run
java -Xmx2048m -jar MyJarFile.jar
rather than
java -jar MyJarFile.jar -Xmx2048m
Because of this, I had misdiagnosed how the behavior depends on the maximum heap size, and all the answers pointing to the -Xmx flag are valid.
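A quick way to confirm where an option ended up is to print both argument lists from inside the program. A minimal sketch, with the class name ArgCheck and its helper hypothetical; RuntimeMXBean.getInputArguments() is the standard call that returns the options given to the JVM itself, while main's args holds whatever followed the main class or jar.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

public class ArgCheck {
    // True if any argument is a maximum-heap option such as -Xmx2048m.
    static boolean hasMaxHeapFlag(List<String> arguments) {
        for (String a : arguments) {
            if (a.startsWith("-Xmx")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Options the JVM itself received (those placed before the main class or -jar).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // Everything after the main class or jar file name arrives here instead.
        List<String> programArgs = Arrays.asList(args);
        System.out.println("JVM arguments:     " + jvmArgs);
        System.out.println("program arguments: " + programArgs);
        System.out.println("-Xmx seen by JVM:  " + hasMaxHeapFlag(jvmArgs));
        System.out.println("-Xmx left in args: " + hasMaxHeapFlag(programArgs));
        System.out.println("max heap (MB):     " + (Runtime.getRuntime().maxMemory() >> 20));
    }
}
```

Launched as java -Xmx2048m -cp . ArgCheck -Xmx4096m, the first -Xmx should appear among the JVM arguments and the second only among the program arguments, with the reported max heap reflecting the first.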
The saturation point I describe happens when the "heap size" (orange on the right plot) reaches the chosen -Xmx limit, and the "heap size" is always about 1.6 times the "used heap" (blue on the right plot) unless you explicitly set the size of the "Old" generation with -XX:NewRatio or -XX:OldSize. These options also need to come before the -jar argument, and they provide a lot of control.
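For reference, -XX:NewRatio=N sets the old generation to N times the size of the young generation, so the young generation gets roughly 1/(N+1) of the heap. A small sketch of that arithmetic, assuming the young size is governed only by NewRatio (no -Xmn, -XX:NewSize or -XX:MaxNewSize) and ignoring the alignment rounding HotSpot applies; the class and method names are hypothetical.

```java
public class NewRatioSizing {
    // Approximate young-generation size: NewRatio is old:young, so young = heap / (N + 1).
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    // The old generation takes the remainder of the heap.
    static long oldBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long heap = 2048L * 1024 * 1024; // corresponds to -Xmx2048m
        for (int ratio = 1; ratio <= 4; ratio++) {
            System.out.printf("NewRatio=%d: young ~ %d MB, old ~ %d MB%n",
                    ratio, youngBytes(heap, ratio) >> 20, oldBytes(heap, ratio) >> 20);
        }
    }
}
```

With -Xmx2048m and -XX:NewRatio=3 this gives about 512 MB young and 1536 MB old, which is one way to reserve most of the heap for long-lived arrays like these.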