7

我目前正在评估 G1 垃圾收集器以及它在我们的应用程序中的表现。查看 gc-log,我注意到很多集合都有很长的“扫描 RS”阶段:

7968.869: [GC pause (mixed), 10.27831700 secs]
   [Parallel Time: 10080.8 ms]
   (...)
      [Scan RS (ms):  4030.4  4034.1  4032.0  4032.0
       Avg: 4032.1, Min: 4030.4, Max: 4034.1, Diff:   3.7]
      [Object Copy (ms):  6038.5  6033.3  6036.7  6037.1
       Avg: 6036.4, Min: 6033.3, Max: 6038.5, Diff:   5.2]
   (...)
    [Eden: 19680M(19680M)->0B(20512M) Survivors: 2688M->2624M Heap:
     75331M(111904M)->51633M(115744M)]
 [Times: user=40.49 sys=0.02, real=10.28 secs] 

所有删除的日志行条目都以个位数毫秒显示运行时间。

我认为大部分时间应该花在复制上,对吧?Scan RS 需要这么长时间的原因可能是什么?关于如何调整 G1 设置的任何想法?JVM 开始于

-Xms40960M -Xmx128G -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log

编辑:哦,我忘了......我正在使用Java 7u25

更新: 我注意到另外两件奇怪的事情:

16187.740: [GC concurrent-mark-start]
16203.934: [GC pause (young), 2.89871800 secs]
(...)
16218.455: [GC pause (young), 4.61375100 secs]
(...)
16237.441: [GC pause (young), 4.46131800 secs]
(...)
16257.785: [GC pause (young), 4.73922600 secs]
(...)
16275.417: [GC pause (young), 3.87863400 secs]
(...)
16291.505: [GC pause (young), 3.72626400 secs]
(...)
16307.824: [GC pause (young), 3.72921700 secs]
(...)
16325.851: [GC pause (young), 3.91060700 secs]
(...)
16354.600: [GC pause (young), 5.61306000 secs]
(...)
16393.069: [GC pause (young), 17.50453200 secs]
(...)
16414.590: [GC concurrent-mark-end, 226.8497670 sec]

在执行并行运行的同时,并发 GC 运行仍在继续。我不确定这是否是故意的,但对我来说似乎有点不对。诚然,这是一个极端的例子,但我确实在我的日志中看到了这种行为。

另一件事是我的 JVM 进程增长到 160g。考虑到 128g 的堆大小,这是一个相当大的开销。这是意料之中的,还是 G1 泄漏内存?关于如何找到它的任何想法?

PS:我不确定我是否应该对更新提出新问题......如果你们中的任何人认为这会有益,请告诉我;)

更新 2: 我猜 G1 真的可能在泄漏内存: http: //printfdebugger.tumblr.com/post/19142660766/how-i-learned-to-love-cms-and-had-my-heart-broken-by -g1 由于现在这是一个交易破坏者,我不会花更多时间来玩这个。我还没有尝试过配置区域大小(-XX:G1HeapRegionSize)和降低堆占用率(-XX:InitiatingHeapOccupancyPercent)。

4

1 回答 1

6

Let's see.

1 - First clues

It looks like your GC was configured to use 4 threads (or you have 4 vCPU, but it is unlikely given the size of the heap). It is quite low for a 128GB heap, I was expecting more.

The GC events seems to happen at 25+ seconds interval. However, the log extract you gave do not mention the number of regions that were processed.

=> By any chance, did you specify pause time goals to G1GC (-XX:MaxGCPauseMillis=N) ?

2 - Long Scan RSet time

"Scan RSet" means the time the GC spent in scanning the Remembered Sets. Remembered Set of a region contains cards that correspond to the references pointing into that region. This phase scans those cards looking for the references pointing into all the regions of the collection set.

So here, we have one more question :

=> How many regions were processed during that particular collection (i.e. how big is the CSet)

3 - Long Object Copy time

The copy time, as the name suggest, is the time spend by each worker thread copying live objects from the regions in the Collection Set to the other regions.

Such long copy time can suggest that a lot of regions were processed, and that you may want to reduce that number. It could also suggest swapping, but this is very unlikely given your user/real values at the end of the log.

4 - Now what to do

You should check in the GC log the number of regions that were processed. Correlate this number with your region size and deduce the amount of memory that was scanned.

You can then set a smaller pause time goal (for instance, to 500ms using -XX:MaxGCPauseMillis=500). This will

  • increase the number of GC events,
  • reduce the amount of freed memory per GC cycle
  • reduce the STW pauses during YGC

Hope that helps !

Sources :

于 2013-07-15T21:52:55.020 回答