3

所以,我们使用 G1 GC。和 18GB 堆。年轻代大小约为3,5G。并且堆最大使用量约为12G。记忆中充满了短暂的事物。

此外,Couchbase实例在 JVM 附近的同一节点上运行可能很重要。当它在硬盘上持续更改时,它经常需要所有 iops,但是有足够的空闲 cpu 时间和内存。

启用的 JVM 选项:

-Xmx18g -Xms18g -XX:MaxPermSize=512M -XX:+UseG1GC -XX:+DisableExplicitGC -XX:+ParallelRefProcEnabled

我可悲的问题是年轻代 GC 暂停。长时间暂停 GC 日志通常如下所示:

2013-07-10T15:06:25.963+0400: 9122,066: [GC pause (young)
Desired survivor size 243269632 bytes, new threshold 5 (max 15)
        - age   1:   69789280 bytes,   69789280 total
- age   2:   58618240 bytes,  128407520 total
- age   3:   54519720 bytes,  182927240 total
- age   4:   51592728 bytes,  234519968 total
- age   5:   45186520 bytes,  279706488 total
9122,066: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 174,16 ms, remaining time: 25,84 ms, target pause time: 200,00 ms]
        9122,066: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 426 regions, survivors: 34 regions, predicted young region time: 164,97 ms]
        9122,066: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 426 regions, survivors: 34 regions, old: 0 regions, predicted pause time: 339,13 ms, target pause time: 200,00 ms]
        9122,259: [SoftReference, 0 refs, 0,0063780 secs]9124,575: [WeakReference, 4 refs, 0,0031600 secs]9124,578: [FinalReference, 1640 refs, 0,0033730 secs]9124,581: [PhantomReference, 145 refs, 0,0032080 secs]9124,5
        85: [JNI Weak Reference, 0,0000810 secs], 2,53669600 secs]
        [Parallel Time: 190,5 ms]
        [GC Worker Start (ms):  9122066,6  9122066,7  9122066,7  9122066,7  9122066,8  9122066,9  9122066,9  9122066,9  9122066,9  9122067,0  9122067,0  9122067,1  9122067,1  9122067,1  9122067,1  9122067,2  91220
        67,2  9122067,3
Avg: 9122067,0, Min: 9122066,6, Max: 9122067,3, Diff:   0,7]
        [Ext Root Scanning (ms):  4,7  6,0  4,8  4,5  4,2  4,3  4,2  4,3  4,6  3,4  13,5  5,2  4,2  5,6  4,2  4,1  4,3  4,0
Avg:   5,0, Min:   3,4, Max:  13,5, Diff:  10,1]
        [Update RS (ms):  20,9  19,6  21,1  21,3  21,2  21,2  21,3  21,2  21,7  21,5  12,1  20,2  21,1  19,4  21,0  21,1  20,7  21,2
Avg:  20,4, Min:  12,1, Max:  21,7, Diff:   9,6]
        [Processed Buffers : 27 23 25 29 31 22 25 34 28 14 36 23 24 22 28 24 25 24
Sum: 464, Avg: 25, Min: 14, Max: 36, Diff: 22]
        [Scan RS (ms):  9,0  9,2  8,7  8,8  9,1  9,1  8,9  9,1  8,3  9,2  9,0  9,1  9,2  9,2  9,1  9,0  9,0  9,1
Avg:   9,0, Min:   8,3, Max:   9,2, Diff:   1,0]
        [Object Copy (ms):  145,1  145,0  145,2  145,1  145,1  144,9  145,1  144,9  144,9  145,4  144,8  144,8  144,8  145,0  145,0  145,1  145,2  144,9
Avg: 145,0, Min: 144,8, Max: 145,4, Diff:   0,6]
        [Termination (ms):  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0
Avg:   0,0, Min:   0,0, Max:   0,0, Diff:   0,0]
        [Termination Attempts : 5 8 2 11 5 6 6 5 5 7 4 7 2 9 8 5 7 8
Sum: 110, Avg: 6, Min: 2, Max: 11, Diff: 9]
        [GC Worker End (ms):  9122246,4  9122246,6  9122246,7  9122246,6  9122246,7  9122246,7  9122246,5  9122246,7  9122246,5  9122246,5  9122246,6  9122246,7  9122246,8  9122246,4  9122246,6  9122246,5  9122246
        ,7  9122246,8
Avg: 9122246,6, Min: 9122246,4, Max: 9122246,8, Diff:   0,3]
        [GC Worker (ms):  179,8  179,9  180,0  179,8  179,9  179,9  179,6  179,8  179,6  179,5  179,6  179,6  179,7  179,3  179,5  179,4  179,4  179,5
Avg: 179,7, Min: 179,3, Max: 180,0, Diff:   0,7]
        [GC Worker Other (ms):  10,7  10,7  10,8  10,8  10,9  10,9  11,0  11,0  11,0  11,1  11,1  11,1  11,2  11,2  11,2  11,2  11,3  11,3
Avg:  11,0, Min:  10,7, Max:  11,3, Diff:   0,6]
        [Clear CT:   2,8 ms]
        [Other: 2343,4 ms]
        [Choose CSet:   0,1 ms]
        [Ref Proc: 2327,7 ms]
        [Ref Enq:   1,9 ms]
        [Free CSet:   8,2 ms]
        [Eden: 3408M(3408M)->0B(3400M) Survivors: 272M->280M Heap: 9998M(18432M)->6638M(18432M)]
        [Times: user=3,26 sys=0,02, real=2,54 secs]
Total time for which application threads were stopped: 2,5434370 seconds

产生问题的 GC 的唯一阶段是“引用处理”。但是日志看起来很奇怪:soft、weak、final 和 jni 引用处理只用了很少的时间。但总时间为 2.5 秒。或者甚至可能更长,在最坏的情况下长达 10 秒。

另一个暂停(更舒适)可能看起来像:

2013-07-10T16:26:11.862+0400: 13907,965: [GC pause (young)
    Desired survivor size 243269632 bytes, new threshold 4 (max 15)
    - age   1:   69125832 bytes,   69125832 total
    - age   2:   58756480 bytes,  127882312 total
    - age   3:   52397376 bytes,  180279688 total
    - age   4:   88850424 bytes,  269130112 total
    13907,965: [G1Ergonomics (CSet Construction) start choosing CSet, predicted base time: 77,38 ms, remaining time: 122,62 ms, target pause time: 200,00 ms]
    13907,965: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 427 regions, survivors: 33 regions, predicted young region time: 167,95 ms]
    13907,965: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 427 regions, survivors: 33 regions, old: 0 regions, predicted pause time: 245,33 ms, target pause time: 200,00 ms]
    13908,155: [SoftReference, 0 refs, 0,0041340 secs]13908,160: [WeakReference, 0 refs, 0,0023850 secs]13908,162: [FinalReference, 1393 refs, 0,0065970 secs]13908,169: [PhantomReference, 108 refs, 0,0018650 secs]13908,171: [JNI Weak Reference, 0,0000630 secs], 0,22008100 secs]
    [Parallel Time: 188,4 ms]
    [GC Worker Start (ms):  13907965,3  13907965,3  13907965,4  13907965,4  13907965,5  13907965,5  13907965,6  13907965,6  13907965,6  13907965,7  13907965,7  13907965,7  13907965,8  13907965,8  13907965,8  13907965,9  13907965,9  13907965,9
    Avg: 13907965,6, Min: 13907965,3, Max: 13907965,9, Diff:   0,6]
    [Ext Root Scanning (ms):  5,8  5,0  6,8  6,3  6,1  6,2  6,0  6,3  5,2  4,2  5,0  6,2  4,5  6,0  17,1  4,4  6,2  5,3
    Avg:   6,3, Min:   4,2, Max:  17,1, Diff:  12,9]
    [Update RS (ms):  24,8  26,0  23,9  24,1  24,1  24,1  24,2  23,9  25,0  25,2  25,1  24,1  26,0  24,3  13,7  25,7  24,2  24,7
    Avg:  24,1, Min:  13,7, Max:  26,0, Diff:  12,2]
    [Processed Buffers : 30 20 9 16 16 19 20 21 22 12 30 17 17 20 12 20 17 22
    Sum: 340, Avg: 18, Min: 9, Max: 30, Diff: 21]
    [Scan RS (ms):  7,5  7,1  7,2  7,5  7,6  7,5  7,5  7,6  7,1  7,4  7,6  7,2  7,2  7,4  7,2  7,5  7,0  7,7
    Avg:   7,4, Min:   7,0, Max:   7,7, Diff:   0,7]
    [Object Copy (ms):  133,1  133,1  133,2  133,1  133,2  133,1  133,2  133,1  133,5  134,0  133,0  133,2  133,0  132,9  132,6  133,1  133,2  132,9
    Avg: 133,1, Min: 132,6, Max: 134,0, Diff:   1,3]
    [Termination (ms):  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0  0,0
    Avg:   0,0, Min:   0,0, Max:   0,0, Diff:   0,0]
    [Termination Attempts : 1 3 1 1 1 1 1 1 1 1 2 3 2 1 1 1 1 1
    Sum: 24, Avg: 1, Min: 1, Max: 3, Diff: 2]
    [GC Worker End (ms):  13908136,6  13908136,9  13908136,5  13908136,7  13908136,7  13908136,8  13908136,7  13908136,7  13908136,8  13908136,8  13908136,5  13908136,6  13908136,5  13908136,5  13908136,5  13908136,5  13908136,8  13908136,6
    Avg: 13908136,7, Min: 13908136,5, Max: 13908136,9, Diff:   0,4]
    [GC Worker (ms):  171,3  171,6  171,1  171,2  171,2  171,3  171,1  171,1  171,1  171,2  170,8  170,9  170,7  170,7  170,7  170,6  171,0  170,7
    Avg: 171,0, Min: 170,6, Max: 171,6, Diff:   0,9]
    [GC Worker Other (ms):  17,2  17,2  17,3  17,3  17,4  17,4  17,5  17,5  17,5  17,5  17,6  17,6  17,7  17,7  17,7  17,7  17,8  17,8
    Avg:  17,5, Min:  17,2, Max:  17,8, Diff:   0,6]
    [Clear CT:   1,6 ms]
    [Other:  30,1 ms]
    [Choose CSet:   0,1 ms]
    [Ref Proc:  17,1 ms]
    [Ref Enq:   0,9 ms]
    [Free CSet:   7,4 ms]
    [Eden: 3416M(3416M)->0B(3456M) Survivors: 264M->224M Heap: 7289M(18432M)->3912M(18432M)]
    [Times: user=3,16 sys=0,00, real=0,22 secs]    

参考处理仍然是最长的阶段,但要短得多。ParallelRefProcEnabled 并不能解决我的问题。我也尝试过改变年轻一代的大小。它也没有帮助。设置不同的 -XX:MaxGCPauseMillis,更宽松的 600 毫秒或更严格的 100,导致吞吐量仍然很差。

CMS 性能甚至比 G1 更差,参数如下:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

年轻一代的收藏时间更长,更频繁。

我对所描述的日志完全感到困惑。明天我将尝试将 Couchbase 实例移动到另一个节点,以检查它是否冻结了 GC 线程。

但是,如果 Couchbase 不是重点,也许有人可以向我解释日志的含义。或者也许有一些神奇的 CMS 参数来解决这个问题。

我会很高兴得到任何帮助!

4

1 回答 1

2

问题由我们自己解决。我有一个新规则——永远不要在 JVM 附近安装 Couchbase。过去我们在实例 Postgresql 与 Couchbase 冲突时遇到过这样的问题,因为 Couchbase 喜欢抓取所有磁盘操作,而 Postgresql 无法提交任何内容。

所以,隔离 Couchbase,一切都会好起来的。

于 2013-07-11T05:49:44.810 回答