我有 6 个节点,1 个 Solr,5 个 Spark 节点,使用 datastax。我的集群位于与 Amazon EC2 类似的服务器上,具有 EBS 卷。每个节点有3个EBS卷,使用LVM组成一个逻辑数据盘。在我的 OPS 中心,同一节点经常变得无响应,这导致我的数据系统连接超时。我的数据量约为 400GB,有 3 个副本。我有 20 个流作业,每分钟有批处理间隔。这是我的错误信息:
/var/log/cassandra/output.log:WARN 13:44:31,868 Not marking nodes down due to local pause of 53690474502 > 5000000000
/var/log/cassandra/system.log:WARN [GossipTasks:1] 2016-09-25 16:40:34,944 FailureDetector.java:258 - Not marking nodes down due to local pause of 64532052919 > 5000000000
/var/log/cassandra/system.log:WARN [GossipTasks:1] 2016-09-25 16:59:12,023 FailureDetector.java:258 - Not marking nodes down due to local pause of 66027485893 > 5000000000
/var/log/cassandra/system.log:WARN [GossipTasks:1] 2016-09-26 13:44:31,868 FailureDetector.java:258 - Not marking nodes down due to local pause of 53690474502 > 5000000000
编辑:
这些是我更具体的配置。我想知道我是否做错了什么,如果是,我怎样才能详细了解它是什么以及如何解决它?
出堆设置为
MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="4G"
当前堆:
[root@iZ11xsiompxZ ~]# jstat -gc 11399
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
0.0 196608.0 0.0 196608.0 6717440.0 2015232.0 43417600.0 23029174.0 69604.0 68678.2 0.0 0.0 1041 131.437 0 0.000 131.437
[root@iZ11xsiompxZ ~]# jmap -heap 11399
Attaching to process ID 11399, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.102-b14
using thread-local object allocation.
Garbage-First (G1) GC with 23 thread(s)
堆配置:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 51539607552 (49152.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 30920409088 (29488.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 16777216 (16.0MB)
堆使用:
G1 Heap:
regions = 3072
capacity = 51539607552 (49152.0MB)
used = 29923661848 (28537.427757263184MB)
free = 21615945704 (20614.572242736816MB)
58.059545404588185% used
G1 Young Generation:
Eden Space:
regions = 366
capacity = 6878658560 (6560.0MB)
used = 6140461056 (5856.0MB)
free = 738197504 (704.0MB)
89.26829268292683% used
Survivor Space:
regions = 12
capacity = 201326592 (192.0MB)
used = 201326592 (192.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 1443
capacity = 44459622400 (42400.0MB)
used = 23581874200 (22489.427757263184MB)
free = 20877748200 (19910.572242736816MB)
53.04110320109241% used
40076 interned Strings occupying 7467880 bytes.
我不知道为什么会这样。非常感谢。