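We are trying out Cassandra as our data store and have run into nodes failing because they run out of heap space. We are running DataStax Community Edition with Cassandra 2.0.1 on a 9-node cluster running Ubuntu Server 13.04, with 16 GB of RAM per node. During a data migration, two of our nodes went down unexpectedly due to insufficient heap space. The stack traces in the logs are fairly unremarkable and vary from one failure to the next. Here is one example: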

ERROR [MutationStage:21] 2013-11-01 07:08:39,656 CassandraDaemon.java (line 185) Exception in thread Thread[MutationStage:21,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at org.apache.cassandra.utils.SlabAllocator$Region.init(SlabAllocator.java:178)
    at org.apache.cassandra.utils.SlabAllocator.getRegion(SlabAllocator.java:101)
    at org.apache.cassandra.utils.SlabAllocator.allocate(SlabAllocator.java:70)
    at org.apache.cassandra.utils.Allocator.clone(Allocator.java:30)
    at org.apache.cassandra.db.ColumnFamilyStore.internOrCopy(ColumnFamilyStore.java:2220)
    at org.apache.cassandra.db.Column.localCopy(Column.java:277)
    at org.apache.cassandra.db.Memtable$1.apply(Memtable.java:107)
    at org.apache.cassandra.db.Memtable$1.apply(Memtable.java:104)
    at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
    at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
    at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
    at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
    at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

It was preceded by AssertionErrors like this one:

ERROR [FlushWriter:6176] 2013-11-01 06:55:48,825 CassandraDaemon.java (line 185) Exception in thread Thread[FlushWriter:6176,5,main]
java.lang.AssertionError
    at org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:186)
    at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:358)
    at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:317)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

And by a large number of garbage collection status messages like the following:

INFO [ScheduledTasks:1] 2013-11-01 06:59:14,923 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 5935 ms for 1 collections, 2963961136 used; max is 3902799872
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,924 StatusLogger.java (line 55) Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) ReadStage                         0         3       58646672         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) RequestResponseStage              0         1       22614351         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,925 StatusLogger.java (line 70) ReadRepairStage                   0         0          76371         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) MutationStage                     7       260      709366463         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) ReplicateOnWriteStage             0         0         104455         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,926 StatusLogger.java (line 70) GossipStage                       0         1        3695467         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,953 StatusLogger.java (line 70) AntiEntropyStage                  0         0            404         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,954 StatusLogger.java (line 70) MigrationStage                    0         0           1178         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,954 StatusLogger.java (line 70) MemtablePostFlusher               1        39          43229         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) MemoryMeter                       0         0            668         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) FlushWriter                       0         0          23228         0                82
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,955 StatusLogger.java (line 70) MiscStage                         0         0            196         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) commitlog_archiver                0         0              0         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) InternalResponseStage             0         0            276         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 70) HintedHandoff                     0         0             13         0                 0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,956 StatusLogger.java (line 79) CompactionManager                 3        11
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 81) Commitlog                       n/a       261
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 93) MessagingService                n/a       1,0
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 103) Cache Type                     Size                 Capacity               KeysToSave
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,957 StatusLogger.java (line 105) KeyCache                   41783700                104857600                      all
INFO [ScheduledTasks:1] 2013-11-01 06:59:14,975 StatusLogger.java (line 111) RowCache                          0                        0                      all
...

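Given that this happened after only 4 hours of data ingestion, we would like to know why it happened and what we can do to prevent it from happening again. Thanks in advance.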

1 Answer

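I have been running Cassandra on Ubuntu for years, and it is very sensitive to RAM settings. As a rule of thumb, don't store more than 1 TB of data per node, and avoid running with a max heap smaller than 8 GB.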
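To check where a node stands against that 8 GB guideline, the GCInspector line from the question already reports the heap ceiling ("max is ..."). Below is a minimal Python sketch that pulls the numbers out of such a line. The regular expression and the `parse_gc_line` helper are my own illustration, written against the log format shown in the question, not anything that ships with Cassandra.

```python
import re

# GCInspector line copied verbatim from the question's log.
GC_LINE = (
    "INFO [ScheduledTasks:1] 2013-11-01 06:59:14,923 GCInspector.java (line 116) "
    "GC for ConcurrentMarkSweep: 5935 ms for 1 collections, "
    "2963961136 used; max is 3902799872"
)

# Pattern inferred from the line above (an assumption about the log format).
GC_PATTERN = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_line(line):
    """Return pause time and heap figures from a GCInspector line, or None."""
    match = GC_PATTERN.search(line)
    if match is None:
        return None
    used = int(match.group("used"))
    max_heap = int(match.group("max"))
    return {
        "collector": match.group("collector"),
        "pause_ms": int(match.group("ms")),
        "used_bytes": used,
        "max_bytes": max_heap,
        "max_gib": max_heap / 2**30,         # heap ceiling in GiB
        "used_fraction": used / max_heap,    # share of heap still in use after GC
    }


if __name__ == "__main__":
    info = parse_gc_line(GC_LINE)
    print(f"{info['collector']}: {info['pause_ms']} ms pause, "
          f"{info['used_fraction']:.0%} of {info['max_gib']:.2f} GiB still in use")
```

For the line in the question this works out to a heap ceiling of roughly 3.6 GiB, about three quarters of it still occupied after a CMS pause of almost six seconds, which is well below the 8 GB I suggest above.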

See the "MAX_HEAP_SIZE" setting in /etc/cassandra/cassandra-env.sh.
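If "MAX_HEAP_SIZE" is left unset, the script picks a value from the machine's RAM. As far as I recall, the stock cassandra-env.sh documents the rule as max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)); treat that as an assumption and check the comments in your own copy of the file. A small Python sketch of that rule (the function name is mine, for illustration only):

```python
def default_max_heap_mb(system_memory_mb):
    """Approximate the automatic heap size, in MB, under the rule assumed above.

    Assumed rule: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)).
    """
    half_capped = min(system_memory_mb // 2, 1024)      # half of RAM, capped at 1 GB
    quarter_capped = min(system_memory_mb // 4, 8192)   # quarter of RAM, capped at 8 GB
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    for ram_gb in (16, 32, 64):
        heap_mb = default_max_heap_mb(ram_gb * 1024)
        print(f"{ram_gb} GB RAM -> about {heap_mb} MB heap")
```

Under that rule a 16 GB node ends up with about 4 GB of heap, which is consistent with the "max is 3902799872" figure in the question's log, so reaching 8 GB on those machines means setting "MAX_HEAP_SIZE" explicitly.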

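When you first import data, it goes into RAM and is then compacted. It is often a good idea to set a higher max heap for the initial load, then restart with a smaller heap once the cluster is fully up.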

Answered 2015-04-08T15:06:05.533