
I have a Cassandra 2.1.8 cluster of 16 nodes (CentOS 6.6, 1x4-core Xeon, 32 GB RAM, 3x3 TB HDD, Java 1.8.0_65) and am trying to add 16 more, one at a time, but I am stuck on the very first one.

After starting the Cassandra process on the new node, 16 streams from the pre-existing nodes to the newly added node begin:

nodetool netstats |grep Already
Receiving 131 files, 241797656689 bytes total. Already received 100 files, 30419228367 bytes total
Receiving 150 files, 227954962242 bytes total. Already received 116 files, 29078363255 bytes total
Receiving 127 files, 239902942980 bytes total. Already received 103 files, 29680298986 bytes total
    ...

The new node is in the Joining state (last line of the nodetool status output):

UN ...70 669.64 GB 256 ? a9c8adae-e54e-4e8e-a333-eb9b2b52bfed R0      
UN ...71 638.09 GB 256 ? 6aa8cf0c-069a-4049-824a-8359d1c58e59 R0    
UN ...80 667.07 GB 256 ? 7abb5609-7dca-465a-a68c-972e54469ad6 R1 
UJ ...81 102.99 GB 256 ? c20e431e-7113-489f-b2c3-559bbd9916e2 R2

The join looks normal for several hours, but then the Cassandra process on the new node dies with an OOM exception:

ERROR 09:07:37 Exception in thread Thread[Thread-1822,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:167)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:745)

I have made 6 or 7 attempts, with both CMS and G1 GC and with MAX_HEAP_SIZE from 8G (the default) up to 16G, with no luck. Cassandra seems to run out of heap in a different place each time:

ERROR [CompactionExecutor:6] 2015-11-08 04:42:24,277 CassandraDaemon.java:223 - Exception in thread Thread[CompactionExecutor:6,1,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:75) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:70) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:48) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1822) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:107) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:83) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:246) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:270) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1967) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1810) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(SliceQueryPager.java:90) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:85) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(SliceQueryPager.java:38) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:155) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:144) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:427) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at org.apache.cassandra.db.compaction.CompactionManager$10.run(CompactionManager.java:1144) ~[apache-cassandra-2.1.8.jar:2.1.8]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]

Increasing MAX_HEAP_SIZE further just gets Cassandra killed by the system oom-killer instead.
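
For context, the heap overrides were made in conf/cassandra-env.sh along these lines (a sketch; the exact values varied between attempts):

# conf/cassandra-env.sh -- override the auto-calculated heap size.
# When MAX_HEAP_SIZE is set explicitly, HEAP_NEWSIZE must be set too
# (the file's own guideline: ~100 MB per physical core for CMS).
MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="400M"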

Any ideas?


1 Answer


I had exactly the same problem (see my JIRA ticket), and it appears to be related to a table with a lot of tombstones (size-tiered compaction often does not clean them up well). One potential triage measure is simply to restart the node with auto_bootstrap set to false and then run nodetool rebuild to finish the process; a sketch of those steps follows. This keeps the existing data in place while still allowing the new node to serve traffic.
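
A minimal sketch of that procedure, assuming a package install with the config under /etc/cassandra (the paths and the datacenter name are placeholders; adjust for your setup):

# Stop the stuck joining node.
sudo service cassandra stop

# Disable bootstrap so the node keeps its data and rejoins without streaming.
# (Appending is only safe if the option is not already present in the file.)
echo 'auto_bootstrap: false' | sudo tee -a /etc/cassandra/cassandra.yaml

sudo service cassandra start

# Once the node shows UN in `nodetool status`, stream the missing data
# explicitly; <existing_dc> stands in for your datacenter name.
nodetool rebuild <existing_dc>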

But you probably still have an underlying problem that is causing the OOM. Something very large is being materialized into memory during the streaming session (that much is obvious), and it is most likely one of two things:

  1. A very large partition, which can happen unexpectedly. Check cfstats and look at the maximum partition bytes, as shown in the sketch after this list. If that is the problem, you will need to address the underlying data-model issue and clean up that data.

  2. A lot of tombstones. You should see warnings about these in the logs.
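
A quick way to check for both (the keyspace/table names and the log path below are placeholders based on package defaults):

# 1. Look for an unexpectedly large partition (reported in bytes):
nodetool cfstats my_keyspace.my_table | grep 'Compacted partition maximum bytes'

# 2. Look for tombstone warnings in the Cassandra system log:
grep -i tombstone /var/log/cassandra/system.log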

If you do have one of these underlying problems, you will almost certainly have to fix it before you can stream successfully.

Answered 2015-11-10T01:49:07.227