我有这个用例,我需要不断地听一个 kafka 主题并根据 Spark 流应用程序的列值写入 2000 个列族(每个 15 列..时间序列数据)。我有一个本地 Cassandra 安装设置。在使用 3 个内核和 12 GB 内存的 CentOS VM 上创建这些列族大约需要 1.5 小时。在我的 spark 流应用程序中,我正在做一些预处理以将这些流事件存储到 Cassandra。我遇到了我的流媒体应用程序完成此操作所需的时间问题。
我试图根据密钥将 300 个事件保存到多个列族(大约 200-250 个),我的应用程序需要大约 10 分钟来保存它们。这似乎很奇怪,因为将这些事件按键分组打印到屏幕上需要不到一分钟的时间,但只有当我将它们保存到 Cassandra 时才需要时间。我将 300 万条记录保存到 Cassandra 没有问题。花了不到 3 分钟(但这是针对 Cassandra 中的单个列族)。
我的要求是尽可能实时,这似乎还很遥远。生产环境每 3 秒大约有 400 个事件。
是否需要对 Cassandra 中的 YAML 文件进行任何调整或对 cassandra-connector 本身进行任何更改
INFO 05:25:14 system_traces.events 0,0
WARN 05:25:14 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:14 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO 05:25:16 ParNew GC in 340ms. CMS Old Gen: 1308020680 -> 1454559048; Par Eden Space: 251658240 -> 0;
WARN 05:25:16 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:16 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO 05:25:17 ParNew GC in 370ms. CMS Old Gen: 1498825040 -> 1669094840; Par Eden Space: 251658240 -> 0;
WARN 05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:18 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN 05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN 05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO 05:25:19 ParNew GC in 382ms. CMS Old Gen: 1714792864 -> 1875460032; Par Eden Space: 251658240 -> 0;
W