indexing - Cassandra secondary index get_indexed_slices timeout

Question

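I am using Cassandra 0.8 with 2 secondary indexes, on the columns "DeviceID" and "DayOfYear". I have these two indexes so that I can retrieve a device's data within a date range. Whenever I receive a date filter, I convert it to DayOfYear and search with indexed slices through the .NET Thrift API. I also cannot upgrade the database at this time.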
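The date-to-DayOfYear conversion described above can be sketched as follows. This is a minimal Python illustration, not the .NET code from the question; the function name `day_of_year_values` is hypothetical. It keeps the year next to each day number, because a range that crosses a year boundary yields day numbers that are not contiguous.

```python
from datetime import date, timedelta
from typing import List, Tuple


def day_of_year_values(start: date, end: date) -> List[Tuple[int, int]]:
    """Return (year, day_of_year) for every calendar day in [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    days = []
    current = start
    while current <= end:
        # tm_yday is 1-based: 1 January is day 1.
        days.append((current.year, current.timetuple().tm_yday))
        current += timedelta(days=1)
    return days
```

Each returned pair would correspond to one indexed lookup, so a long date range means many separate DayOfYear queries.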

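I usually have no problem retrieving rows with get_indexed_slices when I query for the current date. However, whenever I query for yesterday's day of year (DayOfYear is one of the indexed columns), the first query times out. Most of the time the second query returns, and the third query returns 100% of the time.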

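Both columns were created in the column family with the double data type, and I usually receive 1 record per minute. I have 3 nodes in the cluster, and nodetool reports that the nodes are up and running, although the load distribution reported by nodetool looks like this.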

Starting NodeTool
Address         DC           Rack   Status  State   Load       Owns
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  7.59 GB    51.39%
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  394.24 MB  3.81%
xxx.xx.xxx.xx   datacenter1  rack1  Up      Normal  4.42 GB    44.80%

My configuration in the YAML file is as follows.

hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000 # one hour
hinted_handoff_throttle_delay_in_ms: 50
partitioner: org.apache.cassandra.dht.RandomPartitioner
commitlog_sync: periodic
commitlog_sync_period_in_ms: 120000
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 24
sliced_buffer_size_in_kb: 64
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: true
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 50000
index_interval: 128

Is there anything I might be missing? Is there a problem with the configuration?
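The Owns column in the ring output above is uneven (51.39%, 3.81%, 44.80%). As a point of comparison only, and not a confirmed cause of the timeouts, the sketch below computes evenly spaced tokens for RandomPartitioner, whose token space runs from 0 to 2**127. The function name `balanced_tokens` is hypothetical and is not part of nodetool or Cassandra.

```python
from typing import List

# RandomPartitioner tokens fall in the range 0 .. 2**127.
TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count: int) -> List[int]:
    """Return one evenly spaced token per node for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(3)):
        print(f"node {index}: initial_token = {token}")
```

With tokens spaced this way, each of the 3 nodes would own roughly a third of the ring rather than the split shown above.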

score 2 · Accepted Answer

Copy your data into another column family whose row key is the data you search on. Row slices are faster.

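Personally, I have never used secondary indexes in production. Either I ran into timeout problems, or the secondary index retrieved data more slowly than new data was being inserted. I think this is related to reading data out of order and to hard-disk seek time.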
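The suggestion above, a second column family keyed by the search data, can be sketched with an in-memory model. This is a hedged illustration only: the key format `deviceId:year:dayOfYear`, the class `DeviceDayStore`, and its methods are assumptions made for this example and are not Cassandra or Thrift API calls. Each row holds one device-day with columns sorted by timestamp, so a date-range query becomes a few direct row lookups plus a column slice instead of an index scan.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta


class DeviceDayStore:
    """In-memory stand-in for a column family with one row per device per day."""

    def __init__(self):
        # row key -> list of (timestamp, value) columns, kept sorted by timestamp
        self._rows = defaultdict(list)

    @staticmethod
    def row_key(device_id, day):
        # Hypothetical key layout: the searched fields are part of the row key.
        return f"{device_id}:{day.year}:{day.timetuple().tm_yday:03d}"

    def insert(self, device_id, ts, value):
        bisect.insort(self._rows[self.row_key(device_id, ts)], (ts, value))

    def query(self, device_id, start, end):
        """Return (timestamp, value) columns for device_id with start <= ts <= end."""
        results = []
        day = start.date()
        while day <= end.date():
            columns = self._rows.get(self.row_key(device_id, day), [])
            timestamps = [ts for ts, _ in columns]
            lo = bisect.bisect_left(timestamps, start)
            hi = bisect.bisect_right(timestamps, end)
            results.extend(columns[lo:hi])  # contiguous slice within one row
            day += timedelta(days=1)
        return results
```

At 1 record per minute, as stated in the question, a row in this layout would hold at most 1440 columns per device per day.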

score 1 · Accepted Answer

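If you come from a relational model, playOrm is just as fast and lets you build relationships on top of a noSQL store; you only need to partition your very large tables. Once you do that, you can use "Scalable JQL" to do your queries: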

@NoSqlQuery(name="findJoinOnNullPartition", query="PARTITIONS t(:partId) select t FROM TABLE as t INNER JOIN t.security as s where s.securityType = :type and t.numShares = :shares")

It also has the basic ORM annotations such as @ManyToOne and @OneToMany. Some things work differently in noSQL, but a lot is similar.

score -1 · Accepted Answer

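I finally solved my problem in a different way. I realized that the problem was actually in my data model.

The problem arose because we come from an RDBMS background. I refactored the data model slightly, and now I get much faster responses.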
