cassandra - Performance of updating a partition key in a Cassandra materialized view

Question

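I am trying to update a column in a base table that is the partition key in a materialized view, and I want to understand the performance implications of doing this in a production environment.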

Base table:

CREATE TABLE IF NOT EXISTS data.test (
    foreignid uuid,
    id        uuid,
    kind      text,
    version   text,
    createdon timestamp,
    certid    text,
    PRIMARY KEY (foreignid, createdon, id)
);

Materialized view:

CREATE MATERIALIZED VIEW IF NOT EXISTS data.test_by_certid AS
    SELECT * FROM data.test
    WHERE id IS NOT NULL
      AND foreignid IS NOT NULL
      AND createdon IS NOT NULL
      AND certid IS NOT NULL
    PRIMARY KEY (certid, foreignid, createdon, id);

So certid is the new partition key in our materialized view.

What happens:

1. When we first insert into the test table, the certid is usually empty, so it is replaced with the string "none" and the row is inserted into the test base table.

2. The row is inserted into the materialized view as well.

3. When the user provides us with a certid, the row in the test base table is updated with the new certid.

4. The update is mirrored to the materialized view, where the partition key certid changes from "none" to the new value.
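The steps above can be sketched in CQL roughly as follows. The UUIDs, timestamp, kind, version, and the certid value 'cert-123' are hypothetical sample values, not taken from the original post.

```cql
-- Step 1: initial insert with the placeholder certid (sample values are hypothetical).
INSERT INTO data.test (foreignid, createdon, id, kind, version, certid)
VALUES (11111111-1111-1111-1111-111111111111,
        '2024-01-01 00:00:00+0000',
        22222222-2222-2222-2222-222222222222,
        'example', '1', 'none');

-- Step 2 happens automatically: the row appears in data.test_by_certid
-- under the view partition certid = 'none'.

-- Step 3: the user supplies a certid; the base row is updated by its full primary key.
UPDATE data.test
SET certid = 'cert-123'
WHERE foreignid = 11111111-1111-1111-1111-111111111111
  AND createdon = '2024-01-01 00:00:00+0000'
  AND id = 22222222-2222-2222-2222-222222222222;

-- Step 4: once the view has caught up, the row is found under the new partition key.
SELECT * FROM data.test_by_certid WHERE certid = 'cert-123';
```

Note that the application only ever writes to data.test; the view itself cannot be written to directly.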

Questions:

1. What is the performance implication of updating the partition key certid in the materialized view?

2. For my use case, is it better to create a new table with certid as the partition key (inserting only when certid is non-empty) and manually maintain all CRUD operations on the new table, or should I use an MV and let Cassandra do the bookkeeping?
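For comparison, the manual alternative in question 2 might look something like the sketch below. The table name data.test_by_certid_manual and all sample values are hypothetical, and the logged batch is only one possible way to keep the two tables in step; it is an assumption, not something the original post specifies.

```cql
-- Hypothetical lookup table maintained by the application instead of a view.
CREATE TABLE IF NOT EXISTS data.test_by_certid_manual (
    certid    text,
    foreignid uuid,
    createdon timestamp,
    id        uuid,
    kind      text,
    version   text,
    PRIMARY KEY (certid, foreignid, createdon, id)
);

-- When the user supplies a certid, write to both tables in one logged batch
-- so that either both mutations are eventually applied or neither is.
BEGIN BATCH
    UPDATE data.test
    SET certid = 'cert-123'
    WHERE foreignid = 11111111-1111-1111-1111-111111111111
      AND createdon = '2024-01-01 00:00:00+0000'
      AND id = 22222222-2222-2222-2222-222222222222;

    INSERT INTO data.test_by_certid_manual
        (certid, foreignid, createdon, id, kind, version)
    VALUES ('cert-123',
            11111111-1111-1111-1111-111111111111,
            '2024-01-01 00:00:00+0000',
            22222222-2222-2222-2222-222222222222,
            'example', '1');
APPLY BATCH;
```

With this approach the application must also handle later certid changes and deletes itself, for example by deleting the old lookup row before inserting the new one, and a multi-partition logged batch carries its own batchlog overhead.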

Note that performance is an important criterion, since this will be used in a production environment.

Thanks

score 7 · Accepted Answer

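Updating a table that has one or more views is always more expensive than updating a table with no views, because of the overhead of performing a read-before-write and of locking the partition so that concurrent updates play well with the read-before-write. You can read more about the internals of materialized views in Cassandra in the ScyllaDB wiki.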
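As a rough illustration of where that extra cost comes from, a change to a view's primary key column can be thought of as the following sequence. This is a conceptual sketch only: Cassandra performs these steps internally on the replica, the statements against the view are shown as comments because a view cannot be written to directly, and the sample values are hypothetical.

```cql
-- 1. Read-before-write: the replica reads the existing base row
--    to learn the view key the row is currently stored under ('none').
SELECT certid FROM data.test
WHERE foreignid = 11111111-1111-1111-1111-111111111111
  AND createdon = '2024-01-01 00:00:00+0000'
  AND id = 22222222-2222-2222-2222-222222222222;

-- 2. Internal equivalent of removing the view row under the old partition key:
--    DELETE FROM data.test_by_certid
--    WHERE certid = 'none' AND foreignid = ... AND createdon = ... AND id = ...;

-- 3. Internal equivalent of writing the view row under the new partition key:
--    INSERT INTO data.test_by_certid (certid, foreignid, createdon, id, kind, version)
--    VALUES ('cert-123', ...);
```

So a single base-table UPDATE of certid turns into one local read plus two view mutations, which may go to different nodes because the old and new view partitions differ.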

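If changing the certid is a one-time operation per row, then the performance impact shouldn't be too much of a worry. Regardless, it is generally a better idea to let Cassandra deal with updating the MV, because it takes care of anomalies (such as what happens when the node storing the view is partitioned away and the update cannot propagate) and eventually ensures consistency.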

If you are worried about performance, consider replacing Cassandra with Scylla.
