cassandra - How does a CQL3 composite index with 3 fields map to the Thrift column family world?

Question

After reading this blog post on planetcassandra, I am wondering how a CQL3 composite index with 3 fields maps to the Thrift column family world. For example:

CREATE TABLE comments (
        article_id uuid,
        posted_at timestamp,       
        author text,
        karma int,
        content text,
        PRIMARY KEY (article_id, posted_at)
    )

Here, the article_id column maps to the internal row key, and posted_at maps to (the first part of) the cell name.

What if the table design is:

CREATE TABLE comments (
        author_id varchar,
        posted_at timestamp,
        article_id uuid,       
        author text,
        karma int,
        content text,
        PRIMARY KEY (author_id, posted_at, article_id)
    )

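Would the internal row key map to the first 2 fields of the composite index, with article_id mapped to the cell name, essentially slicing by article with up to 2 billion entries, so that any query on the author_id and posted_at combination is a single seek on disk?
Is the behavior the same for any number of fields in the composite key?

Thanks in advance for your answers.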

score 2 · Accepted Answer

The observation above is incorrect; the correct mapping is given here.

I have verified this myself:

In the first case:
article_id = partition key, posted_at = cluster key

In the second case:
author_id  = partition key, posted_at:article_id = cluster key

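The first part of the composite key (author_id) is called the "partition key"; the rest (posted_at, article_id) are the remaining keys.
When a composite key is used, Cassandra stores columns differently. The partition key becomes the row key. The remaining keys are concatenated with each column name (with ":" as the separator) to form the internal column name. Column values stay the same.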
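The mapping described here can be modeled with a small Python sketch. It is only an illustration of the layout, not Cassandra's actual storage code: the function name `to_thrift_layout` and the sample rows are hypothetical, and the ":" separator is used as notation for the composite column name, as in the description above.

```python
def to_thrift_layout(rows, partition_key, clustering_keys):
    """Model the internal layout: {row_key: {composite_column_name: value}}."""
    layout = {}
    for row in rows:
        # The partition key becomes the internal row key.
        row_key = row[partition_key]
        # The remaining key values form the prefix of every column name.
        prefix = ":".join(str(row[k]) for k in clustering_keys)
        cells = layout.setdefault(row_key, {})
        for name, value in row.items():
            if name == partition_key or name in clustering_keys:
                continue  # key columns are not stored as separate cells here
            # Column name = remaining keys + CQL column name; value is unchanged.
            cells[f"{prefix}:{name}"] = value
    return layout


# Hypothetical sample data for the second table in the question.
rows = [
    {"author_id": "alice", "posted_at": "2013-01-01", "article_id": "a1",
     "author": "Alice", "karma": 5, "content": "first"},
    {"author_id": "alice", "posted_at": "2013-01-02", "article_id": "a2",
     "author": "Alice", "karma": 7, "content": "second"},
]

layout = to_thrift_layout(rows, "author_id", ["posted_at", "article_id"])
for column_name, value in layout["alice"].items():
    print(column_name, "=>", value)
```

In this model both comments land in the single internal row "alice", with column names such as "2013-01-01:a1:karma", which is why author_id alone is the row key and posted_at is not part of it.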
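The remaining keys (everything other than the partition key) are ordered, and you cannot search on an arbitrary one of them: you must restrict the first one before you can move on to the second, and so on. Violating this produces a "Bad Request" error.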
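The ordering rule can be expressed as a simple prefix check. The sketch below is a simplified model, assuming only the rule stated above; the function name `restrictions_are_valid` is hypothetical, and it ignores details such as secondary indexes, ALLOW FILTERING, and the difference between equality and range restrictions.

```python
def restrictions_are_valid(clustering_keys, restricted):
    """Return True if the restricted remaining keys form a leading prefix."""
    restricted = set(restricted)
    unknown = restricted - set(clustering_keys)
    if unknown:
        raise ValueError(f"not a clustering key: {sorted(unknown)}")
    seen_gap = False
    for key in clustering_keys:
        if key in restricted:
            if seen_gap:
                # An earlier key was skipped, so this restriction is rejected.
                return False
        else:
            seen_gap = True
    return True


keys = ["posted_at", "article_id"]
print(restrictions_are_valid(keys, ["posted_at"]))                # True
print(restrictions_are_valid(keys, ["posted_at", "article_id"]))  # True
print(restrictions_are_valid(keys, ["article_id"]))               # False
```

For the second table, restricting article_id without also restricting posted_at is the kind of query this model rejects.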

score 1 · Accepted Answer

Aaron Morton has a good explanation on his site, thelastpickle.

In the first case:
article_id = partition key, posted_at = cluster key

In the second case:
author_id + posted_at = partition key, article_id = cluster key

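So with the second approach, keep disk seeks in mind; compared with the first case, the row does not grow too wide, which is a real benefit. If you stay under the 2 billion limit, do not overuse the second approach, because records are scattered according to the combined key.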
