I am writing a chat server and want to store my messages in Cassandra. I need range queries, and I expect about 100 messages/day per user with history kept for 6 months, so I will have about 18000 messages for a user at any point.

Now, since I'll do range queries, I need my data to be on the same machine. Either I have to use ByteOrderedPartitioner, which I don't fully understand, or I can store all the messages for a user in the same row.

create table users_conversations(jid1 bigint, jid2 bigint, archiveid timeuuid, stanza text, primary key((jid1, jid2), archiveid)) with CLUSTERING ORDER BY (archiveid DESC );

So I'll have 18000 columns in that row. Do you think I'll have performance problems using this clustering key approach?

If yes, what alternative do I have?

Thanks

1 Answer

Do not use the ByteOrderedPartitioner. I cannot stress that enough.

"since I'll do range queries I need my data to be on the same machine."

With your PRIMARY KEY definition of:

primary key((jid1, jid2), archiveid)

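Your current partition key (jid1, jid2) will be combined and hashed, so that all messages for a particular pair of jid1 and jid2 values are stored together in the same partition. The drawback is that you'll need both jid1 and jid2 for every query. But the rows will be sorted by archiveid, you'll be able to do range queries on archiveid, and it should perform well as long as you don't hit the limit of 2 billion columns per partition.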
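To make the layout concrete, here is a minimal Python sketch of the idea. It is a toy in-memory model, not Cassandra and not a driver API: the class `ConversationStore`, its methods, and the use of plain integers in place of timeuuid values are all made up for illustration, and the 180-day retention figure assumes 30-day months.

```python
import bisect
from collections import defaultdict

# Sizing from the question: 100 messages/day kept for six months.
MESSAGES_PER_DAY = 100
RETENTION_DAYS = 180  # assumption: six months of 30 days each
PARTITION_LIMIT = 2_000_000_000  # the per-partition limit mentioned above

rows_per_partition = MESSAGES_PER_DAY * RETENTION_DAYS
fraction_of_limit = rows_per_partition / PARTITION_LIMIT


class ConversationStore:
    """Toy stand-in for the table (hypothetical, for illustration only).

    Each (jid1, jid2) pair maps to one partition, and rows inside a
    partition are kept sorted by archiveid (ints stand in for timeuuid).
    """

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, jid1, jid2, archiveid, stanza):
        # Same (jid1, jid2) -> same partition, kept in archiveid order.
        bisect.insort(self._partitions[(jid1, jid2)], (archiveid, stanza))

    def range_query(self, jid1, jid2, start, end):
        """Return rows with start <= archiveid < end, newest first."""
        # Both parts of the partition key are required to find the partition.
        rows = self._partitions.get((jid1, jid2), [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        # Reverse the slice to mimic CLUSTERING ORDER BY (archiveid DESC).
        return rows[lo:hi][::-1]


store = ConversationStore()
for i in range(1, 11):
    store.insert(1, 2, i, f"msg {i}")
store.insert(1, 3, 5, "a different conversation")

recent = store.range_query(1, 2, 4, 8)  # archiveids 7, 6, 5, 4
```

With these numbers a partition holds about 18000 rows, which is a tiny fraction of the limit. In CQL the same slice would be a SELECT whose WHERE clause fixes both jid1 and jid2 and bounds archiveid.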

answered 2015-03-15T15:32:00.193