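I am using Hector to access Cassandra from Java. I have four tables: users, comments, user_like, and user_recommend. The user_like and user_recommend tables each have a counter column. Now I want to fetch the data from all four tables together, based on a user ID. How can I do that? Should I fetch the data for the given userId from each table separately, or is there a way to get it all at once?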
1 Answer
Unfortunately, queries are bound to column families, so each column family needs its own query.
If you want to read everything in one read, think about putting everything in one column family or, even better, in one row.
Here is how I would do it:
Right now you have :
CF "comments"
"user1" => //row
column "name1" = "value1" //column
CF "user_like"
"user1" => //row
column "name2" = "value2" //column
CF "user_recommend"
"user1" => //row
column "name3" = "value3" //column
This needs 3 queries because you have 3 Column families.
Everything in one column family would be :
CF "users"
."user1_comments" => //row
...column "name1" = "value1" //column
."user1_likes" => //row
...column "name2" = "value2" //column
."user1_recommends" => //row
...column "name3" = "value3" //column
This is much better but still suboptimal. You can get everything with a multi_get query, but such queries are slower because they may have to wait for several nodes in the cluster to respond (different keys can land on different nodes even if the keys are quite similar).
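To make the one-column-family layout concrete, here is a minimal sketch that models the "users" column family as a plain in-memory map and fetches the three rows of one user in a single call. It is not Hector code: the class MultiGetSketch and its methods are hypothetical names, and a real multi_get would go to the cluster, where each key may live on a different node.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a column family: row key -> (column name -> value).
class MultiGetSketch {

    // Builds the "users" column family from the layout above.
    static Map<String, Map<String, String>> sampleUsersCf() {
        Map<String, Map<String, String>> cf = new TreeMap<>();
        cf.put("user1_comments", row("name1", "value1"));
        cf.put("user1_likes", row("name2", "value2"));
        cf.put("user1_recommends", row("name3", "value3"));
        return cf;
    }

    // Helper that creates a row holding a single column.
    static Map<String, String> row(String columnName, String value) {
        Map<String, String> columns = new TreeMap<>();
        columns.put(columnName, value);
        return columns;
    }

    // The three row keys that belong to one user in this layout.
    static List<String> rowKeysFor(String userId) {
        return Arrays.asList(userId + "_comments", userId + "_likes", userId + "_recommends");
    }

    // One request, several row keys. Keys with no row are simply left out of the result.
    static Map<String, Map<String, String>> multiGet(
            Map<String, Map<String, String>> cf, List<String> keys) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String key : keys) {
            Map<String, String> columns = cf.get(key);
            if (columns != null) {
                result.put(key, columns);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(multiGet(sampleUsersCf(), rowKeysFor("user1")));
    }
}
```

The sketch only shows the key naming and the "one call, many keys" shape. It does not model the per-node latency that makes this approach slower than a single-row read.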
Optimal : Everything in one row.
CF "users"
."user1" => //row
...column "comments:name1" = "value1" //column
...column "likes:name2" = "value2" //column
...column "recommends:name3" = "value3" //column
You can get everything with one row read. If you only want to get comments, likes or recommends separately, you can do it with range queries. Since everything is in one row, your queries will tend to be a lot faster. Cassandra handles very wide rows well, so you should not worry about those.
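The range-query idea can be sketched without a cluster. The example below models a single row as a sorted map of column names and slices it by prefix. It assumes the column comparator sorts names as plain strings; WideRowSketch, sampleRow and slice are hypothetical names for illustration, not part of Hector.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one wide row: column name -> value, sorted by name.
class WideRowSketch {

    // The "user1" row from the layout above.
    static SortedMap<String, String> sampleRow() {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("comments:name1", "value1");
        row.put("likes:name2", "value2");
        row.put("recommends:name3", "value3");
        return row;
    }

    // Returns only the columns whose name starts with prefix + ":".
    // ';' is the character right after ':', so the half-open range
    // [prefix + ":", prefix + ";") covers exactly those names.
    static SortedMap<String, String> slice(SortedMap<String, String> row, String prefix) {
        return row.subMap(prefix + ":", prefix + ";");
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = sampleRow();
        System.out.println("whole row: " + row);                   // one read, everything
        System.out.println("likes only: " + slice(row, "likes"));  // range over one prefix
    }
}
```

With a real client the same start and end column names would be passed to a slice query on the row key "user1". The exact call depends on the client version, so it is left out here.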
A good philosophy for Cassandra is "If you read it together (at the same time) then save it together (at the same location)".
EDIT : Trick for doing counters without counters.
Cassandra is very good at writing new values all the time (in fact that is 100% of what it does in the background). Plus, counting columns in a row or in a range of that row is quite fast. So I came up with a way to do "counters" that can be used to prevent counting duplicates. NOTE: this works only for unitary increments (+1, like likes, upvotes and whatnot).
All you need to do is write a new column in a row that represents that counter. If you want to allow duplicates, make the column name a TimeUUID or a timestamp. Otherwise, make it the ID of the message the user liked (this way, if they click like twice, it still counts as one like).
Now you have two options: fire multiple queries to count the columns, or read everything in one read and count using java.util.Collection.size().
This solution plays to Cassandra's strengths, but it might not suit everybody, especially if you want to avoid very wide rows. Cassandra can handle very wide rows, but you might run into memory issues with the Collection you use to count in your app.
You could end up with something like this:
CF "users"
."user1" =>
...column "comment:name1" = "value1"
...column "comment:name2" = "value2"
...column "likes:43f54880-a0fb-11e2-aafa-f1dce92b7e5b" = "1" //time uuids here
...column "likes:43f54881-a0fb-11e2-aafa-f1dce92b7e5b" = "1"
...column "likes:43f54882-a0fb-11e2-aafa-f1dce92b7e5b" = "1"
...column "likes:43f54883-a0fb-11e2-aafa-f1dce92b7e5b" = "1"
...column "recommend:7ba30e15-2b76-4aaa-b2d0-f8419a80a769" = "1" // uuid of the recommended item
...column "recommend:603879cc-d7b0-4767-ad27-e5dd4aa34f62" = "1"
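Here is a rough sketch of the counting trick, again with the row modelled as an in-memory sorted map rather than a real Cassandra row. LikeCounterSketch and its methods are hypothetical names, and UUID.randomUUID() is only a stand-in for a time UUID, since the JDK has no built-in generator for those.

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory stand-in for the "user1" row: column name -> value.
class LikeCounterSketch {
    private final SortedMap<String, String> row = new TreeMap<>();

    // Duplicates allowed: every call writes a column with a fresh unique name.
    // A random UUID stands in for the time UUID used in the layout above.
    void like() {
        row.put("likes:" + UUID.randomUUID(), "1");
    }

    // Duplicates prevented: the column name is the ID of the recommended item,
    // so writing it a second time just overwrites the same column.
    void recommend(UUID itemId) {
        row.put("recommend:" + itemId, "1");
    }

    // The "counter" is the number of columns under the prefix.
    // ';' is the character right after ':', so the range covers the whole prefix.
    int count(String prefix) {
        return row.subMap(prefix + ":", prefix + ";").size();
    }

    public static void main(String[] args) {
        LikeCounterSketch user1 = new LikeCounterSketch();
        UUID item = UUID.fromString("7ba30e15-2b76-4aaa-b2d0-f8419a80a769");
        user1.like();
        user1.like();
        user1.recommend(item);
        user1.recommend(item); // same item again, still counted once
        System.out.println("likes=" + user1.count("likes")
                + " recommends=" + user1.count("recommend"));
    }
}
```

Counting on the client side like this keeps the whole range in memory, which is the trade-off mentioned above for very wide rows.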