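When using Cassandra's recommended RandomPartitioner (or Murmur3Partitioner), it is not possible to do meaningful range queries on keys, because rows are distributed around the cluster using a hash of the key (MD5 in the case of RandomPartitioner). These hashes are called "tokens".
Nonetheless, it would be very useful to split a large table among many compute workers by assigning each worker a range of tokens. With CQL3 it appears possible to issue queries directly against tokens, however the following Python does not work... EDIT: it works after switching to test against the latest version of the Cassandra database (doh!), and after updating the syntax per the comments below: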
## use python cql module
import cql
## If running against an old version of Cassandra, this raises:
## TApplicationException: Invalid method name: 'set_cql_version'
conn = cql.connect('localhost', cql_version='3.0.2')
cursor = conn.cursor()
try:
    ## remove the previous attempt to make this work
    cursor.execute('DROP KEYSPACE test;')
except Exception as exc:
    print exc
## make a keyspace and a simple table
cursor.execute("CREATE KEYSPACE test WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 1;")
cursor.execute("USE test;")
cursor.execute('CREATE TABLE data (k int PRIMARY KEY, v varchar);')
## put some data in the table -- must use single quotes around literals, not double quotes
cursor.execute("INSERT INTO data (k, v) VALUES (0, 'a');")
cursor.execute("INSERT INTO data (k, v) VALUES (1, 'b');")
cursor.execute("INSERT INTO data (k, v) VALUES (2, 'c');")
cursor.execute("INSERT INTO data (k, v) VALUES (3, 'd');")
## split up the full range of tokens.
## Suppose there are 2**k workers:
k = 3 # --> eight workers
token_sub_range = 2**(127 - k)
worker_num = 2 # for example
start_token = worker_num * token_sub_range
end_token = (1 + worker_num) * token_sub_range
## put single quotes around the token strings
cql3_command = "SELECT k, v FROM data WHERE token(k) >= '%d' AND token(k) < '%d';" % (start_token, end_token)
print cql3_command
## against an old version of Cassandra, this failed with "ProgrammingError: Bad Request: line 1:28 no viable alternative at input 'token'"
cursor.execute(cql3_command)
for row in cursor:
    print row
cursor.close()
conn.close()
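The range arithmetic above only handles a power-of-two worker count. As a rough sketch, it can be generalized to any number of workers and to either partitioner. The helper name `split_token_range` is hypothetical (it is not part of the cql or pycassa modules), and the token bounds are assumptions: 0 to 2**127 - 1 for RandomPartitioner and -2**63 to 2**63 - 1 for Murmur3Partitioner.

```python
# Hypothetical helper, not part of any Cassandra client library.
# Assumed token bounds (inclusive):
#   RandomPartitioner:  0      .. 2**127 - 1
#   Murmur3Partitioner: -2**63 .. 2**63 - 1

def split_token_range(num_workers, min_token, max_token):
    """Split [min_token, max_token] into num_workers contiguous,
    non-overlapping sub-ranges with inclusive (start, end) bounds."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    total = max_token - min_token + 1  # number of distinct tokens
    bounds = []
    for i in range(num_workers):
        # Integer arithmetic keeps the bounds exact even for 127-bit values.
        start = min_token + (total * i) // num_workers
        end = min_token + (total * (i + 1)) // num_workers - 1
        bounds.append((start, end))
    return bounds


# Eight workers on RandomPartitioner; worker 2 gets the same start
# token as the 2**(127 - k) arithmetic above.
random_bounds = split_token_range(8, 0, 2**127 - 1)

# Three workers on Murmur3Partitioner (signed 64-bit tokens).
murmur3_bounds = split_token_range(3, -2**63, 2**63 - 1)
```

Each (start, end) pair is inclusive at both ends, so it would be substituted into a `token(k) >= ... AND token(k) <= ...` clause. Inclusive upper bounds avoid needing a value one past the largest token for the last worker.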
Ideally, I would like to do this with pycassa, because I prefer its more Pythonic interface.
Is there a better way to do this?