cassandra - Composite columns and the "IN" relation in Cassandra

Question

I have the following column family in Cassandra, which I use to store time-series data in a small number of very "wide" rows:

CREATE TABLE data_bucket (
  day_of_year int,
  minute_of_day int,
  event_id int,
  data ascii,
  PRIMARY KEY (day_of_year, minute_of_day, event_id)
)

In the CQL shell, I can run a query like this:

select * from data_bucket where day_of_year = 266 and minute_of_day = 244 
  and event_id in (4, 7, 11, 1990, 3433)

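Essentially, I fix the value of the first component of the composite column name (minute_of_day) and want to select a non-contiguous set of columns based on distinct values of the second component (event_id). Since the "IN" relation is interpreted as an equality relation, this works fine.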
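To make the "non-contiguous" point concrete, here is a small in-memory model (not Cassandra code) of one wide row. It assumes cells are ordered by the (minute_of_day, event_id) composite, which Python tuple ordering mimics; `cell_names`, `positions` and `is_contiguous` are hypothetical names, and the sample data is made up.

```python
from bisect import bisect_left

# Hypothetical cell names of one wide row (row key day_of_year = 266).
# Each name is a (minute_of_day, event_id) composite, kept in sorted order.
cell_names = sorted([
    (244, 4), (244, 5), (244, 7), (244, 11), (244, 12),
    (244, 1990), (244, 2500), (244, 3433), (245, 1),
])

def positions(names, minute, event_ids):
    """Return the sorted index of each (minute, event_id) cell that exists."""
    result = []
    for event_id in event_ids:
        i = bisect_left(names, (minute, event_id))
        if i < len(names) and names[i] == (minute, event_id):
            result.append(i)
    return result

def is_contiguous(indexes):
    """True if the indexes form one unbroken run, i.e. one slice could return exactly them."""
    return all(b == a + 1 for a, b in zip(indexes, indexes[1:]))

idx = positions(cell_names, 244, [4, 7, 11, 1990, 3433])
print(idx)                 # [0, 2, 3, 5, 7]
print(is_contiguous(idx))  # False
```

The gaps at indexes 1, 4 and 6 are why no single contiguous slice returns exactly the requested columns.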

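Now my question is: how can I do the same kind of composite column slicing programmatically, without CQL? So far I have tried the Python client pycassa and the Java client Astyanax, without any success.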

Any ideas would be welcome.

EDIT:

I'm adding the describe output for the column family as seen through cassandra-cli. Since I'm looking for a Thrift-based solution, perhaps this will help.

ColumnFamily: data_bucket
  Key Validation Class: org.apache.cassandra.db.marshal.Int32Type
  Default column value validator: org.apache.cassandra.db.marshal.AsciiType
  Cells sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.Int32Type)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Populate IO Cache on flush: false
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
    sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
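Read this way, each row key is a day_of_year (Int32Type) and each cell name is a two-component CompositeType. A raw Thrift client therefore has to send column names in CompositeType's serialized form. The sketch below shows that encoding roughly as CompositeType defines it (per component: a 2-byte big-endian length, the value bytes, then one end-of-component byte, 0 for an exact name), assuming exactly the two Int32Type components listed above. `encode_composite` and `column_name` are hypothetical helpers; client libraries such as pycassa normally do this serialization for you.

```python
import struct

def encode_composite(components):
    """Serialize already-encoded components as CompositeType does for an exact
    column name: 2-byte big-endian length, value bytes, end-of-component byte 0."""
    out = b""
    for comp in components:
        out += struct.pack(">H", len(comp)) + comp + struct.pack(">b", 0)
    return out

def column_name(minute_of_day, event_id):
    """Composite name for (Int32Type, Int32Type); Int32Type is a 4-byte big-endian int."""
    return encode_composite([
        struct.pack(">i", minute_of_day),
        struct.pack(">i", event_id),
    ])

print(column_name(244, 4).hex())  # 0004000000f40000040000000400
```

If the real comparator has more components than the describe output shows, the name would need those components too.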

score 1 · Accepted Answer

There is no "IN"-type query in the Thrift API. You could perform a series of get queries, one for each composite column value (day_of_year, minute_of_day, event_id).
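A minimal sketch of that approach, assuming a dict-backed stand-in instead of a real Thrift connection: `STORE`, `get` and `get_in` are hypothetical names, and `get` raising `KeyError` only loosely imitates Thrift's NotFoundException for a missing column. A real client would make one network round trip per event_id.

```python
# Hypothetical stand-in for the cluster: row key -> {(minute_of_day, event_id): value}.
STORE = {
    266: {(244, 4): "a", (244, 5): "b", (244, 7): "c", (244, 11): "d",
          (244, 1990): "e", (244, 3433): "f"},
}

def get(row_key, column_name):
    """Hypothetical single-column get; raises KeyError if the row or column is missing."""
    return STORE[row_key][column_name]

def get_in(row_key, minute_of_day, event_ids):
    """Emulate `event_id IN (...)` with one get per composite column value."""
    results = {}
    for event_id in event_ids:
        try:
            results[event_id] = get(row_key, (minute_of_day, event_id))
        except KeyError:
            pass  # missing columns are simply absent, as with CQL's IN
    return results

print(get_in(266, 244, [4, 7, 11, 1990, 3433, 9999]))
# {4: 'a', 7: 'c', 11: 'd', 1990: 'e', 3433: 'f'}
```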

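If your event_ids were contiguous (and your question says they aren't), you could perform a single get_slice query, passing in the range (e.g., the day_of_year and minute_of_day, plus a range of event_ids). You could also grab a range this way and filter the response programmatically yourself (e.g., grab all data for that day and minute with event IDs between 4 and 3433). That means more data transfer and more client-side processing, so it isn't a great option unless you really are looking for a range.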
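The slice-then-filter idea can be sketched in memory as below, assuming the row is a sorted list of (composite name, value) cells. `ROW`, `NAMES` and `get_slice` are hypothetical stand-ins for what a single get_slice with an inclusive start/finish range would return; the data is made up to show the extra cells that get transferred and then discarded.

```python
from bisect import bisect_left, bisect_right

# Hypothetical wide row for day_of_year = 266: ((minute_of_day, event_id), value),
# sorted by the composite name.
ROW = sorted([
    ((244, 4), "a"), ((244, 5), "b"), ((244, 7), "c"), ((244, 11), "d"),
    ((244, 12), "x"), ((244, 1990), "e"), ((244, 2500), "y"),
    ((244, 3433), "f"), ((245, 1), "z"),
])
NAMES = [name for name, _ in ROW]

def get_slice(start, finish):
    """Return every cell whose name lies in [start, finish], like one contiguous slice."""
    lo = bisect_left(NAMES, start)
    hi = bisect_right(NAMES, finish)
    return ROW[lo:hi]

wanted = {4, 7, 11, 1990, 3433}
sliced = get_slice((244, min(wanted)), (244, max(wanted)))
# Client-side filtering: keep only the event_ids that were actually requested.
filtered = {event_id: value for (_, event_id), value in sliced if event_id in wanted}

print(len(sliced))  # 8
print(filtered)     # {4: 'a', 7: 'c', 11: 'd', 1990: 'e', 3433: 'f'}
```

Here 8 cells come back to return 5, and the waste grows with how sparse the requested event_ids are within the range.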

So if you want to use "IN" in Cassandra, you'll need to switch to a CQL-based solution. If you're considering CQL from Python, another option is cassandra-dbapi2. This worked for me:

import cql

# Replace settings as appropriate
host = 'localhost'
port = 9160
keyspace = 'keyspace_name'

# Connect
connection = cql.connect(host, port, keyspace, cql_version='3.0.1')
cursor = connection.cursor()
print("connected!")

# Execute CQL
cursor.execute("select * from data_bucket where day_of_year = 266 and minute_of_day = 244 and event_id in (4, 7, 11, 1990, 3433)")
for row in cursor:
  print(row)  # Do something with your data

# Shut the connection
cursor.close()
connection.close()

(Tested with Cassandra 2.0.1.)
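If the event ids come from your program rather than being typed in, the query string can be built from a Python list. A minimal sketch: `build_in_query` is a hypothetical helper that inlines the values after coercing them with `int()`, so it does not assume any particular parameter-binding syntax in cassandra-dbapi2.

```python
def build_in_query(day_of_year, minute_of_day, event_ids):
    """Build the CQL used above for an arbitrary list of event ids.
    int() coercion keeps non-numeric text out of the query string."""
    if not event_ids:
        raise ValueError("IN needs at least one event_id")
    id_list = ", ".join(str(int(e)) for e in event_ids)
    return ("select * from data_bucket where day_of_year = %d "
            "and minute_of_day = %d and event_id in (%s)"
            % (int(day_of_year), int(minute_of_day), id_list))

query = build_in_query(266, 244, [4, 7, 11, 1990, 3433])
print(query)
# cursor.execute(query)  # using the cursor from the cassandra-dbapi2 example
```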
