I'm pulling a large amount of data from Cassandra 2.0, but unfortunately I get a timeout exception. My table:

CREATE KEYSPACE StatisticsKeyspace
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

CREATE TABLE StatisticsKeyspace.HourlyStatistics(
KeywordId text,
Date timestamp,
HourOfDay int,
Impressions int,
Clicks int,
AveragePosition double,
ConversionRate double,
AOV double,
AverageCPC double,
Cost double,
Bid double,
PRIMARY KEY(KeywordId, Date, HourOfDay)
);

CREATE INDEX ON StatisticsKeyspace.HourlyStatistics(Date);


SELECT KeywordId, Date, HourOfDay, Impressions, Clicks, AveragePosition, ConversionRate, AOV, AverageCPC, Bid
FROM StatisticsKeyspace.hourlystatistics
WHERE Date >= '2014-03-22' AND Date <= '2014-03-24';

In cassandra.yaml I have set:

read_request_timeout_in_ms: 60000
range_request_timeout_in_ms: 60000
write_request_timeout_in_ms: 40000
cas_contention_timeout_in_ms: 3000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 60000

But it still throws a timeout after about 10 seconds. Any ideas how to fix this?


1 Answer


If you use the DataStax Java driver, paging is enabled by default with a fetch size of 5000 rows. If you still get a timeout, you can try reducing it with

public Statement setFetchSize(int fetchSize)
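A minimal sketch of lowering the fetch size, assuming the DataStax Java driver 2.x API and reusing the query from the question. The contact point `127.0.0.1`, the class name `FetchSizeExample` and the page size of 500 are illustrative assumptions, not values from the question.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class FetchSizeExample {
    // The query from the question (fewer columns for brevity).
    static final String QUERY =
        "SELECT KeywordId, Date, HourOfDay, Impressions, Clicks "
        + "FROM StatisticsKeyspace.HourlyStatistics "
        + "WHERE Date >= '2014-03-22' AND Date <= '2014-03-24'";

    // Builds the statement with a page size below the driver default of 5000.
    static Statement buildQuery(int fetchSize) {
        Statement stmt = new SimpleStatement(QUERY);
        stmt.setFetchSize(fetchSize);
        return stmt;
    }

    public static void main(String[] args) {
        // Assumption: a Cassandra node is reachable on localhost.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect();
            ResultSet rs = session.execute(buildQuery(500));
            long count = 0;
            // Iterating the ResultSet requests the next page from the
            // server only when the current one has been consumed.
            for (Row row : rs) {
                count++;
            }
            System.out.println("rows read: " + count);
        } finally {
            cluster.close();
        }
    }
}
```

Each page is a separate request, so a smaller fetch size means each request does less work before it has to answer within the server-side timeout.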


If you are using the CLI, you may have to try some kind of manual paging:

SELECT KeywordId, Date, HourOfDay, Impressions, Clicks,AveragePosition,ConversionRate,AOV,AverageCPC,Bid 
FROM StatisticsKeyspace.hourlystatistics 
WHERE Date >= '2014-03-22' AND Date <= '2014-03-24' 
LIMIT 100;

SELECT * FROM ....  WHERE token(KeywordId) > token([Last KeywordId received]) AND ...
LIMIT 100;
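The loop behind this manual paging can be sketched in plain Java. `PageFetcher`, `readAll` and `inMemory` are hypothetical names. In real use `fetchAfter` would run the CQL above: the first page without the `token()` restriction, and later pages with `token(KeywordId) > token(lastKey)`. The in-memory stand-in uses natural string order in place of token order. One caveat: every KeywordId partition in HourlyStatistics holds many (Date, HourOfDay) rows. If a page ends in the middle of a partition, the strict `>` comparison skips the rest of that partition. This sketch models one row per key and does not handle that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class TokenPaging {
    // Hypothetical: returns up to `limit` keys strictly after `lastKey`
    // (or the first `limit` keys when lastKey is null).
    interface PageFetcher {
        List<String> fetchAfter(String lastKey, int limit);
    }

    static List<String> readAll(PageFetcher fetcher, int pageSize) {
        List<String> all = new ArrayList<>();
        String lastKey = null; // null = first page, no token() restriction
        while (true) {
            List<String> page = fetcher.fetchAfter(lastKey, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                return all; // a short or empty page means nothing is left
            }
            lastKey = page.get(page.size() - 1); // resume after the last key seen
        }
    }

    // In-memory stand-in for the table; natural order plays the role of token order.
    static PageFetcher inMemory(NavigableSet<String> keys) {
        return (lastKey, limit) -> {
            NavigableSet<String> rest =
                lastKey == null ? keys : keys.tailSet(lastKey, false);
            List<String> page = new ArrayList<>();
            for (String k : rest) {
                if (page.size() == limit) break;
                page.add(k);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>();
        for (int i = 0; i < 250; i++) keys.add(String.format("kw%03d", i));
        System.out.println(readAll(inMemory(keys), 100).size()); // prints 250
    }
}
```

With 250 keys and a page size of 100, the loop issues three requests (100, 100 and 50 rows) and stops on the short page.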

To detect cluster problems, you can try the same select with LIMIT 1. If even that times out, there may be an underlying problem.


If your query is still slow, I would look at your secondary index. The amount of data transferred seems reasonable (only 'small' data types are returned), so if I am right, changing the fetch size will not change much. Do you insert only dates into your "Date" (timestamp) column? If you insert full timestamps instead, the secondary index on this column will be very slow because of its high cardinality. If you insert a date only, the timestamp defaults to the date + "00:00:00" + time zone, which reduces the cardinality and should speed up the lookup (watch out for time zone issues!). To be absolutely sure, try a secondary index on a column of a different data type, such as an int for the date (counting the days since 1970-01-01, for example).
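A small java.time sketch (Java 8+ assumed) of the two options above. The first truncates a timestamp to midnight before inserting it. The second computes a day number for a hypothetical int column (call it `DayNumber`; it is not in the original schema). UTC is an assumption. Use the time zone your dates are meant in, since that choice is exactly the time zone pitfall mentioned above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class DayBucket {
    // Truncates to 00:00:00 UTC, so all rows of one day share one Date value.
    static Instant truncateToDay(Instant ts) {
        return ts.truncatedTo(ChronoUnit.DAYS);
    }

    // Days since 1970-01-01 in UTC, for a hypothetical int "DayNumber" column.
    static int daysSinceEpoch(Instant ts) {
        return (int) ts.atZone(ZoneOffset.UTC).toLocalDate().toEpochDay();
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2014-03-22T15:47:27Z");
        System.out.println(truncateToDay(ts));  // prints 2014-03-22T00:00:00Z
        System.out.println(daysSinceEpoch(ts)); // prints 16151
    }
}
```

A range over three days then becomes `DayNumber IN (16151, 16152, 16153)` or three equality lookups, each hitting a single low-cardinality index entry.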

answered 2014-06-16T15:47:27.203