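I have many different items, and I want to keep track of the number of hits on each item and then query the hit count for each item over a given datetime range, down to every second.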

So I started storing the hits in sorted sets, one sorted set per second (Unix epoch time), for example:

ZINCRBY ItemCount:1346742000 1 item1
ZINCRBY ItemCount:1346742000 1 item2
ZINCRBY ItemCount:1346742001 1 item1
ZINCRBY ItemCount:1346742005 1 item9

Now, to get the total hit count for each item in a given date range:

1. Given a start datetime and end datetime:
   Calculate the range of epochs that fall under that range.

2. Generate the key names for each sorted set from the epoch values, for example:
   ItemCount:1346742001, ItemCount:1346742002, ItemCount:1346742003

3. Use ZUNIONSTORE to aggregate all the values from the different sorted sets:

   ZUNIONSTORE _item_count <numkeys> <key> [<key> ...]

4. To get the final results out:

   ZRANGE _item_count 0 -1 WITHSCORES

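So it kind of works, but I run into a problem when I have a large date range such as 1 month: the number of key names computed in steps 1 and 2 runs into the millions (86,400 epoch values per day). With that many keys, the ZUNIONSTORE command fails with a broken socket. Looping over and generating that many keys also takes a while.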
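To put numbers on steps 1 and 2, here is a minimal Python sketch of the key generation. The helper name `epoch_keys` and the example datetimes are made up for illustration; only the `ItemCount:<epoch>` key format comes from the question.

```python
from datetime import datetime, timezone

def epoch_keys(start, end, prefix="ItemCount"):
    """Yield one key name per second in [start, end], both ends inclusive."""
    first = int(start.timestamp())
    last = int(end.timestamp())
    for epoch in range(first, last + 1):
        yield f"{prefix}:{epoch}"

# Example range of three seconds (timezone-aware, so the epochs are unambiguous).
start = datetime(2012, 9, 4, 7, 0, 0, tzinfo=timezone.utc)
end = datetime(2012, 9, 4, 7, 0, 2, tzinfo=timezone.utc)
keys = list(epoch_keys(start, end))
print(keys)

# Number of per-second keys a 30-day range would need.
seconds_per_day = 24 * 60 * 60
keys_in_30_days = 30 * seconds_per_day
print(keys_in_30_days)
```

A 30-day range needs one key per second, so the key list handed to a single ZUNIONSTORE call grows linearly with the length of the range.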

How can I design this more efficiently in Redis while still keeping the tracking granularity down to seconds rather than minutes or days?

1 Answer

Yes, you should avoid big unions of sorted sets. Here is a nice trick you can use, assuming you know the maximum number of hits an item can get per second.

  1. Keep one sorted set per item, with timestamps as BOTH scores and values (members).
  2. Increment the score by 1/(max_predicted_hits_per_second) on every hit; only the first client to write a given second also adds the timestamp itself. This way the integer part of the score is the timestamp and the part after the decimal point is always hits/max_predicted_hits_per_second, so you can still do range queries by score.
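To make the score layout concrete, here is a small sketch of the encoding and decoding arithmetic on its own, without Redis. The function names and the bound of 1000 hits per second are assumptions for illustration.

```python
MAX_HITS_PER_SECOND = 1000  # assumed upper bound on hits per item per second

def encode_score(timestamp, hits, max_hits=MAX_HITS_PER_SECOND):
    """Pack an integer timestamp and a hit count into a single score."""
    if not 0 <= hits < max_hits:
        raise ValueError("hits must stay below max_hits or the score spills into the next second")
    return timestamp + hits / max_hits

def decode_hits(score, timestamp, max_hits=MAX_HITS_PER_SECOND):
    """Recover the hit count; round() absorbs floating-point error."""
    return round((score - timestamp) * max_hits)

score = encode_score(1346742000, 3)
print(score)
print(decode_hits(score, 1346742000))
```

Sorted set scores are double-precision floats, so with ten-digit timestamps only about six decimal digits remain for the fraction. That is enough for a bound around 1000, but a much larger bound would start losing counts.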

So let's say max_predicted_hits_per_second is 1000. What we do is this (Python example):

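import time
from redis import Redis  # redis-py

redis = Redis()  # assumes a Redis server on localhost:6379
itemId = 'item1'  # example item id

# 1. Make sure only one client adds the actual timestamp for this second,
# by doing SETNX on a temporary key. The key includes the second, so every
# new second gets its own SETNX.
now = int(time.time())
ts_key = 'item_ts:%s:%s' % (itemId, now)
rc = redis.setnx(ts_key, now)

# Just the count part.
val = 1.0 / 1000
if rc:  # we are the first to increment this second
    val += now
    # We won't need the temporary key soon, assuming all clients have the same clock.
    redis.expire(ts_key, 10)

# 2. Increment the count (redis-py >= 3.0 argument order: name, amount, value).
redis.zincrby('item_counts:%s' % itemId, val, now)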

And now querying a range looks something like this:

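# minTime and maxTime are integer Unix timestamps (both inclusive).
counts = redis.zrangebyscore('item_counts:%s' % itemId, minTime, maxTime + 0.999, withscores=True)

total = 0
for value, score in counts:
    # The fractional part of the score is hits/1000; round() absorbs float error.
    count = round((score - int(value)) * 1000)
    total += count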
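If you want to check the logic without a Redis server, here is a rough in-memory sketch. A plain dict stands in for one item's sorted set (member to score), and `record_hit` and `total_hits` are made-up names that mimic the SETNX + ZINCRBY write and the ZRANGEBYSCORE read above. It is not a replacement for the Redis code.

```python
MAX_HITS_PER_SECOND = 1000  # assumed upper bound on hits per item per second

def record_hit(scores, timestamp):
    """Mimic the write path for one item; `scores` maps member -> score."""
    increment = 1 / MAX_HITS_PER_SECOND
    if timestamp not in scores:
        # First hit in this second: also add the timestamp itself.
        increment += timestamp
    scores[timestamp] = scores.get(timestamp, 0.0) + increment

def total_hits(scores, min_time, max_time):
    """Mimic the range query: sum hits for seconds in [min_time, max_time]."""
    total = 0
    for member, score in scores.items():
        if min_time <= score <= max_time + 0.999:
            total += round((score - member) * MAX_HITS_PER_SECOND)
    return total

scores = {}
for ts in [1346742000, 1346742000, 1346742001, 1346742005]:
    record_hit(scores, ts)

print(total_hits(scores, 1346742000, 1346742001))
print(total_hits(scores, 1346742000, 1346742005))
```

A month-long query is then a single range read on one key per item instead of a union over millions of keys.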
answered 2012-06-05T09:36:11.050