1

I should solve a count-distinct problem in Redis without the use of HyperLogLog (because of the 0.81% of known error).

I got different requests with a list of objects [O1, O2, ... On] for a specific Key A. For each list of objects received, Redis should memorize the Objects not still saved and return the number of new objects saved.

For Example:

  • Request 1 : Key: A - Objects: [O1, O2, O3] -> Response 1: Number of new objects : 3
  • Request 2 : Key: A - Objects: [O1, O2, O4] -> Response 2: Number of new objects : 1
  • Request 3 : Key: A - Objects: [O1, O2, O4] -> Response 3: Number of new objects : 0

I have tried to solve this problem with the Hyper Log Log and it's working perfectly but with a growing dataset of objects, the number of new objects saved is not so accurate. With the sets and the hashmap, the memory is growing too much.

I have read some stuff about Bitmaps but is not too clear. Do you have any reference to projects that are already facing this problem?

Thanks in advance

4

1 回答 1

2

您可能需要考虑使用布隆过滤器。这可作为模块https://redis.com/redis-best-practices/bloom-filter-pattern/使用。

布隆过滤器允许快速测试具有 0 个假阴性和非常低的假阳性的成员资格,前提是您提前知道最大元素数是多少。您需要编写以下代码:

result = bf.exists(key, item)
if result == 0:
    bf.add(key, item)
    bf.inc(key_count)
于 2022-01-29T17:03:35.453 回答