redis - Redis - Count distinct problem (without hyper log log)

Question

I should solve a count-distinct problem in Redis without the use of HyperLogLog (because of the 0.81% of known error).

I got different requests with a list of objects [O1, O2, ... On] for a specific Key A. For each list of objects received, Redis should memorize the Objects not still saved and return the number of new objects saved.

For Example:

Request 1 : Key: A - Objects: [O1, O2, O3] -> Response 1: Number of new objects : 3
Request 2 : Key: A - Objects: [O1, O2, O4] -> Response 2: Number of new objects : 1
Request 3 : Key: A - Objects: [O1, O2, O4] -> Response 3: Number of new objects : 0

I have tried to solve this problem with the Hyper Log Log and it's working perfectly but with a growing dataset of objects, the number of new objects saved is not so accurate. With the sets and the hashmap, the memory is growing too much.

I have read some stuff about Bitmaps but is not too clear. Do you have any reference to projects that are already facing this problem?

Thanks in advance

score 2 · Accepted Answer

您可能需要考虑使用布隆过滤器。这可作为模块https://redis.com/redis-best-practices/bloom-filter-pattern/使用。

布隆过滤器允许快速测试具有 0 个假阴性和非常低的假阳性的成员资格，前提是您提前知道最大元素数是多少。您需要编写以下代码：

result = bf.exists(key, item)
if result == 0:
    bf.add(key, item)
    bf.inc(key_count)

redis - Redis - Count distinct problem (without hyper log log)

1 回答 1

Related

Reference