3

Redis recently released their new data structure called the HyperLogLog. It allows us to keep a count of unique objects and only takes up a size of 12k bytes. What I don't understand is that Redis's PFCOUNT command is said to be technically a write command. Why is this the case?

Note: as a side effect of calling this function, it is possible that the HyperLogLog is modified, since the last 8 bytes encode the latest computed cardinality for caching purposes. So PFCOUNT is technically a write command.

4

1 回答 1

4

HyperLogLog 对象的头部如下:

struct hllhdr {
    char magic[4];      /* "HYLL" */
    uint8_t encoding;   /* HLL_DENSE or HLL_SPARSE. */
    uint8_t notused[3]; /* Reserved for future use, must be zero. */
    uint8_t card[8];    /* Cached cardinality, little endian. */
    uint8_t registers[]; /* Data bytes. */
};

注意 card 字段:它包含算法评估的最后一个基数。计算基数的估计是一项昂贵的操作,因此 Redis 会缓存该值并将其保存在该字段中。

当调用 PFADD 时,HyperLogLog 对象可能会更新或不更新(很有可能不会更新)。如果没有更新,调用 PFCOUNT 将重用缓存的值(卡片字段)。如果更新,则卡片字段无效,因此下一个PFCOUNT将执行计数算法,并将新值写入卡片字段。

这就是 PFCOUNT 可以改变 HyperLogLog 对象的原因。

于 2014-04-22T18:05:27.223 回答