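I know how HyperLogLog works, but I want to understand which real-world situations it is actually suited to, i.e. where using HyperLogLog makes sense, and why. If you have used it to solve a real-world problem, please share. What I am looking for is this: given HyperLogLog's standard error, in which practical applications is it actually used today, and why does it work there?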


("Applications for cardinality estimation", too broad? I would like to add this simply as a comment but it won't fit).

I would suggest turning to the extensive academic research on the subject; academic papers usually include some discussion of prior research as well as the applications for which the technique has been used. You could start by following the references of interest cited in the following article:

... This problem has received a great deal of attention over the past two decades, finding an ever growing number of applications in networking and traffic monitoring, such as the detection of worm propagation, of network attacks (e.g., by Denial of Service), and of link-based spam on the web [3]. For instance, a data stream over a network consists of a sequence of packets, each packet having a header, which contains a pair (source–destination) of addresses, followed by a body of specific data; the number of distinct header pairs (the cardinality of the multiset) in various time slices is an important indication for detecting attacks and monitoring traffic, as it records the number of distinct active flows. Indeed, worms and viruses typically propagate by opening a large number of different connections, and though they may well pass unnoticed amongst a huge traffic, their activity becomes exposed once cardinalities are measured (see the lucid exposition by Estan and Varghese in [11]). Other applications of cardinality estimators include data mining of massive data sets of sorts—natural language texts [4, 5], biological data [17, 18], very large structured databases, or the internet graph, where the authors of [22] report computational gains by a factor of 500+ attained by probabilistic cardinality estimators.

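At my work, HyperLogLog is used to estimate the number of unique users or unique devices that hit different code paths in an online service. For example, how many users were affected by each type of service error? How many users use each feature? HyperLogLog lets us answer many interesting questions like these.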
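To make the per-error-type counting concrete, here is a minimal sketch of how it could look. It is not the code of any particular service: the `HyperLogLog` class, the `record_error` helper, and the choice of `p=12` (4096 registers, roughly 1.6% standard error) are all assumptions for illustration, and a production system would more likely rely on an existing implementation.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Illustrative HyperLogLog with 2**p registers (not production code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                       # top p bits select a register
        rest = h & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)       # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Hypothetical usage: one sketch per error type, fed with user IDs.
users_per_error = defaultdict(HyperLogLog)


def record_error(error_type, user_id):
    users_per_error[error_type].add(user_id)


if __name__ == "__main__":
    for i in range(50000):
        record_error("timeout", f"user-{i % 20000}")   # repeats are absorbed
    print(round(users_per_error["timeout"].count()))
```

Each sketch keeps a fixed 4096 small registers no matter how many events it sees, which is why keeping one per error type or per feature stays cheap. An approximate answer is usually acceptable here because the question is about scale ("hundreds or millions of affected users?"), not an exact tally.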

