columnstore - Is it possible to store a HyperLogLog / uniqState() state directly in ClickHouse via an INSERT query?

Question

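We can use the AggregatingMergeTree table engine, which can be used to aggregate rows.

In aggregated data we are usually not interested in storing every unique identifier, yet we still want distinct counts. We also want to be able to aggregate again later and get a unique count across those rows (by grouping rows in a SELECT query). This is where HyperLogLog comes in handy, which ClickHouse exposes through the uniqState function.

I would like to store a HyperLogLog state directly through an INSERT query, supplying it to a ClickHouse table from my client application. Is this possible?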

score 3 · Accepted Answer

I managed to do this using nothing but ClickHouse queries, and it works well:

CREATE TABLE demo_db.aggregates
(
    name String,
    date Date,
    ids AggregateFunction(uniq, UInt8)
) ENGINE = MergeTree(date, date, 8192); -- legacy syntax: (date column, primary key, index granularity)

-- Aggregating a set of ids with uniqState in the INSERT stores the
-- intermediate aggregate state in the ids column, not a final count.
INSERT INTO demo_db.aggregates SELECT
    'Demo',
    toDate('2016-12-03'),
    uniqState(arrayJoin([1, 5, 6, 7]));
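
If the client already holds the ids for a row as an array, one possible variant is to build the state with arrayReduce and the -State combinator instead of arrayJoin. This is a minimal sketch, assuming the same demo_db.aggregates table as above; the date and id values are made up for illustration.

```sql
-- arrayReduce applies an aggregate function to the elements of an array.
-- With the -State combinator it returns the intermediate state rather than
-- the final count, so the result can go into an AggregateFunction column.
-- The explicit CAST keeps the element type aligned with
-- AggregateFunction(uniq, UInt8) in the table definition.
INSERT INTO demo_db.aggregates SELECT
    'Demo',
    toDate('2016-12-04'),
    arrayReduce('uniqState', CAST([1, 5, 9] AS Array(UInt8)));
```

Either way the state is computed by the server from plain ids; the client only has to send the ids themselves.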

-- uniqMerge combines the stored states within each group and returns
-- the unique count over the grouped rows.
SELECT
    name,
    date,
    uniqMerge(ids)
FROM demo_db.aggregates
GROUP BY name, date;
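
The question mentions AggregatingMergeTree, while the table above uses a plain MergeTree. A rough sketch of the same idea with that engine follows. The table name aggregates_amt, the partitioning, and the sorting key are assumptions chosen for illustration, and the newer PARTITION BY / ORDER BY syntax is used instead of the legacy form.

```sql
-- Hypothetical variant of the table using AggregatingMergeTree.
-- Rows with the same sorting key (name, date) are collapsed during
-- background merges, and their aggregate states are merged together.
CREATE TABLE demo_db.aggregates_amt
(
    name String,
    date Date,
    ids AggregateFunction(uniq, UInt8)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (name, date);

-- Two separate inserts for the same key; the id sets overlap on 6 and 7.
INSERT INTO demo_db.aggregates_amt
SELECT 'Demo', toDate('2016-12-03'), uniqState(arrayJoin([1, 5, 6, 7]));

INSERT INTO demo_db.aggregates_amt
SELECT 'Demo', toDate('2016-12-03'), uniqState(arrayJoin([6, 7, 8]));

-- Background merges happen at an unspecified time, so the states are
-- still merged explicitly at read time with uniqMerge and GROUP BY.
SELECT
    name,
    date,
    uniqMerge(ids) AS unique_ids
FROM demo_db.aggregates_amt
GROUP BY name, date;
```

Because the states are merged rather than the counts added, the ids 6 and 7 that appear in both inserts are counted once, so the two rows together cover five distinct ids.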
