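In my Crate.io database I have a table that currently holds 50 million rows and is 16 GB in size. If I try to get the number of entries per day with the following statement, everything works fine (apart from performance, but that should not be the issue for now):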

SELECT
date_format('%Y-%m-%d', date_trunc('day', "days")) AS "Day",
count(*) AS "Count"
FROM "doc"."mytable" 
WHERE
    date_format('%Y-%m-%d', date_trunc('day', "days")) BETWEEN date_format('%Y-%m-%d', date_trunc('day', current_timestamp + -2592000000))
    AND date_format('%Y-%m-%d', date_trunc('day', current_timestamp + -86400000)) 
GROUP BY date_format('%Y-%m-%d', date_trunc('day', "days")) 
ORDER BY date_format('%Y-%m-%d', date_trunc('day', "days")) ASC limit 100;

However, if I try to count the distinct values of another column like this:

SELECT
date_format('%Y-%m-%d', date_trunc('day', "days")) AS "Day",
count(DISTINCT customerid) AS "Count"
FROM "doc"."mytable" 
WHERE
    date_format('%Y-%m-%d', date_trunc('day', "days")) BETWEEN date_format('%Y-%m-%d', date_trunc('day', current_timestamp + -2592000000))
    AND date_format('%Y-%m-%d', date_trunc('day', current_timestamp + -86400000)) 
GROUP BY date_format('%Y-%m-%d', date_trunc('day', "days")) 
ORDER BY date_format('%Y-%m-%d', date_trunc('day', "days")) ASC limit 100;

the statement fails with

SQLActionException[CircuitBreakingException: [query] Data too large, data for [collect: 0] would be larger than limit of [1267571097/1.1gb]]

Does anybody know why COUNT(DISTINCT col) has a problem with too much data but COUNT(*) does not? And how can I fix this?

1 Answer

A plain count is a very lightweight operation, because it only needs a single numeric (long) counter that is incremented for each row. A distinct count, by contrast, has to keep every value seen so far in memory in order to decide whether a value already exists (no increment) or is a new one (increment the counter).
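To make the difference concrete, here is a minimal Python sketch of the two aggregations. It is only an illustration of the idea, not how crate implements them internally; the function names and the "state size" return value are made up for this example.

```python
def count_rows(values):
    """Plain count: the only state is one integer counter."""
    counter = 0
    for _ in values:
        counter += 1
    # State kept in memory: 1 item (the counter), regardless of input size.
    return counter, 1


def count_distinct(values):
    """Distinct count: every value seen so far must be remembered."""
    seen = set()
    for value in values:
        # Adding an already known value leaves the set unchanged (no increment).
        seen.add(value)
    # State kept in memory: one entry per distinct value.
    return len(seen), len(seen)


# Hypothetical data: 10,000 rows spread over 1,000 customer ids.
customer_ids = [i % 1000 for i in range(10_000)]
print(count_rows(customer_ids))      # (10000, 1)
print(count_distinct(customer_ids))  # (1000, 1000)
```

The state of the plain count stays constant, while the state of the distinct count grows with the number of distinct values in each group, which is what pushes the query over the circuit breaker limit.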

To get around the CircuitBreakingException (which, by the way, saves you from a stuck node; without it an OutOfMemory error would be thrown and your node would become unusable), increase the heap for the crate process. How to set the heap size depends on the distribution you use; normally the CRATE_HEAP_SIZE environment variable can be used.

Increasing the heap could also give you better performance for your first GROUP BY query. A good rule of thumb is to give crate 50% of the available memory, so that the other 50% can be used by the OS file system cache (which crate also benefits from).
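As a rough sketch of the 50% rule, the following Python snippet derives a candidate value for CRATE_HEAP_SIZE from the machine's total memory. The helper names are hypothetical, the memory lookup assumes a Linux host, and the "m" (megabytes) suffix is assumed to be accepted in the same way as JVM heap options; check the documentation of your distribution before using the value.

```python
import os


def suggest_heap_size(total_memory_bytes):
    """Return half of the given memory as a megabyte string, e.g. '8192m'."""
    heap_mib = (total_memory_bytes // 2) // (1024 * 1024)
    return f"{heap_mib}m"


def total_physical_memory():
    """Total RAM in bytes (assumption: Linux, where these sysconf names exist)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


# Example with a hypothetical machine that has 16 GiB of RAM.
print(suggest_heap_size(16 * 1024**3))  # 8192m
```

On a Linux node, `suggest_heap_size(total_physical_memory())` would give the value for that machine, which could then be exported as CRATE_HEAP_SIZE before starting crate.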

answered 2017-05-11T06:31:15.433