I have a table with the following columns:
column1 | column2 | timestamp | event_id |
---|---|---|---|
c1v1 | c2v1 | 2021-03-11 00:00:00 | 1 |
c1v2 | c2v2 | 2021-03-11 01:03:00 | 1 |
c1v3 | c2v3 | 2021-03-12 10:00:00 | 2 |
c1v4 | c2v4 | 2021-03-13 20:00:00 | 1 |
c1v5 | c2v5 | 2021-03-13 11:00:00 | 2 |
c1v6 | c2v6 | 2021-03-13 00:00:00 | 3 |
c1v7 | c2v7 | 2021-03-14 00:00:00 | 2 |
I have `start_time = 2021-03-10 05:14:00` and `end_time = 2021-03-15 15:12:19`.
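I'm running an Elasticsearch query on this data that:
- splits the range from start_time to end_time into 1-day buckets
- counts the documents in each bucket (buckets with 0 documents are included too, because of `extended_bounds`)
- for each bucket, counts the number of unique values in the event_id column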
```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "column1": "some_value" } },
        { "term": { "column2": "some_value" } },
        {
          "range": {
            "timestamp": {
              "gte": "<start_time>",
              "lt": "<end_time>"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "timestamp": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1d",
        "extended_bounds": {
          "min": "<start_time>",
          "max": "<end_time>"
        }
      },
      "aggs": {
        "unique_values": {
          "cardinality": {
            "field": "event_id"
          }
        }
      }
    }
  }
}
```
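To pin down what the ES request computes, here is a plain-Python sketch of the same semantics on the sample rows. It keeps rows with start_time <= timestamp < end_time, floors each to its calendar day, and emits one bucket per day from the day of start_time to the day of end_time (the effect of `extended_bounds`), with the row count and a distinct count of event_id. Assumptions: timestamps are naive UTC strings, the column1/column2 term filters are left out because the expected output counts every sample row, and `daily_histogram` is a hypothetical name. One difference worth keeping in mind: ES's `cardinality` aggregation is approximate, while this sketch (like SQL `COUNT(DISTINCT ...)`) is exact, so small discrepancies are possible for buckets with many distinct values.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (column1, column2, timestamp, event_id).
ROWS = [
    ("c1v1", "c2v1", "2021-03-11 00:00:00", 1),
    ("c1v2", "c2v2", "2021-03-11 01:03:00", 1),
    ("c1v3", "c2v3", "2021-03-12 10:00:00", 2),
    ("c1v4", "c2v4", "2021-03-13 20:00:00", 1),
    ("c1v5", "c2v5", "2021-03-13 11:00:00", 2),
    ("c1v6", "c2v6", "2021-03-13 00:00:00", 3),
    ("c1v7", "c2v7", "2021-03-14 00:00:00", 2),
]


def daily_histogram(rows, start_time, end_time):
    """Return (day, doc_count, unique_values) for every day from floor(start) to floor(end)."""
    start = datetime.strptime(start_time, FMT)
    end = datetime.strptime(end_time, FMT)
    doc_count = defaultdict(int)
    event_ids = defaultdict(set)
    for _col1, _col2, ts, event_id in rows:
        t = datetime.strptime(ts, FMT)
        # Same bounds as the range filter: gte start_time, lt end_time.
        if start <= t < end:
            day = t.date()  # 1d bucket key: the timestamp floored to midnight
            doc_count[day] += 1
            event_ids[day].add(event_id)  # exact distinct count (ES approximates)
    # Like extended_bounds: emit every day between the bounds, including empty ones.
    buckets = []
    day = start.date()
    while day <= end.date():
        buckets.append((day.isoformat(), doc_count[day], len(event_ids[day])))
        day += timedelta(days=1)
    return buckets


for row in daily_histogram(ROWS, "2021-03-10 05:14:00", "2021-03-15 15:12:19"):
    print(row)
```

Run as-is, this prints the six rows of the expected output, from `('2021-03-10', 0, 0)` to `('2021-03-15', 0, 0)`.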
I need help writing an equivalent SQL query.
Expected output:
timestamp | doc_count | unique_values |
---|---|---|
2021-03-10 | 0 | 0 |
2021-03-11 | 2 | 1 |
2021-03-12 | 1 | 1 |
2021-03-13 | 3 | 3 |
2021-03-14 | 1 | 1 |
2021-03-15 | 0 | 0 |
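The bucket keys in this output are calendar days, and which day a row falls into depends on the time zone used for flooring. ES `date_histogram` buckets in UTC unless a `time_zone` parameter is given, whereas, depending on the engine and column type, SQL's `DATE(timestamp)` may floor in some other zone. The sketch below assumes the stored timestamps are UTC and uses a hypothetical UTC-5 zone to show that the first sample row moves to a different day; `day_bucket` is an illustrative helper, not part of either system.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts, tz):
    """Return the calendar day of a UTC timestamp string as seen in time zone tz."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return t.astimezone(tz).date().isoformat()


UTC = timezone.utc
UTC_MINUS_5 = timezone(timedelta(hours=-5))  # hypothetical session time zone

# First sample row: midnight UTC on 2021-03-11.
print(day_bucket("2021-03-11 00:00:00", UTC))          # 2021-03-11
print(day_bucket("2021-03-11 00:00:00", UTC_MINUS_5))  # 2021-03-10
```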
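Update: I came up with the query below. The values it returns are close to the ones from ES, but not exact, and it also doesn't return the dates that have 0 documents.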
```sql
SELECT
    date_floor,
    count(date_floor) AS cnt_date_floor,         -- number of rows per day
    count(DISTINCT event_id) AS cnt_dst_event_id -- exact distinct event_ids per day
FROM (
    SELECT
        event_id,
        DATE(timestamp) AS date_floor            -- floor each timestamp to its day
    FROM
        <table_name>
    WHERE
        date BETWEEN date'<start_date>' AND date'<end_date>' AND
        timestamp >= timestamp'<start_time>' AND
        timestamp < timestamp'<end_time>' AND
        column1 IN ('some val') AND
        column2 = 'some_val'                     -- string literal must be quoted
) AS t
GROUP BY date_floor
ORDER BY date_floor
```
where `start_date` and `end_date` are `start_time` and `end_time` floored to the date.
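A minimal sketch of the usual fix for the missing zero-count days: generate one row per day from the floor of start_time to the floor of end_time, then LEFT JOIN the filtered rows onto that calendar and count only matched rows. It runs in SQLite through Python's `sqlite3` module so that it is self-contained. The table name `events`, the recursive-CTE calendar (SQLite syntax, not necessarily your engine's), and dropping the column1/column2 filters (no two sample rows share a value) are all assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (column1, column2, timestamp, event_id).
ROWS = [
    ("c1v1", "c2v1", "2021-03-11 00:00:00", 1),
    ("c1v2", "c2v2", "2021-03-11 01:03:00", 1),
    ("c1v3", "c2v3", "2021-03-12 10:00:00", 2),
    ("c1v4", "c2v4", "2021-03-13 20:00:00", 1),
    ("c1v5", "c2v5", "2021-03-13 11:00:00", 2),
    ("c1v6", "c2v6", "2021-03-13 00:00:00", 3),
    ("c1v7", "c2v7", "2021-03-14 00:00:00", 2),
]

# Hypothetical table `events`; timestamps stored as 'YYYY-MM-DD HH:MM:SS' text.
QUERY = """
WITH RECURSIVE days(day) AS (
    -- calendar: one row per day from floor(start_time) to floor(end_time)
    SELECT date(:start_time)
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < date(:end_time)
),
filtered AS (
    -- same time bounds as the original query (column1/column2 filters omitted)
    SELECT date(timestamp) AS day, event_id
    FROM events
    WHERE timestamp >= :start_time AND timestamp < :end_time
)
SELECT d.day,
       COUNT(f.day) AS doc_count,                 -- NULL-padded rows count as 0
       COUNT(DISTINCT f.event_id) AS unique_values
FROM days AS d
LEFT JOIN filtered AS f ON f.day = d.day
GROUP BY d.day
ORDER BY d.day
"""


def daily_counts(rows, start_time, end_time):
    """Load rows into an in-memory table and return (day, doc_count, unique_values)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (column1 TEXT, column2 TEXT, timestamp TEXT, event_id INTEGER)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        params = {"start_time": start_time, "end_time": end_time}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


for row in daily_counts(ROWS, "2021-03-10 05:14:00", "2021-03-15 15:12:19"):
    print(row)
```

Counting `f.day` rather than `*` is what yields 0 for empty days, because `COUNT(*)` would also count the single NULL-padded row that the LEFT JOIN produces for each such day. In Presto/Trino/Athena, which the `date'...'` literals suggest, the calendar can likely be built with `UNNEST(sequence(date '<start_date>', date '<end_date>', INTERVAL '1' DAY))` instead of a recursive CTE; check your engine's documentation before relying on it.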