当然,没有从 SQL 到 Elasticsearch DSL 的直接路径,但有一些非常常见的相关性。
对于初学者来说,任何GROUP BY
/HAVING
都将归结为聚合。普通查询语义通常可以被 Query DSL 覆盖(甚至更多)。
如何在聚合之前准备一个子数据(如本例中每个水果范围内的最新行)
所以,你有点要求两种不同的东西。
如何在聚合之前准备一个子数据
这是查询阶段。
(就像在这个例子中,每个水果范围内的最新行)
从技术上讲,您要求它聚合以获得此示例的答案:不是普通查询。在您的示例中,您正在执行MAX
此操作,这实际上是使用 GROUP BY 来获取它。
如何按多个字段对结果进行分组
这取决于。您希望它们分层(通常是)还是希望它们在一起。
如果您希望它们分层,那么您只需使用子聚合来获得您想要的。如果您希望将它们组合在一起,那么您通常只需filters
对不同的分组使用聚合。
将它们重新组合在一起:在给定特定过滤日期范围的情况下,您希望每个水果最近一次购买。日期范围只是普通的查询/过滤器:
{
"query": {
"bool": {
"filter": [
{
"range": {
"BoughtDate": {
"gte": "2016-01-01",
"lte": "2016-01-31"
}
}
},
{
"range": {
"BestBeforeDate": {
"gte": "2016-01-01",
"lte": "2016-01-31"
}
}
}
]
}
}
}
这样,请求中不会包含不在两个字段的日期范围内的文档(实际上是AND
)。因为我使用了过滤器,所以它是不计分且可缓存的。
现在,您需要开始聚合以获取其余信息。让我们首先假设文档已使用上述过滤器进行过滤,以简化我们正在查看的内容。我们将在最后结合它。
{
"size": 0,
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "BoughtDate",
"interval": "day",
"min_doc_count": 1
},
"aggs": {
"group_by_store": {
"terms": {
"field": "BoughtInStore"
},
"aggs": {
"group_by_person": {
"terms": {
"field": "BiteBy"
}
}
}
}
}
}
}
}
你想要"size" : 0
在顶级,因为你实际上并不关心命中。您只需要汇总结果。
您的第一个聚合实际上是按最近日期分组的。我对其进行了一些更改以使其更加逼真(每天),但实际上是相同的。您使用 的方式MAX
,我们可以使用与 的terms
聚合"size": 1
,但这更符合您在涉及日期(可能是时间!)时想要的方式。我还要求它忽略匹配文档中没有数据的日子(因为它从头到尾都是如此,我们实际上并不关心那些日子)。
如果您真的只想要最后一天,那么您可以使用管道聚合来删除除最大存储桶之外的所有内容,但此类请求的实际使用需要完整的日期范围。
因此,我们然后继续按商店分组,这就是您想要的。BiteBy
然后,我们按人 ( )进行分组。这会给你隐含的计数。
将它们重新组合在一起:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"range": {
"BoughtDate": {
"gte": "2016-01-01",
"lte": "2016-01-31"
}
}
},
{
"range": {
"BestBeforeDate": {
"gte": "2016-01-01",
"lte": "2016-01-31"
}
}
}
]
}
},
"aggs": {
"group_by_date": {
"date_histogram": {
"field": "BoughtDate",
"interval": "day",
"min_doc_count": 1
},
"aggs": {
"group_by_store": {
"terms": {
"field": "BoughtInStore"
},
"aggs": {
"group_by_person": {
"terms": {
"field": "BiteBy"
}
}
}
}
}
}
}
}
注意:这是我索引数据的方式。
PUT /grocery/store/_bulk
{"index":{"_id":"1"}}
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-01","BestBeforeDate":"2016-01-02","BiteBy":"John"}
{"index":{"_id":"2"}}
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-02","BestBeforeDate":"2016-01-04","BiteBy":"Mat"}
{"index":{"_id":"3"}}
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-03","BestBeforeDate":"2016-01-05","BiteBy":"Mark"}
{"index":{"_id":"4"}}
{"Fruit":"Banana","BoughtInStore":"Jungle","BoughtDate":"2016-01-04","BestBeforeDate":"2016-02-01","BiteBy":"Simon"}
{"index":{"_id":"5"}}
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-12","BestBeforeDate":"2016-01-12","BiteBy":"John"}
{"index":{"_id":"6"}}
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-14","BestBeforeDate":"2016-01-16","BiteBy":"Mark"}
{"index":{"_id":"7"}}
{"Fruit":"Orange","BoughtInStore":"Jungle","BoughtDate":"2016-01-20","BestBeforeDate":"2016-01-21","BiteBy":"Simon"}
{"index":{"_id":"8"}}
{"Fruit":"Kiwi","BoughtInStore":"Shop","BoughtDate":"2016-01-21","BestBeforeDate":"2016-01-21","BiteBy":"Mark"}
{"index":{"_id":"9"}}
{"Fruit":"Kiwi","BoughtInStore":"Jungle","BoughtDate":"2016-01-21","BestBeforeDate":"2016-01-21","BiteBy":"Simon"}
您要在(商店和人员)上聚合的字符串值是s(在 ES 5.0 中) ,这一点至关重要!否则它将使用所谓的 fielddata,这不是一件好事。not_analyzed
string
keyword
映射在 ES 1.x / ES 2.x 中看起来像这样:
PUT /grocery
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"store": {
"properties": {
"Fruit": {
"type": "string",
"index": "not_analyzed"
},
"BoughtInStore": {
"type": "string",
"index": "not_analyzed"
},
"BiteBy": {
"type": "string",
"index": "not_analyzed"
},
"BestBeforeDate": {
"type": "date"
},
"BoughtDate": {
"type": "date"
}
}
}
}
}
所有这些加在一起,你得到的答案是:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_date": {
"buckets": [
{
"key_as_string": "2016-01-01T00:00:00.000Z",
"key": 1451606400000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-02T00:00:00.000Z",
"key": 1451692800000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Mat",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-03T00:00:00.000Z",
"key": 1451779200000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Mark",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-12T00:00:00.000Z",
"key": 1452556800000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-14T00:00:00.000Z",
"key": 1452729600000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Mark",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-20T00:00:00.000Z",
"key": 1453248000000,
"doc_count": 1,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Simon",
"doc_count": 1
}
]
}
}
]
}
},
{
"key_as_string": "2016-01-21T00:00:00.000Z",
"key": 1453334400000,
"doc_count": 2,
"group_by_store": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jungle",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Simon",
"doc_count": 1
}
]
}
},
{
"key": "Shop",
"doc_count": 1,
"group_by_person": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Mark",
"doc_count": 1
}
]
}
}
]
}
}
]
}
}
}