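OK, here it goes.
I still don't know whether Elasticsearch allows an aggregation to be formulated the way I described in the original question.
What I did to solve the problem was take a different approach. I'm posting it here in case it helps someone else.
So,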
POST hostname:9200/index/type/_search
with the body
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "group": {
      "terms": {
        "field": "type"
      },
      "aggs": {
        "histogramAgg": {
          "histogram": {
            "field": "value",
            "interval": 10,
            "offset": 0,
            "order": {
              "_key": "asc"
            },
            "keyed": true,
            "min_doc_count": 0
          },
          "aggs": {
            "statsAgg": {
              "stats": {
                "field": "value"
              }
            }
          }
        },
        "extStatsAgg": {
          "extended_stats": {
            "field": "value",
            "sigma": 2
          }
        }
      }
    }
  }
}
produces a result of this shape:
{
  "took": 100,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 100000,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "group": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [{
        "key": "A",
        "doc_count": 10000,
        "histogramAgg": {
          "buckets": {
            "0.0": {
              "key": 0.0,
              "doc_count": 1234,
              "statsAgg": {
                "count": 1234,
                "min": 0.0,
                "max": 9.0,
                "avg": 0.004974220783280196,
                "sum": 7559.0
              }
            },
            "10.0": {
              "key": 10.0,
              "doc_count": 4567,
              "statsAgg": {
                "count": 4567,
                "min": 10.0,
                "max": 19.0,
                "avg": 15.544345993923,
                "sum": 331846.0
              }
            },
            [...]
          }
        },
        "extStatsAgg": {
          "count": 10000,
          "min": 0.0,
          "max": 104.0,
          "avg": 16.855123857,
          "sum": 399079395E10,
          "sum_of_squares": 3.734838645273888E15,
          "variance": 1.2690056384124432E9,
          "std_deviation": 35.10540102369,
          "std_deviation_bounds": {
            "upper": 87.06592590438,
            "lower": -54.35567819038
          }
        }
      },
      [...]
      ]
    }
  }
}
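If you look at the group aggregation result for type "A", you will see that we now have the average and the count for each histogram bucket. You will also see that the extStatsAgg aggregation (a sibling of the histogram aggregation) reports the std_deviation_bounds for each group bucket (type "A", type "B", ...).
As you may have noticed, this does not directly give the result I was looking for. I still need to do some calculation in my own code. An example in pseudocode: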
for bucket in buckets_groupAggregation
    Long totalCount = 0
    Double accumWeightedAverage = 0.0
    ExtendedStats extendedStats = bucket.extendedStatsAggregation
    Double upperLimit = extendedStats.std_deviation_bounds.upper
    Double lowerLimit = extendedStats.std_deviation_bounds.lower
    Histogram histogram = bucket.histogramAggregation
    for group in histogram
        Stats stats = group.statsAggregation
        // keep only the histogram buckets whose key lies inside the sigma bounds
        if group.key > lowerLimit && group.key < upperLimit
            totalCount += stats.count
            accumWeightedAverage += stats.count * stats.average
    // one average per group bucket; undefined when no histogram bucket was kept
    Double average = totalCount > 0 ? accumWeightedAverage / totalCount : null
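For anyone who wants something runnable, here is a minimal Python sketch of the same post-processing. It assumes the response has already been parsed into a dict (for example with json.loads) and that the aggregation names match the request above (group, histogramAgg, statsAgg, extStatsAgg). The function names and the SAMPLE_RESPONSE numbers are made up for illustration; they are not real Elasticsearch output.

```python
from typing import Dict, Optional

# Hypothetical, hand-made response fragment (not real Elasticsearch output).
SAMPLE_RESPONSE = {
    "aggregations": {"group": {"buckets": [
        {
            "key": "A",
            "histogramAgg": {"buckets": {
                "0.0": {"key": 0.0, "statsAgg": {"count": 2, "avg": 4.0}},
                "10.0": {"key": 10.0, "statsAgg": {"count": 3, "avg": 12.0}},
                # empty bucket, present because of "min_doc_count": 0
                "20.0": {"key": 20.0, "statsAgg": {"count": 0, "avg": None}},
                "90.0": {"key": 90.0, "statsAgg": {"count": 1, "avg": 95.0}},
            }},
            "extStatsAgg": {"std_deviation_bounds": {"upper": 60.0, "lower": -20.0}},
        },
        {
            # a group with a single document: both bounds collapse onto the value
            "key": "B",
            "histogramAgg": {"buckets": {
                "10.0": {"key": 10.0, "statsAgg": {"count": 1, "avg": 10.0}},
            }},
            "extStatsAgg": {"std_deviation_bounds": {"upper": 10.0, "lower": 10.0}},
        },
    ]}}
}


def trimmed_average(group_bucket: dict) -> Optional[float]:
    """Count-weighted average of the histogram buckets whose key is inside the bounds."""
    bounds = group_bucket["extStatsAgg"]["std_deviation_bounds"]
    lower, upper = bounds["lower"], bounds["upper"]
    if lower is None or upper is None:
        # defensive guard (assumption): bounds may be missing for an empty group
        return None
    total_count = 0
    weighted_sum = 0.0
    # with "keyed": true, histogramAgg.buckets is an object keyed by bucket key
    for hist_bucket in group_bucket["histogramAgg"]["buckets"].values():
        stats = hist_bucket["statsAgg"]
        # skip empty buckets: their avg is null
        if stats["count"] > 0 and lower < hist_bucket["key"] < upper:
            total_count += stats["count"]
            weighted_sum += stats["count"] * stats["avg"]
    return weighted_sum / total_count if total_count else None


def trimmed_averages(response: dict) -> Dict[str, Optional[float]]:
    """Map each terms bucket key (type "A", type "B", ...) to its trimmed average."""
    groups = response["aggregations"]["group"]["buckets"]
    return {group["key"]: trimmed_average(group) for group in groups}


if __name__ == "__main__":
    print(trimmed_averages(SAMPLE_RESPONSE))
```

In the sample, group "A" keeps the buckets keyed 0.0 and 10.0 and drops the one keyed 90.0, while group "B" keeps nothing, so its average is reported as None rather than dividing by zero.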
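Note: the size of the histogram interval determines the "accuracy" of the final average, because each histogram bucket is kept or dropped as a whole based on its key. A finer interval gives a more accurate result at the cost of a longer aggregation time.
I hope it helps someone.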
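To make the interval effect concrete, here is a small self-contained Python sketch that mimics the bucketing locally, without Elasticsearch. It assumes population standard deviation for the bounds and floor-based bucket keys with offset 0; the function names and the sample values are invented for the illustration.

```python
import math
from collections import defaultdict
from statistics import fmean, pstdev


def sigma_bounds(values, sigma=2.0):
    """Return (lower, upper) = mean -/+ sigma * population standard deviation."""
    mean = fmean(values)
    spread = sigma * pstdev(values)
    return mean - spread, mean + spread


def exact_trimmed_mean(values, sigma=2.0):
    """Mean of the individual values that lie strictly inside the bounds."""
    lower, upper = sigma_bounds(values, sigma)
    kept = [v for v in values if lower < v < upper]
    return fmean(kept) if kept else None


def bucketed_trimmed_mean(values, interval, sigma=2.0):
    """Same idea, but whole buckets are kept or dropped based on their key."""
    lower, upper = sigma_bounds(values, sigma)
    buckets = defaultdict(list)
    for v in values:
        # bucket key = lower edge of the bucket (offset 0 assumed)
        buckets[math.floor(v / interval) * interval].append(v)
    total_count = 0
    weighted_sum = 0.0
    for key, members in buckets.items():
        if lower < key < upper:
            total_count += len(members)
            weighted_sum += len(members) * fmean(members)
    return weighted_sum / total_count if total_count else None


if __name__ == "__main__":
    sample = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 38]  # made-up data, 38 is the outlier
    print("exact:", exact_trimmed_mean(sample))
    for interval in (10, 5, 1):
        print("interval", interval, "->", bucketed_trimmed_mean(sample, interval))
```

With this sample the upper bound is about 31.6. At interval 10 the outlier 38 sits in the bucket keyed 30, which is below the bound, so it is kept and the average comes out near 11.64 instead of 9.0. At interval 5 or 1 its bucket key is above the bound, so it is dropped and the bucketed result matches the exact one.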