filter - Elasticsearch range filter and the inverted index

Question

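I have ten billion documents. One field of each document is a timestamp (with millisecond precision), and it is indexed with the following mapping.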

  timestamp:
    type: "date"
    format: "YYYY-MM-dd HH:mm:ss||YYYY-MM-dd HH:mm:ss.SSS"
    ignore_malformed: true
    doc_values: true

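When searching, I use a range filter. With doc_values enabled, the range filter internally uses the inverted index to do the search, and it is rather slow. From the documentation: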

The execution option controls how the range filter internally executes. 
The execution option accepts the following values:
index: Uses the field’s inverted index in order to determine whether documents fall within the specified range.

Now suppose I change the mapping so that the field stores only the day, without hours, seconds, or milliseconds.

  day:
    type: "date"
    format: "YYYY-MM-dd"
    ignore_malformed: true
    doc_values: true

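With this mapping, the same range filter is faster.

Can someone help explain why the performance differs?

In the first case (second/millisecond precision), the inverted index, which I assume is internally some kind of hash table, has a huge number of keys. In the second case (day precision only), the inverted index has far fewer keys. Is that the reason?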

score 1 · Accepted Answer

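Your assumption is correct. When the time part of the date is not indexed, the number of unique values is much smaller. For a range query, Elasticsearch then has to "loop" over fewer posting lists, which is why you observe the performance improvement.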
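
To make the effect concrete, here is a minimal sketch of a toy inverted index in Python. It is only an illustration under simplifying assumptions: `build_index` and `range_search` are hypothetical helpers, the index is a plain sorted list of terms with one posting list per term, and it does not model how Lucene actually encodes or traverses date terms. It only counts how many posting lists a range scan has to visit at each precision.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date, datetime, timedelta


def build_index(values):
    """Toy inverted index: sorted distinct terms plus term -> list of doc ids."""
    postings = defaultdict(list)
    for doc_id, value in enumerate(values):
        postings[value].append(doc_id)
    return sorted(postings), postings


def range_search(terms, postings, low, high):
    """Return (matching doc ids, number of posting lists visited) for low <= term <= high."""
    start = bisect_left(terms, low)
    end = bisect_right(terms, high)
    matched = []
    for term in terms[start:end]:
        matched.extend(postings[term])
    return matched, end - start


# Assumed sample data: 100,000 documents, one every 7.001 seconds (about 8 days).
base = datetime(2015, 1, 1)
timestamps = [base + timedelta(milliseconds=7001 * i) for i in range(100_000)]

# Index the same documents at millisecond precision and at day precision.
ms_terms, ms_postings = build_index(timestamps)
day_terms, day_postings = build_index([ts.date() for ts in timestamps])

# The same three-day range, expressed at each precision.
ms_hits, ms_lists = range_search(
    ms_terms, ms_postings,
    datetime(2015, 1, 2), datetime(2015, 1, 4, 23, 59, 59, 999000),
)
day_hits, day_lists = range_search(
    day_terms, day_postings, date(2015, 1, 2), date(2015, 1, 4)
)

print("posting lists visited (millisecond terms):", ms_lists)
print("posting lists visited (day terms):", day_lists)
print("same documents matched:", sorted(ms_hits) == sorted(day_hits))
```

Both searches match the same documents, but the millisecond index has to visit one posting list per distinct timestamp in the range, while the day index visits one per day.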
