
We have indices partitioned by year, for example:

items-2019
items-2020

Consider the following data, with both indices behind a single alias:

POST items-2019/_doc
{
  "@timestamp": "2019-01-01"
}

POST items-2020/_doc
{
  "@timestamp": "2020-01-01"
}


POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "items-*",
        "alias": "items"
      }
    }
  ]
}

Now, when I query the data and explicitly sort the results, the items-2020 shard is skipped:

GET items/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "2020-01-01"
      }
    }
  },
  "sort": {
    "@timestamp": "desc"
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 1,    <--- skipped
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "items-2019",
        "_type" : "_doc",
        "_id" : "BTdSb3UBRFH0Yqe1vm_W",
        "_score" : null,
        "_source" : {
          "@timestamp" : "2019-01-01"
        },
        "sort" : [
          1546300800000
        ]
      }
    ]
  }
}

However, when I do not explicitly sort the results, no shard is skipped, but ES runs a MatchNoDocsQuery on the items-2020 shard:

GET items/_search
{
  "profile": "true",
  "query": {
    "range": {
      "@timestamp": {
        "lt": "2020-01-01"
      }
    }
  }
}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,    <--- nothing skipped
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "items-2019",
        "_type" : "_doc",
        "_id" : "BTdSb3UBRFH0Yqe1vm_W",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2019-01-01"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[Axyv60mYQEGAREa2TwbgMQ][items-2019][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "ConstantScoreQuery",
                "description" : "ConstantScore(DocValuesFieldExistsQuery [field=@timestamp])",
                "time_in_nanos" : 69525,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 3766,
                  "match" : 0,
                  "next_doc_count" : 1,
                  "score_count" : 1,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 4123,
                  "advance_count" : 1,
                  "score" : 1123,
                  "build_scorer_count" : 2,
                  "create_weight" : 29745,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 30768
                },
                "children" : [
                  {
                    "type" : "DocValuesFieldExistsQuery",
                    "description" : "DocValuesFieldExistsQuery [field=@timestamp]",
                    "time_in_nanos" : 18317,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 1474,
                      "match" : 0,
                      "next_doc_count" : 1,
                      "score_count" : 0,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 1541,
                      "advance_count" : 1,
                      "score" : 0,
                      "build_scorer_count" : 2,
                      "create_weight" : 1184,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 14118
                    }
                  }
                ]
              }
            ],
            "rewrite_time" : 4660,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 22374
              }
            ]
          }
        ],
        "aggregations" : [ ]
      },
      {
        "id" : "[Axyv60mYQEGAREa2TwbgMQ][items-2020][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "MatchNoDocsQuery",
                "description" : """MatchNoDocsQuery("User requested "match_none" query.")""", <-- here
                "time_in_nanos" : 4166,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 1,
                  "create_weight" : 1791,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 2375
                }
              }
            ],
            "rewrite_time" : 4353,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 12887
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

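So there are a few questions here:

  • Does "skipped" really mean the shard was skipped?
  • How does a skipped shard differ from a MatchNoDocsQuery?
  • What is the cost of a MatchNoDocsQuery?
  • How does sorting allow shards to be skipped?
  • If we sort the results, do we really skip shards entirely and not touch them at all during the search?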

1 Answer


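That is a lot of questions bundled together, but here is my attempt:

Does "skipped" really mean the shard was skipped?

How does sorting allow shards to be skipped?

If we sort the results, do we really skip shards entirely and not touch them at all during the search?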

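Yes. ES tries to be smart enough to work out which shards to hit before actually sending the query to them. The _search_shards API helps here, but it is not the whole story, as the explanation in this issue shows.

If you search the issues for the keywords can_match, skip, and shard, you will find that many other optimizations have been implemented here and there, all aimed at making ES execution planning smarter and faster.

If you want to see how this is coded, you can start from the SearchService.canMatch() method. That is where the service decides whether the query can be rewritten to a MatchNoDocsQuery. If you add a suggest or a global aggregation (which has to visit all documents anyway), you will see that the shard is no longer skipped, even with the sort in place.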
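
The decision described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions, not Elasticsearch's actual code: the `ShardRange` class and both functions are hypothetical names, each shard is reduced to the minimum and maximum @timestamp it holds, and only a `lt` range query is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRange:
    """Smallest and largest @timestamp (epoch millis) stored in one shard."""
    index: str
    min_ts: int
    max_ts: int


def range_lt_may_match(shard: ShardRange, upper_exclusive: int) -> bool:
    # A `lt` range can only match if the shard holds a value below the bound.
    return shard.min_ts < upper_exclusive


def can_match(shard: ShardRange, upper_exclusive: int,
              has_global_agg: bool = False, has_suggest: bool = False) -> bool:
    # Assumption of this sketch: global aggregations and suggesters must see
    # every shard, so they disable the shortcut regardless of the query.
    if has_global_agg or has_suggest:
        return True
    return range_lt_may_match(shard, upper_exclusive)


JAN_1_2019 = 1546300800000  # sort value shown in the question's response
JAN_1_2020 = 1577836800000  # 2020-01-01T00:00:00Z in epoch millis

shards = [
    ShardRange("items-2019", JAN_1_2019, JAN_1_2019),
    ShardRange("items-2020", JAN_1_2020, JAN_1_2020),
]

for shard in shards:
    # Prints "items-2019 True" and "items-2020 False"
    print(shard.index, can_match(shard, upper_exclusive=JAN_1_2020))
```

In this model the items-2020 shard cannot match `lt 2020-01-01`, so it is a candidate for skipping, while passing `has_global_agg=True` or `has_suggest=True` forces it to be visited again.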

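What is the cost of a MatchNoDocsQuery?

I would not worry about it: it is not only negligible, it is also out of your hands.

How does sorting allow shards to be skipped?

As explained in issue #51852, which I linked above: "This change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard." In other words, ES is smart enough to know from the sort values which shards can or cannot contain competitive hits. In your case, since the sort on the timestamp excludes all values from 2020, ES knows that the shards of the 2020 index can be ruled out because none of their documents will match.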
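
The quoted sentence can be illustrated with a toy model of a descending sort. This is a sketch, not how Elasticsearch is implemented: `search_desc` is a hypothetical function, shards are plain lists of numbers, and `max(values)` stands in for the per-shard maximum that a real engine would read from index statistics rather than by scanning.

```python
import heapq


def search_desc(shards, size):
    """Collect the top `size` values in descending order, shard by shard.

    `shards` is a list of (name, values) pairs.
    Returns (top_values, skipped_shard_names).
    """
    top = []       # min-heap, so top[0] is the current bottom sort value
    skipped = []
    for name, values in shards:
        # Skip when the top list is full and its bottom value already beats
        # every value this shard could contribute.
        if len(top) == size and values and top[0] > max(values):
            skipped.append(name)
            continue
        for value in values:
            if len(top) < size:
                heapq.heappush(top, value)
            elif value > top[0]:
                heapq.heapreplace(top, value)
    return sorted(top, reverse=True), skipped


result = search_desc([("shard-a", [5, 4, 3]), ("shard-b", [2, 1])], size=2)
print(result)  # ([5, 4], ['shard-b'])
```

Once shard-a has filled the top two slots, the bottom sort value is 4, which is better than anything in shard-b, so shard-b never needs to be searched.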

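Another possibility is to leverage index sorting, so that terms are sorted at indexing time. The terms are sorted within each segment of the index, but every time segments are merged, the newly merged set of terms has to be sorted again, so this may have a performance impact. Test before use!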
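
For reference, index sorting is configured through the `index.sort.*` settings and has to be defined when the index is created. The sketch below only builds and prints such a request body; the index name `items-2021` is hypothetical, and the choice of a descending sort on @timestamp is an assumption made to mirror the query in the question.

```python
import json

# Hypothetical new yearly index; index sorting must be set at creation time.
INDEX_NAME = "items-2021"

create_index_body = {
    "settings": {
        "index": {
            "sort.field": "@timestamp",  # field the segments are sorted by
            "sort.order": "desc",        # newest documents first
        }
    },
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
        }
    },
}

# Equivalent console request: PUT items-2021 with this JSON body.
print(f"PUT {INDEX_NAME}")
print(json.dumps(create_index_body, indent=2))
```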

Answered 2020-11-05T08:41:45.510