elasticsearch - Elasticsearch deeper-level parent-child relationships (grandchildren)

Question

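I need to index 3 (or more) levels of parent-child relationships. For example, the levels could be an author, a book, and the characters in that book.

However, when indexing more than two levels, has_child and has_parent queries and filters run into problems. If I have 5 shards, I get about one fifth of the results when running a "has_parent" query on the lowest level (characters) or a has_child query on the second level (books).

My guess is that a book is indexed to a shard based on its parent id, and so lives together with its parent (the author), but a character is indexed to a shard based on the hash of the book id, which does not necessarily match the shard the book was actually indexed on.

This means that the characters of books by the same author do not all necessarily live in the same shard (which really cripples the whole parent-child advantage).

Am I doing something wrong? How can I resolve this, since I really need complex queries such as "which authors wrote books with female characters"?

I made a gist showing the problem, at: https://gist.github.com/eranid/5299628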

Bottom line: if I have a mapping:

"author" : {
  "properties" : {
    "name" : {
      "type" : "string"
    }
  }
},
"book" : {
  "_parent" : {
    "type" : "author"
  },
  "properties" : {
    "title" : {
      "type" : "string"
    }
  }
},
"character" : {
  "_parent" : {
    "type" : "book"
  },
  "properties" : {
    "name" : {
      "type" : "string"
    }
  }
}

and a 5-shard index, I can't get complete results from "has_child" and "has_parent" queries.

The query:

curl -XPOST 'http://localhost:9200/index1/character/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "has_parent": {
            "parent_type": "book",
            "query": {
              "match_all": {}
            }
          }
        }
      ]
    }
  }
}'

returns only about one fifth of the characters.

score 26 · Accepted Answer

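You are correct: the parent/child relationship can only work when all children of a given parent reside in the same shard as the parent. Elasticsearch achieves this by using the parent id as the routing value. It works great on one level. However, it breaks on the second and subsequent levels. When you have a parent/child/grandchild relationship, parents are routed based on their own id, children are routed based on the parent id (which works), but grandchildren are routed based on the child id and can end up in a different shard than their parent and grandparent. To demonstrate this with an example, let's assume we are indexing 3 documents: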

curl -XPUT 'localhost:9200/test-idx/author/Douglas-Adams' -d '{...}'
curl -XPUT 'localhost:9200/test-idx/book/Mostly-Harmless?parent=Douglas-Adams' -d '{...}'
curl -XPUT 'localhost:9200/test-idx/character/Arthur-Dent?parent=Mostly-Harmless' -d '{...}'

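Elasticsearch uses the value Douglas-Adams to calculate routing for the document Douglas-Adams, no surprise here. For the document Mostly-Harmless, Elasticsearch sees that it has the parent Douglas-Adams, so it again uses Douglas-Adams to calculate routing, and everything is good: the same routing value means the same shard. But for the document Arthur-Dent, Elasticsearch sees that it has the parent Mostly-Harmless, so it uses the value Mostly-Harmless as routing, and as a result the document Arthur-Dent ends up in the wrong shard.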
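To make the routing rule concrete, here is a rough simulation in plain shell. It assumes the shard is picked as hash(routing value) modulo the number of shards, and it uses cksum as a stand-in hash; that is not the hash function Elasticsearch uses, and the helper names shard_for and route are made up for this sketch.

```shell
#!/bin/sh
# Sketch only: cksum (CRC) stands in for Elasticsearch's real routing hash.
NUM_SHARDS=5

# shard_for ROUTING_VALUE: print the shard a routing value maps to.
shard_for() {
  hash=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $((hash % NUM_SHARDS))
}

# route ID PARENT: default routing is the parent id if one is given,
# otherwise the document's own id.
route() {
  shard_for "${2:-$1}"
}

echo "author    Douglas-Adams    -> shard $(route Douglas-Adams '')"
echo "book      Mostly-Harmless  -> shard $(route Mostly-Harmless Douglas-Adams)"
echo "character Arthur-Dent      -> shard $(route Arthur-Dent Mostly-Harmless)"
```

The author and the book always print the same shard because both are routed by Douglas-Adams. The character is routed by Mostly-Harmless, so it lands on the author's shard only by coincidence.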

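The solution is to explicitly specify a routing value for the grandchildren that is equal to the id of the grandparent. Note that the URL must be quoted here, otherwise the shell treats the & as a command separator:

curl -XPUT 'localhost:9200/test-idx/author/Douglas-Adams' -d '{...}'
curl -XPUT 'localhost:9200/test-idx/book/Mostly-Harmless?parent=Douglas-Adams' -d '{...}'
curl -XPUT 'localhost:9200/test-idx/character/Arthur-Dent?parent=Mostly-Harmless&routing=Douglas-Adams' -d '{...}'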
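Once all three levels of a family share a shard, a query like the one in the question ("which authors wrote books with female characters") might be written by nesting has_child queries. This is only a sketch: the gender field is an assumption and is not part of the mapping in the question, search_authors is a made-up helper name, and the syntax is the one used by the multi-type Elasticsearch versions this thread is about.

```shell
#!/bin/sh
# Hypothetical query: authors that have a book that has a character
# whose (assumed) "gender" field matches "female".
QUERY='{
  "query": {
    "has_child": {
      "type": "book",
      "query": {
        "has_child": {
          "type": "character",
          "query": {
            "match": { "gender": "female" }
          }
        }
      }
    }
  }
}'

# Wrapped in a function so nothing is sent until it is called explicitly.
search_authors() {
  curl -XPOST 'http://localhost:9200/test-idx/author/_search?pretty=true' -d "$QUERY"
}
```

Calling search_authors sends the query to the test-idx index used above. The inner has_child can only match characters that sit on the same shard as their book, which is why the explicit routing at index time matters.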

score 0 · Accepted Answer

For the grandparent document, use its own _id as the _routing. For the parent document, just use its _parent (grandpa._id) as the _routing. For the child document, also use grandpa._id as the _routing.
