elasticsearch - Empty inner_hits in a compound Elasticsearch filter

Question

I'm seeing odd behavior in the inner_hits results of a nested bool query.

Test data (abbreviated for brevity):

# MAPPING
PUT unit_testing
{
    "mappings": {
        "document": {
            "properties": {
                "display_name": {"type": "text"},
                "metadata": {
                    "properties": {
                        "NAME": {"type": "text"}
                    }
                }
            }
        },
        "paragraph": {
            "_parent": {"type": "document"},
            "_routing": {"required": true},
            "properties": {
                "checksum": {"type": "text"},
                "sentences": {
                    "type": "nested",
                    "properties": {
                        "text": {"type": "text"}
                    }
                }
            }
        }
    }
}

# DOCUMENT X 2 (d0, d1)
PUT unit_testing/document/doc_id_d0
{
    "display_name": "Test Document d0",
    "paragraphs": [
        "para_id_d0p0",
        "para_id_d0p1"
    ],
    "metadata": {"NAME": "Test Document d0 Metadata"}
}

# PARAGRAPH X 2 (d0p0, d1p0)
PUT unit_testing/paragraph/para_id_d0p0?parent=doc_id_d0
{
    "checksum": "para_checksum_d0p0",
    "sentences": [
        {"text": "Test sentence d0p0s0"},
        {"text": "Test sentence d0p0s1 ODD"},
        {"text": "Test sentence d0p0s2 EVEN"},
        {"text": "Test sentence d0p0s3 ODD"},
        {"text": "Test sentence d0p0s4 EVEN"}
    ]
}

This initial query behaves as I expect (I know the metadata filter isn't actually necessary in this example):

GET unit_testing/paragraph/_search
{
    "_source": "false", 
    "query": {
        "bool": {
            "must": [
                {
                    "has_parent": {
                        "query": {
                            "match_phrase": {
                                "metadata.NAME": "Test Document d0 Metadata"
                            }
                        }, 
                        "type": "document"
                    }
                }, 
                {
                    "nested": {
                        "inner_hits": {}, 
                        "path": "sentences", 
                        "query": {
                            "match": {
                                "sentences.text": "d0p0s0"
                            }
                        }
                    }
                }
            ]
        }
    }
}

It produces an inner_hits object containing the one sentence that matches the predicate (some fields removed for clarity):

{
  "hits": {
    "hits": [
      {
        "_source": {},
        "inner_hits": {
          "sentences": {
            "hits": {
              "hits": [
                {
                  "_source": {
                    "text": "Test sentence d0p0s0"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

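The following query tries to embed the query above in a parent "should" clause, creating a logical OR between the initial query and an additional query that matches a single sentence: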

GET unit_testing/paragraph/_search
{
    "_source": "false", 
    "query": {
        "bool": {
            "should": [
                {
                    "bool": {
                        "must": [
                            {
                                "has_parent": {
                                    "query": {
                                        "match_phrase": {
                                            "metadata.NAME": "Test Document d0 Metadata"
                                        }
                                    }, 
                                    "type": "document"
                                }
                            }, 
                            {
                                "nested": {
                                    "inner_hits": {}, 
                                    "path": "sentences", 
                                    "query": {
                                        "match": {
                                            "sentences.text": "d0p0s0"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }, 
                {
                    "nested": {
                        "inner_hits": {}, 
                        "path": "sentences", 
                        "query": {
                            "match": {
                                "sentences.text": "d1p0s0"
                            }
                        }
                    }
                }
            ]
        }
    }
}

The "d1" query produces the expected result, an inner_hits object containing the matching sentence, but the original "d0" query now yields an empty inner_hits object:

{
  "hits": {
    "hits": [
      {
        "_source": {},
        "inner_hits": {
          "sentences": {
            "hits": {
              "total": 0,
              "hits": []
            }
          }
        }
      },
      {
        "_source": {},
        "inner_hits": {
          "sentences": {
            "hits": {
              "hits": [
                {
                  "_source": {
                    "text": "Test sentence d1p0s0"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

I'm using the elasticsearch_dsl Python library to build and combine these queries, and I'm new to the Query DSL, but the query format looks sound to me.

What am I missing?

score 1 · Accepted Answer

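I think what's missing is the name parameter for inner_hits: you have two inner_hits clauses in two different queries, and they end up with the same name. By default a nested query's inner_hits is named after its path, so both are called "sentences" here. Try giving each inner_hits a name parameter (0).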

0 - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html#_options
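
As a rough sketch of that suggestion, the compound query from the question can be rebuilt with a distinct name on each inner_hits block. It uses plain Python dicts rather than elasticsearch_dsl; the helper nested_sentences and the names "d0_sentences" and "d1_sentences" are made up for illustration and are not part of the original question.

```python
import json


def nested_sentences(text, hits_name):
    """Build a nested query on `sentences` with an explicitly named inner_hits.

    `hits_name` is a hypothetical label chosen by the caller; it only needs
    to be unique within the search request.
    """
    return {
        "nested": {
            "path": "sentences",
            "query": {"match": {"sentences.text": text}},
            "inner_hits": {"name": hits_name},
        }
    }


# Same parent filter as in the question.
parent_filter = {
    "has_parent": {
        "type": "document",
        "query": {"match_phrase": {"metadata.NAME": "Test Document d0 Metadata"}},
    }
}

query = {
    "_source": "false",
    "query": {
        "bool": {
            "should": [
                # First branch: parent filter AND a sentence match, named "d0_sentences".
                {"bool": {"must": [parent_filter, nested_sentences("d0p0s0", "d0_sentences")]}},
                # Second branch: a sentence match on its own, named "d1_sentences".
                nested_sentences("d1p0s0", "d1_sentences"),
            ]
        }
    },
}

# Print the request body so it can be pasted after GET unit_testing/paragraph/_search.
print(json.dumps(query, indent=4))
```

With distinct names, each hit's inner_hits should be keyed by "d0_sentences" and "d1_sentences" instead of a single shared "sentences" entry, so one clause's result no longer replaces the other's.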
