elasticsearch - Search through all document fields (nested and root document) in one multi match query

Question

Let's assume these basic documents as an example:

{
  "name": "pants",
  "description": "with stripes",
  "items": [
    {
      "color": "red",
      "size": "44"
    },
    {
      "color": "blue",
      "size": "38"
    }
  ]
}

{
  "name": "shirt",
  "description": "with stripes",
  "items": [
    {
      "color": "green",
      "size": "40"
    }
  ]
}

{
  "name": "pants",
  "description": "with dots",
  "items": [
    {
      "color": "green",
      "size": "38"
    },
    {
      "color": "blue",
      "size": "38"
    }
  ]
}

I need to find the first document with a search term like pants stripes blue 38. All terms should be connected with AND as I'm not interested in pants with dots or other size and color combinations.

My mapping looks like this:

{
  "settings": {
    "index.queries.cache.enabled": true,
    "index.number_of_shards": 1,
    "index.number_of_replicas": 2,
    "analysis": {
      "filter": {
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        },
        "german_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        },
        "synonym": {
          "type": "synonym_graph",
          "synonyms_path": "dictionaries/de/synonyms.txt",
          "updateable" : true
        }
      },
      "analyzer": {
        "index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "german_stop",
            "german_normalization",
            "german_stemmer"
          ]
        },
        "search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "synonym",
            "german_stop",
            "german_normalization",
            "german_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      },
      "description": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      },
      "items": {
        "type": "nested",
        "properties": {
          "color": {
            "type": "text",
            "analyzer": "index_analyzer",
            "search_analyzer": "search_analyzer"
          },
          "size": {
            "type": "text",
            "analyzer": "index_analyzer",
            "search_analyzer": "search_analyzer"
          }
        }
      }
    }
  }
}

Please ignore the fact that I'm using german stop words and such. I kept the example files above in english so that everyone can understand it but didn't adjust the mapping as the original example is in german.

So ideally what I want my query to look like is this:

{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "multi_match": {
          "query": "pants stripes blue 38",
          "fields": [
            "name",
            "description", 
            "items.color",
            "items.size"
          ],
          "type": "cross_fields",
          "operator": "and", 
          "auto_generate_synonyms_phrase_query": "false",
          "fuzzy_transpositions": "false"
        }
      }
    }
  }
}

And the Search Profiler from Kibana show that the query will be executed like this:

ToParentBlockJoinQuery (
+(
    +(items.color:pant | items.size:pant | name:pant | description:pant)
    +(items.color:strip | items.size:strip | name:strip | description:strip)
    +(items.color:blu | items.size:blu | name:blu | description:blu)
    +(items.color:38 | items.size:38 | name:38 | description:38)
) #_type:__items)

Which looks to be exactly what I need in terms of AND and OR logic. Search through every attribute with every term and connect those results with AND. So every search term needs to be in one of the fields but it doesn't matter in which it was found.

But this query seems to only search inside the nested documents. In fact it seems like each query can only search through nested objects or the root document. Not both at the same time. If I remove the nested part the Search Profiler shows the difference:

{
  "query": {
    "multi_match": {
      "query": "pants stripes blue 38",
      "fields": [
        "name",
        "description",
        "items.color",
        "items.size"
      ],
      "type": "cross_fields",
      "operator": "and",
      "auto_generate_synonyms_phrase_query": "false",
      "fuzzy_transpositions": "false"
    }
  }
}

Results in:

+(
    +(items.color:pant | items.size:pant | name:pant | description:pant)
    +(items.color:strip | items.size:strip | name:strip | description:strip)
    +(items.color:blu | items.size:blu | name:blu | description:blu)
    +(items.color:38 | items.size:38 | name:38 | description:38)
) #DocValuesFieldExistsQuery [field=_primary_term]

Both queries return zero results.

So my question is if there is a way to make the above query work and to be able to truly search across all defined fields (nested and root doc) within a multi match query on a term by term basis.

I would like to avoid doing any preprocessing to the search terms in order to split them up based on them being in a nested or root document as that has it's own set of challenges. But I do know that this is a solution to my problem.

Edit The original files have a lot more attributes. The root document might have up to 250 fields and each nested document might add another 20-30 fields to it. Because the search terms need to search through lot of the fields (possibly not all) any sort of concatenation of nested and root document attributes to make them "searchable" seems unpractical.

A flattened index might be a practical solution. By that I mean copying all root documents fields to the nested document and only indexing the nested docs. But in this question I would like to know if it also works with nested objects without modifying the original structure.

score 0 · Accepted Answer

您对展平的直觉是正确的，但您不需要将根属性复制到嵌套字段中。您可以做相反的事情 - 通过include_in_root映射参数。

当您像这样更新映射时：

PUT inventory
{
  "settings": {
      ... 
    }
  },
  "mappings": {
    "properties": {
      ...
      "items": {
        "type": "nested",
        "include_in_root": true,     <---
        "properties": {
          ...
        }
      }
    }
  }
}

然后索引一些示例文档（至少其中一个包括，pants因为您的原始问题不包含任何内容）：

POST inventory/_doc
{"name":"shirt","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"38"}]}

POST inventory/_doc
{"name":"shirt","description":"with stripes","items":[{"color":"green","size":"40"}]}

POST inventory/_doc
{"name":"shirt","description":"with dots","items":[{"color":"green","size":"38"},{"color":"blue","size":"38"}]}

// this one *should* match
POST inventory/_doc
{"name":"pants","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"39"}]}

POST inventory/_doc
{"name":"pants","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"38"}]}

然后，您可以使用第二个查询并保持嵌套字段路径不变，因为它们现在在根目录中可用，尽管在相同的点路径下有些混乱：

POST inventory/_search
{
  "query": {
    "multi_match": {
      "query": "pants stripes blue 38",
      "fields": [
        "name",
        "description",
        "items.color",
        "items.size"
      ],
      "type": "cross_fields",
      "operator": "AND",
      "auto_generate_synonyms_phrase_query": "false",
      "fuzzy_transpositions": "false"
    }
  }
}

并且只会返回一个完全匹配的文档：

{
  "name":"pants",
  "description":"with stripes",
  "items":[
    {
      "color":"red",
      "size":"44"
    },
    {
      "color":"blue",
      "size":"38"
    }
  ]
}

elasticsearch - Search through all document fields (nested and root document) in one multi match query

1 回答 1

Related

Reference