Let's assume these basic documents as an example:
{
"name": "pants",
"description": "with stripes",
"items": [
{
"color": "red",
"size": "44"
},
{
"color": "blue",
"size": "38"
}
]
}
{
"name": "shirt",
"description": "with stripes",
"items": [
{
"color": "green",
"size": "40"
}
]
}
{
"name": "pants",
"description": "with dots",
"items": [
{
"color": "green",
"size": "38"
},
{
"color": "blue",
"size": "38"
}
]
}
I need to find the first document with a search term like pants stripes blue 38
. All terms should be connected with AND as I'm not interested in pants with dots or other size and color combinations.
My mapping looks like this:
{
"settings": {
"index.queries.cache.enabled": true,
"index.number_of_shards": 1,
"index.number_of_replicas": 2,
"analysis": {
"filter": {
"german_stop": {
"type": "stop",
"stopwords": "_german_"
},
"german_stemmer": {
"type": "stemmer",
"language": "light_german"
},
"synonym": {
"type": "synonym_graph",
"synonyms_path": "dictionaries/de/synonyms.txt",
"updateable" : true
}
},
"analyzer": {
"index_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"german_stop",
"german_normalization",
"german_stemmer"
]
},
"search_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym",
"german_stop",
"german_normalization",
"german_stemmer"
]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"description": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"items": {
"type": "nested",
"properties": {
"color": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"size": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
}
}
}
}
}
}
Please ignore the fact that I'm using german stop words and such. I kept the example files above in english so that everyone can understand it but didn't adjust the mapping as the original example is in german.
So ideally what I want my query to look like is this:
{
"query": {
"nested": {
"path": "items",
"query": {
"multi_match": {
"query": "pants stripes blue 38",
"fields": [
"name",
"description",
"items.color",
"items.size"
],
"type": "cross_fields",
"operator": "and",
"auto_generate_synonyms_phrase_query": "false",
"fuzzy_transpositions": "false"
}
}
}
}
}
And the Search Profiler from Kibana show that the query will be executed like this:
ToParentBlockJoinQuery (
+(
+(items.color:pant | items.size:pant | name:pant | description:pant)
+(items.color:strip | items.size:strip | name:strip | description:strip)
+(items.color:blu | items.size:blu | name:blu | description:blu)
+(items.color:38 | items.size:38 | name:38 | description:38)
) #_type:__items)
Which looks to be exactly what I need in terms of AND and OR logic. Search through every attribute with every term and connect those results with AND. So every search term needs to be in one of the fields but it doesn't matter in which it was found.
But this query seems to only search inside the nested documents. In fact it seems like each query can only search through nested objects or the root document. Not both at the same time. If I remove the nested part the Search Profiler shows the difference:
{
"query": {
"multi_match": {
"query": "pants stripes blue 38",
"fields": [
"name",
"description",
"items.color",
"items.size"
],
"type": "cross_fields",
"operator": "and",
"auto_generate_synonyms_phrase_query": "false",
"fuzzy_transpositions": "false"
}
}
}
Results in:
+(
+(items.color:pant | items.size:pant | name:pant | description:pant)
+(items.color:strip | items.size:strip | name:strip | description:strip)
+(items.color:blu | items.size:blu | name:blu | description:blu)
+(items.color:38 | items.size:38 | name:38 | description:38)
) #DocValuesFieldExistsQuery [field=_primary_term]
Both queries return zero results.
So my question is if there is a way to make the above query work and to be able to truly search across all defined fields (nested and root doc) within a multi match query on a term by term basis.
I would like to avoid doing any preprocessing to the search terms in order to split them up based on them being in a nested or root document as that has it's own set of challenges. But I do know that this is a solution to my problem.
Edit The original files have a lot more attributes. The root document might have up to 250 fields and each nested document might add another 20-30 fields to it. Because the search terms need to search through lot of the fields (possibly not all) any sort of concatenation of nested and root document attributes to make them "searchable" seems unpractical.
A flattened index might be a practical solution. By that I mean copying all root documents fields to the nested document and only indexing the nested docs. But in this question I would like to know if it also works with nested objects without modifying the original structure.