elasticsearch - Applying a dynamic template to multiple types - for managing tokens for sorting

Question

We are having some difficulty figuring out how best to manage our tokenized and untokenized fields for both searching and sorting. Our goals are pretty straightforward:

Support partial-word searches
Support sorting on all fields
Our mapping must be dynamic; customers add new fields at runtime.

We're able to accomplish this with a dynamic template. We store each string three ways: with the standard analyzer, with a custom ngram analyzer, and with a keyword analyzer that leaves the value effectively unanalyzed (it is only lowercased). The mapping:

curl -XPUT 'http://testServer:9200/test/' -d '{
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_ngram_analyzer": {
                        "tokenizer": "my_ngram_tokenizer",
                        "filter": [
                            "lowercase"
                        ],
                        "type" : "custom"
                    },
                    "default_search": {
                        "tokenizer" : "keyword",
                        "filter" : [
                            "lowercase"
                        ]
                    }
                },
                "tokenizer": {
                    "my_ngram_tokenizer": {
                        "type": "nGram",
                        "min_gram": "3",
                        "max_gram": "100",
                        "token_chars": []
                    }
                }
            }
        },
        "mappings": {
            "TestObject": {
                "dynamic_templates": [
                    {
                        "metadata_template": {
                            "match_mapping_type": "string",
                            "path_match": "*",
                            "mapping": {
                                "type": "multi_field",
                                "fields": {
                                    "ngram": {
                                        "type": "{dynamic_type}",
                                        "index": "analyzed",
                                        "index_analyzer": "my_ngram_analyzer",
                                        "search_analyzer" : "default_search"
                                    },
                                    "{name}": {
                                        "type": "{dynamic_type}",
                                        "index": "analyzed",
                                        "index_analyzer" : "standard",
                                        "search_analyzer" : "default_search"
                                    },
                                    "sortable": {
                                        "type": "{dynamic_type}",
                                        "index": "analyzed",
                                        "analyzer" : "default_search"
                                    }
                                }
                            }
                        }
                    }
                ]
            }
        }
    }'
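The partial-word behavior of this mapping can be illustrated with a rough Python emulation. It is a sketch, not Elasticsearch code: it assumes the ngram subfield indexes every substring of 3 to 100 characters, lowercased (token_chars is empty, so no character, whitespace included, acts as a separator), and that default_search turns the whole query into one lowercased token. The helper names ngram_tokens, keyword_lowercase and contains_match are hypothetical.

```python
# Rough emulation (not Elasticsearch code) of the two analyzers in the mapping:
#   my_ngram_analyzer: nGram tokenizer (min_gram=3, max_gram=100,
#                      token_chars=[] so every character is kept) + lowercase
#   default_search:    keyword tokenizer (whole input is one token) + lowercase
# Token order and positions are ignored; only the set of terms matters here.

MIN_GRAM = 3
MAX_GRAM = 100


def ngram_tokens(text, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Return every substring whose length is in [min_gram, max_gram], lowercased."""
    grams = set()
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # longer grams from this start would run past the end
            grams.add(text[start:end].lower())
    return grams


def keyword_lowercase(query):
    """default_search: the whole query becomes a single lowercased token."""
    return query.lower()


def contains_match(field_value, query):
    """A "contains" query on the .ngram subfield matches when its single
    search token is one of the terms indexed for the field value."""
    return keyword_lowercase(query) in ngram_tokens(field_value)


if __name__ == "__main__":
    print(contains_match("Blue Widget", "WIDG"))   # True
    print(contains_match("Blue Widget", "wi"))     # False: shorter than min_gram
    print(contains_match("Blue Widget", "e wid"))  # True: whitespace is kept
```

One consequence worth noting: a query shorter than min_gram (3) or longer than max_gram (100) becomes a token that was never indexed, so it cannot match the .ngram subfield.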

We really only keep the unanalyzed field around for sorting and exact matches (we even call it 'sortable'). This configuration makes partial-word searches easy: if the query is a "contains" query, we append ".ngram" to the query target. The problem is deciding when to use the ".sortable" suffix. If we receive a request to sort on dateUpdated, for example, we don't want to use .sortable, since that field is a date. If the request is to sort on 'name', we do want to use it, since that field is a string; and we don't want it for 'price', which isn't a string either.
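The type check described above can be sketched as follows. This is a hypothetical illustration, not the poster's code: FIELD_TYPES stands in for the application model that knows each field's type, and sort_target / sort_clause are made-up helper names. The sort entry shape ({"field": {"order": "asc"}}) is standard Elasticsearch sort syntax.

```python
# Hypothetical application model: field name -> type. In the question this
# knowledge lives in the application, not in the Elasticsearch mapping.
FIELD_TYPES = {
    "name": "string",
    "dateUpdated": "date",
    "price": "double",
}


def sort_target(field):
    """Only string fields get a '.sortable' subfield from the dynamic
    template, so the suffix depends on the field's type."""
    if FIELD_TYPES.get(field) == "string":
        return field + ".sortable"
    # Non-strings, and any field the model does not know about yet
    # (e.g. one a customer added at runtime), are sorted on directly.
    return field


def sort_clause(field, order="asc"):
    """Build one Elasticsearch sort entry for the chosen target field."""
    return {sort_target(field): {"order": order}}


if __name__ == "__main__":
    print(sort_clause("name"))                 # {'name.sortable': {'order': 'asc'}}
    print(sort_clause("dateUpdated", "desc"))  # {'dateUpdated': {'order': 'desc'}}
```

The weak spot shows up in the fallback branch: a string field that a customer adds at runtime is unknown to the model, so it would be sorted on its analyzed field instead of .sortable.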

The logic to check a field's type before sorting seems a little kludgy (we check the type in our model rather than in Elasticsearch). It would be nice to ALWAYS have a '.sortable' field available, but we can't run non-string types through the dynamic template above: booleans and numbers can't be run through an ngram analyzer.

Does anyone have a suggestion for how we can always have a ".sortable" field for sorting that is never tokenized, regardless of the field's type? Or is there a better solution to this kind of problem that we're not seeing? Thanks in advance!

score 6 · Accepted Answer

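What this really came down to was that we always wanted a "sortable" field on every mapped field (we renamed it "unanalyzed", since it has other uses as well). The real trick is to create a single dynamic template that applies to every type except strings, without adding a new dynamic template for each type. To do that, set match_pattern to regex: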

           {
                "other_types": {
                    "match_mapping_type": "date|boolean|double|long|integer",
                    "match_pattern": "regex",
                    "path_match": ".*",
                    "mapping": {
                        "type": "multi_field",
                        "fields": {
                            "{name}": {
                                "type": "{dynamic_type}",
                                "index": "not_analyzed"
                            },
                            "unanalyzed": {
                                "type": "{dynamic_type}",
                                "index": "not_analyzed"
                            }
                        }
                    }
                }
            }

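Note that you also need to make a small change to "path_match": with match_pattern set to regex, you must use a real regular expression such as ".*" (rather than '*', which is an ES 'simple' expression).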
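The effect of "match_pattern": "regex" on the other_types template can be approximated in Python. This sketch assumes Elasticsearch treats each pattern as a Java regular expression that must match the entire value (String.matches semantics), which re.fullmatch mirrors closely for patterns this simple; template_applies and is_valid_regex are hypothetical helpers. It also shows why '*' has to become '.*': on its own, '*' is not a valid regular expression.

```python
import re

# Patterns copied from the "other_types" template above.
MATCH_MAPPING_TYPE = "date|boolean|double|long|integer"
PATH_MATCH = ".*"


def template_applies(dynamic_type, path):
    """Assumed regex semantics: each pattern must match the whole value."""
    return (re.fullmatch(MATCH_MAPPING_TYPE, dynamic_type) is not None
            and re.fullmatch(PATH_MATCH, path) is not None)


def is_valid_regex(pattern):
    """True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(template_applies("long", "price"))   # True
    print(template_applies("string", "name"))  # False: left to the string template
    print(is_valid_regex("*"))                 # False: nothing to repeat
    print(is_valid_regex(".*"))                # True
```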

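One drawback is that this increases the size of our index: we store all of these types twice. For our purposes, though, our indexes (we have many) have plenty of room to grow, and it's worth it to avoid looking up each field's type before doing a sort or an exact-match query (we just always use ${fieldName}.unanalyzed).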
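With an .unanalyzed subfield on every field, building a sort needs no type lookup at all. A minimal sketch, assuming the string template was likewise changed to expose its untokenized subfield as "unanalyzed" (the answer says "sortable" was renamed); sort_clause and search_body are hypothetical helper names, while "match_all" and the sort array shape are standard Elasticsearch query DSL.

```python
def sort_clause(field, order="asc"):
    """Always target the untokenized copy; no per-field type check needed
    (assumes every mapped field has an '.unanalyzed' subfield)."""
    return {f"{field}.unanalyzed": {"order": order}}


def search_body(query, sort_fields):
    """Build a search request body; sort_fields is a list of (field, order) pairs."""
    return {
        "query": query,
        "sort": [sort_clause(field, order) for field, order in sort_fields],
    }


if __name__ == "__main__":
    body = search_body({"match_all": {}}, [("name", "asc"), ("dateUpdated", "desc")])
    print(body["sort"])
    # [{'name.unanalyzed': {'order': 'asc'}}, {'dateUpdated.unanalyzed': {'order': 'desc'}}]
```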
