We are having some difficulty figuring out how best to manage our tokenized and untokenized fields for both searching and sorting. Our goals are pretty straightforward:
- Support partial-word searches
- Support sorting on all fields
- Keep the mapping dynamic, since customers add new fields at runtime
We're able to accomplish this using a dynamic template. We store each string three ways: with the standard analyzer, with a custom ngram analyzer, and with an untokenized analyzer (keyword tokenizer plus lowercase filter). The settings and mapping:
curl -XPUT 'http://testServer:9200/test/' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer",
          "filter": [
            "lowercase"
          ],
          "type": "custom"
        },
        "default_search": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "100",
          "token_chars": []
        }
      }
    }
  },
  "mappings": {
    "TestObject": {
      "dynamic_templates": [
        {
          "metadata_template": {
            "match_mapping_type": "string",
            "path_match": "*",
            "mapping": {
              "type": "multi_field",
              "fields": {
                "ngram": {
                  "type": "{dynamic_type}",
                  "index": "analyzed",
                  "index_analyzer": "my_ngram_analyzer",
                  "search_analyzer": "default_search"
                },
                "{name}": {
                  "type": "{dynamic_type}",
                  "index": "analyzed",
                  "index_analyzer": "standard",
                  "search_analyzer": "default_search"
                },
                "sortable": {
                  "type": "{dynamic_type}",
                  "index": "analyzed",
                  "analyzer": "default_search"
                }
              }
            }
          }
        }
      ]
    }
  }
}'
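To make the effect of the ".ngram" sub-field concrete, here is a rough Python sketch (not Elasticsearch code) of what an ngram tokenizer with min_gram 3, max_gram 100 and an empty token_chars list, followed by a lowercase filter, should roughly produce, and why a lowercased, untokenized search term then behaves like a "contains" query. The function names are made up for illustration, and the real tokenizer may emit the grams in a different order.

```python
def ngrams(text, min_gram=3, max_gram=100):
    """Return every lowercased substring of text with length in [min_gram, max_gram]."""
    lowered = text.lower()
    grams = set()
    for start in range(len(lowered)):
        # Longest gram that still fits between this start position and the end.
        longest = min(max_gram, len(lowered) - start)
        for length in range(min_gram, longest + 1):
            grams.add(lowered[start:start + length])
    return grams


def contains_match(indexed_value, query):
    """Mimic a term lookup: the whole lowercased query must equal one indexed gram."""
    return query.lower() in ngrams(indexed_value)


if __name__ == "__main__":
    print(contains_match("Hello World", "LO WO"))  # query is a 5-character substring
    print(contains_match("Hello World", "lo"))     # query is shorter than min_gram
```

One consequence of this setup, if the sketch is right, is that search terms shorter than min_gram never match the ".ngram" sub-field.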
We really only keep the untokenized field around for sorting and exact matches (we even call it "sortable"). This configuration makes partial-word searches easy: if the query is a "contains" query, we append ".ngram" to the query target. The problem is deciding when to use the ".sortable" suffix. If we receive a request to sort on dateUpdated, for example, we don't want to use ".sortable", since that field is a date. If the request is to sort on "name", we do want to use it, since that field is a string, but we don't want it when sorting on "price", which is not a string.
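For reference, the suffix logic described here can be sketched roughly as follows in Python. This only builds the request body as a dict and does not talk to Elasticsearch. FIELD_TYPES is a hypothetical stand-in for our application model, the helper names are invented, and the use of a match query is an assumption about how the search is issued.

```python
# Hypothetical field-type lookup, standing in for the application model.
FIELD_TYPES = {"name": "string", "dateUpdated": "date", "price": "double"}


def sort_field(field, field_types=FIELD_TYPES):
    """Append ".sortable" only for string fields; other types sort on the bare name."""
    if field_types.get(field) == "string":
        return field + ".sortable"
    return field


def build_search_body(field, text, sort_on, order="asc", contains=False):
    """Build a search body with one match query and one sort clause."""
    # "Contains" queries target the ngram sub-field of the multi_field.
    target = field + ".ngram" if contains else field
    return {
        "query": {"match": {target: text}},
        "sort": [{sort_field(sort_on): {"order": order}}],
    }


if __name__ == "__main__":
    print(build_search_body("name", "ell", sort_on="name", contains=True))
    print(build_search_body("name", "ell", sort_on="dateUpdated"))
```

The type check in sort_field is exactly the part that feels kludgy to us, since it depends on knowledge that lives outside Elasticsearch.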
The logic to check the type of a field before sorting seems a little kludgy (we check in our model, rather than checking the type in Elasticsearch). It would be nice to ALWAYS have a ".sortable" field around, but we can't run non-string types through the dynamic template above: booleans and numbers can't be run through an ngram analyzer.
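One variation we have considered is to look at the mapping Elasticsearch reports rather than at our model, and to use ".sortable" only when that sub-field actually exists. A minimal Python sketch, assuming the "properties" section of the type's mapping has already been fetched and parsed into a dict; the sample data is hand-written for illustration, and the exact shape of the mapping response may differ between Elasticsearch versions.

```python
def has_sortable_subfield(properties, field):
    """True if the mapped field exposes a "sortable" sub-field."""
    return "sortable" in properties.get(field, {}).get("fields", {})


def resolve_sort_field(properties, field):
    """Pick "<field>.sortable" when it exists, otherwise the field itself."""
    if has_sortable_subfield(properties, field):
        return field + ".sortable"
    return field


# Hand-written sample, NOT real server output.
SAMPLE_PROPERTIES = {
    "name": {
        "type": "multi_field",
        "fields": {
            "name": {"type": "string"},
            "ngram": {"type": "string"},
            "sortable": {"type": "string"},
        },
    },
    "price": {"type": "double"},
    "dateUpdated": {"type": "date"},
}

if __name__ == "__main__":
    for field in ("name", "price", "dateUpdated"):
        print(field, "->", resolve_sort_field(SAMPLE_PROPERTIES, field))
```

This still requires a lookup before every sort, so it moves the type check rather than removing it.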
Does anyone have a suggestion for how we can always have a ".sortable" field for sorting that is never tokenized, regardless of the type? Or is there a better solution to this kind of problem that we're not seeing? Thanks in advance!