I have an Elasticsearch index for which I set "max_ngram_diff": 50, but somehow the setting only seems to apply to the edge_ngram tokenizer, not to the ngram tokenizer.
I have made the following two requests against the same URL, http://localhost:9201/index-name/_analyze:
Request 1
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}
Request 2
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}
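For reference, both requests can be reproduced with a small Python script using the requests library; the host, port, index name, and request bodies are exactly the ones above, everything else is just illustration.

# Minimal sketch: send both _analyze requests and print the responses.
import json
import requests

URL = "http://localhost:9201/index-name/_analyze"

def analyze(tokenizer_type):
    body = {
        "tokenizer": {
            "type": tokenizer_type,  # "edge_ngram" or "ngram"
            "min_gram": 3,
            "max_gram": 20,
            "token_chars": ["letter", "digit"],
        },
        "text": "1234567890;abcdefghijklmn;",
    }
    resp = requests.post(URL, json=body)
    print(tokenizer_type, "->", resp.status_code)
    print(json.dumps(resp.json(), indent=2))

analyze("edge_ngram")  # 200, tokens as expected
analyze("ngram")       # 400, illegal_argument_exception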
The first request returns the expected result:
{
  "tokens": [
    {
      "token": "123",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "1234",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "12345",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 2
    },
    {
      "token": "123456",
      "start_offset": 0,
      "end_offset": 6,
      "type": "word",
      "position": 3
    },
    // more tokens
  ]
}
But the second request only returns this:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[ffe18f1a89e6][172.18.0.3:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [17]. This limit can be set by changing the [index.max_ngram_diff] index level setting."
  },
  "status": 400
}
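The error reports an effective limit of [1], which is the documented default for index.max_ngram_diff, as if my value of 50 were not being picked up for the ngram tokenizer. One way to read back what the index actually has is to query the settings; below is a sketch under the same host/port/index assumptions (flat_settings is a standard Elasticsearch query parameter).

import requests

resp = requests.get(
    "http://localhost:9201/index-name/_settings",
    params={"flat_settings": "true"},
)
settings = resp.json()["index-name"]["settings"]
# If the key is absent, the default of 1 applies.
print(settings.get("index.max_ngram_diff", "not set (default 1)"))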
What is going on here? Why can the first request, with the edge_ngram tokenizer, use a difference between max_gram and min_gram greater than 1, while the second request, with the ngram tokenizer, cannot?
These are my index settings:
{
  "settings": {
    "index": {
      "max_ngram_diff": 50,
      // further settings
    }
  }
}
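For completeness, this is how the index is created with that settings block, sketched with the same placeholder host, port, and index name as above.

import requests

body = {
    "settings": {
        "index": {
            "max_ngram_diff": 50,
            # further settings go here
        }
    }
}
resp = requests.put("http://localhost:9201/index-name", json=body)
print(resp.status_code, resp.json())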
The Elasticsearch version used is 7.2.0.
Thanks for your help!