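There is another, better option that achieves exactly what you want. You can use an ingest pipeline with a script processor to create a second, digits-only field at index time, which you can then query far more efficiently at search time.
The ingest pipeline below contains a script processor that creates another field named numField, which holds only the digits found in testField.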
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            // Remove every non-digit character so that only the digits remain
            ctx.numField = /\D/.matcher(ctx.testField).replaceAll("");
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "testField": "123"
      }
    },
    {
      "_source": {
        "testField": "abc123"
      }
    },
    {
      "_source": {
        "testField": "123abc"
      }
    },
    {
      "_source": {
        "testField": "abc"
      }
    }
  ]
}
Simulating this pipeline with four documents that contain a mix of letters and digits yields the following result:
{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "123"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "abc123"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "123abc"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "",
          "testField" : "abc"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    }
  ]
}
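If you want to sanity-check the regex outside Elasticsearch, here is a minimal Python sketch that mirrors what the script processor does to each testField value. It is only an offline approximation, not the Painless script itself: the helper name extract_digits is hypothetical, and re.ASCII is passed on the assumption that the Java regex engine behind Painless treats \D as "anything except 0-9" by default.

```python
import re

# Java's \D is ASCII-only by default; re.ASCII makes Python's \D match the same set.
NON_DIGITS = re.compile(r"\D", re.ASCII)


def extract_digits(value: str) -> str:
    """Return only the digit characters of value (hypothetical helper)."""
    return NON_DIGITS.sub("", value)


if __name__ == "__main__":
    # Same four testField values as in the simulated pipeline above.
    for test_field in ["123", "abc123", "123abc", "abc"]:
        print(repr(test_field), "->", repr(extract_digits(test_field)))
```

As in the simulated output, a value without any digits ends up as an empty string rather than a number.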
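After indexing your documents through this pipeline, you can run your range query on numField instead of testField. Compared with the other solution (sorry @Kamal), this moves the scripting cost to index time, where the script runs only once per document, instead of running against every document on every search.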
{
  "query": {
    "range": {
      "numField": {
        "gte": 0,
        "lte": 2000000
      }
    }
  }
}
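To illustrate the trade-off, here is a rough Python sketch of the same idea outside Elasticsearch: the regex runs once per document at "index time", and each later range check is a plain numeric comparison. The function names index_documents and range_query are hypothetical, and the sketch assumes that numField is treated as a number and that documents without any digits simply have no numeric value; how Elasticsearch handles those cases depends on your mapping.

```python
import re

NON_DIGITS = re.compile(r"\D", re.ASCII)


def index_documents(raw_docs):
    """'Index time': derive numField once per document."""
    indexed = []
    for doc in raw_docs:
        digits = NON_DIGITS.sub("", doc["testField"])
        # Assumption: a document without digits gets no numeric value at all.
        num = int(digits) if digits else None
        indexed.append({**doc, "numField": num})
    return indexed


def range_query(indexed_docs, gte, lte):
    """'Search time': a plain numeric comparison, no regex involved."""
    return [
        d for d in indexed_docs
        if d["numField"] is not None and gte <= d["numField"] <= lte
    ]


values = ["123", "abc123", "123abc", "abc", "x3000000"]
docs = index_documents([{"testField": v} for v in values])
hits = range_query(docs, gte=0, lte=2000000)
print([d["testField"] for d in hits])  # ['123', 'abc123', '123abc']
```

Running range_query again with different bounds reuses the stored numField values, which is the point of doing the extraction at index time.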