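I don't fully understand what you are asking either. Could it be that you are not very familiar with the percolator? Here is an example that I just tried.
Let's assume you have an index (let's call it `test`) into which you want to index some documents. This index has the following settings and mapping (just a random test index from my test setup):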
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)",
            "([^-@]+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "code": {
        "type": "long"
      },
      "date": {
        "type": "date"
      },
      "part": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "val": {
        "type": "long"
      },
      "email": {
        "type": "text",
        "analyzer": "email"
      }
    }
  }
}
Notice that it has a custom `email` analyzer that splits something like `foo@bar.com` into these tokens: `foo@bar.com`, `foo`, `bar.com`, `bar`, `com`.
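If you want to check that token list yourself, the analyze API can run the analyzer on a sample string. This is only a sketch, and it assumes the `test` index has already been created with the settings shown above:

```console
GET /test/_analyze
{
  "analyzer": "email",
  "text": "foo@bar.com"
}
```

The response lists each emitted token together with its position and offsets, which makes it easy to see what the `pattern_capture` filter adds on top of the original token.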
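As the documentation says, you can create a separate percolator index that holds only your percolator queries, not the documents themselves. Even though the percolator index doesn't contain the documents, it still needs the mapping of the index that will hold them (`test` in our case), because the fields referenced by the stored queries must exist in its mapping.
This is the mapping of the percolator index (which I called `percolator_index`). It also has the special analyzer used to split the `email` field: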
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)",
            "([^-@]+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "code": {
        "type": "long"
      },
      "date": {
        "type": "date"
      },
      "part": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "val": {
        "type": "long"
      },
      "email": {
        "type": "text",
        "analyzer": "email"
      }
    }
  }
}
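Its mapping and settings are almost identical to those of my original index; the only difference is the additional `query` field of type `percolator` in the mapping.
The query you are interested in, your `simple_query_string`, should go into a document inside `percolator_index`. Like this: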
PUT /percolator_index/_doc/1?refresh
{
  "query": {
    "simple_query_string": {
      "query": "month foo@bar.com",
      "fields": ["part", "email"]
    }
  }
}
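To make it more interesting, I explicitly listed the `email` field among the fields to search (by default, all fields are searched).
Now the aim is to test a document that should eventually go into your `test` index against that `simple_query_string` query from your percolator index. For example: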
GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date": "2004-07-31T11:57:52.000Z",
        "part": "month",
        "code": 109,
        "val": 0,
        "email": "foo@bar.com"
      }
    }
  }
}
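What is under `document` is, obviously, your future (not yet existing) document. It is evaluated against the `simple_query_string` defined above and results in a match: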
{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.39324823,
    "hits": [
      {
        "_index": "percolator_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.39324823,
        "_source": {
          "query": {
            "simple_query_string": {
              "query": "month foo@bar.com",
              "fields": [
                "part",
                "email"
              ]
            }
          }
        },
        "fields": {
          "_percolator_document_slot": [
            0
          ]
        }
      }
    ]
  }
}
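As a side note, the document does not have to be passed inline. If it has already been indexed into `test`, the percolate query can reference it by index and ID instead. A minimal sketch, assuming the document was stored under the ID `1` (that ID is my assumption, not something from your setup):

```console
PUT /test/_doc/1?refresh
{
  "date": "2004-07-31T11:57:52.000Z",
  "part": "month",
  "code": 109,
  "val": 0,
  "email": "foo@bar.com"
}

GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "index": "test",
      "id": "1"
    }
  }
}
```

I have not run this variant here, but it should be evaluated the same way as the inline version, since the percolator fetches the stored source and percolates that.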
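What if I percolate this document instead:
GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date": "2004-07-31T11:57:52.000Z",
        "part": "month",
        "code": 109,
        "val": 0,
        "email": "foo"
      }
    }
  }
}
(Notice that the email is only `foo`.) This is the result: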
{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.26152915,
    "hits": [
      {
        "_index": "percolator_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.26152915,
        "_source": {
          "query": {
            "simple_query_string": {
              "query": "month foo@bar.com",
              "fields": [
                "part",
                "email"
              ]
            }
          }
        },
        "fields": {
          "_percolator_document_slot": [
            0
          ]
        }
      }
    ]
  }
}
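Notice that the score is a bit lower than for the first percolated document. This is probably because `foo` (my email) matches only one of the terms produced by analyzing `foo@bar.com`, while `foo@bar.com` matches all of them (thus giving a better score).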
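For contrast, here is a sketch of a document that I would expect not to match at all. The values `year` and `someone@example.org` are made up for illustration; neither field should share a token with `month` or with any of the terms analyzed from `foo@bar.com`:

```console
GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date": "2004-07-31T11:57:52.000Z",
        "part": "year",
        "code": 109,
        "val": 0,
        "email": "someone@example.org"
      }
    }
  }
}
```

I have not run this one, so treat the expectation of an empty `hits` array as an assumption to verify on your side.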
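I'm not sure which analyzer you were talking about, though. I think the example above covers the only "analyzer" issue/unknown that I can see being a bit confusing.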