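My task is:

* make `procter&gamble` and `procter & gamble` produce the same results, including the score
* keep it generic rather than relying on synonyms, because the phrase could be any other `Somehow&Somewhat`
* highlight `procter&gamble` or `procter & gamble` as a whole when the phrase matches, not as separate tokens
* use `simple_query_string`, because I allow search operators
* keep `AT&T` searchable as well

Here is my snippet. The problem is that searching for `procter&gamble` and searching for `procter & gamble` produce different scores and return different documents, while users expect both spellings to give the same results.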
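To make "the same results, including the score" concrete, here is a minimal sketch of the check I have in mind. It assumes the two search responses have already been parsed into Python dicts with the usual `hits.hits[]._id` and `hits.hits[]._score` layout; the helper names `hit_signature` and `same_results` are hypothetical, not part of any client library.

```python
import math


def hit_signature(response):
    """Return (id, score) pairs in ranked order from a parsed search response."""
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]


def same_results(resp_a, resp_b, rel_tol=1e-6):
    """True when both responses list the same documents, in the same order,
    with scores equal up to a small relative tolerance (an assumed threshold)."""
    sig_a = hit_signature(resp_a)
    sig_b = hit_signature(resp_b)
    if len(sig_a) != len(sig_b):
        return False
    return all(
        id_a == id_b and math.isclose(score_a, score_b, rel_tol=rel_tol)
        for (id_a, score_a), (id_b, score_b) in zip(sig_a, sig_b)
    )
```

With the snippet below, I would like `same_results` to hold for the responses to `procter&gamble` and `procter & gamble`; at the moment it would not.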
DELETE /english_example
PUT /english_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_keywords": {
          "type": "keyword_marker",
          "keywords": ["example"]
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "acronymns": {
          "type": "word_delimiter_graph",
          "catenate_all": true,
          "preserve_original": true
        },
        "acronymns_": {
          "type": "word_delimiter_graph",
          "catenate_all": true,
          "preserve_original": true
        },
        "custom_stop_words_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": ["t"]
        }
      },
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "char_filter": [
            "ampersand_filter"
          ],
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "acronymns",
            "flatten_graph",
            "english_stop",
            "custom_stop_words_filter",
            "english_keywords",
            "english_stemmer"
          ]
        }
      },
      "char_filter": {
        "ampersand_filter": {
          "type": "pattern_replace",
          "pattern": "(?=[^&]*)( {0,}& {0,})(?=[^&]*)",
          "replacement": "_and_"
        },
        "ampersand_filter2": {
          "type": "mapping",
          "mappings": [
            "& => _and_"
          ]
        }
      }
    }
  }
}
PUT /english_example/_bulk
{ "index" : { "_id" : "1" } }
{ "description" : "wi-fi AT&T BB&T Procter & Gamble, some\nOther $500 games with Peter's", "contents" : "Much text with somewhere I meet Procter or Gamble" }
{ "index" : { "_id" : "2" } }
{ "description" : "Procter & Gamble", "contents" : "Much text with somewhere I meet Procter and Gamble" }
{ "index" : { "_id" : "3" } }
{ "description" : "Procter&Gamble", "contents" : "Much text with somewhere I meet Procter & Gamble" }
{ "index" : { "_id" : "4" } }
{ "description" : "Come Procter&Gamble", "contents" : "Much text with somewhere I meet Procter&Gamble" }
{ "index" : { "_id" : "5" } }
{ "description" : "Tome Procter & Gamble", "contents" : "Much text with somewhere I don't meet AT&T" }
# "query": "procter & gamble",
GET english_example/_search
{
  "query": {
    "simple_query_string": {
      "query": "procter & gamble",
      "default_operator": "or",
      "fields": [
        "description^2",
        "contents^80"
      ]
    }
  },
  "highlight": {
    "fields": {
      "description": {},
      "contents": {}
    }
  }
}
# "query": "procter&gamble",
GET english_example/_search
{
  "query": {
    "simple_query_string": {
      "query": "procter&gamble",
      "default_operator": "or",
      "fields": [
        "description^2",
        "contents^80"
      ]
    }
  },
  "highlight": {
    "fields": {
      "description": {},
      "contents": {}
    }
  }
}
# "query": "at&t",
GET english_example/_search
{
  "query": {
    "simple_query_string": {
      "query": "at&t",
      "default_operator": "or",
      "fields": [
        "description^2",
        "contents^80"
      ]
    }
  },
  "highlight": {
    "fields": {
      "description": {},
      "contents": {}
    }
  }
}
In my snippet, I redefined the default analyzer with the `whitespace` tokenizer and the `word_delimiter_graph` filter so that searches for `AT&T` find matches.
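For reference, here is a rough Python approximation of the first two stages of that analyzer: the `ampersand_filter` char filter followed by whitespace tokenization. It is only a sketch. It assumes Python's `re` module treats this pattern the same way as the Java regex engine for these inputs, it uses `str.split()` as a stand-in for the `whitespace` tokenizer, and the function names are hypothetical. It does not model the token filters (`word_delimiter_graph`, stop words, stemming), nor how `simple_query_string` parses the query text before handing it to the analyzer.

```python
import re

# Same pattern and replacement as the "ampersand_filter" char filter in the settings.
AMPERSAND_PATTERN = re.compile(r"(?=[^&]*)( {0,}& {0,})(?=[^&]*)")
REPLACEMENT = "_and_"


def apply_ampersand_filter(text):
    """Approximate the pattern_replace char filter: '&' plus surrounding spaces -> '_and_'."""
    return AMPERSAND_PATTERN.sub(REPLACEMENT, text)


def whitespace_tokenize(text):
    """Stand-in for the whitespace tokenizer: split on runs of whitespace."""
    return text.split()


def char_filter_then_tokenize(text):
    """Run the char filter first, then tokenize, as the analyzer would on a whole string."""
    return whitespace_tokenize(apply_ampersand_filter(text))


if __name__ == "__main__":
    for sample in ("procter & gamble", "procter&gamble", "AT&T"):
        print(sample, "->", char_filter_then_tokenize(sample))
```

When a whole string goes through these two stages, both spellings collapse to the single token `procter_and_gamble`, which is what I intended with the char filter. The different search results therefore have to come from somewhere after or around this step.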