elasticsearch - Document doesn't match if the search string is longer than the searched field

Question

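I have a title I'm searching for.

The title is stored in the document as "police diary: stefan zweig".

When I search for "police", I get results. But when I search for "Policeman", I get no results.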

Here is the query:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "fields": [
              "title",
              omitted as irrelevant...
            ],
            "query": "Policeman",
            "fuzziness": "1.5",
            "prefix_length": "2"
          }
        }
      ],
      "must": {
        omitted as irrelevant...
      }
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}

Here is the mapping:

{
    "books": {
        "mappings": {
            "book": {
                "_all": {
                    "analyzer": "nGram_analyzer", 
                    "search_analyzer": "whitespace_analyzer"
                },
                "properties": {
                    "title": {
                        "type": "text",
                        "fields": {
                            "raw": {
                                "type": "keyword"
                            },
                            "sort": {
                                "type": "text",
                                "analyzer": "to order in another language, (creates a string with symbols)",
                                "fielddata": true
                            }
                        }
                    }
                }
            }
        }
    }
}

Note that I have a document titled "some title" that is returned when I search for "someone title".

I don't understand why the police book doesn't show up.

score 1 · Accepted Answer

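So there are two parts to your question:

1. You want titles containing `police` to be returned when you search for `policeman`.
2. You want to know why the `some title` document matches a search for `someone title`, and based on that you expected the first search to match as well.

Let me first explain why the second query matches and the first one doesn't, and then show you how to make the first query work.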

Your document containing `some title` produces the following tokens, which you can verify with the Analyze API (`standard` is the default analyzer for `text` fields):

POST /_analyze
{
    "text": "some title",
    "analyzer": "standard"
}

Generated tokens:

{
    "tokens": [
        {
            "token": "some",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "title",
            "start_offset": 5,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}
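As a rough illustration of what the analyzer does here, the Python sketch below approximates the `standard` analyzer with a regex split plus lowercasing. This is an assumption for illustration only: the real `standard` tokenizer implements Unicode text segmentation and behaves differently on many inputs, and the function name `approx_standard_analyze` is hypothetical.

```python
import re
from typing import List


def approx_standard_analyze(text: str) -> List[str]:
    """Hypothetical, simplified stand-in for the `standard` analyzer:
    keep runs of ASCII letters/digits and lowercase them.
    The real tokenizer follows Unicode text segmentation rules."""
    return [token.lower() for token in re.findall(r"[0-9A-Za-z]+", text)]


indexed = approx_standard_analyze("some title")
query = approx_standard_analyze("someone title")
print(indexed)                    # ['some', 'title']
print(query)                      # ['someone', 'title']
print(set(indexed) & set(query))  # {'title'}  -> the shared token that produces the hit

# The same check for the policeman case finds no shared token.
print(approx_standard_analyze("police diary: stefan zweig"))  # ['police', 'diary', 'stefan', 'zweig']
print(approx_standard_analyze("Policeman"))                   # ['policeman']
```

A match query only needs one shared token to produce a hit, which is why `title` alone is enough in the first case, while `policeman` shares nothing with the indexed title tokens.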

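Now, when you search for `someone title` with a match query, the query text is analyzed with the same analyzer that was used on the field at index time.

So it produces two tokens, `someone` and `title`, and the query matches on the `title` token, which is why the document shows up in your search results. You can also use the Explain API to verify this and see in detail how the match was computed.

`Policeman`, on the other hand, is analyzed into the single token `policeman`, which is not equal to any indexed token. Fuzziness does not rescue it either: `policeman` is three edits (the trailing `man`) away from `police`, and Elasticsearch allows at most two edits.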
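Fuzzy matching in Elasticsearch is bounded by edit distance, with at most 2 edits allowed. The sketch below computes plain Levenshtein distance for the term pairs from the question. It is an illustration, not Lucene's implementation: Lucene's fuzzy matching also counts a transposition as a single edit by default (`fuzzy_transpositions`), which makes no difference for these pairs. The names `levenshtein` and `MAX_EDITS` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions),
    computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # upper bound on fuzziness in Elasticsearch

for query_term, indexed_term in [("policeman", "police"), ("someone", "some")]:
    distance = levenshtein(query_term, indexed_term)
    print(f"{query_term} -> {indexed_term}: {distance} edits, "
          f"within fuzziness limit: {distance <= MAX_EDITS}")
```

Both pairs come out at 3 edits, so neither can match through fuzziness. This also confirms that the `some title` hit comes from the exact `title` token, not from `someone` fuzzily matching `some`.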

How to return titles containing `police` when searching for `policeman`

You need a synonym token filter, as in the example below.

Index definition (note the `policeman => police` rule in `synonym_filter`):

{
    "settings": {
        "analysis": {
            "analyzer": {
                "synonyms": {
                    "filter": [
                        "lowercase",
                        "synonym_filter"
                    ],
                    "tokenizer": "standard"
                }
            },
            "filter": {
                "synonym_filter": {
                    "type": "synonym",
                    "synonyms": ["policeman => police"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "dialog": {
                "type": "text",
                "analyzer": "synonyms"
            }
        }
    }
}

Index a sample document

{
  "dialog" : "police"
}

Search query with the term `policeman`

{
    "query": {
        "match" : {
            "dialog" : {
                "query" : "policeman"
            }
        }
    }
}

And the search result (note that the `_source` contains only `police`):

 "hits": [
      {
        "_index": "so_syn",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "dialog": "police"
        }
      }
    ]
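The same flow can be sketched in Python under simplifying assumptions: a regex split stands in for the `standard` tokenizer, and a dictionary stands in for the explicit `policeman => police` rule. The names `SYNONYM_RULES`, `synonyms_analyzer` and `matches` are hypothetical. Because the mapping defines no separate `search_analyzer`, the same analyzer runs at index and search time, so both sides end up with the token `police`.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for the explicit synonym rule "policeman => police".
SYNONYM_RULES: Dict[str, str] = {"policeman": "police"}


def synonyms_analyzer(text: str) -> List[str]:
    # Rough tokenizer + "lowercase" filter (approximating the standard tokenizer).
    tokens = [t.lower() for t in re.findall(r"[0-9A-Za-z]+", text)]
    # "synonym_filter": replace the left-hand term with the right-hand one.
    return [SYNONYM_RULES.get(t, t) for t in tokens]


def matches(doc_text: str, query_text: str) -> bool:
    # A match query (default OR operator) hits if any query token is in the index.
    return bool(set(synonyms_analyzer(doc_text)) & set(synonyms_analyzer(query_text)))


print(synonyms_analyzer("police"))      # ['police']
print(synonyms_analyzer("Policeman"))   # ['police']
print(matches("police", "policeman"))   # True
```

One consequence of an explicit `=>` rule is that a document containing `policeman` is also indexed as `police`, so searches for `police` and `policeman` become indistinguishable.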
