
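I'm trying to implement exact-match search in Elasticsearch, but I'm not getting the results I want. The code below shows the problem I'm facing and what I have tried.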

doc1 = {"sentence": "Today is a sunny day."}
doc2 = {"sentence": " Today is a sunny day but tomorrow it might rain"}
doc3 = {"sentence": "I know I am awesome"}
doc4 = {"sentence": "The taste of your dish is awesome"}
doc5 = {"sentence": "The taste of banana shake is good"}

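# Indexing the above docs
# (assumes the elasticsearch-py client and a node reachable with default settings)
from elasticsearch import Elasticsearch
es = Elasticsearch()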

es.index(index="english",doc_type="sentences",id=1,body=doc1)

es.index(index="english",doc_type="sentences",id=2,body=doc2)

es.index(index="english",doc_type="sentences",id=3,body=doc3)

es.index(index="english",doc_type="sentences",id=4,body=doc4)

es.index(index="english",doc_type="sentences",id=5,body=doc5)

Query 1

res = es.search(index="english", body={
    "from": 0,
    "size": 5,
    "query": {
        "match_phrase": {"sentence": {"query": "Today is a sunny day"}}
    },
    "explain": False
})

Query 2

res = es.search(index="english", body={
    "from": 0,
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "match_phrase": {"sentence": {"query": "Today is a sunny day"}}
            },
            "filter": {
                "term": {"sentence.word_count": 5}
            }
        }
    }
})

When I run Query 1, doc2 comes back as the top result, whereas I want doc1 to be the top result.

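When I try to add a filter for the same search (restricting the length of the matched sentence to the length of the query), as in Query 2, I get no results at all.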
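For context on Query 2: a `term` filter on `sentence.word_count` can only match if the mapping actually defines such a sub-field, and the indexing code above does not create one. The following is a minimal sketch, assuming a `token_count` multi-field named `word_count` analyzed with the `standard` analyzer. Both function names are hypothetical, and counting words with `str.split()` is an assumption that only holds for simple whitespace-separated text.

```python
def sentence_properties_with_word_count():
    """Field mapping fragment: `sentence` as text plus a token_count sub-field.

    Assumption: the sub-field is called `word_count`, as in Query 2.
    """
    return {
        "properties": {
            "sentence": {
                "type": "text",
                "fields": {
                    "word_count": {"type": "token_count", "analyzer": "standard"}
                },
            }
        }
    }


def phrase_with_length_filter(phrase, field="sentence"):
    """Phrase query that also requires the document to have as many tokens as the phrase."""
    # Assumption: whitespace splitting gives the same count as the standard analyzer.
    word_count = len(phrase.split())
    return {
        "query": {
            "bool": {
                "must": {"match_phrase": {field: {"query": phrase}}},
                "filter": {"term": {field + ".word_count": word_count}},
            }
        }
    }
```

The mapping fragment has to be in place before the documents are indexed; documents indexed earlier would need to be reindexed for the sub-field to be populated.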

I would really appreciate any help with this. I want an exact match for the given query, not matches that merely contain it.

Thanks


3 Answers


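My hunch is that your index has 5 primary shards and that you don't have enough documents for the scores to be relevant. If you create your index with a single primary shard, your first query will return the document you expect. You can read more about why this happens in the following article: https://www.elastic.co/blog/practical-bm25-part-1-how-shards-affect-relevance-scoring-in-elasticsearch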
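A minimal sketch of the single-shard suggestion in Python, assuming `es` is an elasticsearch-py client like the one in the question. The helper names are hypothetical. The second helper is an alternative not mentioned above: `search_type="dfs_query_then_fetch"` asks Elasticsearch to gather term statistics across all shards before scoring, at the cost of an extra round trip.

```python
def single_shard_index_body():
    """Index-creation body with one primary shard, so all docs share term statistics."""
    return {"settings": {"index": {"number_of_shards": 1}}}


def create_single_shard_index(es, index_name="english"):
    # `es` is assumed to be an elasticsearch.Elasticsearch instance.
    return es.indices.create(index=index_name, body=single_shard_index_body())


def search_with_global_stats(es, body, index_name="english"):
    # Alternative for an existing multi-shard index: score with index-wide statistics.
    return es.search(index=index_name, body=body, search_type="dfs_query_then_fetch")
```

The index has to be created this way before any documents are indexed, since the number of primary shards cannot be changed in place on an existing index.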

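One way to achieve what you want is to use the keyword type together with a normalizer that lowercases the data, which makes it easier to search for exact matches in a case-insensitive way.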

Create your index like this:

PUT english
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lc_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "sentences": {
      "properties": {
        "sentence": {
          "type": "text",
          "fields": {
            "exact": {
              "type": "keyword",
              "normalizer": "lc_normalizer"
            }
          }
        }
      }
    }
  }
}

Then you can index your documents as usual.

PUT english/sentences/1
{"sentence": "Today is a sunny day"}
PUT english/sentences/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}
...

Finally, you can search for an exact phrase match; the query below will return only doc1:

POST english/_search
{
  "query": {
    "match": {
      "sentence.exact": "today is a sunny day"
    }
  }
}
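
The same search can be sketched from Python, assuming an elasticsearch-py client `es` and the `sentence.exact` sub-field from the mapping above. The helper names are hypothetical. Note that a keyword field compares the whole string, so the lowercase normalizer only removes case differences: a stored value with a trailing period, such as the question's original doc1 ("Today is a sunny day."), would not match a query without the period.

```python
def exact_sentence_query(text, field="sentence.exact"):
    """Search body matching documents whose whole sentence equals `text` (case-insensitively)."""
    return {"query": {"match": {field: text}}}


def run_exact_search(es, text, index_name="english"):
    # `es` is assumed to be an elasticsearch.Elasticsearch instance.
    res = es.search(index=index_name, body=exact_sentence_query(text))
    # Return just the stored sentences of the hits.
    return [hit["_source"]["sentence"] for hit in res["hits"]["hits"]]
```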
answered 2018-09-21T11:00:05.887

Try using a bool query:

    PUT test_index/doc/1
    {"sentence": "Today is a sunny day"}

    PUT test_index/doc/2
    {"sentence": "Today is a sunny day but tomorrow it might rain"}

    # terms query on the keyword field for the exact match, plus a multi_match of type phrase for other matches
    GET test_index/_search
    {
      "query": {
        "bool": {
          "should": [
            {
              "terms": {
                "sentence.keyword": [
                  "Today is a sunny day"
                ]
              }
            },
            {  
              "multi_match":{  
                "query":"Today is a sunny day",
                "type":"phrase",
                "fields":[  
                    "sentence"
                ]
              }
            }
          ]
        }
      }
    }
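
A Python sketch of the same request body, for use with the elasticsearch-py client from the question. The function name is hypothetical. It assumes the `sentence.keyword` sub-field exists, which is what default dynamic mapping creates for string fields (with `ignore_above: 256`, so sentences longer than 256 characters are not indexed in that sub-field). A document that equals the phrase exactly matches both `should` clauses and so accumulates a higher score than one that only contains the phrase.

```python
def exact_first_query(phrase, text_field="sentence"):
    """bool/should body: exact keyword match plus phrase match on the analyzed field."""
    keyword_field = text_field + ".keyword"  # assumption: default dynamic mapping
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches only when the whole stored string equals the phrase.
                    {"terms": {keyword_field: [phrase]}},
                    # Matches any document containing the phrase.
                    {
                        "multi_match": {
                            "query": phrase,
                            "type": "phrase",
                            "fields": [text_field],
                        }
                    },
                ]
            }
        }
    }
```

It could then be passed as `es.search(index="test_index", body=exact_first_query("Today is a sunny day"))`.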

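Another option is to use multi_match for both clauses: the first on the keyword field with a boost of 5, and the second on the text field, without a boost, for the other matches: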

PUT test_index/doc/1
{"sentence": "Today is a sunny day"}

PUT test_index/doc/2
{"sentence": "Today is a sunny day but tomorrow it might rain"}


GET test_index/_search
{  
  "query":{  
    "bool":{  
      "should":[  
        {  
          "multi_match":{  
            "query":"Today is a sunny day",
            "type":"phrase",
            "fields":[  
              "sentence.keyword"
            ],
            "boost":5
          }
        },
        {  
          "multi_match":{  
            "query":"Today is a sunny day",
            "type":"phrase",
            "fields":[  
                "sentence"
            ]
          }
        }
      ]
    }
  }
}
answered 2018-09-17T19:57:52.943

This query will work:

{
    "query":{
        "match_phrase":{
            "sentence":{
                "query":"Today is a sunny day"
            }
        }
    },
    "size":5,
    "from":0,
    "explain":false
}
answered 2018-09-07T04:06:50.307