2

我在弹性搜索上创建了一个索引,如下所示:

"settings" : {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
                "filter": {
                    "trigrams_filter": {
                        "type":     "ngram",
                        "min_gram": 3,
                        "max_gram": 3
                    }
                },
                "analyzer": {
                    "trigrams": {
                        "type":      "custom",
                        "tokenizer": "standard",
                        "filter":   [
                            "lowercase",
                            "trigrams_filter"
                        ]
                    }
                }
    }
},
"mappings": {
    "issue": {
        "properties": {
            "description": {
                "type":     "string",
                "analyzer": "trigrams"
            }
        }
    }
}

我的测试项目如下:

"alici onay verdi basarili satisiniz gerceklesti diyor ama hesabima para transferi gerceklesmemis"

"otomatik onay işlemi gecikmiş"

"************* nolu iade islemi urun kargoya verilmedi zamaninda iade islemlerinde urun erorr hata veriyor"

我用下面的查询测试了这个索引:

GET issue/_search
{
  "query": {
      "match": {
            "description":{
                 "query": "otomatik onay istemi zamaninda gerceklesmemis"
            }
       }
   }
}

结果:

{
      ....
      "hits": {
            ....
                "max_score": 2.3507352,
                "hits": [
                          {
                              ....                                   
                              "_score": 2.3507352,
                              "_source": {
                                   "issue_id": "*******",
                                   "description": "alici onay verdi basarili satisiniz gerceklesti diyor ama hesabima para transferi gerceklesmemis"
                                          }
                           }
                        ]
                }
 }

但是postgresql上的相同数据具有以下SQL 响应另一个结果:

SELECT 
     public.tbl_issue_descriptions_big.description,
     similarity(description, 'otomatik onay islemi zamaninda gerceklesmemis') AS sml
FROM
     public.tbl_issue_descriptions_big
WHERE
     description %'otomatik onay islemi zamaninda gerceklesmemis'
ORDER BY
     sml DESC
LIMIT 10

结果是:

description                                           | sml
======================================================|======
otomatik onay islemi gecikmis                         |0,351852

为什么会产生这种差异?

4

1 回答 1

1

我对 postgres 的了解不够,无法在那里给出合格的答案(因为这也取决于被索引的文档以及它们的评分公式是否完全相同,我对此表示怀疑),但 Elasticsearch 有一个解释API和一个解释参数搜索,这可以帮助您找出为什么以这种方式对某个文档进行评分。

于 2017-07-18T09:33:50.063 回答