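In ElasticSearch, I created two documents that both have a field named "CategoryMajor".

In doc1, I set CategoryMajor to "Restaurants".

In doc2, I set CategoryMajor to "Restaurants Restaurants Restaurants Restaurants".

If I search for CategoryMajor:Restaurants, doc1 shows up as more relevant than doc2. This is not typical Lucene behavior, which gives more relevance the more times a term occurs. doc2 should be more relevant than doc1.

How can I fix this?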

1 Answer

You can add &explain=true to your GET query to see that the score of doc2 is lowered by the "fieldNorm" factor. This is caused by the default Lucene similarity formula, which lowers the score for documents with longer fields. See the documentation of the default Lucene similarity formula:

http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html
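To make the effect of the norm concrete, here is a minimal Python sketch of just two factors of the default similarity: the term-frequency factor sqrt(freq) and the length norm 1/sqrt(number of terms in the field). It is only an illustration, not how ElasticSearch computes the final score: idf, the query norm, boosts and the lossy one-byte storage of norms are all left out, and the whitespace tokenizer and the function names are assumptions made for this example. The numbers in a real &explain=true output will therefore differ.

```python
import math


def tf(freq):
    """Term-frequency factor of the default similarity: sqrt(freq)."""
    return math.sqrt(freq)


def length_norm(num_terms):
    """Length normalization of the default similarity: 1 / sqrt(num_terms)."""
    return 1.0 / math.sqrt(num_terms)


def simplified_score(field_text, term, use_norms=True):
    """Product of tf and the length norm only (idf, queryNorm, boosts ignored)."""
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    tokens = field_text.lower().split()
    freq = tokens.count(term.lower())
    norm = length_norm(len(tokens)) if use_norms else 1.0
    return tf(freq) * norm


doc1 = "Restaurants"
doc2 = "Restaurants Restaurants Restaurants Restaurants"

for name, text in (("doc1", doc1), ("doc2", doc2)):
    with_norms = simplified_score(text, "Restaurants")
    without_norms = simplified_score(text, "Restaurants", use_norms=False)
    print(name, "with norms:", with_norms, "without norms:", without_norms)
```

In this simplified model the length norm takes away the advantage doc2 would get from repeating the term: with norms both documents come out at 1.0, and without norms doc2 comes out at 2.0 while doc1 stays at 1.0.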

To disable this behaviour, set "omit_norms" to "true" for the CategoryMajor field in your index mapping by sending a PUT request to:

http://localhost:9200/index/type/_mapping

with request body:

{
    "type": {
        "properties": {
            "CategoryMajor": {
                "type": "string",
                "omit_norms": "true"
            }
        }
    }
}
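For anyone who prefers to script this, the request could be built roughly as follows using only the Python standard library. This is a sketch, not a tested client: the host, the index name "index" and the type name "type" are the placeholders from the URL above, and the helper name build_mapping_request is made up for this example. The request is only constructed and printed here; the commented lines at the end show how it might be sent to a running node.

```python
import json
import urllib.request


def build_mapping_request(base_url="http://localhost:9200",
                          index="index", doc_type="type"):
    """Build (but do not send) the PUT request that disables norms."""
    # Same body as in the answer; "index" and "type" are placeholders.
    mapping = {
        doc_type: {
            "properties": {
                "CategoryMajor": {
                    "type": "string",
                    "omit_norms": "true",
                }
            }
        }
    }
    url = "{}/{}/{}/_mapping".format(base_url, index, doc_type)
    return urllib.request.Request(
        url,
        data=json.dumps(mapping).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_mapping_request()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it to a running node (not executed here):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```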

I'm not certain, but you may need to delete your index, create it again, put the above mapping, and then reindex your documents. Reindexing after changing the mapping is definitely necessary :).
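The order of operations described above might look like the following list of REST calls. This Python sketch only builds and prints the plan and sends nothing. The paths assume the index/type URL layout used in this answer, the document IDs "1" and "2" are hypothetical, and the helper name reindex_plan is made up for this example.

```python
def reindex_plan(index="index", doc_type="type", doc_ids=("1", "2")):
    """Return the ordered (HTTP method, path) steps for a full reindex."""
    steps = [
        ("DELETE", "/{}".format(index)),                     # delete the index
        ("PUT", "/{}".format(index)),                        # create it again
        ("PUT", "/{}/{}/_mapping".format(index, doc_type)),  # put the mapping
    ]
    # Reindex every document; the IDs here are placeholders.
    steps += [("PUT", "/{}/{}/{}".format(index, doc_type, doc_id))
              for doc_id in doc_ids]
    return steps


plan = reindex_plan()
for method, path in plan:
    print(method, path)
```

The mapping step comes before any document is indexed so that the field is created with norms already omitted.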

answered 2012-09-07T20:19:43.040