java - How Lucene boosting is affected by a Similarity's lengthNorm

Question

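I have two documents containing:

doc_1: one two three four five Bingo

doc_2: Bingo one two three four five

I index each document into two separate fields: the first field contains the first 5 terms, and the second contains the last term.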

TextField start_field = new TextField("start_words", content.substring(0, index), Field.Store.NO);
TextField end_field = new TextField("end_words", content.substring(index, content.length() - 1), Field.Store.NO);
// index is the position of the 5th ' ' in content

To see the effect of boosting more clearly, I implemented the following similarity:

DefaultSimilarity customSimilarity = new DefaultSimilarity() {
     @Override
     public float lengthNorm(FieldInvertState state) {
         return 1; // So length of each field would not matter
     }
};

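Without any boost applied, searching for Bingo gives both documents the same score (as expected and intended). However, when I apply a boost to one of the fields (start_field.setBoost(5)), the two scores remain the same, even though the field of doc_2 that contains Bingo is boosted.

If I remove customSimilarity, boosting works as expected.

Why does boosting stop working once lengthNorm is overridden, and how can I make boosting work with the overridden similarity shown above?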

score 0 · Accepted Answer

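The default implementation of lengthNorm() in DefaultSimilarity is state.getBoost() * lengthNorm(numTerms). In other words, the field boost reaches the score through the value that lengthNorm() returns.

Your implementation does not take the boost into account, so a constant return value discards it. For the boost to matter, you can have your implementation return state.getBoost().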
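To make the arithmetic concrete, here is a small plain-Java model of the three variants discussed above. It does not use Lucene at all: the class NormModel and its method names are hypothetical, and the length factor 1/sqrt(numTerms) is an assumption standing in for lengthNorm(numTerms). Lucene also compresses norms when storing them, which this sketch ignores.

```java
// Simplified model of how a field boost flows through the norm.
// NormModel and its methods are illustrative names, not Lucene API.
final class NormModel {
    private NormModel() {}

    // Models the default described in the answer: boost * length factor.
    // Assumption: the length factor is 1 / sqrt(numTerms).
    static float defaultNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // Models the override from the question: always 1, so the boost is lost.
    static float constantNorm(float boost, int numTerms) {
        return 1f;
    }

    // Models the suggested fix: ignore field length but keep the boost.
    static float boostOnlyNorm(float boost, int numTerms) {
        return boost;
    }

    public static void main(String[] args) {
        float boost = 5f;   // start_field.setBoost(5)
        int numTerms = 5;   // the start field holds 5 terms

        System.out.println("default:    " + defaultNorm(boost, numTerms));
        System.out.println("constant:   " + constantNorm(boost, numTerms));
        System.out.println("boost only: " + boostOnlyNorm(boost, numTerms));
    }
}
```

In this model the constant variant returns the same value for a boosted and an unboosted field, which matches the identical scores seen in the question, while the boost-only variant separates them without depending on field length.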
