java - Hierarchical scoring in Lucene, or term treatment

Question

I am trying to translate interest profiles into Lucene queries.

Given a title term and some expansion terms in JSON format, for example

{"title":"Donald Trump", "Expansion":[["republic","republican"],["democratic","democrat"],["campaign"]]}

the corresponding Lucene query could be a BooleanQuery like the following (with a boost factor of 3.0 for the title terms and 1.0 for the expansion terms).

+(text:donald^3.0 text:trump^3.0 (text:democrat text:democratic) (text:republic text:republican) text:campaign)

According to IndexSearcher's explain() method, a matching document such as

I know people just want to find a way to be famous without taking any risks, republic republican Donald Trump Campaign.

gets a score of 9.0:

3.0 = weight(text:donald^3.0 in 0) [TitleExpansionSimilarity], result of:
    3.0 = score(doc=0,freq=1.0), product of:
      3.0 = queryWeight, product of:
        3.0 = boost
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = queryNorm
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  3.0 = weight(text:trump^3.0 in 0) [TitleExpansionSimilarity], result of:
    3.0 = score(doc=0,freq=1.0), product of:
      3.0 = queryWeight, product of:
        3.0 = boost
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = queryNorm
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  2.0 = sum of:
    1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
    1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
  1.0 = weight(text:campaign in 0) [TitleExpansionSimilarity], result of:
    1.0 = fieldWeight in 0, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = termFreq=1.0
      1.0 = idf(docFreq=201, maxDocs=201)
      1.0 = fieldNorm(doc=0)

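Is there a way to override Lucene's scoring function so that the BooleanQuery (text:republic text:republican), i.e. the cluster ["republic","republican"], is scored as the maximum of the match weight for "republic" and the match weight for "republican", rather than their sum?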

1.0 = MAX(instead of sum) of:
    1.0 = weight(text:republic in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)
    1.0 = weight(text:republican in 0) [TitleExpansionSimilarity], result of:
      1.0 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=201, maxDocs=201)
        1.0 = fieldNorm(doc=0)

score 0 · Accepted Answer

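Not through Lucene's QueryParser syntax, but you can use a DisjunctionMaxQuery instead of a BooleanQuery for each cluster. It combines the subqueries and scores a document with the maximum score among its subqueries, rather than the sum of the subquery scores.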
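To make the difference concrete, here is a minimal sketch in plain Java (no Lucene dependency) that models the arithmetic for the example document from the question. The class and method names (`ClusterScoreSketch`, `sumScore`, `disMaxScore`) are made up for illustration and are not Lucene APIs; the per-term weights are the ones shown in the explain() output above. The `disMaxScore` method follows the scoring rule documented for DisjunctionMaxQuery: the best subquery score plus a tie-breaker multiplier times the scores of the other matching subqueries. A tie-breaker of 0.0 gives the pure maximum asked for in the question.

```java
import java.util.Arrays;
import java.util.List;

public class ClusterScoreSketch {

    // Models how a BooleanQuery combines optional clauses: matching clause scores are added up.
    public static double sumScore(List<Double> clauseScores) {
        double total = 0.0;
        for (double s : clauseScores) {
            total += s;
        }
        return total;
    }

    // Models the DisjunctionMaxQuery rule: best clause score, plus
    // tieBreaker times the scores of the remaining matching clauses.
    public static double disMaxScore(List<Double> clauseScores, double tieBreaker) {
        if (clauseScores.isEmpty()) {
            return 0.0;
        }
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : clauseScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Term weights taken from the explain() output in the question.
        double donald = 3.0;
        double trump = 3.0;
        double campaign = 1.0;
        // The cluster ["republic","republican"]: both terms match with weight 1.0.
        List<Double> cluster = Arrays.asList(1.0, 1.0);

        double summed = donald + trump + sumScore(cluster) + campaign;
        double maxed = donald + trump + disMaxScore(cluster, 0.0) + campaign;

        System.out.println("sum per cluster: " + summed); // 9.0, as in the question
        System.out.println("max per cluster: " + maxed);  // 8.0
    }
}
```

In an actual query, each expansion cluster would become its own DisjunctionMaxQuery added as an optional clause of the outer BooleanQuery, while the title terms stay as boosted term clauses. How these objects are constructed differs between Lucene versions, so check the Javadoc for the version you are using.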
