java - Trouble using phrase queries with slop in Lucene

Question

I'm having some trouble with phrase queries, so I wrote a small piece of code to understand exactly how a phrase query works together with slop:

I have the string "abc Institute of Technology", and I indexed different combinations of this string (something like shingles) as follows:

Document doc = new Document();
ArrayList<String> sh = new ArrayList<String>();
// Hand-built shingles: contiguous word combinations of the string
sh.add("abc institute engineering technology");
sh.add("abc institute engineering");
sh.add("abc institute");
sh.add("abc");
sh.add("institute engineering technology");
sh.add("institute engineering");
sh.add("institute");
sh.add("engineering technology");
sh.add("engineering");
sh.add("technology");
// Add each shingle as another value of the same field
for (String s : sh) {
    doc.add(new Field("insti_shingles", s.toLowerCase(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
}
writer.addDocument(doc);

Now, when I read all the tokens back from the index directory, I get this set of tokens:

engineering technology
abc
institute
abc institute engineering technology
technology
abc institute
abc institute engineering
institute engineering technology
engineering
institute engineering

Now I search for the term "abc Institute Technology":

IndexSearcher searcher = new IndexSearcher(dir);
BooleanQuery booleanQuery = new BooleanQuery();
PhraseQuery query = new PhraseQuery();
query.add(new Term("insti_shingles", "abc institute technology"));
query.setSlop(4);
booleanQuery.add(query, BooleanClause.Occur.SHOULD);
TopDocs hits = searcher.search(booleanQuery, 30);

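According to the documentation for phrase queries with slop, I should get some results, but I get an empty result set. When I search for a term that is exactly identical to an indexed token, however, I do get results.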

I thought that with a phrase query, the term "abc Institute technology" should match the token "abc Institute Engineering Technology"?

Am I doing something wrong? Please help.

score 0 · Accepted Answer

You don't need a special tokenizer to use phrase queries with slop; in fact, it causes those queries to fail, as you've noticed.

Just tokenize with a StandardAnalyzer; there is no need for the custom shingling.
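
To illustrate why the single-term query finds nothing, here is a minimal sketch in plain Java (no Lucene dependency) of how a sloppy phrase match can be modeled. Everything in it is an assumption made for illustration: `tokenize` is a crude whitespace stand-in for what StandardAnalyzer would do on these simple strings, and `matches` is a hypothetical helper that approximates slop as the spread of position shifts between query terms and document tokens. Lucene's real scorer is more involved.

```java
import java.util.Arrays;
import java.util.List;

class SloppyPhraseSketch {

    // Crude stand-in for an analyzer: lowercase and split on whitespace.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // True if every query term can be found in the document such that the
    // shifts (document position minus query position) differ by at most slop.
    static boolean matches(List<String> docTokens, List<String> queryTerms, int slop) {
        return search(docTokens, queryTerms, 0, slop, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    private static boolean search(List<String> doc, List<String> query, int i,
                                  int slop, int minShift, int maxShift) {
        if (i == query.size()) {
            return true; // every term was placed within the allowed slop
        }
        for (int pos = 0; pos < doc.size(); pos++) {
            if (!doc.get(pos).equals(query.get(i))) {
                continue; // terms are compared exactly, as whole tokens
            }
            int shift = pos - i;
            int lo = Math.min(minShift, shift);
            int hi = Math.max(maxShift, shift);
            if (hi - lo <= slop && search(doc, query, i + 1, slop, lo, hi)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = tokenize("abc institute engineering technology");

        // The question's query: the whole string as ONE term. No single
        // token equals it, so slop never comes into play.
        System.out.println(matches(doc, Arrays.asList("abc institute technology"), 4)); // false

        // One term per word: "technology" is shifted by one position.
        System.out.println(matches(doc, tokenize("abc Institute Technology"), 4)); // true
        System.out.println(matches(doc, tokenize("abc Institute Technology"), 0)); // false
    }
}
```

In terms of the code in the question, this corresponds to calling query.add(new Term("insti_shingles", word)) once per word ("abc", "institute", "technology") rather than once with the whole string.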
