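I'm having a problem with the Lucene Highlighter. My Lucene index contains one document with three fields: title (TITLE), content (CONTENT), and category (CLS).
When I search the index with "+(TITLE:test CONTENT:test) +CLS:dummy", the word "dummy" is highlighted in the title hit. That term comes from the clause on my category field, and I don't want it highlighted in the title. How can I avoid this?
Here is my test code: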
Directory directory = new RAMDirectory();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_36, analyzer);
IndexWriter iw = new IndexWriter(directory, iwc);
//==================================== write the index ====================================
Document document = new Document();
Field _f_title = new Field("TITLE", "this is a test title - dummy",Field.Store.YES, Field.Index.ANALYZED);
Field _f_content = new Field("CONTENT", "this is a test content",Field.Store.YES, Field.Index.ANALYZED);
Field _f_cls = new Field("CLS", "dummy",Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
document.add(_f_title);
document.add(_f_content);
document.add(_f_cls);
iw.addDocument(document);
iw.close();
//==================================== search the index ===================================
SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter("<span style='color:red;'>", "</span>");
SimpleFragmenter fragmenter = new SimpleFragmenter(100);
IndexReader ir = IndexReader.open(directory);
IndexSearcher is = new IndexSearcher(ir);
QueryParser parser = new QueryParser(Version.LUCENE_36, "", analyzer);
Query query = parser.parse("+(TITLE:test CONTENT:test) +CLS:dummy");
TopDocs docs = is.search(query, 10);
Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
highlighter.setTextFragmenter(fragmenter);
for (ScoreDoc cDoc : docs.scoreDocs) {
    Document _document = is.doc(cDoc.doc);
    System.out.println("title:" + highlighter.getBestFragment(analyzer, "TITLE", _document.get("TITLE")));
    System.out.println("content:" + highlighter.getBestFragment(analyzer, "CONTENT", _document.get("CONTENT")));
}
is.close();
ir.close(); // the searcher does not close a reader that was passed to its constructor
The program prints:
title:this is a <span style='color:red;'>test</span> title - <span style='color:red;'>dummy</span>
content:this is a <span style='color:red;'>test</span> content
What I actually want is:
title:this is a <span style='color:red;'>test</span> title - dummy
content:this is a <span style='color:red;'>test</span> content
What can I do to get this result?