I'm having a problem with the Lucene Highlighter. My Lucene index contains a document with three fields: title (TITLE), content (CONTENT), and class (CLS).

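When I search the index with "+(TITLE:test CONTENT:test) +CLS:dummy", the word "dummy" is highlighted in the title hit. "dummy" is only the value of my class field, and I don't want it highlighted in the title. How can I avoid this?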

Here is my test code:

    Directory directory = new RAMDirectory();
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_36, analyzer);
    IndexWriter iw = new IndexWriter(directory, iwc);


    //====================================write the index====================================================
    Document document = new Document();
    Field _f_title = new Field("TITLE", "this is a test title - dummy",Field.Store.YES, Field.Index.ANALYZED);
    Field _f_content = new Field("CONTENT", "this is a test content",Field.Store.YES, Field.Index.ANALYZED);
    Field _f_cls = new Field("CLS", "dummy",Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
    document.add(_f_title);
    document.add(_f_content);
    document.add(_f_cls);

    iw.addDocument(document);
    iw.close();
    //====================================search the index===================================================

    SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter("<span style='color:red;'>", "</span>");
    SimpleFragmenter fragmenter = new SimpleFragmenter(100);
    IndexReader ir = IndexReader.open(directory);
    IndexSearcher is = new IndexSearcher(ir);
    QueryParser parser = new QueryParser(Version.LUCENE_36, "", analyzer);
    Query query = parser.parse("+(TITLE:test CONTENT:test) +CLS:dummy");
    TopDocs docs = is.search(query, 10);

    Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
    highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
    highlighter.setTextFragmenter(fragmenter);

    for(ScoreDoc cDoc : docs.scoreDocs) {
        Document _document = is.doc(cDoc.doc);
        System.out.println("title:" +  highlighter.getBestFragment(analyzer, "TITLE", _document.get("TITLE")));
        System.out.println("content:" +  highlighter.getBestFragment(analyzer, "CONTENT", _document.get("CONTENT")));
    }
    is.close();
    ir.close(); // the searcher does not close a reader that was passed to its constructor

The program outputs:

    title:this is a <span style='color:red;'>test</span> title - <span style='color:red;'>dummy</span>
    content:this is a <span style='color:red;'>test</span> content

What I actually want is:

    title:this is a <span style='color:red;'>test</span> title - dummy
    content:this is a <span style='color:red;'>test</span> content

What can I do?
