lucene - How to extract specific text from a Lucene index?

Question

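I want to add PDF files to a Lucene index (I think I have done that part). Now I want to extract specific text with a Lucene proximity search query.

The proximity search query returns only the file name.

But I want to extract all of the text that falls within the range of the proximity query.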

Example case: test.pdf:-->"sample text A xxxxx B. Lucene is always great"

The proximity query is: "A B"~5

I want to extract: xxxxx

How can I do this?

Thanks in advance for your help and hints.

Regards,

Senthil Saravanan

score 0 · Accepted Answer

When indexing the file, add the contents field with term vectors that store positions and offsets:

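    // Field.Store.COMPRESS exists only in Lucene 2.x; it was removed in 3.0,
    // where Field.Store.YES is the usual substitute.
    doc.add(new Field("contents", result, Field.Store.COMPRESS,
            Field.Index.ANALYZED,
            Field.TermVector.WITH_POSITIONS_OFFSETS));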

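Here doc is of type org.apache.lucene.document.Document, and result is presumably the text extracted from the PDF.

When searching, use com.java.search.HighlighterUtil.getFragmentsWithHighlightedTerms(Analyzer analyzer, Query query, String fieldName, String fieldContents, int fragmentNumber, int fragmentSize) to get the fragments that contain the matched terms. Note that HighlighterUtil is not a class in the Lucene distribution; it appears to be a custom helper, presumably built on Lucene's org.apache.lucene.search.highlight.Highlighter.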
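A highlighter fragment still contains the matched terms and some surrounding text, so getting only the "xxxxx" part from the question needs one more step. The following is a minimal plain-Java sketch of that post-processing step, not part of Lucene and not the answerer's method. The class name ProximityExtractor and the method between are hypothetical, and the sketch makes two simplifying assumptions: it splits on whitespace instead of using the index analyzer, and it treats the slop as a maximum number of words between the two terms, in order.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: extracts the words between two terms of a fragment.
final class ProximityExtractor {
    private ProximityExtractor() {}

    // Returns the words strictly between `first` and `second` if `second`
    // follows `first` with at most `maxGap` words in between.
    // Assumption: whitespace tokenization, not a Lucene Analyzer.
    static Optional<String> between(String text, String first, String second, int maxGap) {
        String[] tokens = text.trim().split("\\s+");
        String a = first.toLowerCase(Locale.ROOT);
        String b = second.toLowerCase(Locale.ROOT);
        for (int i = 0; i < tokens.length; i++) {
            if (!normalize(tokens[i]).equals(a)) {
                continue;
            }
            // j - i - 1 is the number of words between tokens[i] and tokens[j].
            for (int j = i + 1; j < tokens.length && j - i - 1 <= maxGap; j++) {
                if (normalize(tokens[j]).equals(b)) {
                    return Optional.of(String.join(" ", Arrays.copyOfRange(tokens, i + 1, j)));
                }
            }
        }
        return Optional.empty();
    }

    // Strips leading and trailing punctuation and lowercases the token.
    private static String normalize(String token) {
        return token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "").toLowerCase(Locale.ROOT);
    }
}
```

Each fragment returned by the highlighter could be passed to ProximityExtractor.between(fragment, "A", "B", 5). Unlike a real Lucene proximity query, this sketch does not match the terms in reverse order.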
