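I am indexing in Lucene and am only interested in getting the IDs of the relevant documents back from Lucene (i.e., not field values or any highlighting information). Given these requirements, which term vector should I use without hurting search performance (speed) or quality (results)? I will also be using MoreLikeThis, so I do not want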
TermVector.YES: Records the unique terms that occurred, and their counts, in each document, but doesn't store any positions or offsets information.
TermVector.WITH_POSITIONS: Records the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets.
TermVector.WITH_OFFSETS: Records the unique terms and their counts, with the offsets (start and end character position) of each occurrence of every term, but no positions.
TermVector.WITH_POSITIONS_OFFSETS: Records the unique terms and their counts, along with positions and offsets.
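
To make sure I am reading those descriptions correctly, here is a small sketch of what I understand each option to record for a single field. This is a toy model in plain Java, not Lucene code: the class `TermVectorSketch`, the `Entry` type and the whitespace tokenizer are all hypothetical names and simplifications of mine, and a real analyzer would tokenize differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model, NOT Lucene code: illustrates the data each TermVector option describes.
class TermVectorSketch {

    // Per-term data for one field of one document.
    // YES would keep only `count`; WITH_POSITIONS adds `positions`;
    // WITH_OFFSETS adds `offsets`; WITH_POSITIONS_OFFSETS keeps all three.
    static final class Entry {
        int count;                                          // occurrences of the term
        final List<Integer> positions = new ArrayList<>();  // token index of each occurrence
        final List<int[]> offsets = new ArrayList<>();      // {startChar, endChar} of each occurrence
    }

    // Assumed tokenizer: split on whitespace and lowercase (real analyzers differ).
    static Map<String, Entry> build(String text) {
        Map<String, Entry> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            Entry e = vector.computeIfAbsent(term, k -> new Entry());
            e.count++;
            e.positions.add(position++);
            e.offsets.add(new int[] {start, i}); // end offset is exclusive
        }
        return vector;
    }

    public static void main(String[] args) {
        // Print every term with its count, positions and character offsets.
        for (Map.Entry<String, Entry> me : build("the cat saw the dog").entrySet()) {
            Entry e = me.getValue();
            StringBuilder offs = new StringBuilder();
            for (int[] o : e.offsets) {
                offs.append('[').append(o[0]).append(',').append(o[1]).append(')');
            }
            System.out.println(me.getKey() + " count=" + e.count
                    + " positions=" + e.positions + " offsets=" + offs);
        }
    }
}
```

In this model the count is a single number per term, while positions and offsets grow with every occurrence, which is why I assume the richer options cost more to store.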
Thanks.