3

I am using the Lucene library to build an index and search it. Now I would like to get the 30 words that occur most often in my text. How can I do that?

4

2 Answers

1

If you are using Lucene 4.0 or later, you can use the HighFreqTerms class (package org.apache.lucene.misc, shipped in the lucene-misc module), for example:

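import org.apache.lucene.misc.HighFreqTerms;
import org.apache.lucene.misc.TermStats;

// reader is an open IndexReader; some later releases take an extra Comparator<TermStats> argument, so check your version's Javadoc
TermStats[] commonTerms = HighFreqTerms.getHighFreqTerms(reader, 30, "mytextfield");
for (TermStats commonTerm : commonTerms) {
    System.out.println(commonTerm.termtext.utf8ToString()); // or whatever you need to do with it
}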

From each TermStats object, you can get the frequencies (docFreq and totalTermFreq), the field name (field), and the term text (termtext).
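If it helps to see the idea without the library, here is a minimal plain-Java sketch of a top-N term count. It is not Lucene's implementation and does not read an index: the TopTerms class, the topTerms method, and the simple lowercase, split-on-non-word-characters tokenizer are all made up for illustration. It counts words in a string and keeps only the n most frequent ones in a bounded min-heap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopTerms {

    // Hypothetical helper (not part of Lucene): returns up to n words from
    // the text, ordered from most frequent to least frequent.
    public static List<Map.Entry<String, Integer>> topTerms(String text, int n) {
        // Count occurrences of each lowercased word.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }

        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();

        // Min-heap that never holds more than n entries: whenever it grows
        // past n, the entry with the smallest count is dropped.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll();
            }
        }

        // Sort the survivors so the most frequent word comes first.
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed());
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : topTerms("the cat and the dog and the bird", 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The order of words with equal counts is unspecified here. On a real index you would still want HighFreqTerms, since it reads the term statistics Lucene has already stored instead of re-tokenizing the text.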

Answered 2013-10-03T19:12:43.533
0

A quick search on SO turned up this: Get highest frequency terms from Lucene index

Does that work for you? It sounds like exactly the same question.

Answered 2013-10-03T17:45:07.620