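I want to use Lucene in a Java application to compute the support and confidence of words. I have more than 500 .txt documents and an ArrayList that contains two terms, term i and term j.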

The formula for confidence:

Dti-tj / Dti

Dti-tj: number of documents that contain both term i and term j
Dti: number of documents that contain term i

The formula for support:

Dti-tj / D

Dti-tj: number of documents that contain both term i and term j
D: total number of documents in the collection
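
To make the two formulas concrete, here is a minimal plain-Java sketch that evaluates them on a tiny in-memory collection, without Lucene. The class name `SupportConfidence`, the helpers `doc` and `countContaining`, and the four sample documents are all hypothetical and only illustrate the arithmetic. Each document is assumed to be already reduced to its set of distinct terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SupportConfidence {

    // Hypothetical helper: build one document as a set of distinct terms.
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    // Number of documents that contain every one of the given terms.
    static int countContaining(List<Set<String>> docs, String... terms) {
        int count = 0;
        for (Set<String> d : docs) {
            if (d.containsAll(Arrays.asList(terms))) {
                count++;
            }
        }
        return count;
    }

    // Support = Dti-tj / D
    static double support(List<Set<String>> docs, String ti, String tj) {
        return (double) countContaining(docs, ti, tj) / docs.size();
    }

    // Confidence = Dti-tj / Dti (NaN if no document contains ti)
    static double confidence(List<Set<String>> docs, String ti, String tj) {
        return (double) countContaining(docs, ti, tj) / countContaining(docs, ti);
    }

    public static void main(String[] args) {
        // Made-up sample collection with D = 4 documents.
        List<Set<String>> docs = Arrays.asList(
            doc("lucene", "java", "index"),
            doc("lucene", "search"),
            doc("java", "search"),
            doc("lucene", "java"));
        System.out.println("support    = " + support(docs, "lucene", "java"));
        System.out.println("confidence = " + confidence(docs, "lucene", "java"));
    }
}
```

In this sample, two of the four documents contain both "lucene" and "java", and three contain "lucene", so support is 2/4 and confidence is 2/3. Note that the order of the terms matters for confidence but not for support.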

Is it possible to use Lucene to search for and count the words? Which classes do I have to use?

1 Answer

I would simply search for your two terms, termI and termJ, and take your counts from the totalHits of the search results.

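// Uses the older Lucene API (as in 4.x), where BooleanQuery is mutable and totalHits is an int.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// indexReader is an open IndexReader; fieldName is the indexed text field.
int docCount = indexReader.numDocs();  // D
IndexSearcher searcher = new IndexSearcher(indexReader);

Query queryI = new TermQuery(new Term(fieldName, termI));
Query queryJ = new TermQuery(new Term(fieldName, termJ));

// MUST on both clauses: match only documents that contain term i AND term j.
BooleanQuery queryIJ = new BooleanQuery();
queryIJ.add(queryI, BooleanClause.Occur.MUST);
queryIJ.add(queryJ, BooleanClause.Occur.MUST);

// totalHits is the full match count; search() needs a top-N of at least 1.
int maxDocs = Math.max(1, docCount);
int countI = searcher.search(queryI, maxDocs).totalHits;    // Dti
int countIJ = searcher.search(queryIJ, maxDocs).totalHits;  // Dti-tj

float confidence = (float) countIJ / (float) countI;
float support = (float) countIJ / (float) docCount;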
answered 2013-06-20T15:49:23.687