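I'm trying to compute high-PMI terms for a particular token in PyLucene. A coworker gave me some working Java code, but I'm having trouble translating it to Python. In particular, the code relies on a custom collector. Here is the initial query code: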
    def __init__(self, some_token, searcher, analyzer):
        super(PMICalculator, self).__init__()
        self.searcher = searcher
        self.analyzer = analyzer
        self.escaped_token = QueryParser.escape(some_token)
        self.query = QueryParser("text", self.analyzer).parse(self.escaped_token)
        self.term_count_collector = TermCountCollector(searcher)
        self.searcher.search(self.query, self.term_count_collector)
        self.terms = self.term_count_collector.getTerms()
And here is the term count collector class: http://snipt.org/vgGi8
This code breaks at the self.searcher.search call with the following error:
    File <filename>, line 26, in __init__
      self.searcher.search(self.query, self.term_count_collector)
    lucene.JavaError: org.apache.jcc.PythonException: collect() takes exactly 2 arguments (3 given)
    TypeError: collect() takes exactly 2 arguments (3 given)
    Java stacktrace:
    org.apache.jcc.PythonException: collect() takes exactly 2 arguments (3 given)
    TypeError: collect() takes exactly 2 arguments (3 given)
        at org.apache.pylucene.search.PythonHitCollector.collect(Native Method)
        at org.apache.lucene.search.HitCollectorWrapper.collect(HitCollectorWrapper.java:46)
        at org.apache.lucene.search.TermScorer.score(TermScorer.java:86)
        at org.apache.lucene.search.TermScorer.score(TermScorer.java:74)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
        at org.apache.lucene.search.Searcher.search(Searcher.java:110)
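Reading the trace, collect() is reached through PythonHitCollector and HitCollectorWrapper, which suggests it is being called with a document id and a score, while my method apparently accepts only one argument besides self. Below is a minimal, Lucene-free sketch that reproduces the same kind of TypeError. The names HitCollectorStandIn, OneArgCollector, TwoArgCollector and try_collect are hypothetical stand-ins, not PyLucene classes, and the exact wording of the message differs between Python versions.

```python
class HitCollectorStandIn(object):
    """Hypothetical stand-in for a base class whose caller passes (doc, score)."""

    def run(self, hits):
        # Mimic the searcher side: hand every hit to collect() as (doc, score).
        for doc, score in hits:
            self.collect(doc, score)


class OneArgCollector(HitCollectorStandIn):
    """Declares collect(self, doc), so a (doc, score) call raises TypeError."""

    def __init__(self):
        self.docs = []

    def collect(self, doc):
        self.docs.append(doc)


class TwoArgCollector(HitCollectorStandIn):
    """Declares collect(self, doc, score), matching what the caller passes."""

    def __init__(self):
        self.docs = []

    def collect(self, doc, score):
        self.docs.append(doc)


def try_collect(collector, hits):
    """Return None if collection succeeds, else the TypeError message."""
    try:
        collector.run(hits)
    except TypeError as exc:
        return str(exc)
    return None


hits = [(0, 1.5), (3, 0.7)]  # made-up (doc id, score) pairs
print(try_collect(OneArgCollector(), hits))  # an arity error message
print(try_collect(TwoArgCollector(), hits))  # None
```

If that reading is right, the arity of collect() would have to match whichever base class the collector extends, but I don't know which signature is the intended one here.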
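I did some googling, but to no avail. I'm new to Lucene and can't tell whether this is simply a feature that 2.9.4 doesn't support, a PyLucene problem, or a mistake in my code. Please help!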
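For context, this is roughly what I want to do with the counts once the collector works. It is a minimal sketch of the standard document-level PMI definition, log(P(token, term) / (P(token) * P(term))), and not necessarily what my coworker's Java code does. The function names, the dictionary layout, and the numbers are all made up for illustration.

```python
import math


def pmi(joint_count, token_count, term_count, total_docs):
    """PMI from document counts: log(P(token, term) / (P(token) * P(term)))."""
    if min(joint_count, token_count, term_count, total_docs) <= 0:
        raise ValueError("all counts must be positive")
    p_joint = joint_count / float(total_docs)
    p_token = token_count / float(total_docs)
    p_term = term_count / float(total_docs)
    return math.log(p_joint / (p_token * p_term))


def top_pmi_terms(cooccurrence, token_count, term_totals, total_docs, k=10):
    """Rank terms co-occurring with the token by PMI, highest first."""
    scored = [
        (term, pmi(n, token_count, term_totals[term], total_docs))
        for term, n in cooccurrence.items()
        if n > 0
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical counts: the token appears in 100 of 1000 documents.
cooccurrence = {"lucene": 50, "the": 90}   # docs containing both token and term
term_totals = {"lucene": 60, "the": 900}   # docs containing the term at all
ranking = top_pmi_terms(cooccurrence, 100, term_totals, 1000)
print(ranking)
```

With these made-up numbers a rare term that mostly appears alongside the token ranks above a common term that appears everywhere, which is the behaviour I'm after.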