
I am using Python 3 with NLTK and the Stanford dependency parser to parse a list of sentences, and then collecting all the node information from the resulting graphs. Here is my code, which runs under Python 3 in a virtualenv named .python:

    from nltk.parse.stanford import StanfordDependencyParser

    # point the wrapper at the parser jar and the matching models jar
    parser = StanfordDependencyParser('stanford-parser-full-2015-12-09/stanford-parser.jar', 'stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar')
    # parse every sentence and flatten the per-sentence node dicts into one list
    graph_nodes = sum([[dep_graph.nodes for dep_graph in dep_graphs] for dep_graphs in parser.raw_parse_sents(sentences)], [])

I found that the Stanford dependency parser keeps throwing an AssertionError on certain sentences. Here is the traceback I get:

    graph_nodes = sum([[dep_graph.nodes for dep_graph in dep_graphs] for dep_graphs in self.parser.raw_parse_sents(sentences)], []);
      File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/stanford.py", line 150, in raw_parse_sents
        return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
      File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/stanford.py", line 91, in _parse_trees_output
        res.append(iter([self._make_tree('\n'.join(cur_lines))]))
      File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/stanford.py", line 339, in _make_tree
        return DependencyGraph(result, top_relation_label='root')
      File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/dependencygraph.py", line 84, in __init__
        top_relation_label=top_relation_label,
      File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/dependencygraph.py", line 328, in _parse
        assert cell_number == len(cells)
    AssertionError

I then tracked down the sentence that triggers the error. Here it is:

    'for all of its insights into the dream world of teen life , and its electronic expression through cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 1/2-hour running time . \n'

I modified the sentence several times to narrow down what triggers the AssertionError. It seems the sentence parses fine once I remove the '/' characters; whenever a '/' is present, the AssertionError is raised.
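
As a stopgap I am parsing one sentence at a time and skipping the offenders, using raw_parse (the single-sentence counterpart of raw_parse_sents). This is just a sketch of a workaround, not a fix for the underlying problem:

    # Parse sentence by sentence so one bad sentence doesn't abort the batch.
    graph_nodes = []
    failed = []
    for sentence in sentences:
        try:
            graph_nodes.extend(dep_graph.nodes for dep_graph in parser.raw_parse(sentence))
        except AssertionError:
            failed.append(sentence)  # e.g. the '2 1/2-hour' sentence above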

I suspect some special character is causing the problem. I went back to the NLTK source to see what drives this assertion (search for "assert" on http://www.nltk.org/_modules/nltk/parse/dependencygraph.html) but could not work out what causes the error.
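
As far as I can tell from that source, the assertion checks that every row of the tabular parse output splits into the same number of cells as the first row. Here is a toy reproduction of that reading, with made-up CoNLL-style strings (nothing here comes from the actual Stanford output):

    from nltk.parse.dependencygraph import DependencyGraph

    # Two rows of four cells each (word, tag, head, rel): parses fine.
    good = 'the\tDT\t2\tdet\nfilm\tNN\t0\troot'
    DependencyGraph(good, top_relation_label='root')

    # The second row has only three cells, so it no longer matches the
    # cell count established by the first row: same AssertionError.
    bad = 'the\tDT\t2\tdet\nfilm\tNN\t0'
    DependencyGraph(bad, top_relation_label='root')  # raises AssertionError

So presumably the '/' makes the wrapper emit a row whose cell count differs from the rest, but I don't see why.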

Can anyone explain why this error is thrown and how to resolve it?
