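This particular output format of the Stanford Parser is called a "bracketed parse (tree)". It is meant to be read as a graph with
- words as the leaf nodes (e.g. As, an, accountant)
- phrases/clauses as the labels of the internal nodes (e.g. S, NP, VP)
- edges that link the nodes hierarchically, and
- typically, an artificial ROOT (or TOP) node added by the parser as the root of the parse
(In this case, you can read it as a directed acyclic graph (DAG), since the edges point in one direction and there are no cycles.)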
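To make the graph reading concrete, here is a minimal sketch in plain Python that turns a bracketed parse into nested `(label, children)` tuples and lists the parent-to-child edges. The helper names `parse_bracketed` and `edges` are hypothetical, chosen for illustration; this is not how the Stanford Parser or NLTK do it internally, and it assumes the input is well formed with balanced parentheses.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Words (leaves) are kept as plain strings. Illustrative sketch only:
    it assumes balanced, well-formed input and does no error recovery.
    """
    # A token is "(", ")", or a run of characters that is neither
    # whitespace nor a parenthesis (a label or a word).
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        pos += 1
        label = tokens[pos]  # e.g. ROOT, S, NP, IN
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested phrase or tag
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read_node()


def edges(node):
    """Yield (parent, child) pairs, always directed from parent to child."""
    label, children = node
    for child in children:
        if isinstance(child, tuple):
            yield (label, child[0])
            yield from edges(child)
        else:
            yield (label, child)


output = "(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))"
tree = parse_bracketed(output)
edge_list = list(edges(tree))
print(edge_list[:4])
# [('ROOT', 'S'), ('S', 'PP'), ('PP', 'IN'), ('IN', 'As')]
```

Every edge points from a parent to a child and no node is ever reached twice, which is why the structure can be treated as a DAG (more specifically, a tree). Labels such as NP repeat, so a real graph representation would need unique node ids rather than bare labels.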
There are libraries that can read bracketed parses, e.g. nltk.tree.Tree in NLTK (http://www.nltk.org/howto/tree.html):
>>> from nltk.tree import Tree
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
>>> parsetree = Tree.fromstring(output)
>>> print(parsetree)
(ROOT
  (S
    (PP (IN As) (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP
      (VBP want)
      (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print()
                            ROOT
                             |
                             S
       ______________________|________
      |                  |            VP
      |                  |    ________|____
      |                  |   |             S
      |                  |   |             |
      |                  |   |             VP
      |                  |   |     ________|___
      PP                 |   |    |            VP
   ___|___               |   |    |    ________|___
  |       NP             NP  |    |   |            NP
  |    ___|______        |   |    |   |         ___|_____
  IN  DT         NN     PRP VBP   TO  VB       DT        NN
  |   |          |       |   |    |   |        |         |
  As  an     accountant  I  want  to make      a      payment
>>> parsetree.leaves()
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
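Beyond listing the leaves, the same Tree object can be queried for specific phrases. The following is a small sketch, assuming NLTK 3 is installed, that pulls out the noun phrases and the word/tag pairs using `Tree.subtrees()`, `Tree.label()` and `Tree.pos()`. The variable names `noun_phrases` and `tagged` are just illustrative.

```python
from nltk.tree import Tree

output = "(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))"
parsetree = Tree.fromstring(output)

# Collect the text of every noun phrase, i.e. every subtree labelled NP.
noun_phrases = [
    " ".join(subtree.leaves())
    for subtree in parsetree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)
# ['an accountant', 'I', 'a payment']

# Pair each word with the part-of-speech label directly above it.
tagged = parsetree.pos()
print(tagged[:3])
# [('As', 'IN'), ('an', 'DT'), ('accountant', 'NN')]
```

Filtering on the label is usually the simplest way to pull constituents such as NP or VP out of a bracketed parse without walking the tree by hand.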