I am exploring the amazing Python library spaCy, and I got this:
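import spacy
nlp = spacy.load('en_core_web_sm')  # assumption: the post does not say which model was loaded
text='The Titanic managed to sail into the coast intact, and Conan went to Chicago.'
spacy_doc = nlp(text)
token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]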
token_pos
['DET', 'PROPN', 'VERB', 'PART', 'VERB', 'ADP', 'DET', 'NOUN', 'SPACE', 'ADJ', 'PUNCT', 'CCONJ', 'PROPN', 'VERB', 'ADP', 'PROPN', 'PUNCT']
token_tag
['DT', 'NNP', 'VBD', 'TO', 'VB', 'IN', 'DT', 'NN', '_SP', 'JJ', ',', 'CC', 'NNP', 'VBD', 'IN', 'NNP', '.']
token_dep
['det', 'nsubj', 'ROOT', 'aux', 'xcomp', 'prep', 'det', 'pobj', '', 'advcl', 'punct', 'cc', 'nsubj', 'conj', 'prep', 'pobj', 'punct']
Tree:
from nltk import Tree

def to_nltk_tree(node):
    # Recurse into tokens that have children; leaf tokens are returned as plain strings
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]
                          managed
  ___________________________|__________________________
  |    |       |             sail                      |
  |    |       |      ________|_________               |
  |    |       |      |      |       into             went
  |    |       |      |      |         |         ______|_______
  |    |    Titanic   |      |       coast       |     |     to
  |    |       |      |      |      ___|___      |     |      |
  ,   and     The    to   intact   the         Conan   .   Chicago
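Question: I am confused about the dependency relation between "managed" and "went". It is labelled "conj".
(1) Is this a misclassification? If it is, what would the correct label be? If it is not, could you explain why "conj" applies here?
(2) Is there a way to distinguish this case from the one below?
spaCy explains "conj" as "conjunct":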
spacy.explain('conj')
Out[59]: 'conjunct'
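According to the Stanford dependencies manual:
A conjunct is the relation between two elements connected by a coordinating conjunction, such as "and", "or", etc.:
"Bill is big and honest"
"They either ski or snowboard"
conj(big, honest)
conj(ski, snowboard)
Now look at the last sentence: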
text='They either ski or snowboard.'
spacy_doc = nlp(text)
token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]
print(token_pos)
['PRON', 'CCONJ', 'VERB', 'CCONJ', 'NOUN', 'PUNCT']
print(token_tag)
['PRP', 'CC', 'VBP', 'CC', 'NN', '.']
print(token_dep)
['ROOT', 'preconj', 'appos', 'cc', 'conj', 'punct']
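The three lists are easier to check against the words when they are lined up per token. A minimal sketch in plain Python, using the values printed above; the `words` list is an assumption about how the sentence was split (six tokens, matching the six labels). With a live doc, `[token.text for token in spacy_doc]` would supply it instead.

```python
# Values copied from the printed lists; `words` is an assumed tokenization
# (six tokens, one per label), not output taken from the post.
words = ['They', 'either', 'ski', 'or', 'snowboard', '.']
token_pos = ['PRON', 'CCONJ', 'VERB', 'CCONJ', 'NOUN', 'PUNCT']
token_tag = ['PRP', 'CC', 'VBP', 'CC', 'NN', '.']
token_dep = ['ROOT', 'preconj', 'appos', 'cc', 'conj', 'punct']

# One row per token: (text, coarse POS, fine-grained tag, dependency label)
rows = list(zip(words, token_pos, token_tag, token_dep))
for word, pos, tag, dep in rows:
    print(f"{word:<10} {pos:<6} {tag:<4} {dep}")
```

Read this way, it is easy to see that "They" carries the ROOT label, "ski" is attached as "appos", and "snowboard" is tagged NOUN while still being the "conj" of "ski".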
[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]
       They
 _______|________
 |             ski
 |      ________|________
 .    either   or   snowboard
The dependency relation between "ski" and "snowboard" is also "conj", and in this case it seems to be the correct classification.
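Regarding question (2), one possible heuristic (a sketch, not something spaCy provides, and it does not settle question (1)) is to check whether the "conj" token has a subject of its own. "went" has "Conan" as its nsubj, so two full clauses are coordinated, whereas "snowboard" has no subject child. The `Tok` class and `build` helper below are hypothetical stand-ins so the example runs without a model; the head indices are transcribed by hand from the two trees above, and the head of the whitespace token is a guess.

```python
class Tok:
    """Hypothetical stand-in exposing the spaCy Token attributes used below."""
    def __init__(self, text, dep_):
        self.text = text
        self.dep_ = dep_
        self.head = self      # a root token is its own head, as in spaCy
        self.children = []

def build(words, deps, heads):
    """Link stand-in tokens from parallel lists; heads are token indices."""
    toks = [Tok(w, d) for w, d in zip(words, deps)]
    for i, h in enumerate(heads):
        if h != i:
            toks[i].head = toks[h]
            toks[h].children.append(toks[i])
    return toks

def conj_pairs(tokens):
    """Return (head, conjunct, conjunct_has_own_subject) for every 'conj' token."""
    pairs = []
    for tok in tokens:
        if tok.dep_ == 'conj':
            own_subject = any(c.dep_ in ('nsubj', 'nsubjpass') for c in tok.children)
            pairs.append((tok.head.text, tok.text, own_subject))
    return pairs

# Head indices read off the printed trees (assumption: the whitespace
# token at index 8 hangs off 'coast').
titanic = build(
    ['The', 'Titanic', 'managed', 'to', 'sail', 'into', 'the', 'coast', ' ',
     'intact', ',', 'and', 'Conan', 'went', 'to', 'Chicago', '.'],
    ['det', 'nsubj', 'ROOT', 'aux', 'xcomp', 'prep', 'det', 'pobj', '',
     'advcl', 'punct', 'cc', 'nsubj', 'conj', 'prep', 'pobj', 'punct'],
    [1, 2, 2, 4, 2, 4, 7, 5, 7, 4, 2, 2, 13, 2, 13, 14, 13],
)
skiing = build(
    ['They', 'either', 'ski', 'or', 'snowboard', '.'],
    ['ROOT', 'preconj', 'appos', 'cc', 'conj', 'punct'],
    [0, 2, 0, 2, 2, 0],
)

print(conj_pairs(titanic))  # [('managed', 'went', True)]
print(conj_pairs(skiing))   # [('ski', 'snowboard', False)]
```

Because `conj_pairs` only reads `.text`, `.dep_`, `.head` and `.children`, it should also accept a real `spacy_doc` directly in place of the stand-in lists.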