I'm exploring the amazing spaCy Python library, and I got this:

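import spacy

nlp = spacy.load('en_core_web_sm')  # assumption: the post does not say which model was loaded

text='The Titanic managed to sail into the coast  intact, and Conan went to Chicago.'

spacy_doc = nlp(text)

token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]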

token_pos

['DET', 'PROPN', 'VERB', 'PART', 'VERB', 'ADP', 'DET', 'NOUN', 'SPACE', 'ADJ', 'PUNCT', 'CCONJ', 'PROPN', 'VERB', 'ADP', 'PROPN', 'PUNCT']

token_tag

['DT', 'NNP', 'VBD', 'TO', 'VB', 'IN', 'DT', 'NN', '_SP', 'JJ', ',', 'CC', 'NNP', 'VBD', 'IN', 'NNP', '.']

token_dep

['det', 'nsubj', 'ROOT', 'aux', 'xcomp', 'prep', 'det', 'pobj', '', 'advcl', 'punct', 'cc', 'nsubj', 'conj', 'prep', 'pobj', 'punct']

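from nltk import Tree

def to_nltk_tree(node):
    # Tokens with children become subtrees; leaf tokens are returned as plain strings.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_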

[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]

                    managed                                 
  _____________________|_________________________            
 |   |     |          sail                       |          
 |   |     |      _____|__________               |           
 |   |     |     |     |         into           went        
 |   |     |     |     |          |          ____|______     
 |   |  Titanic  |     |        coast       |    |      to  
 |   |     |     |     |      ____|____     |    |      |    
 ,  and   The    to  intact the           Conan  .   Chicago

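Question: I'm confused about the dependency between "managed" and "went". It is labeled "conj". (1) Is this a misclassification? If it is, what would the correct label be? If not, can you explain why it is right? (2) spaCy explains this label as "conjunct". Is there a way to distinguish this case from the one below?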

spacy.explain('conj')
Out[59]: 'conjunct'

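According to the Stanford dependencies manual:

A conjunct is the relation between two elements connected by a coordinating conjunction, such as "and", "or", etc.:

"Bill is big and honest"

"They either ski or snowboard"

conj(big, honest)

conj(ski, snowboard)

Now look at the last sentence: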

text='They either ski or snowboard.'

spacy_doc = nlp(text)

token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]

print(token_pos)
['PRON', 'CCONJ', 'VERB', 'CCONJ', 'NOUN', 'PUNCT']

print(token_tag)
['PRP', 'CC', 'VBP', 'CC', 'NN', '.']

print(token_dep)
['ROOT', 'preconj', 'appos', 'cc', 'conj', 'punct']

[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]
           They              
  __________|____             
 |              ski          
 |     __________|______      
 .  either       or snowboard

The dependency relation between "ski" and "snowboard" is also "conj", and in this case it seems to be the correct classification.

2 Answers

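Yes, I believe this is correct.

text='The Titanic managed to sail into the coast  intact, and Conan went to Chicago.'

In this example, the words "managed" and "went" are connected by the word "and", which is a coordinating conjunction.

This matches the definition you quoted from the Stanford dependencies manual exactly:

A conjunct is the relation between two elements connected by a coordinating conjunction, such as "and", "or", etc.:

"Bill is big and honest"

"They either ski or snowboard"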
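
To make the relation concrete, here is a minimal sketch that pulls the conj arc out of the first sentence. It works on plain tuples rather than spaCy tokens so it runs without a model. The labels are copied from the output in the question, while the head indices are my own reading of the printed tree, so treat them as an assumption. `conj_pairs` is a hypothetical helper, not part of spaCy.

```python
# Rows are (text, dep, head_index).
# dep labels: copied from the question's token_dep output.
# head_index: read off the printed tree by hand (an assumption, not spaCy output).
TOKENS = [
    ("The", "det", 1),
    ("Titanic", "nsubj", 2),
    ("managed", "ROOT", 2),   # the root is treated as its own head
    ("to", "aux", 4),
    ("sail", "xcomp", 2),
    ("into", "prep", 4),
    ("the", "det", 7),
    ("coast", "pobj", 5),
    (" ", "", 7),
    ("intact", "advcl", 4),
    (",", "punct", 2),
    ("and", "cc", 2),
    ("Conan", "nsubj", 13),
    ("went", "conj", 2),
    ("to", "prep", 13),
    ("Chicago", "pobj", 14),
    (".", "punct", 13),
]


def conj_pairs(tokens):
    """Return (first_conjunct, coordinators, later_conjunct) for every 'conj' arc."""
    pairs = []
    for text, dep, head in tokens:
        if dep == "conj":
            # Coordinators hang off the same head as the later conjunct.
            coordinators = [t for t, d, h in tokens if d == "cc" and h == head]
            pairs.append((tokens[head][0], coordinators, text))
    return pairs


print(conj_pairs(TOKENS))
```

With a real spaCy Doc the same information is available as token.dep_ and token.head, so no hand-written indices are needed.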

Answered 2020-08-07T21:11:21.367

I think the answer is in your question itself. "managed" and "went" are two elements connected by a coordinating conjunction, and that is what we see in spaCy's output:

text = 'The Titanic managed to sail into the coast  intact, and Conan went to Chicago.'

spacy_doc = nlp(text)
[(token.text, token.dep_) for token in spacy_doc]

Output:

[('The', 'det'),
 ('Titanic', 'nsubj'),
 ('managed', 'ROOT'),
 ('to', 'aux'),
 ('sail', 'xcomp'),
 ('into', 'prep'),
 ('the', 'det'),
 ('coast', 'pobj'),
 (' ', ''),
 ('intact', 'advmod'),
 (',', 'punct'),
 ('and', 'cc'),
 ('Conan', 'nsubj'),
 ('went', 'conj'),
 ('to', 'prep'),
 ('Chicago', 'pobj'),
 ('.', 'punct')]
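
On part (2) of the question, one rough way to tell the two sentences apart is to look at what sits on each end of the conj arc. The sketch below is only a heuristic of my own, not something spaCy provides: `describe_conj` and its three category names are hypothetical. The POS and dep labels are copied from the outputs in this thread, and the head indices are read off the printed trees by hand, so they are an assumption too.

```python
# Rows are (text, pos, dep, head_index).
# pos/dep: copied from the outputs posted in this thread.
# head_index: read off the printed trees by hand (an assumption, not spaCy output).
TITANIC = [
    ("The", "DET", "det", 1),
    ("Titanic", "PROPN", "nsubj", 2),
    ("managed", "VERB", "ROOT", 2),
    ("to", "PART", "aux", 4),
    ("sail", "VERB", "xcomp", 2),
    ("into", "ADP", "prep", 4),
    ("the", "DET", "det", 7),
    ("coast", "NOUN", "pobj", 5),
    (" ", "SPACE", "", 7),
    ("intact", "ADJ", "advmod", 4),
    (",", "PUNCT", "punct", 2),
    ("and", "CCONJ", "cc", 2),
    ("Conan", "PROPN", "nsubj", 13),
    ("went", "VERB", "conj", 2),
    ("to", "ADP", "prep", 13),
    ("Chicago", "PROPN", "pobj", 14),
    (".", "PUNCT", "punct", 13),
]

SKI = [
    ("They", "PRON", "ROOT", 0),
    ("either", "CCONJ", "preconj", 2),
    ("ski", "VERB", "appos", 0),
    ("or", "CCONJ", "cc", 2),
    ("snowboard", "NOUN", "conj", 2),
    (".", "PUNCT", "punct", 0),
]


def describe_conj(tokens):
    """Attach a rough, hypothetical category to every 'conj' arc."""
    labels = []
    for i, (text, pos, dep, head) in enumerate(tokens):
        if dep != "conj":
            continue
        head_text, head_pos = tokens[head][0], tokens[head][1]
        # Does the later conjunct bring its own subject?
        has_own_subject = any(d == "nsubj" and h == i for _, _, d, h in tokens)
        if pos != head_pos:
            kind = "mixed POS"       # the two conjuncts were tagged differently
        elif pos == "VERB" and has_own_subject:
            kind = "clause-level"    # two full clauses joined together
        else:
            kind = "phrase-level"    # two words or phrases sharing one clause
        labels.append((head_text, text, kind))
    return labels


print(describe_conj(TITANIC))
print(describe_conj(SKI))
```

In the first sentence both ends are verbs and "went" has its own subject ("Conan"), so the arc joins two clauses. In the second, "ski" was tagged VERB and "snowboard" NOUN, which hints that the tagger, not the conj label, is what went wrong there.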
Answered 2020-07-26T22:43:56.523