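Using spaCy in Python, I am trying to extract entities from a passive-voice sentence that has more than one subject.

sentence = "John and Jenny were accused of crimes by David"

My goal is to extract both "John" and "Jenny" from the sentence as nsubjpass entities.

However, I can only extract "John" as nsubjpass.

How can I extract both of them?

Note that although John is found as an entity in .ents, Jenny is labeled conj rather than nsubjpass. How can I improve this?

Code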

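import spacy

# Assumption: any English pipeline with a parser; "en_core_web_sm" is one option.
nlp = spacy.load("en_core_web_sm")
each_sentence3 = "John and Jenny were accused of crimes by David"
doc = nlp(each_sentence3)

# Collect the tokens the parser labels as passive nominal subjects.
passive_toks = [tok for tok in doc if tok.dep_ == "nsubjpass"]
if passive_toks:
    print(passive_toks)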

Result:

[John]

The entity list shows:

Code

print(list(doc.ents))

Result

[John, Jenny, David]

Now, if we inspect the whole sentence token by token, we see the following:

Code:

for tok in doc:
    print(tok, tok.dep_)

Result

John nsubjpass
and cc
Jenny conj
were auxpass
accused ROOT
of prep
crimes pobj
by agent
David pobj

Note that the second passive subject, Jenny, is labeled conj by spaCy rather than nsubjpass.

1 Answer

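Here is an example that uses the dependency parse to extract the subject together with its conjuncts.

There is also a Token.conjuncts attribute, but it only returns conjuncts attached directly to the token. See https://github.com/explosion/spaCy/issues/795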

each_sentence3 = "John and Jenny were accused of crimes by David"
sent = nlp(each_sentence3)

result = []
subj = None
for word in sent:
    if 'subj' in word.dep_:
        subj = word
        result.append(word)
    elif word.dep_ == 'conj' and word.head == subj:
        result.append(word)
print(result)


[John, Jenny]
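
The loop above only accepts a conj token whose head is the subject itself. If the parser chains conjuncts, attaching each one to the previous conjunct rather than to the first, a third name such as "Mary" in "John, Jenny and Mary" would be missed. A minimal sketch of one way around this is to climb the conj arcs from each token until a non-conj token is reached, and keep the token if that ancestor is a subject. To stay runnable without a model, the sketch uses a hypothetical `Tok` stand-in and a hand-written parse of an extended sentence; neither is real spaCy output, and the `PERSON` labels are assumed.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token (not part of spaCy)."""
    text: str
    dep: str            # dependency label, like tok.dep_
    head: int           # index of the head token in the sentence
    ent_type: str = ""  # entity label, like tok.ent_type_; "" if none


def coordinated_subjects(tokens):
    """Return each subject token and every token linked to it by a conj chain."""
    found = []
    for tok in tokens:
        top = tok
        # Climb conj arcs until we reach the first element of the coordination.
        while top.dep == "conj":
            top = tokens[top.head]
        if "subj" in top.dep:
            found.append(tok)
    return found


# Hand-built parse (an assumption) of:
# "John, Jenny and Mary were accused of crimes by David"
tokens = [
    Tok("John", "nsubjpass", 6, "PERSON"),
    Tok(",", "punct", 0),
    Tok("Jenny", "conj", 0, "PERSON"),
    Tok("and", "cc", 2),
    Tok("Mary", "conj", 2, "PERSON"),   # attached to Jenny, not to John
    Tok("were", "auxpass", 6),
    Tok("accused", "ROOT", 6),
    Tok("of", "prep", 6),
    Tok("crimes", "pobj", 7),
    Tok("by", "agent", 6),
    Tok("David", "pobj", 9, "PERSON"),
]

# Keep only the coordinated subjects that are also tagged as person entities.
names = [t.text for t in coordinated_subjects(tokens) if t.ent_type == "PERSON"]
print(names)  # ['John', 'Jenny', 'Mary']
```

With real spaCy tokens the same climb would presumably use `tok.head`, `tok.dep_` and `tok.ent_type_` in place of the stand-in fields. David is excluded because his pobj arc leads to the agent, not to a subject.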
Answered on 2017-02-14T08:02:34.527