python - How to get the span of a conjunct in spaCy?

Question

I use spaCy's token.conjuncts to get the conjuncts of each token.

However, token.conjuncts returns a tuple, whereas I would like to get a Span, for example:

import spacy
nlp = spacy.load("en_core_web_lg")

sentence = "I like to eat food at the lunch time, or even at the time between a lunch and a dinner"
doc = nlp(sentence)
for token in doc:
    conj = token.conjuncts
    print(type(conj))

#output: <class 'tuple'>

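Does anyone know how to convert this tuple into a Span?

Or is there a way to get the conjuncts as Span objects directly?

The reason I need a Span is that I want to use the conjunct (as a span) to locate its position, for example to find which noun chunk or which split it belongs to (however I choose to split the text).

Currently, I convert the tuple to str and iterate over all splits or noun chunks to check whether a split/noun chunk contains this conjunct.

However, there is a bug: when a conjunct (a token) appears in more than one split/noun chunk, I cannot tell which split actually contains it, because I only compare the str and not the index or id of the conjunct. If I had a Span for this conjunct, I could locate its exact position.

Please feel free to comment, and thanks in advance!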

score 2 · Accepted Answer

token.conjuncts returns a tuple of tokens. To get a span for each one, slice the doc: doc[conj.i: conj.i+1]

import spacy

nlp = spacy.load('en_core_web_sm')


sentence = "I like oranges and apples and lemons."


doc = nlp(sentence)

for token in doc:
    if token.conjuncts:
        conjuncts = token.conjuncts             # tuple of conjuncts
        print("Conjuncts for ", token.text)
        for conj in conjuncts:
            # conj is type of Token
            span = doc[conj.i: conj.i+1]        # Here's span
            print(span.text, type(span))
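
Since each conjunct is a Token with an index (conj.i), the position problem described in the question can also be handled by comparing indices rather than strings. The following is only a minimal sketch, assuming spaCy v3 (where Doc accepts heads and deps). The helper find_containing_span, the hand-annotated parse, and the splits list are all hypothetical and are used only so that the example runs without a downloaded model.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def find_containing_span(token, spans):
    """Return the first span whose token range covers token.i, else None."""
    for span in spans:
        if span.start <= token.i < span.end:
            return span
    return None


# Hand-annotated parse (an assumption for illustration, so no model is needed).
words = ["I", "like", "oranges", "and", "apples", "and", "lemons", "."]
heads = [1, 1, 1, 2, 2, 4, 4, 1]  # absolute index of each token's head
deps = ["nsubj", "ROOT", "dobj", "cc", "conj", "cc", "conj", "punct"]
doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# Hypothetical "splits" of the sentence, each one a Span of the same doc.
splits = [doc[0:3], doc[3:5], doc[5:8]]

conjuncts = doc[2].conjuncts  # conjuncts of "oranges"
located = {}
for conj in conjuncts:
    span = find_containing_span(conj, splits)
    # Record the token range of the split that holds this conjunct.
    located[conj.text] = (span.start, span.end)
    print(conj.text, "->", span.text)
```

With a parsed doc from a loaded pipeline, list(doc.noun_chunks) could be passed in place of splits, since noun chunks are also Span objects with start and end. Because the check uses token indices, a word that occurs several times in the sentence is still matched to the split that actually contains it.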
