spaCy includes the noun_chunks functionality for retrieving the set of noun phrases in a document. The function english_noun_chunks (attached below) uses word.pos == NOUN:
    def english_noun_chunks(doc):
        labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj',
                  'attr', 'root']
        np_deps = [doc.vocab.strings[label] for label in labels]
        conj = doc.vocab.strings['conj']
        np_label = doc.vocab.strings['NP']
        for i in range(len(doc)):
            word = doc[i]
            if word.pos == NOUN and word.dep in np_deps:
                yield word.left_edge.i, word.i+1, np_label
            elif word.pos == NOUN and word.dep == conj:
                head = word.head
                while head.dep == conj and head.head.i < head.i:
                    head = head.head
                # If the head is an NP, and we're coordinated to it, we're an NP
                if head.dep in np_deps:
                    yield word.left_edge.i, word.i+1, np_label
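I would like to get chunks from a sentence that match a regex-like pattern over part-of-speech tags. For example, a phrase built from zero or more adjectives followed by one or more nouns: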
{(<JJ>)*(<NN | NNS | NNP>)+}
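To make the pattern concrete, here is a minimal sketch of the behavior I am after, written in plain Python rather than against spaCy itself. It assumes the input is a list of (text, tag) pairs with Penn Treebank tags, which could be built from token.text and token.tag_. The name adj_noun_chunks is a hypothetical helper of my own, not something spaCy provides.

```python
import re

# Hypothetical helper, not part of spaCy.
# Pattern over an encoded tag string: zero or more <JJ>, then one or more
# of <NN>, <NNS> or <NNP>.
CHUNK_RE = re.compile(r"(?:<JJ>)*(?:<(?:NN|NNS|NNP)>)+")


def adj_noun_chunks(tagged):
    """Return the text of every JJ* (NN|NNS|NNP)+ run in `tagged`.

    `tagged` is assumed to be a list of (text, tag) pairs, for example
    [(t.text, t.tag_) for t in doc].
    """
    # Encode the tag sequence as a single string, e.g. "<DT><JJ><NN>".
    tag_string = "".join("<%s>" % tag for _, tag in tagged)

    # Map the character offset where each tag starts to its token index,
    # plus one final entry for the end of the string.
    offset_to_index = {}
    offset = 0
    for i, (_, tag) in enumerate(tagged):
        offset_to_index[offset] = i
        offset += len(tag) + 2  # two extra characters for "<" and ">"
    offset_to_index[offset] = len(tagged)

    chunks = []
    for match in CHUNK_RE.finditer(tag_string):
        begin = offset_to_index[match.start()]
        end = offset_to_index[match.end()]
        chunks.append(" ".join(text for text, _ in tagged[begin:end]))
    return chunks


if __name__ == "__main__":
    sentence = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"),
                ("fox", "NN"), ("jumps", "VBZ"), ("over", "IN"),
                ("lazy", "JJ"), ("dogs", "NNS")]
    print(adj_noun_chunks(sentence))
```

This only looks at the tag sequence and ignores the dependency labels that english_noun_chunks relies on, so the two approaches will not always agree on chunk boundaries.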
Is this possible without overriding the english_noun_chunks function?