1

只需说我有一个字符串,例如:

Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP

我想提取“/NNP/”之前出现的每个单词。这意味着我的输出是

讲座, 联合国教科文组织, 房子

我试过了:

print re.findall(r'/NNP/',string) 然后向后工作,但我不能随意。单词前面总是有一个空格,这可能会有所帮助。

编辑:删除列表中的错误。

4

2 回答 2

4

尝试这个:

s = 'Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP'

re.findall(r'(\S+)/NNP/', s)
=> ['Lecture', 'UNESCO', 'House']
于 2013-06-16T05:22:52.037 回答
2

向前看。

>>> re.findall('(?:\s|^)[^/]+(?=/NNP/)', 'Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP')
['Lecture', 'UNESCO', 'House']
于 2013-06-16T05:22:59.383 回答