2

I have some code that uses NLTK to find nouns and verbs.

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk


sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
print(sent)

It returns:

[('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

How can I remove only the entries tagged 'NN' from the list?

4 Answers

4

You can use a list comprehension to remove the 'NN' elements:

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk

sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence)
print([s for s in sent if s[1] != 'NN'])
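
The same filter can be tried without installing NLTK by starting from the tagged list shown in the question. This is only a minimal sketch: the helper name `drop_tag` is hypothetical, and the comparison is an exact match, so 'NNP' entries are kept.

```python
# Tagged output copied from the question, so NLTK is not needed to run this.
tagged = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
          ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]


def drop_tag(pairs, tag):
    """Return the (word, tag) pairs whose tag is not exactly `tag`."""
    # `drop_tag` is a hypothetical helper name, not part of NLTK.
    return [(word, t) for word, t in pairs if t != tag]


without_nn = drop_tag(tagged, 'NN')
print(without_nn)
```

Because the test is `!=` on the whole tag, only ('name', 'NN') is dropped.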
answered 2013-08-15T11:52:59.400
1

Here is another approach (taking advantage of tuple unpacking):

from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk

sentence = "Hello my name is Abhishek Mitra"
sentence = nltk.word_tokenize(sentence)
sent = pos_tag(sentence) 
sent_clean = [x for (x, y) in sent if y not in ('NN',)]

print(sent_clean)

Output:

['Hello', 'my', 'is', 'Abhishek', 'Mitra']

Explanation: in the line

sent_clean = [x for (x, y) in sent if y not in ('NN',)]

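after POS-tagging each word in the sentence, you extract the word from each (word, tag) tuple that the tagger produced. The condition for extracting a word is that the second part of the tuple, the tag, is not 'NN'.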
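
One detail is worth checking when a single tag is written in parentheses: without a trailing comma, Python treats ('NN') as a plain string, so `in` becomes a substring test instead of a tuple membership test. The short sketch below illustrates the difference; the one-letter tag 'N' is a made-up value used only to show the effect.

```python
single_string = ('NN')   # parentheses only: this is the str 'NN'
single_tuple = ('NN',)   # trailing comma: this is a one-element tuple

print(type(single_string).__name__)  # str
print(type(single_tuple).__name__)   # tuple

# 'N' is a hypothetical tag used only for illustration.
substring_match = 'N' in single_string  # substring test on a string
element_match = 'N' in single_tuple     # element test on a tuple

print(substring_match, element_match)
```

For the tags in this question both forms happen to give the same result, but the tuple form (or a plain `!=` comparison) avoids accidental substring matches.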

Similarly, if you want to eliminate several POS tags:

sent_clean2 = [x for (x,y) in sent if y not in ('PRP$', 'VBZ', 'NN')]

print(sent_clean2)

Output:

['Hello', 'Abhishek', 'Mitra']
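
When the list of tags to exclude grows, keeping them in a set may read more clearly. The following is a sketch under that assumption; `UNWANTED` and `words_without_tags` are hypothetical names, and the tagged list is copied from the question so the snippet runs without NLTK.

```python
tagged = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
          ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

# Hypothetical name: the set of tags to exclude.
UNWANTED = {'PRP$', 'VBZ', 'NN'}


def words_without_tags(pairs, unwanted):
    """Return only the words whose tag is not in `unwanted`."""
    return [word for word, tag in pairs if tag not in unwanted]


remaining = words_without_tags(tagged, UNWANTED)
print(remaining)
```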
answered 2019-03-14T15:58:07.667
0

I would use the filter function:

>>> list(filter(lambda pair: pair[1] != 'NN', sent))
[('Hello', 'NNP'), ('my', 'PRP$'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]
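
On Python 3, `filter` returns a lazy iterator rather than a list, and a lambda can no longer unpack a tuple in its parameter list. One way to keep the unpacking readable is a small named predicate, sketched below; `is_not_nn` is a hypothetical name, and the tagged list is copied from the question.

```python
tagged = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'),
          ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]


def is_not_nn(pair):
    """Predicate for filter(): True when the tag is not exactly 'NN'."""
    word, tag = pair  # unpack inside the body instead of the signature
    return tag != 'NN'


# list() materializes the iterator that filter() returns on Python 3.
filtered = list(filter(is_not_nn, tagged))
print(filtered)
```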
answered 2013-08-15T15:40:57.587
0
a = [('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Abhishek', 'NNP'), ('Mitra', 'NNP')]

c = [b for b in a if b[-1] != 'NN']
answered 2013-08-15T11:55:36.827