I am using the NLTK brown corpus to get the simplified tag set:
import nltk
from nltk.corpus import brown
brown_tags = []
for sent in brown.tagged_sents(categories='news', simplify_tags=True):
    brown_tags.extend([tag for (word, tag) in sent])
tag_set = set(brown_tags)
Then I get:
set(['', 'FW', 'DET', 'WH', "''", 'VBZ', 'VB+PPO', "'", ')', 'ADJ', 'PRO', ' *', ',', '.', 'TO', 'NUM', 'NP', ':', 'ADV', '``', 'VD', 'VG', 'VN', 'N ', 'P', 'EX', 'V', 'CNJ', 'UH', '(', 'MOD'])
Why does the set contain entries such as '', "''", ')', and so on? How can I remove these symbols?
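One possible way to drop them, as a minimal sketch: in the Brown corpus punctuation tokens are tagged with the punctuation mark itself, so those entries (and the empty string) can be filtered out by keeping only tags that contain at least one letter. The helper name `is_word_tag` is hypothetical, and `sample_tags` is just a subset copied from the output above so the snippet runs without downloading the corpus.

```python
def is_word_tag(tag):
    """Return True if the tag contains at least one alphabetic character."""
    return any(ch.isalpha() for ch in tag)


# A subset of the tags from the output above (assumed sample data).
sample_tags = {'', 'FW', 'DET', "''", 'VB+PPO', "'", ')', ',', '.',
               ':', '``', '(', 'N', 'V'}

# Keep only tags that look like word-class tags; punctuation tags and
# the empty string contain no letters and are dropped.
word_tags = {tag for tag in sample_tags if is_word_tag(tag)}

print(sorted(word_tags))
```

With the real data you would apply the same test to `tag_set`, for example `set(tag for tag in tag_set if is_word_tag(tag))`. Note that this filter also removes `'*'`, which as far as I know is the Brown tag for negation (not, n't), so add it back explicitly if you need it.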