If I try this code:
import nltk
pattern = [(r'(March)$','MAR')]
tagger=nltk.RegexpTagger(pattern)
print tagger.tag('He was born in March 1991')
I get this output:
[('H', None), ('e', None), (' ', None), ('w', None), ('a', None), ('s', None), (' ', None), ('b', None), ('o', None), ('r', None), ('n', None), (' ', None), ('i', None), ('n', None), (' ', None), ('M', None), ('a', None), ('r', None), ('c', None), ('h', None), (' ', None), ('1', None), ('9', None), ('9', None), ('1', None)]
What I actually expect is for the tagger to recognize the word "March" and give it the tag "MAR".
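
For what it's worth, the output suggests the string is being walked one character at a time: `tag()` takes a list of tokens, and iterating over a plain Python string yields single characters, so the pattern never sees the whole word "March". The sketch below tries to illustrate this with the standard library only. `tag_tokens` is a hypothetical, simplified stand-in for what a regexp tagger does (first matching pattern wins, otherwise `None`); it is not NLTK's actual implementation.

```python
import re

# Hypothetical stand-in for a regexp tagger, for illustration only:
# each token gets the label of the first pattern that matches it, else None.
def tag_tokens(tokens, patterns):
    tagged = []
    for token in tokens:
        tag = None
        for regexp, label in patterns:
            if re.match(regexp, token):
                tag = label
                break
        tagged.append((token, tag))
    return tagged

patterns = [(r'(March)$', 'MAR')]
sentence = 'He was born in March 1991'

# Passing the raw string: iteration yields characters, so no token is ever 'March'.
by_char = tag_tokens(sentence, patterns)

# Splitting on whitespace first yields word tokens, and 'March' can match.
by_word = tag_tokens(sentence.split(), patterns)
print(by_word)
```

Assuming the same reasoning carries over to NLTK, the likely change to the original code would be to pass a token list, for example `tagger.tag('He was born in March 1991'.split())`, or the result of a proper tokenizer.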