python - How do I use a regular expression tagger in nltk?

Question

If I try this code:

import nltk
pattern = [(r'(March)$','MAR')]
tagger = nltk.RegexpTagger(pattern)
print(tagger.tag('He was born in March 1991'))

I get this output:

[('H', None), ('e', None), (' ', None), ('w', None), ('a', None), ('s', None), (' ', None), ('b', None), ('o', None), ('r', None), ('n', None), (' ', None), ('i', None), ('n', None), (' ', None), ('M', None), ('a', None), ('r', None), ('c', None), ('h', None), (' ', None), ('1', None), ('9', None), ('9', None), ('1', None)]

What I actually want is for the tagger to recognize the word "March" and give it the tag "MAR".

score 6 · Accepted Answer

Try this:

import nltk
pattern = [(r'(March)$','MAR')]
tagger = nltk.RegexpTagger(pattern)
# Tokenize first so tag() receives a list of words, not a raw string
print(tagger.tag(nltk.word_tokenize('He was born in March 1991')))

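You have to tokenize the sentence into words first. tag() expects a list of tokens, and a plain string is treated as a sequence of single characters, which is why every letter came back tagged None.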

This is the output I get:

[('He', None), ('was', None), ('born', None), ('in', None), ('March', 'MAR'), ('1991', None)]
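
If it helps, here is a small sketch that puts the two cases side by side. It splits on whitespace with str.split() instead of nltk.word_tokenize, only so that it runs without the tokenizer data download; that swap is my assumption, not part of the answer above. The YEAR pattern is likewise a made-up extra, added to show that patterns are tried in order.

```python
import nltk

# Patterns are tried in order; the first regex that matches a token wins.
patterns = [
    (r'March$', 'MAR'),    # the token "March"
    (r'\d{4}$', 'YEAR'),   # hypothetical extra tag for a four-digit token
]
tagger = nltk.RegexpTagger(patterns)

sentence = 'He was born in March 1991'

# A string is a sequence of characters, so tag() sees 25 one-character
# tokens and none of them can match either pattern.
char_tags = tagger.tag(sentence)

# Splitting on whitespace hands tag() a list of words instead.
word_tags = tagger.tag(sentence.split())
print(word_tags)
# [('He', None), ('was', None), ('born', None), ('in', None), ('March', 'MAR'), ('1991', 'YEAR')]
```

For real text, nltk.word_tokenize is usually the better choice, since str.split() leaves punctuation attached to words (for example "1991." would not match the YEAR pattern).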
