python - 如何使 NLTK pos_tag 单词而不是字符？

Question

我有这段代码可以在句子中查找名词和动词。

 # -*- coding: utf-8 -*-
from nltk.corpus import wordnet as wn
from nltk import pos_tag
import nltk
syno =[]


sentence = '''His father suggested he study to become a parson instead, but Darwin was far more inclined to study natural history.DarwinDar·win (där'wĭn),Charles Robert.1809-1882.British naturalist who revolutionized the study of biology with his theory ofevolutionbased on natural selection
Like several scientists before him, Darwin believed all the life on earth evolved (developed gradually) over millions of years from a few common ancestors.'''

sent = pos_tag(word_tokenize(sentence))

这返回

[('H', 'NNP'), ('e', 'VBP'), ('l', 'NN'), ('l', 'NN'), ('o', 'NN'), (' ', ':'), ('m', 'NN'), ('y', 'NN'), (' ', ':'), ('n', 'NN'), ('a', 'DT'), ('m', 'NN'), ('e', 'NN'), (' ', ':'), ('i', 'PRP'), ('s', 'VBZ'), (' ', ':'), ('A', 'DT'), ('b', 'NN'), ('h', 'NN'), ('i', 'PRP'), ('s', 'VBZ'), ('h', 'JJ'), ('e', 'NN'), ('k', 'NN'), (' ', ':'), ('M', 'NNP'), ('i', 'PRP'), ('t', 'VBP'), ('r', 'JJ'), ('a', 'DT')]

我希望它对文字进行操作，而不是对字符进行操作！我该怎么做呢？

score 9 · Accepted Answer

You need to tokenize first:

>>> from nltk import pos_tag, word_tokenize
>>> sentence = "Hello my name is Derek. I live in Salt Lake city."
>>> pos_tag(word_tokenize(sentence))
[('Hello', 'NNP'), ('my', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Derek.', 'NNP'), ('I', 'PRP'), ('live', 'VBP'), ('in', 'IN'), ('Salt', 'NNP'), ('Lake', 'NNP'), ('city', 'NN'), ('.', '.')]

python - 如何使 NLTK pos_tag 单词而不是字符？

1 回答 1

Related

Reference