python - How to obtain better results using NLTK pos tag

Question

I am just learning NLTK with Python. I tried running pos_tag on various sentences, but the results are not accurate. How can I improve them? For example:

broke = NN
flimsy = NN
crap = NN

I am also getting a lot of extra words categorized as NN. How can I filter these out to get better results?

score 10 · Accepted Answer

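Please give the context in which you got these results. For example, running pos_tag on the phrase "They broke flimsy crap", so that the words have some context, gives me different results: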

import nltk
text=nltk.word_tokenize("They broke flimsy crap")
nltk.pos_tag(text)

[('They', 'PRP'), ('broke', 'VBP'), ('flimsy', 'JJ'), ('crap', 'NN')]

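Anyway, if you find that many words are, in your opinion, wrongly categorized as 'NN', you can apply some other technique specifically to the words tagged 'NN'. For example, you could take an appropriate tagged corpus and re-tag those words with a trigram tagger (essentially the same way the authors use bigrams at http://nltk.googlecode.com/svn/trunk/doc/book/ch05.html).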
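The n-gram idea from that chapter can be illustrated without NLTK. The following is a simplified pure-Python sketch, not NLTK's own implementation: it learns the most frequent tag for each trigram, bigram and unigram context from a tiny hand-tagged corpus (made up here purely for illustration; in practice you would train on a real tagged corpus) and falls back to 'NN' when no context matches. In NLTK itself you would get a similar chain by passing nltk.BigramTagger, nltk.UnigramTagger and nltk.DefaultTagger('NN') as the backoff of an nltk.TrigramTagger.

```python
from collections import Counter, defaultdict


def train_ngram_tables(tagged_sents):
    """Learn the most frequent tag per context for n = 1, 2, 3.

    Contexts are (word,), (prev_tag, word) and (prev2_tag, prev1_tag, word).
    """
    tables = {1: defaultdict(Counter), 2: defaultdict(Counter), 3: defaultdict(Counter)}
    for sent in tagged_sents:
        tags = [None, None]  # padding so the first words also have a context
        for word, tag in sent:
            tables[1][(word,)][tag] += 1
            tables[2][(tags[-1], word)][tag] += 1
            tables[3][(tags[-2], tags[-1], word)][tag] += 1
            tags.append(tag)
    # Keep only the most frequent tag for each context.
    return {n: {ctx: c.most_common(1)[0][0] for ctx, c in t.items()}
            for n, t in tables.items()}


def backoff_tag(words, tables, default="NN"):
    """Tag left to right: trigram context, then bigram, then unigram, then default."""
    tags = [None, None]
    out = []
    for word in words:
        tag = (tables[3].get((tags[-2], tags[-1], word))
               or tables[2].get((tags[-1], word))
               or tables[1].get((word,))
               or default)
        tags.append(tag)  # tags assigned so far become the history for the next word
        out.append((word, tag))
    return out


# Hypothetical toy training corpus, for illustration only.
train_sents = [
    [("They", "PRP"), ("broke", "VBD"), ("the", "DT"), ("flimsy", "JJ"), ("chair", "NN")],
    [("The", "DT"), ("chair", "NN"), ("was", "VBD"), ("flimsy", "JJ")],
    [("I", "PRP"), ("am", "VBP"), ("broke", "JJ")],
]
tables = train_ngram_tables(train_sents)
print(backoff_tag(["They", "broke", "flimsy", "crap"], tables))
# [('They', 'PRP'), ('broke', 'VBD'), ('flimsy', 'JJ'), ('crap', 'NN')]
```

Note how context decides the tag: with this toy corpus, "broke" comes out as VBD after "They" but as JJ in "I am broke", and the unseen word "crap" falls through to the default 'NN'.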

Something like this:

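import nltk
pos_tag_results = nltk.pos_tag(your_text)  # your_text: a list of tokens, e.g. from nltk.word_tokenize
trigram_tagger = nltk.TrigramTagger(tagged_corpora)  # train on your tagged corpus (a list of tagged sentences)
trigram_tag_results = trigram_tagger.tag(your_text)  # tag the same tokens with the trigram tagger
combined = []  # tuples are immutable, so build a new list instead of assigning into pos_tag_results
for (word, tag), (_, tri_tag) in zip(pos_tag_results, trigram_tag_results):
    # for 'NN' take the trigram tag instead, unless the trigram tagger returned None (unseen context)
    if tag == 'NN' and tri_tag is not None:
        tag = tri_tag
    combined.append((word, tag))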
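The replacement step only makes sense if both taggers saw exactly the same token sequence. A minimal sketch of that step as a reusable function, which also checks the alignment, might look like this; the function name override_tag and the sample tag lists are hypothetical, chosen to mirror the question, and the function works on any two lists of (word, tag) pairs.

```python
def override_tag(primary, secondary, target="NN"):
    """Return primary's (word, tag) pairs, but where primary's tag equals
    `target`, use secondary's tag instead (unless secondary's tag is None)."""
    if len(primary) != len(secondary):
        raise ValueError("both taggers must tag the same token sequence")
    merged = []
    for (word, tag), (other_word, other_tag) in zip(primary, secondary):
        if word != other_word:
            raise ValueError(f"token mismatch: {word!r} vs {other_word!r}")
        if tag == target and other_tag is not None:
            tag = other_tag
        merged.append((word, tag))
    return merged


# Hypothetical outputs of two taggers over the same tokens.
pos_tagged = [("They", "PRP"), ("broke", "NN"), ("flimsy", "NN"), ("crap", "NN")]
trigram_tagged = [("They", "PRP"), ("broke", "VBD"), ("flimsy", "JJ"), ("crap", None)]
print(override_tag(pos_tagged, trigram_tagged))
# [('They', 'PRP'), ('broke', 'VBD'), ('flimsy', 'JJ'), ('crap', 'NN')]
```

Raising on a length or token mismatch is a deliberate choice: silently zipping two differently tokenized sequences would pair tags with the wrong words.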

Let me know if it improves your results.
