
Can someone help me with the syntax for tagging a corpus with hunpos in NLTK?

  1. What do I need to import for the hunpos.HunPosTagger module?

  2. How do I HunPos-tag a corpus? See the code below.


import nltk 
from nltk.corpus import PlaintextCorpusReader  
from nltk.corpus.util import LazyCorpusLoader  

corpus_root = './'  
reader = PlaintextCorpusReader (corpus_root, '.*')  

ntuen = LazyCorpusLoader ('ntumultien', PlaintextCorpusReader, reader)  
ntuen.fileids()  
isinstance (ntuen, PlaintextCorpusReader)  


# So how do I hunpos tag `ntuen`? I can't get the following code to work.
# please help me to correct my python syntax errors, I'm new to python 
# but i really need this to work. sorry
##from nltk.tag import hunpos.HunPosTagger
ht = HunPosTagger('english.model')
for sentence in ntu.sent() ##looping through the no. of sentence
     ht.tag(ntusent()[i])

1 Answer

import nltk 
from nltk.tag.hunpos import HunposTagger
from nltk.tokenize import word_tokenize

corpus = "so how do i hunpos tag my ntuen ? i can't get the following code to work."

# Load the English model (the class is HunposTagger, not HunPosTagger).
ht = HunposTagger('en_wsj.model')
# tag() expects a list of tokens, so tokenize the raw string first.
print(ht.tag(word_tokenize(corpus)))

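I think the issue was that you were not tokenizing the text into words before tagging, but there may have been other reasons the code did not work (the class is HunposTagger, not HunPosTagger). I made this simplified example based on your question. If you run into any more problems, leave a comment.

I got everything from here: http://code.google.com/p/hunpos/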

Output of python hunpos.py:

[('so', 'RB'), ('how', 'WRB'), ('do', 'VBP'), ('i', 'FW'), ('hunpos', 'NN'), ('tag', 'NN'), ('my', 'PRP$'), ('ntuen', 'NN'), ('?', '.'), ('i', 'FW'), ('ca', 'MD'), ("n't", 'RB'), ('get', 'VB'), ('the', 'DT'), ('following', 'JJ'), ('code', 'NN'), ('to', 'TO'), ('work', 'VB'), ('.', '.')]
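
To tag a whole corpus rather than a single string, something along these lines might work. This is only a sketch under a few assumptions: the helper names tag_sentences and tag_plaintext_corpus are made up for illustration, the corpus is a directory of plain-text files, the hunpos-tag binary is somewhere NLTK can find it, and the Punkt data that PlaintextCorpusReader.sents() relies on is installed. The question's LazyCorpusLoader wrapper is left out because a PlaintextCorpusReader can be used directly.

```python
def tag_sentences(tagger, sentences):
    """Tag each pre-tokenized sentence.

    `tagger` is any object with a .tag(tokens) method (HunposTagger has one);
    `sentences` is an iterable of token lists, e.g. reader.sents().
    """
    return [tagger.tag(list(tokens)) for tokens in sentences]


def tag_plaintext_corpus(corpus_root, model_path, fileid_pattern=r'.*\.txt'):
    """Hypothetical helper: read plain-text files and hunpos-tag every sentence."""
    # Imported here so tag_sentences can be tried without NLTK or hunpos installed.
    from nltk.corpus import PlaintextCorpusReader
    from nltk.tag.hunpos import HunposTagger

    reader = PlaintextCorpusReader(corpus_root, fileid_pattern)
    tagger = HunposTagger(model_path)  # assumes the hunpos-tag binary can be located
    try:
        # reader.sents() yields one list of word tokens per sentence.
        return tag_sentences(tagger, reader.sents())
    finally:
        tagger.close()  # shut down the hunpos subprocess
```

Calling tag_plaintext_corpus('./', 'en_wsj.model') would then return one list of (token, tag) pairs per sentence. The default pattern matches only .txt files; the '.*' pattern from the question would also pick up the script and the model file sitting in the same directory.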

Answered 2011-02-23T22:16:01.473