python - Stemming and lemmatization in Python

Translated from: https://stackoverflow.com/questions/24837811 2014-07-19T07:17:59.183

Viewed 1418 times

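I have checked the other threads on this and tried some of the suggested solutions, but I am still having trouble with the Porter stemmer. I am trying to strip affixes, but the Porter stemmer reduces words to odd forms that are not correctly spelled words: for example, "language" becomes "languag", and a word like "reinforce" is likewise cut down to a misspelled stem.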
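The behaviour described above can be reproduced with a small sketch, assuming NLTK's `PorterStemmer` is the stemmer in use. The word list is illustrative and not taken from the original post. The Porter algorithm returns truncated stems rather than dictionary words, so output such as `languag` is expected behaviour rather than a bug.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative words (an assumption, not from the original post)
words = ["language", "languages", "reinforce"]

# Map each word to its Porter stem; stems are not guaranteed to be real words
stems = {w: stemmer.stem(w) for w in words}

for word, stem in stems.items():
    print(word, "->", stem)
```

Both the singular and the plural collapse to the same stem, which is what makes stems useful as matching keys even when they look misspelled.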

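I need to find the sentences that contain given words, and I am using TextBlob for that. Below is the code I am using. I extracted the text from this page: http://www.nltk.org/book/ch03.html. I searched for "language" using both PorterStemmer and WordNetLemmatizer. The WordNet lemmatizer only reduces plurals to singular.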

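import mechanize
import nltk
from bs4 import BeautifulSoup
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob

lmtzr = WordNetLemmatizer()
search_words = set(['language'])  # assumed; not defined in the original snippet
url = 'http://www.nltk.org/book/ch03.html'
# Fetch the page with a browser-like User-agent, ignoring robots.txt
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Chrome')]
html = br.open(url).read()
# nltk.clean_html() was removed in NLTK 3; BeautifulSoup strips the markup instead
raw = BeautifulSoup(html, 'html.parser').get_text()
tokens = nltk.wordpunct_tokenize(raw)
# lemmatize() treats every token as a noun by default
lemmas = [lmtzr.lemmatize(tok) for tok in tokens]
text = nltk.Text(lemmas)
sents = ' '.join([s.lower() for s in text])
blob = TextBlob(sents)
# Keep the sentences that contain at least one search word
matches = [str(s) for s in blob.sentences if search_words & set(s.words)]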
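One possible way around the odd-looking stems, sketched here under assumptions rather than offered as a definitive fix, is to use the Porter stem only as a matching key and return the original, unmodified sentences. The function name `find_sentences`, the sample text, and the regex-based sentence splitter are all hypothetical stand-ins introduced for illustration; TextBlob's sentence splitting could be used instead.

```python
import re
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def find_sentences(text, search_words):
    """Return sentences containing any search word, compared by Porter stem."""
    # Stems are used only as matching keys, so their odd spelling never shows
    wanted = {stemmer.stem(w.lower()) for w in search_words}
    # Naive split on ., ! or ? followed by whitespace (a simplifying assumption)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z]+", sentence.lower())
        if wanted & {stemmer.stem(tok) for tok in tokens}:
            hits.append(sentence)
    return hits

# Hypothetical sample text, not taken from the NLTK book page
sample = ("Natural languages are hard to process. "
          "Tokenization comes first. "
          "Python is a programming language.")
matches = find_sentences(sample, {"language"})
print(matches)
```

Because the search word and the tokens pass through the same stemmer, "language" matches "languages" as well, and the misspelled stem never appears in the result.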
