I'm doing automatic language detection in Python using stop words.

But I get a KeyError when I try to test the code. Here is the code:

import nltk
from nltk.corpus import stopwords

def scoreFunction(wholetext):
    dictiolist={}
    scorelist={}
    NLTKlanguages = ["dutch","finnish","german","italian","portuguese","spanish","turkish","danish","english","french","hungarian","norwegian","russian","swedish"]
    FREElanguages = [""]
    languages= NLTKlanguages + FREElanguages
    for lang in NLTKlanguages:
        dictiolist[lang]=stopwords.words(lang)
        tokens=nltk.tokenize.word_tokenize(wholetext)
        tokens=[t.lower() for t in tokens]
        freq_dist=nltk.FreqDist(tokens)
    for lang in languages:
        scorelist[lang]=0
    for word in freq_dist.keys()[0:20]:
        if word in dictiolist[lang]:
            scorelist[lang]+=1
    return scorelist

def whichLanguage(scorelist):
    maximum=0
    for item in scorelist:
        value = scorelist[item]
        if maximum<value:
            maximum = value
            lang = item
    return lang

When I run scoreFunction("hello my name is osfar and i'm very genius") I get this error:

Traceback (most recent call last):
  File "", line 1, in
    scoreFunction("hello my name is osfar and i'm very genius")
  File "C:/Users/osama1/Desktop/fun-test", line 17, in scoreFunction
    if word in dictiolist[lang]:
KeyError: ''

1 Answer

Your problem is in the following block of code:

for word in freq_dist.keys()[0:20]:
    if word in dictiolist[lang]:
        scorelist[lang]+=1

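The variable lang is used in this for loop, but this loop never assigns it. Python does not treat it as undefined: a loop variable stays bound after its loop ends, so lang still holds the last value it had in the previous for loop, which is "" (the empty string). dictiolist has no entry for "", hence KeyError: ''.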
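The leftover loop variable can be reproduced in isolation. The following is a minimal sketch, not the original program: the lists and the names stopword_lists, leftover and error_key are made up for illustration, and only mimic the shape of the question's data.

```python
# Hypothetical miniature of the question's data, for illustration only.
languages = ["english", "german", ""]
stopword_lists = {"english": ["the", "is"], "german": ["der", "ist"]}  # no "" key

scores = {}
for lang in languages:
    scores[lang] = 0

# The loop has finished, but its variable is still bound to the last item.
leftover = lang

try:
    stopword_lists[leftover]
    error_key = None
except KeyError as exc:
    # The missing key is available as the exception's first argument.
    error_key = exc.args[0]

print(repr(leftover), repr(error_key))
```

Both values printed are the empty string, which matches the KeyError: '' in the traceback.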

What you apparently meant to do is:

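for word in freq_dist.keys()[0:20]:
    # iterate over dictiolist, not languages: languages also contains "",
    # which has no stopword list and would raise the same KeyError
    for lang in dictiolist:
        if word in dictiolist[lang]:
            scorelist[lang]+=1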

By the way, there is a simpler way to do what you want: use a Counter. For details, see http://docs.python.org/2.7/library/collections.html#counter-objects
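One way the Counter idea might look is sketched below. This is an assumption about what a Counter-based version could be, not the code the answer had in mind. The names score_languages, which_language and sample_stopwords are hypothetical, the stopword lists are tiny made-up samples, and tokenization is a plain whitespace split instead of nltk.tokenize.word_tokenize. It also counts every stopword occurrence rather than looking only at the 20 most frequent words.

```python
from collections import Counter

def score_languages(text, stopword_lists):
    """Count, per language, how many tokens of text are stopwords."""
    # Naive whitespace tokenization (an assumption; the question uses
    # nltk.tokenize.word_tokenize instead).
    token_counts = Counter(text.lower().split())
    scores = Counter()
    for lang, words in stopword_lists.items():
        # A Counter returns 0 for missing keys, so no KeyError is possible here.
        scores[lang] = sum(token_counts[w] for w in set(words))
    return scores

def which_language(scores):
    """Return the best-scoring language, or None if nothing matched."""
    if not scores:
        return None
    lang, best = scores.most_common(1)[0]
    return lang if best > 0 else None

# Hypothetical, tiny stopword lists for illustration only.
sample_stopwords = {
    "english": ["my", "is", "and", "very", "the"],
    "german": ["mein", "ist", "und", "sehr", "der"],
}

scores = score_languages("hello my name is osfar and i'm very genius", sample_stopwords)
print(scores)
print(which_language(scores))
```

With NLTK installed and the stopwords corpus downloaded, the sample dictionary could be replaced by one built from stopwords.words(lang) for each language name, as in the question.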

answered 2013-04-24T08:56:09.773