I saw your post and thought I would play around with it. This is what I got.
I added a few print statements to see what was going on:
from enchant.checker import SpellChecker

text = "this is sme text with a speling mistake."
chkr = SpellChecker("en_US", text)
for err in chkr:
    print(err.word + " at position " + str(err.wordpos)) #<----
    err.replace("SPAM")
t = chkr.get_text()
print("\n" + t) #<----
Here is the result of running the code:
sme at position 8
speling at position 25
ing at position 29
ng at position 30
AMMstake at position 32
ake at position 37
ke at position 38
AMM at position 40
this is SPAM text with a SPAMSSPSPAM.SSPSPAM
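As you can see, as the misspelled words are replaced with "SPAM", the spell checker appears to change dynamically and to re-check the text it has just modified, because the err variable ends up containing fragments of "SPAM".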
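One way to sidestep this kind of re-scanning is to locate every misspelling in the original text first and only then apply the replacements, working from the end of the string so that earlier offsets stay valid. The following is a minimal sketch of that idea, not pyenchant's own behaviour: it deliberately avoids pyenchant, and `KNOWN_WORDS`, `find_misspellings`, and `replace_all` are hypothetical names, with the word set standing in for a real dictionary lookup.

```python
import re

# Hypothetical stand-in for a real dictionary check (an assumption for this sketch).
KNOWN_WORDS = {"this", "is", "text", "with", "a", "mistake"}


def find_misspellings(text):
    """Scan the ORIGINAL text once and return (start, word) for each unknown word."""
    return [
        (m.start(), m.group())
        for m in re.finditer(r"[A-Za-z]+", text)
        if m.group().lower() not in KNOWN_WORDS
    ]


def replace_all(text, replacement):
    """Replace every unknown word, applying edits right to left."""
    errors = find_misspellings(text)
    # Going backwards means a length change never shifts an offset
    # that still has to be processed.
    for start, word in reversed(errors):
        text = text[:start] + replacement + text[start + len(word):]
    return text


text = "this is sme text with a speling mistake."
print(find_misspellings(text))
print(replace_all(text, "SPAM"))
```

Because the scan finishes before any replacement happens, the replacement text is never itself inspected, so fragments such as "AMM" cannot show up as errors.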
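I also tried the original code from http://pythonhosted.org/pyenchant/api/enchant.checker.html, which looks like the example you used in your question, and still got some unexpected results.
Note: the only thing I added was the print statements.
Original: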
>>> text = "This is sme text with a fw speling errors in it."
>>> chkr = SpellChecker("en_US",text)
>>> for err in chkr:
...     err.replace("SPAM")
...
>>> chkr.get_text()
'This is SPAM text with a SPAM SPAM errors in it.'
My code:
from enchant.checker import SpellChecker

text = "This is sme text with a fw speling errors in it."
chkr = SpellChecker("en_US", text)
for err in chkr:
    print(err.word + " at position " + str(err.wordpos))
    err.replace("SPAM")
t = chkr.get_text()
print("\n" + t)
The output does not match the website:
sme at position 8
fw at position 25
speling at position 30
ing at position 34
ng at position 35
AMMrors at position 37 #<---- seems to add in parts of "SPAM"
This is SPAM text with a SPAM SPAMSSPSPAM in it. #<---- my output ???
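Anyway, here is what I came up with to work around some of the problems. Instead of replacing with "SPAM", I used a version of the code you posted for single-word replacement and replaced each error with an actual suggested word. It is important to note that in this example the "suggested" word is wrong 100% of the time. I have run into this problem before: how to implement spelling correction without user interaction. That goes well beyond the scope of your question, but I think you will need an array of NLP techniques to get accurate results.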
import enchant
from enchant.checker import SpellChecker
from nltk.metrics.distance import edit_distance


class MySpellChecker:
    def __init__(self, dict_name='en_US', max_dist=2):
        self.spell_dict = enchant.Dict(dict_name)
        self.max_dist = max_dist

    def replace(self, word):
        suggestions = self.spell_dict.suggest(word)
        if suggestions:
            for suggestion in suggestions:
                if edit_distance(word, suggestion) <= self.max_dist:
                    # Note: this returns the top-ranked suggestion, not
                    # necessarily the one that passed the distance check.
                    return suggestions[0]
        # No suggestion was close enough: keep the original word.
        return word


if __name__ == '__main__':
    text = "this is sme text with a speling mistake."
    my_spell_checker = MySpellChecker(max_dist=1)
    chkr = SpellChecker("en_US", text)
    for err in chkr:
        print(err.word + " at position " + str(err.wordpos))
        err.replace(my_spell_checker.replace(err.word))
    t = chkr.get_text()
    print("\n" + t)
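To make the `max_dist` check more concrete, here is a small dependency-free sketch of the distance filter. It assumes plain Levenshtein distance (which, as far as I know, is what nltk's `edit_distance` computes with its default arguments); `levenshtein` and `closest_suggestion` are hypothetical helper names, and the suggestion lists are made up for illustration rather than taken from enchant. Unlike the class above, which returns `suggestions[0]`, this variant returns the suggestion with the smallest distance.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_suggestion(word, suggestions, max_dist=2):
    """Return the nearest suggestion within max_dist, or the word unchanged."""
    in_range = [s for s in suggestions if levenshtein(word, s) <= max_dist]
    if not in_range:
        return word
    # min() keeps the earliest suggestion when several share the same distance.
    return min(in_range, key=lambda s: levenshtein(word, s))


# Made-up suggestion lists, for illustration only.
print(closest_suggestion("speling", ["peeling", "spelling"], max_dist=1))
print(closest_suggestion("xyzzy", ["hello"], max_dist=1))
```

Even with this filter, ties such as "some" versus "same" for "sme" cannot be resolved by distance alone, which is where context-aware NLP would have to come in.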