
I have the same setup and code for running simhash on a Mac, and it works there.

But when I run it on Ubuntu, it fails with an error that seems to point to a bug in the simhash implementation itself.

Has anyone encountered this problem?

    objs = [(str(k), Simhash(v)) for k, v in index_data.items()]
  File "/usr/local/lib/python2.7/dist-packages/simhash-1.1.2-py2.7.egg/simhash/__init__.py", line 30, in __init__
    self.build_by_text(unicode(value))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 34: ordinal not in range(128)


1 Answer


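The error tells you that a byte string cannot be decoded correctly. According to the traceback, Simhash calls unicode(value) on what you pass in as v, and Python 2 then falls back to the ASCII codec, which cannot handle the byte 0xf6. Since I don't know where the data comes from or what it actually is, I can only say that something like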

str(k).decode('cp850')

or

Simhash(v.decode('cp850'))

might help, assuming the strings are in cp850. At least '\xf6'.decode('cp850') works for me.

Since the error is raised inside the module, check that the strings you pass in are properly decoded beforehand.
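A minimal sketch of "decode first, then hash", assuming index_data maps byte-string keys to byte-string values and that cp850 is the right codec (still only a guess, as above). The helper names to_text and decode_index_data are hypothetical and not part of the simhash package. The code runs under Python 2.7 and Python 3; the decoded text values are what you would then pass to Simhash.

```python
# Hypothetical helpers (not part of simhash): decode byte strings to text
# *before* handing them to Simhash, so that simhash's internal
# unicode(value) call never has to fall back to the ASCII codec.


def to_text(value, encoding='cp850'):
    """Return value as text, decoding it with `encoding` if it is a byte string.

    cp850 is only an assumption; use whatever encoding your data is really in.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value  # already text (unicode on Python 2, str on Python 3)


def decode_index_data(index_data, encoding='cp850'):
    """Decode every key and value of an index_data-style dict."""
    return dict((to_text(k, encoding), to_text(v, encoding))
                for k, v in index_data.items())


if __name__ == '__main__':
    # Sample data containing the byte 0xf6 from the traceback.
    index_data = {b'doc1': b'\xf6 text'}
    decoded = decode_index_data(index_data)
    for k, v in decoded.items():
        # With simhash installed you would now build Simhash(v) here.
        print(repr((k, v)))
```

Note that a single-byte codec such as latin-1 maps every byte to a character, so decoding with it never raises; a wrong guess silently produces wrong characters instead of an error.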

answered 2014-04-21T20:10:26.690