python - Character decoding exception when running from the shell but not in Eclipse

Question

I have a Python script that fetches the page at a URL and then HTML-unescapes it.

The target page contains both English and Hebrew text.

opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel = 0),
            urllib2.HTTPSHandler(debuglevel = 0),
            urllib2.HTTPCookieProcessor(self.cj)
        )
response = opener.open(url)
data = response.read()
goodData = HTMLParser.HTMLParser().unescape(data)
print goodData

When run from Eclipse, the code works fine. When packaged and run from a Linux shell (Ubuntu 12.04), it fails on the second-to-last line (the one before the print) with:

'ascii' codec can't decode byte 0xe2 in position 775: ordinal not in range(128)

I can't debug it, because in Eclipse it seems to work fine. How come?

score 0 · Accepted Answer

OK, solved:

import urllib2
import HTMLParser

opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(self.cj)
)
response = opener.open(url)
data = response.read()        # raw bytes as sent by the server
data = data.decode('utf8')    # decode to unicode before unescaping
goodData = HTMLParser.HTMLParser().unescape(data)
print goodData
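Why the extra decode helps, as far as I can tell: in Python 2, response.read() returns a byte string, while unescape() replaces entities with unicode characters. Mixing the two makes Python decode the bytes implicitly with the default 'ascii' codec, which fails on the first non-ASCII byte (0xe2 is a common lead byte of a three-byte UTF-8 sequence). Below is a minimal sketch of the decode-first approach. The sample bytes in `raw` are made up for illustration and stand in for response.read(); the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: `raw` is invented sample data standing in for response.read().
try:
    from html import unescape              # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser      # Python 2, as used in the thread
    unescape = HTMLParser().unescape

# UTF-8 bytes with Hebrew text plus entities, one of which
# (&rsquo;) unescapes to a non-ASCII character.
raw = u"\u05e9\u05dc\u05d5\u05dd &rsquo;hello&rsquo; &amp; bye".encode("utf-8")

text = raw.decode("utf-8")   # bytes -> unicode, done explicitly
good = unescape(text)        # entities replaced inside a unicode string

print(repr(good))
```

This assumes the page really is UTF-8. If the server declares another charset, that charset would be the one to pass to decode().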

Thanks to: UnicodeDecodeError when handling filenames
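As for why Eclipse behaved differently: one plausible explanation, which nothing in this thread confirms, is that the IDE runs the script with a different default encoding than the plain shell does. A small diagnostic like the one below, run in both environments, would show whether that is the case. The helper name `describe_encodings` is hypothetical; it uses only the standard `sys` module.

```python
import sys

def describe_encodings():
    """Collect the interpreter settings that affect implicit conversions."""
    return {
        # Codec used for implicit str <-> unicode conversion in Python 2.
        "default": sys.getdefaultencoding(),
        # Codec used when printing unicode; may be None if output is piped.
        "stdout": getattr(sys.stdout, "encoding", None),
    }

print(describe_encodings())
```

If the two runs print different values, the explicit decode in the accepted answer removes the dependency on the first setting.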
