python - How can encode('ascii', 'ignore') throw a UnicodeDecodeError?

Question

This line

data = get_url_contents(r[0]).encode('ascii', 'ignore')

produces this error

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 11450: ordinal not in range(128)

Why? I assumed that because I am passing 'ignore', it should be impossible to get a decode error when saving the output to a string variable.

score 3 · Accepted Answer

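Because of a quirk in Python 2, you can call encode on a byte string (i.e. text that is already encoded). In that case, Python first tries to convert it to a unicode object by decoding it with the ascii codec. That implicit decode is strict; the 'ignore' argument applies only to the encode step that follows, which is why the error you see is a UnicodeDecodeError. So if get_url_contents returns a byte string, your line effectively does this: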

get_url_contents(r[0]).decode('ascii').encode('ascii', 'ignore')
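To make that implicit step visible, here is a minimal sketch written for Python 3, where the decode has to be spelled out by hand. The sample bytes stand in for whatever get_url_contents returns, and the assumption that the page is UTF-8 is mine, not something the question states.

```python
# Hypothetical stand-in for the return value of get_url_contents:
# UTF-8 encoded bytes (an assumption; the real encoding is unknown).
raw = "caf\u00e9 menu".encode("utf-8")  # contains the byte 0xc3

# What Python 2 effectively does for raw.encode('ascii', 'ignore'):
# the implicit decode is strict, so 'ignore' never gets a chance to apply.
try:
    raw.decode("ascii").encode("ascii", "ignore")
    implicit_error = None
except UnicodeDecodeError as exc:
    implicit_error = exc

print(implicit_error)

# Decoding explicitly with the real encoding (assumed UTF-8 here) avoids
# the error; 'ignore' then drops the non-ASCII character while encoding.
data = raw.decode("utf-8").encode("ascii", "ignore")
print(data)  # b'caf menu'
```

If the page is not actually UTF-8, the explicit decode needs whatever encoding the server really used; the point is only that the decode should be done deliberately rather than left to the implicit ascii one.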

In Python 3, byte strings have no encode method, so the same mistake simply raises an AttributeError.
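A short sketch of the Python 3 behaviour, using an illustrative byte string rather than real page contents:

```python
# Illustrative bytes; in Python 3, bytes objects only have decode().
raw = b"caf\xc3\xa9"

try:
    raw.encode("ascii", "ignore")
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

print(error_name)  # AttributeError
```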

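(Of course, I can't be sure that this is the problem; it could be something inside the get_url_contents function. But what I described above is my best guess.)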
