python - Why does Python json.dumps complain about ASCII decoding?

Question

I have the following lines in my code:

outs = codecs.getwriter('utf-8')(sys.stdout)
# dJSON contains JSON message with non-ASCII chars
outs.write(json.dumps(dJSON,encoding='utf-8', ensure_ascii=False, indent=indent_val))

I get the following exception:

    outs.write(json.dumps(dJSON,encoding='utf-8', ensure_ascii=False, indent=indent_val))
    File "/usr/lib/python2.7/json/__init__.py", line 238, in dumps
         **kw).encode(obj)
    File "/usr/lib/python2.7/json/encoder.py", line 204, in encode
         return ''.join(chunks)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 27: ordinal not in range(128)

I thought that specifying encoding='utf-8' in the json.dumps call would avoid this kind of problem. Why am I still getting the error?

score 10 · Accepted Answer

My guess is that the dJSON object does not contain pure unicode, but a mix of unicode strings and strings already encoded as UTF-8. For example, this fails:

>>> d = {u'name':u'पाइथन'.encode('utf-8')}
>>> json.dumps(d, encoding='utf-8', ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 204, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 1: ordinal not in range(128)

But this works (everything is unicode):

>>> d = {u'name':u'पाइथन'}
>>> json.dumps(d, encoding='utf-8', ensure_ascii=False)
u'{"name": "\u092a\u093e\u0907\u0925\u0928"}'

And this also works (everything is a byte string):

>>> d = {'name':u'पाइथन'.encode('utf-8')}
>>> json.dumps(d, encoding='utf-8', ensure_ascii=False)
'{"name": "\xe0\xa4\xaa\xe0\xa4\xbe\xe0\xa4\x87\xe0\xa4\xa5\xe0\xa4\xa8"}'
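One way to avoid the mixed case altogether is to decode every byte string before calling json.dumps, so that the encoder only ever sees text. The following is a minimal sketch, not part of the original answer: `to_text` is a hypothetical helper name, and it assumes all byte strings in the structure are UTF-8. The `encoding` argument is left out because it exists only in Python 2's json.dumps.

```python
# -*- coding: utf-8 -*-
import json


def to_text(obj, encoding="utf-8"):
    """Recursively decode byte strings so the structure holds only text.

    Hypothetical helper; assumes every byte string uses `encoding`.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_text(item, encoding) for item in obj]
    return obj


# The failing input from above: a unicode key with a UTF-8 encoded value.
mixed = {u"name": u"पाइथन".encode("utf-8")}
result = json.dumps(to_text(mixed), ensure_ascii=False)
```

Note that tuples come back as lists, which matches how JSON serializes them anyway.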

score 4 · Accepted Answer

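There is a workaround: pass utf8 (not utf-8!) as the encoding to the dumps method. In that case it forces every byte string to be decoded to unicode first, so you can mix unicode strings with strings already encoded as UTF-8. Why does it work? Because the source code of JSONEncoder contains this: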

if self.encoding != 'utf-8':
     def _encoder(o, _orig_encoder=_encoder, _encoding=self.encoding):
         if isinstance(o, str):
             o = o.decode(_encoding)
         return _orig_encoder(o)

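That is exactly the behavior we need, but it does not happen out of the box, because the branch is skipped when the encoding is spelled utf-8. When we change the encoding to utf8 (which names exactly the same codec as utf-8), we force this _encoder to be defined, and everything works fine :)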
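The reason the spelling matters can be checked directly: the excerpt compares the encoding name as a plain string, while the codec registry treats both spellings as aliases. A small sketch to illustrate this (the variable names are made up for this example):

```python
import codecs

# Both spellings resolve to the same codec in the registry...
same_codec = codecs.lookup("utf8").name == codecs.lookup("utf-8").name

# ...but a plain string comparison, like the one in the excerpt,
# sees two different values, so the decoding branch is taken.
differs_as_string = "utf8" != "utf-8"
```

This relies on the string comparison shown in the excerpt, which is an implementation detail of the Python 2.7 encoder rather than documented behavior.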

score 0 · Accepted Answer

As the previous answer explains, you can solve this by using utf8 instead of utf-8, but that answer does not include a copy-paste fix.

Here is the copy-paste version of that fix ;P

your_unicode_result = json.dumps(your_dict, encoding="utf8", ensure_ascii=False)
