python - JSON decoder is inconsistent when unicode data is present

Question

(This question is related to this one.)

Consider the following session:

Python 2.7.3 (default, Jan  2 2013, 16:53:07) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import simplejson as json
>>> 
>>> my_json = '''[
...   {
...     "id" : "normal",
...     "txt" : "This is a normal entry"
...   },
...   {
...     "id" : "αβγδ",
...     "txt" : "This is a unicode entry"
...   }
... ]'''
>>> 
>>> cache = json.loads(my_json, encoding='utf-8')
>>> 
>>> cache
[{'txt': 'This is a normal entry', 'id': 'normal'}, {'txt': 'This is a unicode entry', 'id': u'\u03b1\u03b2\u03b3\u03b4'}]

Why does the JSON decoder sometimes produce unicode and sometimes a plain str? Shouldn't it always produce unicode?

score 4 · Accepted Answer

This appears to be an optimization in simplejson. From the simplejson docs:

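If s is a str then decoded JSON strings that contain only ASCII characters may be parsed as str for performance and memory reasons. If your code expects only unicode the appropriate solution is to decode s to unicode prior to calling decode.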
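A minimal sketch of the workaround the docs describe: decode the byte string to unicode yourself, then hand the result to the parser. The sample bytes and the names `raw`, `text`, and `text_type` are made up for illustration, and the fallback to the standard-library `json` module is only there so the snippet also runs where simplejson is not installed.

```python
# -*- coding: utf-8 -*-
try:
    import simplejson as json  # the library used in the question
except ImportError:
    import json  # assumption: stdlib fallback so the sketch still runs

# UTF-8 encoded bytes, as they might arrive from a file or a socket.
# The second id is the Greek text from the question, written as escaped UTF-8 bytes.
raw = b'[{"id": "normal"}, {"id": "\xce\xb1\xce\xb2\xce\xb3\xce\xb4"}]'

# Decode to unicode first, then parse the decoded text.
text = raw.decode('utf-8')
cache = json.loads(text)

text_type = type(u'')  # unicode on Python 2, str on Python 3
ids = [entry['id'] for entry in cache]

# Every id should now have the same text type, ASCII-only or not.
all_text = all(isinstance(value, text_type) for value in ids)
print(all_text)
```

With the input already decoded, the parser has no byte string to hand back, so the ASCII-only values should come out with the same type as the non-ASCII ones.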

Note: every character in ASCII is encoded identically in UTF-8 and in ASCII, so ASCII is a subset of UTF-8.
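The subset relationship can be checked directly. This is only an illustrative sketch; the variable names are hypothetical and the sample strings are the two ids from the question.

```python
# -*- coding: utf-8 -*-
ascii_text = u'normal'
greek = u'\u03b1\u03b2\u03b3\u03b4'  # the non-ASCII id from the question

# ASCII-only text yields identical bytes under both encodings.
same_bytes = ascii_text.encode('ascii') == ascii_text.encode('utf-8')

# The Greek letters each take two bytes in UTF-8.
greek_utf8 = greek.encode('utf-8')

# They cannot be represented in ASCII at all.
try:
    greek.encode('ascii')
    greek_is_ascii = True
except UnicodeEncodeError:
    greek_is_ascii = False

print(same_bytes, len(greek_utf8), greek_is_ascii)
```

This is why an ASCII-only value can safely be returned as a plain str: its bytes are the same whether they are read as ASCII or as UTF-8.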
