python - Serialize to JSON while preserving Hebrew characters

Question

I have the following use case:

From my data I generate a JSON document, and part of that data is Hebrew words. For example:

import json
j = {}
city = u'חיפה'  # native unicode
j['results'] = []
j['results'].append({'city': city})  # Also tried city.encode('utf-8') and other encodings

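The JSON file doubles as the database for my application (a tiny geo application) and as a file my users can edit directly to fix the data. To generate it, I use the json library like this: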

to_save = json.dumps(j)
with open('test.json', 'wb') as f:  # also tried with w instead of the wb flag
    f.write(to_save)  # the with block closes f on exit, so no explicit f.close() is needed

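The problem is that the resulting JSON contains u'חיפה' as Unicode escape sequences, like this: u'\u05d7\u05d9\u05e4\u05d4'

Most scripts and applications have no problem reading the escaped Unicode string, but my users do! Because they contribute to an open-source project, they need to edit the JSON directly, and they cannot make out the Hebrew text.

So, the question: how should I write the JSON so that opening it in another editor shows the Hebrew characters?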

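I'm not sure this can be solved at all: I suspect JSON is Unicode all the way through and that I can't use ASCII in it, but I'm not certain about that.

Thanks for your help.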

score 8 · Accepted Answer

Use the ensure_ascii=False argument.

>>> import json
>>> city = u'חיפה'
>>> print(json.dumps(city))
"\u05d7\u05d9\u05e4\u05d4"
>>> print(json.dumps(city, ensure_ascii=False))
"חיפה"

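According to the json.dump documentation:

If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is false, some chunks written to fp may be unicode instances. This usually happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error.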
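
To see what the flag actually changes, here is a small check. It is only a sketch and assumes Python 3, where every str is already Unicode, so the u'' prefix is dropped. Both outputs are valid JSON and decode back to the same string; only the on-disk readability differs.

```python
import json

city = 'חיפה'

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(city)

# ensure_ascii=False: the Hebrew characters are kept as-is.
readable = json.dumps(city, ensure_ascii=False)

# Either form round-trips to the original string.
assert json.loads(escaped) == json.loads(readable) == city
```

So the escaped file in the question was never corrupted; it was just hard for a human to read.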

Your code should look like this:

import json
j = {'results': [u'חיפה']}
to_save = json.dumps(j, ensure_ascii=False)
with open('test.json', 'wb') as f:
    f.write(to_save.encode('utf-8'))

Or

import codecs
import json
j = {'results': [u'חיפה']}
to_save = json.dumps(j, ensure_ascii=False)
with codecs.open('test.json', 'wb', encoding='utf-8') as f:
    f.write(to_save)

Or

import codecs
import json
j = {'results': [u'חיפה']}
with codecs.open('test.json', 'wb', encoding='utf-8') as f:
    json.dump(j, f, ensure_ascii=False)
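
The snippets above appear to target Python 2 (u'' literals, codecs.open). On Python 3 the same idea might look like the sketch below, where the built-in open() takes an encoding argument directly. The helper names save_readable_json and load_json are hypothetical, and indent=2 is an optional choice made here only to keep the file friendlier for hand editing.

```python
import json


def save_readable_json(data, path):
    """Write data as UTF-8 JSON, leaving non-ASCII characters unescaped."""
    # Text mode with an explicit encoding: the file object turns str into UTF-8 bytes.
    with open(path, 'w', encoding='utf-8') as f:
        # indent=2 is optional; it only makes the file easier to edit by hand.
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(path):
    """Read a UTF-8 encoded JSON file back into Python objects."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


if __name__ == '__main__':
    j = {'results': [{'city': 'חיפה'}]}
    save_readable_json(j, 'test.json')
    print(load_json('test.json') == j)
```

Passing encoding='utf-8' explicitly matters because the default text encoding depends on the platform, and a non-UTF-8 default could fail on the Hebrew characters.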
