
On GAE with Python, I use urlfetch to fetch a JSON string from Flickr. When I try to load that string with json.loads on the production server, it raises the exception "raised ValueError(Unpaired high surrogate)".

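When I call json.loads on the same string in the development console, it loads into a dict as expected (see below). I have successfully loaded several other JSON strings from Flickr with the same code. Something in the JSON string below raises the ValueError only on the production server.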

import json

s = """{"photo":{"id":"191019103", "secret":"d7a8bb95bc", "server":"72", "farm":1, "dateuploaded":"1153079847", "isfavorite":0, "license":"1", "safety_level":"0", "rotation":0, "originalsecret":"d7a8bb95bc", "originalformat":"jpg", "owner":{"nsid":"13968020@N00", "username":"\ud800dc80 jgraham", "realname":"", "location":"", "iconserver":"38", "iconfarm":1}, "title":{"_content":"By the Year 2000 All Our Food Will be in the Form of Tiny Pills"}, "description":{"_content":""}, "visibility":{"ispublic":1, "isfriend":0, "isfamily":0}, "dates":{"posted":"1153079847", "taken":"2006-07-15 14:31:16", "takengranularity":"0", "lastupdate":"1282690106"}, "views":"984", "editability":{"cancomment":0, "canaddmeta":0}, "publiceditability":{"cancomment":1, "canaddmeta":0}, "usage":{"candownload":1, "canblog":0, "canprint":0, "canshare":1}, "comments":{"_content":"18"}, "notes":{"note":[]}, "people":{"haspeople":0}, "tags":{"tag":[{"id":"1207251-191019103-2909", "author":"13968020@N00", "raw":"Birmingham", "_content":"birmingham", "machine_tag":0}, {"id":"1207251-191019103-77552", "author":"13968020@N00", "raw":"Bullring", "_content":"bullring", "machine_tag":0}, {"id":"1207251-191019103-463", "author":"13968020@N00", "raw":"Abstract", "_content":"abstract", "machine_tag":0}, {"id":"1207251-191019103-1174", "author":"13968020@N00", "raw":"Architecture", "_content":"architecture", "machine_tag":0}, {"id":"1207251-191019103-141", "author":"13968020@N00", "raw":"Blue", "_content":"blue", "machine_tag":0}, {"id":"1207251-191019103-2194948", "author":"13968020@N00", "raw":"i500", "_content":"i500", "machine_tag":0}, {"id":"1207251-191019103-11820", "author":"13968020@N00", "raw":"Explore", "_content":"explore", "machine_tag":0}, {"id":"1207251-191019103-3254511", "author":"13968020@N00", "raw":"utata_feature", "_content":"utatafeature", "machine_tag":0}]}, "urls":{"url":[{"type":"photopage", "_content":"http:\/\/www.flickr.com\/photos\/jgraham\/191019103\/"}]}, "media":"photo"}, 
"stat":"ok"}"""

print(json.loads(s))  # prints the dict in the development console
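
The only \u escape in the string above is in the username field, \ud800dc80: a high surrogate escape followed by plain text rather than by a second \uXXXX escape in the low surrogate range (DC00 to DFFF). The following is a minimal sketch that isolates that field, assuming this escape is what triggers the error. The shortened document and the names minimal, result, and outcome are illustrative and not part of the Flickr response.

```python
import json

# Hypothetical reduction of the response to the one suspicious field.
# The raw string keeps the backslash so json.loads sees the \ud800 escape.
minimal = r'{"username": "\ud800dc80 jgraham"}'

try:
    result = json.loads(minimal)
    outcome = "loaded"
except ValueError as exc:
    # Parsers that reject lone surrogates, such as the production
    # server described above, end up in this branch.
    result = None
    outcome = "ValueError: %s" % exc

print(outcome)
```

If this short string reproduces the difference between the development console and production, the rest of the Flickr payload can be ruled out.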

4 Answers


sudo pip install simplejson==3.6.5

import simplejson
simplejson.loads('{"":"\\ud800"}')

I had the same problem with a low surrogate (\udfb6).

Answered 2015-03-18T15:00:10.270

This issue was fixed in Python 2.7.7 and later.

http://bugs.python.org/issue11489

https://hg.python.org/cpython/raw-file/v2.7.7/Misc/NEWS

However, as of March 11, 2016, Google App Engine runs Python 2.7.5 in production, which does not include the json module patch from issue 11489.

I spoke with the GAE support team about this, and they opened an issue on the public Google issue tracker:

https://code.google.com/p/googleappengine/issues/detail?id=12823

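In the meantime, using the simplejson module instead of the standard json module, as mihaicc suggested, looks like the best solution. I tested simplejson version 3.8.2; it runs on GAE and did not produce the error.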
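
One way to apply this workaround is to prefer simplejson and fall back to the standard module when it is missing. This is a minimal sketch, assuming simplejson is installed or bundled with the application; the variable name data is illustrative.

```python
try:
    # Assumption: simplejson is installed or bundled with the application.
    import simplejson as json
except ImportError:
    # Fallback: the standard library module, which may still reject
    # lone surrogates on runtimes older than Python 2.7.7.
    import json

# Same probe string as in the answers: a lone high surrogate escape.
data = json.loads('{"": "\\ud800"}')
```

Because both modules expose loads and dumps, the rest of the code can keep calling json.loads unchanged.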

Answered 2016-03-11T19:29:47.470

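If you have a large JSON file with one record per line and do not mind losing some lines, you can filter out every line that contains a \ud escape. Note that this also drops lines with valid surrogate pairs.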

grep -v "\\\\ud" file.json > file2.json
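
A narrower filter can keep lines whose surrogates are correctly paired. The sketch below is one possible approach, not part of the original answer, and it assumes Python 3 for the offline preprocessing step: there json.loads merges valid pairs into a single code point and leaves lone surrogates in the string, where encoding to UTF-8 then fails. The function names and sample records are hypothetical.

```python
import json


def has_lone_surrogate(record):
    """Return True if the decoded record contains an unpaired surrogate."""
    try:
        # Lone surrogates cannot be encoded as UTF-8; valid pairs were
        # already merged into one code point by json.loads.
        json.dumps(record, ensure_ascii=False).encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def filter_lines(lines):
    """Yield the lines that parse and contain no unpaired surrogate."""
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # malformed JSON (including blank lines) is skipped
        if not has_lone_surrogate(record):
            yield line


# Illustrative records only.
sample = [
    '{"username": "\\ud800dc80 jgraham"}',  # unpaired high surrogate
    '{"username": "\\ud83d\\ude00 ok"}',    # valid surrogate pair
    '{"username": "plain"}',
]
kept = list(filter_lines(sample))
```

To process a file, pass the open file.json handle to filter_lines and write each yielded line to file2.json.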
Answered 2020-02-19T22:57:07.030

The GAE Standard Python 2 runtime moved from 2.7.5 to 2.7.12 in June 2017, so this should no longer be a problem. You can check it at https://shell-hrd.appspot.com/:

Google App Engine/1.9.86
Python 2.7.12 (default, Jun 12 2019, 11:33:04) 
[GCC 4.2.1 Compatible Clang google3-trunk (trunk r361749)]

>>> import json
>>> json.loads('{"":"\\ud800"}')
{u'': u'\ud800'}
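
The same check can be run from application code instead of an interactive shell. This is a small sketch; the helper name json_accepts_lone_surrogate is hypothetical, and the probe string is the one used in the session above.

```python
import json


def json_accepts_lone_surrogate(loads=json.loads):
    """Return True if loads parses an unpaired high surrogate escape."""
    try:
        loads('{"": "\\ud800"}')
    except ValueError:
        return False
    return True


print(json_accepts_lone_surrogate())
```

A result of False would suggest the runtime still predates the fix, in which case the simplejson workaround remains relevant.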
Answered 2019-10-04T22:54:54.857