
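I need to load some text from a UTF-8 data file that contains JSON-formatted text into a JSON object, and then parse it. I have no control over the file's content or format and must work with what I'm given. I also have no control over the Python version, which is 2.7.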

At least one value in this text file contains \n. As a result, running the script produces an error like: Expecting , delimiter: line 8 column 102 (char 470)

The file contents look like this:

{
"key1": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
"key2": "Suspendisse eu tincidunt velit. Proin mollis ligula a arcu feugiat ac imperdiet nunc sagittis. Etiam egestas fringilla tristique.\nCurabitur interdum dolor eu velit gravida et convallis purus facilisis. Aenean eu enim mi.",

"key3": "Nunc intérdum mågna nec nîbh faucibus non laoreet nisi blandit. Nunc lobortis ligula ut tellus semper in hendrerit mauris malesuada.",

"key4": "Vivamus erat turpis, fringilla id sollicitudin non, pellentesque vel lacus. Praesent placerat dapibus mauris vel hendrerit. Integer a augue leo, facilisis viverra dui. Maecenas sollicitudin adipiscing viverra. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam vestibulum commodo diam, vitae ultrices quam viverra eu. Proin eros sapien, scelerisque non condimentum vel, placerat at est. Ut fermentum mattis lacus, a eleifend ipsum euismod ac. Quisque mollis bibendum quam nec sollicitudin."

}

Relevant code:

import codecs
import json

def processText(stringData):
    j = json.loads(stringData, encoding='utf-8')
    # do stuff that I can't change


dataFile = codecs.open('/path/to/file', 'r', 'utf-8')
data = dataFile.read()
dataFile.close()
processText(data)

I tried the following:

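  1. json.loads(data.replace('\n','\\n')): the text file contains about 15,000 characters, so this just makes the script hang.
  2. json.loads("%r" % d), where d is the variable holding the string. This fails with "No JSON object could be decoded", because it converts every newline in the file into \n, which is not what needs to happen.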
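A small sketch of why both attempts go wrong, assuming (as in the sample) that the file has real newlines between members and uses the two-character escape \n inside one value. The string `data` is a hypothetical stand-in for the file, and `try_parse` is a helper introduced only for illustration. It does not reproduce the hang reported for attempt 1; it only shows that neither rewritten string is valid JSON, while the untouched one is.

```python
import json

# Hypothetical stand-in for the file contents: real newlines appear *between*
# members (structural whitespace), and one value uses the two-character
# escape sequence \n, as in the sample above.
data = '{\n"key1": "first",\n"key2": "a\\nb"\n}'


def try_parse(text):
    """Hypothetical helper: return the parsed object, or the error message."""
    try:
        return json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError in Python 3
        return str(exc)


# The untouched text is valid JSON.
print(try_parse(data))

# Attempt 1: turning every newline into backslash + n also rewrites the
# structural newlines between members, where JSON does not allow a backslash.
print(try_parse(data.replace('\n', '\\n')))

# Attempt 2: "%r" produces a Python repr (wrapped in quotes), not JSON text.
print(try_parse("%r" % data))
```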

How can I load this string into a JSON object?


2 Answers


A JSON string may contain \n (two characters: a backslash followed by n). So if you see \n in the file, that is fine.

Valid JSON (as seen in a file, not in Python source code):

["line\nanother"]

Invalid:

["line
another"]
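A minimal check of the two cases above in Python. `parses` is a hypothetical helper, and because these are ordinary (non-raw) Python literals, `\\n` is how the valid two-character escape is spelled.

```python
import json

escaped = '["line\\nanother"]'   # backslash + n: the valid two-character escape
literal = '["line\nanother"]'    # a real newline character inside the string


def parses(text):
    """Hypothetical helper: True if text is valid JSON under default (strict) settings."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a ValueError subclass in Python 3
        return False


# The escape decodes to a single newline character in the parsed value.
value = json.loads(escaped)[0]
print(len(value))         # 12
print(parses(escaped))    # True
print(parses(literal))    # False
```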

The confusion may arise if you include the file content as a Python string literal. In that case, \n is interpreted by Python before the text reaches the JSON parser. To avoid this, use raw string literals (r''):

import json

json_text = r'["line\nanother"]'
print(json_text)  # -> ["line\nanother"]
d = json.loads(json_text)
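To see what the raw literal changes, compare three ways of writing the same example in Python source (the variable names are illustrative only):

```python
import json

raw = r'["line\nanother"]'      # raw literal: backslash and n stay two characters
doubled = '["line\\nanother"]'  # the same text, written with a doubled backslash
plain = '["line\nanother"]'     # Python turns \n into one newline character first

print(raw == doubled)             # True
print(len(raw) - len(plain))      # 1

# The raw literal reaches the JSON parser intact, and the parser decodes
# the escape into a single newline inside the value.
print(json.loads(raw) == ["line\nanother"])   # True
```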

If you read the JSON from a file, you don't need to do anything: each \n is just two bytes, and you get them as is.

import json

with open(filename) as file:  # no need for `codecs` if the file is UTF-8 (Python 2.7)
    d = json.load(file)
    # equivalent to: d = json.loads(file.read())
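A self-contained sketch of reading such a file, assuming a temporary file stands in for `filename`, and with `io.open(..., encoding='utf-8')` swapped in for plain `open` so that decoding behaves the same on Python 2.7 and 3. The sample content is a hypothetical, shortened version of the question's file.

```python
import io
import json
import os
import tempfile

# Hypothetical sample: on disk, the \n inside "key2" is two characters
# (backslash, n), exactly as in the question's file; "key3" has non-ASCII text.
sample = u'{\n"key1": "Lorem ipsum",\n"key2": "first line\\nsecond line",\n"key3": "Nunc int\u00e9rdum m\u00e5gna"\n}'

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# io.open decodes UTF-8 into text on both Python 2.7 and 3, so json.load
# receives the escape untouched and needs no preprocessing.
with io.open(path, encoding="utf-8") as f:
    d = json.load(f)

os.remove(path)
print(d["key2"])                                     # two lines of output
print(d["key3"] == u"Nunc int\u00e9rdum m\u00e5gna")  # True
```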
answered 2013-03-22T03:35:59.340

Pass strict=False to json.loads(), which allows control characters such as literal newlines inside strings:

import json

body_unicode = request.body.decode('utf-8')
body = json.loads(body_unicode, strict=False)
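A minimal sketch of the difference `strict=False` makes. It drops the web-framework `request` object (which suggests Django or similar) and uses a hypothetical in-memory string whose value contains a real newline rather than the two-character escape:

```python
import json

# Hypothetical input: "key2" contains a real newline character, which the
# default (strict) decoder rejects as a control character.
text = '{"key1": "one line", "key2": "first line\nsecond line"}'

try:
    json.loads(text)
    strict_ok = True
except ValueError:  # json.JSONDecodeError is a ValueError subclass in Python 3
    strict_ok = False

# strict=False allows control characters (including newlines) inside strings.
relaxed = json.loads(text, strict=False)

print(strict_ok)              # False
print(repr(relaxed["key2"]))
```

Note that `strict=False` only relaxes the control-character check; an unescaped quote or a missing comma is still a parse error.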
answered 2021-11-20T16:03:40.743