python - Python ijson - nested parsing

Question

I'm consuming a web response whose JSON looks like this (simplified; I can't change the format):

[
   { "type": "0","key1": 3, "key2": 5},
   { "type": "1","key3": "a", "key4": "b"},
   { "type": "2", "data": [<very big array here>] }
]

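I want to do the following:

1. Inspect the first two objects without reading everything into memory. I can do this with ijson:

parsed = ijson.items(res.raw, 'item')
next(parsed) # first item
next(parsed) # second item

2. Inspect the third object without storing all of it in memory. If I call next(parsed) once more, the entire "data" array is read into memory and turned into a dict, which I want to avoid.
3. Inspect the data array without loading all of it into memory. If I didn't care about the other keys, I could do this: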

parsed = ijson.items(res.raw, 'item.data.item') # iterator over data's items

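The problem is that I need to do all of this on the same stream.

Ideally, I would receive the third object as a file-like object that I could pass to ijson again, but that seems to be beyond the scope of its API.

I'm also open to replacing ijson with a library that handles this better.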

score 1 · Accepted Answer

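You need to use ijson's event interception mechanism (available since ijson 3.0). Basically, drop one level down in the parsing logic by using ijson.parse until you hit the big array, then switch to ijson.items for the rest of the parse events. This uses an in-memory byte string instead of a real response, but it should illustrate the point: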

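import io

import ijson

s = b'''[
   { "type": "0","key1": 3, "key2": 5},
   { "type": "1","key3": "a", "key4": "b"},
   { "type": "2", "data": [1, 2, 3] }
]'''
# ijson.parse yields (prefix, event, value) tuples; BytesIO stands in for res.raw
parse_events = ijson.parse(io.BytesIO(s))
for prefix, event, value in parse_events:
    # do stuff with prefix, event and value (e.g. inspect the first two objects), until...
    if prefix == 'item' and event == 'map_key' and value == 'data':
        break
# hand the remaining events to ijson.items, which yields the array elements one by one
for value in ijson.items(parse_events, 'item.data.item'):
    print(value)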
