I have a large JSON file, db.json (> 100 MB), whose contents look like this:
{"sitters": [["9919.html", 3, 8, 19, 47, 120, 129, 359]], "yellow": [["9945.html", 791],
["9983.html", 1496], ["9984.html", 151]], "four": [["9971.html", 81, 403], ["9991.html", 37],
["9995.html", 45, 225, 337], ["9975.html", 15], ["9978.html", 100], ["9948.html", 381],
["9966.html", 228], ...
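Here each key is a word, and each value is a list of entries: a file name followed by the positions at which the word occurs in that file. I want to look up n words in this JSON file and retrieve the corresponding file names and positions. Given how large the file is, is there an efficient way to do this? I have been looking at ijson, but I can't get it to work. I tried: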
parser = parse("db.json")
for prefix, event, value in parser:
    if event == 'sitters':
        print value
But I probably don't understand how to use it correctly, because it gives me the following error:
Traceback (most recent call last):
  File "retriever.py", line 43, in <module>
    sys.exit(main())
  File "retriever.py", line 38, in main
    for prefix, event, value in parser:
  File "/usr/local/lib/python2.7/dist-packages/ijson/common.py", line 63, in parse
    for event, value in basic_events:
  File "/usr/local/lib/python2.7/dist-packages/ijson/backends/yajl2.py", line 90, in basic_parse
    buffer = f.read(buf_size)
AttributeError: 'str' object has no attribute 'read'
Any help is greatly appreciated!
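
For reference, the last line of the traceback suggests that ijson calls .read() on whatever it is given, so it presumably wants an open file object rather than the path string "db.json". Below is a minimal sketch of the kind of lookup I have in mind, under a few assumptions: it is written for Python 3, the helper name lookup_words is hypothetical, and it relies on ijson.kvitems, which as far as I know only exists in ijson 3.0 and later. If ijson is not installed, the sketch falls back to the standard json module, which loads the whole file into memory.

```python
import json

try:
    import ijson  # third-party streaming parser; assumed to be version 3.0+
except ImportError:
    ijson = None


def lookup_words(path, words):
    """Return {word: [[filename, pos, ...], ...]} for the requested words.

    Hypothetical helper: words missing from the file are simply absent
    from the result.
    """
    wanted = set(words)
    found = {}
    # Open the file ourselves: the parser needs an object with .read(),
    # not a path string.
    with open(path, "rb") as f:
        if ijson is not None:
            # An empty prefix selects the top-level object; kvitems yields
            # its (key, value) pairs one at a time.
            pairs = ijson.kvitems(f, "")
        else:
            # Fallback without streaming: parses the whole document at once.
            pairs = json.load(f).items()
        for word, postings in pairs:
            if word in wanted:
                found[word] = postings
                if len(found) == len(wanted):
                    break  # every requested word was found; stop reading
    return found
```

Usage would be something like lookup_words("db.json", ["sitters", "four"]). Each call still scans the file from the start until all requested words are found, so for many repeated queries it may be worth converting the data once into an on-disk index instead.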