I have a few million records that I need to store, retrieve, and delete frequently. Each of these records has a "key", but the "value" doesn't translate easily into a dictionary, as it is an arbitrary Python object returned from a module method that I didn't write (I understand that a lot of hierarchical data structures like json work better as dictionaries, and I'm not sure whether json is the preferred database in any case).

I was thinking of pickling each entry in a separate file. Is there a better way?
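For reference, here is a minimal sketch of the one-file-per-record approach I have in mind (the directory name and helper functions are just illustrative):

import pickle
from pathlib import Path

STORE = Path("records")            # illustrative directory for the pickle files
STORE.mkdir(exist_ok=True)

def save(key, value):
    # one pickle file per record; assumes key is a filesystem-safe string
    with open(STORE / f"{key}.pkl", "wb") as f:
        pickle.dump(value, f)

def load(key):
    with open(STORE / f"{key}.pkl", "rb") as f:
        return pickle.load(f)

def delete(key):
    (STORE / f"{key}.pkl").unlink()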
Use the shelve module. You can use it as a dictionary, much like you would with json, but it stores objects using pickle.

From the official Python docs:
import shelve

d = shelve.open(filename)       # open -- file may get a suffix added by
                                # the low-level library

d[key] = data                   # store data at key (overwrites old data
                                # if using an existing key)
data = d[key]                   # retrieve a COPY of data at key (raises
                                # KeyError if no such key)
del d[key]                      # delete data stored at key (raises
                                # KeyError if no such key)

flag = key in d                 # true if the key exists
klist = list(d.keys())          # a list of all existing keys (slow!)

# as d was opened WITHOUT writeback=True, beware:
d['xx'] = list(range(4))        # this works as expected, but...
d['xx'].append(5)               # *this doesn't!* -- d['xx'] is STILL [0, 1, 2, 3]!

# having opened d without writeback=True, you need to code carefully:
temp = d['xx']                  # extracts the copy
temp.append(5)                  # mutates the copy
d['xx'] = temp                  # stores the copy right back, to persist it

# or, d = shelve.open(filename, writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.

d.close()                       # close it
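Note that on Python 3.4 and later the shelf also works as a context manager, so you can let a with statement take care of the close() call (the filename here is illustrative):

import shelve

with shelve.open("mydata") as d:        # "mydata" is an illustrative filename
    d["key1"] = ["any", "picklable", "object"]
    print(d["key1"])
# the shelf is closed automatically when the with block exits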
I would evaluate the use of a key/value database (such as berkeleydb, kyoto cabinet, or others). That would give you all the fancy features plus better handling of disk space: on a filesystem with a 4096-byte block size, a million files take up about 4 GB no matter how small the objects are (that's a lower bound; the total grows once objects exceed 4096 bytes).
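For illustration, here is a minimal sketch of that pattern using the standard library's dbm module (the bindings for berkeleydb or kyoto cabinet have different APIs, but the idea -- pickled values in a single key/value store instead of millions of small files -- is the same; the filename is illustrative):

import dbm
import pickle

with dbm.open("records.db", "c") as db:     # "c": open, creating if needed
    db[b"key1"] = pickle.dumps({"any": ["picklable", "object"]})
    value = pickle.loads(db[b"key1"])       # retrieve and unpickle
    del db[b"key1"]                         # delete the record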