python - How do I write (long) integer values to Berkeley DB using bsddb3?

Question

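I'm trying to use Berkeley DB to store a frequency table (i.e. a hash table with string keys and integer values). The table will be written, updated, and read from Python, so I'm currently experimenting with bsddb3. It looks like it will do most of what I want, except that it seems to support only string values?

If I understand correctly, Berkeley DB supports binary keys and values of any kind. Is there a way to efficiently pass raw long integers in and out of Berkeley DB using bsddb3? I know I can convert the values to and from strings, and that's probably what I'll end up doing, but is there a more efficient way, i.e. by storing "raw" integers?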

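Background: I'm currently working with a large frequency table (potentially tens, if not hundreds, of millions of keys). It is currently implemented as a Python dictionary, but I aborted the script when it started swapping to virtual memory. And yes, I looked at Redis, but that stores the entire database in memory. So I'm going to try Berkeley DB. I should be able to improve creation efficiency by using a short-term in-memory cache, i.e. build an in-memory Python dictionary and then periodically add it to the main Berkeley DB frequency table.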
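For illustration, the batching idea described above might look roughly like the following. This is only a sketch: `flush_counts`, `count_words`, and `flush_every` are hypothetical names, the counts are stored as decimal strings (the fallback mentioned in the question), and a plain dict stands in for the Berkeley DB handle, on the assumption that the real handle offers the same mapping-style `get` and item assignment with bytes keys and values.

```python
from collections import Counter


def flush_counts(cache, db):
    """Add every cached count to the stored count, then empty the cache.

    `db` is assumed to be a mapping from bytes keys to bytes values.
    """
    for word, delta in cache.items():
        key = word.encode("utf-8")
        raw = db.get(key)  # None when the key is not stored yet
        current = int(raw.decode("ascii")) if raw is not None else 0
        db[key] = str(current + delta).encode("ascii")
    cache.clear()


def count_words(words, db, flush_every=100000):
    """Count words in memory and merge into `db` every `flush_every` distinct keys."""
    cache = Counter()
    for word in words:
        cache[word] += 1
        if len(cache) >= flush_every:
            flush_counts(cache, db)
    flush_counts(cache, db)  # merge whatever is left at the end


# A plain dict stands in for the on-disk table in this sketch.
store = {}
count_words(["a", "b", "a"], store, flush_every=2)
```

Each key is read and rewritten once per flush rather than once per occurrence, which is where the saving would come from.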

score 1 · Accepted Answer

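Do you need to read the data back from a language other than Python? If not, you can pickle the Python long integers and unpickle them when you read them back. You may be able to use the shelve module, which does this for you automatically. But even if not, you can pickle and unpickle the values manually.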

>>> import cPickle as pickle
>>> pickle.dumps(19999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, pickle.HIGHEST_PROTOCOL)
'\x80\x02\x8a(\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x7fT\x97\x05p\x0b\x18J#\x9aA\xa5.{8=O,f\xfa\x81|\xa1\xef\xaa\xfd\xa2e\x02.'
>>> pickle.loads('\x80\x02\x8a(\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\x7fT\x97\x05p\x0b\x18J#\x9aA\xa5.{8=O,f\xfa\x81|\xa1\xef\xaa\xfd\xa2e\x02.')
19999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999L
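The session above is Python 2. On Python 3 the `cPickle` module no longer exists (the plain `pickle` module covers it), integers have no `L` suffix, and `pickle.dumps` returns `bytes`. A minimal sketch of the same round trip, where `encode_count` and `decode_count` are hypothetical helper names:

```python
import pickle


def encode_count(n):
    """Serialize an int of any size to bytes suitable for storing as a value."""
    return pickle.dumps(n, pickle.HIGHEST_PROTOCOL)


def decode_count(raw):
    """Restore the int from bytes produced by encode_count."""
    return pickle.loads(raw)


big = 19999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
raw = encode_count(big)
restored = decode_count(raw)
```

The exact bytes depend on the pickle protocol in use, so they are not necessarily identical to the Python 2 output shown above.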

score 0 · Accepted Answer

Use Python's struct module to convert an integer to bytes (Python 3) or to a string (Python 2). Depending on your data you might choose a different packing format; for an unsigned long long (uint64_t):

struct.pack('>Q', my_integer)

This returns the big-endian byte representation of my_integer, which matches the lexicographical ordering that bsddb applies to keys. You can come up with a smarter packing function (have a look at wiredtiger.intpacking) to save space.
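A short sketch of the round trip with the struct module, assuming the counts fit in an unsigned 64-bit integer. `pack_u64` and `unpack_u64` are hypothetical helper names. The sort at the end illustrates the ordering point: for fixed-width big-endian values, byte order and numeric order agree.

```python
import struct


def pack_u64(n):
    """Pack an int in the range 0..2**64-1 into 8 big-endian bytes."""
    return struct.pack(">Q", n)  # raises struct.error if n is out of range


def unpack_u64(raw):
    """Unpack 8 big-endian bytes back into an int."""
    return struct.unpack(">Q", raw)[0]  # unpack always returns a tuple


values = [3, 256, 1, 70000]
packed = sorted(pack_u64(v) for v in values)  # sorted byte-wise
decoded = [unpack_u64(p) for p in packed]     # comes back in numeric order
```

Every value takes exactly 8 bytes here, whatever its magnitude. That fixed cost is what a variable-length packing scheme would try to reduce.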

You don't need a Python-side cache; use DBEnv.set_cachesize and DBEnv.set_cache_max to size Berkeley DB's own cache instead.
