
This may be trivial, but I'm not sure I understand it. I tried Googling but didn't find a convincing answer.

>>> import sys
>>> sys.getsizeof({})
140
>>> sys.getsizeof({'Hello':'World'})
140
>>>
>>> yet_another_dict = {}
>>> for i in xrange(5000):
...     yet_another_dict[i] = i**2
...
>>>
>>> sys.getsizeof(yet_another_dict)
98444

How should I interpret this? Why is an empty dictionary the same size as a non-empty one?

2 Answers

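There are two reasons:

  1. A dictionary holds only references to objects, not the objects themselves, so its size does not depend on the size of the objects it contains, only on the number of references (items) it holds.

  2. More importantly, a dictionary pre-allocates memory for references in chunks. When you create a dictionary, it already has memory pre-allocated for the first n references. When that memory fills up, it pre-allocates a new, larger chunk.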
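The first point can be checked directly. This is a minimal sketch, assuming CPython 3 (where `sys.getsizeof` reports only the dict's own footprint); the names `small` and `large` are illustrative and not from the original post.

```python
import sys

# Two one-item dicts: one holds a 1-character string, the other a ~1 MB string.
small = {"key": "x"}
large = {"key": "x" * 1000000}

# getsizeof counts the dict's own table of references, not the referenced objects,
# so both dicts report the same size.
print(sys.getsizeof(small) == sys.getsizeof(large))

# The large string itself is far bigger than the dict that refers to it.
print(sys.getsizeof(large["key"]) > sys.getsizeof(large))
```

To measure the total memory reachable from a dict, the sizes of its keys and values would have to be added separately.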

You can observe this behavior by running the following piece of code.

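import sys

d = {}
size = sys.getsizeof(d)
print(size)  # size of the empty dict
i = 0  # number of size changes seen so far
j = 0  # next key to insert
while i < 3:
    d[j] = j
    j += 1
    new_size = sys.getsizeof(d)
    # Report only when an insertion changed the dict's size
    if size != new_size:
        print(new_size)
        size = new_size
        i += 1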

This prints:

280
1048
3352
12568

on my machine, but this depends on the architecture (32-bit or 64-bit).
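To see which case applies on a given machine, the pointer width of the running interpreter can be printed next to the size of an empty dict. This is only a sketch; `pointer_bits` is an illustrative name, and the reported sizes also vary with the Python version.

```python
import struct
import sys

# struct.calcsize("P") is the size of a C pointer in bytes for this interpreter;
# multiplying by 8 gives the pointer width in bits (32 or 64 on common builds).
pointer_bits = struct.calcsize("P") * 8

print(pointer_bits, sys.getsizeof({}))
```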

answered 2013-09-01T13:30:39.920

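Dictionaries in CPython allocate a small amount of key space directly in the dict object itself (4-8 entries, depending on version and compile options). From dictobject.h: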

/* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
 * allocated directly in the dict object (in the ma_smalltable member).
 * It must be a power of 2, and at least 4.  8 allows dicts with no more
 * than 5 active entries to live in ma_smalltable (and so avoid an
 * additional malloc); instrumentation suggested this suffices for the
 * majority of dicts (consisting mostly of usually-small instance dicts and
 * usually-small dicts created to pass keyword arguments).
 */
#ifndef Py_LIMITED_API
#define PyDict_MINSIZE 8
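The effect of a minimum table size can be observed from Python by recording the reported size after each insertion. This is a hedged sketch assuming CPython 3; later CPython versions lay dicts out differently from the `ma_smalltable` design quoted above, so the numbers will not match the Python 2 figures, and `sizes_by_item_count` is a hypothetical helper name.

```python
import sys

def sizes_by_item_count(max_items):
    """Return the getsizeof value of one dict after each of max_items insertions."""
    d = {}
    sizes = []
    for i in range(max_items):
        d[i] = i
        sizes.append(sys.getsizeof(d))
    return sizes

sizes = sizes_by_item_count(8)
for count, size in enumerate(sizes, start=1):
    # Runs of identical sizes show that the first few items share one allocation.
    print(count, size)
```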

Note that CPython also resizes dictionaries in batches, to avoid frequent reallocations as a dictionary grows. From dictobject.c:

/* If we added a key, we can safely resize.  Otherwise just return!
 * If fill >= 2/3 size, adjust size.  Normally, this doubles or
 * quaduples the size, but it's also possible for the dict to shrink
 * (if ma_fill is much larger than ma_used, meaning a lot of dict
 * keys have been deleted).
 *
 * Quadrupling the size improves average dictionary sparseness
 * (reducing collisions) at the cost of some memory and iteration
 * speed (which loops over every possible entry).  It also halves
 * the number of expensive resize operations in a growing dictionary.
 *
 * Very large dictionaries (over 50K items) use doubling instead.
 * This may help applications with severe memory constraints.
 */
if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
    return 0;
return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
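The decision in the C excerpt above can be restated in Python to make the arithmetic easier to follow. This is an illustrative sketch, not CPython's implementation: `needs_resize` and `requested_size` are hypothetical names, the parameters stand in for `n_used`, `ma_used`, `ma_fill`, and `ma_mask`, and any rounding that `dictresize` applies to the requested size is not modeled.

```python
def needs_resize(used_before, used_after, fill, mask):
    """True if a key was added and the table is at least 2/3 full."""
    added_key = used_after > used_before
    # mask + 1 is the number of slots in the table.
    two_thirds_full = fill * 3 >= (mask + 1) * 2
    return added_key and two_thirds_full

def requested_size(used):
    """Minimum size passed to dictresize: 4x the used count, or 2x above 50000 items."""
    factor = 2 if used > 50000 else 4
    return factor * used

# An 8-slot table (mask == 7) whose 6th slot was just filled by a new key.
print(needs_resize(used_before=5, used_after=6, fill=6, mask=7))
print(requested_size(6))
print(requested_size(60000))
```

For the 8-slot case, 6 * 3 = 18 is at least 8 * 2 = 16, so a resize is requested with a minimum size of 24.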
answered 2013-09-01T13:30:27.627