I'm having some difficulty understanding (and ultimately resolving) why having a large dict in memory makes the creation of other dicts take longer.

Here's the test code I'm using:
import time

def create_dict():
    # return {x: [x]*125 for x in xrange(0, 100000)}
    return {x: (x,)*125 for x in xrange(0, 100000)}  # UPDATED: to use tuples instead of lists of values

class Foo(object):
    @staticmethod
    def dict_init():
        start = time.clock()
        Foo.sample_dict = create_dict()
        print "dict_init in Foo took {0} sec".format(time.clock() - start)

if __name__ == '__main__':
    Foo.dict_init()
    for x in xrange(0, 10):
        start = time.clock()
        create_dict()
        print "Run {0} took {1} seconds".format(x, time.clock() - start)
If I run the code as-is (initializing sample_dict in Foo first) and then create the same dict 10 more times in a loop, I get these results:
dict_init in Foo took 0.385263764287 sec
Run 0 took 0.548807949139 seconds
Run 1 took 0.533209452471 seconds
Run 2 took 0.51916067636 seconds
Run 3 took 0.513130722575 seconds
Run 4 took 0.508272050029 seconds
Run 5 took 0.502263872177 seconds
Run 6 took 0.48867601998 seconds
Run 7 took 0.483109299676 seconds
Run 8 took 0.479019713488 seconds
Run 9 took 0.473174195256 seconds
[Finished in 5.6s]
However, if I don't initialize sample_dict in Foo (by commenting out Foo.dict_init()), dict creation in the loop is almost 20% faster:
Run 0 took 0.431378921359 seconds
Run 1 took 0.423696636179 seconds
Run 2 took 0.419630475616 seconds
Run 3 took 0.405130343806 seconds
Run 4 took 0.398099686921 seconds
Run 5 took 0.392837169802 seconds
Run 6 took 0.38799598399 seconds
Run 7 took 0.375133006408 seconds
Run 8 took 0.368755297573 seconds
Run 9 took 0.363273701371 seconds
[Finished in 4.0s]
I noticed that if I turn off Python's garbage collector by calling gc.disable(), not only does performance improve roughly 5x, but storing the large dict in Foo also makes no difference. Here are the results with garbage collection disabled:
dict_init in Foo took 0.0696136982496 sec
Run 0 took 0.113533445358 seconds
Run 1 took 0.111091241489 seconds
Run 2 took 0.111151620212 seconds
Run 3 took 0.110655722831 seconds
Run 4 took 0.111807537706 seconds
Run 5 took 0.11097510318 seconds
Run 6 took 0.110936170451 seconds
Run 7 took 0.111074414632 seconds
Run 8 took 0.110678488579 seconds
Run 9 took 0.111011066463 seconds
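For reference, the gc-disabled variant only needs one extra call before the benchmark. A minimal sketch of how I ran it (shown in Python 3 syntax with time.perf_counter instead of the deprecated time.clock, and a smaller dict so it runs quickly; the exact placement of gc.disable() is my own choice):

```python
import gc
import time

def create_dict():
    # same shape as the dict in the question, scaled down for a quick run
    return {x: [x] * 125 for x in range(0, 10000)}

gc.disable()  # suspend cyclic garbage collection for the whole benchmark
try:
    start = time.perf_counter()
    sample_dict = create_dict()
    elapsed = time.perf_counter() - start
    print("dict creation with gc disabled took {0:.4f} sec".format(elapsed))
finally:
    gc.enable()  # always restore collection, even if the benchmark raises
```

The try/finally ensures the collector is re-enabled no matter how the benchmark exits.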
So I have two questions:

- Why does disabling garbage collection speed up dict creation?
- How can I get even performance (with the Foo init and without it) without disabling garbage collection?

I would appreciate any insight on this.

Thanks!
UPDATE: After Tim Peters mentioned that I was creating mutable objects, I decided to try creating immutable objects (tuples in my case), and voila, much faster results (the same with gc enabled and disabled):
dict_init in Foo took 0.017769 sec
Run 0 took 0.017547 seconds
Run 1 took 0.013234 seconds
Run 2 took 0.012791 seconds
Run 3 took 0.013371 seconds
Run 4 took 0.013288 seconds
Run 5 took 0.013692 seconds
Run 6 took 0.013059 seconds
Run 7 took 0.013311 seconds
Run 8 took 0.013343 seconds
Run 9 took 0.013675 seconds
I understand that creating tuples is much faster than creating lists, but why does a dict of immutable objects not affect the time spent on garbage collection? Aren't immutable objects involved in reference cycles?
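One way to probe this is gc.is_tracked (available since Python 2.7; Python 3 syntax below). CPython's collector only needs to track containers that could participate in a reference cycle: lists are always tracked, dicts holding only atomic keys/values are left untracked, and tuples of immutables are untracked during a collection pass. A small demonstration, assuming CPython:

```python
import gc

a_list = [1, 2, 3]
print(gc.is_tracked(a_list))        # True: lists are always gc-tracked

atomic_dict = {1: 2, 3: 4}
print(gc.is_tracked(atomic_dict))   # False: only atomic keys/values, so untracked

mutable_dict = {1: [1, 2]}
print(gc.is_tracked(mutable_dict))  # True: holds a list, so it could sit in a cycle

t = tuple(range(5))                 # built at runtime to avoid constant folding
print(gc.is_tracked(t))             # True at creation...
gc.collect()
print(gc.is_tracked(t))             # ...untracked after a pass: provably cycle-free
```

Untracked objects are skipped entirely when the collector scans for cycles, which is consistent with the large dict of tuples no longer adding per-collection cost.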
Thank you.

P.S. As it happens, in my real-world scenario converting the lists to tuples solved the problem (the lists were never actually needed, I just hadn't thought of using tuples), but I'm still curious why it's faster.