基于georg、NPE、Scott和Havok答案的附加说明。
我试图对 2 个或更多词典的集合执行此操作,并且有兴趣查看每个词典所花费的时间。因为我想在任意数量的字典上这样做,我不得不稍微改变一些答案。如果有人对他们有更好的建议,请随时编辑。
这是我的测试方法。我最近对其进行了更新,以包含使用更大字典的测试,并再次包含 Havok 和 Scott 的新方法:
首先,我使用了以下数据:
import random
x = {'xy1': 1, 'xy2': 2, 'xyz': 3, 'only_x': 100}
y = {'xy1': 10, 'xy2': 20, 'xyz': 30, 'only_y': 200}
z = {'xyz': 300, 'only_z': 300}
small_tests = [x, y, z]
# 200,000 random 8 letter keys
keys = [''.join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8)) for _ in range(200000)]
a, b, c = {}, {}, {}
# 50/50 chance of a value being assigned to each dictionary, some keys will be missed but meh
for key in keys:
if random.getrandbits(1):
a[key] = random.randint(0, 1000)
if random.getrandbits(1):
b[key] = random.randint(0, 1000)
if random.getrandbits(1):
c[key] = random.randint(0, 1000)
large_tests = [a, b, c]
print("a:", len(a), "b:", len(b), "c:", len(c))
#: a: 100069 b: 100385 c: 99989
现在每个方法:
from collections import defaultdict, Counter
from functools import reduce
def georg_method(tests):
return {k: sum(t.get(k, 0) for t in tests) for k in set.union(*[set(t) for t in tests])}
def georg_method_nosum(tests):
# If you know you will have exactly 3 dicts
return {k: tests[0].get(k, 0) + tests[1].get(k, 0) + tests[2].get(k, 0) for k in set.union(*[set(t) for t in tests])}
def npe_method(tests):
ret = defaultdict(int)
for d in tests:
for k, v in d.items():
ret[k] += v
return dict(ret)
# Note: There is a bug with scott's method. See below for details.
# Scott included a similar version using counters that is fixed
# See the scott_update_method below
def scott_method(tests):
return dict(sum((Counter(t) for t in tests), Counter()))
def scott_method_nosum(tests):
# If you know you will have exactly 3 dicts
return dict(Counter(tests[0]) + Counter(tests[1]) + Counter(tests[2]))
def scott_update_method(tests):
ret = Counter()
for test in tests:
ret.update(test)
return dict(ret)
def scott_update_method_static(tests):
# If you know you will have exactly 3 dicts
xx = Counter(tests[0])
yy = Counter(tests[1])
zz = Counter(tests[2])
xx.update(yy)
xx.update(zz)
return dict(xx)
def havok_method(tests):
def reducer(accumulator, element):
for key, value in element.items():
accumulator[key] = accumulator.get(key, 0) + value
return accumulator
return reduce(reducer, tests, {})
methods = {
"georg_method": georg_method, "georg_method_nosum": georg_method_nosum,
"npe_method": npe_method,
"scott_method": scott_method, "scott_method_nosum": scott_method_nosum,
"scott_update_method": scott_update_method, "scott_update_method_static": scott_update_method_static,
"havok_method": havok_method
}
我还写了一个快速函数来查找列表之间的任何差异。Counter()
不幸的是,那时我在 Scott 的方法中发现了问题,即,如果您的字典总数为 0,则由于添加时的行为方式,字典根本不会被包含在内。
测试设置:
- MacBook Pro(15 英寸,2016 年末),2.9 GHz Intel Core i7,16 GB 2133 MHz LPDDR3 RAM,运行 macOS Mojave 版本 10.14.5
- Python 3.6.5 通过 IPython 6.1.0
最后,结果:
结果:小测试
for name, method in methods.items():
print("Method:", name)
%timeit -n10000 method(small_tests)
#: Method: georg_method
#: 7.81 µs ± 321 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: georg_method_nosum
#: 4.6 µs ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: npe_method
#: 3.2 µs ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method
#: 24.9 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method_nosum
#: 18.9 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method
#: 9.1 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method_static
#: 14.4 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: havok_method
#: 3.09 µs ± 47.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
结果:大型测试
自然,不能在尽可能多的循环附近运行
for name, method in methods.items():
print("Method:", name)
%timeit -n10 method(large_tests)
#: Method: georg_method
#: 347 ms ± 20 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: georg_method_nosum
#: 280 ms ± 4.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: npe_method
#: 119 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method
#: 324 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method_nosum
#: 289 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method
#: 123 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method_static
#: 136 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: havok_method
#: 103 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
结论
╔═══════════════════════════╦═══════╦═════════════════════════════╗
║ ║ ║ Best of Time Per Loop ║
║ Algorithm ║ By ╠══════════════╦══════════════╣
║ ║ ║ small_tests ║ large_tests ║
╠═══════════════════════════╬═══════╬══════════════╬══════════════╣
║ functools reduce ║ Havok ║ 3.1 µs ║ 103,000 µs ║
║ defaultdict sum ║ NPE ║ 3.2 µs ║ 119,000 µs ║
║ Counter().update loop ║ Scott ║ 9.1 µs ║ 123,000 µs ║
║ Counter().update static ║ Scott ║ 14.4 µs ║ 136,000 µs ║
║ set unions without sum() ║ georg ║ 4.6 µs ║ 280,000 µs ║
║ set unions with sum() ║ georg ║ 7.8 µs ║ 347,000 µs ║
║ Counter() without sum() ║ Scott ║ 18.9 µs ║ 289,000 µs ║
║ Counter() with sum() ║ Scott ║ 24.9 µs ║ 324,000 µs ║
╚═══════════════════════════╩═══════╩══════════════╩══════════════╝
重要的。YMMV。