python - Efficient dstack of arrays from a dictionary

Question

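I have a dictionary keyed by date and populated with class instances that each have a numpy.array attribute. I want to use np.dstack to build one big array from all the arrays in the dictionary. My current code looks like this: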

import numpy as np
#PARTS is my dictionary
#the .partposit is the attribute that is an array of shape (50000, 12)
ks = sorted(PARTS.keys())
p1 = PARTS[ks[0]].partposit
for k in ks[1:]:
    p1 = np.dstack((p1, PARTS[k].partposit))

The result is what I expect:

In [67]: p1.shape
Out[67]: (50000, 12, 163)

However, it is slow. Is there a more efficient way to do this?

score 3 · Accepted Answer

You can try this:

>>> import numpy as np
>>> class A:
...     def __init__(self, values):
...         self.partposit = values
... 
>>> PARTS = dict((index, A(np.zeros((50000, 12)))) for index in xrange(163))
>>> p1 = np.dstack((PARTS[k].partposit for k in sorted(PARTS.keys())))
>>> p1.shape
(50000, 12, 163)
>>>

Stacking took a couple of seconds on my machine.

>>> import timeit
>>> timeit.Timer('p1 = np.dstack((PARTS[k].partposit for k in sorted(PARTS.keys())))', "from __main__ import np, PARTS").timeit(number = 1)
2.1245520114898682

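numpy.dstack takes a sequence of arrays and stacks them together, so it is faster to hand it the whole sequence at once than to stack the arrays one at a time ourselves.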

numpy.dstack(tup)

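Stack arrays in sequence depth wise (along third axis). Takes a sequence of arrays and stacks them along the third axis to make a single array.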

http://docs.scipy.org/doc/numpy/reference/generated/numpy.dstack.html
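
On current Python 3 and NumPy, the same single-call idea can be sketched roughly as below. This is an illustration under assumptions, not the code from the answer: the `Part` class and the small array shape are stand-ins, and the arrays are collected into a list because newer NumPy documentation asks for a sequence such as a list or tuple rather than a generator.

```python
import numpy as np

class Part:
    """Hypothetical stand-in for the objects stored in PARTS."""
    def __init__(self, values):
        self.partposit = values

# Small shapes keep the sketch quick; the question uses (50000, 12) and 163 keys.
PARTS = {index: Part(np.full((5, 12), float(index))) for index in range(4)}

# Build the list once, in sorted key order, then stack in a single call.
arrays = [PARTS[k].partposit for k in sorted(PARTS)]
p1 = np.dstack(arrays)

print(p1.shape)
```

With 2-D inputs of equal shape, `np.stack(arrays, axis=2)` should give the same result and makes the target axis explicit.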

I was also curious how long your approach takes:

>>> import timeit
>>> setup = """
... import numpy as np
... #PARTS is my dictionary
... #the .partposit is the attribute that is an array of shape (50000, 12)
... 
... class A:
...     def __init__(self, values):
...         self.partposit = values
... 
... PARTS = dict((index, A(np.zeros((50000, 12)))) for index in xrange(163))
... ks = sorted(PARTS.keys())
... """
>>> stack = """
... p1 = PARTS[ks[0]].partposit
... for k in ks[1:]:
...     p1 = np.dstack((p1, PARTS[k].partposit))
... """
>>> timeit.Timer(stack, setup).timeit(number = 1)
67.69684886932373

Ouch!
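
A rough way to see where the gap comes from: each np.dstack call allocates a new array and copies all of its inputs, so the loop copies the accumulated result again on every pass. The sketch below only counts how many (50000, 12) slices get written under that assumption. It is a back-of-the-envelope model, not a measurement, and the function names are made up for illustration.

```python
def slices_copied_incrementally(n):
    """Slices written when stacking n arrays one at a time.

    The i-th dstack call (i = 2..n) produces a result holding i slices.
    """
    return sum(range(2, n + 1))

def slices_copied_at_once(n):
    """Slices written by a single dstack call over all n arrays."""
    return n

n = 163  # number of dictionary entries in the question
loop_cost = slices_copied_incrementally(n)
single_cost = slices_copied_at_once(n)
print(loop_cost, single_cost, loop_cost / single_cost)
```

In this model the loop's work grows quadratically with the number of arrays while the single call grows linearly. The measured ratio above is smaller than the count suggests, presumably because allocation and other fixed costs also contribute.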

>>> np.__version__
'1.6.1'

$ python --version
Python 2.6.1

I hope this helps.

score 0 · Accepted Answer

This line creates a new list (a shallow copy), which is unnecessary overhead:

for k in ks[1:]:

A more efficient approach is:

itks = iter(ks)
next(itks)  # skip the first key, which is handled before the loop
for k in itks:

In addition, you can eliminate the repeated lookups like this:

entries = iter(sorted(((k, v.partposit) for k,v in PARTS.iteritems()), key=lambda(k,v):k))
p1 = next(entries)[1]
for k,v in entries: 
    p1 = np.dstack((p1, v))

This should make things slightly faster, because it eliminates the list copy and the repeated dict lookups (which are constant time, but not free).
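
Note that the loop above still calls np.dstack once per entry, so the accumulated array keeps being copied. One alternative, sketched here for Python 3 under the assumption that every partposit array has the same shape and dtype, is to sort the items once, allocate the output once, and fill it slice by slice. The `Part` class and the small shapes are hypothetical stand-ins.

```python
import numpy as np

class Part:
    """Hypothetical stand-in for the objects stored in PARTS."""
    def __init__(self, values):
        self.partposit = values

PARTS = {index: Part(np.full((5, 12), float(index))) for index in range(4)}

# Sort the (key, array) pairs once; no per-key dictionary lookups afterwards.
entries = sorted(((k, v.partposit) for k, v in PARTS.items()), key=lambda kv: kv[0])

# Allocate the result once, then copy each array into its own depth slice.
first = entries[0][1]
p1 = np.empty(first.shape + (len(entries),), dtype=first.dtype)
for i, (_, values) in enumerate(entries):
    p1[:, :, i] = values

print(p1.shape)
```

Each input array is copied exactly once this way, which is the same amount of copying a single np.dstack call over the whole list would do.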
