python - Python: memory-efficient way of defining matrices in numpy

Question

import numpy as np

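Preface:

If you find this boring to read, skip the preface, since you already know this.

I recently ran into a problem while debugging. I wrote `A = B = C = np.zeros([3,3])` and thought I had just defined three new matrices. What I had actually done was different: I had defined one new matrix (filled with zeros) and three labels, each referring to that same matrix. Let me illustrate with the following example: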

>>> a = b = [0, 0]
>>> a
[0, 0]
>>> b
[0, 0]
>>> # All good so far.
>>> a[0] = 1
>>> a
[1, 0]
>>> # Nothing short of what one would expect...
>>> b
[1, 0]
>>> # ... but since 'b' is assigned to the same list, it changes as well.
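The same aliasing applies to numpy arrays. The following is a minimal sketch, not part of the original question: it shows the chained assignment binding three names to one array, and one possible way (a generator expression, chosen here only for illustration) to get three independent arrays in a single statement.

```python
import numpy as np

# Chained assignment binds three names to ONE array object.
A = B = C = np.zeros([3, 3])
A[0, 0] = 1.0
aliased = (A is B) and (B is C)   # same object under three names
b_changed = B[0, 0] == 1.0        # the write through A is visible via B

# One way to get three independent arrays in a single statement:
# np.zeros is called once per name.
X, Y, Z = (np.zeros([3, 3]) for _ in range(3))
X[0, 0] = 1.0
independent = not np.shares_memory(X, Y)   # separate buffers

print(aliased, b_changed, independent, Y[0, 0])
```

Here `Y` stays untouched after the write to `X`, whereas `B` and `C` both see the write to `A`.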

Question:

OK. Now that I know this, there is no problem, right? Of course I can write:

A = np.zeros([3,3])
B = np.zeros([3,3])
C = np.zeros([3,3])

and everything works? True, but I could just as well write:

A, B, C = np.zeros([3,3,3])

I would think that the second option uses memory more efficiently, since it defines a 3x3x3 tensor and then three labels A, B and C, one for each of its layers, instead of three separate matrices with possible gaps of memory between them.

Which one would you think is better?

score 3 · Accepted Answer

Most of all, it smells like premature optimization. If we're talking about a small number of matrices, it doesn't matter either way. If we're talking about a large number of matrices, you're not likely to make use of unpacking.

Having said that, the second option creates one larger underlying storage, while the first one creates three separate storages. The single storage is somewhat more efficient if all three matrices share the same lifetime. The separate arrays are more readable and allow the memory of individual matrices to be released. If this kind of optimization matters to you at all, measure.
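The lifetime point can be made concrete with a small sketch. It relies on the fact that unpacking a numpy array yields views of its first axis, and it uses the `.base` attribute to inspect the shared parent array. The names `A_own`, `same_block` and so on are illustrative only.

```python
import numpy as np

# Unpacking iterates over the first axis, so A, B, C are views, not copies.
A, B, C = np.zeros([3, 3, 3])

same_block = A.base is B.base is C.base   # all three share one parent array
block_bytes = A.base.nbytes               # 3 * 3 * 3 * 8 bytes for float64

# Dropping two names does not shrink the shared block: A still references it.
del B, C
still_held = A.base.nbytes

# An explicit copy owns its own 3x3 buffer and is independent of the block.
A_own = A.copy()
owns_data = A_own.base is None

print(same_block, block_bytes, still_held, owns_data)
```

So with the unpacking approach, the whole 3x3x3 block stays allocated for as long as any one of the three views is alive, unless the survivor is copied out first.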

score 3 · Accepted Answer

I made a simple test to see what is going on in the two cases (code and results below). As suspected, the latter approach allocates memory contiguously, while the former scatters the arrays wherever the system allows (I would have expected them to be packed more tightly than they were). So the latter is more efficient in terms of memory locality. In terms of allocation time (timing the script below), however, the two are equivalent (the unpacking seems to cost a little time, and we are talking about very small numbers). So thinking about this is most likely premature optimisation.

import sys
import numpy as np

nr = 1000
rounds = 1000

# Run as "script.py seq" or "script.py all" and time the whole script externally.
if len(sys.argv) == 2:
    if sys.argv[1] == 'seq':
        print("testing sequential allocation")
        for i in range(rounds):
            a = np.zeros([nr, nr])
            b = np.zeros([nr, nr])
            c = np.zeros([nr, nr])
    elif sys.argv[1] == 'all':
        print("testing allocating all at once")
        for i in range(rounds):
            A, B, C = np.zeros([3, nr, nr])

# Compare where the data buffers end up in memory for small matrices.
a = np.zeros([3, 3])
b = np.zeros([3, 3])
c = np.zeros([3, 3])
A, B, C = np.zeros([3, 3, 3])

print("diff in location b-a", b.__array_interface__['data'][0] - a.__array_interface__['data'][0])
print("diff in location c-a", c.__array_interface__['data'][0] - a.__array_interface__['data'][0])
print("diff in location B-A", B.__array_interface__['data'][0] - A.__array_interface__['data'][0])
print("diff in location C-A", C.__array_interface__['data'][0] - A.__array_interface__['data'][0])

OUTPUT

diff in location b-a -125520
diff in location c-a -173376
diff in location B-A 72
diff in location C-A 144
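The 72 and 144 in the output follow directly from the array layout: a 3x3 layer of float64 values occupies 3 * 3 * 8 = 72 bytes, and the layers of a C-contiguous 3x3x3 array sit back to back. A short sketch that checks this arithmetic (the `address` helper is a hypothetical convenience, not part of the original script):

```python
import numpy as np

def address(arr):
    # Integer address of the first element of the array's data buffer.
    return arr.__array_interface__['data'][0]

A, B, C = np.zeros([3, 3, 3])   # float64 by default, 8 bytes per element

layer_bytes = A.nbytes                # 3 * 3 * 8 = 72
offset_b = address(B) - address(A)    # one layer further along the block
offset_c = address(C) - address(A)    # two layers further along the block
print(layer_bytes, offset_b, offset_c)
```

The offsets between the separately allocated `a`, `b` and `c`, by contrast, depend on the allocator and will differ from run to run and from system to system.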
