python - Transferring ownership of numpy data

Question

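In my previous question, I learned how to resize a subclassed ndarray in place. Neat. Unfortunately, that no longer works when the array I am trying to resize is the result of a computation: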

import numpy as np

class Foo(np.ndarray):
    def __new__(cls,shape,dtype=np.float32,buffer=None,offset=0,
                strides=None,order=None):
        return np.ndarray.__new__(cls,shape,dtype,buffer,offset,strides,order)

    def __array_prepare__(self,output,context):
        print output.flags['OWNDATA'],"PREPARE",type(output)
        return np.ndarray.__array_prepare__(self,output,context)

    def __array_wrap__(self,output,context=None):
        print output.flags['OWNDATA'],"WRAP",type(output)

        return np.ndarray.__array_wrap__(self,output,context)

a = Foo((32,))
#resizing a is no problem
a.resize((24,),refcheck=False)

b = Foo((32,))
c = Foo((32,))

d = b+c
#Cannot resize `d`
d.resize((24,),refcheck=False)

The exact output (including the traceback) is:

True PREPARE <type 'numpy.ndarray'>
False WRAP <class '__main__.Foo'>
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    d.resize((24,),refcheck=False)
ValueError: cannot resize this array: it does not own its data

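I think this is because numpy creates a new ndarray and passes it to __array_prepare__. At some point along the way, though, it seems that the "output" array gets view-cast to my Foo type, although the docs don't seem to be 100% clear/accurate on this point. In any case, after the view-casting, the output no longer owns its data, which makes it impossible to resize in place (as far as I can tell).

Is there any way, via some sort of numpy voodoo (__array_prepare__, __array__, etc.), to transfer ownership of the data to the instance of my subclass?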

score 6 · Accepted Answer

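This is hardly a satisfactory answer, but it doesn't fit into a comment either... You can work around the ownership of the data by using the ufunc's out parameter. A silly example: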

>>> a = Foo((5,))
>>> b = Foo((5,))
>>> c = a + b # BAD
True PREPARE <type 'numpy.ndarray'>
False WRAP <class '__main__.Foo'>
>>> c.flags.owndata
False

>>> c = Foo((5,))
>>> c[:] = a + b # BETTER
True PREPARE <type 'numpy.ndarray'>
False WRAP <class '__main__.Foo'>
>>> c.flags.owndata
True

>>> np.add(a, b, out=c) # BEST
True PREPARE <class '__main__.Foo'>
True WRAP <class '__main__.Foo'>
Foo([  1.37754085e-38,   1.68450356e-20,   6.91042737e-37,
         1.74735556e-04,   1.48018885e+29], dtype=float32)
>>> c.flags.owndata
True

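I think the output above is consistent with c[:] = a + b getting to own the data at the expense of copying it into c from a temporary array. That copy should not happen when you use the out parameter.

Since you are already worried about intermediate storage in your mathematical expressions, it may not be such a bad thing to micro-manage how it is handled. That is, replacing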
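A minimal sketch of the out= route, assuming a bare Foo subclass without the printing hooks from the question. The array contents are made up for illustration; the point is only that c is allocated up front, so it keeps owning its buffer and can still be resized afterwards.

```python
import numpy as np

class Foo(np.ndarray):
    """Bare subclass standing in for the Foo class from the question."""

# Illustrative inputs (assumed values, not from the original post).
a = np.arange(5, dtype=np.float32).view(Foo)
b = np.ones(5, dtype=np.float32).view(Foo)

# Allocate the result up front; a freshly constructed array owns its data.
c = Foo((5,), dtype=np.float32)

# The ufunc writes straight into c instead of allocating a temporary.
np.add(a, b, out=c)

print(c.flags.owndata)   # True
print(np.asarray(c))     # [1. 2. 3. 4. 5.]

# Because c owns its buffer, the in-place resize goes through.
c.resize((3,), refcheck=False)
print(c.shape)           # (3,)
```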

g = a + b + np.sqrt(d*d + e*e + f*f)

with

g = foo_like(d) # you'll need to write this function!
np.multiply(d, d, out=g)
g += e * e
g += f * f
np.sqrt(g, out=g)
g += b
g += a

might save you some intermediate memory, and it would let you own your data. It does throw the "readability counts" mantra out the window, though...
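For what it's worth, here is one way foo_like could look. This is only a sketch: the helper body, the bare Foo class, and the input values are assumptions made for illustration, not part of the original code.

```python
import numpy as np

class Foo(np.ndarray):
    """Bare subclass standing in for the Foo class from the question."""

def foo_like(arr):
    # Hypothetical helper: allocate a new, uninitialized Foo with the same
    # shape and dtype as arr. A freshly constructed array owns its data.
    return Foo(arr.shape, dtype=arr.dtype)

# Illustrative inputs (assumed values): constant arrays make the result easy to check.
a, b, d, e, f = (np.full(4, v, dtype=np.float64).view(Foo)
                 for v in (1.0, 2.0, 3.0, 4.0, 12.0))

g = foo_like(d)
np.multiply(d, d, out=g)   # g = d*d, written directly into g
g += e * e                 # e*e still creates one temporary
g += f * f
np.sqrt(g, out=g)          # in-place square root
g += b
g += a

print(g.flags.owndata)     # True
print(np.asarray(g))       # [16. 16. 16. 16.]
```

Note that e * e and f * f still allocate temporaries here; only the accumulator g is reused.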

score 1 · Accepted Answer

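At some point along the way, though, it seems that the "output" array gets view-cast to my Foo type

Yes, ndarray.__array_prepare__ calls output.view, which returns an array that does not own its data.

I experimented a bit and couldn't find an easy way around that.

While I agree that this behavior is not ideal, I would argue that, at least in your use case, it is acceptable for d not to own its data. Numpy uses views extensively, and if you insist on never creating any views when working with numpy arrays, you are making your life very hard.

I would also argue that, in my experience, resize should generally be avoided. If you avoid resize, you should have no problem working with the view that was created. It has a hacky feel to it and is hard to work with (as you may be starting to appreciate, having run into one of the two classic errors when using it: it does not own its data. The other is cannot resize an array that has been referenced). (Another problem is described in this question.)

Since your decision to use resize comes from an answer to your other question, I'll post the rest of my answer there.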
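The effect of the view cast can be reproduced without any ufunc machinery. The snippet below is a minimal sketch, assuming a bare Foo subclass: it view-casts a plain array and shows that the resulting Foo borrows the buffer, so resize refuses to touch it, while a copy owns a buffer of its own.

```python
import numpy as np

class Foo(np.ndarray):
    """Bare subclass, used only for this illustration."""

raw = np.zeros(32, dtype=np.float32)   # plain ndarray, owns its buffer
viewed = raw.view(Foo)                 # view cast: new Foo object, same buffer

print(raw.flags.owndata)      # True
print(viewed.flags.owndata)   # False
print(viewed.base is raw)     # True: the buffer still belongs to raw

try:
    viewed.resize((24,), refcheck=False)
    resize_error = None
except ValueError as exc:
    resize_error = str(exc)
print(resize_error)

# Copying gives the subclass instance its own buffer, at the cost of the copy.
owned = viewed.copy()
owned.resize((24,), refcheck=False)
print(owned.flags.owndata, owned.shape)   # True (24,)
```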

score 0 · Accepted Answer

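How about:

def resize(arr, shape):
    # np.require returns a copy when arr does not own its data, so keep the
    # result and hand it back; call as: d = resize(d, shape)
    arr = np.require(arr, requirements=['OWNDATA'])
    arr.resize(shape, refcheck=False)
    return arr

It seems to resize successfully (and to reduce memory consumption):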

import array
import numpy as np
import time

class Foo(np.ndarray):
    def __new__(cls, shape, dtype=np.float32, buffer=None, offset=0,
                strides=None, order=None):
        return np.ndarray.__new__(cls, shape, dtype, buffer, offset, strides, order)

    def __array_prepare__(self, output, context):
        print(output.flags['OWNDATA'], "PREPARE", type(output))
        return np.ndarray.__array_prepare__(self, output, context)

    def __array_wrap__(self, output, context=None):
        print(output.flags['OWNDATA'], "WRAP", type(output))
        output = np.ndarray.__array_wrap__(self, output, context)
        return output

def free_memory():
    """
    Return free memory available, including buffer and cached memory
    """
    total = 0
    with open('/proc/meminfo', 'r') as f:
        for line in f:
            line = line.strip()
            if any(line.startswith(field) for field in ('MemFree', 'Buffers', 'Cached')):
                field, amount, unit = line.split()
                amount = int(amount)
                if unit != 'kB':
                    raise ValueError(
                        'Unknown unit {u!r} in /proc/meminfo'.format(u=unit))
                total += amount
    return total


def gen_change_in_memory():
    """
    http://stackoverflow.com/a/14446011/190597 (unutbu)
    """
    f = free_memory()
    diff = 0
    while True:
        yield diff
        f2 = free_memory()
        diff = f - f2
        f = f2
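_memory_changes = gen_change_in_memory()

def change_in_memory():
    # The next() builtin works on both Python 2.6+ and Python 3,
    # unlike the Python 2-only generator.next method.
    return next(_memory_changes)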

def resize(arr, shape):
    print(change_in_memory())
    # 0
    np.require(arr, requirements=['OWNDATA'])

    time.sleep(1)
    print(change_in_memory())
    # 200

    arr.resize(shape, refcheck=False)

N = 10000000
b = Foo((N,), buffer = array.array('f',range(N)))
c = Foo((N,), buffer = array.array('f',range(N)))

Running the following then yields the values shown in the comments:

print(change_in_memory())
# 0

d = b+c
d = np.require(d, requirements=['OWNDATA'])

print(change_in_memory())
# 39136

resize(d, (24,))   # Increases memory by 200 KiB
time.sleep(1)
print(change_in_memory())
# -39116
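One detail worth spelling out: np.require does not modify its argument. When the array lacks OWNDATA it returns a copy, so the result has to be rebound, as in d = np.require(d, ...) above. A small sketch, assuming a bare Foo subclass and made-up data:

```python
import numpy as np

class Foo(np.ndarray):
    """Bare subclass standing in for the Foo class from the question."""

# A Foo that borrows its buffer from a plain ndarray (illustrative data).
view = np.arange(32, dtype=np.float32).view(Foo)

# np.require hands back a copy here because the OWNDATA requirement is not met.
owned = np.require(view, requirements=['OWNDATA'])

print(owned is view)                            # False
print(type(owned).__name__)                     # Foo
print(view.flags.owndata, owned.flags.owndata)  # False True

# Only the returned array can be resized in place.
owned.resize((24,), refcheck=False)
print(owned.shape)                              # (24,)
```

The price is the same as for an explicit copy: the full buffer is duplicated once before it can be shrunk.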
