python - Is there a way to check if NumPy arrays share the same data?

Question

My impression is that in NumPy, two arrays can share the same memory. Take the following example:

import numpy as np
a=np.arange(27)
b=a.reshape((3,3,3))
a[0]=5000
print (b[0,0,0]) #5000

#Some tests:
a.data is b.data #False
a.data == b.data #True

c=np.arange(27)
c[0]=5000
a.data == c.data #True ( Same data, not same memory storage ), False positive

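So clearly b didn't make a copy of a; it just created some new metadata and attached it to the same memory buffer that a is using. Is there a way to check whether two arrays reference the same memory buffer?

My first impression was to use a.data is b.data, but that returns False. I can do a.data == b.data, which returns True, but I don't think that checks that a and b share the same memory buffer, only that the block of memory referenced by a and the block referenced by b contain the same bytes.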

score 35 · Accepted Answer

You can use the base attribute to check whether an array shares memory with another array:

>>> import numpy as np
>>> a = np.arange(27)
>>> b = a.reshape((3,3,3))
>>> b.base is a
True
>>> a.base is b
False

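Not sure if that solves your problem. The base attribute will be None if the array owns its own memory. Note that an array's base will be the other array even if the view covers only a subset of it: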

>>> c = a[2:]
>>> c.base is a
True
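Because base only points from a view back toward the array it was derived from, comparing two views with each other takes one more step. The following is a minimal sketch, not part of the original answer: root_base and same_root are hypothetical helper names that follow the base chain until it no longer leads to an ndarray. Two arrays with the same root are backed by the same buffer, but they do not necessarily overlap (a[::2] and a[1::2] have the same root and no element in common).

```python
import numpy as np

def root_base(arr):
    """Follow the .base chain to the last ndarray in it (hypothetical helper)."""
    # .base can be None (the array owns its data) or a non-ndarray buffer object,
    # so stop as soon as it is not an ndarray.
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr

def same_root(x, y):
    """True if x and y are ultimately views of the same ndarray."""
    return root_base(x) is root_base(y)

a = np.arange(27)
b = a.reshape((3, 3, 3))
c = a[2:]
unrelated = np.arange(27)

print(same_root(b, c))          # both are views of a
print(same_root(a, unrelated))  # equal contents, separate buffers
```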

score 10 · Accepted Answer

I think jterrace's answer is probably the best way to go, but here is another possibility.

import numpy as np

def byte_offset(a):
    """Returns a 1-d array of the byte offset of every element in `a`.
    Note that these will not in general be in order."""
    stride_offset = np.ix_(*map(range,a.shape))
    element_offset = sum(i*s for i, s in zip(stride_offset,a.strides))
    element_offset = np.asarray(element_offset).ravel()
    return np.concatenate([element_offset + x for x in range(a.itemsize)])

def share_memory(a, b):
    """Returns the number of shared bytes between arrays `a` and `b`."""
    # Note: in NumPy 2.0 and later, byte_bounds lives in np.lib.array_utils.
    a_low, a_high = np.byte_bounds(a)
    b_low, b_high = np.byte_bounds(b)

    beg, end = max(a_low,b_low), min(a_high,b_high)

    if end - beg > 0:
        # memory overlaps
        amem = a_low + byte_offset(a)
        bmem = b_low + byte_offset(b)

        return np.intersect1d(amem,bmem).size
    else:
        return 0

Example:

>>> a = np.arange(10)
>>> b = a.reshape((5,2))
>>> c = a[::2]
>>> d = a[1::2]
>>> e = a[0:1]
>>> f = a[0:1]
>>> f = f.reshape(())
>>> share_memory(a,b)
80
>>> share_memory(a,c)
40
>>> share_memory(a,d)
40
>>> share_memory(c,d)
0
>>> share_memory(a,e)
8
>>> share_memory(a,f)
8

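Here is a plot showing the time taken by each call to share_memory(a, a[::2]) on my computer, as a function of the number of elements in a.

[Figure: timing of the share_memory function]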

score 9 · Accepted Answer

To solve the problem exactly, you can use

import numpy as np

a=np.arange(27)
b=a.reshape((3,3,3))

# Checks exactly by default
np.shares_memory(a, b)

# Checks bounds only
np.may_share_memory(a, b)

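Both np.may_share_memory and np.shares_memory take an optional max_work argument that lets you decide how much effort to spend on making sure there are no false positives. The problem is NP-complete, so always finding the correct answer can be computationally quite expensive.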
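To make the difference between the two checks concrete, here is a small sketch using two interleaved views of the same array. The variable names are only illustrative. The memory bounds of the two views overlap, so the bounds-only check reports a possible overlap, while the exact check finds that no element is actually shared.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]   # elements 0, 2, 4, 6, 8
odds = a[1::2]   # elements 1, 3, 5, 7, 9

# Bounds-only check: the address ranges of the two views interleave.
bounds_overlap = np.may_share_memory(evens, odds)

# Exact check: no single byte belongs to both views.
exact_overlap = np.shares_memory(evens, odds)

# A view always shares memory with the array it was sliced from.
view_overlap = np.shares_memory(a, evens)

print(bounds_overlap, exact_overlap, view_overlap)  # True False True
```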

score 6 · Accepted Answer

Just do:

import numpy as np

a = np.arange(27)
a.__array_interface__['data']

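The second line returns a tuple whose first entry is the memory address and whose second entry says whether the array is read-only. Combined with the shape and data type (and, for non-contiguous views, the strides), you can work out the exact span of memory addresses the array covers, so you can also tell from this when one array is a subset of another.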
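As a rough illustration of that calculation, the sketch below handles only the simple case of C-contiguous arrays, where the covered span is the start address plus nbytes. The helper name contiguous_span is an assumption made for this example; strided views would need the strides taken into account as well.

```python
import numpy as np

def contiguous_span(arr):
    """Return (start, end) byte addresses of a C-contiguous array (hypothetical helper)."""
    if not arr.flags['C_CONTIGUOUS']:
        raise ValueError("this sketch only handles C-contiguous arrays")
    start = arr.__array_interface__['data'][0]
    return start, start + arr.nbytes

a = np.arange(27)
c = a[2:]  # contiguous view that starts two elements into a

a_start, a_end = contiguous_span(a)
c_start, c_end = contiguous_span(c)

# The view starts exactly two items after the start of a ...
offset = c_start - a_start
# ... and its span lies entirely within the span of a.
inside = a_start <= c_start and c_end <= a_end

print(offset == 2 * a.itemsize, inside)
```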
