arrays - Effect of transposing a numpy array on its strides and data buffer

Question

Suppose you are given a numpy array:

import numpy as np
x = np.array([[1,2],[3,4]], dtype=np.int8)

Let's look at its transpose.

y = x.T

My understanding of the numpy documentation is that transposing only changes the array's strides, not its underlying data buffer.

We can verify this by running:

>>> x.data.strides
(2, 1)

>>> y.data.strides
(1, 2)
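As a minimal sketch of what "only the strides change" means in practice (assuming NumPy is installed; element_at is a hypothetical helper written only for this illustration), the strides can also be read directly from the ndarray, and walking x's raw bytes with y's strides reproduces every element of y:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int8)
y = x.T

# Transposing reverses the strides tuple (itemsize is 1 byte for int8)
print(x.strides, y.strides)  # (2, 1) (1, 2)

# x is C-contiguous, so its C-order bytes are exactly the shared buffer
buf = x.tobytes()

def element_at(arr, buf, i, j):
    # Hypothetical helper: byte offset of arr[i, j] is i*strides[0] + j*strides[1]
    offset = i * arr.strides[0] + j * arr.strides[1]
    return buf[offset]

# Stepping through x's bytes with y's strides yields y's values
for i in range(2):
    for j in range(2):
        assert element_at(y, buf, i, j) == y[i, j]
```

This works because y starts at the same address as x, so offsets computed from y's strides index directly into x's buffer.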

However, the data seems to have changed as well:

>>> x.data.tobytes()
b'\x01\x02\x03\x04'

>>> y.data.tobytes()
b'\x01\x03\x02\x04'

As I understand it, the expected behavior is that y's data buffer stays identical to x's, with only the strides changing.

Why do we see a different data buffer for y? Perhaps the data attribute does not show the underlying memory layout?

score 3 · Accepted Answer

A better way to check the data buffer is to look at the data pointer in __array_interface__:

In [8]: y=x.T
In [9]: x.__array_interface__
Out[9]: 
{'strides': None,
 'data': (144597512, False),
 'shape': (2, 2),
 'version': 3,
 'typestr': '|i1',
 'descr': [('', '|i1')]}
In [10]: y.__array_interface__
Out[10]: 
{'strides': (1, 2),
 'data': (144597512, False),
 'shape': (2, 2),
 'version': 3,
 'typestr': '|i1',
 'descr': [('', '|i1')]}
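A short sketch that compares the pointers programmatically instead of by eye, using only the documented 'data' and 'strides' keys of __array_interface__:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int8)
y = x.T

# 'data' is a (pointer, read_only) tuple; the pointer is the address of the first element
x_ptr = x.__array_interface__['data'][0]
y_ptr = y.__array_interface__['data'][0]
print(x_ptr == y_ptr)  # True: same buffer, same starting address

# A C-contiguous array reports strides=None; the transpose reports explicit strides
print(x.__array_interface__['strides'])  # None
print(y.__array_interface__['strides'])  # (1, 2)
```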

The documentation for .data says:

In [12]: x.data?
memoryview(object)
Create a new memoryview object which references the given object.

In [13]: x.data
Out[13]: <memory at 0xb2f7cb6c>
In [14]: y.data
Out[14]: <memory at 0xb2f7cbe4>

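So y.data is not a dump of the buffer's bytes in memory order: it is a strided view, and tobytes() returns the bytes in the order reached by stepping through y with its strides. I'm not sure there is a way to view y's data buffer directly.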
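One way to see the bytes in physical memory order, as a sketch assuming Python 3.8 or later (where memoryview.tobytes gained its order parameter): the default order='C' gives logical row-major order, which explains the output in the question, while order='A' copies the memory of a contiguous view unchanged. ndarray.tobytes(order='A') uses Fortran order for an F-contiguous array such as y, which here coincides with its memory order:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int8)
y = x.T

# Default order='C': bytes in y's logical row-major order
print(y.data.tobytes())           # b'\x01\x03\x02\x04'

# order='A': for a contiguous view, an exact copy of the physical memory
print(y.data.tobytes(order='A'))  # b'\x01\x02\x03\x04'

# ndarray.tobytes(order='A') picks Fortran order because y is F-contiguous
print(y.tobytes(order='A'))       # b'\x01\x02\x03\x04'
```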

In [25]: y.base
Out[25]: 
array([[1, 2],
       [3, 4]], dtype=int8)

x is C-contiguous; y is F-contiguous (Fortran order).
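The contiguity is recorded in each array's flags; a minimal check, where OWNDATA and base additionally confirm that y borrows x's memory rather than holding its own:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int8)
y = x.T

print(x.flags['C_CONTIGUOUS'], x.flags['F_CONTIGUOUS'])  # True False
print(y.flags['C_CONTIGUOUS'], y.flags['F_CONTIGUOUS'])  # False True

# x allocated its buffer; y does not own data and points back to x
print(x.flags['OWNDATA'], y.flags['OWNDATA'])  # True False
print(y.base is x)  # True
```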

score 1 · Accepted Answer

To complement @hpaulj's good answer:

In [7]: np.frombuffer(x, np.uint8)
Out[7]: array([1, 2, 3, 4], dtype=uint8)

In [8]: np.frombuffer(y, np.uint8)
ValueError: ndarray is not C-contiguous

In [9]: np.frombuffer(np.ascontiguousarray(y), np.uint8)
Out[9]: array([1, 3, 2, 4], dtype=uint8)

This shows that y really is a view.
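Another way to confirm the view relationship, as a sketch: a write through x shows up in y, np.shares_memory reports the overlap, and np.ascontiguousarray has to produce a separate copy because y is not C-contiguous:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int8)
y = x.T

# y shares x's buffer, so a write through x is visible through y
x[0, 1] = 20
print(y[1, 0])                 # 20
print(np.shares_memory(x, y))  # True

# y is not C-contiguous, so a C-contiguous version must be a copy
z = np.ascontiguousarray(y)
print(np.shares_memory(y, z))  # False
print(np.frombuffer(z, dtype=np.uint8).tolist())  # [1, 3, 20, 4]
```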
