python - 将 numpy 数组转换为 numpy 记录数组

Question

我尝试通过为每列命名来将 10x2 数组转换为记录。

我试过这样：

t = arange (10)
>>> n = dstack([t,
                roll (t, 1),
                roll (t, -1)])[0]
... ... >>> 
>>> n = n[:,1:3]
>>> n
array([[9, 1],
       [0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6],
       [5, 7],
       [6, 8],
       [7, 9],
       [8, 0]])
>>> nt = [('left', int), ('right', int)]
>>> array (n, nt)
array([[(9, 9), (1, 1)],
       [(0, 0), (2, 2)],
       [(1, 1), (3, 3)],
       [(2, 2), (4, 4)],
       [(3, 3), (5, 5)],
       [(4, 4), (6, 6)],
       [(5, 5), (7, 7)],
       [(6, 6), (8, 8)],
       [(7, 7), (9, 9)],
       [(8, 8), (0, 0)]], 
      dtype=[('left', '<i8'), ('right', '<i8')])
>>>

令我惊讶的是，每一行的元素都是元组而不是 int 类型的数字。

我该如何纠正这一点，并使 n 的每一行看起来像[ 9,1 ]而不是[(9, 9), (1, 1)]？

score 3 · Accepted Answer

您可以使用新的 dtype 创建一个视图，它看起来是相同的数据：

In [150]: nt = [('left',np.int),('right',np.int)]

In [151]: n
Out[151]: 
array([[9, 1],
       [0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6],
       [5, 7],
       [6, 8],
       [7, 9],
       [8, 0]])

In [152]: n.view(nt)
Out[152]: 
array([[(9, 1)],
       [(0, 2)],
       [(1, 3)],
       [(2, 4)],
       [(3, 5)],
       [(4, 6)],
       [(5, 7)],
       [(6, 8)],
       [(7, 9)],
       [(8, 0)]], 
      dtype=[('left', '<i8'), ('right', '<i8')])

但是，这保持了 2d 形状：

In [160]: n_struct = n.view(nt)

In [161]: n_struct.shape
Out[161]: (10, 1)

In [162]: n_struct = n.view(nt).reshape(n.shape[0])

In [163]: n_struct
Out[163]: 
array([(9, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8),
       (7, 9), (8, 0)], 
      dtype=[('left', '<i8'), ('right', '<i8')])

正如你所问，访问是这样的：

In [170]: n_struct['left']
Out[170]: array([9, 0, 1, 2, 3, 4, 5, 6, 7, 8])

In [171]: n_struct['right']
Out[171]: array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

@Ophion 的警告是，这仅在 dtype 兼容的情况下才有效，因为ndarray.view(dtype)将原始数据解释为给定的 dtype，它不会将数据转换为新的给定 dtype。换句话说，（来自文档），

a.view(some_dtype)用不同的数据类型构造数组内存的视图。这可能会导致重新解释内存字节。

score 2 · Accepted Answer

希望在纯 numpy 中有更好的方法，但要让你开始：

>>> nt = [('left', int), ('right', int)]
>>> n
array([[9, 1],
       [0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6],
       [5, 7],
       [6, 8],
       [7, 9],
       [8, 0]])

>>> out = np.array(np.zeros(n.shape[0]),nt)
>>> out
array([(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0),
       (0, 0), (0, 0)],
      dtype=[('left', '<i8'), ('right', '<i8')])

>>> out['left']=n[:,0]
>>> out['right']=n[:,1]

>>> out
array([(9, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8),
       (7, 9), (8, 0)],
      dtype=[('left', '<i8'), ('right', '<i8')])

>>> out['left']
array([9, 0, 1, 2, 3, 4, 5, 6, 7, 8])

当然有熊猫的答案：

>>> import pandas as pd
>>> df = pd.DataFrame(n,columns=['left','right'])
>>> df
   left  right
0     9      1
1     0      2
2     1      3
3     2      4
4     3      5
5     4      6
6     5      7
7     6      8
8     7      9
9     8      0

熊猫数据框的一些好处：

>>> df.values
array([[9, 1],
       [0, 2],
       [1, 3],
       [2, 4],
       [3, 5],
       [4, 6],
       [5, 7],
       [6, 8],
       [7, 9],
       [8, 0]])

score 1 · Accepted Answer

如果底层 dtypes 不兼容，则该view方法不起作用。后备选项是用元组列表填充记录数组：

In [128]: x=np.arange(12).reshape(4,3)

In [129]: y=np.zeros((4,),dtype=[('x','f'),('y','f'),('z','f')])

In [130]: y
Out[130]: 
array([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], 
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4')])

In [131]: y[:]=[tuple(row) for row in x]

In [132]: y
Out[132]: 
array([(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0), (9.0, 10.0, 11.0)], 
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4')])

这个元组列表可以在初始构造中使用：

In [135]: np.array([tuple(row) for row in x],y.dtype)
Out[135]: 
array([(0.0, 1.0, 2.0), (3.0, 4.0, 5.0), (6.0, 7.0, 8.0), (9.0, 10.0, 11.0)], 
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4')])

python - 将 numpy 数组转换为 numpy 记录数组

3 回答 3

Related

Reference