python - Partition array into N chunks with Numpy

Question

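There is the question "How do you split a list into evenly sized chunks?" for splitting an array into chunks. Is there a more efficient way to do this for very large arrays using Numpy?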

score 128 · Accepted Answer

Try numpy.array_split.

From the documentation:

>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
    [array([ 0.,  1.,  2.]), array([ 3.,  4.,  5.]), array([ 6.,  7.])]

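It is identical to numpy.split, but does not raise an exception if the groups are not of equal length.

If the number of chunks is greater than len(array), you get empty arrays nested inside the result. To address that, if your split array is saved in a, you can remove the empty arrays with: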

[x for x in a if x.size > 0]

Just save that back in a if you wish.
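
A minimal sketch of the situation described above, assuming a five-element array and a request for eight chunks (both values are picked only for illustration):

```python
import numpy as np

x = np.arange(5)

# Ask for more chunks than there are elements.
a = np.array_split(x, 8)

# The surplus chunks come back as empty arrays.
sizes = [chunk.size for chunk in a]

# Keep only the non-empty chunks and save the result back in a.
a = [chunk for chunk in a if chunk.size > 0]
```

Here sizes shows which chunks were empty before filtering, and a ends up holding only the chunks that contain data.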

score 24 · Accepted Answer

Just some examples of using array_split, split, hsplit and vsplit:

In [9]: a = np.random.randint(0,10,[4,4])

In [10]: a
Out[10]: 
array([[2, 2, 7, 1],
       [5, 0, 3, 1],
       [2, 9, 8, 8],
       [5, 7, 7, 6]])

Some examples of using array_split:
If you give an array or list as the second argument, you are specifying the indices before which to "cut":

# split rows into 0|1 2|3
In [4]: np.array_split(a, [1,3])
Out[4]:
[array([[2, 2, 7, 1]]),
 array([[5, 0, 3, 1],
        [2, 9, 8, 8]]),
 array([[5, 7, 7, 6]])]

# split columns into 0| 1 2 3
In [5]: np.array_split(a, [1], axis=1)
Out[5]:
[array([[2],
       [5],
       [2],
       [5]]),
 array([[2, 7, 1],
       [0, 3, 1],
       [9, 8, 8],
       [7, 7, 6]])]

An integer as the second argument specifies the number of equal chunks:

In [6]: np.array_split(a, 2, axis=1)
Out[6]: 
[array([[2, 2],
       [5, 0],
       [2, 9],
       [5, 7]]),
 array([[7, 1],
       [3, 1],
       [8, 8],
       [7, 6]])]

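split works the same way, but raises an exception if an equal split is not possible.

In addition to array_split, you can use the shortcuts vsplit and hsplit.
vsplit and hsplit are pretty much self-explanatory: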

In [11]: np.vsplit(a, 2)
Out[11]: 
[array([[2, 2, 7, 1],
       [5, 0, 3, 1]]),
 array([[2, 9, 8, 8],
       [5, 7, 7, 6]])]

In [12]: np.hsplit(a, 2)
Out[12]: 
[array([[2, 2],
       [5, 0],
       [2, 9],
       [5, 7]]),
 array([[7, 1],
       [3, 1],
       [8, 8],
       [7, 6]])]
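
To make the relationship explicit, here is a small sketch (using an arbitrary 4x4 array rather than the random one above) that compares the shortcuts with split along an explicit axis. For 2-D input, vsplit cuts along axis 0 (rows) and hsplit along axis 1 (columns):

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# Two row blocks of shape (2, 4) and two column blocks of shape (4, 2).
top, bottom = np.vsplit(a, 2)
left, right = np.hsplit(a, 2)

# Compare each shortcut with np.split along the matching axis.
same_rows = all(
    np.array_equal(p, q)
    for p, q in zip(np.vsplit(a, 2), np.split(a, 2, axis=0))
)
same_cols = all(
    np.array_equal(p, q)
    for p, q in zip(np.hsplit(a, 2), np.split(a, 2, axis=1))
)
```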

score 10 · Accepted Answer

I believe you're looking for numpy.split, or possibly numpy.array_split if the number of sections does not need to divide the size of the array evenly.
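
A short sketch of the difference, assuming an 8-element array split into 3 sections (values chosen only so that the division is uneven):

```python
import numpy as np

x = np.arange(8)

# np.split insists on an equal division and raises ValueError otherwise.
try:
    np.split(x, 3)
    split_failed = False
except ValueError:
    split_failed = True

# np.array_split accepts the uneven case and makes the last chunk shorter.
parts = np.array_split(x, 3)
lengths = [len(p) for p in parts]
```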

score 9 · Accepted Answer

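Not quite an answer, but a long comment on the other (correct) answers, with nicely formatted code. If you try the following, you will see that what you get are views of the original array, not copies, which was not the case for the accepted answer in the question you link. Be aware of the possible side effects!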

>>> x = np.arange(9.0)
>>> a,b,c = np.split(x, 3)
>>> a
array([ 0.,  1.,  2.])
>>> a[1] = 8
>>> a
array([ 0.,  8.,  2.])
>>> x
array([ 0.,  8.,  2.,  3.,  4.,  5.,  6.,  7.,  8.])
>>> def chunks(l, n):
...     """ Yield successive n-sized chunks from l.
...     """
...     for i in range(0, len(l), n):
...         yield l[i:i+n]
... 
>>> l = list(range(9))
>>> a,b,c = chunks(l, 3)
>>> a
[0, 1, 2]
>>> a[1] = 8
>>> a
[0, 8, 2]
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8]
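
If the side effect is unwanted, one option is to copy each chunk. The sketch below checks the view behaviour with np.shares_memory and then builds independent copies; the variable names are only illustrative:

```python
import numpy as np

x = np.arange(9.0)

# The chunks returned by np.split are views into x.
a, b, c = np.split(x, 3)
shares = np.shares_memory(a, x)

# Copy each chunk so that later writes do not reach x.
a2, b2, c2 = (part.copy() for part in np.split(x, 3))
a2[1] = 8
```

After this, x still holds its original values, because a2 owns its own memory.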

score 0 · Accepted Answer

How about this? Here you split the array using the length you want each piece to have.

a = np.random.randint(0,10,[4,4])

a
Out[27]: 
array([[1, 5, 8, 7],
       [3, 2, 4, 0],
       [7, 7, 6, 2],
       [7, 4, 3, 0]])

a[0:2,:]
Out[28]: 
array([[1, 5, 8, 7],
       [3, 2, 4, 0]])

a[2:4,:]
Out[29]: 
array([[7, 7, 6, 2],
       [7, 4, 3, 0]])
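
The same slicing can be wrapped in a loop so the boundaries are not typed by hand. This is only a sketch; row_chunks is a hypothetical helper name, and the 5x4 array is chosen so that the last block comes out shorter:

```python
import numpy as np

def row_chunks(arr, chunk_len):
    """Yield consecutive blocks of at most chunk_len rows (hypothetical helper)."""
    for start in range(0, arr.shape[0], chunk_len):
        # Basic slicing returns a view, so no data is copied here.
        yield arr[start:start + chunk_len, :]

a = np.arange(20).reshape(5, 4)
blocks = list(row_chunks(a, 2))
shapes = [block.shape for block in blocks]
```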

score 0 · Accepted Answer

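This can be achieved using as_strided from numpy. I have assumed that if the chunk size is not a factor of the total number of rows, the remaining rows in the last batch are filled with zeros.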

import numpy as np
from numpy.lib.stride_tricks import as_strided

def batch_data(test, chunk_count):
    m, n = test.shape
    S = test.itemsize
    if not chunk_count:
        chunk_count = 1
    batch_size = m // chunk_count
    # Batches which can be covered fully (strides assume a C-contiguous array)
    test_batches = as_strided(test, shape=(chunk_count, batch_size, n),
                              strides=(batch_size*n*S, n*S, S)).copy()
    covered = chunk_count * batch_size
    if covered < m:
        # Zero-pad the leftover rows into one extra batch
        # (requires the number of leftover rows to be <= batch_size)
        rest = test[covered:, :]
        rm, rn = rest.shape
        mismatch = batch_size - rm
        last_batch = np.vstack((rest, np.zeros((mismatch, rn)))).reshape(1, -1, n)
        return np.vstack((test_batches, last_batch))
    return test_batches

This is based on my answer https://stackoverflow.com/a/68238815/5462372.
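
For comparison, the zero-padding idea can also be sketched without as_strided, by padding first and reshaping afterwards. This is a different technique from the one above, not a rewrite of it: pad_and_batch is a hypothetical helper, and it takes the batch size (rows per batch) rather than the number of chunks:

```python
import numpy as np

def pad_and_batch(data, batch_size):
    """Zero-pad the rows to a multiple of batch_size, then reshape (hypothetical helper)."""
    m, n = data.shape
    # Number of zero rows needed to reach the next multiple of batch_size.
    pad_rows = (-m) % batch_size
    padded = np.pad(data, ((0, pad_rows), (0, 0)), mode="constant")
    return padded.reshape(-1, batch_size, n)

data = np.arange(10).reshape(5, 2)
batches = pad_and_batch(data, 2)
```

With 5 rows and a batch size of 2, one zero row is appended so that the result has three batches of two rows each.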
