python - Efficiently remove every 4th byte from the data bytes of a numpy.int32 array

Question

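I have a large numpy.int32 array that can take 4 GB or more. It is really an array of 24-bit integers (very common in audio applications), but since numpy.int24 doesn't exist, I used int32.

I want to write this array's data to a file as 24-bit, i.e. 3 bytes per number.

This works (I found this "recipe" somewhere a while ago, but I can't find it anymore):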

```python
import numpy as np
x = np.array([[-33772,-2193],[13313,-1314],[20965,-1540],[10706,-5995],[-37719,-5871]], dtype=np.int32)
# Split each value into its three low-order bytes (least significant first)
data = ((x.reshape(x.shape + (1,)) >> np.array([0, 8, 16])) & 255).astype(np.uint8)
print(data.tobytes())

# b'\x14|\xffo\xf7\xff\x014\x00\xde\xfa\xff\xe5Q\x00\xfc\xf9\xff\xd2)\x00\x95\xe8\xff\xa9l\xff\x11\xe9\xff'
```
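To see what the recipe computes, here is a minimal sketch for a single value, using only NumPy. Shifting right by 0, 8 and 16 and masking with 255 extracts the three low-order bytes, least significant first, which is the little-endian layout of the low 24 bits of the two's-complement int32:

```python
import numpy as np

v = np.int32(-33772)  # first sample of x in the question
# Two's-complement bit pattern of v, viewed as an unsigned 32-bit number
pattern = int(v) & 0xFFFFFFFF
print(hex(pattern))  # 0xffff7c14
# The recipe's shift-and-mask, applied to a single value
low3 = ((v >> np.array([0, 8, 16])) & 255).astype(np.uint8)
print(low3.tobytes())  # b'\x14|\xff'
```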

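But when x is several GB, this is inefficient: the shift and mask create full-size temporary integer arrays holding three values per sample before the final astype(np.uint8), so it needs a lot of RAM that isn't really required.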

Another solution is to remove every 4th byte:
```python
# Keep bytes 0, 1, 2 of every 4-byte group, drop byte 3
s = bytes([c for i, c in enumerate(x.tobytes()) if i % 4 != 3])

# b'\x14|\xffo\xf7\xff\x014\x00\xde\xfa\xff\xe5Q\x00\xfc\xf9\xff\xd2)\x00\x95\xe8\xff\xa9l\xff\x11\xe9\xff'
```
It works, but I suspect that if x takes 4 GB of RAM, this line will consume at least 8 GB of RAM, for both s and x (and maybe for x.tobytes() as well?).

TL;DR: How can I efficiently write an int32 array to disk as a 24-bit array by removing every 4th byte, without using twice the actual data size in RAM?

Note: this is possible because the integers really are 24-bit, i.e. every value lies in the signed 24-bit range -2^23 <= value <= 2^23 - 1.
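Because dropping the top byte silently corrupts any value outside that range, it may be worth checking the assumption before writing. A minimal sketch, assuming the data is already in an int32 array; the helper name `fits_in_int24` is hypothetical:

```python
import numpy as np

INT24_MIN, INT24_MAX = -2**23, 2**23 - 1

def fits_in_int24(x: np.ndarray) -> bool:
    """Return True if every value of x survives truncation to 3 bytes."""
    # min() and max() are reductions, so no array-sized temporary is created
    return bool(x.min() >= INT24_MIN and x.max() <= INT24_MAX)

x = np.array([[-33772, -2193], [13313, -1314]], dtype=np.int32)
print(fits_in_int24(x))                                   # True
print(fits_in_int24(np.array([2**23], dtype=np.int32)))   # False
```

Each call scans the array twice (once for min, once for max), which is cheap compared with writing several GB to disk.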

score 2 · Accepted Answer

Assuming x is C-contiguous and your platform is little-endian (otherwise small adjustments would be needed), you can do this:

```python
import numpy as np

# Input data
x = np.array([[-33772, -2193], [13313, -1314], [20965, -1540],
              [10706, -5995], [-37719, -5871]], dtype=np.int32)
# Make 24-bit uint8 view: 3 bytes per value, sharing x's memory
x2 = np.ndarray(shape=x.shape + (3,), dtype=np.uint8, buffer=x, offset=0,
                strides=x.strides + (1,))
print(x2.tobytes())
# b'\x14|\xffo\xf7\xff\x014\x00\xde\xfa\xff\xe5Q\x00\xfc\xf9\xff\xd2)\x00\x95...
np.save('data.npy', x2)  # Save to disk
```

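In this example, note that:

- We add a dimension: `x.shape + (3,)` is `(5, 2, 3)`.
- `x2` is essentially a view of `x`, that is, it uses the same data.
- The trick is in the strides. `x.strides + (1,)` is `(8, 4, 1)` here. Each new row of `x` advances 8 bytes relative to the previous one, and each new column advances 4 bytes. In `x2` I append a 1 to the strides, so each item in the new innermost dimension advances 1 byte relative to the previous one. If the shape of `x2` were `(5, 2, 4)` (i.e. using `+ (4,)` instead of `+ (3,)`), it would cover exactly the same bytes as `x`, but since it is `(5, 2, 3)`, the last byte of each value is simply "skipped".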
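The points in the list above can be checked directly on a small array. A minimal sketch, assuming a little-endian platform as in the answer: `np.shares_memory` confirms that no data was copied, and `np.lib.stride_tricks.as_strided` builds the same view with a different (equally unchecked) call:

```python
import numpy as np

x = np.array([[-33772, -2193], [13313, -1314]], dtype=np.int32)
x2 = np.ndarray(shape=x.shape + (3,), dtype=np.uint8, buffer=x,
                offset=0, strides=x.strides + (1,))
print(x.strides, x2.strides)     # (8, 4) (8, 4, 1)
print(np.shares_memory(x, x2))   # True: x2 is a view, not a copy

# Same view via as_strided (no bounds checking: wrong strides read garbage)
x2b = np.lib.stride_tricks.as_strided(x.view(np.uint8),
                                      shape=x.shape + (3,),
                                      strides=x.strides + (1,))
print(np.array_equal(x2, x2b))   # True

# With 4 instead of 3 bytes per item, the view covers all of x's bytes
x2_full = np.ndarray(shape=x.shape + (4,), dtype=np.uint8, buffer=x,
                     offset=0, strides=x.strides + (1,))
print(x2_full.tobytes() == x.tobytes())  # True
```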

You can restore it with:

```python
x2 = np.load('data.npy', mmap_mode='r')  # Use mmap to avoid using extra memory
x3 = np.zeros(x2.shape[:-1] + (4,), np.uint8)
x3[..., :3] = x2
del x2  # Release mmap
# Fix negative sign in last byte (could do this in a loop
# or in "batches" if you want to avoid the intermediate
# array from the "&" operation, or with Numba)
x3[..., 3] = np.where(x3[..., 2] & 128, 255, 0)
# Make int32 view
x4 = np.ndarray(x3.shape[:-1], np.int32, buffer=x3, offset=0, strides=x3.strides[:-1])
print(x4)
# [[-33772  -2193]
#  [ 13313  -1314]
#  [ 20965  -1540]
#  [ 10706  -5995]
#  [-37719  -5871]]
```
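As an alternative to filling the 4th byte with `np.where`, the sign extension can also be done arithmetically: for an unsigned 24-bit value `u`, `(u ^ 0x800000) - 0x800000` maps 0..2^24-1 onto -2^23..2^23-1. A minimal sketch, assuming the packed little-endian 3-byte layout produced above and reading from an in-memory bytes object instead of `data.npy`; the helper name `int24le_to_int32` is hypothetical:

```python
import numpy as np

SIGN_BIT = 0x800000  # bit 23, the sign bit of a 24-bit integer

def int24le_to_int32(raw: bytes, shape) -> np.ndarray:
    """Decode packed little-endian signed 24-bit integers into int32."""
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3).astype(np.uint32)
    # Reassemble the unsigned 24-bit value u = b0 + 256*b1 + 65536*b2
    u = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
    # Flip the sign bit, then subtract it back: two's-complement sign extension
    signed = (u ^ np.uint32(SIGN_BIT)).astype(np.int32) - np.int32(SIGN_BIT)
    return signed.reshape(shape)

x = np.array([[-33772, -2193], [13313, -1314]], dtype=np.int32)
packed = x.view(np.uint8).reshape(-1, 4)[:, :3].tobytes()  # little-endian only
print(int24le_to_int32(packed, x.shape))
```

This version creates several temporaries of 4 bytes per byte of input, so for multi-GB files it would have to be applied to slices of a memory-mapped array rather than to the whole file at once.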

score 2 · Accepted Answer

After some more fiddling, I found that this works:

```python
import numpy as np
x = np.array([[-33772,-2193],[13313,-1314],[20965,-1540],[10706,-5995],[-37719,-5871]], dtype=np.int32)
# View as bytes, group them 4 per value, keep the first 3 (no copy yet)
x2 = x.view(np.uint8).reshape(-1,4)[:,:3]
print(x2.tobytes())
# b'\x14|\xffo\xf7\xff\x014\x00\xde\xfa\xff\xe5Q\x00\xfc\xf9\xff\xd2)\x00\x95\xe8\xff\xa9l\xff\x11\xe9\xff'
```
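This is the same byte-level view as in the previous answer, built with `view` and `reshape` instead of explicit strides. A small sketch, assuming a little-endian, C-contiguous `x`, that checks it matches the question's list-comprehension output and that nothing is copied until `tobytes()` (or `tofile`) is called:

```python
import numpy as np

x = np.array([[-33772, -2193], [13313, -1314], [20965, -1540],
              [10706, -5995], [-37719, -5871]], dtype=np.int32)
x2 = x.view(np.uint8).reshape(-1, 4)[:, :3]

print(np.shares_memory(x, x2))   # True: a view, no copy yet
print(x2.flags['C_CONTIGUOUS'])  # False: every 4th byte is skipped

# Reference result: drop every 4th byte in pure Python (small arrays only)
expected = bytes(c for i, c in enumerate(x.tobytes()) if i % 4 != 3)
print(x2.tobytes() == expected)  # True
```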

Here is a time and memory benchmark:

```python
import numpy as np, time
t0 = time.time()
x = np.random.randint(10000, size=(125_000_000, 2), dtype=np.int32)  # 125M * 2 * 4 bytes ~ 1GB of RAM
print('Random array generated in %.1f sec.' % (time.time() - t0))
time.sleep(5)
# you can check the RAM usage in the task manager in the meantime...
t0 = time.time()
x2 = x.view(np.uint8).reshape(-1,4)[:,:3]
x2.tofile('test')
print('24-bit output file written in %.1f sec.' % (time.time() - t0))
```

Results:

```
Random array generated in 4.6 sec.
24-bit output file written in 35.9 sec.
```

Also, only ~1 GB was used during the whole process (monitored with the Windows Task Manager).

@jdehesa's method gives similar results, i.e. if we use this line instead:

```python
x2 = np.ndarray(shape=x.shape + (3,), dtype=np.uint8, buffer=x, offset=0, strides=x.strides + (1,))
```

the process's RAM usage also peaks at 1 GB, and x2.tofile(...) takes about 37 seconds.

score 1 · Accepted Answer

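I ran your code and got timings similar to your 35 seconds, but that seems far too slow for 750 MB when my SSD can do 2 GB/s. I couldn't imagine why it was so slow. So I decided to use OpenCV's highly optimised SIMD code that reduces RGBA8888 images to RGB888 by stripping the alpha/transparency information from every 4th byte, which is equivalent to converting 32-bit to 24-bit.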

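To avoid using too much extra memory, I process the data 1,000,000 stereo samples (6 MB of output) at a time and append each chunk to the output file. It runs in under 1 second, and the resulting file compares identical to the one created by your code.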

```python
#!/usr/bin/env python3

import numpy as np
import cv2

def orig(x):
    # One-shot strided view, written with a single tofile() call
    x2 = x.view(np.uint8).reshape(-1,4)[:,:3]
    x2.tofile('orig.dat')

def chunked(x):
    BATCHSIZE = 1_000_000
    l = len(x)
    # Binary mode: the samples are raw bytes, not text
    with open('test.dat', 'wb') as file:
        for b in range(0, l, BATCHSIZE):
            s = min(BATCHSIZE, l - b)
            # Treat each int32 as one BGRA pixel: a (2*s) x 1 image with 4 channels
            y = x[b:b+s,:].view(np.uint8).reshape(s*2,1,4)
            # Drop the 4th ("alpha") byte of every pixel, keeping 3 bytes per sample
            z = cv2.cvtColor(y,cv2.COLOR_BGRA2BGR)
            # Append to file
            z.tofile(file)


# Repeatable randomness
np.random.seed(42)
# Create array of stereo samples
NSAMPLES = 125_000_000
x = np.random.randint(10000, size=(NSAMPLES, 2), dtype=np.int32)

# orig(x)
chunked(x)
```
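A likely reason the one-shot `x2.tofile(...)` is so slow is that, for a non-contiguous array, `ndarray.tofile` writes one element at a time, which here means one 1-byte write per kept byte, whereas `tobytes()` first makes a single contiguous copy. If OpenCV is not available, the same chunking idea can be sketched in pure NumPy. This is a minimal sketch, assuming a C-contiguous int32 array on a little-endian platform; the function name `write_int24` and its default batch size are illustrative choices, not taken from the answers, and no timings are claimed for it:

```python
import numpy as np

def write_int24(x: np.ndarray, path: str, batch: int = 1_000_000) -> None:
    """Write int32 samples to `path` as packed 24-bit values, chunk by chunk.

    Assumes a little-endian platform and values already in the int24 range.
    """
    flat = x.reshape(-1)  # a view if x is C-contiguous (otherwise a full copy)
    with open(path, 'wb') as f:
        for start in range(0, flat.size, batch):
            chunk = flat[start:start + batch]
            # Keep bytes 0-2 of every int32 (the low 24 bits on little-endian)
            packed = chunk.view(np.uint8).reshape(-1, 4)[:, :3]
            # tobytes() makes one contiguous copy of at most 3 * batch bytes,
            # so the extra memory per iteration is bounded by the batch size
            f.write(packed.tobytes())
```

For example, `write_int24(x, 'test.dat')` should produce the same bytes as `orig(x)` above, while the extra memory stays at roughly 3 × `batch` bytes per iteration.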
