python - Python/numpy: read bytes and rotate them into a signed int32

Question

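I have an operation that must be applied to more than 10 million values in Python. My problem is optimizing that operation. I have two working methods: numpy and vanilla Python.

Vanilla Python operation:

1: My raw value is 4 bytes of data: b'\x9a#\xe6\x00' = [154, 35, 230, 0] = [0x9A, 0x23, 0xE6, 0x00]
2: I take the last byte and put it first: b'\x00\x9a#\xe6' = [0, 154, 35, 230] = [0x00, 0x9A, 0x23, 0xE6]
3: I convert the result to a signed int32 value (little-endian): -433874432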
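
Steps 1 to 3 amount to rotating the 32-bit little-endian word left by 8 bits. A minimal check in plain Python, assuming exactly 4 input bytes; the helper name rotate_last_byte_first is hypothetical and introduced here only for illustration:

```python
def rotate_last_byte_first(raw: bytes) -> int:
    """Move the last of 4 bytes to the front, then decode as little-endian signed int32."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    swapped = raw[-1:] + raw[:-1]  # slicing bytes avoids the list round trip
    return int.from_bytes(swapped, byteorder="little", signed=True)


value = rotate_last_byte_first(b"\x9a#\xe6\x00")
print(value)  # -433874432, the value given in step 3

# Same result with integer arithmetic: rotate the unsigned word left by 8 bits.
word = int.from_bytes(b"\x9a#\xe6\x00", "little")       # 0x00E6239A
rotated = ((word << 8) | (word >> 24)) & 0xFFFFFFFF     # 0xE6239A00
# Reinterpret the unsigned 32-bit result as a signed int32.
signed = rotated - (1 << 32) if rotated >= (1 << 31) else rotated
```

Seeing the operation as a rotation is what makes a loop-free, whole-array version possible.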

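File loading:

with open(path_data, "rb") as f:
    while trame := f.read(4):  # read one 4-byte frame at a time
        ...  # data operation below, applied to each frame

Data operation: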

trame = b'\x9a#\xe6\x00'
trame_list = list(trame)  # [154, 35, 230, 0]
trame_list_swap = [trame_list[-1]] + trame_list[:-1]  # [0, 154, 35, 230]
trame_swap = bytes(trame_list_swap)  # b'\x00\x9a#\xe6'
result = int.from_bytes(trame_swap, byteorder='little', signed=True)  # -433874432

numpy operation

File loading:

import numpy

datas_raw = numpy.fromfile(path_data, dtype="<i4")
# datas_raw = numpy.array([-1708923392, 1639068928, 2024603392, ...])  # len(datas_raw) = 12171264
for i, trame in enumerate(datas_raw):
    ...  # data operation below, applied to each value

Data operation:

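trame = numpy.int32(15082394)  # must be a numpy scalar: a plain int has no tobytes()
tmp = list(trame.tobytes("C"))  # [154, 35, 230, 0]
tmp.insert(0, tmp.pop())  # move the last byte to the front
result = numpy.ndarray(1, "<i", bytes(tmp))[0]

This performs the same processing as the vanilla version, but it is slower here because numpy.ndarray is constructed 10 million times...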

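Question

My question is as follows:

I would like the numpy version to apply the operation to all values at once, without a for loop (which is very slow in Python)... Any other way of solving the problem is welcome (this is not a closed XY problem...)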

score 1 · Accepted Answer

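Here I use some random data in place of the data read from a file. For the real binary file you could use np.fromfile, as in the question (np.loadtxt only reads text files). Ideally, you would read the bytes into a one-dimensional array of shape (4*n,) and then reshape it to (n,4).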

import numpy as np
rng = np.random.default_rng(0)
data = rng.integers(-2**31,2**31,size=10000,dtype="i4")
data = data.view("u1").reshape((-1,4))
# Last column first, then other 3
data = data[:,[3,0,1,2]]
# Depending on platform might need to specify byteorder, e.g., "<i4" or ">i4"
ints = np.ascontiguousarray(data).view("i4")

This produces something like

array([[-1031643175],
       [  267112355],
       [ -640212606],
       ...,

This returns an array of signed integers with shape (n,1).
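
For the file-based case, here is a minimal sketch under stated assumptions: the file holds 4-byte frames decoded as little-endian, and the temporary file below merely stands in for path_data. It reads the raw bytes with np.fromfile, applies the same column reordering, and cross-checks the result against an equivalent bitwise rotation on the unsigned 32-bit view. The names frames, via_columns and via_bits are illustrative and not part of the original answer.

```python
import os
import tempfile

import numpy as np

# Stand-in for the real data file: two 4-byte frames (assumption for this demo).
payload = b"\x9a#\xe6\x00" + b"\x01\x02\x03\x04"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path_data = tmp.name

try:
    # Read raw bytes as a 1-D uint8 array of shape (4*n,), then reshape to (n, 4).
    frames = np.fromfile(path_data, dtype="u1").reshape(-1, 4)

    # Column approach: last byte first, then decode each row as little-endian int32.
    # ravel() flattens the (n, 1) result to shape (n,).
    via_columns = np.ascontiguousarray(frames[:, [3, 0, 1, 2]]).view("<i4").ravel()

    # Bitwise approach: rotate each unsigned 32-bit word left by 8 bits,
    # then reinterpret the bits as signed int32.
    words = np.fromfile(path_data, dtype="<u4")
    via_bits = ((words << 8) | (words >> 24)).view(np.int32)
finally:
    os.remove(path_data)

print(via_columns)  # first element is -433874432
```

Both variants avoid the Python-level loop. The bitwise form works directly on the 32-bit words and skips the fancy-indexing copy, which may matter for 10 million values, though that is worth measuring on the real data.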
