python - Problem writing an np array to a binary file: the new file is only half the size of the original

Question

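I am trying to remove the first 24 rows of a raw file. I open the original raw file (call it raw1.raw) and convert it to a NumPy array, then initialize a new array and copy everything except the top 24 rows into it. But after writing the new array to a new binary file (raw2.raw), I found that raw2.raw is only 15.2 MB, while the original raw1.raw is about 30.6 MB. My code: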

import numpy as np
import imageio
import rawpy
import cv2


def ave():
    
    fd = open('raw1.raw', 'rb')
    rows = 3000 #around 3000, not the real rows
    cols = 5100 #around 5100, not the real cols
    f = np.fromfile(fd, dtype=np.uint8,count=rows*cols)
    I_array = f.reshape((rows, cols)) #notice row, column format
    #print(I_array)
   
    fd.close()

    im = np.zeros((rows - 24 , cols))
    for i in range (len(I_array) - 24):
        for j in range(len(I_array[i])):
            im[i][j] = I_array[i + 24][j]
            
    #print(im)

    newFile = open("raw2.raw", "wb")
    
    im.astype('uint8').tofile(newFile)
    newFile.close()


if __name__ == "__main__":
    ave()

I tried using im.astype('uint16') when writing the binary file, but with uint16 the values come out wrong.

score 1 · Accepted Answer

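There must be more data in your 'raw1.raw' file than you are using. Are you sure the file wasn't created with 'uint16' data, and that you are only extracting the first half of it as 'uint8' data? I just checked by writing random data.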

import os, numpy as np

x = np.random.randint(0,256,size=(3000,5100),dtype='uint8')
x.tofile('testfile.raw') #pass the path so the file is written in binary mode
print(os.stat('testfile.raw').st_size) #I get 15300000 bytes, i.e. 15.3MB.

So a 3000 x 5100 array of 'uint8' clearly takes 15.3 MB. I don't know how you got 30+.
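One way to test the uint16 guess is to divide the file size on disk by the number of pixels. The following is only a sketch: it assumes a headerless file, uses small synthetic files instead of the real raw1.raw, and the helper name `bytes_per_pixel` is made up for illustration.

```python
import os
import tempfile
import numpy as np

def bytes_per_pixel(path, rows, cols):
    """Hypothetical helper: file size divided by the pixel count."""
    return os.path.getsize(path) / (rows * cols)

# Small synthetic stand-ins for the real image dimensions
rows, cols = 300, 510

with tempfile.TemporaryDirectory() as tmp:
    path8 = os.path.join(tmp, "u8.raw")
    path16 = os.path.join(tmp, "u16.raw")
    np.zeros((rows, cols), dtype=np.uint8).tofile(path8)
    np.zeros((rows, cols), dtype=np.uint16).tofile(path16)
    ratio8 = bytes_per_pixel(path8, rows, cols)
    ratio16 = bytes_per_pixel(path16, rows, cols)

# A ratio near 1 suggests 8-bit samples, a ratio near 2 suggests 16-bit samples
print(ratio8, ratio16)
```

If the same ratio for raw1.raw comes out close to 2, the file most likely stores two bytes per pixel.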

############################ EDIT #########

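Just to add some more clarification. Do you realize that dtype only changes the 'view' of the data? It does not affect the actual data held in memory. The same goes for data read from a file. For example: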

import numpy as np

#The way to understand x, is that x is taking 12 bytes in memory and using
#that information to hold 3 values. The first 4 bytes are the first value, 
#the second 4 bytes are the second, etc. 
x = np.array([1,2,3],dtype='uint32') 

#Change x to display those 12 bytes as 6 different values. Doing this does
#NOT change the data that the array is holding. You are only changing the 
#'view' of the data. 
x.dtype = 'uint16'
print(x)

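In general (with a few special cases), changing the dtype does not change the underlying data. The conversion function .astype(), however, does produce different underlying data: it returns a new, converted array. If you have any 12-byte array viewed as 'int32', then running .astype('uint8') takes each entry (4 bytes) and converts it (this is called casting) into a uint8 entry (1 byte). For 3 entries, the new array will have only 3 bytes. You can literally see this: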

x = np.array([1,2,3],dtype='uint32')
print(x.tobytes())
y = x.astype('uint8')
print(y.tobytes())
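The difference between reinterpreting and casting can also be checked without reading raw bytes. This is a small sketch that uses `ndarray.view` in place of assigning to `.dtype` (a substitution made here, not part of the answer above) together with `np.shares_memory`:

```python
import numpy as np

x = np.array([1, 2, 3], dtype='uint32')

v = x.view('uint16')    # reinterpret the same 12 bytes as 6 values
c = x.astype('uint8')   # cast each value into a new 1-byte entry

# Byte counts: the view keeps all 12 bytes, the cast keeps only 3
print(x.nbytes, v.nbytes, c.nbytes)

# The view shares its buffer with x; the cast result is a separate array
print(np.shares_memory(x, v))
print(np.shares_memory(x, c))
```

The view still covers the original buffer, while the cast result is an independent and smaller array.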

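So when we say a file is 30 MB, we mean that the file holds (minus any header information) 30,000,000 bytes, and each of those bytes is exactly one uint8. One uint8 is 1 byte. If an array has 6000×5100 uint8s (bytes), then that array holds 30,600,000 bytes of information in memory.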
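The arithmetic can be spelled out with `np.dtype(...).itemsize`. The dimensions below are the approximate ones from the question, so treat the totals as illustrative rather than exact for the real file:

```python
import numpy as np

rows, cols = 3000, 5100  # approximate values taken from the question

# Total bytes = bytes per element * number of elements
as_uint8 = np.dtype(np.uint8).itemsize * rows * cols
as_uint16 = np.dtype(np.uint16).itemsize * rows * cols

print(as_uint8, as_uint16)  # 15300000 30600000
```

These two totals line up with the roughly 15 MB and 30.6 MB file sizes reported in the question.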

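Likewise, if you read a file (any file) and write np.fromfile(,dtype=np.uint8,count=15_300_000), you are telling Python to read 15_300_000 bytes (again, 1 byte is 1 uint8) of information (15 MB). Whether your file is 100 MB, 40 MB, or even 30 MB is completely irrelevant, because you told Python to read only the first 15 MB of data.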
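Putting this together, here is a sketch of how the row removal might look if the file really does hold headerless uint16 data, which is only a suspicion at this point. It uses a small synthetic file and stand-in dimensions, and replaces the nested loops from the question with a slice:

```python
import os
import tempfile
import numpy as np

rows, cols, skip = 30, 51, 24  # small stand-ins for the real dimensions

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw1.raw")
    dst = os.path.join(tmp, "raw2.raw")

    # Synthetic 16-bit "image" standing in for the real raw1.raw
    original = np.arange(rows * cols, dtype=np.uint16).reshape(rows, cols)
    original.tofile(src)

    # What the question's code does: read rows*cols bytes, i.e. half the file
    half = np.fromfile(src, dtype=np.uint8, count=rows * cols)

    # Reading with the matching dtype covers the whole file
    full = np.fromfile(src, dtype=np.uint16, count=rows * cols).reshape(rows, cols)

    # Drop the first `skip` rows with a slice and write the rest back out
    trimmed = full[skip:]
    trimmed.tofile(dst)

    src_size = os.path.getsize(src)
    dst_size = os.path.getsize(dst)

print(half.nbytes, full.nbytes, src_size, dst_size)
```

Because the dtype is never changed between reading and writing, the output differs from the input only by the removed rows.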
