python - Reading a binary file in Python

Question

I have to read a binary file in Python. The file is first written by a Fortran 90 program in this way:

open(unit=10,file=filename,form='unformatted')
write(10)table%n1,table%n2
write(10)table%nH
write(10)table%T2
write(10)table%cool
write(10)table%heat
write(10)table%cool_com
write(10)table%heat_com
write(10)table%metal
write(10)table%cool_prime
write(10)table%heat_prime
write(10)table%cool_com_prime
write(10)table%heat_com_prime
write(10)table%metal_prime
write(10)table%mu
if (if_species_abundances) write(10)table%n_spec
close(10)

I can easily read this binary file with the following IDL code:

n1=161L
n2=101L
openr,1,file,/f77_unformatted
readu,1,n1,n2
print,n1,n2
spec=dblarr(n1,n2,6)
metal=dblarr(n1,n2)
cool=dblarr(n1,n2)
heat=dblarr(n1,n2)
metal_prime=dblarr(n1,n2)
cool_prime=dblarr(n1,n2)
heat_prime=dblarr(n1,n2)
mu  =dblarr(n1,n2)
n   =dblarr(n1)
T   =dblarr(n2)
Teq =dblarr(n1)
readu,1,n
readu,1,T
readu,1,Teq
readu,1,cool
readu,1,heat
readu,1,metal
readu,1,cool_prime
readu,1,heat_prime
readu,1,metal_prime
readu,1,mu
readu,1,spec
print,spec
close,1

What I want to do is read this binary file in Python, but I have run into some problems. First, here is my attempt to read the file:

import numpy
from numpy import *
import struct

file='name_of_my_file'
with open(file,mode='rb') as lines:
    c=lines.read()

I tried to read the first two variables:

dummy, n1, n2, dummy = struct.unpack('iiii',c[:16])

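But as you can see, I had to add dummy variables, because somehow the Fortran program writes the integer 8 at those positions.

The problem now comes when I try to read the remaining bytes: I do not get the same results as the IDL program.

Here is my attempt to read the array n: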

double = 8
end = 16+n1*double
nH = struct.unpack('d'*n1,c[16:end])

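But when I print this array, I get nonsense values. I can read the file with the IDL code above, so I know what to expect. So my questions are: how can I read this file when I don't know its exact structure? Why is it so simple to read with IDL? I need to read this dataset in Python.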

score 6 · Accepted Answer

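What you are looking for is the struct module.

This module lets you unpack data from a string (a bytes object in Python 3), treating it as binary data.

You supply a format string and the string read from your file, and it consumes the data and returns the unpacked values.

For example, using your variables: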

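import struct

# 'name_of_my_file' is the placeholder file name from the question
with open('name_of_my_file', 'rb') as f:
    content = f.read()  # if this is too much data, you can supply a size to read()

# Demonstration only: interpret the first 32 bytes as four doubles
n, T, Teq, cool = struct.unpack("dddd", content[:32])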

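This makes n, T, Teq, and cool hold the first 32 bytes of the file interpreted as four doubles. Of course, this is just a demonstration. Your example looks like it wants lists of doubles; conveniently, struct.unpack returns a tuple, which I take it will still work fine for your case (if not, you can convert it with list()). Keep in mind that struct.unpack must consume the whole string passed to it, otherwise you will get a struct.error. So either slice your input string, or read only the number of bytes you will use, as I said in the comment above.

For example: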

n_content = f.read(8*number_of_ns) #8, because doubles are 8 bytes
n = struct.unpack("d"*number_of_ns,n_content)
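The snippets above treat the file as a flat run of values. A Fortran unformatted sequential file usually frames the output of each write statement as a record, with a length marker before and after the payload, which is presumably where the extra 8s in the question come from (n1 and n2 occupy 8 bytes together). Under that assumption the first record takes 16 bytes, the next record's marker sits at bytes 16 to 20, and the nH data would start at offset 20 rather than 16. The following is a minimal sketch, assuming 4-byte native-endian markers; `read_fortran_records` and `pack_record` are hypothetical helper names, and the demo buffer is synthetic, not the real file.

```python
import struct


def read_fortran_records(buf, marker_fmt="i"):
    """Split a bytes object into the payloads of sequential unformatted records.

    Assumes each record is framed as <length><payload><length>, with the
    length stored as a 4-byte native-endian integer (an assumption that
    matches the value 8 seen around n1 and n2 in the question).
    """
    marker_size = struct.calcsize(marker_fmt)
    records = []
    pos = 0
    while pos < len(buf):
        (length,) = struct.unpack_from(marker_fmt, buf, pos)
        start = pos + marker_size
        end = start + length
        (trailer,) = struct.unpack_from(marker_fmt, buf, end)
        if trailer != length:
            raise ValueError("record markers disagree at offset %d" % pos)
        records.append(buf[start:end])
        pos = end + marker_size
    return records


def pack_record(payload):
    """Hypothetical helper: frame a payload the same way, for the demo only."""
    marker = struct.pack("i", len(payload))
    return marker + payload + marker


# Synthetic stand-in for the real file: (n1, n2), then an n1-long double array.
demo = pack_record(struct.pack("ii", 3, 2)) + pack_record(
    struct.pack("3d", 1.0, 2.0, 3.0)
)

records = read_fortran_records(demo)
n1, n2 = struct.unpack("ii", records[0])
nH = struct.unpack("%dd" % n1, records[1])
```

Checking the trailing marker against the leading one is a cheap way to notice a wrong marker size or byte order early, instead of silently decoding nonsense values.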

score 0 · Answer

Have you tried scipy.io.readsav? It reads files written by IDL's SAVE command.

Just read your file like this:

import scipy.io
mydict = scipy.io.readsav('name_of_file')

score 0 · Answer

It looks like you are trying to read a cooling_0000x.out file generated by RAMSES.

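Note that the first two integers (n1, n2) give the dimensions of the two-dimensional tables (arrays) in the body of the file, so you need to process these two integers first to know how much real*8 data is in the rest of the file.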
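As a rough illustration of that order of operations, the sketch below reads the dimensions first and then uses them to size and shape the arrays that follow. It assumes 4-byte native-endian record markers and column-major (Fortran) storage; `read_record` and `frame` are hypothetical helpers, and the data is a small synthetic buffer rather than a real RAMSES file, whose exact record order is not assumed here.

```python
import io

import numpy as np


def read_record(fh, dtype, shape):
    """Read one record and return it as an array with the given shape (a tuple).

    Assumes 4-byte native-endian length markers before and after the payload.
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.frombuffer(fh.read(4), dtype=np.int32)[0])
    expected = int(np.prod(shape)) * dtype.itemsize
    if nbytes != expected:
        raise ValueError("record holds %d bytes, expected %d" % (nbytes, expected))
    data = np.frombuffer(fh.read(nbytes), dtype=dtype)
    fh.read(4)  # skip the trailing marker
    # Fortran stores arrays column-major, so reshape with order="F".
    return data.reshape(shape, order="F")


def frame(arr):
    """Hypothetical helper: frame an array as one record, for the demo only."""
    payload = np.asarray(arr).tobytes(order="F")
    marker = np.int32(len(payload)).tobytes()
    return marker + payload + marker


# Synthetic stand-in: dimensions, one n1-long vector, one n1-by-n2 table.
table = np.arange(6, dtype=np.float64).reshape(3, 2)
fh = io.BytesIO(
    frame(np.array([3, 2], dtype=np.int32))
    + frame(np.arange(3, dtype=np.float64))
    + frame(table)
)

n1, n2 = (int(v) for v in read_record(fh, np.int32, (2,)))
nH = read_record(fh, np.float64, (n1,))
cool = read_record(fh, np.float64, (n1, n2))
```

Comparing the marker against the size implied by n1 and n2 gives an immediate error if the assumed record order does not match the file.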

scipy should help: it lets you read binary data of arbitrary dimensions:

http://wiki.scipy.org/Cookbook/InputOutput#head-e35c7736718209eea00ebf37a7e1dfb91df696e1
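One scipy route that may fit here is scipy.io.FortranFile, which reads and writes unformatted sequential records. The sketch below is only an illustration under assumptions: it uses FortranFile's default record header type and native byte order, which may not match every compiler, `read_tables` is a hypothetical helper, and the file it reads is a small synthetic one written by the same script.

```python
import os
import tempfile

import numpy as np
from scipy.io import FortranFile


def read_tables(path):
    """Read (n1, n2), one n1-long vector, and one n1-by-n2 table."""
    f = FortranFile(path, "r")
    try:
        n1, n2 = (int(v) for v in f.read_ints(np.int32))
        vector = f.read_reals(np.float64)
        # Records come back flat; restore the column-major 2-D layout.
        table = f.read_reals(np.float64).reshape((n1, n2), order="F")
    finally:
        f.close()
    return n1, n2, vector, table


# Write a small synthetic file so the sketch can run without real data.
path = os.path.join(tempfile.mkdtemp(), "demo.unf")
expected = np.arange(6, dtype=np.float64).reshape(3, 2)
f = FortranFile(path, "w")
try:
    f.write_record(np.array([3, 2], dtype=np.int32))
    f.write_record(np.arange(3, dtype=np.float64))
    f.write_record(expected.ravel(order="F"))
finally:
    f.close()

n1, n2, vector, table = read_tables(path)
```

Each read call consumes exactly one record, so the calls have to follow the same order as the write statements in the Fortran program.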

If you already have this Python code, please let me know, because I was going to write it today (17 September 2014).

Rick
