python - Reading Fortran binary data in Python is not so simple

Question

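I have a binary output file from a FORTRAN code and want to read it in Python. (Reading it with FORTRAN and writing out text for Python to read is not an option. Long story.) I can read the first record in a simplistic way: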

>>> import struct
>>> binfile=open('myfile','rb')
>>> pad1=struct.unpack('i',binfile.read(4))[0]
>>> ver=struct.unpack('d',binfile.read(8))[0]
>>> pad2=struct.unpack('i',binfile.read(4))[0]
>>> pad1,ver,pad2
(8,3.13,8)

Just fine. But this is a big file and I need to do this more efficiently. So I tried:

>>> (pad1,ver,pad2)=struct.unpack('idi',binfile.read(16))

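This won't run. It gives me an error telling me that unpack requires an argument of length 20. That makes no sense to me, since last I checked 4+8+4=16. When I give in and replace the 16 with 20, it runs, but the three numbers are filled with garbage. Does anyone see what I am doing wrong? Thanks!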

score 6 · Accepted Answer

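The size you get is due to alignment. Try struct.calcsize('idi') to verify that the size after alignment is actually 20. To use native byte order without alignment, put = at the start of the format string (struct.calcsize('=idi') gives 16) and adapt your example accordingly.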

For more information on the struct module, see http://docs.python.org/2/library/struct.html
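
A minimal sketch of that adaptation, assuming the record layout from the question (a 4-byte length marker, one 8-byte double, a 4-byte length marker) in native byte order. The bytes are built in memory with struct.pack as a stand-in for binfile.read(16); the name record is illustrative only.

```python
import struct

# Native alignment: padding is inserted before the double.
# Typically 20, but the exact value is platform dependent.
aligned_size = struct.calcsize('idi')

# '=' keeps native byte order but uses standard sizes and no padding.
packed_size = struct.calcsize('=idi')  # 4 + 8 + 4 = 16

# Stand-in for binfile.read(16): one record holding a single double.
record = struct.pack('=idi', 8, 3.13, 8)

pad1, ver, pad2 = struct.unpack('=idi', record)
print(aligned_size, packed_size, (pad1, ver, pad2))
```

With the = prefix the format size matches the 16 bytes actually present in the file, so the three values unpack cleanly.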

score 6 · Accepted Answer

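The struct module is mainly intended for interoperating with C structures, and because of that it aligns the data members. idi corresponds to the following C structure: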

struct
{
   int int1;
   double double1;
   int int2;
};

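A double needs 8-byte alignment to work efficiently (or even correctly) with most CPU load operations. That is why 4 bytes of padding are added between int1 and double1, which brings the size to 20 bytes as struct counts it (a C compiler would usually add 4 more bytes of trailing padding, which struct does not). The struct module performs the same padding between members unless you suppress it by adding < (on little-endian machines), > (on big-endian machines), or simply = at the start of the format string: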

>>> struct.unpack('idi', d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: unpack requires a string argument of length 20
>>> struct.unpack('<idi', d)
(-1345385859, 2038.0682530887993, 428226400)
>>> struct.unpack('=idi', d)
(-1345385859, 2038.0682530887993, 428226400)

(d is a string of 16 random characters.)
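
Since the question mentions a large file, here is a rough sketch of reading many such records with a precompiled, unpadded format. It assumes, purely for illustration, that the file is little-endian and consists only of repeated 16-byte records shaped like the first one; a real file will likely mix record types. RECORD, read_records and fake_file are hypothetical names, and an in-memory buffer stands in for open('myfile', 'rb').

```python
import io
import struct

# '<' = little-endian, standard sizes, no padding (assumed byte order).
RECORD = struct.Struct('<idi')

def read_records(stream):
    """Yield the double stored in each 16-byte record of stream."""
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            break  # clean end of file
        if len(chunk) < RECORD.size:
            raise ValueError("truncated record")
        head, value, tail = RECORD.unpack(chunk)
        # Both markers should hold the payload length (8 bytes for one double).
        if head != 8 or tail != 8:
            raise ValueError("unexpected record markers")
        yield value

# In-memory stand-in for the real file (assumption for the demo).
fake_file = io.BytesIO(b''.join(RECORD.pack(8, v, 8) for v in (3.13, 2.5, -1.0)))
values = list(read_records(fake_file))
print(values)
```

Compiling the format once with struct.Struct avoids re-parsing the format string for every record, and checking the markers catches a wrong byte order or a misaligned read early.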

score 1 · Accepted Answer

I recommend using arrays to read files written by FORTRAN as UNFORMATTED, SEQUENTIAL.

Your specific example using arrays would look like this:

import array
binfile=open('myfile','rb')
pad = array.array('i')
ver = array.array('d')
pad.fromfile(binfile,1)   # read the length of the record 
ver.fromfile(binfile,1)   # read the actual data written by FORTRAN
pad.fromfile(binfile,1)   # read the length of the record

If you have FORTRAN records that write arrays of integers and doubles, which is very common, your Python would look something like this:

import array
binfile=open('myfile','rb')
pad = array.array('i')
my_integers = array.array('i')
my_floats = array.array('d')
number_of_integers = 1000 # replace with how many you need to read
number_of_floats = 10000 # replace with how many you need to read
pad.fromfile(binfile,1)   # read the length of the record
my_integers.fromfile(binfile,number_of_integers) # read the integer data
my_floats.fromfile(binfile,number_of_floats)     # read the double data
pad.fromfile(binfile,1)   # read the length of the record

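As a final comment, if the file contains characters, you can read them into an array as well and then decode it into a string. Something like this: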

import array
binfile=open('myfile','rb')
pad = array.array('i')
my_characters = array.array('B')
number_of_characters = 63 # replace with number of characters to read
pad.fromfile(binfile,1)   # read the length of the record 
my_characters.fromfile(binfile,number_of_characters ) # read the data
my_string = my_characters.tobytes().decode(encoding='utf_8') 
pad.fromfile(binfile,1)   # read the length of the record
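
The snippets above read the length markers but never use them. As a hedged sketch, the markers can drive the read instead of hard-coded counts. This assumes 4-byte record markers in native byte order, which is common but compiler dependent; read_record and fake_file are hypothetical names, and an in-memory buffer stands in for the real file.

```python
import array
import io
import struct

def read_record(stream):
    """Return the payload bytes of one sequential unformatted record, or None at EOF."""
    head = stream.read(4)
    if not head:
        return None
    if len(head) < 4:
        raise ValueError("truncated record marker")
    (length,) = struct.unpack('=i', head)  # assumed 4-byte native marker
    payload = stream.read(length)
    tail = stream.read(4)
    # The trailing marker should repeat the leading one byte for byte.
    if len(payload) != length or tail != head:
        raise ValueError("record markers do not match")
    return payload

# Build a fake one-record file holding three doubles (demo data only).
raw = array.array('d', [1.0, 2.0, 3.0]).tobytes()
marker = struct.pack('=i', len(raw))
fake_file = io.BytesIO(marker + raw + marker)

payload = read_record(fake_file)
values = array.array('d')
values.frombytes(payload)  # interpret the payload as doubles
print(values.tolist())
```

Reading the payload by its declared length means the caller only has to know the type of each record, not how many items it holds.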
