python - Reading a struct created in C from Python

Question

I'm very new to Python and very rusty with C, so I apologize in advance for how dumb and/or lost I sound.

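I have functions in C that create .dat files containing data. I'm using Python to open and read those files. One of the things I need to read is a struct that is created in a C function and written out in binary. In my Python code I am at the right line of the file to read the struct. I have tried unpacking the struct item by item and as a whole, without success. Most of the items in the struct are declared as "real" in the C code. I'm working on this code with someone else; the main source code is his, and he declared the variables as "real". I need to put this in a loop, because I want to read every file in a directory whose name ends in ".dat". To start the loop I have: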

for files in os.listdir(path):
  if files.endswith(".dat"):
    part = open(path + files, "rb")
    for line in part:

Then I read all the lines before the line that contains the struct. When I get to that line, I have:

      part_struct = part.readline()
      r = struct.unpack('<d8', part_struct[0])

I'm trying to read the first thing stored in the struct. I saw an example of this somewhere on here. When I try it, I get an error that reads:

struct.error: repeat count given without format specifier

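I'll take any advice anyone can give me. I've been stuck on this for a few days and have tried many different things. Honestly, I don't think I understand the struct module, but I have read as much about it as I can.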

Thanks!

score 19 · Accepted Answer

You can use ctypes.Structure or struct.Struct to specify the format of the file. To read the structs from the file produced by the C code in @perreal's answer:

"""
struct { double v; int t; char c;};
"""
from ctypes import *

class YourStruct(Structure):
    _fields_ = [('v', c_double),
                ('t', c_int),
                ('c', c_char)]

with open('c_structs.bin', 'rb') as file:
    result = []
    x = YourStruct()
    while file.readinto(x) == sizeof(x):
        result.append((x.v, x.t, x.c))

print(result)
# -> [(12.100000381469727, 17, b's'), (12.100000381469727, 17, b's'), ...]

See io.BufferedIOBase.readinto(). It is supported in Python 3, but it is undocumented for the default file object in Python 2.7.
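A self-contained sketch of the same readinto() loop that needs neither the C program nor a file on disk. It assumes the three records are built in Python by serializing a ctypes instance with bytes(), and io.BytesIO stands in for c_structs.bin; the helper name read_records is made up for illustration.

```python
import io
from ctypes import Structure, c_char, c_double, c_int, sizeof

class YourStruct(Structure):
    # Same layout as the C declaration: struct { double v; int t; char c; };
    _fields_ = [('v', c_double),
                ('t', c_int),
                ('c', c_char)]

def read_records(stream):
    """Read fixed-size YourStruct records until the stream runs out (hypothetical helper)."""
    result = []
    x = YourStruct()
    # readinto() fills x's memory in place and returns the number of bytes read;
    # a short count means end of stream (or a truncated last record), so stop.
    while stream.readinto(x) == sizeof(x):
        result.append((x.v, x.t, x.c))
    return result

# Build three identical records in memory instead of reading the C program's file.
data = bytes(YourStruct(12.1, 17, b's')) * 3
records = read_records(io.BytesIO(data))
print(records)
```

Because the record here is created from the double 12.1 rather than the C float literal 12.1f, v comes back as exactly 12.1.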

struct.Struct requires the padding bytes (x) to be specified explicitly:

"""
struct { double v; int t; char c;};
"""
from struct import Struct

x = Struct('dicxxx')
with open('c_structs.bin', 'rb') as file:
    result = []
    while True:
        buf = file.read(x.size)
        if len(buf) != x.size:
            break
        result.append(x.unpack_from(buf))

print(result)

It produces the same output.
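On Python 3.4+, Struct.iter_unpack() can replace the manual read loop: it walks a bytes object in steps of Struct.size. A sketch, assuming the whole file fits in memory and its length is an exact multiple of the record size (iter_unpack raises struct.error otherwise); the sample bytes are generated with struct itself instead of being read from c_structs.bin.

```python
from struct import Struct

# Native-mode layout of: struct { double v; int t; char c; };
# 'xxx' = the 3 trailing padding bytes the C compiler adds on common platforms.
record = Struct('dicxxx')

def unpack_all(buf):
    """Unpack every fixed-size record in buf (len(buf) must be a multiple of record.size)."""
    return list(record.iter_unpack(buf))

# In real code, buf would be open('c_structs.bin', 'rb').read().
buf = record.pack(12.1, 17, b's') * 3
print(unpack_all(buf))
```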

To avoid unnecessary copying, Array.from_buffer(mmap_file) can be used to get an array of structs from a file:

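import mmap # Unix, Windows
from contextlib import closing
from ctypes import sizeof

# YourStruct is the ctypes.Structure defined in the first snippet.
with open('c_structs.bin', 'rb') as file:
    # ACCESS_COPY gives a writable (copy-on-write) mapping, which from_buffer() requires.
    with closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_COPY)) as mm:
        n = len(mm) // sizeof(YourStruct)          # number of records in the file
        result = (YourStruct * n).from_buffer(mm)  # without copying
        print("\n".join(map("{0.v} {0.t} {0.c}".format, result)))
        del result  # release the buffer export; Python 3 refuses to close an mmap that is still exported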
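The same zero-copy idea works on any writable buffer, not only an mmap. A sketch with a bytearray standing in for the mapped file (an assumption for illustration); the record count is derived from the buffer length rather than hard-coded.

```python
from ctypes import Structure, c_char, c_double, c_int, sizeof

class YourStruct(Structure):
    # struct { double v; int t; char c; };
    _fields_ = [('v', c_double),
                ('t', c_int),
                ('c', c_char)]

# Stand-in for the file contents: three serialized records in a writable buffer.
buf = bytearray(bytes(YourStruct(12.1, 17, b's')) * 3)

n = len(buf) // sizeof(YourStruct)            # number of whole records
records = (YourStruct * n).from_buffer(buf)   # a view into buf, no copy made

# Because it is a view, writing through the ctypes array changes buf itself.
records[0].t = 42
print(records[0].t, records[1].t)
```

The flip side of "no copy" is that buf cannot be resized (and an mmap cannot be closed) while records is still alive.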

score 8 · Accepted Answer

Some C code:

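#include <stdio.h>

typedef struct { double v; int t; char c; } save_type;

int main(void) {
    save_type s = { 12.1f, 17, 's' };  /* 12.1f is a float literal, hence 12.100000381469727 below */
    FILE *f = fopen("output", "wb");   /* "b": binary mode, no newline translation on Windows */
    if (f == NULL)
        return 1;
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fwrite(&s, sizeof(save_type), 1, f);
    fclose(f);
    return 0;
}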

Some Python code:

import struct

# 'dicccc' = double, int, char, plus 3 chars that soak up the padding (16 bytes)
with open('output', 'rb') as f:
    chunk = f.read(16)
    while chunk:  # read() returns b'' at end of file
        print(struct.unpack('dicccc', chunk))
        chunk = f.read(16)

Output (Python 3):

(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')
(12.100000381469727, 17, b's', b'\x00', b'\x00', b'\x00')

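But there is also a padding issue: sizeof(save_type) is 16 on typical platforms (8 + 4 + 1 = 13 bytes of members, rounded up to the struct's alignment), so we read 3 more chars and ignore them.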
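Rather than hard-coding 16, you can let struct compute the padded size. A sketch: in native mode (the default, '@') struct inserts alignment padding between fields but not after the last one; ending the format with a zero repeat count such as '0d' pads the total up to a double's alignment, as described in the struct module docs. The standard-size modes ('<', '>', '=') never pad.

```python
import struct

# Native mode: d(8) + i(4) + c(1) = 13 bytes, no trailing padding added.
print(struct.calcsize('dic'))      # -> 13
# A zero-repeat '0d' at the end aligns the total size to a double boundary,
# which should match sizeof(save_type) on common compilers.
print(struct.calcsize('dic0d'))    # -> 16
# Standard sizes with '<' never insert any padding.
print(struct.calcsize('<dic'))     # -> 13

RECORD = struct.Struct('dic0d')    # v, t, c; the trailing padding is handled by the format
sample = RECORD.pack(12.1, 17, b's')
print(RECORD.unpack(sample))
```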

score 0 · Accepted Answer

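The number in a format specifier is a repeat count, and it must come before the letter, e.g. '<8d' (which means eight doubles, 64 bytes). But you said you only want to read one element of the struct, so I guess you just want '<d'. I suspect you were trying to specify the number of bytes to read as 8, but you don't need to: d already implies 8 bytes.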
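A quick way to see where the repeat count belongs is to ask struct.calcsize() about each format. A small sketch; format_error is a helper made up for this example.

```python
import struct

def format_error(fmt):
    """Return the struct.error message for fmt, or None if fmt is valid (hypothetical helper)."""
    try:
        struct.calcsize(fmt)
    except struct.error as exc:
        return str(exc)
    return None

print(format_error('<d8'))         # the asker's format: count after the letter is an error
print(struct.calcsize('<8d'))      # -> 64, eight little-endian doubles
print(struct.calcsize('<d'))       # -> 8, a single double needs no count

(value,) = struct.unpack('<d', struct.pack('<d', 12.1))
print(value)                       # -> 12.1
```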

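I also notice you're using readline. That seems wrong for reading binary data: it reads up to the next newline byte (b'\n'), which can occur at arbitrary positions in binary data. What you want is read(size), like this: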

part_struct = part.read(8)
r = struct.unpack('<d', part_struct)

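Actually, you should be careful, because read may return less data than you requested. For a regular file opened in buffered mode this only happens at end of file, but with raw streams, pipes or sockets short reads are possible, so if it happens you need to repeat the read: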

part_struct = b''
while len(part_struct) < 8:
    data = part.read(8 - len(part_struct))
    if not data:
        raise EOFError("unexpected end of file")  # Python has no IOException
    part_struct += data
r = struct.unpack('<d', part_struct)
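Putting this together with the loop from the question: a sketch that assumes each .dat file starts with a fixed number of text lines (HEADER_LINES, a hypothetical value to adjust to your format) followed directly by the binary struct, whose first member is a little-endian double. Opening in 'rb' lets readline() consume the text header and read() take the binary part from the same file object, and os.path.join avoids the missing separator in path + files. The function names are made up for illustration.

```python
import os
import struct

HEADER_LINES = 3                      # hypothetical: text lines before the struct
FIRST_MEMBER = struct.Struct('<d')    # first member of the C struct: one double

def read_exact(f, size):
    """Read exactly size bytes from f or raise EOFError."""
    data = b''
    while len(data) < size:
        chunk = f.read(size - len(data))
        if not chunk:
            raise EOFError('unexpected end of file')
        data += chunk
    return data

def first_values(path):
    """Map each *.dat file name in path to the first double of its struct."""
    results = {}
    for name in sorted(os.listdir(path)):
        if not name.endswith('.dat'):
            continue
        with open(os.path.join(path, name), 'rb') as part:
            for _ in range(HEADER_LINES):
                part.readline()       # skip the text header lines
            raw = read_exact(part, FIRST_MEMBER.size)
            results[name] = FIRST_MEMBER.unpack(raw)[0]
    return results
```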

score 0 · Accepted Answer

Numpy can be used to read/write binary data. You just need to define a custom np.dtype instance that describes the memory layout of your C struct.

For example, here is some C++ code that defines a struct (it should work the same way for C structs, although I'm not a C expert):

#include <cstdint>
#include <fstream>
#include <string>

struct MyStruct {
    uint16_t FieldA;
    uint16_t pad16[3];
    uint32_t FieldB;
    uint32_t pad32[2];
    char     FieldC[4];
    uint64_t FieldD;
    uint64_t FieldE;
};

void write_struct(const std::string& fname, MyStruct h) {
    // This function serializes a MyStruct instance and
    // writes the binary data to disk.
    std::ofstream ofp(fname, std::ios::out | std::ios::binary);
    ofp.write(reinterpret_cast<const char*>(&h), sizeof(h));
}

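Following advice I found at stackoverflow.com/a/5397638, I added explicit padding fields (pad16 and pad32) to the struct so that it serializes in a more predictable way: every field is then naturally aligned, so the compiler has no reason to insert hidden padding, and the layout matches the numpy dtype below field for field. Plain C structs follow the same alignment rules, so the same care applies there too.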
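Because the explicit pad fields leave no hidden padding, the same layout can also be described with the struct module's standard-size mode, which never pads. A stdlib-only sketch, assuming a little-endian writer (x86/x86-64); FieldC is read as a 4-byte string and the pad fields are skipped with 'x'. The names MY_STRUCT and parse_my_struct are made up for this example.

```python
import struct

# FieldA, pad16[3], FieldB, pad32[2], FieldC[4], FieldD, FieldE
MY_STRUCT = struct.Struct('<H6xI8x4sQQ')   # 'x' bytes skip the pad fields

def parse_my_struct(buf):
    """Return (FieldA, FieldB, FieldC, FieldD, FieldE) from one serialized MyStruct."""
    return MY_STRUCT.unpack_from(buf)
```

MY_STRUCT.size is 40 bytes, which should equal sizeof(MyStruct) on common compilers; comparing the two is a cheap sanity check.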

Now, in Python, we create a numpy.dtype object describing the memory layout of MyStruct:

import numpy as np

my_struct_dtype =  np.dtype([
    ("FieldA"            , np.uint16  ,       ),
    ("pad16"             , np.uint16  , (3,)  ),
    ("FieldB"            , np.uint32          ),
    ("pad32"             , np.uint32  , (2,)  ),
    ("FieldC"            , np.byte    , (4,)  ),
    ("FieldD"            , np.uint64          ),
    ("FieldE"            , np.uint64          ),
])

Then use numpy's fromfile to read the binary file holding the C struct:

import numpy as np

# fpath, expected_value_A and expected_value_B are placeholders for your own
# file path and expected values; my_struct_dtype is the dtype defined above.

# read data
struct_data = np.fromfile(fpath, dtype=my_struct_dtype, count=1)[0]

FieldA         = struct_data["FieldA"]
FieldB         = struct_data["FieldB"]
FieldC         = struct_data["FieldC"]
FieldD         = struct_data["FieldD"]
FieldE         = struct_data["FieldE"]

if FieldA != expected_value_A:
    raise ValueError("Bad FieldA, got %d" % FieldA)
if FieldB != expected_value_B:
    raise ValueError("Bad FieldB, got %d" % FieldB)
if FieldC.tobytes() != b"expc":
    raise ValueError("Bad FieldC, got %s" % FieldC.tobytes().decode())
...

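The count=1 argument in the call np.fromfile(..., count=1) above makes the returned array contain only one element, i.e. "read the first struct instance from the file". Note that I'm indexing with [0] to take that element out of the array.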

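If you have appended the data from many C structs to the same file, you can use fromfile(..., count=n) to read n struct instances into a numpy array of shape (n,). Setting count=-1, which is the default for both np.fromfile and np.frombuffer, means "read all data" and produces a 1-D array of shape (number_of_struct_instances,).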

You can also use the offset keyword argument of np.fromfile (available since NumPy 1.17) to control where in the file reading starts.
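A sketch of count and offset on an in-memory buffer, assuming a hypothetical 8-byte header in front of the records; np.frombuffer takes the same count and offset keywords as np.fromfile, so no file is needed. The dtype repeats the MyStruct layout above in native byte order; if the file was written on a machine with different endianness, explicit codes such as '<u2' would be needed.

```python
import numpy as np

my_struct_dtype = np.dtype([
    ("FieldA", np.uint16),
    ("pad16",  np.uint16, (3,)),
    ("FieldB", np.uint32),
    ("pad32",  np.uint32, (2,)),
    ("FieldC", np.byte,   (4,)),
    ("FieldD", np.uint64),
    ("FieldE", np.uint64),
])

HEADER_SIZE = 8   # hypothetical header in front of the struct records

# Build 5 records, then prepend an 8-byte header to mimic the file contents.
records = np.zeros(5, dtype=my_struct_dtype)
records["FieldA"] = np.arange(5)
raw = b"HDR_0001" + records.tobytes()

# Skip the header with offset and read only the first 2 records with count.
first_two = np.frombuffer(raw, dtype=my_struct_dtype, count=2, offset=HEADER_SIZE)
# count=-1 (the default) reads every remaining record.
all_records = np.frombuffer(raw, dtype=my_struct_dtype, offset=HEADER_SIZE)
```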

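To sum up, here are some numpy functions that come in handy once you have defined a custom dtype:

Reading binary data as a numpy array:
- np.frombuffer(bytes_data, dtype=...): interprets the given binary data (e.g. a Python bytes instance) as a numpy array of the given dtype. You can define a custom dtype that describes the memory layout of your C struct.
- np.fromfile(filename, dtype=...): reads binary data from filename. It should give the same result as np.frombuffer(open(filename, "rb").read(), dtype=...).
Writing a numpy array as binary data:
- ndarray.tobytes(): constructs a Python bytes instance containing the raw data of the given numpy array. If the array's dtype corresponds to a C struct, the bytes from ndarray.tobytes can be deserialized by C/C++ and interpreted as (an array of) instances of that C struct.
- ndarray.tofile(filename): writes the binary data in the array to filename. This data can then be deserialized by C/C++. It is equivalent to open(filename, "wb").write(a.tobytes()).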
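A round-trip sketch of tofile() and fromfile(). It assumes a temporary file stands in for the real data file and uses a hypothetical two-field record, struct { uint32_t id; double value; }, to show the align=True option: it makes numpy insert the same alignment padding a C compiler typically would, instead of listing pad fields by hand.

```python
import os
import tempfile

import numpy as np

# Hypothetical record: struct { uint32_t id; double value; };
# align=True adds 4 padding bytes after id, as a C compiler typically does.
rec_dtype = np.dtype([("id", np.uint32), ("value", np.float64)], align=True)

def roundtrip(records):
    """Write records with tofile() and read them back with fromfile()."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        records.tofile(path)                       # same bytes as records.tobytes()
        return np.fromfile(path, dtype=rec_dtype)  # default count=-1 reads everything
    finally:
        os.remove(path)

data = np.array([(1, 1.5), (2, 2.5)], dtype=rec_dtype)
back = roundtrip(data)
```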

score 0 · Accepted Answer

I recently ran into the same problem, so I made a module for the task; it is stored here: http://pastebin.com/XJyZMyHX

Example code:

MY_STRUCT="""typedef struct __attribute__ ((__packed__)){
    uint8_t u8;
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;
    int8_t i8;
    int16_t i16;
    int32_t i32;
    int64_t i64;
    long long int lli;
    float flt;
    double dbl;
    char string[12];
    uint64_t array[5];
} debugInfo;"""

PACKED_STRUCT='\x01\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\x00\xff\x00\x00\xff\xff\x00\x00\x00\x00\xff\xff\xff\xff*\x00\x00\x00\x00\x00\x00\x00ff\x06@\x14\xaeG\xe1z\x14\x08@testString\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00'

if __name__ == '__main__':
    print "String:"
    print depack_bytearray_to_str(PACKED_STRUCT,MY_STRUCT,"<" )
    print "Bytes in Stuct:"+str(structSize(MY_STRUCT))
    nt=depack_bytearray_to_namedtuple(PACKED_STRUCT,MY_STRUCT,"<" )
    print "Named tuple nt:"
    print nt
    print "nt.string="+nt.string

The result should be:

String:
u8:1
u16:256
u32:65536
u64:4294967296
i8:-1
i16:-256
i32:-65536
i64:-4294967296
lli:42
flt:2.09999990463
dbl:3.01
string:u'testString\x00\x00'
array:(1, 2, 3, 4, 5)

Bytes in Stuct:102
Named tuple nt:
CStruct(u8=1, u16=256, u32=65536, u64=4294967296L, i8=-1, i16=-256, i32=-65536, i64=-4294967296L, lli=42, flt=2.0999999046325684, dbl=3.01, string="u'testString\\x00\\x00'", array=(1, 2, 3, 4, 5))
nt.string=u'testString\x00\x00'
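If you would rather not depend on the pastebin module, the same packed layout can be decoded with struct and collections.namedtuple directly. A sketch, assuming the writer is little-endian and the struct is declared __attribute__ ((__packed__)) as above (so '<' with no padding is the matching mode), with long long int taken as 8 bytes ('q'); DEBUG_INFO, DebugInfo and unpack_debug_info are names made up for this example.

```python
import struct
from collections import namedtuple

# '<' = little-endian, standard sizes, no padding: matches the packed C struct.
# u8 u16 u32 u64 i8 i16 i32 i64 lli flt dbl string[12] array[5]
DEBUG_INFO = struct.Struct('<BHIQbhiqqfd12s5Q')

DebugInfo = namedtuple(
    'DebugInfo', 'u8 u16 u32 u64 i8 i16 i32 i64 lli flt dbl string array')

def unpack_debug_info(buf):
    """Decode one packed debugInfo record into a DebugInfo namedtuple."""
    fields = DEBUG_INFO.unpack(buf)
    scalars, array = fields[:12], fields[12:]   # the last 5 values form array[5]
    return DebugInfo(*scalars, array=tuple(array))
```

DEBUG_INFO.size is 102, matching the "Bytes in Stuct" line in the output above.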
