python - Parsing binary data into a ctypes Structure object via readinto()

Question

I'm trying to handle a binary format, following the example here:

http://dabeaz.blogspot.jp/2009/08/python-binary-io-handling.html

>>> from ctypes import *
>>> class Point(Structure):
...     _fields_ = [ ('x',c_double), ('y',c_double), ('z',c_double) ]
...
>>> g = open("foo","rb") # point structure data
>>> q = Point()
>>> g.readinto(q)
24
>>> q.x
2.0

I've defined the structure of my header and I'm trying to read the data into it, but I'm having some difficulty. My structure looks like this:

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),

                ]

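The ctypes documentation says:

For integer type fields like c_int, a third optional item can be given. It must be a small positive integer defining the bit width of the field.

So with ("more_funky_numbers_7bytes", c_uint, 56), I tried to define the field as a 7 byte field, but I get the error:

ValueError: number of bits invalid for bit field

So my first question is: how can I define a 7 byte int field?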

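Then, if I skip that problem and comment out the "more_funky_numbers_7bytes" field, the data does get loaded, but as expected only 1 character is loaded into "ascii_text_32bytes". For some reason readinto() returns 16, which I assume is the number of bytes it read into the structure. But if I comment out my "funky numbers" field and "ascii_text_32bytes" only holds one char (1 byte), shouldn't that be 13, not 16?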

Then I tried breaking the char field out into a separate structure and referencing it from my header structure. But that didn't work either...

class StupidStaticCharField(BigEndianStructure):
    _fields_ = [
                ("ascii_text_1", c_byte),
                ("ascii_text_2", c_byte),
                ("ascii_text_3", c_byte),
                ("ascii_text_4", c_byte),
                ("ascii_text_5", c_byte),
                ("ascii_text_6", c_byte),
                ("ascii_text_7", c_byte),
                ("ascii_text_8", c_byte),
                ("ascii_text_9", c_byte),
                ("ascii_text_10", c_byte),
                ("ascii_text_11", c_byte),
                .
                .
                .
                ]

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", StupidStaticCharField),
                ("timestamp_4bytes", c_uint),
                #("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_ushort),
                ("other_flags_1byte", c_ushort),
                ("payload_length_2bytes", c_ushort),

                ]

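So, any ideas how to:

Define a 7 byte field (which I will need to decode using a defined function)
Define a static char field of 32 bytes

Update

I've found a structure that seems to work...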

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char * 32),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_byte * 7),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),

                ]

But now my remaining question is why, when using .readinto():

f = open(binaryfile, "rb")

mystruct = BinaryHeader()
f.readinto(mystruct)

it returns 52 and not the expected 51. Where does the extra byte come from, and where does it go?

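Update 2

For those who are interested, here's an example of the alternative approach mentioned by eryksun, which uses struct to read values into a namedtuple: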

>>> from struct import unpack
>>> record = 'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

score 7 · Accepted Answer

This line definition is actually for defining a bit field:

...
("more_funky_numbers_7bytes", c_uint, 56),
...

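That is wrong here. The size of a bit field must be less than or equal to the size of its type, so for c_uint it can be at most 32; a single extra bit raises the exception: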

ValueError: number of bits invalid for bit field
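The limit can be checked directly. The following is a minimal sketch for Python 3 (the class names `Fits` and `TooWide` are made up for illustration): a bit field as wide as c_uint itself is accepted, while one bit more is rejected when the class is created.

```python
from ctypes import Structure, c_uint, sizeof

# A bit field on c_uint may use at most as many bits as c_uint holds.
max_bits = sizeof(c_uint) * 8

class Fits(Structure):
    # Exactly as wide as the underlying type: accepted.
    _fields_ = [("value", c_uint, max_bits)]

try:
    class TooWide(Structure):
        # One bit wider than the underlying type: rejected.
        _fields_ = [("value", c_uint, max_bits + 1)]
except ValueError as exc:
    too_wide_error = exc  # keep a reference, `exc` is cleared after the block
else:
    too_wide_error = None

print(max_bits, sizeof(Fits))
print(repr(too_wide_error))
```

The error is raised while the class body is processed, so a bad width is caught before any data is read.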

An example of using bit fields:

from ctypes import *

class MyStructure(Structure):
    _fields_ = [
        # c_uint8 is 8 bits length
        ('a', c_uint8, 4), # first 4 bits of `a`
        ('b', c_uint8, 2), # next 2 bits of `a`
        ('c', c_uint8, 2), # next 2 bits of `a`
        ('d', c_uint8, 2), # since we are beyond the size of `a`,
                           # a new byte will be created and `d` will
                           # have its first two bits
    ]

mystruct = MyStructure()

mystruct.a = 0b0000
mystruct.b = 0b11
mystruct.c = 0b00
mystruct.d = 0b11

v = c_uint16()

# copy `mystruct` into `v`, I use Windows
cdll.msvcrt.memcpy(byref(v), byref(mystruct), sizeof(v))

print sizeof(mystruct) # 2 bytes, so 6 bits are left floating, you may
                       # want to memset with zeros
print bin(v.value)     # 0b1100110000
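The example above relies on msvcrt, so it only runs on Windows. As a rough portable sketch for Python 3, the raw bytes of the structure can be taken with bytes() instead of memcpy. The exact bit order inside each byte depends on the platform and compiler conventions, so no particular bit pattern is claimed here.

```python
from ctypes import Structure, c_uint8, sizeof

class MyStructure(Structure):
    _fields_ = [
        ("a", c_uint8, 4),  # 4 bits
        ("b", c_uint8, 2),  # 2 bits
        ("c", c_uint8, 2),  # 2 bits, which fills the first byte
        ("d", c_uint8, 2),  # does not fit any more, so a second byte is used
    ]

mystruct = MyStructure(a=0b0000, b=0b11, c=0b00, d=0b11)

# bytes() copies the structure's memory through the buffer protocol,
# so no platform-specific memcpy is needed.
raw = bytes(mystruct)

print(sizeof(mystruct))              # size of the structure in bytes
print([bin(byte) for byte in raw])   # one entry per byte of the structure
```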

What you need is 7 bytes, so what you ended up doing is correct:

...
("more_funky_numbers_7bytes", c_byte * 7),
...
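Since the question mentions decoding the 7 byte field with a function, here is one possible sketch for Python 3. It assumes the field holds a big-endian unsigned integer, which the question does not actually state; `Funky` and `decode_uint56` are hypothetical names.

```python
from ctypes import BigEndianStructure, c_byte

class Funky(BigEndianStructure):
    # Cut-down structure holding only the 7 byte field from the question.
    _fields_ = [("more_funky_numbers_7bytes", c_byte * 7)]

def decode_uint56(field):
    """Interpret a 7 byte ctypes array as a big-endian unsigned integer.

    The byte order and signedness are assumptions made for this sketch.
    """
    return int.from_bytes(bytes(field), byteorder="big")

record = Funky.from_buffer_copy(bytes([0, 0, 0, 0, 0, 1, 2]))
print(decode_uint56(record.more_funky_numbers_7bytes))  # 258, i.e. 0x0102
```

Using bytes() on the array reads the raw memory, so it does not matter that c_byte is a signed type.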

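As for the size of the structure, it is going to be 52. One extra byte is padded in because ctypes, like a C compiler, aligns each field to its natural boundary by default, and rounds the total size up to a multiple of the largest field alignment (4 bytes here). Here: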

from ctypes import *

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

mystruct = BinaryHeader(
    0x11111111,
    '\x22' * 32,
    0x33333333,
    (c_byte * 7)(*([0x44] * 7)),
    0x55,
    0x66,
    0x7777
)

print sizeof(mystruct)

with open('data.txt', 'wb') as f:
    f.write(mystruct)

In the file, the extra byte is padded between other_flags_1byte and payload_length_2bytes:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 00 77 77 f.ww
            ^
         extra byte
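The same padding can be seen without writing a file by asking ctypes for each field's offset. This is a small sketch for Python 3; `layout` and `OneCharHeader` are helper names invented here, and `OneCharHeader` reproduces the earlier attempt from the question (single c_char, 7 byte field commented out) to show where its 16 comes from.

```python
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

class OneCharHeader(BigEndianStructure):
    # The earlier attempt: one c_char and no 7 byte field.
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char),
        ("timestamp_4bytes", c_uint),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

def layout(struct_type):
    """Return (name, offset, size) for every field of a ctypes structure."""
    rows = []
    for field in struct_type._fields_:
        descriptor = getattr(struct_type, field[0])
        rows.append((field[0], descriptor.offset, descriptor.size))
    return rows

for struct_type in (BinaryHeader, OneCharHeader):
    print(struct_type.__name__, sizeof(struct_type))
    for name, offset, size in layout(struct_type):
        print("  %-28s offset=%2d size=%2d" % (name, offset, size))
```

In BinaryHeader the fields end at offset 49, but payload_length_2bytes is placed at 50 so that it sits on a 2 byte boundary. In OneCharHeader the single char at offset 4 is followed by 3 padding bytes so that timestamp_4bytes can start at 8, which gives 16 rather than 13.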

This is a problem when it comes to file formats and network protocols. To change it, set the packing to 1:

 ...
class BinaryHeader(BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
...

The file will then be:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 77 77    fww
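Putting it together, here is a sketch for Python 3 of reading the 51 byte header with readinto() once the structure is packed. An in-memory io.BytesIO stands in for the real file, and the sample bytes are the ones from the dump above; `PackedHeader` is just a name chosen for this example.

```python
import io
import struct
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class PackedHeader(BigEndianStructure):
    _pack_ = 1  # no padding between fields
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

# Build the same 51 bytes as in the hex dump.
raw = struct.pack(
    ">I32sI7sBBH",
    0x11111111, b"\x22" * 32, 0x33333333, b"\x44" * 7, 0x55, 0x66, 0x7777,
)

header = PackedHeader()
bytes_read = io.BytesIO(raw).readinto(header)  # a real file opened "rb" works the same way

print(sizeof(PackedHeader), bytes_read)
print(hex(header.sequence_number_4bytes), hex(header.payload_length_2bytes))
```

With the packing set, sizeof() and the value returned by readinto() both match the on-disk size of the header.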

As for struct, it won't make your case any easier. Sadly, it doesn't support nested tuples in the format. For example:

>>> from struct import *
>>>
>>> data = '\x11\x11\x11\x11\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22
\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x33
\x33\x33\x33\x44\x44\x44\x44\x44\x44\x44\x55\x66\x77\x77'
>>>
>>> BinaryHeader = Struct('>I32cI7BBBH')
>>>
>>> BinaryHeader.unpack(data)
(286331153, '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"'
, '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"'
, '"', '"', 858993459, 68, 68, 68, 68, 68, 68, 68, 85, 102, 30583)
>>>

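This result can't be used with namedtuple; you still have to parse it by index. It would work if you could write something like '>I(32c)(I)(7B)(B)(B)H'. This feature (extend struct.unpack to produce nested tuples) has been requested since 2003, but nothing has been done since.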
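A partial workaround, not the nested-tuple feature described above, is the 's' format code already used in the question's second update. It returns each fixed-width run as one byte string, so the result has one item per field and fits a namedtuple. This is a sketch for Python 3; the `Header` field names are shortened versions chosen for illustration.

```python
from collections import namedtuple
from struct import Struct

# '32s' and '7s' each yield a single bytes object instead of 32 or 7 items.
header_format = Struct(">I32sI7sBBH")

Header = namedtuple(
    "Header",
    "sequence_number ascii_text timestamp more_funky_numbers "
    "some_flags other_flags payload_length",
)

data = (
    b"\x11" * 4 + b"\x22" * 32 + b"\x33" * 4 + b"\x44" * 7 + b"\x55\x66\x77\x77"
)

header = Header._make(header_format.unpack(data))
print(header_format.size)
print(header)
```

The 7 byte field still arrives as raw bytes, so it needs a separate decoding step whichever approach is used.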
