
Pandas has a DataFrame.to_msgpack() method for serialising a dataframe to the MessagePack format.

It accepts a file path or a 'buffer-like' object. If neither is provided, it returns the packed data as a string.

My question is: how do I properly save this data to a buffer-like object without first producing it as a string?

#1
string_data = df.to_msgpack()  # returns data as string

#2
memory_buffer = memoryview(df.to_msgpack())  # creates a memoryview from the string

#3
df.to_msgpack('filename.msg')  # writes the data to a binary file

#4
memory_buffer = memoryview(b'')
df.to_msgpack(memory_buffer, append=True)  # would this work?

In scenario 4, df.to_msgpack() needs a buffer-like object, while memoryview() needs an object to wrap. So one would have to create an 'empty' memoryview, pass it to to_msgpack(), and let it append the data. I wonder, though, whether this would leave artefacts when unpacking the data.
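A quick check of how a memoryview over an empty bytes object behaves, independent of pandas. This only uses built-in Python; it shows that such a view is zero-length and read-only, and that it has no write() method of the kind file-like objects usually expose.

```python
# memoryview(b'') wraps an immutable, zero-length bytes object.
view = memoryview(b'')

print(view.readonly)  # True: the underlying bytes object cannot be changed
print(len(view))      # 0: there is no room to hold any data

# memoryview is not file-like: it has no .write() method to append through.
print(hasattr(view, "write"))  # False

# Attempting to write into it fails because the memory is read-only.
message = None
try:
    view[0:1] = b"x"
except TypeError as exc:
    message = str(exc)
print("cannot write:", message)
```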

In scenario 2, is it correct to think that a memoryview of a string is equivalent to a bytearray?
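A small comparison sketch using only built-ins. The bytes value below is an arbitrary placeholder standing in for packed output. It shows that a memoryview over bytes and a bytearray hold the same data, but differ in that the view does not copy and is read-only, whereas the bytearray is a mutable copy.

```python
data = b"\x92\x01\x02"  # placeholder bytes standing in for packed output

view = memoryview(data)  # zero-copy view onto `data`
arr = bytearray(data)    # independent, mutable copy of `data`

# Same contents, so they compare equal.
print(view == arr)                 # True
print(view.tobytes() == bytes(arr))  # True

# But only the bytearray can be modified in place.
print(view.readonly)               # True
print(memoryview(arr).readonly)    # False
arr[0] = 0x90                      # allowed; `data` and `view` are unaffected
print(view.tobytes() == data)      # True
```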


1 Answer


After digging through the pandas source code, it appears the way to do this is to use Python's io.BytesIO() as the buffer:

import io

buffer = io.BytesIO()  # in-memory binary buffer; no intermediate string needed
df.to_msgpack(buffer, append=False, compress='zlib')

This seems to work well. Note that the compress option seemed somewhat patchy in version 0.16.0, but appears to be fixed in master.
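A minimal sketch of the BytesIO approach that runs without pandas (to_msgpack was deprecated in pandas 0.25 and removed in 1.0). The function write_packed is hypothetical and only stands in for df.to_msgpack(buffer): it writes the msgpack encoding of [1, 2] to any object with a write() method. The point is what you can do with the buffer afterwards: getbuffer() gives a memoryview over the buffer's contents without copying, and getvalue() returns a bytes copy.

```python
import io


def write_packed(buf):
    """Hypothetical stand-in for df.to_msgpack(buf).

    Writes the msgpack encoding of [1, 2] (fixarray of two fixints)
    to any file-like object that has a .write() method.
    """
    buf.write(b"\x92\x01\x02")


buffer = io.BytesIO()
write_packed(buffer)

# Zero-copy memoryview over the bytes currently held by the buffer.
view = buffer.getbuffer()
size = view.nbytes
view.release()  # release the export so the BytesIO can be resized or closed again

# Independent bytes copy, if a standalone object is needed.
data = buffer.getvalue()

# Rewind before handing the buffer to a reader.
buffer.seek(0)
read_back = buffer.read()
print(size, data, read_back)
```

Releasing the view matters: while a getbuffer() export is alive, the BytesIO cannot be resized, so further writes that would grow it fail.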

answered 2015-04-29T10:21:46.973