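I haven't figured out how to load and save pickled pandas DataFrames between Python 2 and Python 3. I have played with the "protocol" option of the pickler without success, but I'm hoping someone has a quick idea for me to try. Here is the code that produces the errors: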

Python 2.7

>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a2')
>>> a = pandas.DataFrame.load('a2')
>>> a = pandas.DataFrame.load('a3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
ValueError: unsupported pickle protocol: 3

Python 3

>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a3')
>>> a = pandas.DataFrame.load('a3')
>>> a = pandas.DataFrame.load('a2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)

Maybe expecting pickle to work between Python versions is a bit optimistic?
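A note on the Python 3 traceback above: a `UnicodeDecodeError` of this kind typically appears when Python 3 unpickles byte strings written by Python 2 and tries to decode them as ASCII. The standard `pickle.load` accepts an `encoding` argument for this case. The following is only a minimal sketch, assuming `'latin1'` suits the data; `load_py2_pickle` is a hypothetical helper name, and whether a DataFrame pickled by an old pandas release then reconstructs cleanly also depends on the pandas versions involved.

```python
import pickle


def load_py2_pickle(path, encoding="latin1"):
    """Load a pickle written by Python 2 from Python 3 (hypothetical helper).

    `encoding` tells pickle how to decode Python 2 `str` objects.
    'latin1' maps every byte to a character, so the decoding step
    itself cannot fail (the default, 'ASCII', fails on bytes >= 0x80).
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding=encoding)
```

This only addresses the Python 2 to Python 3 direction. The `unsupported pickle protocol: 3` error in the other direction is a separate issue, because protocol 3 does not exist in Python 2.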

3 Answers

I had the same problem. You can change the protocol of a DataFrame pickle file by running the following function in Python 3:

import pickle
def change_pickle_protocol(filepath,protocol=2):
    with open(filepath,'rb') as f:
        obj = pickle.load(f)
    with open(filepath,'wb') as f:
        pickle.dump(obj,f,protocol=protocol)

After that you should be able to open the file in Python 2 without problems.
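The function above overwrites the file in place. If you would rather keep the original, a variant that writes to a separate path might look like the sketch below. The name `convert_pickle_protocol` and the `dst_path` parameter are illustrative assumptions, not part of pickle or pandas.

```python
import pickle


def convert_pickle_protocol(src_path, dst_path, protocol=2):
    """Re-save the object pickled at src_path to dst_path using `protocol`.

    The source file is left untouched. Unpickling needs every class
    stored in the file to be importable, so for a DataFrame pandas
    must be installed in the interpreter that runs this function.
    """
    with open(src_path, "rb") as f:
        obj = pickle.load(f)
    with open(dst_path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)
    return dst_path
```

For example, `convert_pickle_protocol('a3', 'a3_protocol2')` run under Python 3 would leave `a3` as it was and write a protocol 2 copy next to it.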

answered 2016-01-15T22:55:06.543

If you use pandas.DataFrame.to_pickle(), you can make the following modifications to the pandas source code so that the pickle protocol can be set:

1) In the source file /pandas/io/pickle.py (before modifying it, copy the original to /pandas/io/pickle.py.ori), search for the following lines:

def to_pickle(obj, path):

pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)

Change these lines to:

def to_pickle(obj, path, protocol=pkl.HIGHEST_PROTOCOL):

pkl.dump(obj, f, protocol=protocol)

2) In the source file /pandas/core/generic.py (before modifying it, copy the original to /pandas/core/generic.py.ori), search for the following lines:

def to_pickle(self, path):

return to_pickle(self, path)

Change these lines to:

def to_pickle(self, path, protocol=None):

return to_pickle(self, path, protocol)

3) If your Python kernel is running, restart it, then save your DataFrame with any available pickle protocol (0, 1, 2, 3, 4):

# Python 2.x can read this
df.to_pickle('my_dataframe.pck', protocol=2)

# no protocol given: the patched method passes None, so pickle falls back to its default protocol (3 or higher on Python 3); Python 2.x cannot read this
df.to_pickle('my_dataframe.pck')

4) After upgrading pandas, repeat steps 1 and 2.

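5) (Optional) Ask the developers to include this feature in an official release, because your code will raise an exception in any other Python environment that lacks these changes.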

Have a nice day!

answered 2016-10-25T09:41:54.313

You can override the highest protocol available to the pickle package:

import pickle as pkl
import pandas as pd
if __name__ == '__main__':
    # this constant is defined in pickle.py in the pickle package
    pkl.HIGHEST_PROTOCOL = 2
    # 'foo.pkl' was saved in pickle protocol 4
    df = pd.read_pickle(r"C:\temp\foo.pkl")

    # 'foo_protocol_2.pkl' will be saved in pickle protocol 2
    # and can be read in pandas with Python 2
    df.to_pickle(r"C:\temp\foo_protocol_2.pkl")

This is definitely not an elegant solution, but it gets the job done without changing the pandas code directly.
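Whether an override like this takes effect may depend on how your pandas version reads the constant, so it can be worth checking which protocol the output file actually uses. The sketch below relies on the fact that pickles written with protocol 2 or higher start with the byte 0x80 followed by the protocol number. `pickle_protocol` is a hypothetical helper name, and the file is assumed to be uncompressed.

```python
def pickle_protocol(path):
    """Return the protocol number declared at the start of a pickle file.

    Pickles written with protocol 2 or higher begin with the PROTO
    opcode (byte 0x80) followed by one byte holding the protocol number.
    Protocols 0 and 1 have no such header, so None is returned for them.
    Assumes the file is not compressed.
    """
    with open(path, "rb") as f:
        header = f.read(2)
    if len(header) == 2 and header[0:1] == b"\x80":
        return ord(header[1:2])
    return None
```

A result of 2 for the rewritten file would indicate that the protocol itself is one Python 2 understands.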

Update: I found that newer versions of pandas allow specifying the pickle protocol in the .to_pickle function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html

DataFrame.to_pickle(path, compression='infer', protocol=4)

answered 2017-06-15T08:40:42.383