
(It is possible to directly jump to the question, further down, and to skip the introduction.)

There is a common difficulty with pickling Python objects from user-defined classes:

# This is program dumper.py
import pickle

class C(object):
    pass

with open('obj.pickle', 'wb') as f:
    pickle.dump(C(), f)

Trying to load the object back from another program, loader.py, with

# This is program loader.py
import pickle

with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)

results in

AttributeError: 'module' object has no attribute 'C'

This happens because the class is pickled by name ("C"), and the loader.py program does not know anything about C. A common solution is to import the class:

import pickle
from dumper import C  # Objects of class C can now be unpickled

with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)

However, this solution has a few drawbacks, including the fact that all the classes referenced by the pickled objects have to be imported (there can be many); furthermore, the local namespace becomes polluted by names from the dumper.py program.

Now, a solution to this consists of fully qualifying objects prior to pickling:

# New dumper.py program:
import pickle
import dumper  # This is this very program!

class C(object):
    pass

with open('obj.pickle', 'wb') as f:
    pickle.dump(dumper.C(), f)  # Fully qualified class

Unpickling with the original loader.py program above now works directly (no need to do from dumper import C).

Question: Now, other classes from dumper.py seem to be automatically fully qualified upon pickling, and I would love to know how this works, and whether this is a reliable, documented behavior:

import pickle
import dumper  # This is this very program!

class D(object):  # New class!
    pass

class C(object):
    def __init__(self):
        self.d = D()  # *NOT* fully qualified

with open('obj.pickle', 'wb') as f:
    pickle.dump(dumper.C(), f)  # Fully qualified pickle class

Now, unpickling with the original loader.py program also works (no need to do from dumper import C); print obj.d gives a fully qualified class, which I find surprising:

<dumper.D object at 0x122e130>

This behavior is very convenient, since only the top-level pickled object has to be fully qualified with the module name (dumper.C()). But is this behavior reliable and documented? How come classes are pickled by name ("D"), yet unpickling decides that the pickled self.d attribute is of class dumper.D (and not some local D class)?

PS: The question, refined: I just noticed a few interesting details that might point to an answer to this question:

In the pickling program dumper.py, print self.d prints <__main__.D object at 0x2af450>, with the first dumper.py program (the one without import dumper). On the other hand, doing import dumper and creating the object with dumper.C() in dumper.py makes print self.d print <dumper.D object at 0x2af450>: the self.d attribute is automatically qualified by Python! So, it appears that the pickle module has no role in the nice unpickling behavior described above.

The question is thus really: why does Python convert D() into the fully qualified dumper.D in the second case? Is this documented somewhere?


2 Answers


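When your classes are defined in your main module, that is where pickle expects to find them when they are unpickled. In your first case, the classes were defined in the main module, so when the loader runs, the loader is the main module and pickle cannot find the classes. If you look at the contents of obj.pickle, you will see the name __main__ exported as the namespace of your C and D classes.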
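The namespace recorded in a pickle can also be listed programmatically with the standard pickletools module instead of reading obj.pickle by hand. The following is only a sketch under stated assumptions: demo_classes is a hypothetical in-memory module standing in for a real file, and protocol 2 is picked so that class references show up as GLOBAL opcodes.

```python
import pickle
import pickletools
import sys
import types

# Hypothetical in-memory module standing in for a real file such as dumper.py.
demo = types.ModuleType("demo_classes")
sys.modules["demo_classes"] = demo
exec("class C(object):\n    pass\n", demo.__dict__)

# With protocol 2, a class reference is stored as a GLOBAL opcode whose
# argument is the string "<module name> <class name>".
payload = pickle.dumps(demo.C(), protocol=2)

class_refs = [
    arg
    for opcode, arg, _pos in pickletools.genops(payload)
    if opcode.name == "GLOBAL"
]
print(class_refs)
```

The printed list contains the module name next to the class name, which is the same information that shows up as __main__ or dumper in obj.pickle.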

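In your second case, dumper.py imports itself. Now you have actually defined two separate sets of the C and D classes: one set in the __main__ namespace and one set in the dumper namespace. You serialize the one in the dumper namespace (look in obj.pickle to verify).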

pickle will attempt to dynamically import a namespace if it is not found, so when loader.py runs, pickle itself imports dumper.py along with the dumper.C and dumper.D classes.
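This dynamic import can be observed within a single process. The sketch below is illustrative only: shared_demo is a hypothetical module written to a temporary directory, and removing it from sys.modules merely imitates a separate loader process that has never imported it.

```python
import importlib
import os
import pickle
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical shared module, written to disk so that it can be imported.
    with open(os.path.join(tmp, "shared_demo.py"), "w") as f:
        f.write("class C(object):\n    pass\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        shared_demo = importlib.import_module("shared_demo")
        payload = pickle.dumps(shared_demo.C())

        # Forget the module, as a fresh loader process would not have it.
        del sys.modules["shared_demo"]
        del shared_demo

        # No explicit import here: pickle imports shared_demo on its own.
        obj = pickle.loads(payload)
        reimported = "shared_demo" in sys.modules
        class_module = type(obj).__module__
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("shared_demo", None)

print(reimported, class_module)
```

The import only succeeds because the module is findable on sys.path at load time, which is also why loader.py must be able to locate dumper.py.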

Since you have two separate scripts, dumper.py and loader.py, it only makes sense to define the classes they share in a common import module:

common.py

class D(object):
    pass

class C(object):
    def __init__(self):
        self.d = D()

loader.py

import pickle

with open('obj.pickle','rb') as f:
    obj = pickle.load(f)

print(obj)  # Parenthesized form works in both Python 2 and Python 3

dumper.py

import pickle
from common import C

with open('obj.pickle','wb') as f:
    pickle.dump(C(),f)

Note that even though dumper.py dumps C() in this case, pickle knows that it is a common.C object (see obj.pickle). When loader.py runs, it dynamically imports common.py and loads the object successfully.

Answered 2011-05-15T16:53:56.600

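Here is what happens: when dumper is imported (or from dumper import C is executed) from inside dumper.py, the whole program is parsed again (this can be seen by inserting a print in the module). This behavior is expected, because dumper is not an already-loaded module (__main__, however, is considered loaded): it is not in sys.modules.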
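The double execution can be reproduced with a throwaway script. This is a minimal sketch, assuming a Python interpreter that puts the script's directory on sys.path (the default); the three-line dumper.py written here is a stripped-down stand-in for the program in the question.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for dumper.py: report the current module name, then import itself.
script = (
    "import sys\n"
    "print(__name__, 'dumper' in sys.modules)\n"
    "import dumper  # the script imports itself\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dumper.py")
    with open(path, "w") as f:
        f.write(script)
    # Run the script in a fresh interpreter, as "python dumper.py" would.
    output = subprocess.check_output([sys.executable, path], cwd=tmp)

lines = output.decode().splitlines()
print(lines)
```

Two lines come back: one from the run as __main__, where dumper is not yet in sys.modules, and one from the second run as the module dumper.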

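As Mark's answer illustrates, importing a module naturally qualifies all the names defined in it, so self.d = D() is interpreted as creating an instance of class dumper.D when the file dumper.py is re-evaluated (this is equivalent to parsing common.py in Mark's answer).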
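The qualification comes from the __name__ of the namespace in which the class statement runs, which Python copies into the class's __module__ attribute. A small sketch, using exec on two hand-built namespaces as a stand-in for running the same file once as a script and once as an imported module:

```python
source = "class D(object):\n    pass\n"

# The same source, evaluated in two namespaces with different __name__ values.
as_script = {"__name__": "__main__"}
as_module = {"__name__": "dumper"}
exec(source, as_script)
exec(source, as_module)

script_module = as_script["D"].__module__
imported_module = as_module["D"].__module__
same_class = as_script["D"] is as_module["D"]
print(script_module, imported_module, same_class)
```

The two D classes are distinct objects, and each one remembers the namespace it was defined in; that remembered name is what ends up in the repr and in the pickle.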

This explains the import dumper (or from dumper import C) trick: pickling fully qualifies not only class C but also class D. This makes unpickling from an external program easier!

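This also shows that import dumper, when done in dumper.py, forces the Python interpreter to parse the program twice, which is neither efficient nor elegant. Pickling classes in one program and unpickling them in another is therefore probably best done with the approach outlined in Mark's answer: pickled classes should live in a separate module.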

Answered 2011-05-15T19:31:14.430