Python 2.7.3
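I have a folder containing thousands of data files. Each data file is passed to a constructor and goes through heavy processing. Right now I iterate over the files and process them sequentially: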
import os

class Foo:
    def __init__(self, file):
        self.bar = do_lots_of_stuff_with_numpy_and_scipy(file)

def do_lots_of_stuff_with_numpy_and_scipy(file):
    pass

def get_foos(dir):
    return [Foo(os.path.join(dir, file)) for file in os.listdir(dir)]
This works fine, but it is too slow. I would like to do this in parallel. I tried:
import sys
from multiprocessing import Pool

def parallel_get_foos(dir):
    p = Pool()
    foos = p.map(Foo, [os.path.join(dir, file) for file in os.listdir(dir)])
    p.close()
    p.join()
    return foos

if __name__ == "__main__":
    foos = parallel_get_foos(sys.argv[1])
But it just errors out with a lot of these:
Process PoolWorker-7:
Traceback (most recent call last):
  File "/l/python2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/l/python2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/l/python2.7/lib/python2.7/multiprocessing/pool.py", line 99, in worker
    put((job, i, result))
  File "/l/python2.7/lib/python2.7/multiprocessing/queues.py", line 390, in put
    return send(obj)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
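The traceback fails inside the worker's `put((job, i, result))`, that is, while the result is being sent back to the parent process. A rough way to check whether a returned object is the part that cannot be pickled is to try pickling it directly, outside the pool. The sketch below is only an illustration: `make_closure`, `Result`, and `is_picklable` are hypothetical names, and the nested function merely stands in for whatever the real processing might return.

```python
import pickle

def make_closure(degree):
    # A function defined inside another function; pickle cannot look it up by
    # a module-level name.
    def inner(val):
        return val ** degree
    return inner

class Result(object):
    # Hypothetical stand-in for an object that stores a computed attribute.
    def __init__(self, payload):
        self.payload = payload

def is_picklable(obj):
    # Return True if pickle.dumps succeeds, False if it raises.
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False
```

Calling `is_picklable` on a single constructed object in the parent process reports the problem without starting any workers.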
I tried creating a function that returns the object, e.g.:
def get_foo(file):
    return Foo(file)

def parallel_get_foos(dir):
    ...
    foos = p.map(get_foo, [os.path.join(dir, file) for file in os.listdir(dir)])
    ...
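But as expected, I get the same error.
I have read through a good number of similar threads trying to solve problems like this one, but none of the solutions helped me. So I'd appreciate any help!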
Edit:
Bakuriu correctly guessed that I define a non-top-level function inside my do_lots_of_stuff method. Specifically, I am doing the following:
def fit_curve(data, degree):
    """Fits a least-square polynomial function to the given data."""
    sorted = data[data[:,0].argsort()].T
    coefficients = numpy.polyfit(sorted[0], sorted[1], degree)
    def eval(val, deg=degree):
        res = 0
        for coefficient in coefficients:
            res += coefficient*val**deg
            deg -= 1
        return res
    return eval
Is there any way to make this function picklable?
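One possible direction, sketched under assumptions rather than tested against the real code: the nested function only needs `coefficients` and `degree`, so the same arithmetic could live in a module-level class with a `__call__` method, which pickle can locate by name. `PolyEvaluator` and `roundtrip` are hypothetical names that do not appear in the original code, and plain lists stand in here for the array returned by `numpy.polyfit`.

```python
import pickle

class PolyEvaluator(object):
    """Callable that evaluates a polynomial, highest-degree coefficient first."""

    def __init__(self, coefficients, degree):
        # State that the nested function previously captured from its
        # enclosing scope is stored on the instance instead.
        self.coefficients = coefficients
        self.degree = degree

    def __call__(self, val):
        # Same loop as the nested function: the exponent starts at `degree`
        # and drops by one for each coefficient.
        res = 0
        deg = self.degree
        for coefficient in self.coefficients:
            res += coefficient * val ** deg
            deg -= 1
        return res

def roundtrip(obj):
    # Serialize and deserialize, roughly what happens when a worker sends a
    # result back to the parent process.
    return pickle.loads(pickle.dumps(obj))
```

With this sketch, `fit_curve` would end with `return PolyEvaluator(coefficients, degree)` in place of `return eval`. When the class is defined at the top level of an importable module or of the script itself, `roundtrip(PolyEvaluator([2.0, 1.0], 1))` should give back an equivalent callable.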