I'm trying to use multiprocessing.Pool. I was able to use the ThreadPool implementation from the multiprocessing module successfully, but I'd like to use processes instead of threads, since they would probably be faster and would eliminate some of the changes made so that Matplotlib can handle the multithreaded environment. I'm getting an error that I suspect is related to processes not sharing an address space, but I'm not sure how to fix it:
Traceback (most recent call last):
  File "test_tarfile.py", line 32, in <module>
    test_multiproc()
  File "test_tarfile.py", line 24, in test_multiproc
    pool.map(read_file, files)
  File "/ldata/whitcomb/epd-7.1-2-rh5-x86_64/lib/python2.7/multiprocessing/pool.py", line 225, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/ldata/whitcomb/epd-7.1-2-rh5-x86_64/lib/python2.7/multiprocessing/pool.py", line 522, in get
    raise self._value
ValueError: I/O operation on closed file
The actual program is more complicated, but here is an example that reproduces the error:
from multiprocessing.pool import ThreadPool, Pool
import StringIO
import tarfile

def write_tar():
    # Create a one-member tarball to read back.
    tar = tarfile.open('test.tar', 'w')
    contents = 'line1'
    info = tarfile.TarInfo('file1.txt')
    info.size = len(contents)
    tar.addfile(info, StringIO.StringIO(contents))
    tar.close()

def test_multithread():
    # Works: the threads share the parent's open TarFile.
    tar = tarfile.open('test.tar')
    files = [tar.extractfile(member) for member in tar.getmembers()]
    pool = ThreadPool(processes=1)
    pool.map(read_file, files)
    tar.close()

def test_multiproc():
    # Fails with ValueError: I/O operation on closed file.
    tar = tarfile.open('test.tar')
    files = [tar.extractfile(member) for member in tar.getmembers()]
    pool = Pool(processes=1)
    pool.map(read_file, files)
    tar.close()

def read_file(f):
    print f.read()

write_tar()
test_multithread()
test_multiproc()
I suspect that something goes wrong when the TarInfo object is passed into the other process but the parent TarFile is not, but I'm not sure how to fix it in the multiprocess case. Can I do this without having to extract the files from the tarball and write them to disk?
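For what it's worth, here is a minimal sketch of the kind of workaround I am imagining (my assumption, not a verified fix): pass the picklable member names to Pool.map instead of the open file handles, and let each worker process open its own TarFile. The read_member and test_multiproc_names names are just made up for this sketch; 'test.tar' is the file from the example above.

from multiprocessing import Pool
import tarfile

def read_member(name):
    # Each worker opens its own TarFile, so no open handle has to
    # cross the process boundary; only the member name (a string)
    # gets pickled by Pool.map.
    tar = tarfile.open('test.tar')
    contents = tar.extractfile(name).read()
    tar.close()
    return contents

def test_multiproc_names():
    tar = tarfile.open('test.tar')
    names = [member.name for member in tar.getmembers()]
    tar.close()
    pool = Pool(processes=1)
    print pool.map(read_member, names)

But this reopens the tarball once per member, so I'm not sure it's the right approach either.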