
Related questions: Parallel Python - too many files and Python too many open files (subprocesses)

I am using Parallel Python [v1.6.2] to run tasks. Each task processes an input file and writes out a log/report. Say there are 10 folders, each containing 5000–20000 files, which are read, processed, and logged in parallel. Each file is roughly 50 KB to 250 KB.

After running for about 6 hours, Parallel Python fails with the following error.

  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 342, in __init__
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 506, in set_ncpus
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 140, in __init__
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 146, in start
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
  File "/usr/lib/python2.7/subprocess.py", line 1135, in _execute_child
  File "/usr/lib/python2.7/subprocess.py", line 1091, in pipe_cloexec
OSError: [Errno 24] Too many open files
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in  apport_excepthook
ImportError: No module named fileutils

Original exception was:
Traceback (most recent call last):
  File "PARALLEL_TEST.py", line 746, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 342, in __init__
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 506, in set_ncpus
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 140, in __init__
  File "/usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py", line 146, in start
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
  File "/usr/lib/python2.7/subprocess.py", line 1135, in _execute_child
  File "/usr/lib/python2.7/subprocess.py", line 1091, in pipe_cloexec
OSError: [Errno 24] Too many open files

While I understand this is probably the subprocess issue pointed out here, http://bugs.python.org/issue2320, the fix appears to be part of Python 3.2 only. I am currently tied to Python 2.7.

I would like to know whether the following suggestions help: [1] http://www.parallelpython.com/component/option,com_smf/Itemid,1/topic,313.0

*) Adding worker.t.close() in the destroy() method of /usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/pp.py

*) Increasing BROADCAST_INTERVAL in /usr/local/lib/python2.7/dist-packages/pp-1.6.2-py2.7.egg/ppauto.py

I would like to know whether there is a fix or workaround for this issue available in Python 2.7.
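One Python 2.7 stopgap (a sketch, not a fix for the leak itself) is to raise the per-process open-file limit with the stdlib `resource` module, so descriptor exhaustion takes longer to reach Errno 24:

```python
import resource

# Stopgap sketch: raise the per-process open-file limit (RLIMIT_NOFILE).
# This buys time; it does NOT fix the underlying descriptor leak.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft=%d hard=%d" % (soft, hard))

# The soft limit may be raised up to the hard limit without root.
target = soft if hard == resource.RLIM_INFINITY else hard
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

Raising the hard limit beyond its current value requires root (e.g. via ulimit or /etc/security/limits.conf), so this sketch only touches the soft limit.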

Thanks in advance.


2 Answers


My team recently stumbled upon a similar issue, the same file-handle resource exhaustion, while running celeryd task-queue jobs. I believe the OP has nailed it: it is most likely messy code in the subprocess.py lib in Python 2.7 and Python 3.1.

As suggested in Python Bug #2320, pass close_fds=True everywhere you call subprocess.Popen(). In fact, Python 3.2 makes that the default, while also fixing the underlying race condition. See that ticket for more details.
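A minimal sketch of that workaround (the `echo` command here is only an illustration):

```python
import subprocess

# close_fds=True makes the child close all inherited descriptors
# (except stdin/stdout/stderr), so descriptors leaked in the parent
# do not pile up in every spawned worker. From Python 3.2 on this
# is the default behaviour.
proc = subprocess.Popen(
    ["echo", "hello"],
    stdout=subprocess.PIPE,
    close_fds=True,
)
out, _ = proc.communicate()
print(out.strip())
```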

answered 2013-08-28T17:09:41.950

I had left out some lines that destroy the job server. Calling job_server.destroy() fixed the problem.
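The effect is easy to observe by counting open descriptors. A Linux-specific sketch, with `os.pipe()` standing in for the pipes each pp worker holds:

```python
import os

def open_fd_count():
    # Linux-specific: each entry in /proc/self/fd is an open descriptor.
    return len(os.listdir("/proc/self/fd"))

before = open_fd_count()
pipes = [os.pipe() for _ in range(100)]  # simulate 100 leaked worker pipes
print(open_fd_count() - before)          # 200 extra descriptors

# What job_server.destroy() amounts to: closing every worker's pipes.
for r, w in pipes:
    os.close(r)
    os.close(w)
print(open_fd_count() - before)          # back to 0
```

Without that cleanup, each pp.Server left undestroyed keeps its workers' pipe descriptors open until the process hits Errno 24.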

answered 2012-12-24T05:10:50.057