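I am trying to set up IPython parallel on a Linux cluster that uses PBS for scheduling.
I followed the instructions at http://www.andreazonca.com/2013/04/ipython-parallell-setup-on-carver-at.html (the official instructions are harder to follow). I run the commands on the head node, which uses PBS to send jobs to the compute nodes (i.e. a standard cluster configuration).
My problem is that engine registration times out. I tried increasing the timeout from 2 seconds to 20 seconds, without success. Any help would be appreciated. The full output is below.
Eventually I would like to run the IPython commands from a laptop connected over ssh rather than from the cluster head node, but I think this is a reasonable first step.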
2013-08-11 13:56:07,380.380 [IPEngineApp] Config changed:
2013-08-11 13:56:07,381.381 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:07,381.381 [IPEngineApp] Config changed:
2013-08-11 13:56:07,382.382 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:07,381.381 [IPEngineApp] Config changed:
2013-08-11 13:56:07,381.381 [IPEngineApp] Config changed:
2013-08-11 13:56:07,383.383 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:07,382.382 [IPEngineApp] Config changed:
2013-08-11 13:56:07,382.382 [IPEngineApp] Config changed:
2013-08-11 13:56:07,383.383 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:07,382.382 [IPEngineApp] Config changed:
2013-08-11 13:56:07,383.383 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:07,383.383 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:07,383.383 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:07,382.382 [IPEngineApp] Config changed:
2013-08-11 13:56:07,383.383 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:07,387.387 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:07,387.387 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:07,388.388 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:07,388.388 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:07,388.388 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:07,389.389 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:07,389.389 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:07,389.389 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:07,389.389 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:07,389.389 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:07,389.389 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:07,389.389 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:07,389.389 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:07,389.389 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:07,390.390 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:07,390.390 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:07,390.390 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:07,390.390 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:07,390.390 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:07,391.391 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:07,391.391 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:07,391.391 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:07,391.391 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:07,391.391 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:07.391 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:07,391.391 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:07.391 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:07,391.391 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:07.391 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:07,391.391 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:07,392.392 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:07.392 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:07.392 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:07.392 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:07,392.392 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:07.392 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:07.393 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:07.402 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:07.402 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:07.402 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:07.402 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:07.403 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:07.403 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:07.403 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:07.403 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:01,273.273 [IPEngineApp] Config changed:
2013-08-11 13:56:01,273.273 [IPEngineApp] Config changed:
2013-08-11 13:56:01,274.274 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:01,274.274 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:01,272.272 [IPEngineApp] Config changed:
2013-08-11 13:56:01,274.274 [IPEngineApp] Config changed:
2013-08-11 13:56:01,275.275 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:01,275.275 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:01,273.273 [IPEngineApp] Config changed:
2013-08-11 13:56:01,276.276 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:01,276.276 [IPEngineApp] Config changed:
2013-08-11 13:56:01,276.276 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:01,278.278 [IPEngineApp] Config changed:
2013-08-11 13:56:01,278.278 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:01,279.279 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:01,279.279 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:01,279.279 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:01,279.279 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:01,279.279 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:01,279.279 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:01,279.279 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:01,280.280 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:01,280.280 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:01,280.280 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:01,280.280 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:01,280.280 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:01,280.280 [IPEngineApp] Config changed:
2013-08-11 13:56:01,280.280 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:01,280.280 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:01,280.280 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:01,280.280 [IPEngineApp] {'EngineFactory': {'timeout': 10}, 'IPEngineApp': {'log_level': 10}}
2013-08-11 13:56:01,280.280 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:01,280.280 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:01,280.280 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:01,280.280 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:01,280.280 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:01,281.281 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:01,281.281 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:01,281.281 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:01,282.282 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:01,282.282 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:01,282.282 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:01,282.282 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:01.282 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:01,282.282 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:01.282 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:01,282.282 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:01.282 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:01,283.283 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:01,283.283 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:01,283.283 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:01,284.284 [IPEngineApp] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-08-11 13:56:01,284.284 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:01,284.284 [IPEngineApp] Searching path [u'/home/username', u'/home/username/.ipython/profile_default'] for config files
2013-08-11 13:56:01,284.284 [IPEngineApp] Attempting to load config file: ipython_config.py
2013-08-11 13:56:01.284 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:01.284 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:01.284 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:01.284 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:01,284.284 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipython_config.py
2013-08-11 13:56:01,285.285 [IPEngineApp] Attempting to load config file: ipengine_config.py
2013-08-11 13:56:01,286.286 [IPEngineApp] Loaded config file: /home/username/.ipython/profile_default/ipengine_config.py
2013-08-11 13:56:01.286 [IPEngineApp] Loading url_file u'/home/username/.ipython/profile_default/security/ipcontroller-engine.json'
2013-08-11 13:56:01.295 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:01.295 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:01.296 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:01.296 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:01.296 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:01.296 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:01.298 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:01.299 [IPEngineApp] Registering with controller at tcp://127.0.0.1:53956
2013-08-11 13:56:17.411 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:17.412 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:17.413 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:17.412 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:17.412 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:17.413 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:17.413 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:17.413 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:11.304 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:11.304 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:11.305 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:11.306 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:11.306 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:11.306 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:11.307 [IPEngineApp] Registration timed out after 10.0 seconds
2013-08-11 13:56:11.309 [IPEngineApp] Registration timed out after 10.0 seconds
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
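One thing I notice in the log above is that every engine tries to register at tcp://127.0.0.1:53956. If the engines run on compute nodes, a loopback address would point at the compute node itself rather than at the head node where the controller is. Here is a minimal sketch for checking whether a controller URL is loopback-only. The function name is my own, and 10.0.0.5 is a made-up address used only for contrast:

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url):
    """Return True if the host part of a URL such as tcp://host:port is loopback."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address (for example a hostname or '*').
        return False


# URL taken from the engine log above.
print(is_loopback_url("tcp://127.0.0.1:53956"))  # True
# Hypothetical head-node address, for comparison only.
print(is_loopback_url("tcp://10.0.0.5:53956"))   # False
```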
Update following Andrea Zonca's suggestion
Adding
c.HubFactory.ip = '*'
seems to have helped, although for some unknown reason it did not take effect immediately.
But it still does not work:
I run
./.ipython/profile_default/ipc 2 1
2013-09-09 12:19:51,884.884 [IPClusterStart] Using existing profile dir: u'/home/username/.ipython/profile_default'
2013-09-09 12:19:51.889 [IPClusterStart] Starting ipcluster with [daemon=False]
2013-09-09 12:19:51.890 [IPClusterStart] Creating pid file: /home/username/.ipython/profile_default/pid/ipcluster.pid
2013-09-09 12:19:51.890 [IPClusterStart] Starting Controller with LocalControllerLauncher
2013-09-09 12:19:52.890 [IPClusterStart] Starting 2 Engines with PBS
2013-09-09 12:19:52.904 [IPClusterStart] Job submitted with job id: '4783'
2013-09-09 12:20:22.904 [IPClusterStart] Engines appear to have started successfully
Then, on the head node, I run:
>from IPython.parallel import Client
>rc = Client()
>lview = rc.load_balanced_view()
But the output of
>rc.ids
is
[]
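Because engines submitted through PBS may need some time before they register, an empty list right after startup might only mean they have not connected yet. Below is a minimal sketch of a polling helper that waits for a given number of engines. The helper name and its parameters are my own and are not part of IPython. It takes any callable that returns the current ids, so with a real client it could be called as wait_for_engines(lambda: rc.ids, expected=2):

```python
import time


def wait_for_engines(get_ids, expected, timeout=60.0, interval=1.0):
    """Poll get_ids() until it reports at least `expected` engines or time runs out.

    Returns the last list of ids seen, which may be shorter than expected.
    """
    deadline = time.monotonic() + timeout
    while True:
        ids = list(get_ids())
        if len(ids) >= expected or time.monotonic() >= deadline:
            return ids
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for a client: engines "appear" on the third poll.
    polls = {"count": 0}

    def fake_ids():
        polls["count"] += 1
        return [0, 1] if polls["count"] >= 3 else []

    print(wait_for_engines(fake_ids, expected=2, timeout=5.0, interval=0.1))
```

If the list is still empty after a generous timeout, the engines most likely never reached the controller at all.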
So I tried running
ipcontroller --port=8888
and then ran nmap
$nmap -sT -O localhost
...
8888/tcp open sun-answerbook
...
This shows that the port is open, and indeed telnet from a compute node gets a response.
But when I run the original command above,
./.ipython/profile_default/ipc 2 1
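nmap shows no open port. So the problem seems to be that the ipengine run from the qsub file does not open a port the way ipcontroller does when run from the command line.
Here is the qsub file: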
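The same reachability check can be done from Python, which is handy on a compute node where nmap or telnet may not be installed. This is a minimal sketch using only the standard library. The function name is my own, and the host and port passed in are whatever the controller is expected to listen on:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Port 8888 is the one passed to ipcontroller above.
    print(port_is_open("localhost", 8888))
```

Running it on a compute node with the head node's hostname in place of localhost would test the path an engine actually has to use.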
$cat /home/username/.ipython/profile_default/pbs.engine.template.ppn2
#!/bin/sh
#PBS -q longqueue
#PBS -l nodes={n/2}:ppn=2
cd $PBS_O_WORKDIR
which ipengine
mpirun -np {n} ipengine --timeout=20
And here is my /home/username/.ipython/profile_default/ipcluster_config.py:
c = get_config()
c.IPClusterStart.controller_launcher_class = 'LocalControllerLauncher'
c.IPClusterStart.engine_launcher_class = 'PBS'
c.PBSLauncher.batch_template_file = u'/home/username/.ipython/profile_default/pbs.engine.template'