python - Can an IPython parallel controller have both local and remote ipengines at the same time?

Question

c = Client(profile='myprofile')

or

c = Client('/path/to/my/ipcontroller-client.json')

for local ipengines (IIUC), and

c = Client('/path/to/my/ipcontroller-client.json', sshserver='me@myhub.example.com')

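if my ipengines are on another server.

But what do I need to do to have one IPython parallel controller manage, say, 8 ipengines on the local node and 8 ipengines on a remote node, connected over SSH?

Or is that impossible without full-blown HDFS, Hadoop, and so on?

My goal is a single client (or controller?) interface to which I can send a batch of computations in a load-balanced fashion, without caring what runs where or when.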

score 3 · Accepted Answer

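The client's sshserver argument is only for cases where the controller cannot be reached directly from the client (for example, a client on a laptop and a controller behind a firewall on a remote network). The client never needs to know or care where the engines are. Likewise, ssh tunnels are only needed when the machines cannot reach each other directly. For simplicity, I will assume you do not actually need ssh tunnels.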

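The simplest case, with no configuration files.

This assumes ipcontroller is already running on host1 and listening on an interface that host2 can reach (for example, started with ipcontroller --ip='*'), because the connection files used by the engines are written when the controller starts.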

(Skip this if the hosts share a filesystem.) Send the connection files to host2:

[host1] rsync -av $HOME/.ipython/profile_default/security/ host2:.ipython/profile_default/security/

Start engines on host1:

[host1] ipengine
# or start multiple engines at once:
[host1] ipcluster engines -n 5

Start engines on host2:

[host2] ipengine
# or start multiple engines at once:
[host2] ipcluster engines -n 8

Open a client on host1:

[host1] ipython
In [1]: from IPython import parallel
In [2]: rc = parallel.Client()

You should now have access to the engines on both machines.
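To check that engines from both hosts really registered, and to submit work without caring where it runs, something like the following may help. It is a minimal sketch: engines_per_host and balanced_map are hypothetical helper names, and rc is assumed to be an already connected parallel.Client() as created above (in newer releases the same class lives in the separate ipyparallel package).

```python
import socket
from collections import Counter


def engines_per_host(rc):
    """Return a Counter mapping hostname -> number of engines running there.

    `rc` is assumed to be a connected parallel Client.
    """
    dview = rc[:]  # DirectView over every registered engine
    # Each engine calls socket.gethostname() and reports its own host.
    hostnames = dview.apply_sync(socket.gethostname)
    return Counter(hostnames)


def balanced_map(rc, func, items):
    """Map func over items, letting the scheduler pick an engine per task."""
    lview = rc.load_balanced_view()
    # map_sync blocks until every task has finished and returns a list.
    return lview.map_sync(func, items)
```

With the engine counts used in this answer, engines_per_host(rc) would be expected to report 5 engines on host1 and 8 on host2, and balanced_map(rc, f, inputs) spreads the calls over all 13 engines regardless of host.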

You can also express all of this through configuration. Create a profile:

[host1] ipython profile create --parallel

Tell ipcontroller to listen on all interfaces, in ipcontroller_config.py:

c.HubFactory.ip = '*'

Tell ipcluster to launch engines on host1 and host2 over ssh, in ipcluster_config.py:

c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engines = {
    'host1': 5,
    'host2': 8,
}

Start everything with ipcluster:

[host1] ipcluster start

The SSH launcher takes care of copying the connection files to the remote engines.
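Engines launched over ssh can take a few seconds to register with the controller, so a client that connects right after ipcluster start may briefly see fewer engines than configured. A minimal sketch of waiting for them follows; wait_for_engines is a hypothetical helper written for this example, and it relies only on the client's ids attribute, which lists the currently registered engine IDs.

```python
import time


def wait_for_engines(rc, expected, timeout=60.0, poll=0.5):
    """Block until at least `expected` engines are registered with `rc`.

    Returns the number of engines seen, or raises RuntimeError on timeout.
    `rc` is assumed to be a connected parallel Client.
    """
    deadline = time.time() + timeout
    while True:
        count = len(rc.ids)  # engine IDs currently known to the client
        if count >= expected:
            return count
        if time.time() >= deadline:
            raise RuntimeError(
                "only %d of %d engines registered after %.1f s"
                % (count, expected, timeout)
            )
        time.sleep(poll)
```

For the configuration above, wait_for_engines(rc, 13) would wait for the 5 engines on host1 plus the 8 on host2.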

If you do need ssh tunnels, you can specify

c.IPControllerApp.ssh_server = u'host1'

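in ipcontroller_config.py. IPython should be able to tell whether an engine or client is running on host1 and skip the tunnel when it is not needed. If it cannot work that out, you can either leave the ssh server out of the config and specify manually where it should be used, or put it in the config and specify manually where it should not be used, whichever is more convenient for you.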
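If you take the first route and decide on the client side, one possible approach is to pass sshserver to Client only when the client is not on the controller's host. The sketch below is an illustration under assumptions: client_kwargs is a hypothetical helper, the login string is whatever you would normally give to ssh, and comparing short hostnames is a naive test that can be wrong on machines with aliases or unusual hostname settings.

```python
import socket


def _short(name):
    """Lower-cased first label of a hostname ('Host1.example.com' -> 'host1')."""
    return name.split(".")[0].lower()


def client_kwargs(controller_host, ssh_login, local_host=None):
    """Return extra keyword arguments for the parallel Client constructor.

    No tunnel is requested when the client runs on the controller's host;
    otherwise the result carries sshserver=ssh_login.
    """
    if local_host is None:
        local_host = socket.gethostname()
    if _short(local_host) == _short(controller_host):
        return {}  # same machine: connect directly
    return {"sshserver": ssh_login}  # elsewhere: tunnel through ssh
```

A client would then be created with, for example, Client('/path/to/my/ipcontroller-client.json', **client_kwargs('host1', 'me@host1')), where 'me@host1' is only a placeholder login.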