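I have created a cluster of EC2 instances using cfncluster, and now I need to run the dispynode.py command on all of the nodes.
I first create a list of private IP addresses in a file named "workers.txt", then run the following bash command: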
for host in $(cat workers.txt); do
ssh $host "dispynode.py --ext_ip_addr $host &";
done
This appears to work, because I get the expected dispynode output for each IP address. For example, for each IP address I get output similar to this:
NOTE: Using dispy port 61591 (was 51348 in earlier versions)
2019-08-22 06:07:12 dispynode - dispynode version: 4.11.0, PID: 16074
2019-08-22 06:07:12 dispynode - Files will be saved under "/tmp/dispy/node"
2019-08-22 06:07:12 pycos - version 4.8.11 with epoll I/O notifier
2019-08-22 06:07:12 dispynode - "ip-172-31-8-242" serving 8 cpus
Enter "quit" or "exit" to terminate dispynode,
"stop" to stop service, "start" to restart service,
"release" to check and close computation,
"cpus" to change CPUs used, anything else to get status:
Enter "quit" or "exit" to terminate dispynode,
"stop" to stop service, "start" to restart service,
"release" to check and close computation,
"cpus" to change CPUs used, anything else to get status:
NOTE: Using dispy port 61591 (was 51348 in earlier versions)
The problem is that when I SSH into a node and check whether the process is running, it is not:
ssh 172.31.8.242
kill -0 16074
-bash: kill: (16074) - No such process
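The dispy client does not work either: it cannot discover the nodes.
Question: why does my parallel ssh command not start the program on the nodes, and/or why does the process not stay running after it starts?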
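
One possible explanation, not verified on this cluster, is that dispynode reads commands from standard input and exits once the non-interactive ssh session closes that input. The sketch below shows one way to test that hypothesis: it starts dispynode detached from the ssh session. It assumes that dispynode.py accepts the --daemon option (which tells it not to read from standard input). The log path /tmp/dispynode.log and the function names remote_cmd and launch_all are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes dispynode.py supports --daemon (no stdin prompt) and
# that /tmp/dispynode.log is an acceptable (hypothetical) log location.

# Build the remote command line for one host.
remote_cmd() {
  local host="$1"
  printf 'nohup dispynode.py --ext_ip_addr %s --daemon >/tmp/dispynode.log 2>&1 </dev/null &' "$host"
}

# Start dispynode on every host listed in the given file (one IP per line).
launch_all() {
  local hosts_file="$1" host
  while read -r host; do
    [ -n "$host" ] || continue   # skip blank lines
    # -n keeps ssh from consuming the rest of the hosts file on stdin
    ssh -n "$host" "$(remote_cmd "$host")"
  done < "$hosts_file"
}
```

With these definitions loaded, `launch_all workers.txt` would replace the original loop. If the process then survives, `kill -0 <PID>` on the node should return without an error, using the PID written to the log file.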