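Hi, a program called G09 runs in parallel using Linda. It spawns its parallel child processes on other nodes (hosts) like this:
/usr/bin/ssh -x compute-0-127.local -n /usr/local/g09l/g09/linda-exe/l1002.exel ...other_opts...
However, when the master node kills the process, the corresponding child processes on the other nodes (here compute-0-127) do not die; they keep running in the background. At the moment I log in to every node that has these orphaned Linda processes and kill them by hand with kill.
Is there a way to kill such child processes?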
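See pastebin 1 for the pstree output before the process is killed, and pastebin 2 for the pstree output after the parent process has been killed.
pastebin1 - http://pastebin.com/yNXFR28V
pastebin2 - http://pastebin.com/ApwXrueh (not hyperlinked: I do not have enough reputation points for a second link, sorry!)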
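The manual clean-up described above (logging in to each node and running kill) could be scripted roughly as follows. This is only a sketch under several assumptions: passwordless ssh to the compute nodes, a file listing the job's hosts one per line (Torque normally provides one in $PBS_NODEFILE), and worker processes whose command line contains l1002.exel. The function name cleanup_nodes and the REMOTE variable are made up for illustration and are not part of G09, Linda or Torque.

```shell
#!/bin/bash
# Sketch only: cleanup_nodes, REMOTE and the process pattern are assumptions
# made for illustration, not part of G09, Linda or Torque.

# Command used to reach a node. It can be overridden for a dry run,
# e.g. REMOTE=echo prints what would be executed instead of running it.
# -n stops ssh from reading stdin, which would swallow the host list below.
REMOTE=${REMOTE:-ssh -x -n}

cleanup_nodes() {
    local nodefile=$1 host
    # A node file may name the same host several times, so de-duplicate it.
    sort -u "$nodefile" | while read -r host; do
        # The [.] keeps the pattern from matching the remote shell that
        # carries this very command line. pkill exits non-zero when nothing
        # matched, which is not an error here.
        $REMOTE "$host" "pkill -u $(id -un) -f 'l1002[.]exel'" || true
    done
}
```

Inside a job script it might be called as cleanup_nodes "$PBS_NODEFILE". Running it first with REMOTE=echo shows which commands would be sent to which hosts without killing anything.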
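Update in response to Answer 1
Thanks for the explanation, Martin. I tried the following:
killme() { kill 0 ; }
#Make calls to prepare for running G09
g09 < "$g09inp" > "$g09out" &
trap killme 'TERM'
wait
But when Torque/Maui (which handles job execution) terminates the job (this script) via qdel $jobid, the processes that G09 started with ssh -x $host -n keep running in the background. What am I doing wrong here? (Normal termination is not a problem, because G09 itself stops these processes.) Here is the pstree output before qdel: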
bash
|-461.norma.iitb. /opt/torque/mom_priv/jobs/461.norma.iitb.ac.in.SC
| `-g09
| `-l1002.exe 1048576000Pd-C-C-addn-H-MO6-fwd-opt.chk
| `-cLindaLauncher/tmp/viaExecDataN6
| |-l1002.exel 1048576000Pd-C-C-addn-H-MO6-fwd-opt.ch
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | |-{l1002.exel}
| | `-{l1002.exel}
| |-ssh -x compute-0-149.local -n ...
| |-ssh -x compute-0-147.local -n ...
| |-ssh -x compute-0-146.local -n ...
| |-{cLindaLauncher}
| `-{cLindaLauncher}
`-pbs_demux
After qdel it still shows:
461.norma.iitb. /opt/torque/mom_priv/jobs/461.norma.iitb.ac.in.SC
`-ssh -x -n compute-0-149 rm\040-rf\040/state/partition1/trirag09/461
l1002.exel 1048576000Pd-C-C-addn-H-MO6-fwd-opt.ch
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
|-{l1002.exel}
`-{l1002.exel}
ssh -x compute-0-149.local -n /usr/local/g09l/g09/linda-exe/l1002.exel
ssh -x compute-0-147.local -n /usr/local/g09l/g09/linda-exe/l1002.exel
ssh -x compute-0-146.local -n /usr/local/g09l/g09/linda-exe/l1002.exel
What am I doing wrong here? Is the trap killme 'TERM' line wrong?
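One way to narrow this down is to check the trap/wait pattern on its own, away from Torque. The sketch below is a local experiment only: sleep 30 stands in for g09, and a TERM that the script sends to itself stands in for qdel. It does not model how Torque actually delivers signals to a job, and it says nothing about processes on other nodes, since kill can only signal processes on the local machine. The function name trap_demo is made up for this sketch.

```shell
#!/bin/bash
# Local experiment only: "sleep 30" stands in for g09 and the self-sent TERM
# stands in for qdel. trap_demo is a made-up name for this sketch.

trap_demo() {
    # Run the experiment in its own bash so that $$ is that shell's PID.
    bash -s <<'EOF'
cleanup() {
    echo "handler ran"
    kill "$child" 2>/dev/null        # signal the one known child, not group 0
}
trap cleanup TERM                    # install the handler before starting work

sleep 30 >/dev/null 2>&1 &           # stand-in for: g09 < "$g09inp" > "$g09out" &
child=$!

( sleep 1; kill -TERM $$ ) &         # simulate a TERM arriving from outside

wait "$child"                        # a trapped signal makes wait return early
wait "$child" 2>/dev/null            # reap the child the handler just killed
if kill -0 "$child" 2>/dev/null; then
    echo "child still alive"
else
    echo "child gone"
fi
EOF
}
```

Calling trap_demo reports whether the handler ran and whether the local child survived. If this works locally but the job still leaves processes behind under qdel, the difference probably lies in how the signal reaches the job script or in the remote processes rather than in the trap syntax itself.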