I have a shell script that looks roughly like this:

#!/bin/bash

# Script variables
NPM="/usr/bin/npm"

# Start several sub-processes in a loop in parallel
for i in {1..4}; do
    $NPM run -s long_running_script >> /path/to/script/output/stream.tsv &
done
wait

To keep the long-running script running while preventing multiple instances from running in parallel, I invoke it from cron using:

0 * * * * /usr/bin/flock -n /var/lock/my_lock_file /path/to/script/hourly.sh

I tried adding a trap at the top of the script, but had no luck:

trap "kill $(jobs -p)" EXIT

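Variations on the above, such as trap "kill -HUP -$$", do not work either. It seems the trap is never even executed (presumably because the script is waiting for all instances of $NPM run -s long_running_script to finish?).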

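I could kill the processes from a separate script, since I know the names of the children and grandchildren, by running pkill, but I would prefer a more general solution. Is there a way to kill every process spawned from a shell script without manually tracking all of the spawned processes and their descendants?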

Update

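I added more detail to the script snippet to show that I am redirecting the output to a file. Here is a snapshot of the processes spawned by NPM (in each iteration of the loop):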

UID        PID  PPID  C STIME TTY          TIME CMD
root      4124  4122 10 14:51 ?        00:00:01 npm
root      4134  4124  0 14:51 ?        00:00:00 sh -c node long_running_script.js
root      4135  4134 42 14:51 ?        00:00:03 node long_running_script.js

And the corresponding output from lsof:

COMMAND  PID USER   FD      TYPE DEVICE SIZE/OFF    NODE NAME
npm     4124 root  cwd       DIR   43,0     4096 2753156 /path/to/working/directory
npm     4124 root  rtd       DIR   43,0     4096       2 /
npm     4124 root  txt       REG   43,0 11187096 1971997 /usr/bin/nodejs
npm     4124 root  mem       REG   43,0 25913104 2238299 /usr/lib/x86_64-linux-gnu/libicudata.so.55.1
npm     4124 root  mem       REG   43,0  1864888 2757931 /lib/x86_64-linux-gnu/libc-2.23.so
npm     4124 root  mem       REG   43,0    89696 2752592 /lib/x86_64-linux-gnu/libgcc_s.so.1
npm     4124 root  mem       REG   43,0  1088952 2757928 /lib/x86_64-linux-gnu/libm-2.23.so
npm     4124 root  mem       REG   43,0  1566440 2234802 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
npm     4124 root  mem       REG   43,0  1636360 2238295 /usr/lib/x86_64-linux-gnu/libicuuc.so.55.1
npm     4124 root  mem       REG   43,0  2496856 2238296 /usr/lib/x86_64-linux-gnu/libicui18n.so.55.1
npm     4124 root  mem       REG   43,0  2361856 2752549 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
npm     4124 root  mem       REG   43,0   428384 2752550 /lib/x86_64-linux-gnu/libssl.so.1.0.0
npm     4124 root  mem       REG   43,0    14608 2757925 /lib/x86_64-linux-gnu/libdl-2.23.so
npm     4124 root  mem       REG   43,0   138696 2757942 /lib/x86_64-linux-gnu/libpthread-2.23.so
npm     4124 root  mem       REG   43,0    31712 2757938 /lib/x86_64-linux-gnu/librt-2.23.so
npm     4124 root  mem       REG   43,0   142640 2238078 /usr/lib/x86_64-linux-gnu/libuv.so.1.0.0
npm     4124 root  mem       REG   43,0   104824 2752763 /lib/x86_64-linux-gnu/libz.so.1.2.8
npm     4124 root  mem       REG   43,0   162632 2757932 /lib/x86_64-linux-gnu/ld-2.23.so
npm     4124 root    0r      CHR    1,3      0t0    1031 /dev/null
npm     4124 root    1w      REG   43,0  2488489 2752561 /path/to/script/output/stream.tsv
npm     4124 root    2w      REG   43,0     4669 1837068 /var/log/my_log_file
npm     4124 root    3r      REG   0,17        0    8980 /run/lock/my_lock_file
npm     4124 root    4r     FIFO    0,9      0t0    8092 pipe
npm     4124 root    5w     FIFO    0,9      0t0    8092 pipe
npm     4124 root    6u  a_inode   0,10        0    2049 [eventpoll]
npm     4124 root    7r     FIFO    0,9      0t0    8093 pipe
npm     4124 root    8w     FIFO    0,9      0t0    8093 pipe
npm     4124 root    9u  a_inode   0,10        0    2049 [eventfd]

Update 2

Here is what happens when I call fuser -k: it kills only the top-level processes and npm, but not the grandchild processes:

root@host:/ fuser -v /var/lock/my_file_lock 
/run/lock/my_file_lock:    root      26156 f.... flock
                           root      26157 f.... hourly.sh
                           root      26159 f.... npm
                           root      26225 f.... npm
                           root      26328 f.... npm
                           root      26470 f.... npm

root@host:/ fuser -k /var/lock/my_file_lock 
/run/lock/my_file_lock:     4121  4122  4124  4153  4290  4430

root@host:/ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
...
root      4134     1  0 14:51 ?        00:00:00 sh -c node long_running_script.js
root      4135  4134  0 14:51 ?        00:00:08 node long_running_script.js
1 Answer

In general, you can use fuser -k to kill all processes having your lockfile open:

fuser -k /var/lock/my_lock_file

That said, Node quite intentionally tries not to pass file descriptors through to child processes. Since you're using NPM, you might try reusing a file descriptor that's going to be passed through anyhow. For instance, to use stdin:

exec 0<>/var/lock/my_lock_file
flock -x 0 || exit
# ...put the rest of your code here...

This holds the lock on stdin, so as long as your original stdin is passed through to subprocesses, those processes should inherit the lock and be killed by fuser.
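One wrinkle worth checking: when job control is off, as it is in a script run from cron, bash connects the stdin of a command started with & to /dev/null unless stdin is redirected explicitly, which matches the 0r CHR /dev/null row in the lsof output in the question. The following is a minimal sketch, not a drop-in replacement for hourly.sh: it holds the lock on a spare descriptor (9, chosen arbitrarily) and hands that descriptor to a background child as its stdin with <&9. The function name child_stdin_target and the temporary file standing in for /var/lock/my_lock_file are assumptions for illustration, and the /proc lookup is Linux-specific.

```shell
#!/bin/bash
# Sketch only: a temp file stands in for /var/lock/my_lock_file, and a
# readlink call stands in for "$NPM run -s long_running_script".

# Prints the file that a background child sees on its stdin when the
# locked descriptor is passed to it explicitly.
child_stdin_target() (
    lock_file="$1"

    # Hold the lock on a spare descriptor (9 is an arbitrary choice).
    exec 9<>"$lock_file"
    flock -x -n 9 || exit 1

    # Hand the locked descriptor to the background child as its stdin.
    # The child reports what its own fd 0 refers to (Linux /proc only).
    readlink /proc/self/fd/0 <&9 &
    wait
)

lock_file="$(mktemp)"              # hypothetical lock file for the demo
child_stdin_target "$lock_file"    # prints the path of the lock file
rm -f "$lock_file"
```

In hourly.sh the same idea would mean adding <&9 to the $NPM run line inside the loop. Whether npm and node then keep that stdin for their own children is something to confirm with lsof or fuser -v rather than assume.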

If stdin, stdout or stderr is not passed through, I would suggest chatting with upstream for the specific tooling or process that isn't passing it through to see if there's a way to avoid or modify that behavior.

Answered 2017-03-26T21:22:32.800