linux - Can I detect early exits from a long-running background process?

Question

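I'm trying to improve the startup scripts for several servers running in a cluster environment. The server processes should run indefinitely, but occasionally they fail at startup, for example with an Address already in use exception.

I want the startup script's exit code to reflect these early terminations, for example by waiting 1 second and telling me whether the server started OK. I also need to echo the server PID.

Here's my best shot so far: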

```
$ cat startup.sh
# start the server in the bg but if it fails in the first second,
# then kill startup.sh.

CMD="start_server -option1 foo -option2 bar"
eval "($CMD >> cc.log 2>&1 || kill -9 $$ &)"
SERVER_PID=$!

# the `kill` above only has 1 second to kill me-- otherwise my exit code is 0
sleep 1
echo $SERVER_PID
```

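The exit code works, but two problems remain:

If the server runs for a long time but eventually hits an error, the parent startup.sh will already have exited, and its PID ($$) may have been reused by an unrelated process, which the script would then kill.
SERVER_PID is incorrect, because it is the PID of the subshell rather than of the start_server command (which in this case is a grandchild of the startup.sh script).

Is there a simpler way to background the start_server process, get its PID, and check its exit code with a timeout? I looked at the bash builtin wait and at the timeout command, but they don't seem to fit a process that is never supposed to exit.

I can't change the server code, and the startup script must not run indefinitely.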

score 1 · Accepted Answer

You can also use coproc (and look, I'm putting the command in an array, and also with proper quoting!):

```
#!/bin/bash
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[@]}" >> cc.log 2>&1 ; }
server_pid=$!
sleep 1
# The coproc's fd array is cleared once the coprocess has exited.
if [[ -z ${mycoprocfd[0]} ]]; then
    echo >&2 "Failure detected when starting server! Server died within 1 second."
    exit 1
else
    echo "$server_pid"
fi
```

The trick is that coproc stores the file descriptors of the pipes connected to the command's stdout and stdin in the array you name (here mycoprocfd; element 0 reads the command's output, element 1 writes to its input), and that array is cleared once bash has noticed that the process exited. So you don't need to do clumsy stuff with the PID itself.
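A minimal sketch of that behavior, assuming bash 4.0 or newer. `sleep 0.2` is a hypothetical stand-in for a server that dies early and `demo` is just an illustrative coproc name. It polls the array rather than sleeping a fixed time, since the array is only cleared after bash has reaped the exited coprocess:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a server that dies shortly after starting.
coproc demo { sleep 0.2; }
demo_pid=$!

# Right after start-up the array holds the two pipe file descriptors.
running_fds="${demo[*]}"

# Poll for up to ~5 s; while waiting for each external `sleep`, bash
# also reaps the dead coprocess, after which the array is cleared.
for _ in {1..50}; do
    [[ -z ${demo[0]:-} ]] && break
    sleep 0.1
done

if [[ -z ${demo[0]:-} ]]; then
    exited=yes
    echo "coprocess $demo_pid has exited and the demo array is now empty"
else
    exited=no
    echo "coprocess $demo_pid still looks alive"
fi
```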

You can therefore block until the server exits, however long that takes, like this:

```
#!/bin/bash
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[@]}" >> cc.log 2>&1 ; }
server_pid=$!
# Blocks until the coprocess's stdout pipe is closed, i.e. until it exits.
read -r -u "${mycoprocfd[0]}"
echo >&2 "Oh dear, the server with PID $server_pid died after $SECONDS seconds."
exit 1
```

That's because read reads from the file descriptor provided by coproc (though nothing is ever read here, since your command's stdout has been redirected to a file!), and read returns when it hits end-of-file, i.e., when the command launched by coproc exits and its end of the pipe is closed.

I'd say this is a really elegant solution!
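To check the end-of-file behavior on its own, here is a small sketch assuming bash 4.0 or newer; `sleep 1` is a hypothetical stand-in for the server and `demo` an illustrative coproc name. Since the coprocess writes nothing, `read` only returns once the pipe reaches end-of-file, and its status is then 1 rather than 0:

```shell
#!/usr/bin/env bash
coproc demo { sleep 1; }    # hypothetical stand-in for a short-lived server
demo_pid=$!

# Blocks until the coprocess's stdout pipe is closed, i.e. until it exits.
read -r -u "${demo[0]}"
read_status=$?

echo "coprocess $demo_pid is gone; read returned $read_status (1 means end-of-file)"
```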

Now, this script will live as long as the coproc lives, and I understand that's not what you want. In that case, you can put a timeout on the read with its -t option and use the fact that read's exit status is greater than 128 when it times out. For example, with a 4.5-second timeout:

```
#!/bin/bash
timeout=4.5
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[@]}" >> cc.log 2>&1 ; }
server_pid=$!
read -r -t "$timeout" -u "${mycoprocfd[0]}"
if (( $? > 128 )); then
    # read timed out, so the server is still running
    echo "$server_pid <-- all is good, it's still alive after $timeout seconds."
else
    echo >&2 "Oh dear, the server with PID $server_pid died within $timeout seconds."
    exit 1
fi
exit 0 # Yay
```

This is also very elegant :).
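The timeout branch can be tried in isolation with a coprocess that outlives the timeout. This is a sketch assuming bash 4.0 or newer; `sleep 3` stands in for the server (exec'd so that `$!` is the PID of sleep itself) and `demo` is an illustrative name. A timed-out read reports a status above 128:

```shell
#!/usr/bin/env bash
coproc demo { exec sleep 3; }   # hypothetical stand-in for a server that keeps running
demo_pid=$!

read -r -t 0.5 -u "${demo[0]}"
read_status=$?

if (( read_status > 128 )); then
    alive=yes
    echo "still alive after 0.5 s (read returned $read_status)"
else
    alive=no
    echo "exited early (read returned $read_status)"
fi

kill "$demo_pid" 2>/dev/null    # clean up the stand-in
```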

Use, extend, and adapt to your needs! (but with good practices!)

Hope this helps!

Remarks.

coproc is a bash keyword that appeared in bash 4.0. The solutions shown here are 100% pure bash (except the first one, with sleep, which is not the best one at all!).
The use of coproc in scripts is almost always superior to putting jobs in the background with & and doing clumsy, awkward stuff with sleep and checking $!.
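If you want coproc to keep quiet whatever happens (e.g., if there's an error launching the command, which is fine here since you're handling everything yourself), do the following. Note, though, that the bash manual says the coproc pipes are set up before any redirections attached to the command, so this `> /dev/null` also replaces the pipe on the command's stdout; the `read -u` checks above would then most likely hit end-of-file immediately, so pair it only with the `sleep`-based check: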
```
coproc mycoprocfd { "${cmd[@]}" >> cc.log 2>&1 ; } > /dev/null 2>&1
```
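Because coproc needs bash 4.0 or newer, a script relying on it can fail fast on older shells (for example the bash 3.2 that ships as /bin/bash on macOS). A minimal guard using bash's BASH_VERSINFO array:

```shell
#!/usr/bin/env bash
# coproc was added in bash 4.0; refuse to run on anything older.
if (( BASH_VERSINFO[0] < 4 )); then
    echo >&2 "This script needs bash >= 4.0 for coproc; found $BASH_VERSION."
    exit 1
fi
coproc_ok=yes
echo "bash $BASH_VERSION supports coproc"
```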

score 0 · Accepted Answer

Twenty minutes of googling turned up https://stackoverflow.com/a/6756971/494983 and https://stackoverflow.com/a/14296353/494983, which point to kill -0 $PID.

So it seems I can use:

```
$ cat startup.sh
CMD=(start_server -option1 foo -option2 bar)
"${CMD[@]}" >> cc.log 2>&1 &
SERVER_PID=$!
sleep 1
# kill -0 sends no signal; it only checks that the PID exists and can be signaled
if ! kill -0 "$SERVER_PID" 2>/dev/null; then
    echo "Failure detected when starting server! PID $SERVER_PID doesn't exist!" 1>&2
    exit 1
else
    echo "$SERVER_PID"
fi
```

This wouldn't work for processes that I can't send signals to, but it works well enough in my case (where startup.sh starts the server itself).
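What kill -0 actually reports can be seen with a short-lived background job; `sleep 0.3` is a hypothetical stand-in for start_server. The second check fails only because bash has already reaped the finished child while waiting for the foreground `sleep 1`; an unreaped (zombie) child would still pass kill -0:

```shell
#!/usr/bin/env bash
sleep 0.3 &    # hypothetical stand-in for a start_server that exits early
pid=$!

if kill -0 "$pid" 2>/dev/null; then early=alive; else early=gone; fi

sleep 1        # the child exits meanwhile and bash reaps it
if kill -0 "$pid" 2>/dev/null; then late=alive; else late=gone; fi

echo "right after start: $early; one second later: $late"
```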
