bash - How to wait in bash for several subprocesses to finish, and return exit code != 0 when any subprocess ends with code != 0?

Question

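How can a bash script wait for several subprocesses spawned from that script to finish, and then return an exit code != 0 when any of the subprocesses ends with code != 0?

Simple script: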

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

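The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see help wait). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any of the subprocesses ends with code != 0?

Is there any better solution for that than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?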

score 644 · Accepted Answer

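wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background. Modify the loop to store the PID of each spawned subprocess in an array, and then loop again, waiting on each PID.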

# run processes and store pids in array
for i in $n_procs; do
    ./procs[${i}] &
    pids[${i}]=$!
done

# wait for all pids
for pid in ${pids[*]}; do
    wait $pid
done
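
The loop above waits for every PID but discards each exit status. A minimal sketch of how the same indexed array could also record which jobs failed, assuming a hypothetical doCalculations stand-in in which job 3 fails:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real work: job 3 fails, the others succeed.
doCalculations() { sleep 0.1; [ "$1" -ne 3 ]; }

pids=()
for i in 0 1 2 3 4; do
    doCalculations "$i" &
    pids[i]=$!          # remember the PID under the job's own index
done

failed=()
for i in "${!pids[@]}"; do
    # wait PID returns that child's exit status, so || fires only on failure
    wait "${pids[i]}" || failed+=("$i")
done

status=$(( ${#failed[@]} > 0 ))   # 1 if at least one job failed, else 0
echo "failed jobs: ${failed[*]:-none}; status: $status"
```

A real script would finish with exit "$status".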

score 328 · Accepted Answer

http://jeremy.zawodny.com/blog/archives/010717.html:

#!/bin/bash

FAIL=0

echo "starting"

./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &

for job in `jobs -p`
do
    echo $job
    wait $job || let "FAIL+=1"
done

echo $FAIL

if [ "$FAIL" == "0" ];
then
    echo "YAY!"
else
    echo "FAIL! ($FAIL)"
fi

score 93 · Accepted Answer

Here is a simple example using wait.

Run some processes:

$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &

Then wait for them with the wait command:

$ wait < <(jobs -p)

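Or just wait (without arguments) for all of them.

This will wait for all jobs in the background to complete.

If the -n option is supplied, wait waits for the next job to terminate and returns its exit status.

See help wait and help jobs for syntax.

However, the downside is that this will return only the status of the last ID, so you need to check the status of each subprocess and store it in a variable.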
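
As a small illustration of that caveat (the exit codes 3 and 0 are arbitrary values chosen for the demonstration): when wait is given several PIDs, its own status is that of the last PID listed, so an earlier failure can go unnoticed.

```bash
#!/usr/bin/env bash
# A failing job listed first is masked by a succeeding job listed last.
(exit 3) & a=$!
(exit 0) & b=$!
masked=0
wait "$a" "$b" || masked=$?
echo "failure listed first: wait returned $masked"

# The same failure is visible only when it happens to be listed last.
(exit 0) & c=$!
(exit 3) & d=$!
seen=0
wait "$c" "$d" || seen=$?
echo "failure listed last:  wait returned $seen"
```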

Or make your calculation function create some file on failure (empty or with a fail log), then check whether that file exists, e.g.

$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait < <(jobs -p)
$ test -f fail && echo Calculation failed.

score 56 · Accepted Answer

Simply:

#!/bin/bash

pids=""

for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

wait $pids

...code continued here ...

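Update:

As pointed out by multiple commenters, the above waits for all processes to complete before continuing, but does not exit and fail if one of them fails. That can be done with the following modification suggested by @Bryan, @SamBrightman, and others: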

#!/bin/bash

pids=""
RESULT=0


for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

for pid in $pids; do
    wait $pid || let "RESULT=1"
done

if [ "$RESULT" == "1" ];
    then
       exit 1
fi

...code continued here ...
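
If the caller needs the actual exit code rather than a fixed 1, the same loop can keep the first non-zero status it sees. A sketch under that assumption, with three inline subshells (the exit codes 0, 5 and 9 are made up) standing in for doCalculations:

```bash
#!/usr/bin/env bash
pids=()
(exit 0) & pids+=("$!")
(exit 5) & pids+=("$!")
(exit 9) & pids+=("$!")

RESULT=0
for pid in "${pids[@]}"; do
    rc=0
    wait "$pid" || rc=$?
    # keep only the first failure, in launch order
    if [ "$rc" -ne 0 ] && [ "$RESULT" -eq 0 ]; then
        RESULT=$rc
    fi
done
echo "RESULT=$RESULT"
```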

score 55 · Accepted Answer

If you have GNU Parallel installed, you can do:

# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}

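GNU Parallel will give you the exit code:

0 - All jobs ran without error.
1-253 - Some of the jobs failed. The exit status gives the number of failed jobs.
254 - More than 253 jobs failed.
255 - Other error.

Watch the intro videos to learn more: http://pi.dk/1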

score 41 · Accepted Answer

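Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.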

waitall() { # PID...
  ## Wait for children to exit and indicate whether all exited with 0 status.
  local errors=0
  while :; do
    debug "Processes remaining: $*"
    for pid in "$@"; do
      shift
      if kill -0 "$pid" 2>/dev/null; then
        debug "$pid is still alive."
        set -- "$@" "$pid"
      elif wait "$pid"; then
        debug "$pid exited with zero exit status."
      else
        debug "$pid exited with non-zero exit status."
        ((++errors))
      fi
    done
    (("$#" > 0)) || break
    # TODO: how to interrupt this sleep when a child terminates?
    sleep ${WAITALL_DELAY:-1}
  done
  ((errors == 0))
}

debug() { echo "DEBUG: $*" >&2; }

pids=""
for t in 3 5 4; do 
  sleep "$t" &
  pids="$pids $!"
done
waitall $pids

score 23 · Accepted Answer

To parallelize this...

for i in $(whatever_list) ; do
   do_something $i
done

Translate it to this...

for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
   (
   export -f do_something ## export functions (if needed)
   export PATH ## export any variables that are required
   xargs -I{} --max-procs 0 bash -c ' ## process in batches...
      {
      echo "processing {}" ## optional
      do_something {}
      }' 
   )

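If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code for the sequence as a whole.
Exporting functions and variables may or may not be necessary in any particular case.
You can set --max-procs based on how much parallelism you want (0 means "all at once").
GNU Parallel offers some additional features when used in place of xargs, but it isn't always installed by default.
The for loop isn't strictly necessary in this example, since echo $i basically just regenerates the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.
Bash string handling can be confusing; I have found that using single quotes works best for wrapping non-trivial scripts.
You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.

Here's a simplified working example...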

for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
   {
   echo sleep {}
   sleep 2s
   }'
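
To check the claim about the overall exit code, here is a small sketch in which each input line is used as that job's exit status (an arrangement invented for this demonstration). GNU xargs documents status 123 when any invocation exits with a status from 1 to 125; other xargs implementations may report a different non-zero value.

```bash
#!/usr/bin/env bash
rc=0
# Three jobs run in parallel; the one fed "1" fails, the others succeed.
printf '%s\n' 0 1 0 | xargs -I{} -P 3 sh -c 'exit "$1"' _ {} || rc=$?
echo "xargs exit status: $rc"
```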

score 15 · Accepted Answer

Here's something I use:

#wait for jobs
for job in `jobs -p`; do wait ${job}; done

Answered 2019-05-03T14:41:18.837

score 10 · Accepted Answer

I see lots of good examples listed here and wanted to throw mine in as well.

#! /bin/bash

items="1 2 3 4 5 6"
pids=""

for item in $items; do
    sleep $item &
    pids+="$! "
done

for pid in $pids; do
    wait $pid
    status=$?  # capture wait's status before the test below overwrites $?
    if [ $status -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of $status"
    else
        echo "FAILED - Job $pid exited with a status of $status"
    fi
done

I use something very similar to start/stop servers/services in parallel and check each exit status. Works great for me. Hope this helps someone out!

score 9 · Accepted Answer

The following code will wait for all calculations to complete and return exit status 1 if any of the doCalculations fails.

#!/bin/bash
for i in $(seq 0 9); do
   (doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qvx 0 && exit 1  # -x matches whole lines, so a status such as 10 is not mistaken for 0

score 8 · Accepted Answer

I don't believe it's possible with Bash's builtin functionality.

You can get notification when a child exits:

#!/bin/sh
set -o monitor        # enable script job control
trap 'echo "child died"' CHLD

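However, there's no apparent way to get the child's exit status in the signal handler.

Getting that child status is usually the job of the wait family of functions in the lower-level POSIX APIs. Unfortunately, Bash's support for that is limited: you can wait for one specific child process (and get its exit status), or you can wait for all of them, and always get a 0 result.

What appears to be impossible is the equivalent of waitpid(-1), which blocks until any child process returns.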
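
Newer Bash versions narrow this gap: wait -n, available since Bash 4.3, returns as soon as any one child exits, with that child's status. A minimal sketch, assuming such a Bash and two made-up jobs with different durations:

```bash
#!/usr/bin/env bash
(sleep 0.6; exit 0) &   # slow job, succeeds
(sleep 0.1; exit 7) &   # fast job, fails

first=0
wait -n || first=$?     # returns when the fast job exits
second=0
wait -n || second=$?    # then when the slow job exits
echo "first finished: $first, second finished: $second"
```

Note that wait -n on its own does not say which child finished; it only reports a status.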

score 7 · Accepted Answer

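If you have bash 4.2 or later available, the following might be useful to you. It uses associative arrays to store task names and their "code" as well as task names and their pids. I have also built in a simple rate-limiting method, which might come in handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.

The script launches all tasks in the first loop and consumes the results in the second one.

This is a bit overkill for simple cases, but it allows for pretty neat stuff. For example, one can store error messages for each task in another associative array and print them after everything has settled down.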

#! /bin/bash

main () {
    local -A pids=()
    local -A tasks=([task1]="echo 1"
                    [task2]="echo 2"
                    [task3]="echo 3"
                    [task4]="false"
                    [task5]="echo 5"
                    [task6]="false")
    local max_concurrent_tasks=2

    for key in "${!tasks[@]}"; do
        while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
            sleep 1 # gnu sleep allows floating point here...
        done
        ${tasks[$key]} &
        pids+=(["$key"]="$!")
    done

    errors=0
    for key in "${!tasks[@]}"; do
        pid=${pids[$key]}
        local cur_ret=0
        if [ -z "$pid" ]; then
            echo "No Job ID known for the $key process" # should never happen
            cur_ret=1
        else
            wait $pid
            cur_ret=$?
        fi
        if [ "$cur_ret" -ne 0 ]; then
            errors=$(($errors + 1))
            echo "$key (${tasks[$key]}) failed."
        fi
    done

    return $errors
}

main

score 7 · Accepted Answer

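Here's my version that works for multiple pids, logs warnings if execution takes too long, and stops the subprocesses if execution takes longer than a given value.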

function WaitForTaskCompletion {
    local pids="${1}" # pids to wait for, separated by semi-colon
    local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
    local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
    local caller_name="${4}" # Who called this function
    local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors       

    Logger "${FUNCNAME[0]} called by [$caller_name]."

    local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once 
    local log_ttime=0 # local time instance for comparison

    local seconds_begin=$SECONDS # Seconds since the beginning of the script
    local exec_time=0 # Seconds since the beginning of this function

    local retval=0 # return value of monitored pid process
    local errorcount=0 # Number of pids that finished with errors

    local pidCount # number of given pids

    IFS=';' read -a pidsArray <<< "$pids"
    pidCount=${#pidsArray[@]}

    while [ ${#pidsArray[@]} -gt 0 ]; do
        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            if kill -0 $pid > /dev/null 2>&1; then
                newPidsArray+=($pid)
            else
                wait $pid
                result=$?
                if [ $result -ne 0 ]; then
                    errorcount=$((errorcount+1))
                    Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
                fi
            fi
        done

        ## Log a standby message every hour
        exec_time=$(($SECONDS - $seconds_begin))
        if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
            if [ $log_ttime -ne $exec_time ]; then
                log_ttime=$exec_time
                Logger "Current tasks still running with pids [${pidsArray[@]}]."
            fi
        fi

        if [ $exec_time -gt $soft_max_time ]; then
            if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
                Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
                soft_alert=1
                SendAlert

            fi
            if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
                Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
                kill -SIGTERM $pid
                if [ $? == 0 ]; then
                    Logger "Task stopped successfully"
                else
                    errorcount=$((errorcount+1))
                fi
            fi
        fi

        pidsArray=("${newPidsArray[@]}")
        sleep 1
    done

    Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
    if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
        Logger "Stopping execution."
        exit 1337
    else
        return $errorcount
    fi
}

# Just a plain stupid logging function to be replaced by yours
function Logger {
    local value="${1}"

    echo $value
}

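For example, wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds. Do not exit the program on failures.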

function something {

    sleep 10 &
    pids="$!"
    sleep 12 &
    pids="$pids;$!"
    sleep 9 &
    pids="$pids;$!"

    WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something

score 7 · Accepted Answer

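Wait for all jobs and return the exit code of the last failing job. Unlike the solutions above, this does not require saving pids or modifying the inner loops of scripts. Just background away, and wait.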

function wait_ex {
    # this waits for all jobs and returns the exit code of the last failing job
    ecode=0
    while true; do
        [ -z "$(jobs)" ] && break
        wait -n
        err="$?"
        [ "$err" != "0" ] && ecode="$err"
    done
    return $ecode
}

EDIT: Fixed the bug where this could be fooled by a script that ran a command that didn't exist.

score 6 · Accepted Answer

Just store the results out of the shell, e.g. in a file.

#!/bin/bash
tmp=/tmp/results

: > $tmp  #clean the file

for i in `seq 0 9`; do
  (doCalculations $i; echo $i:$?>>$tmp)&
done      #iterate

wait      #wait until all ready

sort $tmp | grep -v ':0'  #... handle as required
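
A variation on the same idea, sketched with a private temporary file from mktemp instead of the fixed /tmp/results path, and with the failure count turned into a number the script can act on. doCalculations here is a hypothetical stand-in in which only job 4 fails.

```bash
#!/usr/bin/env bash
doCalculations() { [ "$1" -ne 4 ]; }   # hypothetical: only job 4 fails

tmp=$(mktemp)
for i in 0 1 2 3 4 5; do
    # each subshell appends one "index:status" line
    ( rc=0; doCalculations "$i" || rc=$?; echo "$i:$rc" >> "$tmp" ) &
done
wait

# count the lines whose status field is not exactly 0
failed=$(grep -vc ':0$' "$tmp" || true)
rm -f "$tmp"
echo "failed jobs: $failed"
```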

score 6 · Accepted Answer

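I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.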

#!/bin/bash

set -o monitor

sleep 2 &
sleep 4 && exit 1 &
sleep 6 &

pids=`jobs -p`

checkpids() {
    for pid in $pids; do
        if kill -0 $pid 2>/dev/null; then
            echo $pid is still alive.
        elif wait $pid; then
            echo $pid exited with zero exit status.
        else
            echo $pid exited with non-zero exit status.
        fi
    done
    echo
}

trap checkpids CHLD

wait

score 5 · Accepted Answer

#!/bin/bash
set -m
for i in `seq 0 9`; do
  doCalculations $i &
done
while fg; do true; done

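set -m allows you to use fg & bg in a script
fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds
while fg will stop looping when any fg exits with a non-zero exit status

Unfortunately this won't handle the case when a process in the background exits with a non-zero exit status. (The loop won't terminate immediately. It will wait for the previous processes to complete.)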

score 5 · Accepted Answer

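I've just been modifying a script to background and parallelise a process.

I did some experimenting (on Solaris with both bash and ksh) and discovered that 'wait' outputs the exit status if it's not zero, or a list of jobs that return non-zero exit when no PID argument is provided. E.g.

Bash: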

$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]-  Exit 2                  sleep 20 && exit 2
[2]+  Exit 1                  sleep 10 && exit 1

Ksh:

$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+  Done(2)                  sleep 20 && exit 2
[2]+  Done(1)                  sleep 10 && exit 1

This output is written to stderr, so a simple solution to the OP's example could be:

#!/bin/bash

trap "rm -f /tmp/x.$$" EXIT

for i in `seq 0 9`; do
  doCalculations $i &
done

wait 2> /tmp/x.$$
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then  # read via stdin so wc prints only the count
  exit 1
fi

While this:

wait 2> >(wc -l)

will also return a count but without the tmp file. This might also be used this way, for example:

wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)

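But this isn't very much more useful than the tmp file IMO. I couldn't find a useful way to avoid the tmp file whilst also avoiding running the "wait" in a subshell, which won't work at all.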

score 4 · Accepted Answer

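There are already a lot of answers here, but I am surprised no one seems to have suggested using arrays... So here's what I did; this might be useful to some in the future.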

n=10 # run 10 jobs
c=0
PIDS=()

while true

    my_function_or_command &
    PID=$!
    echo "Launched job as PID=$PID"
    PIDS+=($PID)

    (( c+=1 ))

    # required to prevent any exit due to error
    # caused by additional commands run which you
    # may add when modifying this example
    true

do

    if (( c < n ))
    then
        continue
    else
        break
    fi
done 


# collect launched jobs

for pid in "${PIDS[@]}"
do
    wait $pid || echo "failed job PID=$pid"
done

score 4 · Accepted Answer

This works, and should be just as good as, if not better than, @HoverHell's answer!

#!/usr/bin/env bash

set -m # allow for job control
EXIT_CODE=0;  # exit code of overall script

function foo() {
     echo "CHLD exit code is $1"
     echo "CHLD pid is $2"
     echo $(jobs -l)

     for job in `jobs -p`; do
         echo "PID => ${job}"
         wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
     done
}

trap 'foo $? $$' CHLD

DIRN=$(dirname "$0");

commands=(
    "{ echo "foo" && exit 4; }"
    "{ echo "bar" && exit 3; }"
    "{ echo "baz" && exit 5; }"
)

clen=`expr "${#commands[@]}" - 1` # get length of commands - 1

for i in `seq 0 "$clen"`; do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in subshell
    echo "$i ith command has been issued as a background job"
done

# wait for all to finish
wait;

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"

# end

Of course, I have immortalized this script in an NPM project which allows you to run bash commands in parallel, useful for testing:

https://github.com/ORESoftware/generic-subshell

score 4 · Accepted Answer

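For exactly this purpose I wrote a bash function called :for.

Note: :for not only preserves and returns the exit code of the failing function, but also terminates all parallel running instances, which might not be needed in this case.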

#!/usr/bin/env bash

# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
    local pids=("$@")
    [ ${#pids} -eq 0 ] && return $?

    trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
    trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM

    for pid in "${pids[@]}"; do
        wait "${pid}" || return $?
    done

    trap - INT RETURN TERM
}

# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
    local f="${1}" && shift

    local i=0
    local pids=()
    for arg in "$@"; do
        ( ${f} "${arg}" ) &
        pids+=("$!")
        if [ ! -z ${FOR_PARALLEL+x} ]; then
            (( i=(i+1)%${FOR_PARALLEL} ))
            if (( i==0 )) ;then
                :wait "${pids[@]}" || return $?
                pids=()
            fi
        fi
    done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}

Usage

for.sh:

#!/usr/bin/env bash
set -e

# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)

msg="You should see this three times"

:(){
  i="${1}" && shift

  echo "${msg}"

  sleep 1
  if   [ "$i" == "1" ]; then sleep 1
  elif [ "$i" == "2" ]; then false
  elif [ "$i" == "3" ]; then
    sleep 3
    echo "You should never see this"
  fi
} && :for : 1 2 3 || exit $?

echo "You should never see this"

$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1

score 4 · Accepted Answer

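This is an extension of the most-upvoted answer, by @Luca Tettamanti, to make a fully-runnable example.

That answer left me wondering:

What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how.

...and also:

How do you get the return code from the subprocess when it has completed (which is the whole crux of this question)?

Anyway, I figured it out, so here is a fully-runnable example.

Notes:

$! is how to obtain the PID (process ID) of the last-executed subprocess.
Running any command with & after it, like cmd &, for example, causes it to run in the background as a subprocess parallel to the main process.
myarray=() is how to create an array in bash.
To learn a bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on job control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.

Full, runnable program: wait for all processes to end

multi_process_program.sh (from my eRCaGuy_hello_world repo):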

#!/usr/bin/env bash


# This is a special sleep function which returns the number of seconds slept as
# the "error code" or "return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
    seconds_to_sleep="$1"
    sleep "$seconds_to_sleep"
    return "$seconds_to_sleep"
}

# Create an array of whatever commands you want to run as subprocesses
procs=()  # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")

num_procs=${#procs[@]}  # number of processes
echo "num_procs = $num_procs"

# run commands as subprocesses and store pids in an array
pids=()  # bash array
for (( i=0; i<"$num_procs"; i++ )); do
    echo "cmd = ${procs[$i]}"
    ${procs[$i]} &  # run the cmd as a subprocess
    # store pid of last subprocess started; see:
    # https://unix.stackexchange.com/a/30371/114401
    pids+=("$!")
    echo "    pid = ${pids[$i]}"
done

# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
    wait "$pid"
    return_code="$?"
    echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."

Change the file above to be executable by running chmod +x multi_process_program.sh, then run it like this:

time ./multi_process_program.sh

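Sample output. See how the output of the time command in the call shows it took 5.084 sec to run. We were also able to successfully retrieve the return code from each subprocess.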

eRCaGuy_hello_world/bash$ time ./multi_process_program.sh 
num_procs = 4
cmd = my_sleep 5
    pid = 21694
cmd = my_sleep 2
    pid = 21695
cmd = my_sleep 3
    pid = 21697
cmd = my_sleep 4
    pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.

real    0m5.084s
user    0m0.025s
sys 0m0.061s

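Going further: determine live when each individual process ends

If you'd like to do some action when each process finishes, and you don't know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever action you want.

Just comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block instead: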

# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
    for i in "${!pids[@]}"; do
        pid="${pids[$i]}"
        # echo "pid = $pid"  # debugging

        # See if PID is still running; see my answer here:
        # https://stackoverflow.com/a/71134379/4561887
        ps --pid "$pid" > /dev/null
        if [ "$?" -ne 0 ]; then
            # PID doesn't exist anymore, meaning it terminated

            # 1st, read its return code
            wait "$pid"
            return_code="$?"

            # 2nd, remove this PID from the `pids` array by `unset`ting the
            # element at this index; NB: due to how bash arrays work, this does
            # NOT actually remove this element from the array. Rather, it
            # removes its index from the `"${!pids[@]}"` list of indices,
            # adjusts the array count(`"${#pids[@]}"`) accordingly, and it sets
            # the value at this index to either a null value of some sort, or
            # an empty string (I'm not exactly sure).
            unset "pids[$i]"

            num_pids="${#pids[@]}"
            echo "PID $pid is done; return_code = $return_code;" \
                 "$num_pids PIDs remaining."
        fi
    done

    # exit the while loop if the `pids` array is empty
    if [ "${#pids[@]}" -eq 0 ]; then
        break
    fi

    # Do some small sleep here to keep your polling loop from sucking up
    # 100% of one of your CPUs unnecessarily. Sleeping allows other processes
    # to run during this time.
    sleep 0.1
done

Sample run and output of the full program with Option 1 commented out and Option 2 in use:

eRCaGuy_hello_world/bash$ ./multi_process_program.sh 
num_procs = 4
cmd = my_sleep 5
    pid = 22275
cmd = my_sleep 2
    pid = 22276
cmd = my_sleep 3
    pid = 22277
cmd = my_sleep 4
    pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.

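Each of those PID XXXXX is done lines prints out live right after that process has terminated! Notice that even though the process for sleep 5 (PID 22275 in this case) was run first, it finished last, and we successfully detected each process right after it terminated. We also successfully detected each return code, just like in Option 1.

Other references:

*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle (emphasis added):

wait <n> waits until the process with that PID is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.

In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!
How to check if a process ID (PID) exists
1. My answer
Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means.
How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/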

score 3 · Accepted Answer

I needed this, but the target process wasn't a child of the current shell, in which case wait $PID doesn't work. I did find the following alternative instead:

while [ -e /proc/$PID ]; do sleep 0.1 ; done

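That relies on the presence of procfs, which may not be available (Mac doesn't provide it, for example). So for portability, you could use this instead: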

while ps -p $PID >/dev/null ; do sleep 0.1 ; done

score 3 · Accepted Answer

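trap is your friend. You can trap on ERR in a lot of systems. You can trap EXIT, or trap on DEBUG to perform a piece of code after every command.

This is in addition to all the standard signals.

Edit

This was an accidental login on the wrong account, so I hadn't seen the request for examples.

Try here, on my regular account.

Handle exceptions in bash scripts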

score 3 · Accepted Answer

set -e
fail () {
    touch .failure
}
expect () {
    wait
    if [ -f .failure ]; then
        rm -f .failure
        exit 1
    fi
}

sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect

The set -e at the top makes your script stop on failure.

expect will return 1 if any subjob failed.

score 2 · Accepted Answer

I used this recently (thanks to Alnitak):

#!/bin/bash
# activate child monitoring
set -o monitor

# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!

# count, and kill when all done
c=0
function kill_on_count() {
    # you could kill on whatever criterion you wish for
    # I just counted to simulate bash's wait with no args
    [ $c -eq 9 ] && kill $pid
    c=$((c+1))
    echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD

function save_status() {
    local i=$1;
    local rc=$2;
    # do whatever, and here you know which one stopped
    # but remember, you're called from a subshell
    # so vars have their values at fork time
}

# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
    (doCalculations $i; save_status $i $?) &
done

# wait for locking subprocess to be killed
wait $pid
echo

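From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want "any" non-zero rc, just kill the lock from save_status.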

score 2 · Accepted Answer

Trapping the CHLD signal may not work, because you can lose some signals if they arrive simultaneously.

#!/bin/bash

trap 'rm -f $tmpfile' EXIT

tmpfile=$(mktemp)

doCalculations() {
    echo start job $i...
    sleep $((RANDOM % 5)) 
    echo ...end job $i
    exit $((RANDOM % 10))
}

number_of_jobs=10

for i in $( seq 1 $number_of_jobs )
do
    ( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done

wait 

i=0
while read res; do
    echo "$res"
    let i++
done < "$tmpfile"

echo $i jobs done !!!

score 2 · Accepted Answer

A solution to wait for several subprocesses and to exit when any one of them exits with a non-zero status code is to use "wait -n":

#!/bin/bash
wait_for_pids()
{
    for (( i = 1; i <= $#; i++ )) do
        wait -n $@
        status=$?
        echo "received status: "$status
        if [ $status -ne 0 ] && [ $status -ne 127 ]; then
            exit 1
        fi
    done
}

sleep_for_10()
{
    sleep 10
    exit 10
}

sleep_for_20()
{
    sleep 20
}

sleep_for_10 &
pid1=$!

sleep_for_20 &
pid2=$!

wait_for_pids $pid2 $pid1

The status code "127" is for a non-existent process, which means the child might have already exited.
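
A quick way to see that 127 in isolation; PID 999999 is an arbitrary number assumed not to belong to any child of the shell:

```bash
#!/usr/bin/env bash
rc=0
# wait prints a "not a child of this shell" message, silenced here
wait 999999 2>/dev/null || rc=$?
echo "wait on a non-child PID returned $rc"
```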

score 2 · Accepted Answer

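I almost fell into the trap of using jobs -p to collect PIDs, which does not work if the child has already exited, as shown in the script below. The solution I picked was simply calling wait -n N times, where N is the number of children I have, which I happen to know deterministically.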

#!/usr/bin/env bash

sleeper() {
    echo "Sleeper $1"
    sleep $2
    echo "Exiting $1"
    return $3
}

start_sleepers() {
    sleeper 1 1 0 &
    sleeper 2 2 $1 &
    sleeper 3 5 0 &
    sleeper 4 6 0 &
    sleep 4
}

echo "Using jobs"
start_sleepers 1

pids=( $(jobs -p) )

echo "PIDS: ${pids[*]}"

for pid in "${pids[@]}"; do
    wait "$pid"
    echo "Exit code $?"
done

echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"

echo "Waiting for N processes"
start_sleepers 2

for ignored in $(seq 1 4); do
    wait -n
    echo "Exit code $?"
done

Output:

Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0

score 1 · Accepted Answer

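There can be a case where the process completes before we wait for it. If we trigger wait for a process that has already finished, it will trigger an error such as pid is not a child of this shell. To avoid such cases, the following function can be used to find out whether the process is complete or not: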

isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
    echo "Process: $PID is still running"
    sleep 5
done
echo "Process $PID has finished"
}
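
For direct children, Bash (at least in a non-interactive script) appears to keep the exit status of a finished background job until it is waited for, so the error described above mainly concerns PIDs that are not children of the shell. A small sketch of that behaviour, with an arbitrary exit code of 4:

```bash
#!/usr/bin/env bash
(exit 4) & pid=$!
sleep 0.3                 # by now the child has certainly exited
rc=0
wait "$pid" || rc=$?      # the status is still available
echo "late wait returned $rc"
```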

score 0 · Accepted Answer

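I think that the most straightforward way to run jobs in parallel and check for status is using temporary files. There are already a couple of similar answers (e.g. Nietzche-jou and mug896).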

#!/bin/bash
rm -f fail
for i in `seq 0 9`; do
  doCalculations $i || touch fail &
done
wait 
! [ -f fail ]

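The above code is not thread safe. If you are concerned that the code above will be running at the same time as itself, it's better to use a more unique file name, like fail.$$. The last line is to fulfill the requirement: "return exit code 1 when any of the subprocesses ends with code != 0?" I threw in an extra requirement to clean up. It may have been clearer to write it like this: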

#!/bin/bash
trap 'rm -f fail.$$' EXIT
for i in `seq 0 9`; do
  doCalculations $i || touch fail.$$ &
done
wait 
! [ -f fail.$$ ]

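Here is a similar snippet for collecting results from multiple jobs: I create a temporary directory, store the outputs of all the subtasks in separate files, and then dump them for review. This doesn't really match the question; I'm throwing it in as a bonus: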

#!/bin/bash
trap 'rm -fr $WORK' EXIT

WORK=/tmp/$$.work
mkdir -p $WORK
cd $WORK

for i in `seq 0 9`; do
  doCalculations $i >$i.result &
done
wait 
grep $ *  # display the results with filenames and contents

score 0 · Accepted Answer

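I've had a similar situation, but had various problems with looped subshells that ensured the other solutions here did not work, so I had my loop write the script I'd run, with waits at the end. Effectively: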

#!/bin/bash
echo > tmpscript.sh
for i in `seq 0 9`; do
    echo "doCalculations $i &" >> tmpscript.sh
done
echo "wait" >> tmpscript.sh
chmod u+x tmpscript.sh
./tmpscript.sh

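Dumb, but simple, and it helped debug a few things after the fact.

If I had the time, I would take a deeper look at GNU parallel, but it was difficult with my own "doCalculations" process.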

score 0 · Accepted Answer

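Starting with Bash 5.1, there is a nice new way of waiting for and handling the results of multiple background jobs, thanks to the introduction of wait -p: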

#!/usr/bin/env bash

# Spawn background jobs
for ((i=0; i < 10; i++)); do
    secs=$((RANDOM % 10)); code=$((RANDOM % 256))
    (sleep ${secs}; exit ${code}) &
    echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done

# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
    wait -n -p pid; code=$?
    [[ -z "${pid}" ]] && break
    echo "Background job ${pid} finished with code ${code}"
    (( ${code} != 0 )) && result=1
done

# Return overall result
exit ${result}

score -1 · Accepted Answer

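I'm thinking perhaps run doCalculations; echo "$?" >>/tmp/acc in a subshell that is sent to the background, then wait; /tmp/acc would then contain the exit statuses, one per line. I don't know about any consequences of the multiple processes appending to the accumulator file, though.

Here's a trial of this suggestion:

File: doCalculations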

#!/bin/sh

random -e 20
sleep $?
random -e 10

File: try

#!/bin/sh

rm /tmp/acc

for i in $( seq 0 20 ) 
do
        ( ./doCalculations "$i"; echo "$?" >>/tmp/acc ) &
done

wait

cat /tmp/acc | fmt
rm /tmp/acc

Output of running ./try

5 1 9 6 8 1 2 0 9 6 5 9 6 0 0 4 9 5 5 9 8
