bash: limiting subshells in a for loop over a file list

Question

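I've been trying to get a for loop to run a bunch of commands simultaneously, and was attempting it with subshells. I managed to cobble together the script below as a test, and it seems to work fine.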

#!/bin/bash
for i in {1..255}; do
  (
    #commands
  )&

done
wait

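The only problem is that my actual loop is going to be for i in files*, and then it just crashes, I assume because it starts more subshells than the system can handle. So I added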

#!/bin/bash
for i in files*; do
  (
    #commands
  )&
if (( $i % 10 == 0 )); then wait; fi
done
wait

Now it fails. Does anyone know a way around this? Either a different command to limit the number of subshells, or a way to get a number for $i?

Cheers

score 5 · Accepted Answer

xargs/parallel

Another solution is to use tools designed for concurrency:

printf '%s\0' files* | xargs -0 -P6 -n1 yourScript

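Here -P6 is the maximum number of concurrent processes that xargs will launch. Make it 10 if you like.

I suggest xargs because it is probably already on your system. If you want a really robust solution, look at GNU Parallel.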
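
If the per-file work is a short command rather than a standalone script, one possible sketch is to let xargs start a small sh -c wrapper. The scratch directory, the sample file names, and the echo command below are placeholders for illustration only, and -P3 is an arbitrary limit.

```bash
#!/usr/bin/env bash
# Hypothetical demo setup: a scratch directory with five sample files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
for n in 1 2 3 4 5; do printf 'data %s\n' "$n" > "files$n"; done

# Run at most 3 processes at a time; each one receives a single file name as $1.
# 'sh -c ... _' wraps a short inline command that stands in for yourScript.
results=$(printf '%s\0' files* | xargs -0 -P3 -n1 sh -c 'echo "processed $1"' _ | sort)
printf '%s\n' "$results"

# Clean up the scratch directory.
cd / && rm -rf "$tmpdir"
```

The NUL-separated list (printf '%s\0' with xargs -0) keeps file names containing spaces or newlines intact.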

Filenames in an array

As a more direct answer to your question: why not get the counter from the array index?

files=( files* )
for i in "${!files[@]}"; do
    commands "${files[i]}" &
    (( (i + 1) % 10 )) || wait    # pause after every 10th job (indices start at 0)
done
wait    # collect the final, partial batch

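(The parentheses around the compound command aren't important, because backgrounding the job has the same effect as using a subshell anyway.)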

Function

Just different semantics:

simultaneous() {
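    local i
    while (( $# )); do                  # loop until every argument has been handled
        for i in {1..10}; do            # at most 10 jobs per batch, matching 'shift 10'
            (( i <= $# )) || break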
            commands "${@:i:1}" &
        done
        shift 10 || shift "$#"
        wait
    done
}
simultaneous files*

score 4 · Accepted Answer

You may find it useful to count the number of jobs with jobs. For example:

wc -w <<<$(jobs -p)

So your code would look like this:

#!/bin/bash
for i in files*; do
  (
    #commands
  )&
  if (( $(wc -w <<<$(jobs -p)) % 10 == 0 )); then wait; fi
done
wait

As @chepner suggested:

In bash 4.3, you can use wait -n to proceed as soon as any one job completes, rather than waiting for all of them.
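
A minimal sketch of that suggestion, assuming bash 4.3 or newer, could track the number of started jobs in a plain counter instead of calling wc. The names max_jobs, running, and logfile, the dummy item list, and the sleep are all placeholders, not part of the original answer.

```bash
#!/usr/bin/env bash
# Requires bash 4.3 or newer for 'wait -n'.
max_jobs=3          # assumed limit, adjust to taste
running=0
logfile=$(mktemp)   # scratch file, only here to make the demo observable

for i in a b c d e f g; do          # stand-in for: for i in files*
  (
    sleep 0.1                       # placeholder for the real commands
    echo "done $i" >> "$logfile"
  ) &
  if (( ++running >= max_jobs )); then
    wait -n                         # block until any one job exits
    (( running-- ))
  fi
done
wait                                # collect the remaining jobs

finished=$(grep -c . "$logfile")    # count the completed jobs
rm -f "$logfile"
echo "finished jobs: $finished"
```

The counter never lets more than max_jobs jobs run at once, and a slot is refilled as soon as one job exits rather than after a whole batch.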

score 3 · Accepted Answer

Define the counter explicitly

#!/bin/bash
for f in files*; do
  (
    #commands
  )&
  (( ++i % 10 == 0 )) && wait   # pre-increment: wait after the 10th, 20th, ... job
done
wait

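There is no need to initialize i, as it defaults to 0 the first time it is used. There is also no need to reset the value, since i % 10 is 0 for i = 10, 20, 30, and so on.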
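
To see which iterations trigger a wait, here is a small sketch that records the counter values. It uses the pre-increment form ++i, so the counter is 1 for the first job; the 25 dummy items, the no-op background job, and the wait_points array are illustrative only.

```bash
#!/usr/bin/env bash
unset i                  # the counter starts out unset and is treated as 0
wait_points=()
for f in {1..25}; do     # 25 dummy items standing in for files*
  : &                    # placeholder background job
  if (( ++i % 10 == 0 )); then
    wait
    wait_points+=("$i")  # remember the job number at which we paused
  fi
done
wait                     # collect the last five jobs
echo "waited after jobs: ${wait_points[*]}"
```

This prints "waited after jobs: 10 20"; the trailing wait handles the partial final batch.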

score 2 · Accepted Answer

If you have Bash ≥ 4.3, you can use wait -n:

#!/bin/bash

max_nb_jobs=10

for i in files*; do
    # Wait until there are less than max_nb_jobs jobs running
    while mapfile -t < <(jobs -pr) && ((${#MAPFILE[@]}>=max_nb_jobs)); do
        wait -n
    done
    {
        # Your commands here: no useless subshells! use grouping instead
    } &
done
wait

If you don't have wait -n available, you can use something like this:

#!/bin/bash

set -m

max_nb_jobs=10

sleep_jobs() {
   # This function sleeps until there are less than $1 jobs running
   local n=$1
   while mapfile -t < <(jobs -pr) && ((${#MAPFILE[@]}>=n)); do
      coproc read
      trap "echo >&${COPROC[1]}; trap '' SIGCHLD" SIGCHLD
      [[ $COPROC_PID ]] && wait $COPROC_PID
   done
}

for i in files*; do
    # Wait until there are less than 10 jobs running
    sleep_jobs "$max_nb_jobs"
    {
        # Your commands here: no useless subshells! use grouping instead
    } &
done
wait

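The advantage of proceeding like this is that we make no assumptions about how long the jobs take to finish. A new job is launched as soon as there is room for it. Moreover, it is all pure Bash, so it doesn't rely on external tools and (maybe more importantly) you can use your Bash environment (variables, functions, etc.) without exporting it. Arrays can't easily be exported, so this can be a huge advantage.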
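
As an illustration of that last point, the sketch below runs a shell function that reads an array from inside the throttled loop, with nothing exported. It assumes bash 4.3 or newer; the function describe, the array suffixes, the item names, and the scratch file are all hypothetical.

```bash
#!/usr/bin/env bash
# Requires bash 4.3 or newer for 'wait -n'.
max_nb_jobs=2
suffixes=(.bak .old)             # hypothetical array; arrays cannot be exported
outfile=$(mktemp)                # scratch file collecting the demo output

describe() {                     # hypothetical shell function, never exported
  printf '%s -> %s%s\n' "$1" "$1" "${suffixes[0]}" >> "$outfile"
}

for name in alpha beta gamma delta; do   # stand-in for files*
  # Wait until fewer than max_nb_jobs jobs are running
  while mapfile -t < <(jobs -pr) && (( ${#MAPFILE[@]} >= max_nb_jobs )); do
    wait -n
  done
  { describe "$name"; } &        # the background job sees the function and the array
done
wait

sorted=$(sort "$outfile")
rm -f "$outfile"
printf '%s\n' "$sorted"
```

With xargs, the same work would have to live in a separate script or be passed through the environment, which is not straightforward for an array.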
