While my original problem was solved in a different way (see the comment thread under this question, as well as the edits to it), I was able to create a queue (FIFO) for GNU Parallel in Bash. So I have edited the background/question to describe a situation where one could be needed.

Background

I am using GNU Parallel to process files with a Bash script. As the files are processed, more files are created and new commands need to be added to parallel's list. I cannot give parallel a complete list of commands up front, because the information needed for them is only generated while the initial files are being processed.

I need a way to add the lines to parallel's list while it is running.

Parallel also needs to wait for a new line when nothing is in the queue, and to exit once the queue is finished.

Solution

First I created a FIFO (named pipe):

mkfifo /tmp/fifo
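The fixed path /tmp/fifo is shared by every user on the machine. A slightly safer variant, which is an assumption of mine rather than part of the setup described here, is to create the FIFO inside a private temporary directory (mktemp -d) and delete it when the script exits. The names queue_dir and fifo are hypothetical.

```shell
#!/bin/sh
# Create the FIFO in a private temporary directory instead of the
# fixed, shared path /tmp/fifo.
queue_dir=$(mktemp -d) || exit 1
fifo="$queue_dir/fifo"    # hypothetical name; any path inside queue_dir works
mkfifo "$fifo" || exit 1

# Remove the FIFO and its directory when this script exits.
trap 'rm -rf "$queue_dir"' EXIT

echo "queue FIFO: $fifo"
```

Placed at the top of the consumer script, this makes the FIFO exist exactly as long as the consumer does; producers then need to be given the printed path instead of /tmp/fifo.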

Next I created a Bash script that reads the FIFO with cat in an endless loop and pipes the output to parallel, which stops reading input when it sees the end_of_file line. (I wrote this with help from the accepted answer as well as from here.)

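#!/bin/bash
# Each writer closes the FIFO after its echo, so cat sees end-of-file and
# exits; the loop reopens the FIFO so that later commands are still read.
while true;
do
    cat /tmp/fifo
done | parallel --ungroup --gnu --eof "end_of_file" "{}"
# --eof "end_of_file": stop reading input at a line equal to end_of_file
# "{}": run each input line as the command itself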
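The reason for the "while true" loop can be checked without GNU Parallel installed. In the sketch below a plain "while read" loop stands in for parallel (it only collects lines and stops at end_of_file, mimicking --eof), and three separate echo commands play the producers. The "|| break" after cat is my own addition, not part of the script above: once the reading side has gone away, cat's next write fails and the feeder loop ends instead of waiting on the FIFO forever.

```shell
#!/bin/sh
# Sketch: a reopen loop feeding a stand-in consumer through a FIFO.
dir=$(mktemp -d) || exit 1
fifo="$dir/fifo"
mkfifo "$fifo" || exit 1
trap 'rm -rf "$dir"' EXIT

# Feeder: reopen the FIFO after every writer closes it.
# "|| break": stop once cat cannot write any more (SIGPIPE or EPIPE).
while true; do
    cat "$fifo" || break
done | while IFS= read -r line; do
    # Stand-in for parallel --eof "end_of_file": stop at the marker.
    [ "$line" = "end_of_file" ] && break
    printf '%s\n' "$line"
done > "$dir/received" &
reader=$!    # PID of the last command in the background pipeline

echo "job 1" > "$fifo"          # each echo opens and closes the FIFO
echo "job 2" > "$fifo"
echo "end_of_file" > "$fifo"
wait "$reader"

# Wake the feeder once: its cat writes into the closed pipe and fails,
# so the loop breaks and the pipeline finishes.
echo "wake" > "$fifo"
wait
```

Both job lines arrive even though each came from a separate writer. Adding the same "|| break" to the real consumer would let its loop exit after the first write that follows parallel's exit, rather than blocking on the FIFO indefinitely.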

Then I write to the pipe with this command, adding lines to parallel's queue:

echo "command here" > /tmp/fifo
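The producer side can be wrapped in two small functions so that the redirection and the end marker are not typed by hand. This is a sketch: enqueue, close_queue and the QUEUE variable are hypothetical names, not part of GNU Parallel. Note that opening a FIFO for writing blocks until a reader (here, the cat loop) has it open.

```shell
#!/bin/sh
# Hypothetical producer helpers. QUEUE defaults to the FIFO used above.
# ">>" behaves like ">" on a FIFO, but also lets the helpers be tried
# out against an ordinary file.
QUEUE=${QUEUE:-/tmp/fifo}

# Add one command line to parallel's queue. printf is used instead of
# echo so that arguments starting with "-" or containing backslashes
# are written unchanged.
enqueue() {
    printf '%s\n' "$*" >> "$QUEUE"
}

# Write the marker given to --eof, so parallel stops reading input and
# exits once the already queued jobs have finished.
close_queue() {
    printf '%s\n' "end_of_file" >> "$QUEUE"
}
```

For example, "enqueue gzip output.log" queues one job, and "close_queue" is called once after the last command has been submitted.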

With this setup, every new command is added to the queue. Once the queue is full, parallel begins processing it. This means that if you have 32 job slots (32 processors), you need to add 32 jobs before the first one starts.
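If fewer than JobSlots real commands will arrive for a while, one possible workaround (my assumption, not something from this post or the accepted answer) is to pad the queue with no-op "true" jobs so that the first JobSlots lines are available immediately. SLOTS must match the number of job slots parallel actually uses; 32 only mirrors the example above. prime_queue, SLOTS and QUEUE are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: pad the queue with no-op jobs so parallel does not
# wait for SLOTS real commands before starting.
SLOTS=${SLOTS:-32}          # example value; must equal parallel's job slots
QUEUE=${QUEUE:-/tmp/fifo}

prime_queue() {
    # All padding lines go through one redirection, so the FIFO is opened once.
    n=0
    while [ "$n" -lt "$SLOTS" ]; do
        printf '%s\n' true
        n=$((n + 1))
    done >> "$QUEUE"
}
```

Each "true" job finishes almost at once, so it only holds a slot briefly; real commands queued after prime_queue can then start as soon as a slot is free.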

If all of parallel's job slots are busy, a newly added job is held until a slot becomes free.

With the --ungroup argument, once the queue has filled, each job's output is printed as soon as the job produces it.

Without the --ungroup argument, parallel holds back a job's output until JobSlots more jobs have been started. From the accepted answer:

Output from the running or completed jobs are held back and will only be printed when JobSlots more jobs has been started (unless you use --ungroup or -u, in which case the output from the jobs are printed immediately). E.g. if you have 10 jobslots then the output from the first completed job will only be printed when job 11 has started, and the output of second completed job will only be printed when job 12 has started.

Answer

From http://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-queue-system-batch-manager

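There is a small issue when using GNU Parallel as a queue system/batch manager: you have to submit JobSlots number of jobs before they will start. After that you can submit one at a time, and a job will start immediately if a free slot is available. Output from the running or completed jobs is held back and will only be printed when JobSlots more jobs have been started (unless you use --ungroup or -u, in which case the output from the jobs is printed immediately). E.g. if you have 10 job slots, then the output from the first completed job will only be printed when job 11 has started, and the output of the second completed job will only be printed when job 12 has started.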

answered 2015-08-25T19:17:44.423