bash - Monitor existing and new files in a directory with bash

Question

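I have a script that uses inotify-tools.
This script is notified when a new file arrives in a folder. It does some work on the file and, when done, moves the file to another folder. (It looks something along these lines):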

inotifywait -m -e modify "${path}" |
    while read NEWFILE
    do
        # work on/with NEWFILE
        # move NEWFILE to a new directory
        :
    done

By using inotifywait, only new files can be monitored. A similar procedure that uses for OLDFILE in path instead of inotifywait would work for the existing files:

for OLDFILE in "${path}"/*
do
    # work on/with OLDFILE
    # move OLDFILE to a new directory
    :
done

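I tried combining the two loops by running the second loop first. However, if files arrive quickly and in large numbers, there is a chance that files will arrive while the second loop is running. Those files would then be caught by neither loop.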

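Given that files already exist in the folder, and that new files will arrive in it quickly, how can I make sure the script catches all of the files?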

score 0 · Accepted Answer

By using inotifywait, only new files can be monitored.

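I would ask for a definition of "new file". man inotifywait specifies a list of events, which also includes events such as create, delete and delete_self, and inotifywait can also watch "old files" (defined as files that existed before inotifywait was executed) and directories. You specified only a single event, -e modify, which notifies about modification of files within ${path}. It covers modification of both pre-existing files and files created after inotifywait was executed.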

...how can I make sure the script catches all of the files?

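Your script is enough to catch all the events that happen inside the path. If you have no means of synchronization between the part that generates the files and the part that receives them, there is nothing you can do, and there will always be a race condition. What if your script receives 0% of CPU time while the part that generates files gets 100% of CPU time? There is no guarantee of CPU time between processes (unless you are using a certified real-time system...). Implement synchronization between them.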

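You can watch for other events. If the generating side closes files when it is done with them, watch for the close event. You can also run work on/with NEWFILE in the background, in parallel, to speed up execution and the reading of new files. But if the receiving side is slower than the sending side, that is, if your script works on NEWFILEs more slowly than new files are generated, there is nothing you can do...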
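As a rough illustration of watching for "file is complete" events and handing each file to a background job, a sketch could look like the following. The function names watch_completed_files and handle_file are hypothetical, and the choice of close_write plus moved_to is an assumption about how the producer delivers files.

```bash
# Sketch: react to files that are complete (closed after writing, or moved in)
# instead of reacting to every modify event.
# handle_file is a hypothetical placeholder for the real processing.

handle_file() {
    local file="$1"
    printf 'processing %s\n' "$file"
}

watch_completed_files() {
    local dir="$1"
    # --format '%w%f' prints the watched directory followed by the file name.
    inotifywait -m -q -e close_write -e moved_to --format '%w%f' -- "$dir" |
    while IFS= read -r file; do
        # Run in the background so the loop keeps reading new events.
        handle_file "$file" &
    done
}

# Usage (blocks until interrupted):
# watch_completed_files "/some/incoming/dir"
```

Backgrounding each job keeps the reader responsive, but it does not bound the number of concurrent jobs, so a real script would probably want some limit.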

If there are no special characters or spaces in the filenames, I would go with:

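inotifywait -m -e modify "${path}" |
while IFS=' ' read -r path event file; do
    lock "${path}"                # placeholder: acquire the shared lock
    work on "${path}/${file}"     # placeholder: the actual processing
    mv "${path}/${file}" "${new_location}"    # e.g. move the file away when done
    unlock "${path}"              # placeholder: release the shared lock
done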

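Here lock and unlock stand for some locking mechanism implemented between your script and the generating part. You could set up communication between the process that creates the files and the process that handles them.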
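One possible way to fill in lock and unlock is flock(1) from util-linux. The sketch below is only an assumption about how that might look: the lock file path and the with_lock helper are hypothetical, and it only helps if the generating side takes the same lock before writing into the directory.

```bash
# Sketch of lock/unlock built on flock(1) from util-linux.
# Assumption: the producer takes the same lock file before writing.

lock_file="${TMPDIR:-/tmp}/incoming.lock"   # hypothetical lock file path

with_lock() {
    # Run the given command while holding an exclusive lock on $lock_file.
    # The lock is released when the subshell exits and fd 9 is closed.
    (
        flock -x 9 || exit 1
        "$@"
    ) 9>"$lock_file"
}

with_lock echo "holding the lock"
```

Wrapping the critical section in a subshell means the lock cannot be left held by accident if the command fails.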

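I think you could use some transactional filesystem that would let you "lock" a directory against other scripts until you are ready to work on it, but I have no experience in that area.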

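I tried combining the two loops. However, if files arrive quickly and in large numbers, there is a chance that files will arrive while the second loop is running.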

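Run process_new_files_loop in the background before running process_old_files_loop. It would also be good to make sure (i.e. synchronize) that inotifywait has started successfully before continuing on to the existing-files loop, so that there is no race condition between them either.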

Maybe a simple example and/or starting point would be:

work() {
    local file="$1"
    some work "$file"    # placeholder for the actual processing
    mv "$file" "$predefined_path"
}

process_new_files_loop() {
    # let's work on modified files in parallel, so that it is faster

    trap 'wait' INT
    inotifywait -m -e modify "${path}" |
    while IFS=' ' read -r path event file ;do
        work "${path}/${file}" &
    done
}

process_old_files_loop() {
    # maybe we should parse in parallel here too?
    # maybe export -f work; find "${path}" -type f | xargs -P0 -n1 -- bash -c 'work "$1"' -- ?

    find "${path}" -type f |
    while IFS= read -r file; do
        work "${file}"
    done
}

process_new_files_loop &
child=$!

sleep 1

if ! ps -p "$child" >/dev/null 2>&1; then
    echo "ERROR running processing-new-file-loop" >&2
    exit 1
fi
process_old_files_loop
wait # wait for process_new_files_loop

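If you really care about execution speed and want things done faster, switch to python or C (or anything other than shell). Bash is not fast. It is a shell and should be used to interconnect two processes (passing the stdout of one to the stdin of the other), and parsing a stream line by line with while IFS= read -r line is extremely slow in bash and should generally be used as a last resort. Maybe using xargs, like xargs -P0 -n1 sh -c "work on $1; mv $1 $path" --, or parallel would be a way to speed things up, but a plain python or C program will probably be n times faster.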
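To make the xargs suggestion concrete, here is a minimal sketch that processes the files already present in a directory with up to four parallel workers. It assumes GNU find and xargs (for -print0, -0 and -P), and the temporary directories and file names exist only for the demonstration.

```bash
# Sketch: handle existing files in parallel with find + xargs.
# Assumptions: GNU find/xargs; src and dst are throwaway demo directories.

src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/a.txt" "$src/b c.txt" "$src/d.txt"

# NUL-separated names survive spaces and newlines in file names.
# xargs appends one file name per invocation, so inside sh:
#   $1 is the destination directory, $2 is the file to handle.
find "$src" -maxdepth 1 -type f -print0 |
    xargs -0 -n1 -P4 sh -c '
        # placeholder for the real work on "$2"
        mv -- "$2" "$1"/
    ' _ "$dst"

ls "$dst"
```

Passing the names as positional parameters, rather than interpolating them into the command string, avoids quoting problems with unusual file names.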

score 0 · Accepted Answer

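Once inotifywait has started up and is waiting, it prints the message Watches established. to standard error. So you need to go through the existing files after that point.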

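So, one approach is to write something that processes standard error and, when it sees that message, lists all the existing files. You can wrap that functionality in a function for convenience: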

function list-existing-and-follow-modify() {
  local path="$1"
  inotifywait --monitor \
              --event modify \
              --format %f \
              -- \
              "$path" \
    2> >( while IFS= read -r line ; do
            printf '%s\n' "$line" >&2
            if [[ "$line" = 'Watches established.' ]] ; then
              for file in "$path"/* ; do
                if [[ -e "$file" ]] ; then
                  basename "$file"
                fi
              done
              break
            fi
          done
          cat >&2
        )
}

and then write:

list-existing-and-follow-modify "$path" \
| while IFS= read -r file ; do
    # ... work on/with "$file"
    # move "$file" to a new directory
    :
  done

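Notes:

If you're not familiar with the >(...) notation that I used, it's called "process substitution"; see https://www.gnu.org/software/bash/manual/bash.html#Process-Substitution for details.
The race condition above is now the opposite of your original one: if a file is created shortly after inotifywait starts up, then list-existing-and-follow-modify may list it twice. But you can easily handle that inside your while-loop by using if [[ -e "$file" ]] to make sure the file still exists before you operate on it.
I'm a bit skeptical that your inotifywait options are really quite what you want; modify, in particular, seems like the wrong event. But I'm sure you can adjust them as needed. The only changes I've made above, other than switching to long options for clarity/explicitness and adding -- for robustness, is adding --format %f so that you get the filenames without extraneous details.
There doesn't seem to be any way to tell inotifywait to use a separator other than newlines, so I just rolled with that. Make sure to avoid filenames that include newlines.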
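The existence check from the notes can be sketched roughly as follows. The process_names function and the demo directories are hypothetical; the loop simply skips any name whose file has already been moved away, which is what happens when the same name is listed twice.

```bash
# Sketch: consumer loop that tolerates a file name being listed twice.
# process_names and the demo directories are hypothetical.

process_names() {
    local dir="$1" dest="$2" file
    while IFS= read -r file; do
        # Skip names whose file is already gone, e.g. handled on an earlier listing.
        [ -e "$dir/$file" ] || continue
        printf 'handling %s\n' "$file"
        mv -- "$dir/$file" "$dest/"
    done
}

# Demo: the same name arrives twice on stdin, but is handled only once.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/report.csv"
printf '%s\n' report.csv report.csv | process_names "$src" "$dst"
```

This only guards against duplicates that arrive after the first copy has been moved; with parallel workers a lock or an atomic rename would be needed as well.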

score 0 · Accepted Answer

A simpler solution is to add an ls in front of the inotifywait in a subshell, using awk to create output that looks like inotifywait's.

I use this to detect and process both existing and new files:

(ls "${path}" | awk -v dir="${path}" '{print dir " EXISTS " $0}' && inotifywait -m "${path}" -e close_write -e moved_to) |
  while read -r dir action file; do
    echo "$action" "$dir" "$file"
    # DO MY PROCESSING
  done

So it runs ls, formats the output and sends it to stdout, then runs inotifywait in the same subshell, sending its output to stdout as well for processing.
