linux - Fast find in shell

Question

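I have a simple find command that has to traverse millions of files on a server and pick out the few that have a given suffix. These files are written and deleted frequently over time. I'm wondering whether there is a way to make the search faster. Using locate is out of the question, because building the locate database would be very expensive.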

find /myDirWithThausandsofDirectories/ -name "*.suffix"

On some servers this command takes days!

Any ideas?

Thanks,

score 3 · Accepted Answer

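You can use the audit subsystem to monitor file creation and deletion. Combine this with an initial run of find and you can build a database of files that is updated in real time.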
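A minimal sketch of the "initial find plus incremental updates" idea. The event format (`CREATE <path>` / `DELETE <path>` lines) and the function name `apply_events` are assumptions made for illustration. Real audit records, for example from a watch set up with something like `auditctl -w <dir> -p wa -k <key>`, look different and would have to be parsed into this shape first.

```shell
# apply_events INDEX EVENTS
# INDEX:  file with one known path per line (e.g. the output of the initial find).
# EVENTS: file with hypothetical "CREATE <path>" / "DELETE <path>" lines.
apply_events() {
    index=$1
    events=$2
    awk '
        # First file: load the existing index into memory.
        FILENAME == ARGV[1] { if ($0 != "") present[$0] = 1; next }
        # Second file: split each event into operation and path.
        { op = $1; path = substr($0, length(op) + 2) }
        op == "CREATE" && path ~ /\.suffix$/ { present[path] = 1 }
        op == "DELETE" { delete present[path] }
        END { for (p in present) print p }
    ' "$index" "$events" | sort > "$index.new" && mv "$index.new" "$index"
}
```

Usage would be along the lines of `find /myDirWithThausandsofDirectories/ -name "*.suffix" > index` once, then `apply_events index events` whenever a batch of events has been collected. Looking up files then means reading `index` instead of walking the tree.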

score 3 · Accepted Answer

Divide and conquer? Assuming a multiprocessor OS and hardware, spawn multiple find commands, one for each subfolder.

for dir in /myDirWithThausandsofDirectories/*
do find "$dir" -name "*.suffix" &
done

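Depending on the number of subdirectories, you may want to limit how many find processes run at any given time. That is a bit trickier, but doable (for example, in bash, keep an array with the PIDs of the spawned processes, taken from $!, and only start a new process when the length of the array allows it). Note also that the loop above does not search files located directly in the root directory; it is just a simple example.

If you don't know how to do process management, it's time to learn ;) This is a very good article on the subject. This is the part you really need. But read the whole thing to understand how it works.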
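A sketch of the throttled variant described above. It swaps the hand-rolled PID array for `xargs -P`, which caps the number of concurrent processes; that is a substitution, not the approach the answer spells out. The function name `parallel_find` and the default of 4 jobs are assumptions, and GNU find and xargs are assumed for `-mindepth`, `-maxdepth`, `-print0`, `-0`, `-r` and `-P`.

```shell
# parallel_find ROOT PATTERN [MAX_JOBS]
parallel_find() {
    root=$1
    pattern=$2
    max_jobs=${3:-4}        # at most this many find processes at once
    tmp=$(mktemp -d) || return 1

    # Entries directly under ROOT, which the per-directory searches would miss.
    find "$root" -maxdepth 1 ! -type d -name "$pattern" > "$tmp/root.out"

    # One find per first-level directory; xargs -P caps the concurrency.
    # Each worker appends to a file named after its own PID, so workers
    # that run at the same time never share an output file.
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -r -n 1 -P "$max_jobs" \
            sh -c 'find "$3" -name "$1" >> "$2/$$.out"' sh "$pattern" "$tmp"

    cat "$tmp"/*.out
    rm -rf "$tmp"
}
```

Called as `parallel_find /myDirWithThausandsofDirectories '*.suffix' 8`, it prints the same set of paths as a single find would, in a different order. Whether it is actually faster depends on whether the disks can serve several directory walks at once.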

score 0 · Accepted Answer

Since you're using a simple glob, you can use Bash's recursive globbing. Example:

shopt -s globstar
for path in /etc/**/*.conf
do
    echo "$path"
done

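This may be faster, since it uses built-in shell functionality that is far less flexible than find.

If you can't use Bash, but you have a limit on the path depth, you can list the different depths explicitly: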
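A small sketch that wraps the globstar loop in a function. The name `list_matches` is hypothetical, and Bash 4 or later is assumed. It also sets `nullglob`, so that a pattern with no matches produces no output instead of the literal pattern. Unlike find, the glob skips hidden files and directories unless `dotglob` is set as well.

```shell
# list_matches ROOT PATTERN: print one matching path per line.
list_matches() (
    # The function body is a subshell, so the shopt changes
    # do not leak into the calling shell.
    shopt -s globstar nullglob
    root=$1
    pattern=$2
    # $pattern is left unquoted on purpose so that it is expanded as a glob;
    # this sketch therefore assumes the pattern contains no whitespace.
    for path in "$root"/**/$pattern; do
        printf '%s\n' "$path"
    done
)
```

For example, `list_matches /etc '*.conf'` lists the same files as the loop above, including those directly under /etc.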

for path in /etc/*/*.conf /etc/*/*/*.conf /etc/*/*/*/*.conf
do
    echo "$path"
done

score 0 · Accepted Answer

Here is the code:

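# List the first-level subdirectories (GNU find), one per line.
find /myDirWithThausandsofDirectories/ -mindepth 1 -maxdepth 1 -type d > /tmp/input
IFS=$'\n' read -r -d '' -a files < /tmp/input


do_it() {
   # Search each directory passed as an argument and strip the suffix from every match.
   for f; do
      [ -n "$f" ] || continue   # the sub-lists may be padded with empty entries
      find "$f" -name '*.suffix' | sed -e 's/\.suffix$//'
   done
}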

# Divide the list into 5 sub-lists.
i=0 n=0 a=() b=() c=() d=() e=()
while ((i < ${#files[*]})); do
    a[n]=${files[i]}
    b[n]=${files[i+1]}
    c[n]=${files[i+2]}
    d[n]=${files[i+3]}
    e[n]=${files[i+4]}
    ((i+=5, n++))
done

# Process the sub-lists in parallel
do_it "${a[@]}" >> /tmp/f.unsorted 2>>/tmp/f.err &
do_it "${b[@]}" >> /tmp/f.unsorted 2>>/tmp/f.err &
do_it "${c[@]}" >> /tmp/f.unsorted 2>>/tmp/f.err &
do_it "${d[@]}" >> /tmp/f.unsorted 2>>/tmp/f.err &
do_it "${e[@]}" >> /tmp/f.unsorted 2>>/tmp/f.err &
wait
echo Find is Done!

The only problem I ran into is that some file names (a very small percentage) come out only partially. I don't know why!
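One plausible cause of the partial names (an assumption, not something confirmed in this thread) is that all five workers append to the same /tmp/f.unsorted. Output written to a file is usually flushed in fixed-size blocks, and a block can end in the middle of a line, so a block from another worker may land between the two halves of a name. A sketch that sidesteps this by giving every worker its own output file and concatenating them at the end is below. The function name `grouped_find` is hypothetical, and directory names are assumed to contain no newlines.

```shell
# grouped_find RESULT WORKERS DIR...
grouped_find() {
    result=$1
    workers=$2
    shift 2
    [ "$#" -gt 0 ] || { : > "$result"; return 0; }
    work=$(mktemp -d) || return 1

    # Deal the directories round-robin into one list file per worker.
    i=0
    for d; do
        printf '%s\n' "$d" >> "$work/list.$((i % workers))"
        i=$((i + 1))
    done

    # Each worker reads its own list and writes to its own output file.
    for list in "$work"/list.*; do
        while IFS= read -r d; do
            find "$d" -name '*.suffix'
        done < "$list" > "$list.out" &
    done
    wait

    # Merge only after every worker has finished, so no line is split.
    cat "$work"/list.*.out > "$result"
    rm -rf "$work"
}
```

With the array from the code above this would be called as `grouped_find /tmp/f.unsorted 5 "${files[@]}"`; the sed step that strips the suffix can then be run once over the merged file.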
