bash - Why is this non-recursive 'find' so much slower than searching the output of 'ls'?

Question

I wrote two different scripts, shown below:

# Script 1:
current_date=$(date +"%m-%d-%Y")
logsPath="/hadoop_common/smallsite/realtime/current/spark/logs"

find $logsPath -maxdepth 1 -name "*$current_date*" -print > tmp

and

# Script 2:
current_date=$(date +"%m-%d-%Y")
logsPath="/hadoop_common/smallsite/realtime/current/spark/logs"

ls $logsPath | grep "$current_date" > tmp
# Prefix each name with the directory; the trailing slash keeps the path valid.
sed -i "s|^|$logsPath/|" tmp

The first script takes 24 minutes to list 2367 file paths; the second takes 16 seconds to list the same number of file paths.

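Why is the difference so large? I have run them on different days, multiple times, and in either order. The first script always takes more than 20 minutes, and the second always takes less than 20 seconds.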
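For anyone who wants to compare the two approaches side by side, here is a minimal sketch of a timing harness. It is an assumption-laden reproduction, not the setup from the question: the scratch directories `dir` and `out`, the 200 generated file names, and the `sort`/`cmp` check are all hypothetical. On a small local directory both commands will most likely finish almost instantly, so this only shows how to run the comparison and confirm that both approaches produce the same paths.

```shell
#!/bin/bash
# Hypothetical reproduction: build a scratch directory and time both listings.

dir=$(mktemp -d)            # scratch directory (assumption, not the real logs path)
out=$(mktemp -d)            # holds the two result lists
today=$(date +"%m-%d-%Y")

# Create 200 empty files; half of them carry today's date in the name.
for i in $(seq 1 100); do
    : > "$dir/app-$today-$i.log"
    : > "$dir/app-01-01-1970-$i.log"
done

# Approach 1: find, limited to the top level of the directory.
time find "$dir" -maxdepth 1 -name "*$today*" -print | sort > "$out/find.txt"

# Approach 2: ls piped through grep, then prefix each name with the directory.
time ls "$dir" | grep -- "$today" | sed "s|^|$dir/|" | sort > "$out/ls.txt"

# Both approaches should produce the same list of paths.
cmp "$out/find.txt" "$out/ls.txt" && echo "identical: $(wc -l < "$out/find.txt") paths"
```

The `time` keyword in bash times the whole pipeline and reports on stderr, so the timings do not end up in the result files.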

OS: Red Hat

Update 2

find (GNU findutils) 4.4.2

I ran this script:

#!/bin/bash

current_date=$(date +"%m-%d-%Y")

logsPath="/hadoop_common/smallsite/realtime/current/spark/logs"

touch resultStat

date >> resultStat

ls $logsPath | grep "$current_date" > tmp1
# Prefix each name with the directory; the trailing slash keeps the path valid.
sed -i "s|^|$logsPath/|" tmp1

date >> resultStat

find $logsPath -maxdepth 1 -type f -name "*$current_date*" -print > tmp2

date >> resultStat

strace ls "$logsPath" 2>&1 | grep -c stat >> resultStat

date >> resultStat

strace find "$logsPath" -maxdepth 1 -name "*$current_date*" -print 2>&1 | grep -c stat >> resultStat

date >> resultStat

ls -1q $logsPath | wc -l >> resultStat

date >> resultStat

files=(/hadoop_common/smallsite/realtime/current/spark/logs/*"$(date +%m-%d-%Y)"* )

date >> resultStat

Contents of resultStat:

Thu Mar 29 19:14:28 UTC 2018
Thu Mar 29 19:14:47 UTC 2018
Thu Mar 29 19:41:26 UTC 2018
14
Thu Mar 29 19:41:44 UTC 2018
189805
Thu Mar 29 20:08:30 UTC 2018
190348
Thu Mar 29 20:08:48 UTC 2018
Thu Mar 29 20:09:06 UTC 2018
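To make the timestamps easier to read, the sketch below turns each consecutive pair into an elapsed time in seconds. The timestamps are copied from resultStat. The `labels` array and the `elapsed` helper are my own shorthand for the steps of the script, and the `-d` option used to parse the dates assumes GNU date.

```shell
#!/bin/bash
# Timestamps copied from resultStat, in the order they were written.
stamps=(
    "Thu Mar 29 19:14:28 UTC 2018"
    "Thu Mar 29 19:14:47 UTC 2018"
    "Thu Mar 29 19:41:26 UTC 2018"
    "Thu Mar 29 19:41:44 UTC 2018"
    "Thu Mar 29 20:08:30 UTC 2018"
    "Thu Mar 29 20:08:48 UTC 2018"
    "Thu Mar 29 20:09:06 UTC 2018"
)

# Shorthand names (assumption) for the step that ran between each pair of stamps.
labels=(
    "ls | grep + sed"
    "find -maxdepth 1 -type f"
    "strace ls"
    "strace find"
    "ls -1q | wc -l"
    "glob into array"
)

# Print the number of seconds between two timestamps (needs GNU date -d).
elapsed() {
    echo $(( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ))
}

for i in "${!labels[@]}"; do
    printf '%-26s %5d s\n' "${labels[i]}" "$(elapsed "${stamps[i]}" "${stamps[i+1]}")"
done
```

Read this way, both `find` runs take roughly 1600 seconds (1599 s and 1606 s), while every `ls`-based step and the shell glob take 18 to 19 seconds. Note that `grep -c stat` counts every strace output line containing "stat", so the figures 14 and 189805 cover all stat-like calls (and any other line that happens to contain the string), not just `stat` itself.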
