bash - How to quickly concatenate many copies of a file into another file?

Question

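I need to stress-test a program by feeding it progressively larger input files. I have an input file, inputSmall.txt, and I want to make N copies of it and cat them all into a single file, where N is a large number. Is there a faster way than the following simple loop (here with N=1000)?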

for i in {1..1000}
do 
    cat inputSmall.txt >> input1000.txt
done

My machine has enough disk space to store inputN.txt for very large values of N, and it also has plenty of RAM, in case that matters.

Thanks

score 0 · Accepted Answer

When you write

for i in {1..1000}

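you are telling the shell to first expand every number from 1 to 1000 into the command buffer and only then iterate over them. For large ranges this is not only slow but also adds a significant memory requirement (see, for example, this post on unix.se).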

In bash, you can avoid all of this with the following syntax:

for ((i=1; i<=1000;i++))

As a bonus, this allows the bounds to be variables.
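To make the point about variable bounds concrete, here is a small sketch comparing the two loop forms. The variable names (n, brace_count, arith_count) are made up for illustration. Brace expansion runs before variable expansion, so a range written with a variable is not expanded into a list at all.

```shell
#!/usr/bin/env bash
n=5   # hypothetical copy count, chosen only for this demo

# Brace expansion happens before variable expansion, so {1..$n} is not a
# valid range when the braces are processed. The loop body runs once, with
# i set to the literal word "{1..5}".
brace_count=0
for i in {1..$n}; do
  brace_count=$((brace_count + 1))
done

# The arithmetic for loop evaluates n at run time and never builds the
# full list of numbers in memory.
arith_count=0
for ((i = 1; i <= n; i++)); do
  arith_count=$((arith_count + 1))
done

echo "brace loop iterations:      $brace_count"
echo "arithmetic loop iterations: $arith_count"
```

So if N comes from a variable or a script argument, the arithmetic form is the one that behaves as intended.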

score 0 · Accepted Answer

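cat is an external command, not part of the shell; like every external command, it carries significant startup overhead. Likewise, running >>input1000.txt is a fairly expensive filesystem operation: the shell has to look up the inode associated with the directory, open the file, and then (on leaving scope) flush the contents and close the file.

It is much more efficient to do these things only once.

Assuming the last line of inputSmall.txt ends with a newline, the following will work correctly, with far less overhead: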

in=$(<inputSmall.txt)        # read the input file only once
exec 3>>input1000.txt        # open the output file only once

for ((i=0; i<1000; i++)); do
  printf '%s\n' "$in" >&3    # write the input from memory to the output fd
done
exec 3>&-                    # close the output fd
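If you would rather keep cat (for example because the input might not end in exactly one newline, which the in=$(<...) approach relies on), a middle ground is to put the redirection on the whole loop so the output file is still opened only once. This is a sketch, not a benchmark; cat_n_times is a hypothetical helper name, and note that it truncates the destination with > instead of appending with >>.

```shell
#!/usr/bin/env bash
# cat_n_times SRC DST N  (hypothetical helper)
# Writes N byte-exact copies of SRC into DST. cat is still started N times,
# but DST is opened and closed only once, by the redirection on the loop.
cat_n_times() {
  local src=$1 dst=$2 n=$3 i
  for ((i = 0; i < n; i++)); do
    cat "$src"
  done > "$dst"
}
```

Usage would look like: cat_n_times inputSmall.txt input1000.txt 1000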

score 0 · Accepted Answer

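This way you can get there much faster, "exponentially", by doubling the file on each pass, but you need some extra disk space for the tmp file.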

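input=$1

new=${input}.new.txt
tmp=${input}.tmp.txt

cat "$input" > "$new"         # start with a single copy

# 2^10=1024
for ((i=0; i<10; i++))
do
        cat "$new" > "$tmp"   # overwrite (not append), so tmp is an exact snapshot of new
        cat "$tmp" >> "$new"  # append the snapshot: new doubles on every pass
done

rm "$tmp"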
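The doubling idea can be extended to an arbitrary N rather than only powers of two by walking over the binary digits of N. The following is only a sketch under that assumption; repeat_file and its local variables are hypothetical names, and it uses mktemp for its scratch files.

```shell
#!/usr/bin/env bash
# repeat_file SRC DST N  (hypothetical helper)
# Writes N copies of SRC into DST using roughly 2*log2(N) cat calls.
repeat_file() {
  local src=$1 dst=$2 n=$3
  local chunk tmp
  chunk=$(mktemp) || return 1
  tmp=$(mktemp) || return 1

  cat "$src" > "$chunk"        # chunk holds 2^k copies, starting with k=0
  : > "$dst"                   # create or truncate the destination

  while (( n > 0 )); do
    if (( n & 1 )); then
      cat "$chunk" >> "$dst"   # this bit of N is set: append the current chunk
    fi
    n=$(( n >> 1 ))
    if (( n > 0 )); then
      cat "$chunk" "$chunk" > "$tmp"   # double the chunk for the next bit
      mv "$tmp" "$chunk"
    fi
  done

  rm -f "$chunk" "$tmp"
}
```

Usage would look like: repeat_file inputSmall.txt input1000.txt 1000. The scratch file grows to at most about N copies, so the extra disk space needed is in the same range as the final output.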
