linux - Faster way to merge multiple files

Question

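I have many small files in Linux (about 70,000 of them). I want to append a word to the end of every line in each file and then merge them all into a single file.

I am using this script: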

for fn in *.sms.txt 
do 
    sed 's/$/'$fn'/' $fn >> sms.txt
    rm -f $fn
done

Is there a faster way to do this?

score 6 · Accepted Answer

I tested with these files:

for ((i=1;i<70000;++i)); do printf -v fn 'file%.5d.sms.txt' $i; echo -e "HAHA\nLOL\nBye" > "$fn"; done

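I tried your solution, and it took about 4 minutes (real time) to process them. The problem with your solution is that you fork sed 70,000 times, and forking is rather slow.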
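To see the per-process cost on a smaller scale, here is a minimal sketch that runs one sed per file and a single awk process over the same sample, then checks that both produce the same output. The sample size (500), the temporary directory and the function names are assumptions made for this illustration.

```shell
#!/bin/bash
# Sketch only: the sample size (500), the temp directory and the function
# names are assumptions made for this illustration.

per_file_sed() {
    # One sed process is started for every input file.
    for fn in *.sms.txt; do
        sed "s/\$/$fn/" "$fn"
    done
}

single_awk() {
    # A single awk process reads all files; FILENAME is the current input file.
    awk '{ print $0 FILENAME }' *.sms.txt
}

workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Small sample in the same shape as the test files above.
i=1
while [ "$i" -le 500 ]; do
    printf 'HAHA\nLOL\nBye\n' > "$(printf 'file%05d.sms.txt' "$i")"
    i=$((i + 1))
done

per_file_sed > loop.out
single_awk > awk.out

# Both approaches should produce byte-identical output.
cmp loop.out awk.out && echo "outputs match"
```

In bash, running `time per_file_sed > /dev/null` and `time single_awk > /dev/null` afterwards should show the gap; the actual numbers depend on the machine.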

#!/bin/bash

filename="sms.txt"

# Create file "$filename" or empty it if it already existed
> "$filename"

# Start editing with ed, the standard text editor
ed -s "$filename" < <(
   # Go into insert mode:
   echo i
   # Loop through files
   for fn in *.sms.txt; do
      # Loop through lines of file "$fn" (IFS= and -r keep whitespace and backslashes intact)
      while IFS= read -r l; do
         # Insert line "$l" with "$fn" appended to
         echo "$l$fn"
      done < "$fn"
   done
   # Tell ed to quit insert mode (.), to save (w) and quit (q)
   echo -e ".\nwq"
)

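This solution took about 6 seconds.

Don't forget that ed is the standard text editor, so don't overlook it! If you like ed, you will probably like ex too!

Cheers!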

score 2 · Accepted Answer

Almost the same as gniourf_gniourf's solution, but without ed:

for i in *.sms.txt
do
   # IFS= and -r keep leading whitespace and backslashes intact
   while IFS= read -r line
   do
     echo "$line $i"
   done < "$i"
done >sms.txt

score 2 · Accepted Answer

What, no love for awk?

awk '{print $0" "FILENAME}' *.sms.txt >sms.txt

With gawk, this took 1-2 seconds (according to time) on gniourf_gniourf's sample on my machine.

mawk is about 0.2 seconds faster than gawk here.
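One caveat with passing `*.sms.txt` to an external command: with tens of thousands of files, or longer names, the expanded list can exceed the kernel's argument-size limit and fail with "Argument list too long". A possible workaround, sketched here under the assumption that a find with `-maxdepth` (GNU or BSD) is available, is to let find hand the names to awk in batches. `merge_with_find` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: merge_with_find is a hypothetical helper name.
merge_with_find() {
    # "-exec ... {} +" makes find pass the names in batches that fit the
    # system's argument-size limit, so awk runs a handful of times at most.
    # find reports names as ./name, so the leading "./" is stripped
    # before the name is appended.
    find . -maxdepth 1 -type f -name '*.sms.txt' -exec awk '
        FNR == 1 { name = FILENAME; sub(/^\.\//, "", name) }
        { print $0 " " name }
    ' {} +
}

# Usage (output order follows find, not the sorted glob order):
# merge_with_find > sms.txt
```

The files come out in whatever order find returns them, so this only fits when the order of the merged lines does not matter.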

score 1 · Accepted Answer

This perl script appends the actual filename to the end of each line.

#!/usr/bin/perl
use strict;
while(<>){
    chomp;
    print $_, $ARGV, "\n";
}

Call it like this:

scriptname *.sms.txt > sms.txt

Since there is only one process and no regular expression processing is involved, it should be quite fast.
