shell - Generate a summary report from logs: add up the output of commands (using AWK/SED or any other means) and format the output

Question

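I process multiple files at once. Each file has summary statistics. At the end of the process, I want to create a summary file that adds up all the statistics. I already know how to mine the statistics from the log files, but I want to be able to add up the numbers and echo them to another file. This is what I use to mine the time.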

find . -iname "$srch1*" -exec grep "It took" {} \; -print

The output looks like this:

    It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000010-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 56 seconds to process that file.
./filepart000007-20140204-154923.dat.gz.log
It took 0 hours, 29 minutes and 54 seconds to process that file.
./filepart000001-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 33 seconds to process that file.
./filepart000004-20140204-154923.dat.gz.log
It took 0 hours, 59 minutes and 38 seconds to process that file.
./filepart000000-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 50 seconds to process that file.
./filepart000005-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 10 seconds to process that file.
./filepart000002-20140204-154923.dat.gz.log
It took 0 hours, 10 minutes and 39 seconds to process that file.
./filepart000008-20140204-154923.dat.gz.log
It took 0 hours, 12 minutes and 27 seconds to process that file.
./filepart000009-20140204-154923.dat.gz.log
It took 0 hours, 22 minutes and 36 seconds to process that file.
./filepart000003-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 40 seconds to process that file.
./filepart000006-20140204-154923.dat.gz.log

What I want is something like this:

Summary 
filepart000006-20140204-154923.dat.gz.log  0 hours, 11 minutes and 40 seconds

Then find the longest of these times and print a message such as:

 Total time taken =____________

I run these jobs in parallel, so the total time taken is the longest individual time.

Then do some calculations like this.

find . -iname "$srch*" -exec grep "Processed Files" {} \; -print

        Processed Files:   7936635
./filename-20131102-part000000-20140204-153310.dat.gz.log
        Processed Files:   3264805
./filename-20131102-part000001-20140204-153310.dat.gz.log
        Processed Files:   1607547
./filename-20131102-part000008-20140204-153310.dat.gz.log
        Processed Files:   3180478
./filename-20131102-part000003-20140204-153310.dat.gz.log
        Processed Files:   1595497
./filename-20131102-part000007-20140204-153310.dat.gz.log
        Processed Files:   1568532
./filename-20131102-part000009-20140204-153310.dat.gz.log
        Processed Files:   3259884
./filename-20131102-part000002-20140204-153310.dat.gz.log
        Processed Files:   3141542
./filename-20131102-part000004-20140204-153310.dat.gz.log
        Processed Files:   3124221
./filename-20131102-part000005-20140204-153310.dat.gz.log
        Processed Files:   3136845
./filename-20131102-part000006-20140204-153310.dat.gz.log

If I only want the metrics:

( find . -iname "dl-aster-full-20131102*" -exec grep "Processed Files" {} \;) | cut -d":" -f2
   7936635
   3264805
   1607547
   3180478
   1595497
   1568532
   3259884
   3141542
   3124221
   3136845

Based on the two outputs above, create a single summary file.

Filename                                                  Processed files 
filename-20131102-part000000-20140204-153310.dat.gz.log   7936635

....followed by a total of all of the above.

   ( 7936635 +
   3264805 +
   1607547 +
   3180478.....etc
   1595497
   1568532
   3259884
   3141542
   3124221
   3136845 ) as 


 Total Files = ____________

So overall it should look like this.

Filename                                                  Processed files 
    filename-20131102-part000000-20140204-153310.dat.gz.log   7936635
     Total Files = ____________ ( sum of all above )

All that needs to be done is to get the output in this format:

 Filename                                                  Processed files 
    filename-20131102-part000000-20140204-153310.dat.gz.log   7936635

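In my commands above, the file name and the value are on separate lines. After that, the numbers that have already been output need to be summed.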

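My questions are:
- How can I perform the addition as shown above, using any tool? I would rather avoid Perl, because I am not sure it will be installed everywhere the shell runs.
- How can I format the output as shown above? I already know how to extract the output.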

score 2 · Accepted Answer

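With the sed command below you can get the file name and the grep result on one line; the rest should then be easy. (This assumes grep returns exactly one matching line per file.)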

find . -iname "$srch1*" -exec grep "It took" {} \; -print |sed -r 'N;s/(.*)\n(.*)/\2 \1/'

./filepart000010-20140204-154923.dat.gz.log    It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000007-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 56 seconds to process that file.
./filepart000001-20140204-154923.dat.gz.log It took 0 hours, 29 minutes and 54 seconds to process that file.
./filepart000004-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 33 seconds to process that file.
./filepart000000-20140204-154923.dat.gz.log It took 0 hours, 59 minutes and 38 seconds to process that file.
./filepart000005-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 50 seconds to process that file.
./filepart000002-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 10 seconds to process that file.
./filepart000008-20140204-154923.dat.gz.log It took 0 hours, 10 minutes and 39 seconds to process that file.
./filepart000009-20140204-154923.dat.gz.log It took 0 hours, 12 minutes and 27 seconds to process that file.
./filepart000003-20140204-154923.dat.gz.log It took 0 hours, 22 minutes and 36 seconds to process that file.
./filepart000006-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 40 seconds to process that file.
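
To see what the `N;s/(.*)\n(.*)/\2 \1/` step does in isolation, here is a minimal sketch that feeds two records from the question's output through sed. The variable names `sample` and `joined` are made up for the illustration, and `-E` is used in place of the answer's `-r` (GNU sed accepts both). It relies on the same assumption as the answer: exactly one matching line per file, so lines always arrive in pairs.

```shell
# Sample input mimicking the find/grep output: match line first, file name second.
# (Both records are copied from the output shown in the question.)
sample='It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000010-20140204-154923.dat.gz.log
It took 0 hours, 11 minutes and 56 seconds to process that file.
./filepart000007-20140204-154923.dat.gz.log'

# N appends the next input line to the pattern space; the substitution then
# swaps the two halves so that the file name comes first.
joined=$(printf '%s\n' "$sample" | sed -E 'N;s/(.*)\n(.*)/\2 \1/')
printf '%s\n' "$joined"
```

If a file ever produced two matching lines, the pairing would drift, so it may be worth checking the line count before trusting the result.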


find . -iname "$srch*" -exec grep "Processed Files" {} \; -print| sed -r 'N;s/(.*)\n(.*)/\2 \1/' 
./filename-20131102-part000000-20140204-153310.dat.gz.log         Processed Files:   7936635
./filename-20131102-part000001-20140204-153310.dat.gz.log         Processed Files:   3264805
./filename-20131102-part000008-20140204-153310.dat.gz.log         Processed Files:   1607547
./filename-20131102-part000003-20140204-153310.dat.gz.log         Processed Files:   3180478
./filename-20131102-part000007-20140204-153310.dat.gz.log         Processed Files:   1595497
./filename-20131102-part000009-20140204-153310.dat.gz.log         Processed Files:   1568532
./filename-20131102-part000002-20140204-153310.dat.gz.log         Processed Files:   3259884
./filename-20131102-part000004-20140204-153310.dat.gz.log         Processed Files:   3141542
./filename-20131102-part000005-20140204-153310.dat.gz.log         Processed Files:   3124221
./filename-20131102-part000006-20140204-153310.dat.gz.log         Processed Files:   3136845
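
As a possible alternative to joining lines with sed (not what this answer uses), awk can print the file name itself through its built-in `FILENAME` variable, which avoids depending on the two-line pairing. The sketch below builds two hypothetical log files in a scratch directory; the file names, their contents, and the variables `dir` and `pairs` are invented for the illustration.

```shell
# Build two small sample logs in a scratch directory (hypothetical contents).
dir=$(mktemp -d)
printf 'header\n        Processed Files:   100\n' > "$dir/part000000.log"
printf 'header\n        Processed Files:   250\n' > "$dir/part000001.log"

# awk exposes the name of the file being read as FILENAME, so each matching
# line can be printed together with its file name in a single step.
# sort is only there to make the order of the find results predictable.
pairs=$(find "$dir" -iname 'part*' \
    -exec awk '/Processed Files/ { print FILENAME, $NF }' {} + | sort)
printf '%s\n' "$pairs"

rm -rf "$dir"
```

Each output line has the form `path value`, which is the same shape the sed approach produces, so the later awk steps would apply unchanged.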

If you need to compute the longest time and the total time, use the script below (you should be able to format the output yourself).

find . -iname "$srch1*" -exec grep "It took" {} \; -print |sed -r 'N;s/(.*)\n(.*)/\2 \1/' > temp1
awk 'function s2t(x) { h=int(x/3600);m=int((x-h*3600)/60);s=x-h*3600-m*60}
{a=$4*3600+$6*60+$9;max=a>max?a:max;t+=a}
END{ s2t(max);print "max is",h,m,s;
s2t(t);print "sum is " ,h,m,s}' temp1

max is 0 59 38
sum is  3 46 27
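
For the layout the question asks for (a `Summary` heading, one line per file, and the longest time as the total), a sketch along the following lines might work. It assumes input in the joined `filename It took H hours, M minutes and S seconds ...` form shown above; the three sample records are taken from that output, and the variable names `joined` and `report` are made up for the illustration.

```shell
# Three joined records copied from the output above (file name first).
joined='./filepart000010-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 4 seconds to process that file.
./filepart000000-20140204-154923.dat.gz.log It took 0 hours, 59 minutes and 38 seconds to process that file.
./filepart000006-20140204-154923.dat.gz.log It took 0 hours, 11 minutes and 40 seconds to process that file.'

report=$(printf '%s\n' "$joined" | awk '
BEGIN { print "Summary" }
{
    name = $1
    sub(/^\.\//, "", name)            # drop the leading "./" that find adds
    secs = $4 * 3600 + $6 * 60 + $9   # fields 4, 6 and 9 hold hours, minutes, seconds
    if (secs > max) max = secs        # jobs ran in parallel: total is the longest one
    printf "%s  %d hours, %d minutes and %d seconds\n", name, $4, $6, $9
}
END {
    h = int(max / 3600); m = int((max % 3600) / 60); s = max % 60
    printf "Total time taken = %d hours, %d minutes and %d seconds\n", h, m, s
}')
printf '%s\n' "$report"
```

With these three records the last line reads `Total time taken = 0 hours, 59 minutes and 38 seconds`. Redirecting the final `printf` to a file would give the summary file.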

For the second one:

find . -iname "$srch*" -exec grep "Processed Files" {} \; -print| sed -r 'N;s/(.*)\n(.*)/\2 \1/'  > temp2
awk '{sum+=$NF}END{print "Total Files = ", sum}' temp2

Total Files =  31815986
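
To get the two-column table from the question together with the total in one pass, the same idea can be extended with `printf`. This is only a sketch: the column width of 57 is an arbitrary choice that happens to fit these file names, the three sample records are copied from the joined output above, and `joined` and `report` are illustrative variable names.

```shell
# Three joined records copied from the output above (file name first).
joined='./filename-20131102-part000000-20140204-153310.dat.gz.log         Processed Files:   7936635
./filename-20131102-part000001-20140204-153310.dat.gz.log         Processed Files:   3264805
./filename-20131102-part000008-20140204-153310.dat.gz.log         Processed Files:   1607547'

report=$(printf '%s\n' "$joined" | awk '
BEGIN { printf "%-57s %s\n", "Filename", "Processed files" }
{
    name = $1
    sub(/^\.\//, "", name)           # drop the leading "./" that find adds
    printf "%-57s %d\n", name, $NF   # the count is the last field on the line
    sum += $NF
}
END { printf "Total Files = %d\n", sum }')
printf '%s\n' "$report"
```

For the three sample records the final line is `Total Files = 12808987`.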
