bash - Best way to parse a log file

Question

I have a log file that looks like this:

Client connected with ID 8127641241
< multiple lines of unimportant log here>
Client not responding
Total duration: 154.23583
Sent: 14
Received: 9732
Client lost

Client connected with ID 2521598735
< multiple lines of unimportant log here>
Client not responding
Total duration: 12.33792
Sent: 2874
Received: 1244
Client lost

The log contains many blocks that start with Client connected with ID 1234 and end with Client lost. They never interleave (only one client at a time).

How would I parse this file and generate statistics like this:

[image in the original post; not reproduced here]

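I'm mainly asking about the parsing process, not the formatting.

I suppose I could loop over all the lines, set a flag when I find a Client connected line, and save the ID in a variable. Then grep the following lines and save the values until I find the Client lost line. Is this a good approach? Is there a better one?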

score 3 · Accepted Answer

Here's a quick way using awk:

awk 'BEGIN { print "ID Duration Sent Received" } /^(Client connected|Total duration:|Sent:)/ { printf "%s ", $NF } /^Received:/ { print $NF }' file | column -t

Results:

ID          Duration   Sent  Received
8127641241  154.23583  14    9732
2521598735  12.33792   2874  1244

score 2 · Accepted Answer

If you're sure the log file won't contain errors and the fields always appear in the same order, you can use the following:

#!/bin/bash

ids=()
# Associative arrays (bash 4+), keyed by client ID
declare -A duration
declare -A sent
declare -A received
while read -r _ _ _ _ id; do
   ids+=( "$id" )
   read -r _ _ "duration[$id]"
   read -r _ "sent[$id]"
   read -r _ "received[$id]"
done < <(grep -E '^(Client connected with ID|Total duration:|Sent:|Received:)' logfile)

# printing the data out, for control purposes only
for id in "${ids[@]}"; do
   printf "ID=%s\n\tDuration=%s\n\tSent=%s\n\tReceived=%s\n" "$id" "${duration[$id]}" "${sent[$id]}" "${received[$id]}"
done

The output is:

$ ./parsefile
ID=8127641241
    Duration=154.23583
    Sent=14
    Received=9732
ID=2521598735
    Duration=12.33792
    Sent=2874
    Received=1244

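But the data is stored in the corresponding associative arrays. This is fairly efficient. It would probably be slightly more efficient in another programming language (e.g. perl), but since you only tagged your post with bash, sed, and grep, I think I fully answered your question.

Explanation: grep filters only the lines we're interested in, and bash reads only the fields we're interested in, assuming they always appear in the same order. The script should be easy to understand and to modify to your needs.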

score 2 · Accepted Answer

awk:

awk 'BEGIN{print "ID Duration Sent Received"}/with ID/&&!f{f=1}f&&/Client lost/{print a[1],a[2],a[3],a[4];f=0}f{for(i=1;i<=NF;i++){
        if($i=="ID")a[1]=$(i+1)
        if($i=="duration:")a[2]=$(i+1)
        if($i=="Sent:")a[3]=$(i+1)
        if($i=="Received:")a[4]=$(i+1)
}}' log

If there is always a blank line between your data blocks, the awk script above can be simplified to:

 awk -vRS="" 'BEGIN{print "ID Duration Sent Received"}
{for(i=1;i<=NF;i++){
        if($i=="ID")a[1]=$(i+1)
        if($i=="duration:")a[2]=$(i+1)
        if($i=="Sent:")a[3]=$(i+1)
        if($i=="Received:")a[4]=$(i+1)
}print a[1],a[2],a[3],a[4];}' log

Output:

ID Duration Sent Received
8127641241 154.23583 14 9732
2521598735 12.33792 2874 1244

If you want nicer formatting, pipe the output through column -t

You get:

ID          Duration   Sent  Received
8127641241  154.23583  14    9732
2521598735  12.33792   2874  1244

score 2 · Accepted Answer

A solution in perl

#!/usr/bin/perl

use warnings;
use strict;

print "\tID\tDuration\tSent\tReceived\n";

while (<>) {
  chomp;
  if (/Client connected with ID (\d+)/) {
    print "$1\t";
  }
  if (/Total duration: ([\d\.]+)/) {
    print "$1\t";
  }
  if (/Sent: (\d+)/) {
    print "$1\t";
  }
  if (/Received: (\d+)/) {
    print "$1\n";
  }
}

Sample output:

        ID  Duration    Sent    Received
8127641241  154.23583   14  9732
2521598735  12.33792    2874    1244

score 1 · Accepted Answer

Slurp the file using paragraph mode

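With either Perl or AWK, you can use a special paragraph mode to slurp in records, treating the blank lines between records as the separator. In Perl, use -00 to enable paragraph mode; in AWK, set the RS variable to the empty string (e.g. "") to do the same thing. You can then parse the fields within each record.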
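As a rough illustration of the AWK paragraph mode described above, here is a minimal sketch. It assumes blocks are separated by at least one blank line and that the four lines of interest begin exactly as in the question's sample. The function name parse_paragraphs and the "-" placeholder for missing fields are made up for this example.

```shell
#!/bin/sh
# parse_paragraphs: hypothetical helper name, not part of the original answer.
# Reads blank-line-separated records from the given files (or stdin)
# and prints one row per record.
parse_paragraphs() {
    awk -v RS= -F'\n' '
    BEGIN { print "ID Duration Sent Received" }
    {
        # With RS empty, each record is one block and each field is one line.
        id = dur = sent = recv = "-"      # "-" marks a missing field (assumption)
        for (i = 1; i <= NF; i++) {
            n = split($i, w, " ")         # w[n] is the last word on the line
            if      ($i ~ /^Client connected with ID /) id   = w[n]
            else if ($i ~ /^Total duration:/)           dur  = w[n]
            else if ($i ~ /^Sent:/)                     sent = w[n]
            else if ($i ~ /^Received:/)                 recv = w[n]
        }
        print id, dur, sent, recv
    }' "$@"
}
```

It could be called as parse_paragraphs logfile | column -t. Because each field is matched by its prefix rather than its position, the order of the lines inside a block should not matter.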

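Use line-oriented statements

Alternatively, you can use a shell while loop to read the file one line at a time, and then parse each line with grep or sed. You may even be able to use a case statement, depending on the complexity of your parsing.

For example, assuming you always have 5 matching fields in a record, you could do something like this: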

while read; do
    grep -Eo '[[:digit:]]+'
done < /tmp/foo | xargs -n5 | sed 's/ /\t/g'

The loop would yield:

23583   14  9732    2521598735  33792
2874    1244    8127641241  23583   14
9732    2521598735  33792   2874    1244

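You can certainly play with the formatting, add a header line, and so forth. The point is that you have to know your data.

AWK, Perl, or even Ruby are better options for parsing record-oriented formats, but the shell is certainly an option if your needs are basic.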
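The case-statement variant mentioned above might look roughly like the following sketch. It assumes every block ends with a Client lost line and that the lines of interest start exactly as in the question's sample. The function name parse_lines and the "-" placeholder are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# parse_lines: hypothetical helper name, not part of the original answer.
# Reads the log on stdin and prints one row per Client connected ... Client lost block.
parse_lines() {
    id="" duration="" sent="" received=""
    echo "ID Duration Sent Received"
    while IFS= read -r line; do
        case $line in
            "Client connected with ID "*) id=${line##* } ;;        # keep the last word
            "Total duration: "*)          duration=${line##* } ;;
            "Sent: "*)                    sent=${line##* } ;;
            "Received: "*)                received=${line##* } ;;
            "Client lost")
                # End of block: emit the row ("-" for anything missing), then reset.
                printf '%s %s %s %s\n' "${id:--}" "${duration:--}" "${sent:--}" "${received:--}"
                id="" duration="" sent="" received=""
                ;;
        esac
    done
}
```

Usage would be something like parse_lines < logfile | column -t. This is essentially the flag-and-variable approach from the question, with Client lost acting as the point where the collected values are flushed.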

score 0 · Accepted Answer

A short Perl snippet:

perl -ne '
    BEGIN {print "ID Duration Sent Received\n";}
    print "$1 " if /(?:ID|duration:|Sent:|Received:) (.+)$/;
    print "\n" if /^Client lost/;
' filename | column -t

score 0 · Accepted Answer

awk -v RS= -F'\n' '
BEGIN{ printf "%15s%15s%15s%15s\n","ID","Duration","Sent","Received" }
{
   for (i=1;i<=NF;i++) {
      n = split($i,f,/ /)    
      if ( $i ~ /^(Client connected|Total duration:|Sent:|Received:)/ ) {
         printf "%15s",f[n]
      }
   }
   print ""
}'
