I have a log file that contains log statements like the following.

For example:

Before starting transaction id = <unique number>
After starting transaction id = <unique number>

....

Before starting transaction id = <unique number>
After starting transaction id = <unique number>

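When I run a simple grep for "Before" I see 400 statements, but when I run a simple grep for "After" I see 402 statements.

How can I find the places where these statements do not occur in pairs?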

3 Answers

Extract the Before and After ids, then compare them as follows:

$ diff -wb <(grep Before file | cut -d= -f2 | sort) <(grep After file | cut -d= -f2 | sort)

If your shell does not support process substitution, i.e. <(...), use temporary files instead:

$ grep Before file | cut -d= -f2 | sort > before
$ grep After file | cut -d= -f2 | sort > after
$ diff -wb before after
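
As a variation on the same idea, comm can stand in for diff when you only want to know which ids are missing a partner. This is a minimal sketch, not part of the commands above: unpaired_ids is a hypothetical helper name, and it assumes every relevant line has the form "... id = <number>".

```shell
#!/bin/sh
# Hypothetical helper: list ids that appear on only one side.
# "$1" is the log file to inspect.
unpaired_ids() {
    log=$1
    before=$(mktemp) && after=$(mktemp) || return 1
    # Keep only the id (the text after "="), strip blanks, sort for comm.
    grep Before "$log" | cut -d= -f2 | tr -d ' ' | sort > "$before"
    grep After  "$log" | cut -d= -f2 | tr -d ' ' | sort > "$after"
    # comm -3 hides the ids common to both files:
    # column 1 = Before only, column 2 (tab-indented) = After only.
    comm -3 "$before" "$after"
    rm -f "$before" "$after"
}

# Dry run on a small sample (assumed data, not from a real log).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 3
After starting transaction id = 5
EOF
unpaired_ids "$sample"
rm -f "$sample"
```

On the sample, 3 is printed in the first column (Before without After) and 5 in the tab-indented second column (After without Before).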
answered 2013-02-15T09:34:32.507

If each Before/After pair is supposed to share the same unique number:

awk -F= '{a[$2]++;}END{for(i in a)if(a[i]!=2)print "id:"i}' file

This prints the ids that are not paired.

For example:

kent$  cat file
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 2
After starting transaction id = 2
Before starting transaction id = 3
Before starting transaction id = 4
After starting transaction id = 4
After starting transaction id = 5

kent$  awk -F= '{a[$2]++;}END{for(i in a)if(a[i]!=2)print "id:"i}' file
id: 3
id: 5
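
Counting to 2 would not flag an id that has, say, two Before lines and no After line. The following sketch counts each side separately, so it also shows which half is missing. report_unpaired is a hypothetical helper name, and the script assumes that lines start with "Before" or "After" and that the id is everything after "=".

```shell
#!/bin/sh
# Hypothetical helper: report ids that do not have exactly one
# Before line and one After line. "$1" is the log file.
report_unpaired() {
    awk -F= '
        { id = $2; gsub(/[ \t]/, "", id) }      # id is the text after "="
        /^Before/ { before[id]++; seen[id] = 1 }
        /^After/  { after[id]++;  seen[id] = 1 }
        END {
            for (id in seen)
                if (before[id] != 1 || after[id] != 1)
                    printf "id %s: Before=%d After=%d\n", id, before[id], after[id]
        }
    ' "$1" | sort    # awk array order is unspecified, so sort the report
}
```

Run against the sample file above, report_unpaired file prints "id 3: Before=1 After=0" and "id 5: Before=0 After=1".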
answered 2013-02-15T09:42:53.787

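grep is not the best tool for this job either, because it cannot match across multiple lines. You can use -B1 to read the lines in pairs, but you would still need a more powerful tool (such as sed or awk) to parse them.

Here is another approach, in case you run into stray lines among the Before lines (the echo is only there so that you can dry-run it):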

$ echo 'Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 54675
Before starting transaction id = 567
After starting transaction id = 567' | 
  sort -k6 | uniq -u -f5 # end cmd
After starting transaction id = 54675

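It works by checking only for unique IDs. Since I do not know what kind of extra lines you are getting, they may be duplicates of existing entries, in which case you have to do it differently. Here is a safer approach that catches both cases and returns the entries whose ID frequency is greater or less than 2: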

$ echo 'Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 567
Before starting transaction id = 567
After starting transaction id = 567' | 
  sort -k6 | uniq -c -f5 | grep -v "^[[:space:]]*2[[:space:]]"
3 After starting transaction id = 567
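
If you only want the ids and their counts, the count that uniq -c prints can be compared as a field instead of matched with a regular expression. This is a sketch under the same assumption as above (the id is the sixth and last field); ids_not_paired is a hypothetical helper name that reads the log on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print every id whose line count is not 2.
ids_not_paired() {
    # Group lines by id (field 6), count each group, then let awk
    # compare the count ($1) and print the id (last field).
    sort -k6 | uniq -c -f5 |
        awk '$1 != 2 { print $NF, "seen", $1, "time(s)" }'
}

# Dry run with the same sample input as above.
printf '%s\n' \
    'Before starting transaction id = 123' \
    'After starting transaction id = 123' \
    'After starting transaction id = 567' \
    'Before starting transaction id = 567' \
    'After starting transaction id = 567' | ids_not_paired
```

For this input it prints "567 seen 3 time(s)".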
answered 2013-02-15T09:45:55.397