regex - Extract specific lines within a range pattern from a log file

Question

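I am currently trying to build an automated process that dynamically parses selected, particularly large log files (25 MB+) and returns them to the user through a Java Servlet.

Because these logs are so large, I am trying to run Linux text-processing commands to retrieve only the sections relevant to the user before loading anything into memory. These sections can be scattered throughout the log.

I am still early in learning regular expressions and text-processing tools such as sed, and I hope someone can point me in the right direction with my current problem.

I have a series of logs in which one line refers to a specific item (e.g. KEY1), followed by an unknown number of lines of information about that item.

The log then switches to the next item and the pattern repeats.

I am trying to find out whether some combination of Linux text commands can take a file of the format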

This is the first line and should not display
This is a section containing the text KEY1
Line 1
Line 2
Line 3
Line 4
This is a section containing the text KEY2
BadLine 1
BadLine 2
This is a second section containing the text KEY1
Line 5
Line 6
This is a section containing the text KEY3
BadLine 3
BadLine 4
BadLine 5
BadLine 6
This is a third section containing the text KEY1
Line 7
Line 8
Line 9
This is the last line

and return:

This is a section containing the text KEY1
Line 1
Line 2
Line 3
Line 4
This is a second section containing the text KEY1
Line 5
Line 6
This is a third section containing the text KEY1
Line 7
Line 8
Line 9
This is the last line

The command

sed -n '/KEY1/,/KEY2/p' file

works for the first section, but I am having trouble finding a general way to extract everything I need.
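The range /KEY1/,/KEY2/ ends only at KEY2 specifically. The second KEY1 section never meets another KEY2, so that range runs to the end of the file and also prints the KEY3 section. One general alternative is to drop the start/end range and keep a flag that every key line resets: a KEY1 line switches printing on, any other key line switches it off, and each line is printed only while the flag is on. Below is a minimal awk sketch. It assumes every section header contains KEY followed by digits, and extract_key is a hypothetical helper name:

```shell
# extract_key KEY FILE  (hypothetical helper)
# Prints every section whose header contains KEY. A "header" is assumed
# to be any line containing KEY followed by digits (KEY1, KEY2, ...).
extract_key() {
  awk -v key="$1" '
    # On a header line, turn printing on only if this header is ours.
    # The ([^0-9]|$) guard stops KEY1 from also matching KEY10, KEY11, ...
    /KEY[0-9]+/ { p = ($0 ~ (key "([^0-9]|$)")) }
    # Print while the flag is on (headers included). The flag starts out
    # unset, so lines before the first header are skipped.
    p
  ' "$2"
}
```

Usage: extract_key KEY1 file. awk reads the file one line at a time, so memory use does not grow with the size of the log.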

Any help would be appreciated.

Thanks

- EDIT -

2013/06/20 03:10:01 PM| FINE |S9180 |[Device] [ID:128] 
foo
bar
foo
bar
------------------------------------------
foo
bar
------------------------------------------
2013/06/20 03:10:02 PM| FINE |S9180 |[Device] [ID:132] 
Other foo
Other bar
------------------------------------------
Other foo
Other bar
Other foo
Other bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128] 
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar

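To clarify, this is the format I am actually working with. I am trying to get all of the information for one specific device from the log: for example, all the text under the key [ID:128], while ignoring the sections under [ID:132] (or under any ID other than 128, because devices appear in no particular order).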
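The same flag idea can be adapted to this format. Here, any line containing [ID:n] is treated as the start of a new block, and printing stays on only when that ID is the requested one. This is a sketch under that assumption. extract_device is a hypothetical name, and the sketch has not been checked against the real log files:

```shell
# extract_device ID FILE  (hypothetical helper)
# Prints every block belonging to device ID. A block is assumed to start
# at a line containing "[ID:<digits>]" and to run until the next such line.
extract_device() {
  awk -v id="$1" '
    # Header line: keep printing only if this header carries our ID.
    # index() does a literal substring match, so the brackets need no
    # escaping, and because the needle ends in "]", [ID:12] never
    # matches a [ID:128] header.
    /\[ID:[0-9]+\]/ { p = (index($0, "[ID:" id "]") > 0) }
    p
  ' "$2"
}
```

On the sample above, extract_device 128 file should print both [ID:128] blocks and skip the [ID:132] block. The order in which devices appear does not matter, because every header re-decides the flag.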

score 2 · Accepted Answer

Code for GNU sed, after some edits:

sed -rn '/\[ID:[0-9]+\]/{/\[ID:128\]/!{s/.*\B(\[ID:[0-9]+\])\B.*/\1/;H}};${x;s/\n//;s/\]\n\[/\\]|\\[/g;s@(.*)]@/\\[ID:128\\]/,/\\\1\\]/\{/\\\1\\]/!p\}@p}' file|sed -nrf - file

$ cat file
2013/06/20 03:10:01 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
foo
bar
------------------------------------------
foo
bar
------------------------------------------
2013/06/20 03:10:02 PM| FINE |S9180 |[Device] [ID:132]
Other foo
Other bar
------------------------------------------
Other foo
Other bar
Other foo
Other bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar
2013/06/20 03:10:02 PM| FINE |S9180 |[Device] [ID:32]
Other foo
Other bar
------------------------------------------
Other foo
Other bar
Other foo
Other bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar
2013/06/20 03:10:02 PM| FINE |S9180 |[Device] [ID:132]
Other foo
Other bar
------------------------------------------
Other foo
Other bar
Other foo
Other bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:17]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar

$ sed -rn '/\[ID:[0-9]+\]/{/\[ID:128\]/!{s/.*\B(\[ID:[0-9]+\])\B.*/\1/;H}};${x;s/\n//;s/\]\n\[/\\]|\\[/g;s@(.*)]@/\\[ID:128\\]/,/\\\1\\]/\{/\\\1\\]/!p\}@p}' file|sed -nrf - file
2013/06/20 03:10:01 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
foo
bar
------------------------------------------
foo
bar
------------------------------------------
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar
2013/06/20 03:10:03 PM| FINE |S9180 |[Device] [ID:128]
foo
bar
------------------------------------------
foo
bar
foo
bar
------------------------------------------
foo
bar

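The first sed call "collects" every key that matches the regex /\[ID:[0-9]+\]/ except [ID:128], and prints a sed script built from those keys. The second call reads that script from standard input (-f -) and uses the collected keys to filter out the unwanted sections.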
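The two-pass idea is easier to follow when written out step by step. The sketch below is a readable re-implementation of that idea, not the answer's exact script. It assumes grep -o and sed -E are available (GNU or BSD), and extract_device_2pass is a hypothetical name:

```shell
# extract_device_2pass ID FILE  (hypothetical helper)
# Pass 1 collects every other device ID in FILE; pass 2 uses them as the
# end pattern of a sed /start/,/end/ range. The body runs in a subshell
# ( ... ) so its variables do not leak into the caller.
extract_device_2pass() (
  want=$1 file=$2
  # Pass 1: distinct IDs other than $want, joined as e.g. 132|17|32
  others=$(grep -oE '\[ID:[0-9]+\]' "$file" | grep -oE '[0-9]+' \
           | sort -u | grep -vx "$want" | paste -sd'|' -)
  if [ -z "$others" ]; then
    # No other device in the file: print from the first header to the end.
    sed -n "/\[ID:$want\]/,\$p" "$file"
  else
    # Pass 2: print each range from our header up to, but not including,
    # the next header of any other device (that header line is deleted).
    sed -nE "/\[ID:$want\]/,/\[ID:($others)\]/{/\[ID:($others)\]/d;p;}" "$file"
  fi
)
```

Both passes stream the file, but it is read twice. The single-pass flag approach reads it only once.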

score 0 · Accepted Answer

I think a more general approach is:

perl -ne 'print if /KEY1/../KEY(?!1)/' input.txt | perl -ne 'print unless /KEY(?!1)/'

and

perl -ne 'print if /ID:128/../ID:(?!128)/' file.txt | perl -ne 'print unless /ID:(?!128)/'

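A few important concepts here:

KEY(?!1) means "KEY not followed by 1"
"perl -ne" means "loop over the input lines without printing them by default" (-n), running the code given with -e
So printing is switched on only while the text matches the pattern "a line with KEY1, any number of lines, a line with KEY not followed by 1" (the .. range operator)
The second perl call removes the KEY2 and KEY3 lines, which would otherwise be printed

I suppose there is a better way to remove the KEY2 and KEY3 lines, but I don't know how to do it; some perl guru can help you more!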
