regex - Finding a block of text with a regular expression

Question

Given a large log file, what is the best way to grep a block of text?

text to be ignored
more text to be ignored
---                                 <---- start capture here
lots of 
text with separators like "---"
---
spanning 
multiple lines
---                                 <---- end capture here
text to be ignored
more text to be ignored

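What is known?

The maximum number of characters per line (55, but possibly fewer)
The number of lines in the block
The separator (which may be repeated)

What regular expression would match this block? Expected output: a list of text blocks.

Please assume a Linux command-line environment.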

score 2 · Accepted Answer

A few years ago I used this to split patches into hunks:

sed -e '$ {x;q}' -e '/@@/ !{H;d}' -e '/@@/ x' # note - i know sed better now

Replace /@@/ with /---/.

To drop everything before the first '---' and after the last '---', add -e '1,/---/d' and remove the whole -e '$ {x;q}'.

The result would look like this:

sed -e '1,/---/d' -e '/---/ !{H;d}' -e x

I just tested it, and it works on the given example.
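
For readers less familiar with sed's hold space, the adapted command can be wrapped in a small function and tried on a sample file. This is only a sketch: the function name split_blocks, the sample contents, and the ^ anchor are assumptions added here, not part of the answer. The anchor keeps a line that merely contains "---" (such as the second line of the example block) from being treated as a separator.

```shell
# Hypothetical sample modelled on the question's example.
sample=$(mktemp)
printf '%s\n' \
    'text to be ignored' \
    '--- start' \
    'lots of' \
    'text with separators like "---"' \
    '---' \
    'spanning' \
    'multiple lines' \
    '--- end' \
    'text to be ignored' > "$sample"

# split_blocks FILE (hypothetical name; GNU sed assumed)
#   1,/^---/d      drop everything up to and including the first separator
#   /^---/!{H;d}   append a non-separator line to the hold space, print nothing
#   x              on a separator, swap pattern and hold space: the collected
#                  chunk is printed and the separator starts the next chunk
split_blocks() {
    sed -e '1,/^---/d' -e '/^---/!{H;d}' -e x "$1"
}

split_blocks "$sample"
```

On this sample the first chunk begins with an empty line, because H always inserts a newline before the text it appends. Lines after the last separator are collected in the hold space but never printed.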

score 0 · Accepted Answer

Keep it simple:

$ awk 'NR==FNR {if (/^---/) { if (!start) start=NR; end=NR } next} FNR>=start && FNR<=end' file file
---                                 <---- start capture here
lots of
text with separators like "---"
---
spanning
multiple lines
---                                 <---- end capture here

$ awk 'NR==FNR {if (/^---/) { if (!start) start=NR; end=NR } next} FNR>start && FNR<end' file file
lots of
text with separators like "---"
---
spanning
multiple lines
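
The first one-liner passes the same file to awk twice: NR==FNR is true only while the first copy is being read, so that pass records line numbers and the second pass prints. An annotated version might look like the sketch below; the function name outer_block and the sample file are assumptions made for illustration.

```shell
# Hypothetical sample modelled on the question's example.
sample=$(mktemp)
printf '%s\n' \
    'text to be ignored' \
    '--- start' \
    'lots of' \
    'text with separators like "---"' \
    '---' \
    'spanning' \
    'multiple lines' \
    '--- end' \
    'text to be ignored' > "$sample"

# outer_block FILE (hypothetical name): print the first through the last
# line that starts with ---, separators included.
outer_block() {
    awk '
        # Pass 1: NR equals FNR only while the first copy of the file is read.
        # Remember the first and the last separator line numbers.
        NR == FNR {
            if (/^---/) { if (!start) start = NR; end = NR }
            next
        }
        # Pass 2: FNR restarts at 1, so it can be compared with the saved range.
        FNR >= start && FNR <= end
    ' "$1" "$1"
}

outer_block "$sample"
```

Because the file is opened twice, this form needs a real file name and cannot read from a pipe.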

score 0 · Accepted Answer

If you have enough memory, you can use the following one-liner. Note, however, that it reads the entire log file into memory!

perl -0777 -lnE 'm{ ^--- .+ ^--- }xms and say $&' logfile
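
If slurping the whole file is a concern, one possible alternative is a single awk pass that buffers only the lines between two separators. This is a different technique from the Perl one-liner, sketched under the assumption that separators start at the beginning of a line; the name stream_block and the sample file are made up for illustration.

```shell
# Hypothetical sample modelled on the question's example.
sample=$(mktemp)
printf '%s\n' \
    'text to be ignored' \
    '--- start' \
    'lots of' \
    'text with separators like "---"' \
    '---' \
    'spanning' \
    'multiple lines' \
    '--- end' \
    'text to be ignored' > "$sample"

# stream_block [FILE] (hypothetical name): print the first through the last
# separator line in one pass, reading standard input when no file is given.
stream_block() {
    awk '
        /^---/ {
            # A new separator proves the buffered lines are inside the block.
            printf "%s", buf
            print
            buf = ""
            seen = 1
            next
        }
        # After the first separator, hold lines until the next one confirms them.
        seen { buf = buf $0 ORS }
    ' "$@"
}

stream_block "$sample"
```

Memory use is bounded by the longest run of lines between two separators, and whatever follows the last separator is buffered but never printed.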
