regex - Manipulating multi-line sections in bash

Question

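I'm looking to extract, and then add to, a list of items in a section of a text file. sed and grep almost work, but they need a lot of hacking. Is there another utility that makes this easier, maybe awk?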

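First, the extraction. I want a list of all items between "section [" and "]", but the opening pattern can contain spaces/newlines, which makes a lookbehind difficult. A newline is a fine list separator, so I just want all the characters between "[" and "]" of that specific section (i.e. "section []", not "wrongSection []").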

Text examples (each file has only 1 section):

File 1:

section []
wrongSection [foo]

Output 1: empty

File 2:

section [item1]
wrongSection [foo]

Output 2:

item1

File 3:

section
[
    item1
    item2
]
wrongSection [foo]

Output 3:

item1
item2

grep can do the grabbing, but it doesn't leave the non-capturing groups out of what it prints:

$ grep -Po "(?ims)^(?:\s*section\s*\n*\s*\[).*?(?:\])" file.txt
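A sketch of one way to make the grep approach work, assuming GNU grep built with PCRE support (-P). Plain grep matches line by line, so a pattern containing \n never sees the next line; -z makes the whole file a single NUL-terminated record so the pattern can span newlines. \K then discards the "section ... [" prefix from the printed match, standing in for the variable-length lookbehind. The function name section_items_pcre is hypothetical.

```shell
# Hypothetical helper: print the items of "section [...]" read from stdin,
# one per line. Assumes GNU grep with PCRE support (-P).
section_items_pcre() {
    # -z : whole input is one record, so \s and \n can cross line breaks
    # -o : print only the matched text
    # \K : forget everything matched so far ("section ... ["), so only the
    #      text between "[" and "]" is printed
    grep -zoP '(?:^|\n)[ \t]*section\s*\[\K[^\]]*' |
        tr -d '\000' |                                      # drop the NUL that -z appends to each match
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//; /^$/d'   # trim each item, drop blank lines
}
```

For example, section_items_pcre < file3 should print item1 and item2 on separate lines, while file 1 yields no output because grep -o does not print the empty match.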

My second problem is adding a new item ('itemX'). sed hates multiple lines, but if I assume the [ is at most one line below section, the following works:

$ sed '/^\s*section/N;/^\s*section\s*\n\?\s*\[/a itemX' file.txt
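A minimal sketch of the insertion step in awk, under the assumption that rewriting the section into the one-item-per-line layout of file 3 is acceptable. Instead of working line by line like sed, it reads the whole file into one string, finds the section with match(), and rebuilds the list with the new item appended before the closing bracket. The helper name add_section_item is hypothetical, and nested or escaped brackets inside the section are not handled.

```shell
# Hypothetical helper: read a file on stdin and print it with ITEM ($1)
# appended to the "section [...]" list. The section is rewritten in the
# multi-line layout of file 3, whatever layout it had before.
add_section_item() {
    awk -v item="$1" '
        BEGIN { buf = "\n" }                 # sentinel: "section" on line 1 also follows a newline
        { buf = buf $0 "\n" }                # slurp the whole input
        END {
            # a line starting with "section", optional blanks/newlines, then [ ... ]
            if (!match(buf, /\n[ \t]*section[ \t\n]*\[[^]]*\]/)) {
                printf "%s", substr(buf, 2)  # no section: print the input unchanged
                exit
            }
            pre   = substr(buf, 2, RSTART - 1)            # text before the section (incl. its newline)
            block = substr(buf, RSTART + 1, RLENGTH - 1)  # "section ... [ ... ]"
            post  = substr(buf, RSTART + RLENGTH)         # text after the closing "]"
            p     = index(block, "[")
            body  = substr(block, p + 1, length(block) - p - 1)  # text between [ and ]
            out = "section\n[\n"
            n = split(body, items, "\n")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", items[i])  # trim each existing entry
                if (items[i] != "") out = out "    " items[i] "\n"
            }
            out = out "    " item "\n]"
            printf "%s%s%s", pre, out, post
        }'
}
```

For example, printf 'section [item1]\nwrongSection [foo]\n' | add_section_item itemX should print section, [, item1, itemX and ] on separate lines (items indented), followed by the untouched wrongSection line.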

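In short, I'm trying to read/add multiple lines between possibly multi-line patterns that I don't want in the output. Am I better off abandoning bash and using perl/groovy/python/etc.?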

score 1 · Accepted Answer

Using non-GNU awk:

awk -v FS='[ \n]*[\\[\\]][ \n]*' '{gsub(/\n+ +/, "\n");
           for(i=1; i<=NF; i+=2) {if ($i=="section") print $(i+1)}}' RS= file

Output for files 1, 2 and 3 in turn (file 1 produces one empty line):

item1
item1
item2

score 1 · Accepted Answer

Try this; it works with any modern awk:

$ cat file1
section []
wrongSection [foo]
$ 
$ awk -v RS=']' 'sub(/.*section[[:space:]]+\[*/,""){gsub(/^\n+|\n+$/,""); gsub(/[[:blank:]]/,""); print; exit}' file1


$ cat file2
section [item1]
wrongSection [foo]
$ 
$ awk -v RS=']' 'sub(/.*section[[:space:]]+\[*/,""){gsub(/^\n+|\n+$/,""); gsub(/[[:blank:]]/,""); print; exit}' file2
item1

$ cat file3
section
[
    item1
    item2
]
wrongSection [foo]
$ 
$ awk -v RS=']' 'sub(/.*section[[:space:]]+\[*/,""){gsub(/^\n+|\n+$/,""); gsub(/[[:blank:]]/,""); print; exit}' file3
item1
item2
$
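A variant of the same RS=']' idea, sketched under the assumption that items may contain inner spaces worth keeping. Instead of deleting every blank, it trims each line; it requires section to start a line (optionally after blanks), so a name such as mysection does not match; it also accepts section[item1] with no space before the bracket; and it prints nothing at all for section [] rather than an empty line. The function name section_items is hypothetical.

```shell
# Hypothetical helper: print the items of "section [...]" from stdin, one
# per line, trimming surrounding blanks but keeping blanks inside an item.
section_items() {
    awk -v RS=']' '
        # strip everything up to and including "section ... [" where
        # "section" starts a line; only the record holding it matches
        sub(/^(.*\n)?[ \t]*section[ \t\n]*\[/, "") {
            n = split($0, lines, "\n")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", lines[i])   # trim the entry
                if (lines[i] != "") print lines[i]
            }
            exit                                       # only one section per file
        }'
}
```

For example, section_items < file3 should print item1 and item2, and section_items < file1 should print nothing.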
