awk - awk and multi-line matching (sub-regex)

Question

I am trying to use awk to parse multi-line expressions. One of them looks like this:

_begin  hello world !
_attrib0    123
_attrib1    super duper
_attrib1    yet another value
_attrib2    foo
_end

I need to extract the values associated with _begin and _attrib1. So for the example above, the awk script should return (one per line):

hello world ! super duper yet another value

The delimiter used is the tab (\t) character. Spaces occur only inside the string values.

score 8 · Accepted Answer

The following awk script does the job:

#!/usr/bin/awk -f
BEGIN { FS="\t"; }
/^_begin/      { output=$2; }
$1=="_attrib1" { output=output " " $2; }
/^_end/        { print output; }
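
To try the script without saving it to a file, here is a minimal sketch that inlines the same rules in a shell function. The function name `extract_blocks` and the second sample block are made up for illustration; the point is that one line is printed for each _begin/_end block, because `output` is reset at every _begin.

```shell
# Hypothetical wrapper around the awk rules above; reads records from stdin.
extract_blocks() {
  awk '
    BEGIN { FS = "\t" }                          # fields are tab-separated
    /^_begin/        { output = $2 }             # start a new block
    $1 == "_attrib1" { output = output " " $2 }  # append each _attrib1 value
    /^_end/          { print output }            # emit one line per block
  '
}

# Two sample blocks (the second one is invented for this example).
printf '_begin\thello world !\n_attrib1\tsuper duper\n_end\n_begin\tsecond block\n_attrib0\t7\n_attrib1\tonly value\n_end\n' | extract_blocks
```

With this input the function should print `hello world ! super duper` and `second block only value` on separate lines.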

You didn't say whether you want a tab (\t) as the output field separator. If you do, let me know and I'll update the answer. (Or you can; it's simple.)
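
As a sketch of that tab-separated variant (an assumption about what the asker might want, not something the question states), the only change is the string used to join the values. The function name `join_with_tabs` is hypothetical.

```shell
# Hypothetical variant: join the _begin value and each _attrib1 value with tabs.
join_with_tabs() {
  awk '
    BEGIN { FS = "\t" }
    $1 == "_begin"   { output = $2 }              # first value, no separator yet
    $1 == "_attrib1" { output = output "\t" $2 }  # tab instead of a space
    $1 == "_end"     { print output }
  '
}

# The sample block from the question, with literal tabs written as \t.
printf '_begin\thello world !\n_attrib0\t123\n_attrib1\tsuper duper\n_attrib1\tyet another value\n_attrib2\tfoo\n_end\n' | join_with_tabs
```

The values are concatenated by hand rather than through OFS, since OFS only takes effect when awk rebuilds $0 or prints comma-separated arguments.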

Of course, if you'd like a scary alternative (since we're getting close to Halloween), here's a solution using sed:

$ sed -ne '/^_begin./{s///;h;};/^_attrib1[^0-9]/{s///;H;x;s/\n/ /;x;};/^_end/{;g;p;}' input.txt 
hello world ! super duper yet another value

How does this work? Mwaahahaa, I'm glad you asked.

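/^_begin./{s///;h;}; -- When we see _begin, strip it off (the empty pattern in s/// reuses the regex that just matched) and store the rest of the line in sed's "hold buffer".
/^_attrib1[^0-9]/{s///;H;x;s/\n/ /;x;}; -- When we see _attrib1, strip it off, append the rest to the hold buffer, swap the hold buffer and pattern space, replace the newline with a space, and then swap the hold buffer and pattern space back again.
/^_end/{;g;p;} -- We've reached the end, so pull the hold buffer into the pattern space and print it.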
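
The same three steps can be spread over several lines with comments, which may be easier to follow. This is only a sketch: the function name `extract_with_sed` is hypothetical, and it assumes a sed (such as GNU sed) that accepts indented commands and comments inside a script and treats an empty regex as "the last regex used".

```shell
# Hypothetical wrapper: the one-liner above, one command per line.
extract_with_sed() {
  sed -n '
    # _begin line: delete "_begin" plus the separator, keep the rest in hold space
    /^_begin./ {
      s///
      h
    }
    # _attrib1 line: delete the key, append the value to hold space,
    # then swap, turn the joining newline into a space, and swap back
    /^_attrib1[^0-9]/ {
      s///
      H
      x
      s/\n/ /
      x
    }
    # _end line: copy hold space into pattern space and print it
    /^_end/ {
      g
      p
    }
  '
}

printf '_begin\thello world !\n_attrib0\t123\n_attrib1\tsuper duper\n_attrib1\tyet another value\n_attrib2\tfoo\n_end\n' | extract_with_sed
```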

This assumes that your input field separator is just a single tab.

Simple. Who ever said sed was arcane?!

score 1 · Accepted Answer

This should work:

#!/bin/bash 

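# Joins the values without a leading space; prints once at end of input, so it assumes a single _begin/_end block.
awk 'BEGIN {FS="\t"} {if ($1=="_begin" || $1=="_attrib1") { output = (output == "" ? $2 : output " " $2) }} END{print output}'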
