awk - Remove all consecutive duplicate lines after a pattern

Question

I have a file containing this data:

cell (HB)
input
input
input
Z
output
A
input
cell (BP)
input
input
Z1
output
A1
input

I want the output to be:

cell (HB)
Z
output
A
input
cell (BP)
Z1
output
A1
input

I want to remove all the consecutive input lines that appear directly after a line containing the word cell.

I tried this code:

awk '{for (i=1;i<=NF;i++) if (!a[$i]++) print($i,FS)}{print("\n")}' file

But it does not produce the change I want.

score 3 · Accepted Answer

With gnu-awk you can use RS and RT:

awk -v RS='cell [^\n]*\n(input\n)+' '{sub(/\n.+/, "\n", RT); ORS=RT} 1' file

cell (HB)
Z
output
A
input
cell (BP)
Z1
output
A1
input

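Here:

-v RS='cell [^\n]*\n(input\n)+': sets RS to cell followed by a space, then anything up to a newline, then one or more lines consisting of the text input.
sub(...): removes everything after the first newline in RT, so only the cell line remains
ORS=RT: sets the output record separator to the text contained in RT
1: prints each record, followed by ORS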
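RT is a gawk extension, so for other awks the same rule can be approximated line by line. The following is only a sketch under assumptions that are not in the answer: drop_inputs_after_cell is a hypothetical helper name, a cell line is taken to start with "cell ", and an input line is taken to be exactly "input".

```shell
# Hypothetical helper: drop the run of "input" lines that directly follows a cell line.
drop_inputs_after_cell() {
  awk '
    # Still right after a cell line and this line is exactly "input": skip it.
    skipping && $0 == "input" { next }
    # Any other line ends the run.
    { skipping = 0 }
    # Assumption: a cell line starts with "cell " (anchored, unlike the RS regex).
    /^cell / { skipping = 1 }
    { print }
  ' "$@"
}

# Prints three lines: cell (X), Z, input
printf 'cell (X)\ninput\ninput\nZ\ninput\n' | drop_inputs_after_cell
```

Called as drop_inputs_after_cell file, it should print the same ten lines as the gawk command for the sample in the question.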

score 3 · Accepted Answer

With your shown samples only, could you please try the following. Written and tested in GNU awk.

awk '
!/input/{
  if(count==1){
    print prev
  }
  count=0
  prev=""
}
/input/{
  count++
  prev=$0
  next
}
1
END{
  if(count==1){
    print prev
  }
}
' Input_file
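Note that the script above never looks at cell: it drops every run of two or more consecutive input lines and keeps single ones, which happens to coincide with the expected output for the shown sample. A small sketch to illustrate, with the script wrapped in a hypothetical function drop_repeated_input and a made-up four-line input:

```shell
# Hypothetical wrapper around the script above, condensed but with the same logic.
drop_repeated_input() {
  awk '
    # A non-input line ends a run; print the buffered line only if the run had length 1.
    !/input/ { if (count == 1) print prev; count = 0; prev = "" }
    # Buffer input lines instead of printing them right away.
    /input/  { count++; prev = $0; next }
    { print }
    # Flush a single trailing input line at end of file.
    END { if (count == 1) print prev }
  ' "$@"
}

# No cell line here, yet both input lines are dropped; prints A and B only.
printf 'A\ninput\ninput\nB\n' | drop_repeated_input
```

If input runs that do not follow a cell line must be kept, one of the cell-aware answers is probably the safer choice.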

score 1 · Accepted Answer

Shorter than I expected, so I wonder whether something is wrong with it:

$ awk '!(f&&/input/){print;f=0}/cell/{f=1}' file

Output:

cell (HB)
Z
output
A
input
cell (BP)
Z1
output
A1
input

score 1 · Accepted Answer

This might work for you (GNU sed):

sed -E ':a;N;s/(cell.*)\n.*input/\1/;ta;P;D' file

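Turn on extended regular expressions by setting -E.

Open a two-line window.

If the first line contains cell and the next line contains input, remove the last line and repeat.

Otherwise, print/delete the first line and repeat.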
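The same script can be written one command per line, which makes the loop easier to follow. This is just a restatement of the one-liner above for GNU sed; strip_inputs_after_cell is a hypothetical wrapper name.

```shell
# Hypothetical wrapper; assumes GNU sed (it relies on N printing the pattern space at end of input).
strip_inputs_after_cell() {
  sed -E '
    :a
    # Append the next input line, giving a two-line window.
    N
    # A cell line followed by a line containing input: keep only the cell line.
    s/(cell.*)\n.*input/\1/
    # If the substitution happened, loop and pull in the next line.
    ta
    # Otherwise print the first line of the window and delete it.
    P
    D
  ' "$@"
}

# Prints three lines: cell (HB), Z, input
printf 'cell (HB)\ninput\ninput\nZ\ninput\n' | strip_inputs_after_cell
```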

This generic solution removes duplicated lines altogether.

sed -E 'N;/^(.*)\n\1$/{:a;s/\n.*//;$!{N;/^(.*)\n\1$/ba};D};P;D' file

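Turn on extended regular expressions by setting -E.

Open a two-line window.

If the lines in the window are duplicates, remove the last line and keep doing so until the two lines differ, then delete the first line.

Otherwise, print/delete the first line and repeat.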
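For reference, on input like this the generic script seems to behave like uniq -u, which prints only the lines that are not repeated consecutively. A small check with GNU sed, using a made-up five-line sample:

```shell
# Made-up sample: b is repeated three times in a row.
sample=$(printf 'a\nb\nb\nb\nc\n')

# The generic sed script from above, unchanged.
sed_out=$(printf '%s\n' "$sample" |
  sed -E 'N;/^(.*)\n\1$/{:a;s/\n.*//;$!{N;/^(.*)\n\1$/ba};D};P;D')

# uniq -u keeps only lines that have no adjacent duplicate.
uniq_out=$(printf '%s\n' "$sample" | uniq -u)

# Both print a and c; every copy of b is gone, not just the extra ones.
printf '%s\n' "$sed_out"
printf '%s\n' "$uniq_out"
```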
