regex - Split pcregrep multiline matches

Question

tl;dr: How can I split each multiline match that pcregrep finds?

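Long version: I have some files in which some lines start with a (lowercase) letter and others start with a digit or a special character. Whenever at least two consecutive lines start with a lowercase letter, I want them in my output. However, I want each match to stay separate instead of being appended to the previous one. This is the regex: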

pcregrep -M "([a-z][^\n]*\n){2,}"

So if I give it a file like this:

-- Header -- 
info1 
info2 
something 
< not interesting > 
dont need this 
+ new section 
additional 1 
additional 2

the output is:

info1 
info2
something 
additional 1
additional 2

What I want instead is this:

info1 
info2 
something 

additional 1
additional 2

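Is this possible, or do I have to start using Python (or something similar)? Even if you would recommend switching to another tool at this point, I would still like to know whether it can be done.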

Thanks!

score 1 · Accepted Answer

The following sed seems to do the trick:

sed -n '/^[a-z]/N;/^[a-z].*\n[a-z]/{p;:l n;/^[a-z]/{p;bl};a\

}'

Explanation:

/^[a-z]/{           # if a line starts with a LC letter
  N;                   # consume the next line while conserving the previous one
  /^[a-z].*\n[a-z]/{   # test whether the second line also starts with a LC letter
    p;                   # print the two lines of the buffer
    :l n;                # define a label "l", then read the next line into the pattern space
    /^[a-z]/{            # if the new line still starts with a LC letter
      p;                   # print it
      bl                   # jump back to label "l"
    }
    a\
                         # append an empty line after each matching group
  }
}

Sample run:

$ echo '-- Header --
> info1
> info2
> something
> < not interesting >
> dont need this
> + new section
> additional 1
> additional 2 ' | sed -n '/^[a-z]/N;/^[a-z].*\n[a-z]/{p;:l n;/^[a-z]/{p;bl};a\
>
> }'
info1
info2
something

additional 1
additional 2
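If switching tools is an option, the same grouping can also be written in awk. This is a minimal sketch, not part of the original answer: it assumes a POSIX awk, and the function name `split_lc_groups` is hypothetical. It buffers consecutive lines that start with a lowercase ASCII letter, prints a run only when it holds at least two lines, and puts one blank line between printed runs (none after the last one).

```shell
# Hypothetical helper (not from the original answer): print each run of
# two or more consecutive lines starting with a lowercase ASCII letter,
# separating the printed runs with one blank line.
split_lc_groups() {
  # LC_ALL=C keeps [a-z] limited to the ASCII letters a..z.
  LC_ALL=C awk '
    # Print the buffered run if it has at least two lines, then reset it.
    function flush(   i) {
      if (n >= 2) {
        if (printed) print ""          # blank line between runs
        for (i = 1; i <= n; i++) print buf[i]
        printed = 1
      }
      n = 0
    }
    /^[a-z]/ { buf[++n] = $0; next }   # extend the current run
    { flush() }                        # any other line ends the run
    END { flush() }                    # a run may end at end of input
  ' "$@"
}
```

Usage: `split_lc_groups input.txt` (file name illustrative), or pipe the text into `split_lc_groups`. Deciding only when a run ends is what lets the script drop a single lowercase line such as `dont need this`.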
