regex - Matching multiple lines with a regular expression

Question

I have the following text.

^0001   HeadOne


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

^0002   HeadTwo


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.


^004    HeadFour


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

^0004   HeadFour


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Below is the regex I am using to search.

@@([\n\r\s]*)(.*)([\n\r\s]+)\^

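But this only captures ^0001 and ^0003, because they have a single paragraph, whereas other entries in my text contain multiple paragraphs.

I am using VS Code. Can someone tell me how to capture such multi-paragraph strings with a regex in VS Code or NPP?

Thanks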

score 2 · Accepted Answer

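One odd thing about VS Code regex is that \s does not match all line break characters. You need to use [\s\r] to match all of them.

With that in mind, you want to match all substrings that start with @@ and extend up to a ^ at the start of a line, or up to the end of the string.

I suggest: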

@@.*(?:[\n\r]+(?!\s*\^).*)*

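See the regex demo

Note: to match @@ only at the start of a line, add ^ at the start of the pattern: ^@@.*(?:[\s\r]+(?!\s*\^).*)*

Note 2: starting with VS Code 1.29, you need to enable the search.usePCRE2 option to enable lookaheads in your regex patterns.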

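Details

^ - start of a line
@@ - a literal @@
.* - the rest of the line (0+ characters other than line breaks)
(?:[\n\r]+(?!\s*\^).*)* - 0 or more consecutive occurrences of:
- [\n\r]+(?!\s*\^) - one or more line break characters not followed by 0+ whitespace characters and then a ^ character
- .* - the rest of the line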
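
To see the breakdown above in action outside the editor, here is a minimal sketch using Python's re module. This is an assumption for illustration only: Python is not the VS Code engine, so the \s quirk mentioned earlier does not apply here, and the sample text is a shortened, made-up stand-in for the text in the question.

```python
import re

# Shortened stand-in for the question's text (hypothetical content):
# each entry has a ^NNNN header, then @@, then one or more paragraphs.
text = (
    "^0001   HeadOne\n"
    "\n"
    "@@\n"
    "First entry, only paragraph.\n"
    "\n"
    "^0002   HeadTwo\n"
    "\n"
    "@@\n"
    "Second entry, paragraph one.\n"
    "\n"
    "Second entry, paragraph two.\n"
)

# @@ at the start of a line, the rest of that line, then any further lines
# as long as the line break(s) are not followed by optional whitespace and ^.
pattern = re.compile(r"^@@.*(?:[\n\r]+(?!\s*\^).*)*", re.MULTILINE)

# No capturing groups are used, so findall returns the full matches.
blocks = pattern.findall(text)
for block in blocks:
    print(repr(block))
```

Each match starts at @@ and stops before the line breaks that lead into the next ^ header, so the second entry keeps both of its paragraphs in one match.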

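In Notepad++, use ^@@.*(?:\R(?!\h*\^).*)* where \R matches a line break and \h* matches 0 or more horizontal whitespace characters (remove it if ^ is always the first character on the delimiter line).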
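
For readers who want to try the Notepad++ variant in a script: Python's re module has no \R or \h, so the sketch below substitutes rough equivalents. The names LINE_BREAK and H_SPACE, the substitutions themselves, and the sample text are assumptions made for this illustration, not part of the Notepad++ syntax.

```python
import re

# Rough stand-ins (assumptions) for the escapes used in the Notepad++ pattern:
#   \R -> a single line break: \r\n, \n, or \r
#   \h -> horizontal whitespace, narrowed here to space or tab
LINE_BREAK = r"(?:\r\n|\n|\r)"
H_SPACE = r"[ \t]"

npp_like = re.compile(
    rf"^@@.*(?:{LINE_BREAK}(?!{H_SPACE}*\^).*)*",
    re.MULTILINE,
)

# Hypothetical sample with Unix line endings.
text = (
    "^0001 HeadOne\n\n@@\nOnly paragraph.\n\n"
    "^0002 HeadTwo\n\n@@\nPara one.\n\nPara two.\n"
)

# Unlike [\n\r]+, a single line break is consumed per step, so a match may
# end with a trailing blank line; strip() trims that for display.
blocks = [m.strip() for m in npp_like.findall(text)]
for block in blocks:
    print(repr(block))
```

Because the lookahead only skips horizontal whitespace, a blank line directly before a ^ header is still included in the match, which is why the results are stripped here.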

score 0 · Answer

I put your input data into /tmp/test and got the following to work using Perl syntax:

grep -Pzo "@@(?:\s*\n)+((?:.*\s*\n)+)(?:\^.*)*\n+" /tmp/test

This should put the paragraphs that do not start with ^ into $1. You may need to add \r back into the pattern to make it match perfectly.
