regex - Regular expression search and replace sub-loop (?)

Question

G'day,

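I'm new to regular expressions and am trying to save time converting some junk PDF-generated "HTML" in which every list item has been replaced with a paragraph. Using Dreamweaver CS6 or Notepad++, I'd like to know: if I manually add ul tags around the p tags that should be list items, can I then search and replace every paragraph inside the ul tags with a list item?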

I've been saving time by doing the following:

Find: <p>Activity ([^>]*)</p>
Replace: <h2>Activity $1</h2>

Find: <p class="s23">([^>]*)</p>
Replace: <h3>$1</h3>
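For readers who want to script these two replacements instead of running them in an editor, here is a minimal sketch in Python (the language and the function name convert_headings are assumptions, not part of the question). The patterns are the question's own; the only change is that Python writes the backreference as \1 where the editor uses $1.

```python
import re

def convert_headings(html: str) -> str:
    """Apply the question's two Find/Replace pairs, in order."""
    # <p>Activity ...</p>  ->  <h2>Activity ...</h2>
    html = re.sub(r'<p>Activity ([^>]*)</p>', r'<h2>Activity \1</h2>', html)
    # <p class="s23">...</p>  ->  <h3>...</h3>
    html = re.sub(r'<p class="s23">([^>]*)</p>', r'<h3>\1</h3>', html)
    return html

sample = '<p>Activity 1: Setup</p><p class="s23">Materials</p><p>Body text</p>'
print(convert_headings(sample))
# <h2>Activity 1: Setup</h2><h3>Materials</h3><p>Body text</p>
```

Paragraphs that match neither pattern, such as <p>Body text</p>, are left unchanged.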

But I don't know whether it's possible to loop somewhere within the regex, for example:

Find: *loop within ul* <p>([^>]*)</p>
Replace: <li>$1</li>

score 2 · Accepted Answer

If you have a look at what a regular expression is, you will realize that it is not possible to do flow control like loops with a regex alone. Quoting Wikipedia:

In computing, a regular expression provides a concise and flexible means to “match” (specify and recognize) strings of text, such as particular characters, words, or patterns of characters.

(Emphasis mine.) Simply put, a regex is a fancy way to find a string: either it matches or it doesn't. It is not a set of logical processing instructions with a controllable flow, i.e. it is not a program.
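To illustrate the distinction, here is a minimal sketch in Python, which is a swapped-in scripting approach rather than an editor's Find/Replace dialog. The names UL_BLOCK, PARAGRAPH and paragraphs_to_list_items are hypothetical. Each regex only matches; the "loop" over the paragraphs inside each list comes from the host program, which runs a second substitution on every <ul> block it finds. It assumes lists are not nested and that list paragraphs contain no inline tags.

```python
import re

# One <ul>...</ul> block. Non-greedy, so separate lists stay separate;
# DOTALL lets a list span several lines.
UL_BLOCK = re.compile(r'<ul>(.*?)</ul>', re.DOTALL)
# One <p>...</p> whose content contains no further tags.
PARAGRAPH = re.compile(r'<p>([^<]*)</p>')

def paragraphs_to_list_items(html: str) -> str:
    """Turn every <p> inside a <ul> into an <li>; leave other <p> tags alone."""
    def convert_block(block: re.Match) -> str:
        # re.sub applies PARAGRAPH to every match inside this one block,
        # which is the per-list "loop" a single regex cannot express.
        inner = PARAGRAPH.sub(r'<li>\1</li>', block.group(1))
        return '<ul>' + inner + '</ul>'
    return UL_BLOCK.sub(convert_block, html)

html = '<p>Intro</p>\n<ul>\n<p>First</p>\n<p>Second</p>\n</ul>\n<p>Outro</p>'
print(paragraphs_to_list_items(html))
```

The paragraphs outside the list (Intro and Outro) are untouched, while both paragraphs inside it become <li> elements.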

However, there are other ways to achieve what you are after using a regex alone, as long as you use an editor that supports “Replace all” (probably a given) as well as multi-line matches and capture groups in its regex engine. Searching for

(<ul>)(<p>.*</p>)?<p>([^<]*)</p>(<p>.*</p>)?(</ul>)

will match a <p></p> block inside a <ul></ul> block, allowing for an arbitrary number of preceding and following <p></p> blocks (including none on either side). Assuming your backreference syntax is $x, as in your examples, the replacement string would be

$1$2<li>$3</li>$4$5

Then replace all occurrences. Note that each match spans an entire <ul></ul> block, so a single Replace All converts at most one <p> per list.
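A minimal sketch in Python of what one Replace All does with this pattern, assuming the capture group is written as ([^<]*) so that it captures the whole paragraph text rather than only its last character. The names ANSWER_PATTERN and replace_once are hypothetical, and Python writes the backreferences as \1 to \5 where the editor uses $1 to $5. Because a match runs from <ul> to </ul>, one pass converts one paragraph per list; with the greedy .* it is the last one.

```python
import re

# The pattern with the capture group written as ([^<]*).
ANSWER_PATTERN = re.compile(r'(<ul>)(<p>.*</p>)?<p>([^<]*)</p>(<p>.*</p>)?(</ul>)')

def replace_once(html: str) -> str:
    # Groups 2 and 4 may not participate; since Python 3.5 re.sub
    # substitutes an empty string for them.
    return ANSWER_PATTERN.sub(r'\1\2<li>\3</li>\4\5', html)

print(replace_once('<ul><p>a</p></ul>'))
# <ul><li>a</li></ul>
print(replace_once('<ul><p>a</p><p>b</p><p>c</p></ul>'))
# <ul><p>a</p><p>b</p><li>c</li></ul>
```

A single-item list is fully converted, but in a three-item list only the last paragraph becomes an <li> on this pass.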
