regex - Converting HTML to wikitext using regular expressions

Question

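I've been trying to use regular expressions in Notepad++ to automate a number of changes I need to make to a document, but I don't think I really understand the syntax.

I have several pieces of text like this: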

<a class='endnote' href='#cite1'><sup>[1]</sup></a>

The number is the only part that varies, and I want to change it to:

<ref name="cite1" />

and

    <div id='cite1'>
    <p class='cite'><sup>1</sup>a bunch of text</p>
    </div>

Again the number is the only part that varies, and I want to change it to:

 <ref name="cite1">a bunch of text</ref>

score 0 · Accepted Answer

First

The first string can be replaced with the expression below. It verifies that the anchor tag has a class named endnote and captures the href value without the leading #.

Regex: <a\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sclass=['"]endnote['"])(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref=['"]\#(cite[^'"]*)['"])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?>.*?<\/a>

Replace with: <ref name="$1" />
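For anyone who would rather script this than run it in Notepad++, here is a minimal Python sketch of the same replacement. It assumes Python's `re` module in place of the Notepad++ engine. The names `ATTRS`, `SCAN`, `ENDNOTE_LINK` and `convert_endnote_links` are made up for illustration and are not part of the answer. The pattern is split into pieces only to make it easier to read.

```python
import re

# Greedy run of attributes; quoted values may contain '>' or '='.
ATTRS = r"""(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"""
# Lazy variant used inside lookaheads to locate one specific attribute.
SCAN = r"""(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?"""

ENDNOTE_LINK = re.compile("".join([
    r"<a\b(?=\s)",
    # The tag must carry class="endnote" (either quote style).
    r"(?=" + SCAN + r"""\sclass=['"]endnote['"])""",
    # Capture the href target without the leading '#'.
    r"(?=" + SCAN + r"""\shref=['"]\#(cite[^'"]*)['"])""",
    # Rest of the opening tag, then everything up to the first </a>.
    # No space between '>' and '.*?', so '<a ...><sup>' matches.
    ATTRS + r"\s?>.*?<\/a>",
]))


def convert_endnote_links(html: str) -> str:
    """Replace endnote anchors with self-closing <ref name="..." /> tags."""
    return ENDNOTE_LINK.sub(r'<ref name="\g<1>" />', html)


sample = "<a class='endnote' href='#cite1'><sup>[1]</sup></a>"
print(convert_endnote_links(sample))  # <ref name="cite1" />
```

Because the class and href checks are lookaheads, the attribute order inside the tag should not matter. Anchors without the endnote class are left alone.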


Second

The second string can be replaced with this regular expression:

Regex: <div\b(?=\s|>)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?>.*?<p\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sclass=['"](cite)['"])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?><sup>([^<]*)<\/sup>(.*?)<\/p>.*?<\/div>

Replace with: <ref name="$1$2">$3</ref>
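The same replacement can be sketched in Python, again assuming the `re` module and using made-up names (`ATTRS`, `SCAN`, `CITE_DIV`, `convert_cite_divs`). The div block spans several lines, so `.` has to match newlines. The sketch uses `re.DOTALL` for that, which corresponds to the ". matches newline" option in Notepad++.

```python
import re

# Greedy run of attributes; quoted values may contain '>' or '='.
ATTRS = r"""(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"""
# Lazy variant used inside the lookahead to locate the class attribute.
SCAN = r"""(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?"""

CITE_DIV = re.compile("".join([
    # Opening <div ...>, then skip ahead lazily to the paragraph.
    r"<div\b(?=\s|>)" + ATTRS + r"\s?>.*?",
    # <p> must have class="cite"; group 1 is the literal word "cite".
    r"<p\b(?=\s)(?=" + SCAN + r"""\sclass=['"](cite)['"])""",
    # Group 2 is the number inside <sup>, group 3 is the citation text.
    ATTRS + r"\s?><sup>([^<]*)<\/sup>(.*?)<\/p>.*?<\/div>",
]), re.DOTALL)


def convert_cite_divs(html: str) -> str:
    """Collapse each citation <div> into <ref name="citeN">text</ref>."""
    return CITE_DIV.sub(r'<ref name="\g<1>\g<2>">\g<3></ref>', html)


sample = "<div id='cite1'>\n<p class='cite'><sup>1</sup>a bunch of text</p>\n</div>"
print(convert_cite_divs(sample))  # <ref name="cite1">a bunch of text</ref>
```

One caveat: with `.` matching newlines, a div that contains no cite paragraph lets the lazy `.*?` run on into a later block, so it is worth checking the result on a copy of the document first.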


score 0 · Accepted Answer

These days you should use Parsoid to convert HTML back to wikitext, rather than inventing (yet another) parser of your own.
