regex - AutoHotkey regular expression to strip HTML tags across multiple lines

Question

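I have the following markup in an HTML file, and I only need to extract the text "XX(1119601.1)" from it using AutoHotkey and a regular expression. Because the closing tag only appears several line breaks later, I can't get the text between the tags.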

       <dd class="call_number">
      <!-- holdings allowed -->
    XX(1119601.1)

       </dd>

Any help with this would be greatly appreciated.

score 0 · Accepted Answer

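; Sample markup from the question (LTrim strips the leading indentation).
txt =
(Ltrim
    <dd class="call_number">
       <!-- holdings allowed -->
    XX(1119601.1)
    </dd>
)

; Capture everything between <dd ...> and </dd> into m1. The s) (DotAll) option
; lets "." match newline characters too, so the match can span several lines.
RegExMatch(txt, "s)<dd .+?>(.*)</dd>", m)

; Remove the HTML comment from the captured text and show what is left.
MsgBox % RegExReplace(m1, "<!--.*?-->")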

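This code first matches everything inside the dd tag (you could make the pattern more specific, for example so that it matches only the text inside the tag), then removes the HTML comment with RegexReplace.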
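As an example of a more specific pattern, the value can be captured in one step if it always follows the HTML comment. This is a sketch that assumes exactly that layout (a "-->" followed by the value, then "</dd>"); the variable name result is only illustrative.

```autohotkey
; Sample markup from the question.
txt =
(Ltrim
    <dd class="call_number">
       <!-- holdings allowed -->
    XX(1119601.1)
    </dd>
)

; Assumption: the value sits between the end of an HTML comment ("-->") and </dd>.
; s) lets "." cross newlines; \s* on both sides swallows line breaks and indentation,
; and the lazy (.*?) stops as soon as \s*</dd> can match, so m1 holds only the value.
RegExMatch(txt, "s)<dd class=""call_number"">.*?-->\s*(.*?)\s*</dd>", m)
result := m1  ; "XX(1119601.1)"
```

If the markup varies (no comment, a different class attribute), the broader match-then-clean approach above is more forgiving.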

You can also use RegexReplace to remove the unwanted line breaks.
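A minimal sketch of that cleanup, reusing the question's sample markup: remove the comment, then every run of CR/LF characters with RegExReplace, and finally trim leftover spaces and tabs with Trim. The names clean and result are illustrative.

```autohotkey
; Sample markup from the question.
txt =
(Ltrim
    <dd class="call_number">
       <!-- holdings allowed -->
    XX(1119601.1)
    </dd>
)

; Grab everything between <dd ...> and </dd>; s) lets "." span newlines.
RegExMatch(txt, "s)<dd .+?>(.*)</dd>", m)

; Remove the HTML comment (non-greedy, so each comment is removed separately).
clean := RegExReplace(m1, "<!--.*?-->")

; Remove every run of CR/LF characters, then trim spaces and tabs at both ends.
clean := RegExReplace(clean, "[\r\n]+")
result := Trim(clean)  ; "XX(1119601.1)"
```

Display it with MsgBox % result if you want to check it interactively.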

Edit: Changed the RegexMatch call so that it no longer strips line breaks automatically.
