html - Regex to find content, then backtrack to the opening HTML tag

Question

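I am trying to use a regular expression to match a section that starts with a <p> tag and contains some specific content. I then want to replace everything from that particular paragraph tag to the end of the page.

I tried the expression <p.*?some content.*</html>, but it grabs the first <p> tag it sees and then runs all the way to the end. I want it to pick up only the paragraph tag that precedes the content, while still allowing other content and tags between the paragraph tag and the content.

How can I find some specific content with a regex, backtrack to the first paragraph tag before that content, and then select everything from there to the end?

If it helps, I am using the "Search and Replace" feature of EditPad Pro (although this probably applies to anything that uses regular expressions).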

score 0 · Accepted Answer

For simple input, use the regex

<p[^<]*some content.*<\/html>

But it is safer to use the regex

<p(?:[^<]*|<(?!p\b))*some content.*<\/html>
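As a rough illustration of how the second regex behaves, here is a minimal Java sketch. It uses a slight variant of the pattern rather than the exact expression above: `<p\b` so that tags such as `<pre>` are not treated as paragraphs, and a group that consumes one character at a time so there is no quantifier nested inside another quantifier. The class name, method name and sample HTML are made up for the example. It also assumes that `.` has to cross line breaks, which in Java means `Pattern.DOTALL`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastParagraphBeforeContent {
    // Between "<p" and the content, allow any character except "<",
    // or a "<" that does not open another p tag. A match therefore
    // cannot start at an earlier paragraph than the one before the content.
    static final Pattern FROM_PARAGRAPH = Pattern.compile(
            "<p\\b(?:[^<]|<(?!p\\b))*some content.*</html>",
            Pattern.DOTALL); // DOTALL lets ".*" run across line breaks

    static String replaceFromParagraph(String html, String replacement) {
        // quoteReplacement keeps "$" and "\" in the replacement literal
        return FROM_PARAGRAPH.matcher(html)
                .replaceFirst(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<html><body>\n"
                + "<p>intro</p>\n"
                + "<p class=\"x\">see <b>some content</b> here</p>\n"
                + "<p>tail</p>\n"
                + "</body></html>";
        System.out.println(replaceFromParagraph(html, "<p>replaced</p></body></html>"));
    }
}
```

With the sample input, the first paragraph is left alone and the match starts at the `<p class="x">` tag, because the group refuses to step over the opening of another `p` tag on its way to the content.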

score 0 · Accepted Answer

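First off, this is Java code, but I imagine it can easily be adapted to other regex engines and programming languages.

So, as I understand it, you are dealing with the case where part of the input starts with a <p> tag that is immediately followed by some target content or phrase. You then want to replace everything after the initial tag with something else?

If that is correct, you could do something like this: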

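import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "";       // assign your input text/HTML here
String targetPhrase = "some specific content"; // some target content/phrase
String replacement = ""; // assign the replacement value here

// Pattern.quote treats the phrase literally; DOTALL lets ".*" run across line breaks
Pattern p = Pattern.compile("<p[^>]*>" + Pattern.quote(targetPhrase) + ".*$",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(input);
// replaceFirst returns a new string, so keep the result
String output = m.replaceFirst(Matcher.quoteReplacement(replacement));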

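Of course, as mentioned in the comments above, you really don't want to use regular expressions on HTML.

Alternatively, if you know the tag is exactly that, with no attributes or anything else, you could try using substrings.

So, for example, if you are looking for "some specific content", you could try the following: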

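String input = "";       // assign your input text/HTML here
String replacement = ""; // assign the replacement value(s) here

String output = input;   // stays unchanged if the marker is not found
int index = input.indexOf("<p>some specific content");
if (index > -1) {
    // keep everything before the matching <p>, then append the replacement
    output = input.substring(0, index) + "<p>" + replacement;
}
// output now holds your modified text/HTML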
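
The substring idea can also be stretched to cover the "backtrack" part of the question, where other tags and text may sit between the paragraph tag and the content. The following is only a sketch under some assumptions: tag names are lowercase, the first occurrence of the phrase is the one that matters, and the class and method names are invented for the example. It finds the phrase with indexOf, then uses lastIndexOf to walk back to the nearest preceding "<p".

```java
class ParagraphBacktrack {
    /**
     * Finds targetPhrase, walks back to the nearest preceding <p ...> tag and
     * replaces everything from that tag to the end of the input. Returns the
     * input unchanged when the phrase or a preceding <p> tag is not found.
     */
    static String replaceFromParagraph(String input, String targetPhrase, String replacement) {
        int contentIndex = input.indexOf(targetPhrase);
        if (contentIndex < 0) {
            return input;
        }
        // lastIndexOf searches backward, starting at contentIndex
        int tagIndex = input.lastIndexOf("<p", contentIndex);
        // skip tags that merely start with "p", such as <pre> or <param>
        while (tagIndex >= 0 && !isParagraphTag(input, tagIndex)) {
            tagIndex = input.lastIndexOf("<p", tagIndex - 1);
        }
        if (tagIndex < 0) {
            return input;
        }
        return input.substring(0, tagIndex) + replacement;
    }

    // True when the character after "<p" ends the tag name
    static boolean isParagraphTag(String s, int tagIndex) {
        int next = tagIndex + 2;
        if (next >= s.length()) {
            return false;
        }
        char c = s.charAt(next);
        return c == '>' || c == '/' || Character.isWhitespace(c);
    }
}
```

For example, calling replaceFromParagraph(page, "some specific content", "<p>new ending</p></body></html>") keeps everything before the paragraph that holds the phrase and appends the replacement. Like the regex versions, this treats the page as plain text, so it shares the usual caveats about not parsing HTML properly.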
