php - 这个嵌套 {* *} 的正则表达式模式有什么问题？

Question

我有这个 HTML 文档

{*
<h2 class="block_title bg0">ahmooooooooooooooooooooooooooooooooooooooooooodi</h2>
<div class="block_content padding bg0">{welc_msg}</div>
<br/>
    {*
    hii<br /><span>5
    *}

    {*
    hii<br /><span>5

    *}
*}

我想删除它，所以我想删除之间的任何东西{* *}

我写了正则表达式模式：

preg_replace("#(\{\*(.*?)\*\})+#isx",'',$html);

它可以工作，但理想情况下它不能 100% 工作，它*}最终会离开。

你能给我真正的模式吗？

score 1 · Accepted Answer

您需要一个递归正则表达式来匹配嵌套括号。它应该如下所示：

"#(\{\*([^{}]*?(?R)[^{}]*?)\*\})+#isx"

score 1 · Accepted Answer

如果您的正则表达式引擎支持匹配的嵌套结构（PHP 支持），那么您可以像这样一次性删除（可能是嵌套的）元素：

一次性应用递归正则表达式：

function stripNestedElementsRecursive($text) {
    return preg_replace('/
        # Match outermost (nestable) "{*...*}" element.
        \{\*        # Element start tag sequence.
        (?:         # Group zero or more element contents alternatives.
          [^{*]++   # Either one or more non-start-of-tag chars.
        | \{(?!\*)  # or "{" that is not beginning of a start tag.
        | \*(?!\})  # or "*" that is not beginning of an end tag.
        | (?R)      # or a valid nested matching tag element.
        )*          # Zero or more element contents alternatives.
        \*\}        # Element end tag sequence.
        /x', '', $text);
}

上面的递归正则表达式匹配最外层 {*...*}的元素，其中可能包含嵌套元素。

但是，如果您的正则表达式引擎不支持匹配的嵌套结构，您仍然可以完成工作，但您不能一次性完成。可以制作匹配最内层 {*...*}元素的正则表达式（即不包含任何嵌套元素的正则表达式）。可以以递归方式应用此正则表达式，直到文本中没有更多元素，如下所示：

递归应用的非递归正则表达式：

function stripNestedElementsNonRecursive($text) {
    $re = '/
        # Match innermost (not nested) "{*...*}" element.
        \{\*        # Element start tag sequence.
        (?:         # Group zero or more element contents alternatives.
          [^{*]++   # Either one or more non-start-of-tag chars.
        | \{(?!\*)  # or "{" that is not beginning of a start tag.
        | \*(?!\})  # or "*" that is not beginning of an end tag.
        )*          # Zero or more element contents alternatives.
        \*\}        # Element end tag sequence.
        /x';
    while (preg_match($re, $text)) {
        $text = preg_replace($re, '', $text);
    }
    return $text;
}

使用正则表达式处理嵌套结构是一个高级主题，必须小心行事！如果有人真的想将正则表达式用于诸如此类的高级应用程序，我强烈建议您阅读有关该主题的经典著作：杰弗里·弗里德尔（ Jeffrey Friedl ）的《掌握正则表达式》（第 3 版）。老实说，这是我读过的最有用的书。

快乐的正则表达式！

php - 这个嵌套 {* *} 的正则表达式模式有什么问题？

2 回答 2

一次性应用递归正则表达式：

递归应用的非递归正则表达式：

Related

Reference