php - Parsing markup into an abstract syntax tree using regular expressions

Question

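This is a follow-up to: Recursive processing of markup using Regular Expression and DOMDocument

The code in the accepted answer there was a big help in understanding how to build a basic syntax tree. However, I'm now having trouble tightening the regular expression so that it matches only my syntax, instead of matching any { that is not followed by another {. Ideally, I'd like it to match only my syntax, which is: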

{<anchor>}
{!image!}
{*strong*}
{/emphasis/}
{|code|}
{-strikethrough-}
{>small<}
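The syntax above boils down to a table from each opening character to the character that must come before the closing brace. Only < and > are mirrored; every other tag closes with the character it opened with. The following is a hypothetical PHP helper, not part of the original code (close_char_for is an illustrative name, and PHP 7.1+ is assumed for the nullable return type):

```php
<?php
// Hypothetical helper: given the character after "{", return the character
// that must precede "}" to close the tag, or null if it is not a tag character.
function close_char_for(string $open): ?string
{
    $pairs = [
        '<' => '>',  // {<anchor>}        (mirrored)
        '>' => '<',  // {>small<}         (mirrored)
        '!' => '!',  // {!image!}
        '*' => '*',  // {*strong*}
        '/' => '/',  // {/emphasis/}
        '|' => '|',  // {|code|}
        '-' => '-',  // {-strikethrough-}
    ];
    return $pairs[$open] ?? null;
}

echo close_char_for('<'), "\n"; // >
echo close_char_for('>'), "\n"; // <
var_dump(close_char_for('{'));  // NULL
```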

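Two of the tags, a and small, need different closing tags. I tried modifying $re_closetag from the original code sample to reflect this, but it still matches too much of the text.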

For example:

http://www.google.com/>} bang 
smäll<} boom

My test string is:

tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3

score 1 · Accepted Answer

You can control this either in the RE itself or after the match.

In the RE, to control which tags can "open", modify this part of $re_next:

(?:\{(?P<opentag>[^{\s]))  # match an open tag
      #which is "{" followed by anything other than whitespace or another "{"

Currently it looks for any character that is not { or whitespace. Just change it to:

(?:\{(?P<opentag>[<!*/|>-]))

Now it only looks for your specific open tags.
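One way to check the restricted class on its own is to test it as a standalone pattern. This sketch assumes ~ as the pattern delimiter and the u modifier for UTF-8 input. RE_OPENTAG and first_opentag are hypothetical names. If your pattern is delimited with / instead, the / inside the class must be escaped as \/ or it will end the pattern early.

```php
<?php
// The restricted open-tag class, used standalone. With "~" as the delimiter
// the "/" in the class needs no escaping, and "-" is literal because it is
// the last character in the class.
const RE_OPENTAG = '~\{(?P<opentag>[<!*/|>-])~u';

// Hypothetical helper: the tag character of the first open tag in $s, or null.
function first_opentag(string $s): ?string
{
    return preg_match(RE_OPENTAG, $s, $m) ? $m['opentag'] : null;
}

var_dump(first_opentag('{*strong*}'));      // string(1) "*"
var_dump(first_opentag('{{ 汉字/漢字 }}'));  // NULL: "{" is followed by "{" or a space
var_dump(first_opentag('{@x}'));            // NULL: "@" is not a tag character
```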

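The close-tag part matches only one character at a time, and which character it looks for depends on the tag that is open in the current context. (That is what the $opentag parameter is for.) So to support tags whose opening and closing characters differ, just change the character passed as $opentag in the recursive call. For example: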

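        if (isset($m['opentag']) && $m['opentag'][1] !== -1) {
            list($newopen, $_) = $m['opentag'];

            // {<...>} closes with ">}" and {>...<} closes with "<}";
            // every other tag closes with the same character it opened with
            $newclose = $newopen;
            if ($newopen === '>') $newclose = '<';
            else if ($newopen === '<') $newclose = '>';

            // recurse, telling the sub-parser which close character to look for
            list($subast, $offset) = str_to_ast($s, $offset, array(), $newclose);
            // label the node with the character that opened it, not the closer
            $ast[] = array($newopen, $subast);
        } else if (isset($m['text']) && $m['text'][1] !== -1) {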
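To see the context-dependent close tag work end to end, here is a minimal self-contained sketch of a recursive parser in the same spirit. It is a reconstruction under assumptions, not the code from the original answer. It matches one token at a time with a \G-anchored pattern, builds the close-tag alternative from the current context, and labels each node with the character that opened it. Unclosed tags are simply ended at the end of the input.

```php
<?php
// Hypothetical reconstruction, not the original answer's code.
// AST nodes are ['#text', string] or [opening char, array of child nodes].
function str_to_ast(string $s, int $offset = 0, ?string $close = null): array
{
    // At the current position only (\G), try in order: the close tag of the
    // current context (e.g. "*}" or ">}"), an open tag, then plain text.
    // Text runs stop before any character that could start a tag; a lone
    // such character that starts nothing is consumed as text by ".".
    $re = '~\G(?:'
        . ($close === null ? '' : '(?P<closetag>' . preg_quote($close, '~') . '\})|')
        . '\{(?P<opentag>[<!*/|>-])'
        . '|(?P<text>[^{}<!*/|>-]+|.)'
        . ')~su';

    $ast = [];
    while ($offset < strlen($s) && preg_match($re, $s, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $offset += strlen($m[0][0]); // byte offsets, always on UTF-8 boundaries

        if (isset($m['closetag']) && $m['closetag'][1] !== -1) {
            return [$ast, $offset]; // end of this tag's content
        } elseif (isset($m['opentag']) && $m['opentag'][1] !== -1) {
            $open = $m['opentag'][0];
            // {<...>} and {>...<} close with the mirrored character
            $newclose = $open === '<' ? '>' : ($open === '>' ? '<' : $open);
            list($sub, $offset) = str_to_ast($s, $offset, $newclose);
            $ast[] = [$open, $sub];
        } else {
            $t = $m['text'][0];
            $last = count($ast) - 1;
            if ($last >= 0 && $ast[$last][0] === '#text') {
                $ast[$last][1] .= $t; // merge adjacent text runs
            } else {
                $ast[] = ['#text', $t];
            }
        }
    }
    return [$ast, $offset]; // end of input
}

$input = 'tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom '
       . '{* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3';
list($ast, $end) = str_to_ast($input);
print_r($ast);
```

With the test string from the question, {{ 汉字/漢字 }} stays plain text, and the anchor and small nodes each end at their own mirrored closers.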

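Alternatively, you can leave the RE as it is and decide what to do with the match afterwards. For example, if you match an @ character but {@ is not an allowed open tag, you can raise a parse error, simply treat it as a text node (append array('#text', '{@') to the AST), or do anything in between.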
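The "decide after the match" approach can be sketched as a small check that runs after the permissive pattern has matched an open tag. ALLOWED_OPEN and check_opentag are hypothetical names. The strict/lenient switch stands in for the "parse error or text node" choice described above.

```php
<?php
// The open-tag pattern stays permissive ("{" followed by anything but
// whitespace or "{"); this hypothetical helper decides what the match means.
const ALLOWED_OPEN = ['<', '!', '*', '/', '|', '>', '-'];

/**
 * Returns null if $char opens a real tag (the caller recurses as usual);
 * otherwise returns a '#text' node holding the two consumed characters,
 * or throws when $strict is true (i.e. treat it as a parse error).
 */
function check_opentag(string $char, bool $strict = false): ?array
{
    if (in_array($char, ALLOWED_OPEN, true)) {
        return null;
    }
    if ($strict) {
        throw new UnexpectedValueException('Unknown open tag {' . $char);
    }
    return ['#text', '{' . $char];
}

// Usage: "{@" is matched by the permissive pattern but is not a known tag.
$node = null;
if (preg_match('~\{(?P<opentag>[^{\s])~u', 'a {@b} c', $m)) {
    $node = check_opentag($m['opentag']); // returns ['#text', '{@']
}
```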
