php - Need an explanation of a regular expression

Question

Can anyone explain what this regular expression means?

$html = preg_replace("# <(?![/a-z]) | (?<=\s)>(?![a-z]) #exi", "htmlentities('$0')", $html);

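Someone posted it in "How to strip tags in a more secure way than using the strip_tags function?", but I can't understand it.

This is my first post on Stack Overflow, so please forgive me if I have made any mistakes.

Thanks!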

score 2 · Accepted Answer

#...#      the two # characters are just delimiters that start and end
           the regex (many other characters can be used for this)
#exi       the e, x and i flags. See the PHP.net site for information
           about them

<          the < character
(?!...)    a negative lookahead. The regex matches only when the character
           after this point is NOT one of those listed
[/a-z]     a character class, matches the / character and the
           letters a - z
|          OR
(?<=\s)    a positive lookbehind. The regex matches only when there is
           \s (whitespace) before this point
>          the > character
(?![a-z])  negative lookahead for the letters a - z

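So basically, it matches every < and > character that is not used as part of a tag. For example, <foo and </foo will not match, and foo> will not either. But the < in 1 < 3 will match. The matched character is passed to the htmlentities function, so the text becomes 1 &lt; 3. Now you can use strip_tags to remove only the tags.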
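The e flag was deprecated in PHP 5.5 and removed in PHP 7, so the original line no longer runs on current versions. A minimal sketch of the same idea with preg_replace_callback follows. The function name escape_stray_brackets is hypothetical; the pattern is the one from the question with only the e flag dropped.

```php
<?php
// Hypothetical helper name; the pattern is the one from the question,
// minus the "e" flag.
function escape_stray_brackets(string $html): string
{
    return preg_replace_callback(
        '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
        function (array $m): string {
            // $m[0] is the whole match: a single < or > character.
            return htmlentities($m[0]);
        },
        $html
    );
}

echo escape_stray_brackets('1 < 3'), "\n";          // 1 &lt; 3
echo escape_stray_brackets('<foo> </foo>'), "\n";   // unchanged: both look like tags
echo escape_stray_brackets('foo> and a > b'), "\n"; // foo> and a &gt; b
```

In the last call the first > is left alone because it is not preceded by whitespace, which is exactly what the lookbehind checks.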

score 0 · Accepted Answer

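It looks to me like it is just trying to determine what is not an HTML tag, based on whether the character following the < or > is a letter.

This means it will catch the second < in the following: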

<span>This is <5 ml.</span>

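and replace it with the HTML entity equivalent of that character, which lets you safely use strip_tags without destroying the meaning of the string (as described in the question you referenced).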
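A rough sketch of that two-step pipeline on the example string, assuming PHP 7 or later (so preg_replace_callback stands in for the removed e flag). The final html_entity_decode step is an addition for illustration and is not part of the answer.

```php
<?php
$raw = '<span>This is <5 ml.</span>';

// Step 1: escape brackets that do not look like part of a tag.
$escaped = preg_replace_callback(
    '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
    function (array $m): string {
        return htmlentities($m[0]);
    },
    $raw
);

// Step 2: strip the real tags; the escaped < survives as an entity.
$stripped = strip_tags($escaped);
echo $stripped, "\n";                     // This is &lt;5 ml.

// Optional step 3 (an assumption, not in the answer): decode for plain-text use.
echo html_entity_decode($stripped), "\n"; // This is <5 ml.
```

Without step 1, strip_tags would treat <5 ml. as the start of a tag, and the text after the stray < would be lost.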

score 0 · Accepted Answer

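It looks for a < that is not followed by / or a-z

or

a > that is preceded by whitespace and not followed by a-z

Then it replaces the match with htmlentities('$0'). $0 is your whole match!

The i option ignores case

The e option does the normal backreference substitution, then evaluates the replacement string as PHP code

The x option ignores literal whitespace in the pattern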
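To see what the i and x flags change, here is a small sketch that runs the pattern with and without them. The sample subject string and the variable names are made up for illustration.

```php
<?php
$subject = '<B>x > 1</B>';

// Pattern from the question (without "e"), a copy without "i",
// and a compact copy with the whitespace removed instead of using "x".
$with_flags = '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi';
$without_i  = '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #x';
$compact    = '#<(?![/a-z])|(?<=\s)>(?![a-z])#i';

preg_match_all($with_flags, $subject, $a);
preg_match_all($without_i, $subject, $b);
preg_match_all($compact, $subject, $c);

// With "i", the B in <B> counts as a letter, so only the stray > matches.
echo implode(' ', $a[0]), "\n"; // >
// Without "i", B is not in [/a-z], so the < of <B> matches as well.
echo implode(' ', $b[0]), "\n"; // < >
// "x" only affects how the pattern is read, so the compact form agrees.
var_dump($a[0] === $c[0]);      // bool(true)
```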
