php - Replace all HTML codes with preg_replace

Question

I want to replace all HTML codes with a blank space. I think I should use the preg_replace function, but I'm not sure how to do it when the HTML codes look like this:

&#8221;
&#946;

$text="&#946; something &#8221; test...";

$text=preg_replace("&# [what should be here?] ;", " ", $text);

echo $text;
result =  something  test...

I think it should be digits only, because digits are all I found here: http://www.ascii.cl/htmlcodes.htm

score 3 · Accepted Answer

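You could take a look at strip_tags, but note that it removes HTML tags only and leaves references such as &#946; in place. Also, those are not HTML codes; they are called HTML entities.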
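A quick way to check the tag/entity distinction is the minimal sketch below. The sample string is made up for illustration; strip_tags itself is a standard PHP function.

```php
<?php
// Illustrative input: tags wrapped around numeric character references.
$html = '<b>&#946;</b> something <i>&#8221;</i> test...';

// strip_tags drops the <b> and <i> markup but does not touch the references.
echo strip_tags($html), PHP_EOL; // &#946; something &#8221; test...
```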

A regular expression that matches what you want looks like this:

(&#.+?;)

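It is fairly simple: it looks for &#, then any characters, as few as possible, up to the next ;.

Edit: As Qtax pointed out, they don't have to be digits. The dot matches any character (except a newline by default).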
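Applied to the string from the question, this pattern could be used roughly as follows. This is a minimal sketch: the function name is hypothetical, and the pattern is written with / delimiters, which preg_replace needs around the expression.

```php
<?php
// Hypothetical helper name, not part of the original answer.
function stripLazyRefs(string $text): string
{
    // &# followed by as few characters as possible up to the next semicolon.
    return preg_replace('/&#.+?;/', ' ', $text);
}

// Each reference becomes one space; the surrounding spaces are kept.
var_dump(stripLazyRefs('&#946; something &#8221; test...'));
```

Because each reference is swapped for a space, the output keeps the original spacing plus one extra space per reference.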

score 2 · Accepted Answer

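HTML character references can be written in two ways. Assuming you only want to replace numeric character references, you need a regular expression that handles both formats:

&#D; where D is a decimal number
&#xH; where H is a hexadecimal number

A regular expression that covers both: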

/&#(\d+|x[\da-f]+);/i
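A minimal sketch of this pattern in use, assuming a space as the replacement as in the question. The function name and sample input are made up for illustration.

```php
<?php
// Hypothetical helper name, not part of the original answer.
function stripNumericCharRefs(string $text): string
{
    // Matches &#D; (decimal) or &#xH; (hexadecimal).
    // The i flag also accepts an uppercase X and uppercase hex digits.
    return preg_replace('/&#(\d+|x[\da-f]+);/i', ' ', $text);
}

// Named references such as &amp; are left alone by this pattern.
echo stripNumericCharRefs('a&#946;b&#x3B2;c&amp;d'), PHP_EOL; // a b c&amp;d
```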

score 0 · Accepted Answer

If you want to replace all HTML entities of the form &foo;, you can use the following:

preg_replace('/&(?:[a-z]+|#x[\da-f]+|#\d+);/i', ' ', $text);

If you want to decode them instead, use html_entity_decode.
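The difference between stripping and decoding can be sketched as below. The function names are hypothetical, and the flags passed to html_entity_decode (ENT_QUOTES | ENT_HTML5 with UTF-8) are one reasonable choice rather than something the answer specifies.

```php
<?php
// Hypothetical helper: replace every named or numeric reference with a space.
function stripEntities(string $text): string
{
    return preg_replace('/&(?:[a-z]+|#x[\da-f]+|#\d+);/i', ' ', $text);
}

// Hypothetical helper: turn every reference into the character it stands for.
function decodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$text = 'x&amp;y&#946;z';
echo stripEntities($text), PHP_EOL;  // x y z
echo decodeEntities($text), PHP_EOL; // x&yβz
```

Note that the [a-z]+ branch only covers purely alphabetic names, so a named entity containing digits, such as &frac12;, would not be matched by this pattern.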

score 0 · Accepted Answer

&<something>; is the syntax for HTML entities. If you want to replace all of them, use this regular expression:

preg_replace('/&.*?;/', '', $subject); // from ampersand till the next semicolon

It will replace all HTML entities with an empty string, including &auml;, &x20; and others.
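One thing worth checking before using this pattern is how broadly it matches: it removes anything between an ampersand and the next semicolon on the same line, whether or not it is an entity. The sketch below compares it with a stricter pattern. The function names and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper: the broad pattern, from any & to the next ;.
function stripBroad(string $s): string
{
    return preg_replace('/&.*?;/', '', $s);
}

// Hypothetical helper: a stricter pattern limited to entity-shaped text.
function stripStrict(string $s): string
{
    return preg_replace('/&(?:[a-z]+|#x[\da-f]+|#\d+);/i', '', $s);
}

$s = 'Tom & Jerry; caf&eacute;';
var_dump(stripBroad($s));  // also removes "& Jerry;"
var_dump(stripStrict($s)); // removes only "&eacute;"
```

If the input can contain bare ampersands in ordinary text, the stricter form is the safer choice.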
