php - Regex to trim HTML whitespace at the end

Question

A text field can be filled with combinations of the following:

<p></p> 
<p>&nbsp;</p>
<br>
<span></span>
<div></div>

and several other variations, including whitespace and &nbsp;

I want to remove these, because they mess up the formatting on the web page.

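I was thinking of a recursive function that strips trailing &nbsp; and spaces, then finds the last closing tag, finds the matching opening tag, and feeds the content back to itself. If the returned content is empty, the tags are removed.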

It could run as a stored procedure in MS SQL Server 2008, in VBScript (classic ASP), or in PHP.

score 0 · Accepted Answer

The simplest answer is this, with no complicated regular expressions involved:

$html = str_replace('<span></span>', '', $html);
$html = str_replace('<p></p>', '', $html);
$html = str_replace('<div></div>', '', $html);

Replace $html with the string that holds all of your output.

Simple!
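A minimal sketch of the same idea, assuming you also want wrappers such as <div><p></p></div> to disappear once their contents have been removed. The function name remove_empty_tags is hypothetical and not part of the answer. Note that, like the three calls above, this removes empty tags anywhere in the string, not only at the end, and it does not cover tags with attributes, whitespace, or &nbsp;.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original answer).
function remove_empty_tags(string $html): string
{
    $empty = ['<span></span>', '<p></p>', '<div></div>'];
    do {
        $before = $html;
        // str_replace accepts an array of search strings and applies them in order.
        $html = str_replace($empty, '', $html);
    } while ($html !== $before); // repeat until a pass changes nothing

    return $html;
}

echo remove_empty_tags('<p>Hello</p><div><p></p></div><span></span>'), "\n";
// prints: <p>Hello</p>
```

The loop matters only for nesting: a single pass can leave behind a wrapper that became empty during that same pass.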

score 0 · Accepted Answer

This can be done with a regular expression; I don't think the DOM is the easiest approach in this case. A PHP example:

$pattern = '~(?><(p|span|div)\b[^>]*+>(?>\s++|&nbsp;)*</\1>|<br/?+>|&nbsp;|\s++)+$~i';
$result = preg_replace($pattern, '', $text);

Explanation:

~
 (?>                          # open an atomic group
     <(p|span|div)\b[^>]*+>   # opening tags, note that this subpattern allows
                              # attributes with [^>]*+ you can remove it if you
                              # don't need it
           (?>\s++|&nbsp;)*   # content allowed inside the tags *

     </\1>                    # closing tag (refer to the first capturing group)
   |                          # OR
     <br/?+>                  # stand alone tag <br>
   |                          # OR
     &nbsp;                   # &nbsp;
   |                          # OR
     \s++                     # whitespace characters
  )+$                         # repeat the group, anchored at the end of the string
~i
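As a quick check, here is the pattern applied to a made-up input. The sample strings are assumptions chosen for illustration; only the pattern comes from the answer.

```php
<?php
// Pattern copied from the answer.
$pattern = '~(?><(p|span|div)\b[^>]*+>(?>\s++|&nbsp;)*</\1>|<br/?+>|&nbsp;|\s++)+$~i';

// Hypothetical sample: real content followed by trailing empty markup.
$text = '<p>Hello</p><p></p><p>&nbsp;</p><br><span class="x"> </span>' . "\n";

$result = preg_replace($pattern, '', $text);
echo $result, "\n";
// prints: <p>Hello</p>

// Empty tags that are not at the end are left alone, because of the $ anchor.
$unchanged = preg_replace($pattern, '', '<p></p><p>Hi</p>');
echo $unchanged, "\n";
// prints: <p></p><p>Hi</p>
```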

(*) Note that this pattern doesn't handle nested tags such as <div><p></p></div>, but you can solve that with a recursive pattern:

$pattern = '~(<(p|span|div)\b[^>]*+>(?1)*</\2>|<br/?+>|&nbsp;|\s++)+$~i';

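Here (?1) refers to the first capturing group: it re-runs that group's subpattern at the current position, rather than matching the text it captured earlier as a backreference would.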
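A small sketch of how the recursive pattern might be wrapped and tried on nested input. The function name trim_trailing_empty_html and the sample strings are assumptions; the null check is there because preg_replace returns null when the regex engine reports an error.

```php
<?php
// Hypothetical wrapper (name is an assumption); the pattern is the recursive one above.
function trim_trailing_empty_html(string $text): string
{
    $pattern = '~(<(p|span|div)\b[^>]*+>(?1)*</\2>|<br/?+>|&nbsp;|\s++)+$~i';
    $result = preg_replace($pattern, '', $text);

    // On a PCRE error preg_replace returns null; fall back to the untouched input.
    return $result ?? $text;
}

echo trim_trailing_empty_html('<p>Hello</p><div><p>&nbsp;</p><br></div> '), "\n";
// prints: <p>Hello</p>

echo trim_trailing_empty_html('<div><p>Text</p></div><br>'), "\n";
// prints: <div><p>Text</p></div>
```

In the second call only the trailing <br> is removed, since the <div> contains real text and so never matches as "empty".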
