html - Cleaning extra tags from an HTML document converted from Word

Question

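I have a large HTML document generated by a Word-to-HTML conversion. The HTML code contains a lot of extra tags, and I want to clean them up with regular expressions. I am using the UltraEdit editor (v11.20). I tried some regular expressions to find the relevant places in the HTML, but they did not work for me (for example, '*').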

Here is a sample of the code:

<P LANG="en-US" CLASS="western" ALIGN=JUSTIFY STYLE="margin-left: -0.49in; margin-right: -0.59in; text-indent: 0.3in; margin-bottom: 0in">
<FONT COLOR="#943634">       </FONT><FONT COLOR="#943634"><FONT FACE="Arial, sans-serif"><FONT SIZE=5 STYLE="font-size: 20pt"><B> TEXT TEXT</B></FONT></FONT></FONT></P>

I want to use a regular expression to replace it with

<h1> TEXT TEXT TEXT</h1>

Note that there are spaces inside the <font color="#943634"> </font> tag.

Also, the text inside the <B> </B> tags can be very long and can wrap onto a new line.

score 2 · Accepted Answer

I solved the problem by clearing the tags with the Find and Replace command, simply re-running it several times.
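For anyone who would rather script those repeated passes, here is a minimal sketch in Python. It uses Python's `re` syntax, not UltraEdit's, and the three patterns and the name `clean_word_html` are my own assumptions, tuned only to the sample in the question. Regexes on HTML are fragile, so treat it as a starting point rather than a general cleaner.

```python
import re

# Hypothetical passes, tuned to the sample from the question.
PASSES = [
    # 1. Drop FONT elements that contain only whitespace.
    (re.compile(r"<font\b[^>]*>\s*</font>", re.IGNORECASE), ""),
    # 2. Unwrap the remaining FONT tags, keeping their content.
    (re.compile(r"</?font\b[^>]*>", re.IGNORECASE), ""),
    # 3. Turn a paragraph holding a single bold run (no nested tags,
    #    newlines allowed) into a heading.
    (re.compile(r"<p\b[^>]*>\s*<b>([^<]*)</b>\s*</p>", re.IGNORECASE),
     r"<h1>\1</h1>"),
]


def clean_word_html(html: str) -> str:
    """Apply every pass repeatedly until the text stops changing."""
    while True:
        cleaned = html
        for pattern, replacement in PASSES:
            cleaned = pattern.sub(replacement, cleaned)
        if cleaned == html:
            return cleaned
        html = cleaned


sample = (
    '<P LANG="en-US" CLASS="western" ALIGN=JUSTIFY STYLE="margin-left: -0.49in; '
    'margin-right: -0.59in; text-indent: 0.3in; margin-bottom: 0in">\n'
    '<FONT COLOR="#943634">       </FONT><FONT COLOR="#943634">'
    '<FONT FACE="Arial, sans-serif"><FONT SIZE=5 STYLE="font-size: 20pt">'
    '<B> TEXT TEXT</B></FONT></FONT></FONT></P>'
)
print(clean_word_html(sample))  # prints: <h1> TEXT TEXT</h1>
```

The loop mirrors "re-running it a few times": each pass can expose new matches for the others, so it stops only when a full round changes nothing. Paragraphs that do not fit pattern 3 are left as they are, apart from losing their FONT tags.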

Answered 2016-07-21T13:09:37.263

score 0 · Accepted Answer

Well,

To remove the p tags added around img tags, try:

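// Unwrap images (optionally inside a link) from the <p> tags that surround them.
// Flags: i = case-insensitive, U = quantifiers are ungreedy by default.
function wp_bootstrap_filter_ptags_on_images( $content ){
    return preg_replace( '/<p>\s*(<a .*>)?\s*(<img .* \/>)\s*(<\/a>)?\s*<\/p>/iU', '\1\2\3', $content );
}
// Run the filter on post content (WordPress hook).
add_filter('the_content', 'wp_bootstrap_filter_ptags_on_images');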

...it's not everything, but it's something rather than nothing...! :-)
