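I would also recommend using PHP's DOM extension instead of regular expressions, which are often inaccurate for parsing HTML. Here is sample code you can use to strip all img tags and all background attributes from a string: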
// ...loading the DOM
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // must be set before loading, otherwise it has no effect
@$dom->loadHTML($string); // Using @ to hide parse warnings that markup errors may trigger
// Here we strip all the img tags in the document.
// getElementsByTagName() returns a live list, so copy it into an array first:
// removing nodes while iterating the live list would skip elements.
$images = $dom->getElementsByTagName('img');
$imgs = array();
foreach($images as $img) {
    $imgs[] = $img;
}
foreach($imgs as $img) {
    $img->parentNode->removeChild($img);
}
// This part strips the 'background' attribute from (all) the body tag(s)
$bodies = $dom->getElementsByTagName('body');
$bodybg = array();
foreach($bodies as $bg) {
    $bodybg[] = $bg;
}
foreach($bodybg as $bg) {
    $bg->removeAttribute('background');
}
$str = $dom->saveHTML(); // serialize only after all modifications are done
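If you want to clear the background attribute wherever it appears rather than only on body, a DOMXPath query can select both targets directly. The sketch below is one possible variant, not the code from this answer: the function name stripImagesAndBackgrounds is hypothetical, and it collects libxml warnings internally instead of using the @ operator. Like the code above, it copies the matched nodes into an array before modifying the tree.

```php
<?php
// Hypothetical helper: removes every <img> element and every `background`
// attribute (on any element) using DOMXPath queries.
function stripImagesAndBackgrounds(string $html): string
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of using the @ operator,
    // then restore the previous error-handling mode.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Copy the matches into an array before removing anything from the tree.
    foreach (iterator_to_array($xpath->query('//img')) as $img) {
        $img->parentNode->removeChild($img);
    }

    // Matches body, table, td, ... whenever they carry a background attribute.
    foreach (iterator_to_array($xpath->query('//*[@background]')) as $element) {
        $element->removeAttribute('background');
    }

    return $dom->saveHTML();
}
```

Usage is a single call, e.g. $str = stripImagesAndBackgrounds($string); the inline-CSS cleanup would still have to run on the document before saveHTML() if you need it as well.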
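I chose the body tag rather than table because the <table> tag itself has no background attribute; it only has bgcolor. To strip background properties from inline CSS, you can use sabberworm's PHP CSS Parser to parse the CSS retrieved from the DOM. Try this: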
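// Selecting all the elements since each one could have a style attribute
$alltags = $dom->getElementsByTagName('*');
$tags = array();
foreach($alltags as $tag) {
    $tags[] = $tag;
}
foreach($tags as $tag) {
    // Skip elements that have no inline CSS at all
    if(!$tag->hasAttribute('style')) {
        continue;
    }
    // Wrap the declarations in a dummy "p{...}" rule so the parser accepts them.
    // CSSParser is the class name in early releases; newer ones use \Sabberworm\CSS\Parser.
    $oParser = new CSSParser("p{".$tag->getAttribute('style')."}");
    $oCss = $oParser->parse();
    foreach($oCss->getAllRuleSets() as $oRuleSet) {
        $oRuleSet->removeRule('background');
        $oRuleSet->removeRule('background-image');
    }
    // Serialize, then cut off the dummy wrapper again
    // (the 3/2 character offsets depend on the parser's output format)
    $css = $oCss->__toString();
    $css = substr_replace($css, '', 0, 3);
    $css = substr_replace($css, '', -2, 2);
    if(trim($css) !== '') {
        $tag->setAttribute('style', $css);
    } else {
        // Only background declarations were present: drop the attribute entirely
        $tag->removeAttribute('style');
    }
}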
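If adding a CSS parser dependency is not an option, one fallback is to split the style attribute into declarations yourself. This is a hedged sketch, not part of sabberworm's library: the function name removeBackgroundDeclarations and its $properties parameter are hypothetical. It splits only on semicolons outside parentheses and quotes, so a value like url(data:image/png;base64,...) stays intact, but it does not understand CSS comments or escaped quotes, which is exactly where a real parser is safer.

```php
<?php
// Hypothetical fallback: drops the given properties from an inline style string
// without an external CSS parser. Semicolons inside (...) or quotes are ignored
// when splitting; CSS comments and escaped quotes are NOT handled.
function removeBackgroundDeclarations(
    string $style,
    array $properties = ['background', 'background-image']
): string {
    $declarations = [];
    $current = '';
    $depth = 0;     // nesting level of ( ... )
    $quote = null;  // currently open quote character, if any

    $length = strlen($style);
    for ($i = 0; $i < $length; $i++) {
        $ch = $style[$i];
        if ($quote !== null) {
            if ($ch === $quote) {
                $quote = null; // closing quote
            }
        } elseif ($ch === '"' || $ch === "'") {
            $quote = $ch;
        } elseif ($ch === '(') {
            $depth++;
        } elseif ($ch === ')' && $depth > 0) {
            $depth--;
        } elseif ($ch === ';' && $depth === 0) {
            // Top-level semicolon: end of one declaration
            $declarations[] = $current;
            $current = '';
            continue;
        }
        $current .= $ch;
    }
    $declarations[] = $current;

    $kept = [];
    foreach ($declarations as $declaration) {
        $declaration = trim($declaration);
        if ($declaration === '') {
            continue;
        }
        // The property name is everything before the first colon
        $property = strtolower(trim(strstr($declaration, ':', true) ?: $declaration));
        if (in_array($property, $properties, true)) {
            continue;
        }
        $kept[] = $declaration;
    }

    return $kept ? implode('; ', $kept) . ';' : '';
}
```

Inside the loop over $tags you could call it on $tag->getAttribute('style') and remove the attribute when it returns an empty string, mirroring the parser-based version.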
Using all of this code together (with the inline-CSS step placed before the saveHTML() call), if you have, for example:
$string = '<!DOCTYPE html>
<html><body background="http://yo.ur/background/dot/com" etc="an attribute value">
<img src="http://your.pa/th/to/image"><img src="http://anoth.er/path/to/image">
<div style="background-image:url(http://inli.ne/css/background);border: 1px solid black">div content...</div>
<div style="background:url(http://inli.ne/css/background);border: 1px solid black">2nd div content...</div>
</body></html>';
PHP will output:
<!DOCTYPE html>
<html><body etc="an attribute value">
<div style="border: 1px solid black;">div content...</div>
<div style="border: 1px solid black;">2nd div content...</div>
</body></html>