php - Remove attributes from HTML tag elements using a PHP regular expression

Question

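I want to remove every attribute from HTML tags. I think this can be done with a regular expression, but I'm not good at regular expressions.

I tried str_replace, but that isn't the right approach. I searched for similar questions and couldn't find any.

Example:

I have HTML tags like these in a variable: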

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

Then I call some preg_match():

$new_str = preg_match('', $str)

Expected output:

$new_str = '
<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>';

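Note that I don't want to strip the HTML tags themselves; I only need to remove any attributes inside the tags.

PHP's strip_tags() isn't an option.

Any help with this would be much appreciated.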

score 1 · Accepted Answer

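Although a regex can do the job, DOM functions are generally the recommended way to filter or otherwise manipulate HTML. Here is a reusable class that removes unwanted attributes using DOM methods. You just set the HTML tags and attributes you want to allow, and it filters out the rest of the HTML.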

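class allow_some_html_tags {
    public $doc = null;
    public $xpath = null;
    public $allowed_tags = "";             // e.g. "<p><span>", the format strip_tags() expects
    public $allowed_properties = array();  // attribute name => 1, for isset() lookups

    // Drop disallowed tags, then parse what is left into a DOM tree.
    // Call setAllowed() before this, otherwise every tag is stripped.
    function loadHTML( $html ) {
        $this->doc = new DOMDocument();
        $html = strip_tags( $html, $this->allowed_tags );
        @$this->doc->loadHTML( $html );    // @ silences warnings about malformed markup
        $this->xpath = new DOMXPath( $this->doc );
    }
    // Register the tag names and attribute names that may be kept.
    function setAllowed( $tags = array(), $properties = array() ) {
        foreach( $tags as $allow ) $this->allowed_tags .= "<{$allow}>";
        foreach( $properties as $allow ) $this->allowed_properties[$allow] = 1;
    }
    // Copy the attribute names first so removing attributes later
    // does not disturb the live attribute list.
    function getAttributes( $tag ) {
        $r = array();
        for( $i = 0; $i < $tag->attributes->length; $i++ )
            $r[] = $tag->attributes->item($i)->name;
        return $r;
    }
    // Remove every attribute that is not allowed and return the cleaned HTML.
    function getCleanHTML() {
        $tags = $this->xpath->query("//*");
        foreach( $tags as $tag ) {
            $a = $this->getAttributes( $tag );
            foreach( $a as $attribute ) {
                if( !isset( $this->allowed_properties[$attribute] ) )
                    $tag->removeAttribute( $attribute );
            }
        }
        // The second strip_tags() pass drops the doctype/html/body wrapper added by DOMDocument.
        return strip_tags( $this->doc->saveHTML(), $this->allowed_tags );
    }
}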

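The class calls strip_tags() twice: once up front to quickly discard unwanted tags, and again after the disallowed attributes have been removed from the remaining tags, this time to strip the extra tags that the DOM functions insert (doctype, html, body). To use it, just do the following: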
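To see why strip_tags() alone is not enough here, the following small sketch shows what its allow-list does and does not touch. The input string is made up for illustration and is not part of the question.

```php
<?php
// Illustrative input, not taken from the question.
$html = '<p class="note">kept</p><script>alert(1)</script>';

// Tags outside the allow-list are removed, but their text content stays,
// and attributes on the allowed tags are left untouched.
$filtered = strip_tags($html, '<p>');

echo $filtered, PHP_EOL; // <p class="note">kept</p>alert(1)
```

The class attribute survives, which is why the class above still needs the DOM pass to remove attributes.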

$comments = new allow_some_html_tags();
$comments->setAllowed( array( "p", "span", "ul", "li" ), array("tabindex") );
$comments->loadHTML( $str );
$clean = $comments->getCleanHTML();

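The setAllowed function takes two arrays: a set of allowed tags and a set of allowed attributes (in case you later decide to keep some). I changed your input string to include an added tabindex="3" attribute on the <ul> to illustrate the filtering. The output in $clean is: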

<p>content</p>
<span>content</span>
<ul tabindex="3"></ul><li>content</li>

score 0 · Accepted Answer

The simplest way to remove HTML tags in PHP is strip_tags().

Or you can remove them with:

preg_replace("/<.*?>/", "", $str);

score 0 · Accepted Answer

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

$clean = preg_replace('/ .*".*"/', '', $str);

echo $clean;

This will return:

<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>
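The pattern above depends on greedy matching within each line, so it can also eat text content that happens to contain a space and quotes. A somewhat tighter sketch, assuming attribute values never contain a literal ">", anchors the match on the opening tag instead. The function name strip_attributes_regex is hypothetical and chosen only for this example.

```php
<?php
// Hypothetical helper: rewrite every opening tag as just its name,
// keeping a trailing "/" for self-closing tags such as <br />.
// Assumption: no attribute value contains a literal ">".
function strip_attributes_regex(string $html): string
{
    // $1 = tag name, [^>]*? = the attributes (discarded), $2 = optional "/".
    return preg_replace('/<([a-z][a-z0-9]*)\b[^>]*?(\/?)>/i', '<$1$2>', $html);
}

$sample = '<p class="class_style" style="font-size: medium;">content</p>';
echo strip_attributes_regex($sample), PHP_EOL; // <p>content</p>
```

Closing tags such as </p> are not matched because the character after "<" must be a letter, so they pass through unchanged.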

But please don't use regular expressions to parse HTML; use a DOM parser.
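As a minimal sketch of what the DOM-parser route could look like for this question, the function below removes every attribute from a fragment. The name strip_all_attributes and the wrapper id fragment-root are assumptions made for this example, and the exact whitespace between elements in the result depends on how libxml parses the input.

```php
<?php
// Hypothetical helper: strip every attribute from an HTML fragment.
function strip_all_attributes(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be found after parsing.
    $doc->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);

    // Visit every descendant element that has at least one attribute.
    foreach ($xpath->query('.//*[@*]', $root) as $element) {
        // The attribute map is live, so keep removing the first entry until it is empty.
        while ($element->attributes->length > 0) {
            $element->removeAttributeNode($element->attributes->item(0));
        }
    }

    // Serialize only the wrapper's children, leaving out doctype/html/body.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_all_attributes('<p class="a" style="color:red">content</p><ul class="b"></ul>'), PHP_EOL;
```

Serializing the wrapper's child nodes avoids the second strip_tags() pass, at the cost of assuming the input does not itself contain an element with the same id.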
