
I used @Alex's approach here to remove script tags from an HTML document using the built-in DOMDocument. The problem is that if I have one script tag with inline JavaScript and another script tag that links to an external JavaScript source file, not all script tags are removed from the HTML.

$result = '
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>
            hey
        </title>
        <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
        <script>
            alert("hello");
        </script>
    </head>
    <body>hey</body>
</html>
';

$dom = new DOMDocument();
if($dom->loadHTML($result))
{
    $script_tags = $dom->getElementsByTagName('script');

    $length = $script_tags->length;

    for ($i = 0; $i < $length; $i++) {
        if(is_object($script_tags->item($i)->parentNode)) {
            $script_tags->item($i)->parentNode->removeChild($script_tags->item($i));
        }
    }

    echo $dom->saveHTML();
}

The above code outputs:

<html>
    <head>
        <meta charset="utf-8">
        <title>hey</title>
        <script>
        alert("hello");
        </script>
    </head>
    <body>
        hey
    </body>
</html>

As you can see from the output, only the external script tag was removed. Is there anything I can do to ensure all script tags are removed?


2 Answers


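Your error is actually trivial. PHP's DOM objects (DOMNode and its subclasses such as DOMElement, as well as DOMNodeList and a few others) are live: they update automatically when the document changes, most notably when the number of children changes. This is mentioned in a couple of lines in the PHP documentation, but it is mostly swept under the carpet.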

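If you loop up to the length of a DOMNodeList and remove elements from the document as you go, you will notice that the length property actually changes! In your code, removing item(0) shifts the remaining script tag down to index 0, so on the next pass item(1) no longer exists and the second tag is never removed. I had to write my own library to counteract this and a few other quirks.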
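The live behaviour described above can be seen directly in a minimal sketch. It assumes a shortened version of the question's HTML, and the variable names ($before, $after, $missing) are illustrative only:

```php
<?php
// Shortened, hypothetical version of the question's document with two script tags.
$html = '<!doctype html><html><head>'
      . '<script src="a.js"></script>'
      . '<script>alert("hello");</script>'
      . '</head><body>hey</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$scripts = $dom->getElementsByTagName('script');
$before  = $scripts->length;            // both script tags are in the list

// Remove only the first script tag.
$first = $scripts->item(0);
$first->parentNode->removeChild($first);

$after   = $scripts->length;            // the same list object has shrunk
$missing = $scripts->item(1);           // old index 1 moved to index 0, so this is null

echo "before=$before after=$after\n";
var_dump($missing === null);
```

The list was never re-queried, yet its length dropped and index 1 disappeared, which is why a loop over a cached length skips the second tag.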

Solution:

if ($dom->loadHTML($result))
{
    // Re-query on every pass: the list is live, so always remove the first remaining match.
    while (($r = $dom->getElementsByTagName("script")) && $r->length) {
        $r->item(0)->parentNode->removeChild($r->item(0));
    }
    echo $dom->saveHTML();
}

I'm not actually looping over indices; I just pop the first element each time. Result: http://sebrenauld.co.uk/domremovescript.php

answered 2013-04-10T12:44:25.387

To avoid being surprised by a live node list, which gets shorter as you remove nodes, you can work on a copy of it in an array created with iterator_to_array:

// $tag is the tag name to strip, e.g. 'script'
foreach (iterator_to_array($dom->getElementsByTagName($tag)) as $node) {
    $node->parentNode->removeChild($node);
}
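For completeness, here is a self-contained sketch of the same idea applied to the HTML from the question. The helper name removeElementsByTagName is hypothetical (it is not part of DOMDocument), and the return value is just a convenience for checking the result:

```php
<?php
// Hypothetical helper: removes every element with the given tag name
// and returns how many were removed.
function removeElementsByTagName(DOMDocument $dom, string $tag): int
{
    // Snapshot the live list into a plain array before touching the document.
    $nodes = iterator_to_array($dom->getElementsByTagName($tag));
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($nodes);
}

$html = '
<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>hey</title>
        <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
        <script>
            alert("hello");
        </script>
    </head>
    <body>hey</body>
</html>
';

$dom = new DOMDocument();
$dom->loadHTML($html);

$removed   = removeElementsByTagName($dom, 'script');
$remaining = $dom->getElementsByTagName('script')->length;
$output    = $dom->saveHTML();

echo $output;
```

Because the array is a snapshot, removing nodes during the foreach does not shift the items still waiting to be visited.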
answered 2016-06-10T23:07:15.843