php - PHP DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity

Question

I'm trying to get the "link" elements from certain webpages, but I can't figure out what I'm doing wrong. I'm getting the following error:

Severity: Warning

Message: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 536

Filename: controllers/test.php

Line Number: 34

Line 34 is the following in the code:

      $dom->loadHTML($html);

My code:

    $url = "http://www.amazon.com/";

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    if($html = curl_exec($ch)){

        // parse the html into a DOMDocument
        $dom = new DOMDocument();

        $dom->recover = true;
        $dom->strictErrorChecking = false;

        $dom->loadHTML($html);

        $hrefs = $dom->getElementsByTagName('a');

        echo "<pre>";
        print_r($hrefs);
        echo "</pre>";

        curl_close($ch);


    }else{
        echo "The website could not be reached.";
    }

score 42 · Accepted Answer

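This means that some of the HTML is invalid. It is only a warning, not an error, so your script will still process the document. To suppress the warnings, set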

 libxml_use_internal_errors(true);

Or you can suppress the warning entirely by doing this:

@$dom->loadHTML($html);
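
As a minimal sketch of the first option, the messages that libxml_use_internal_errors(true) buffers can be read back with libxml_get_errors() and discarded with libxml_clear_errors(), so nothing is silently lost. The helper name extractLinks() and the sample markup are assumptions made for illustration; they are not part of the question's code.

```php
<?php
// Sketch only: extractLinks() is a hypothetical helper, and $sample is
// made-up markup with a bare "&", standing in for the fetched page.
function extractLinks(string $html): array
{
    // Buffer libxml errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Keep the parser messages for inspection, then empty the buffer
    // and restore whatever setting was active before.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }

    return ['hrefs' => $hrefs, 'errors' => $errors];
}

$sample = '<html><body><a href="/page">Tom & Jerry</a></body></html>';
$result = extractLinks($sample);

print_r($result['hrefs']);
foreach ($result['errors'] as $error) {
    // Each entry is a LibXMLError object with line, column, code and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Compared with the @ operator, this keeps the parser messages available for logging. Whether a bare "&" is reported at all may depend on the libxml2 version PHP was built against.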

score 15 · Accepted Answer

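This may be caused by a stray & character that is immediately followed by a proper tag; otherwise you would get a missing ; error instead. See: Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity.

The solution is to replace the & character with &amp;
or, if you must keep the & as it is, you might be able to enclose it in: <![CDATA[-]]>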
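
One rough way to apply the &amp; replacement to fetched markup is a regular expression that escapes only ampersands not already starting a character reference. This is a heuristic sketch rather than a full HTML-aware fix: escapeBareAmpersands() is a hypothetical helper name, and the pattern will also touch ampersands inside script blocks.

```php
<?php
// Sketch only: escapeBareAmpersands() is a hypothetical helper name.
function escapeBareAmpersands(string $html): string
{
    // Match "&" unless it already starts a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) character reference.
    $pattern = '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)/';

    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '&amp;', $html) ?? $html;
}

echo escapeBareAmpersands('<p>Tom & Jerry &amp; friends &#38; co</p>'), "\n";
// prints: <p>Tom &amp; Jerry &amp; friends &#38; co</p>
```

The cleaned string can then be passed to $dom->loadHTML() in place of the raw $html.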

score 2 · Accepted Answer

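The HTML is poorly formed. If it is malformed badly enough, loading the HTML into the DOM document might even fail, and if loadHTML is not working then suppressing the errors is pointless. If you are unable to load the HTML into the DOM, I suggest using a tool like HTML Tidy to "clean up" the poorly formed HTML.

HTML Tidy can be found here: http://www.htacg.org/tidy-html5/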
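
If PHP's optional tidy extension is installed, the cleanup step might look like the sketch below. The helper name tidyHtml(), the chosen Tidy options, and the sample markup are assumptions for illustration; when the extension is missing the helper simply returns its input unchanged.

```php
<?php
// Sketch only: tidyHtml() is a hypothetical helper. It assumes the optional
// PHP tidy extension; without it the input is returned unchanged.
function tidyHtml(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }

    // Keys are HTML Tidy configuration options: emit HTML, no line wrapping.
    $config = ['output-html' => true, 'wrap' => 0];
    $clean = tidy_repair_string($html, $config, 'utf8');

    return $clean === false ? $html : $clean;
}

// Made-up sample with a bare "&" and unclosed paragraphs.
$dirty = '<p>Tom & Jerry<p>unclosed paragraph';
$clean = tidyHtml($dirty);

$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($clean);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Number of <p> elements found after cleanup.
echo $dom->getElementsByTagName('p')->length, "\n";
```

Running Tidy before the parser means loadHTML() receives markup that is already repaired, which is the case this answer has in mind when plain warning suppression is not enough.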
