
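I am using PHP's DOMDocument class to parse HTML.

When I give it HTML that contains anchors and ask it to find all of them and store them in an array, it gives me an empty array, as if there were no anchors at all.

Why does this happen, and how can I fix it?

Here is the code: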

$dom = new DOMDocument();
$domObject->loadHTML($content);
$anchors = $domObject->getElementsByTagName('a');
print_r($anchors); // returns empty array.

$content looks like this:

     <p>
        Friend David, I do not think we shall need a call bell as Hello! can be heard 10 to 20 feet away. What you think? Edison - P.S. first cost of sender & receiver to manufacture is only $7.00.[12] Hello, hello! New York, hello!
       </p>
       <a href="http://the-irf.com/hello/hello5.html">Prev</a>
       <a href="hello7.html">Next</a>
       <a href="end.html">End</a>
    </body>
</html>

2 Answers


Where is $domObject being set? Your code creates $dom but then calls methods on $domObject. Try this:

$matchList = array();
$dom = new DOMDocument();
$dom->loadHTML($content);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $anchor) {
    array_push($matchList, $anchor->getAttribute('href'));
}
var_dump($matchList);
answered 2013-04-10T23:49:14.233

Note that the code, once the $dom/$domObject typo is fixed, does not return an empty array. Instead, print_r() shows:

DOMNodeList Object
(
)

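This means it returned an object whose state is not exposed as public properties, which is why it looks empty in the print_r() output.

However, the result is not empty. DOMNodeList is Traversable, so you can iterate over it with foreach: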

foreach($anchors as $anchor) {
    var_dump($anchor->nodeValue);
}
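
Since the question asks for the anchors "in an array", here is a minimal sketch of one way to get a real array out of the node list, using iterator_to_array(). The sample markup and the variable names $anchorArray and $hrefs are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical sample markup; any HTML string would do.
$html = '<html><body><a href="a.html">A</a><a href="b.html">B</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Copy the DOMNodeList into a plain PHP array of DOMElement objects.
$anchorArray = iterator_to_array($dom->getElementsByTagName('a'));

// Reduce each element to its href attribute so print_r() shows something readable.
$hrefs = array_map(function ($a) {
    return $a->getAttribute('href');
}, $anchorArray);

print_r($hrefs);
```

print_r() on the DOMElement objects themselves can still be hard to read, which is why the sketch maps them to strings first.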

A simpler way to check whether the result is empty is to look at the length of the node list:

echo "The query returned " . $anchors->length . " nodes";
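
As a rough sketch of how that check might be used as a guard, the snippet below parses a made-up document with no links and branches on the length property. The markup and the $message variable are assumptions for illustration only.

```php
<?php
// Hypothetical document that contains no <a> elements.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>No links here.</p></body></html>');

$anchors = $dom->getElementsByTagName('a');

// length is the number of matched nodes; 0 means nothing was found.
if ($anchors->length === 0) {
    $message = "No anchors found";
} else {
    $message = "Found " . $anchors->length . " anchors";
}

echo $message . "\n";
```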

Here is a complete example:

$html = <<<EOF
<html>
  <head></head>
  <body>
     <p> 
        Friend David, I do not think we shall need a call bell as Hello! can be heard 10 to 20 feet away. What you think? Edison - P.S. first cost of sender & receiver to manufacture is only $7.00.[12] Hello, hello! New York, hello!
       </p>
       <a href="http://the-irf.com/hello/hello5.html">Prev</a>
       <a href="hello7.html">Next</a>
       <a href="end.html">End</a>
    </body>
</html>
EOF;

$domObject = new DOMDocument();
$domObject->loadHTML($html);
$anchors = $domObject->getElementsByTagName('a');

$links = array();
foreach($anchors as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

var_dump($links);

Output (the array wrapper printed by var_dump() is omitted):

string(36) "http://the-irf.com/hello/hello5.html"
string(11) "hello7.html"
string(8) "end.html"
answered 2013-04-10T23:55:03.433