php - Simple HTML DOM Parser returns the wrong element tree

Question

I'm having a problem with Simple HTML DOM Parser. This is what I'm using:

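require_once 'simple_html_dom.php'; // Simple HTML DOM library, provides file_get_html()

$url = 'http://topmmanews.com/2013/04/06/ufc-on-fuel-tv-9-results/';
$page = file_get_html($url);
$ret = $page->find("div.posttext", 0); // first div with class "posttext"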

This should give me count($ret->children()) = 10. However, it only returns 3: every element after the third one gets merged into it, producing a single element.

Can anyone help me figure out whether the problem is in my code or is a bug in Simple HTML DOM Parser?

score 1 · Accepted Answer

As Álvaro G. Vicario pointed out, your target HTML is malformed. I tried your code, and it does report three child nodes plus 6 other nodes.

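To illustrate how a DOM-based parser copes with malformed markup, here is a minimal sketch using PHP's built-in DOMDocument (not Simple HTML DOM). The HTML fragment is made up for the example, not taken from the page in the question: libxml's HTML parser closes each unclosed `<p>` when the next `<p>` starts, so the div should end up with three separate element children.

```php
<?php
// Sketch: libxml's HTML parser (used by DOMDocument) repairs unclosed tags.
// The markup below is a hypothetical fragment, not the page from the question.
$html = '<div class="posttext"><p>one<p>two<p>three</div>';

libxml_use_internal_errors(true); // keep any recovery warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$div = $dom->getElementsByTagName('div')->item(0);

// Count only element children; text and comment nodes are skipped.
$elementChildren = 0;
foreach ($div->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $elementChildren++;
    }
}

// Each unclosed <p> is closed by the next <p>, so the <p> elements are siblings.
echo $elementChildren, PHP_EOL;
```

Simple HTML DOM uses its own pure-PHP parser rather than libxml, so its recovery from broken markup can differ from what DOMDocument produces.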

Another approach that may help is to use DOMDocument and DOMXPath, like this:

$url = 'http://topmmanews.com/2013/04/06/ufc-on-fuel-tv-9-results/';
$html = file_get_contents($url);

libxml_use_internal_errors(true); // malformed HTML triggers warnings; collect them instead of printing
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$dom_xpath = new DOMXPath($dom);

// XPath that returns the first DIV with class "posttext"
$elements = $dom_xpath->query("(//div[@class='posttext'])[1]");

You can then iterate over the child nodes and read their values, or whatever else you need.
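A loop over those children could look like the sketch below. It assumes an inline HTML string standing in for the downloaded page (the `h2`/`p` content is invented), runs the same XPath query, and collects the tag name and trimmed text of each element child, skipping any text or comment nodes.

```php
<?php
// Sketch: iterate over the element children of the first div.posttext.
// The inline HTML is a hypothetical stand-in for the fetched page.
$html = '<html><body>'
      . '<div class="posttext"><h2>Results</h2><p>First bout</p><p>Second bout</p></div>'
      . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$posttext = $xpath->query("(//div[@class='posttext'])[1]")->item(0);

$children = [];
if ($posttext !== null) {
    foreach ($posttext->childNodes as $child) {
        // Keep only elements; text and comment nodes are ignored.
        if ($child instanceof DOMElement) {
            $children[] = [$child->nodeName, trim($child->textContent)];
        }
    }
}

foreach ($children as [$tag, $text]) {
    echo $tag, ': ', $text, PHP_EOL;
}
```

Filtering on `DOMElement` matters because `childNodes` also contains text nodes (for example whitespace between tags), which would otherwise inflate the count.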

score 0 · Accepted Answer

phpQuery is built on PHP's DOM extension, so it is a more reliable parser when the HTML is malformed:

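require_once 'phpQuery/phpQuery.php'; // adjust to wherever phpQuery is installed

$html = file_get_contents('http://topmmanews.com/2013/04/06/ufc-on-fuel-tv-9-results/');
$dom = phpQuery::newDocumentHTML($html);
$ret = $dom->find("div.posttext")->eq(0); // first div with class "posttext"
echo count($ret->children());
#=> 10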
