xpath - Getting the title tag from an HTML page with XPath?

Question

I have two pages from which I'm trying to extract the title tag with an XPath query. This page works: http://www.hobbyfarms.com/farm-directory/category-home-and-barn-resources-1.aspx

This page does not: http://cattletoday.com/links/Barns_and_Metal_Buildings/page-1.html?s=A

Here is my code:

$dom = new DOMDocument();
@$dom->loadHTMLFile($href);
$xpath = new DOMXPath($dom);

$titleNode = $xpath->query("//title");
foreach ($titleNode as $n) {
    $pageTitle = $n->nodeValue;
}

I also tried this:

$xpath->query('//title')->item(0)->textContent

but it also fails for that one URL.

Does anyone know why this happens? I'm hoping there's a solution.

score 4 · Accepted Answer

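The second page's response is gzip-compressed, so it has to be decompressed before parsing. The following script works: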

$href = 'http://cattletoday.com/links/Barns_and_Metal_Buildings/page-1.html?s=A';
$dom = new DOMDocument();
// The body comes back gzip-compressed, so decompress it before handing it to the parser.
$file = gzdecode(file_get_contents($href));
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$titleNode = $xpath->query('//title');
var_dump($titleNode->item(0));

(Note the use of the gzdecode function.)
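The script above calls gzdecode() unconditionally, and gzdecode() returns false for a body that is not gzip data, so the same code would break on the first (uncompressed) page. A minimal sketch, assuming you want one code path for both pages, checks for the gzip magic bytes (0x1f 0x8b) before decoding. The helper names decodeMaybeGzip and extractTitle are hypothetical, and the zlib extension is assumed to be available.

```php
<?php

// Hypothetical helper: decompress only when the body starts with the gzip magic bytes.
function decodeMaybeGzip(string $raw): string
{
    if (strncmp($raw, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($raw);
        if ($decoded === false) {
            throw new RuntimeException('Body looks gzip-compressed but could not be decoded.');
        }
        return $decoded;
    }
    // Not gzip data: return it unchanged.
    return $raw;
}

// Hypothetical helper: return the text of the first <title>, or null if there is none.
function extractTitle(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Collect libxml warnings from messy real-world HTML instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $node = $xpath->query('//title')->item(0);
    return $node === null ? null : trim($node->textContent);
}
```

With network access, the call would look like extractTitle(decodeMaybeGzip(file_get_contents($href))); checking the magic bytes rather than trusting headers keeps the sketch independent of how the body was fetched.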

score 2 · Accepted Answer

The second page uses the XHTML namespace, so you have to qualify the XPath query with that namespace:

$xpath->registerNamespace("xhtml", "http://www.w3.org/1999/xhtml");
$titleNode = $xpath->query("//xhtml:title|//title");
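The namespace prefix matters when the markup is parsed as XML, for example with DOMDocument::loadXML(); libxml's HTML parser, used by loadHTML() and loadHTMLFile(), generally does not place elements in the XHTML namespace. A sketch under that assumption, using a hypothetical helper titleFromXhtml, shows how the union //xhtml:title|//title matches a title whether or not it sits in the XHTML namespace. An unprefixed //title alone finds nothing in a namespaced XML document.

```php
<?php

// Hypothetical helper: return the first <title> text from an XML/XHTML string, or null.
function titleFromXhtml(string $xml): ?string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        return null; // not well-formed XML
    }
    $xpath = new DOMXPath($dom);
    // Bind a prefix to the XHTML namespace; XPath 1.0 has no notion of a default namespace.
    $xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
    // The union covers both namespaced and plain <title> elements.
    $node = $xpath->query('//xhtml:title|//title')->item(0);
    return $node === null ? null : $node->textContent;
}
```

Usage example: titleFromXhtml('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Namespaced</title></head><body/></html>') returns "Namespaced".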
