php - Simple web scraping with PHP, XPath, and DOM

Question

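I'm trying to learn web scraping and am using this example to get the links from a page. Is there a better way to do this, and what would be the easiest way to get, for example, the h1?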

$html = file_get_contents('page.html');

//parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

//grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo "<br />Link: $url";

}

score 2 · Accepted Answer

There's no need to prefix your XPath expression with /html/body; //a should work fine.

Also, I would use foreach instead of a for loop, but that's mostly a stylistic choice.
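
As a rough sketch of both suggestions, the snippet below selects links with //a, iterates with foreach, and picks up the first h1 the same way. The inline $html string is a made-up stand-in for page.html, and the use of libxml_use_internal_errors() in place of the @ operator is my own substitution, not something the question requires.

```php
<?php
// Hypothetical sample markup standing in for page.html from the question.
$html = '<html><body><h1>Example title</h1>'
      . '<a href="/first">First</a> <a href="/second">Second</a></body></html>';

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// //a matches every <a> element anywhere in the document.
$urls = [];
foreach ($xpath->query('//a') as $link) {
    $urls[] = $link->getAttribute('href');
}

// //h1 works the same way; item(0) is the first match, or null if there is none.
$h1 = $xpath->query('//h1')->item(0);
$title = $h1 !== null ? trim($h1->textContent) : null;

foreach ($urls as $url) {
    echo "Link: $url\n";
}
echo "H1: $title\n";
```

DOMNodeList is traversable, so foreach removes the need for the index variable and the item($i) call. Checking item(0) against null keeps the h1 lookup safe on pages that have no heading.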
