$urlToScrap = "https://play.google.com/store/apps/details?id=flipboard.app#?t=W251bGwsMSwxLDIxMiwiZmxpcGJvYXJkLmFwcCJd";
$pageContentData = file_get_contents($urlToScrap);
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$listOfDivs = $doc->getElementsByTagName("div");
foreach ($listOfDivs as $div) {
    if($div->getAttribute("class") == "doc-banner-icon"){
        $img = $div->getElementsByTagName("img");
        var_dump($img->getAttribute("src"));
    }
}

This returns nothing.

I have the following element in the DOM:

<div class="doc-banner-icon"><img src="somesrc"></div>

I am trying to get the img src. Since the page contains many images, I want to find the parent div first and then extract the image inside it.

Here is the solution:

$urlToScrap = "https://play.google.com/store/apps/details?id=flipboard.app#?t=W251bGwsMSwxLDIxMiwiZmxpcGJvYXJkLmFwcCJd";
$pageContentData = file_get_contents($urlToScrap);
$doc = new DOMDocument();
$doc->loadHTML($pageContentData);
$listOfDivs = $doc->getElementsByTagName("div");
foreach ($listOfDivs as $div) {
    if($div->getAttribute("class") == "doc-banner-icon"){
        $listOfImages = $div->getElementsByTagName("img");
        foreach($listOfImages as $img){
            var_dump($img->getAttribute("src"));
        }
    }
}

1 Answer

You aren't missing anything; you are calling var_dump on a DOMNodeList rather than on a single element. Try this:

$listOfImages = $doc->getElementsByTagName("img");

foreach ($listOfImages as $img) {
    $imgClass = $img->getAttribute('class');

    echo $imgClass;
}

In your updated question, just change:

$img->getAttribute("src")

to:

$img->item(0)->getAttribute("src")
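
The reason item(0) is needed is that getElementsByTagName returns a DOMNodeList, which has no getAttribute method of its own. A minimal sketch of the difference follows; the inline $html string is a hypothetical stand-in for the downloaded page, and the null check is an addition for the case where no image is found:

```php
<?php
// Hypothetical inline markup standing in for the downloaded page.
$html = '<html><body><div class="doc-banner-icon"><img src="somesrc"></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// getElementsByTagName returns a DOMNodeList, not a single element.
$images = $doc->getElementsByTagName('img');

// item(0) returns the first node, or null when the list is empty.
$first = $images->item(0);

// Guard against the empty case before reading the attribute.
$src = $first instanceof DOMElement ? $first->getAttribute('src') : null;

var_dump($src);
```

Checking for null first avoids a fatal error on pages where the expected element is absent.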

Given that your selection criteria are fairly complex, you might consider using XPath instead of navigating manually:

$doc = new DOMDocument();
$doc->loadHTML($pageContentData);

$xpath = new DOMXPath($doc);
$img = $xpath->query("//div[@class = 'doc-banner-icon']/img");

var_dump($img->item(0)->getAttribute('src'));
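
Note that @class = 'doc-banner-icon' only matches when the class attribute is exactly that string. If the div might carry additional classes, a common XPath 1.0 idiom is to match the class as a whitespace-delimited token. The following is a sketch under that assumption; the inline $html and the extra "large" class are hypothetical and not taken from the real page:

```php
<?php
// Hypothetical markup: the target div carries an extra class.
$html = '<div class="doc-banner-icon large"><img src="somesrc"></div>'
      . '<div class="other"><img src="ignored"></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Pad the normalized class list with spaces so the class matches as a whole token.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' doc-banner-icon ')]/img";
$nodes = $xpath->query($query);

// query() returns false for an invalid expression and an empty list when nothing matches.
$src = ($nodes !== false && $nodes->length > 0)
    ? $nodes->item(0)->getAttribute('src')
    : null;

var_dump($src);
```
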
answered 2013-07-09T12:57:53.133