dom - Fix code to get data from a DOM document (getElementby...)

Question

URL: sayuri.go.jp/used-cars

$content = file_get_contents('http://www.sayuri.co.jp/used-cars/');
$dom = new DOMDocument;
$dom->loadHTML($content);

Part of the page source:

<td colspan="4">

<h4 class="stk-title"><a href="/used-cars/B37753-Toyota-Wish-japanese-used-cars">Toyota Wish G</a></h4>
</td>

<td colspan="4">

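I am trying to walk through the page source, and for each section like the one above I want to save the URL, for example: "/used-cars/B37753-Toyota-Wish-japanese-used-cars"

This is the code I am using, which has not worked so far: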

$p = $dom->getElementsByTagName("h4");

$titles = array();

   foreach ($p as $node) {
     if ($node->hasAttributes()) {
     if($node->getAttribute('class') == "stk-title") {
       foreach ($node->attributes as $attr) {
         if ($attr->nodeName == "href") {
            array_push($titles , $attr->nodeValue); 
           }
         }
       }
     }
   }


print_r($titles) ;

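It should give me an array containing the URL of every car: ("/used-cars/B37753-Toyota-Wish-japanese-used-cars" , "" , "" ......)

But it returns an empty array. I think I made a mistake in the code that keeps it from reaching the URLs.

I also need to save the car name in a variable, for example: $car_name = "Toyota Wish G"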

score 0 · Accepted Answer

Use XPath:

$doc = new DOMDocument;
$doc->loadHTMLFile('http://www.sayuri.co.jp/used-cars/');

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//table[@class="itemlist-table"]//h4[@class="stk-title"]/a');

$links = array();
foreach ($nodes as $node) {
    $links[] = array(
        'href' => $node->getAttribute('href'),
        'text' => $node->textContent,
    );
}

print_r($links);
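
For reference, the loop in the question most likely comes back empty because the href attribute belongs to the <a> child, not to the <h4> itself, so iterating over the attributes of the h4 never finds it. Below is a minimal sketch of the same loop with only that change, without XPath. It runs against an inline copy of the snippet from the question rather than the live page. The surrounding <table>/<tr> wrapper and the $names array are assumptions added for illustration.

```php
<?php
// Inline sample based on the markup in the question.
// The <table>/<tr> wrapper is an assumption so the fragment parses cleanly.
$html = <<<HTML
<table>
<tr><td colspan="4">
<h4 class="stk-title"><a href="/used-cars/B37753-Toyota-Wish-japanese-used-cars">Toyota Wish G</a></h4>
</td></tr>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

$titles = array(); // hrefs, as in the question
$names  = array(); // car names (hypothetical helper array)

foreach ($dom->getElementsByTagName('h4') as $h4) {
    // Skip headings that are not car titles.
    if ($h4->getAttribute('class') !== 'stk-title') {
        continue;
    }
    // The href lives on the <a> inside the <h4>, so look one level down.
    foreach ($h4->getElementsByTagName('a') as $a) {
        $titles[] = $a->getAttribute('href');
        $names[]  = trim($a->textContent);
    }
}

print_r($titles);
print_r($names);
```

Each entry in $names lines up with the entry at the same index in $titles, so something like $car_name = $names[0] would hold "Toyota Wish G" for the sample above.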
