
Using PHP and DOM, how can I get PLACE, ADDRESS, LOCALITY, REGION, POSTAL CODE and COUNTRY from the following code (part of a web page)?

I have already written part of the code to get other things. This is the code so far:

$dochtml = new DOMDocument();
$dochtml->loadHTMLFile(''); // page URL omitted in the original
$xpath = new DOMXPath($dochtml);

// Text of the first <div class="description">
$descr = $xpath->query('//div[@class="description"]')->item(0);
print_r($descr->nodeValue);

// title attribute of the first <abbr> anywhere on the page
$abbr  = $dochtml->getElementsByTagName("abbr")->item(0);
$title = $abbr->getAttribute("title");
echo $title;

Here is the rest of the code (the relevant HTML):

<div class="vcard location p">
    <div class="fn org">
        <a href="link here">PLACE</a>
    </div>
    <div class="adr">
        <div class="street-address">ADDRESS<br></div>
        <div>
            <span class="locality">LOCALITY</span>,
            <span class="region">REGION</span>
            <span class="postal-code">POSTAL CODE</span>,
            <span class="country-name">COUNTRY</span>
        </div>
    </div>
</div>

Update

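I have a small problem with the following. The page contains many <abbr> tags, but the two I want, with the classes dtstart and dtend, are inside #eventDetailInfo, as shown below. Unfortunately, not every page has the second abbr tag with class=dtend, so the code picks up the first one from "Related Events" instead. So my question is: how do I restrict the lookup to this particular ID?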

<div id="eventDetailInfo">
        <div class="p">
         <div><abbr class="dtstart" title="2012-07-16T21:00:00">Monday, July 16th, 2012</abbr></div>    
         <div><abbr class="dtend" title="2012-08-16T21:00:00">Monday, August 16th, 2012</abbr></div>    
        </div>
</div>

3 Answers


Having read the DOMXPath documentation, here is an outline of the solution I'd suggest.

Get elements by class

$nodes = $xpath->query('//div[contains(@class, "street-address")]');
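Note that `contains(@class, "street-address")` is a substring test, so it would also match a class such as `street-address-2`. A common XPath 1.0 idiom pads the whitespace-normalized class list with spaces and searches for the whole token instead. A minimal sketch; the markup and the `street-address-2` class are made up for illustration:

```php
<?php
// Illustrative markup: the second div's class merely contains the
// substring "street-address" but is a different class token.
$html = '<div class="adr">'
      . '<div class="street-address">ADDRESS</div>'
      . '<div class="street-address-2">OTHER</div>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Substring match: hits both inner divs.
$loose = $xpath->query('//div[contains(@class, "street-address")]');

// Whole-token match: " street-address " (with surrounding spaces) only
// occurs in the padded class list when that exact class is present.
$strict = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " street-address ")]'
);

echo $loose->length, "\n";              // 2
echo $strict->length, "\n";             // 1
echo $strict->item(0)->nodeValue, "\n"; // ADDRESS
```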

Get an element by ID

$node = $xpath->query('//div[@id="someid"]')->item(0);
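Once you have the element by ID, you can also pass it as the second argument (`$contextNode`) of `DOMXPath::query()` and use a relative expression starting with `.`, so the search never leaves that element. This targets the update's problem of the `dtend` lookup falling through to "Related Events". A minimal sketch; the `relatedEvents` ID and its date are hypothetical, made up for illustration:

```php
<?php
// Illustrative page: #eventDetailInfo has no dtend, but a hypothetical
// related-events block further down does (the situation in the update).
$html = '<div id="eventDetailInfo"><div class="p">'
      . '<div><abbr class="dtstart" title="2012-07-16T21:00:00">Monday, July 16th, 2012</abbr></div>'
      . '</div></div>'
      . '<div id="relatedEvents">'
      . '<abbr class="dtend" title="2012-09-01T20:00:00">Related</abbr>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// 1. Find the container by ID.
$info = $xpath->query('//div[@id="eventDetailInfo"]')->item(0);

// 2. Query relative to it: the leading "." keeps the search inside $info.
$startNode = $xpath->query('.//abbr[contains(@class, "dtstart")]', $info)->item(0);
$endNode   = $xpath->query('.//abbr[contains(@class, "dtend")]', $info)->item(0);

// A missing tag gives null instead of a tag from elsewhere on the page.
$start = $startNode ? $startNode->getAttribute('title') : null;
$end   = $endNode ? $endNode->getAttribute('title') : null;

var_dump($start, $end);
```

Without the leading `.`, an expression such as `//abbr` is absolute and searches the whole document even when a context node is passed. The single expression `//div[@id="eventDetailInfo"]//abbr[...]` gives the same scoping, and the descendant step `//` is less brittle than a fixed `/div/div/` path if the inner nesting changes.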

Solution

To extract your values, you can use something like this (working example):

<?php
$html = '<div class="vcard location p">
    <div class="fn org">
        <a href="link here">PLACE</a>
    </div>
    <div class="adr">
        <div class="street-address">ADDRESS<br></div>
        <div>
            <span class="locality">LOCALITY</span>,
            <span class="region">REGION</span>
            <span class="postal-code">POSTAL CODE</span>,
            <span class="country-name">COUNTRY</span>
        </div>
    </div>
    <div id="eventDetailInfo">
        <div class="p">
         <div><abbr class="dtstart" title="2012-07-16T21:00:00">Monday, July 16th, 2012</abbr></div>    
         <div><abbr class="dtend" title="2012-08-16T21:00:00">Monday, August 16th, 2012</abbr></div>    
        </div>
    </div>
</div>';

$document = new DOMDocument();
$document->loadHTML($html);
$xPath = new DOMXpath($document);

// Runs the XPath expression "//$query" and returns the first match:
// its text content, or the value of $attribute when one is given.
// Returns null when nothing matches.
function extractNodeValue($query, $xPath, $attribute = null) {
    $node = $xPath->query("//{$query}")->item(0);
    if (!$node) {
        return null;
    }
    return $attribute ? $node->getAttribute($attribute) : $node->nodeValue;
}

$place = extractNodeValue('div[contains(@class, "fn")]/a', $xPath);
$address = extractNodeValue('div[contains(@class, "street-address")]', $xPath);
$locality = extractNodeValue('span[contains(@class, "locality")]', $xPath);
$region = extractNodeValue('span[contains(@class, "region")]', $xPath);
$postalCode = extractNodeValue('span[contains(@class, "postal-code")]', $xPath);
$countryName = extractNodeValue('span[contains(@class, "country-name")]', $xPath);
// The id predicate limits the <abbr> lookups to #eventDetailInfo
$start = extractNodeValue('div[@id="eventDetailInfo"]/div/div/abbr[contains(@class, "dtstart")]', $xPath, 'title');
$end = extractNodeValue('div[@id="eventDetailInfo"]/div/div/abbr[contains(@class, "dtend")]', $xPath, 'title');

var_dump($place, $address, $locality, $region, $postalCode, $countryName, $start, $end);

Output:

string(5) "PLACE"
string(7) "ADDRESS"
string(8) "LOCALITY"
string(6) "REGION"
string(11) "POSTAL CODE"
string(7) "COUNTRY"
string(19) "2012-07-16T21:00:00"
string(19) "2012-08-16T21:00:00"
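As an alternative to a helper like `extractNodeValue()`, `DOMXPath::evaluate()` returns a plain string when the expression is wrapped in the XPath `string()` function, so no `DOMNodeList` or `item(0)` is involved. A sketch, using a trimmed version of the question's markup:

```php
<?php
// Trimmed markup: a locality and a start date, but no country-name span.
$html = '<div class="adr"><span class="locality">LOCALITY</span></div>'
      . '<div id="eventDetailInfo"><abbr class="dtstart" title="2012-07-16T21:00:00">Monday, July 16th, 2012</abbr></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// string(...) converts the first matching node (in document order) to its
// string value, so evaluate() hands back a PHP string.
$locality = $xpath->evaluate('string(//span[contains(@class, "locality")])');

// Attributes work the same way: select the @title node inside the ID.
$start = $xpath->evaluate(
    'string(//div[@id="eventDetailInfo"]//abbr[contains(@class, "dtstart")]/@title)'
);

// No match: string() of an empty node-set is "", not null.
$country = $xpath->evaluate('string(//span[contains(@class, "country-name")])');

var_dump($locality, $start, $country);
```

The trade-off is that a missing element and an empty one both come back as `""`, so the node-based approach is still preferable when you need to tell them apart.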
answered 2012-07-31T20:20:21.463

Your code is almost there:

<?php

$dochtml = new DOMDocument();
$dochtml->loadHTML('<div class="vcard location p">
    <div class="fn org">
        <a href="link here">PLACE</a>
    </div>
    <div class="adr">
        <div class="street-address">ADDRESS<br></div>
        <div>
            <span class="locality">LOCALITY</span>,
            <span class="region">REGION</span>
            <span class="postal-code">POSTAL CODE</span>,
            <span class="country-name">COUNTRY</span>
        </div>
    </div>
</div>');

$xpath = new DOMXpath($dochtml);

$place       = $xpath->query('//div[@class="fn org"]/a')->item(0)->nodeValue;
$address     = $xpath->query('//div[@class="street-address"]')->item(0)->nodeValue;
$locality    = $xpath->query('//span[@class="locality"]')->item(0)->nodeValue;
$region      = $xpath->query('//span[@class="region"]')->item(0)->nodeValue;
$postalCode  = $xpath->query('//span[@class="postal-code"]')->item(0)->nodeValue;
$countryName = $xpath->query('//span[@class="country-name"]')->item(0)->nodeValue;

Live code is available here.
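The code above assumes every element exists. If one is missing (as with the optional `dtend` tag in the update), `item(0)` returns `null`: reading `->nodeValue` on it raises a warning or notice depending on the PHP version, and calling `->getAttribute()` on it throws a fatal `Error`. On PHP 8.0 or later, the nullsafe operator `?->` turns both cases into a plain `null`. A minimal sketch, assuming PHP 8 and trimmed markup:

```php
<?php
// Requires PHP 8.0+ for the nullsafe operator (?->).
$doc = new DOMDocument();
$doc->loadHTML('<div class="adr"><span class="locality">LOCALITY</span></div>');
$xpath = new DOMXPath($doc);

// item(0) is null when the query matches nothing; ?-> then short-circuits
// to null instead of reading a property of (or calling a method on) null.
$locality    = $xpath->query('//span[@class="locality"]')->item(0)?->nodeValue;
$countryName = $xpath->query('//span[@class="country-name"]')->item(0)?->nodeValue;
$end         = $xpath->query('//abbr[@class="dtend"]')->item(0)?->getAttribute('title');

var_dump($locality, $countryName, $end);
```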

answered 2012-07-31T20:30:19.993

If you know CSS selectors, use PHPQuery or a similar library.

answered 2012-07-31T20:39:55.230