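Is there a way to grab all elements whose id partially matches a pattern? For example, I want to get every HTML element on a web page whose id attribute starts with msg_ but can be anything after that.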

Here is what I have so far:

$doc = new DomDocument;

// We need to validate our document before referring to the id
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('{URL IS HERE}'));
foreach($doc->getElementById('msg_') as $element) { 
   foreach($element->getElementsByTagName('a') as $link)
   {
      echo $link->nodeValue . "\n";
   }
}

But I need to figure out how to do a partial id match with this bit: $doc->getElementById('msg_'), or whether there is some other way to accomplish this.

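Basically, I need to get all the 'a' tags that are children of elements whose id starts with msg_. Technically there will always be only one a tag, but I don't know how to grab just the first child, which is why I'm also using foreach.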

Is this possible with the DomDocument PHP class?

Here is the code I'm using now, which doesn't work either:

$str = '';
$filename = 'http://dream-portal.net/index.php/board,65.0.html';
@set_time_limit(0);

$fp = fopen($filename, 'rb');
while (!feof($fp))
{
    $str .= fgets($fp, 16384);
}
fclose($fp);

$doc = new DOMDocument();
$doc->loadXML($str);

$selector = new DOMXPath($doc);

$elements = $selector->query('//row[starts-with(@id, "msg_")]');

foreach ($elements as $node) {
    var_dump($node->nodeValue) . PHP_EOL;
}

The HTML looks like this (the id is on the span tag):

<td class="subject windowbg2">
<div>
  <span id="msg_6555">
    <a href="http://dream-portal.net/index.php?topic=834.0">Poll 1.0</a>
  </span>
  <p>
    Started by 
    <a href="http://dream-portal.net/index.php?action=profile;u=1" title="View the profile of SoLoGHoST">SoLoGHoST</a>
    <small id="pages6555">
      « 
      <a class="navPages" href="http://dream-portal.net/index.php?topic=834.0">1</a>
      <a class="navPages" href="http://dream-portal.net/index.php?topic=834.15">2</a>
        »
    </small>

                        with 963 Views

  </p>
</div>
</td>

It's the <span id="msg_ part that I'm after, and there are many of them (at least 15 on the HTML page).

1 Answer

Use this:

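$str = file_get_contents('http://dream-portal.net/index.php/board,65.0.html');

$doc = new DOMDocument();
// loadHTML() tolerates real-world markup, unlike loadXML();
// the @ suppresses the parser warnings it emits for malformed HTML
@$doc->loadHTML($str);

$selector = new DOMXPath($doc);

// Match any element (not only <row>) whose id attribute starts with "msg_"
foreach ($selector->query('//*[starts-with(@id, "msg_")]') as $node) {
    var_dump($node->nodeValue);
}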

This gives you:

string(8) "Poll 1.0"
string(12) "Shoutbox 2.2"
string(24) "Polaroid Attachments 1.6"
string(24) "Featured News Slider 1.3"
string(17) "Image Resizer 1.0"
string(8) "Blog 2.2"
string(13) "RSS Feeds 1.0"
string(19) "Adspace Manager 1.2"
string(21) "Facebook Like Box 1.0"
string(15) "Price Table 1.0"
string(13) "SMF Links 1.0"
string(19) "Download System 1.2"
string(16) "[*]Site News 1.0"
string(12) "Calendar 1.3"
string(16) "Page Peel Ad 1.1"
string(20) "Sexy Bookmarks 1.0.1"
string(15) "Forum Staff 1.2"
string(21) "Facebook Comments 1.0"
string(15) "Attachments 1.4"
string(25) "YouTube Channels 0.9 Beta"
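
For context on why the attempts in the question fail: getElementById() returns a single element for an exact id (or null), so it cannot do prefix matching or be iterated; loadXML() expects well-formed XML, which ordinary web pages rarely are; and //row only selects elements named row, while the ids here sit on span elements. The question also asks for only the first a child of each match. A minimal sketch of that, assuming the markup resembles the snippet in the question: the helper name firstLinksUnderIdPrefix and the trimmed-down inline sample are hypothetical, chosen only for illustration.

```php
<?php
// Trimmed-down sample based on the HTML in the question (assumption:
// a real page has the same span/a structure).
$html = <<<'HTML'
<div>
  <span id="msg_6555">
    <a href="http://dream-portal.net/index.php?topic=834.0">Poll 1.0</a>
  </span>
  <p>
    Started by
    <a href="http://dream-portal.net/index.php?action=profile;u=1">SoLoGHoST</a>
  </p>
  <small id="pages6555">
    <a class="navPages" href="http://dream-portal.net/index.php?topic=834.15">2</a>
  </small>
</div>
HTML;

// Hypothetical helper: returns text and href of the first <a> child of
// every element whose id starts with $prefix.
function firstLinksUnderIdPrefix(string $html, string $prefix): array
{
    // The prefix is placed inside a double-quoted XPath literal,
    // so a double quote in it would break the expression.
    if (strpos($prefix, '"') !== false) {
        throw new InvalidArgumentException('Prefix must not contain a double quote.');
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // a[1] selects the first <a> child of each matching element.
    $query = sprintf('//*[starts-with(@id, "%s")]/a[1]', $prefix);

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}

$links = firstLinksUnderIdPrefix($html, 'msg_');
foreach ($links as $link) {
    echo $link['text'] . ' -> ' . $link['href'] . PHP_EOL;
}
```

With the sample above, only the link inside span id="msg_6555" is returned; the profile link and the navPages link are skipped because their parents' ids do not start with msg_. Putting a[1] in the XPath expression avoids the nested foreach from the question.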
answered 2013-04-27T03:27:33.733