
I have a page that looks like this:

...
<div class="container">

<div class="info">
<h3>Info 1</h3>
<span class="title">Title for Info 1</span>
<a href="http://www.example.com/1">Link to Example 1</a>
</div> <!-- /info -->

<div class="info">
<h3>Info 2</h3>
<span class="title">Title for Info 2</span>
<a href="http://www.example.com/2">Link to Example 2</a>
</div> <!-- /info -->

<div class="info">
<h3>Info 3</h3>
<span class="title">Title for Info 3</span>
<a href="http://www.example.com/3">Link to Example 3</a>
</div> <!-- /info -->

</div> <!-- /container -->
...

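Every div with class info has the same structure. I want to walk the document and, for each div with class info, parse its components into an array or individual variables, so that I can output the data in a human-readable format such as a CSV file or an HTML table.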

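I have tried the DOMDocument approach, using getElementsByTagName to pull out the contents of each tag, but because each div contains several tag types (h3, a, span), I haven't figured out how to get what I'm after.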

In the end, I'd like to be able to lay the data out in a format like this:

divclass, h3, spanclass, spantitle, ahref, a
info, Info 1, title, Title for Info 1, http://www.example.com/1, Link to Example 1
...

Thanks!


1 Answer

<?php
$html = '
<div class="container">

<div class="info">
<h3>Info 1</h3>
<span class="title">Title for Info 1</span>
<a href="http://www.example.com/1">Link to Example 1</a>
</div> <!-- /info -->

<div class="info">
<h3>Info 2</h3>
<span class="title">Title for Info 2</span>
<a href="http://www.example.com/2">Link to Example 2</a>
</div> <!-- /info -->

<div class="info">
<h3>Info 3</h3>
<span class="title">Title for Info 3</span>
<a href="http://www.example.com/3">Link to Example 3</a>
</div> <!-- /info -->

</div> <!-- /container -->
';


$dom_document = new DOMDocument();
$dom_document->loadHTML($html);

// Use DOMXPath to select every element whose class attribute is exactly "info"
$dom_xpath = new DOMXPath($dom_document);
$elements = $dom_xpath->query("//*[@class='info']");

// query() returns false for a malformed expression, never null
if ($elements !== false) {

  foreach ($elements as $element) {
    echo "\n[" . $element->nodeName . "]\n";

    foreach ($element->childNodes as $node) {
      // Skip the whitespace text nodes that sit between the child tags
      if ($node->nodeType !== XML_ELEMENT_NODE) {
        continue;
      }
      echo $node->nodeValue . "\n";
    }

  }
}
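
The loop above only prints the text of each child. To get closer to the CSV layout asked for in the question, one option is to run a relative XPath query per div and collect the pieces into an array. This is a minimal sketch, not the only way to do it: the helper names extractInfoRows and rowsToCsv are made up for illustration, and it assumes each div.info holds at most one h3, one span and one a as direct children.

```php
<?php
// Hypothetical helper: returns one associative array per div with class "info".
function extractInfoRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so imperfect markup does not spam output
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query("//div[@class='info']") as $div) {
        // Passing $div as the context node makes "./h3" relative to this div
        $h3   = $xpath->query('./h3', $div)->item(0);
        $span = $xpath->query('./span', $div)->item(0);
        $a    = $xpath->query('./a', $div)->item(0);

        // Missing children become empty strings instead of raising errors
        $rows[] = [
            'divclass'  => $div->getAttribute('class'),
            'h3'        => $h3 ? trim($h3->textContent) : '',
            'spanclass' => $span ? $span->getAttribute('class') : '',
            'spantitle' => $span ? trim($span->textContent) : '',
            'ahref'     => $a ? $a->getAttribute('href') : '',
            'a'         => $a ? trim($a->textContent) : '',
        ];
    }
    return $rows;
}

// Hypothetical helper: renders the rows as CSV text, header line first.
function rowsToCsv(array $rows): string
{
    if ($rows === []) {
        return '';
    }
    $out = fopen('php://temp', 'r+');
    fputcsv($out, array_keys($rows[0]), ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($out, $row, ',', '"', '\\');
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}
```

With the $html string from the answer, echo rowsToCsv(extractInfoRows($html)); should print a header line followed by one line per info div. Note that fputcsv wraps fields containing spaces in double quotes, so the output will not match the sample in the question character for character. The [@class='info'] test only matches when the attribute is exactly "info"; a div carrying extra classes would need a different predicate.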
answered 2012-05-05T03:07:01.160