
I want to parse an HTML document. I need the content of every 'p' tag that comes after an 'h2'.

HTML to parse (example):

<h1>Lorem ipsum</h1>
<p>
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, 
</p>

<h2>Aenean commodo</h2>
<p>
    Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
</p>

<h2>consectetuer adipiscing</h2>
<p>
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, 
</p>

Here I want to get the last two 'p' tags (dynamically, not by hard-coding their positions).


This is my PHP code:

$dom = new DOMDocument();
// Must be enabled before loading, otherwise libxml warnings are still printed
libxml_use_internal_errors(true);
$dom->loadHTMLFile($html_file);

$h2_tags = $dom->getElementsByTagName('h2');

foreach($h2_tags as $single_tag) {

     echo $single_tag->textContent;         
     print_r($single_tag);

}   

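This only gives me the text content of each h2, but I need the 'p' that follows each h2. Is this possible with DOMDocument, or do I need to use another class?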


2 Answers


You can try the following code:

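$dom = new DOMDocument();
// Enable internal error handling before loading so malformed-HTML warnings are suppressed
libxml_use_internal_errors(true);
$dom->loadHTMLFile($html_file);

// Select the text nodes of every <p> that has an <h2> anywhere before it in document order
$xpath = new DOMXPath($dom);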
$nodeList = $xpath->evaluate('//p[preceding::h2]/text()');

foreach ($nodeList as $domElement){
   echo $domElement->textContent."<br><br>";
}

Output for reference: http://phpfiddle.org/main/code/7i5-3ir
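Note that `//p[preceding::h2]` matches every `p` that appears anywhere after the first `h2`, not specifically the paragraph attached to each heading, and `/text()` only returns direct text children (text inside inline tags such as `<b>` is dropped). If you want only the element that immediately follows each `h2`, a tighter XPath is `//h2/following-sibling::*[1][self::p]`: take the first sibling element after each `h2`, and keep it only if it is a `p`. A minimal sketch, assuming the markup is inlined as a string instead of loaded from `$html_file`:

```php
<?php
// Sample markup inlined for illustration; the second h2 is followed by a <div>, not a <p>.
$html = '<h1>Lorem ipsum</h1><p>Intro.</p>'
      . '<h2>Aenean commodo</h2><p>First.</p>'
      . '<h2>consectetuer adipiscing</h2><div>Note.</div><p>Second.</p>';

// Collect libxml warnings instead of printing them; must be set before loadHTML().
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// For each h2: its first following sibling element, kept only if that element is a <p>.
$nodes = $xpath->query('//h2/following-sibling::*[1][self::p]');

$texts = [];
foreach ($nodes as $p) {
    $texts[] = trim($p->textContent); // full text, including any inline markup
}

print_r($texts);
```

Here only `First.` is returned, because the second heading's first sibling element is the `<div>`. Drop the `[1]` (i.e. `following-sibling::p`) if you instead want every later sibling `p`.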

answered 2013-10-14T21:39:04.863
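<?php

$items = array();

$document = new DOMDocument;
// '@' suppresses libxml warnings about malformed HTML
@$document->loadHTMLFile('example.html');

foreach ($document->getElementsByTagName('h2') as $node) {
    // Walk forward through the h2's siblings, skipping whitespace text nodes
    while ($node = $node->nextSibling) {
        if ($node->nodeType == XML_ELEMENT_NODE) {
            // Keep the first element after the h2 only if it is a <p>
            if ($node->nodeName == 'p') {
                $items[] = $node->textContent;
            }

            // Stop at the first element, whether or not it was a <p>
            break;
        }
    }
}

print_r($items);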
answered 2013-10-15T08:09:30.513
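The loop above stops at the first element after each `h2`, so it returns at most one paragraph per heading. Since the question asks for all `p` tags after each `h2`, one possible extension is to keep walking until the next `h2` and group the paragraphs by heading. A minimal sketch, assuming PHP 8.0+ (where `DOMElement::$nextElementSibling` skips text nodes for you) and inlined sample markup instead of `example.html`:

```php
<?php
// Assumes PHP 8.0+ for DOMElement::$nextElementSibling. Markup is inlined for illustration.
$html = '<h1>Lorem ipsum</h1><p>Intro.</p>'
      . '<h2>Aenean commodo</h2><p>First A.</p><p>First B.</p>'
      . '<h2>consectetuer adipiscing</h2><div>Note.</div><p>Second.</p>';

// Collect libxml warnings instead of printing them; must be set before loadHTML().
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

// Map each h2's text to every <p> that follows it, up to the next h2.
$sections = [];
foreach ($document->getElementsByTagName('h2') as $h2) {
    $heading = trim($h2->textContent);
    $sections[$heading] = [];
    // nextElementSibling skips text and comment nodes, so no nodeType check is needed.
    for ($el = $h2->nextElementSibling; $el !== null; $el = $el->nextElementSibling) {
        if ($el->nodeName === 'h2') {
            break; // the next section starts here
        }
        if ($el->nodeName === 'p') {
            $sections[$heading][] = trim($el->textContent);
        }
        // Other elements (like the <div>) are skipped without ending the section.
    }
}

print_r($sections);
```

Keying by heading text is a simplification: two `h2` elements with identical text would share one entry. On PHP 7, the same walk works with `nextSibling` plus the `XML_ELEMENT_NODE` check used in the answer above.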