I am using DOMXPath to get the content of a specific node. In my case, I want to get all the text of the matching div except the text inside its nested divs.
$html =
'<div itemscope="itemscope" itemtype="http://schema.org/Event">
<span itemprop="name"> Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1)</span>
<meta itemprop="startDate" content="2016-04-21">
Thu, 04/21/16
8:00 p.m
<div itemprop="offers" itemscope="itemscope" itemtype="http://schema.org/AggregateOffer">
Priced from: <span itemprop="lowPrice">$35</span>
<span itemprop="offerCount">1938</span> tickets left
</div>
<meta itemprop="endDate" content="2020-3-2"> end date of year
<div itemprop="attendee" itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Jane Doe</span>
<meta itemprop="birthDate" content="1975-05-06">
<div itemprop="sibling" itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Fatima Zohra</span>
<meta itemprop="birthDate" content="1991-6-5">Jan 6
</div>
</div>
</div>';
I first tried the following, but it did not return the nested div:
$tags = $xpath->query("//div[@itemscope='itemscope'][not(self::div)]/text()");
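The predicate [not(self::div)] in this query can never be true: every node selected by //div is itself a div, so the query matches nothing. A minimal sketch of a corrected version, assuming the goal is only the text that sits directly inside the outermost div, replaces it with [not(ancestor::div)]. Note that /text() selects direct child text nodes only, so the text inside the child span is left out as well. The variable names are illustrative, and whitespace is collapsed only to make the result readable.

```php
<?php
// Sample markup from the question.
$html =
'<div itemscope="itemscope" itemtype="http://schema.org/Event">
<span itemprop="name"> Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1)</span>
<meta itemprop="startDate" content="2016-04-21">
Thu, 04/21/16
8:00 p.m
<div itemprop="offers" itemscope="itemscope" itemtype="http://schema.org/AggregateOffer">
Priced from: <span itemprop="lowPrice">$35</span>
<span itemprop="offerCount">1938</span> tickets left
</div>
<meta itemprop="endDate" content="2020-3-2"> end date of year
<div itemprop="attendee" itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Jane Doe</span>
<meta itemprop="birthDate" content="1975-05-06">
<div itemprop="sibling" itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Fatima Zohra</span>
<meta itemprop="birthDate" content="1991-6-5">Jan 6
</div>
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// [not(ancestor::div)] keeps only the outermost div; /text() then returns
// its direct child text nodes, so nothing inside child elements is included.
$nodes = $xpath->query("//div[@itemscope='itemscope'][not(ancestor::div)]/text()");

$parts = [];
foreach ($nodes as $textNode) {
    $parts[] = $textNode->nodeValue;
}
// Collapse the newlines from the markup into single spaces.
$directText = trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));

echo $directText, PHP_EOL; // Thu, 04/21/16 8:00 p.m end date of year
```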
My current attempt is below, but it does not work:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[not(ancestor::div)]');
foreach ($tags as $node) {
echo $node->nodeValue; // prints all text of the div, including the nested divs
}
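This attempt selects the right element, but nodeValue on an element returns the concatenated text of all its descendants, nested divs included. One way to exclude them, sketched below under the assumption that the target is the outermost div in the document, is to select text nodes instead and keep only those that have exactly one div ancestor: text inside spans is kept, while text inside the nested offers, attendee and sibling divs is dropped. Variable names are illustrative.

```php
<?php
// Sample markup from the question.
$html =
'<div itemscope="itemscope" itemtype="http://schema.org/Event">
<span itemprop="name"> Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1)</span>
<meta itemprop="startDate" content="2016-04-21">
Thu, 04/21/16
8:00 p.m
<div itemprop="offers" itemscope="itemscope" itemtype="http://schema.org/AggregateOffer">
Priced from: <span itemprop="lowPrice">$35</span>
<span itemprop="offerCount">1938</span> tickets left
</div>
<meta itemprop="endDate" content="2020-3-2"> end date of year
<div itemprop="attendee" itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Jane Doe</span>
<meta itemprop="birthDate" content="1975-05-06">
<div itemprop="sibling" itemscope="itemscope" itemtype="http://schema.org/Person">
<span itemprop="name">Fatima Zohra</span>
<meta itemprop="birthDate" content="1991-6-5">Jan 6
</div>
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// All text nodes under the outermost div whose only div ancestor is that div:
// text in its spans is kept, text in any nested div is skipped.
$nodes = $xpath->query('//div[not(ancestor::div)]//text()[count(ancestor::div) = 1]');

$parts = [];
foreach ($nodes as $textNode) {
    $parts[] = $textNode->nodeValue;
}
// Collapse the newlines from the markup into single spaces.
$ownText = trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));

echo $ownText, PHP_EOL;
// Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1) Thu, 04/21/16 8:00 p.m end date of year
```

If the target div can itself sit inside another div, the count-based test no longer works as written; an alternative is to loop over the target's descendant text nodes and keep a node only when $xpath->query('ancestor::div[1]', $textNode)->item(0) is the same node as the target div (checked with isSameNode()).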