xpath - Selecting everything inside a div except one child element, including text

Question

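I am trying to select the contents of a div that holds some text and a few extra tags. I do not want to select the first div inside it. I am trying the selector below, but it only gives me the tags, not the text: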

//div[@class='contentDealDescriptionFacts cf']/div[@class='viewHalfWidthSize' and position()=2]/*[not(@class='subHeadline')]

The div that is giving me trouble is this one:

<div class="viewHalfWidthSize">
    .......
</div>

<div class="viewHalfWidthSize">
    <div class="subHeadline firefinder-match">The Fine Print</div> <----------Except this div I want everything inside of this div!!
    <strong class="firefinder-match">Validity: </strong>
    Expires 27 June 2013.
    <br class="firefinder-match">
    <strong class="firefinder-match">Purchase: </strong>
    Limit 1 per 2 people. May buy multiple as gifts.
    <br class="firefinder-match">
    <strong class="firefinder-match">Redemption: </strong>
    Booking required online at
    <a target="_blank" href="http://grouponbookings.co.uk/lautre-pied-march/"      class="firefinder-match">http://grouponbookings.co.uk/lautre-pied-march/</a>
. 48-hour cancellation policy; late cancellation incurs a £30 surcharge per person.
    <br class="firefinder-match">
    <strong class="firefinder-match">Further information: </strong>
    Valid Mon-Sun midday-2.45pm; Mon-Wed 6pm-10.45pm. Must be 18 or older, ID may be   requested. Valid only on set tasting menu only; menu is dependent on market changes and seasonality and is subject to change. Max. two hours seating time. Discretionary service charge will be added to the bill based on original price. Original value verified 19 March 2013 at 9.01am.
   <br class="firefinder-match">
   <a target="_blank" href="http://www.groupon.co.uk/universal-fine-print" style="color: #339933;" class="firefinder-match">See the rules</a>
that apply to all deals.
</div>

score 0 · Accepted Answer

* matches element nodes, not text nodes. Try replacing * with node() to select nodes of every type.
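The difference between * and node() can be sketched in Python. This is only an approximation: the standard library's xml.dom.minidom does not evaluate XPath, so the two list comprehensions below stand in for the two node tests, and FRAGMENT is a trimmed-down, hypothetical copy of the markup from the question.

```python
from xml.dom import minidom

# Trimmed-down copy of the question's markup (hypothetical fixture,
# written as well-formed XML so minidom can parse it).
FRAGMENT = (
    '<div class="viewHalfWidthSize">'
    '<div class="subHeadline firefinder-match">The Fine Print</div>'
    '<strong>Validity: </strong>Expires 27 June 2013.<br/>'
    '<strong>Purchase: </strong>Limit 1 per 2 people.'
    '</div>'
)

parent = minidom.parseString(FRAGMENT).documentElement

# Roughly what "*" selects: only the child nodes that are elements.
elements_only = [n for n in parent.childNodes if n.nodeType == n.ELEMENT_NODE]

# Roughly what "node()" selects: every child node, text nodes included.
all_nodes = list(parent.childNodes)

element_names = [n.tagName for n in elements_only]
text_values = [n.data for n in all_nodes if n.nodeType == n.TEXT_NODE]

print(element_names)
print(text_values)
```

The text "Expires 27 June 2013." only shows up in the second list, which is why the original selector returned tags without their surrounding text.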

To break down what your XPath is doing:

You are looking anywhere in the document (//) for a div whose class is 'contentDealDescriptionFacts cf'.

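You are then looking for a child div that is in second position and also has the class viewHalfWidthSize. Note that this is not the second div with that class; it is the div that is both second among its div siblings AND has that class. If the divs with that class were the third and fourth, the expression would match nothing, because the second div with that class has position() = 4. If you want the second viewHalfWidthSize div, you want [@class='viewHalfWidthSize'][position()=2].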
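To make the distinction between one combined predicate and two chained predicates concrete, here is a small Python imitation. It does not run XPath; the two helper functions and the div_classes layout are made up for illustration, with the matching divs deliberately placed third and fourth.

```python
# Class attribute of each child div, in document order (hypothetical layout).
div_classes = ["intro", "banner", "viewHalfWidthSize", "viewHalfWidthSize"]


def combined_predicate(classes, wanted, pos):
    """Mimic div[@class=wanted and position()=pos].

    There is a single predicate, so position() counts every div sibling.
    Returns the 1-based positions of the divs that match.
    """
    return [i for i, c in enumerate(classes, start=1) if c == wanted and i == pos]


def chained_predicates(classes, wanted, pos):
    """Mimic div[@class=wanted][position()=pos].

    The first predicate filters by class; in the second predicate,
    position() counts only the divs that survived the first one.
    """
    survivors = [i for i, c in enumerate(classes, start=1) if c == wanted]
    return [i for rank, i in enumerate(survivors, start=1) if rank == pos]


combined = combined_predicate(div_classes, "viewHalfWidthSize", 2)
chained = chained_predicates(div_classes, "viewHalfWidthSize", 2)

print(combined)  # [] because the second div has a different class
print(chained)   # [4] because the second matching div is the fourth div
```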

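Finally, you are returning a node list of all child elements whose class attribute is not exactly subHeadline. If you change * to node(), you get a node list of all nodes instead.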

The following XPath:

//div[@class='contentDealDescriptionFacts cf']/div[@class='viewHalfWidthSize' and position()=2]/node()[not(name(.)='div' and position() = 1)]

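should return what you want, as long as the first child node is the div you want to ignore. Be aware that in the markup as posted, the whitespace between the opening tag and the subHeadline div is itself a text node, so the div may actually sit at position 2 unless your parser drops whitespace-only text.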

If you change it to:

//div[@class='contentDealDescriptionFacts cf']/div[@class='viewHalfWidthSize' and position()=2]/node()[position() != count(../div[1]/preceding-sibling::node()) + 1]

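then it should work regardless. It takes the node list, counts how many nodes precede the first div, and excludes the node whose position equals that count plus one (that is, the position of the first div).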
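The position arithmetic can be imitated in Python as a rough check. Again this is a sketch rather than an XPath evaluation: it uses xml.dom.minidom from the standard library, FRAGMENT is a shortened, hypothetical copy of the question's markup with its indentation kept, and is_div is a helper invented for the example.

```python
from xml.dom import minidom

# Shortened copy of the question's markup, indentation preserved on purpose
# (hypothetical fixture).
FRAGMENT = (
    '<div class="viewHalfWidthSize">\n'
    '    <div class="subHeadline firefinder-match">The Fine Print</div>\n'
    '    <strong>Validity: </strong>\n'
    '    Expires 27 June 2013.\n'
    '</div>'
)

parent = minidom.parseString(FRAGMENT).documentElement
children = list(parent.childNodes)


def is_div(node):
    """True for element nodes named div (hypothetical helper)."""
    return node.nodeType == node.ELEMENT_NODE and node.tagName == "div"


# Stand-in for count(../div[1]/preceding-sibling::node()) + 1:
# the 1-based position of the first div among all child nodes.
first_div_position = next(
    i for i, n in enumerate(children, start=1) if is_div(n)
)

# Stand-in for node()[position() != that value]: keep everything else.
kept = [n for i, n in enumerate(children, start=1) if i != first_div_position]
kept_text = "".join(n.toxml() for n in kept)

print(first_div_position)
print(kept_text)
```

Because the leading whitespace counts as a text node here, the first div is not at position 1, which is the case the simpler position() = 1 test would miss.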

As another alternative, you could keep your original approach, but instead of not(@class='subHeadline') you should use

not(contains(concat(' ', @class, ' '), ' subHeadline '))

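This checks whether the class attribute contains subHeadline as a whole token anywhere in the string, assuming your classes are space-separated. It therefore matches (and excludes) your fragment with the class "subHeadline firefinder-match".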
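The padding trick is easy to check outside XPath. The sketch below mirrors the same string logic in Python; has_class_token is a name invented for this example, and like the XPath version it assumes single spaces as separators.

```python
def has_class_token(class_attr, token):
    """Mimic contains(concat(' ', @class, ' '), ' token ').

    Padding both strings with spaces means the token only matches as a
    whole word, not as part of a longer class name.
    """
    return f" {token} " in f" {class_attr} "


# The exact comparison from the original selector fails on a multi-class value.
exact_match = "subHeadline firefinder-match" == "subHeadline"

# The padded containment test finds the token.
token_match = has_class_token("subHeadline firefinder-match", "subHeadline")

# A longer class name that merely starts with the token is not matched.
partial_match = has_class_token("subHeadlineExtra", "subHeadline")

# A missing class attribute becomes an empty string in XPath; same result here.
missing_match = has_class_token("", "subHeadline")

print(exact_match, token_match, partial_match, missing_match)
```

Text nodes have no class attribute at all, so under this test they are never excluded, which is what you want when combining it with node().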
