-1

I'm having trouble extracting the text that sits between the child tags of a div in XML.

Imagine I have the following XML:

<div class="default_style_wrap" >

<!-- Body starts -->
    <!-- Irrelevant Data -->
    <div style="clear:both" />
    <!-- Irrelevant Data -->
    <div class="name_address" >...</div>
    <!-- Irrelevant Data -->
    <div style="clear:both" />
    <!-- Irrelevant Data -->
    <span class="img_comments_right" >...</span>

    <!-- Text that I want to get -->
Two members of the Expedition 35 crew wrapped up a 6-hour, 38 minute spacewalk at 4:41 p.m. EDT Friday to deploy and retrieve several science experiments on the exterior of the International Space Station and install a new navigational aid.
    <br />
    <br />
The spacewalkers' first task was to install the Obstanovka experiment on the station's Zvezda service module. Obstanovka will study plasma waves and the effect of space weather on Earth's ionosphere.

    <!-- Irrelevant Data Again -->
    <span class="img_comments_right" >...</span>
    <!-- Text that I want to get -->
After deploying a pair of sensor booms for Obstanovka, Vinogradov and Romanenko retrieved the Biorisk experiment from the exterior of Pirs. The Biorisk experiment studied the effect of microbes on spacecraft structures.
    <br />
    <br />
This was the 167th spacewalk in support of space station assembly and maintenance, totaling 1,055 hours, 39 minutes. Vinogradov's seven spacewalks total 38 hours, 25 minutes. Romanenko completed his first spacewalk.
    <!-- Body ends -->
</div>

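In case the code does not make it clear: default_style_wrap is the parent of all the other, irrelevant divs and spans. The text I care about is basically all of the untagged text, but as you can see there are other tags in between, such as img_comments_right, and it is driving me crazy.

Following what I saw in another post, I tried this: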

"//div[@class='article_container']/*[not(self::div)]";

But this does not seem to return any text at all, and even if it did, I would not know how to exclude the spans.

Any ideas?

4 Answers

0

Solution:

You can combine several conditions inside not() with the or operator, like this:

not(expr1 or expr2)

So you can add self::span as another condition inside not() to exclude the spans from the result:

//div[@class='default_style_wrap']/*[not(self::div or self::span)]

PS: Some of the div tags do not appear to be closed properly. Make sure they are closed correctly.
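
One thing worth verifying before relying on this: in XPath, * matches element nodes only, so the expression above narrows down which child elements are returned but does not select the untagged text itself. Below is a minimal sketch of that behaviour using Python's standard-library xml.etree.ElementTree. ElementTree cannot evaluate self:: predicates, so the filter is written by hand as an assumed equivalent, and the sample markup is a shortened stand-in for the XML in the question.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's markup (an assumption, not the real page).
SAMPLE = """<div class="default_style_wrap">
    <div style="clear:both" />
    <span class="img_comments_right">caption</span>
Text that should be extracted.
    <br />
More text.
</div>"""

wrapper = ET.fromstring(SAMPLE)

# Hand-written equivalent of /*[not(self::div or self::span)]:
# iterating an Element yields its child elements only, never text nodes.
kept = [child.tag for child in wrapper if child.tag not in ("div", "span")]
print(kept)
```

Only the br element survives the filter here; the two text runs are not elements, so a * step never reaches them.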

Answered 2021-07-28T15:19:26.307
-1

You should be able to get the text with this XPath:

div[@class = 'default_style_wrap']/text()[normalize-space()]

It selects all text() nodes that are children of the default_style_wrap <div>, filtering out empty (or whitespace-only) nodes.
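
For a quick check outside XSLT, the same selection can be approximated with Python's standard library. This is only a sketch: xml.etree.ElementTree does not support text() in its XPath subset, so the helper direct_text_nodes (a hypothetical name) reads the .text and .tail attributes instead, and the sample markup is a shortened stand-in for the question's XML. Unlike the XPath, it also strips the surrounding whitespace from each node.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's markup (an assumption, not the real page).
SAMPLE = """<div class="default_style_wrap">
    <div style="clear:both" />
    <div class="name_address">name</div>
    <span class="img_comments_right">caption</span>
First paragraph.
    <br />
    <br />
Second paragraph.
    <span class="img_comments_right">caption</span>
Third paragraph.
</div>"""


def direct_text_nodes(element):
    """Return the non-blank text nodes that are direct children of element."""
    # ElementTree keeps text before the first child in .text and the text
    # that follows each child in that child's .tail.
    candidates = [element.text] + [child.tail for child in element]
    # Mirror the [normalize-space()] predicate: drop whitespace-only nodes.
    return [text.strip() for text in candidates if text and text.strip()]


wrapper = ET.fromstring(SAMPLE)
paragraphs = direct_text_nodes(wrapper)
print(paragraphs)
```

The text inside the nested div and span elements is left out because it belongs to those children, not to the wrapper.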

If you use a separate template, you can neatly put each block in its own paragraph, for example:

<xsl:template match="/">
    <xsl:apply-templates select="div[@class = 'default_style_wrap']/text()[normalize-space()]" />
</xsl:template>

<xsl:template match="text()">
    <p><xsl:value-of select="." /></p>
</xsl:template>
Answered 2013-04-25T22:29:48.387
-1

You can use this XPath:

//div[@class='default_style_wrap']/text()
Answered 2013-04-25T22:16:10.177
-1

You should try the following query. It selects all following siblings of the <span> node that are text nodes:

query = '//span[@class="img_comments_right"]/following-sibling::text()';
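
Two caveats may be worth checking with this query: it only returns text that comes after a matching span, and as written it also returns whitespace-only text nodes unless the parser strips them. A rough sketch of the same idea with Python's standard library follows. ElementTree has no following-sibling axis, so the hypothetical helper text_after_first_match collects .tail values by hand, looks at one parent only, and additionally drops blank nodes. The sample markup is an assumed, shortened stand-in.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's markup (an assumption, not the real page).
SAMPLE = """<div class="default_style_wrap">
Intro text before the first span.
    <span class="img_comments_right">caption</span>
First paragraph.
    <br />
Second paragraph.
    <span class="img_comments_right">caption</span>
Third paragraph.
</div>"""


def text_after_first_match(parent, tag, css_class):
    """Return non-blank sibling text that follows the first matching child."""
    children = list(parent)
    for index, child in enumerate(children):
        if child.tag == tag and child.get("class") == css_class:
            # Text nodes after the match are the tail of the match itself
            # plus the tails of every later sibling element.
            tails = [sibling.tail for sibling in children[index:]]
            return [t.strip() for t in tails if t and t.strip()]
    return []


wrapper = ET.fromstring(SAMPLE)
following = text_after_first_match(wrapper, "span", "img_comments_right")
print(following)
```

The intro text is skipped because it precedes the first span; with the markup in the question that leading text is only whitespace, so nothing relevant is lost there.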
Answered 2013-04-20T03:19:34.787