I'm having trouble extracting the text that sits between tags inside a div in XML.
Imagine I have the following XML:
<div class="default_style_wrap" >
<!-- Body starts -->
<!-- Irrelevant Data -->
<div style="clear:both" />
<!-- Irrelevant Data -->
<div class="name_address" >...</div>
<!-- Irrelevant Data -->
<div style="clear:both" />
<!-- Irrelevant Data -->
<span class="img_comments_right" >...</span>
<!-- Text that I want to get -->
Two members of the Expedition 35 crew wrapped up a 6-hour, 38 minute spacewalk at 4:41 p.m. EDT Friday to deploy and retrieve several science experiments on the exterior of the International Space Station and install a new navigational aid.
<br />
<br />
The spacewalkers' first task was to install the Obstanovka experiment on the station's Zvezda service module. Obstanovka will study plasma waves and the effect of space weather on Earth's ionosphere.
<!-- Irrelevant Data Again -->
<span class="img_comments_right" >...</span>
<!-- Text that I want to get -->
After deploying a pair of sensor booms for Obstanovka, Vinogradov and Romanenko retrieved the Biorisk experiment from the exterior of Pirs. The Biorisk experiment studied the effect of microbes on spacecraft structures.
<br />
<br />
This was the 167th spacewalk in support of space station assembly and maintenance, totaling 1,055 hours, 39 minutes. Vinogradov's seven spacewalks total 38 hours, 25 minutes. Romanenko completed his first spacewalk.
<!-- Body ends -->
</div>
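In case the code doesn't make it clear, default_style_wrap is the parent of all the other irrelevant divs and spans. The text I care about is basically all of the untagged text, but as you can see there are other tags in between, such as img_comments_right, and it is driving me crazy.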
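Following what I saw in another post, I tried this:
"//div[@class='article_container']/*[not(self::div)]";
But it doesn't seem to return any text at all, and even if it did, I wouldn't know how to exclude the spans.
Any ideas?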
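
To make the desired result concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a shortened, made-up stand-in for the markup above (the sample text and the direct_text helper are hypothetical, not taken from the real page). It collects only the text that is a direct child of the wrapper div and skips whatever sits inside the nested divs and spans. As an aside, * in XPath matches element nodes only, so an expression aimed at the bare text would need the text() node test instead, for example //div[@class='default_style_wrap']/text(), which has not been tried against the real document.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the markup in the question.
SAMPLE = """<div class="default_style_wrap">
<!-- Irrelevant data -->
<div style="clear:both" />
<div class="name_address">ignored</div>
<span class="img_comments_right">ignored caption</span>
<!-- Text that I want to get -->
First paragraph.
<br />
<br />
Second paragraph.
<span class="img_comments_right">another caption</span>
Third paragraph.
</div>"""


def direct_text(wrapper):
    """Return the non-empty text chunks that are direct children of wrapper."""
    # ElementTree keeps text before the first child in .text and the text
    # that follows each child element in that child's .tail.
    chunks = [wrapper.text] + [child.tail for child in wrapper]
    return [c.strip() for c in chunks if c and c.strip()]


wrapper = ET.fromstring(SAMPLE)  # the root element is the wrapper div itself
paragraphs = direct_text(wrapper)
print(paragraphs)
```

The text inside the name_address div and the img_comments_right spans is left out because it belongs to those elements, not to the wrapper; only the text trailing each child is kept.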