How can I extract the text that follows the <br> tags in these lines:

<div id='population'>
    The Snow Leopard Survival Strategy (McCarthy <em>et al.</em> 2003, Table
    II) compiled national snow leopard population estimates, updating the work
    of Fox (1994). Many of the estimates are acknowledged to be rough and out
    of date, but the total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Afghanistan: 100-200?<br>
    Bhutan: 100-200?<br>
    China: 2,000-2,500<br>
    India: 200-600<br>
    Kazakhstan: 180-200<br>
    Kyrgyzstan: 150-500<br>
    Mongolia: 500-1,000<br>
    Nepal: 300-500<br>
    Pakistan: 200-420<br>
    Russia: 150-200<br>
    Tajikistan: 180-220<br>
    Uzbekistan: 20-50
</div>

So far I have:

xpathSApply(h, '//div[@id="population"]', xmlValue)

but now I'm stuck...

It helps if you realize that text is also a node. All the text in the div that follows a <br/> can be retrieved with:

//div[@id="population"]/text()[preceding-sibling::br]
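
A minimal sketch of plugging this expression into the question's xpathSApply call, assuming the XML package is installed and the div is available as a string. The names html, doc, nodes and countries are hypothetical, and the parsed document is called doc here rather than the question's h. Because the snippet uses unclosed <br> tags, it is parsed with htmlParse instead of a strict XML parser. The XPath returns raw text nodes, newlines and indentation included, and possibly a blank node between the two consecutive <br> tags, so the result is trimmed and blank entries are dropped.

```r
library(XML)  # assumed installed; provides htmlParse, xpathSApply, xmlValue

# The div from the question as a string (hypothetical variable name).
html <- "<div id='population'>
    The Snow Leopard Survival Strategy (McCarthy <em>et al.</em> 2003, Table
    II) compiled national snow leopard population estimates, updating the work
    of Fox (1994). Many of the estimates are acknowledged to be rough and out
    of date, but the total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Afghanistan: 100-200?<br>
    Bhutan: 100-200?<br>
    China: 2,000-2,500<br>
    India: 200-600<br>
    Kazakhstan: 180-200<br>
    Kyrgyzstan: 150-500<br>
    Mongolia: 500-1,000<br>
    Nepal: 300-500<br>
    Pakistan: 200-420<br>
    Russia: 150-200<br>
    Tajikistan: 180-220<br>
    Uzbekistan: 20-50
</div>"

# The <br> tags are unclosed, so use the lenient HTML parser.
doc <- htmlParse(html, asText = TRUE)

# Text nodes that are direct children of the div and have a <br> before them.
nodes <- xpathSApply(doc,
                     '//div[@id="population"]/text()[preceding-sibling::br]',
                     xmlValue)

# Trim newlines/indentation and drop whitespace-only nodes
# (e.g. the one between the two consecutive <br> tags).
countries <- trimws(nodes)
countries <- countries[countries != ""]
print(countries)
```

The intro sentence, including the text around <em>, is excluded because no <br> precedes it.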

Technically, "between the <br/> tags" means:

//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]

...but I suppose that isn't what you want here.
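
To make the difference between the two expressions concrete, here is a second sketch under the same assumptions (R with the XML package installed). It runs on a trimmed, hypothetical copy of the div with only the last three country lines; the helper clean() is also hypothetical. Adding following-sibling::br drops the final line, because no <br> comes after it.

```r
library(XML)  # assumed installed; provides htmlParse, xpathSApply, xmlValue

# Trimmed, hypothetical copy of the question's div (last three lines only).
html <- "<div id='population'>
    The total estimated population is 4,080-6,590, as follows:<br>
    <br>
    Russia: 150-200<br>
    Tajikistan: 180-220<br>
    Uzbekistan: 20-50
</div>"
doc <- htmlParse(html, asText = TRUE)

# Hypothetical helper: trim whitespace and drop empty strings.
clean <- function(x) {
  x <- trimws(x)
  x[x != ""]
}

# Text after any <br>.
after <- clean(xpathSApply(doc,
  '//div[@id="population"]/text()[preceding-sibling::br]',
  xmlValue))

# Text with a <br> both before and after it.
between <- clean(xpathSApply(doc,
  '//div[@id="population"]/text()[preceding-sibling::br and following-sibling::br]',
  xmlValue))

print(after)    # Russia, Tajikistan and Uzbekistan lines
print(between)  # Russia and Tajikistan lines only
```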

answered 2012-06-28T20:54:40.997