bash - How do I select the text that follows an element?

Question

I have the following xmllint example, which selects an element:

$ curl -s http://lists.opencsw.org/pipermail/users/2015-January/date.html |
xmllint --html --xpath '/html/body/p/b[contains(., "Messages:")]' -
<b>Messages:</b>

The bold element is followed by the number of messages, which is what I'm interested in. It shows up when I use the parent axis:

$ curl -s http://lists.opencsw.org/pipermail/users/2015-January/date.html |
xmllint --html --xpath '/html/body/p/b[contains(., "Messages:")]/parent::*' -
<p><b>Starting:</b> <i>Thu Jan  1 23:17:09 CET 2015</i><br><b>Ending:</b> <i>Sat Jan 31 14:51:07 CET 2015</i><br><b>Messages:</b> 28</p>

I thought the following-sibling axis would give me that number, but it doesn't:

$ curl -s http://lists.opencsw.org/pipermail/users/2015-January/date.html |
xmllint --html --xpath '/html/body/p/b[contains(., "Messages:")]/following-sibling::*' -
XPath set is empty
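The empty result can be reproduced offline. The sketch below assumes `xmllint` (libxml2) is installed and uses a hand-written stand-in page: the `<p>` from the parent-axis output wrapped in `<html><body>`, not the live archive page. It counts the siblings after `<b>Messages:</b>` twice: once with `*`, which matches element nodes only, and once with `node()`, which matches nodes of any type, including text.

```shell
#!/usr/bin/env bash
# Stand-in page (an assumption): the <p> from the parent-axis output,
# wrapped in <html><body> so the question's absolute path still matches.
html='<html><body><p><b>Starting:</b> <i>Thu Jan  1 23:17:09 CET 2015</i><br><b>Ending:</b> <i>Sat Jan 31 14:51:07 CET 2015</i><br><b>Messages:</b> 28</p></body></html>'

# The <b>Messages:</b> element, selected as in the question.
b='/html/body/p/b[contains(., "Messages:")]'

# Element siblings after <b>: there are none, hence "XPath set is empty".
elements=$(printf '%s\n' "$html" | xmllint --html --xpath "count($b/following-sibling::*)" -)

# Siblings of any node type: the single " 28" text node.
nodes=$(printf '%s\n' "$html" | xmllint --html --xpath "count($b/following-sibling::node())" -)

echo "element siblings: $elements"
echo "node siblings:    $nodes"
```

On this stand-in page it should report 0 element siblings and 1 node sibling: the number is there, just not as an element.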

score 2 · Accepted Answer

The text you're after is indeed a following sibling, but it is a text node, not an element node. An expression like

following-sibling::*

only finds following siblings that are elements. To match text nodes, use text():

$ curl -s http://lists.opencsw.org/pipermail/users/2015-January/date.html |
xmllint --html --xpath '/html/body/p/b[contains(., "Messages:")]/following-sibling::text()' -
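The `text()` version can be checked the same way without `curl`. This is a sketch assuming `xmllint` is on the `PATH` and using the same stand-in page as a shell string. The trailing `-` tells `xmllint` to read the document from standard input, as in the question's pipelines; the square brackets in the output only serve to make the leading space visible.

```shell
#!/usr/bin/env bash
# Stand-in page (an assumption): the question's paragraph, no network needed.
html='<html><body><p><b>Starting:</b> <i>Thu Jan  1 23:17:09 CET 2015</i><br><b>Ending:</b> <i>Sat Jan 31 14:51:07 CET 2015</i><br><b>Messages:</b> 28</p></body></html>'

# text() matches text-node siblings; the final "-" reads the page from stdin.
raw=$(printf '%s\n' "$html" |
  xmllint --html --xpath '/html/body/p/b[contains(., "Messages:")]/following-sibling::text()' -)

# Brackets expose the leading space kept in the text node: [ 28]
printf '[%s]\n' "$raw"
```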

The command above does not work on my machine (bash on Mac OS X), but I trust it works for you. If I first save the curl output to a file and then use

$ xmllint example.html --html --xpath '/html/body/p/b[contains(., "Messages:")]/following-sibling::text()'

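the result is _28. That is not really an underscore; it stands for the whitespace I want to point out. To remove the leading whitespace, use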

$ xmllint example.html --html --xpath 'normalize-space(/html/body/p/b[contains(., "Messages:")]/following-sibling::text())'
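To reuse the expression on several saved pages, it can be wrapped in a small shell function. This is a sketch under assumptions: `messages_count` is a hypothetical name, and the here-document is only a stand-in for a page saved with `curl` (like `example.html` above). Because `normalize-space()` returns a string rather than a node set, `xmllint` prints just the trimmed value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the trimmed text that follows <b>Messages:</b>
# in the saved HTML file passed as $1.
messages_count() {
  xmllint --html --xpath \
    'normalize-space(/html/body/p/b[contains(., "Messages:")]/following-sibling::text())' \
    "$1"
}

# Stand-in for a page saved with curl (an assumption, not the live archive).
page=$(mktemp)
trap 'rm -f "$page"' EXIT
cat > "$page" <<'EOF'
<html><body>
<p><b>Starting:</b> <i>Thu Jan  1 23:17:09 CET 2015</i><br><b>Ending:</b> <i>Sat Jan 31 14:51:07 CET 2015</i><br><b>Messages:</b> 28</p>
</body></html>
EOF

n=$(messages_count "$page")
echo "messages: $n"   # messages: 28
```

If the axis ever matched several text nodes, `normalize-space()` would only use the first one in document order, since XPath 1.0 converts a node set to the string value of its first node.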

And no, using a regular expression is not really an option.
