r - Why does XPath find the excluded node again?

Question

Consider this page:

<n1 class="a">
  1
</n1>
<n1 class="b">
  <b>bold</b>
  2
</n1>

If I first select the second n1 using class="b", I should be excluding the first n1, and that is indeed the case:

library(rvest)
b_nodes = read_html('<n1 class="a">1</n1>
<n1 class="b"><b>bold</b>2</n1>') %>% 
  html_nodes(xpath = '//n1[@class="b"]')
b_nodes
# {xml_nodeset (1)}
# [1] <n1 class="b"><b>bold</b>2</n1>

However, if I now search within this "subset" page:

b_nodes %>% html_nodes(xpath = '//n1')
# {xml_nodeset (2)}
# [1] <n1 class="a">1</n1>
# [2] <n1 class="b"><b>bold</b>2</n1>

How was node 1 "rediscovered"?

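Note: I know how to get what I want using two separate XPaths. This is a conceptual question about why the "subsetting" didn't work as expected. My understanding was that b_nodes should exclude the first node completely; the b_nodes object shouldn't even know that node exists.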

score 2 · Accepted Answer

html_nodes(xpath = '//n1')

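In //n1, the // is short for /descendant-or-self::node()/, and the leading / refers to the root of the whole document, not to the current node. The nodes in b_nodes are not a separate copy: they still belong to the original document, so an absolute path like this searches all of it again.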
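A minimal sketch of both points, assuming the xml2 package (which rvest uses under the hood) and the page from the question; the variable names are illustrative. A node selected by class="b" can still reach its sibling, and the written-out form of //n1 returns the same nodes as the shorthand.

```r
library(xml2)

# Same two-node page as in the question
doc <- read_html('<n1 class="a">1</n1>
<n1 class="b"><b>bold</b>2</n1>')
b_node <- xml_find_first(doc, '//n1[@class="b"]')

# The selected node is still attached to the original document,
# so it can navigate to the n1 that precedes it
sibling <- xml_find_first(b_node, 'preceding-sibling::n1')
xml_attr(sibling, "class")  # "a"

# "//n1" written out in full; both forms start at the document root,
# not at b_node, so both find every n1 in the document
short_form <- xml_find_all(b_node, '//n1')
long_form  <- xml_find_all(b_node, '/descendant-or-self::node()/child::n1')
identical(xml_attr(short_form, "class"), xml_attr(long_form, "class"))  # TRUE
```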

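Change it to .//n1. The leading . makes the path relative to the current node, i.e. the node you selected earlier, so the first n1 is no longer found. Note that .//n1 only looks below the selected node; since the selected n1 contains no n1 elements, it returns nothing here. Use descendant-or-self::n1 if the selected node itself should be included.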
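The difference between the absolute and relative paths can be checked directly. This is a sketch assuming xml2 and the question's page; xml_find_all() applied to a nodeset runs the XPath from each node in it, which is essentially what html_nodes() does.

```r
library(xml2)

doc <- read_html('<n1 class="a">1</n1>
<n1 class="b"><b>bold</b>2</n1>')
b_nodes <- xml_find_all(doc, '//n1[@class="b"]')

# Absolute path: the leading "/" jumps to the document root,
# so both n1 elements are found again
abs_hits <- xml_find_all(b_nodes, '//n1')

# Relative path: searches only below the selected node, which has
# no n1 descendants, so the result is empty
rel_hits <- xml_find_all(b_nodes, './/n1')

# Including the context node itself returns just the class="b" node
self_hits <- xml_find_all(b_nodes, 'descendant-or-self::n1')

length(abs_hits)              # 2
length(rel_hits)              # 0
xml_attr(self_hits, "class")  # "b"
```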

score 0 · Accepted Answer

I'm not sure what you are trying to do, but why not iterate over the nodes with foreach? I mean:

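<?php
// Parse the markup with SimpleXML; the <n1s> wrapper gives it a single root
$XML = simplexml_load_string('
<n1s>
<n1 class="a">1</n1>
<n1 class="b"><b>bold</b>2</n1></n1s>');

$valueA = '';
$valueB = '';
// Visit every n1 element and branch on its class attribute
foreach ($XML->xpath('//n1') as $n1) {
    switch ((string) $n1['class']) {
        case 'a':
            $valueA = $n1;  // keep the current element, not $XML->n1
            break;
        case 'b':
            $valueB = $n1;
            break;
    }
}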
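For comparison, the same loop-and-branch idea in R, the language of the question. This is a sketch assuming xml2; value_a and value_b are illustrative names, and unlike the PHP version it stores each node's text rather than the element itself.

```r
library(xml2)

doc <- read_html('<n1 class="a">1</n1>
<n1 class="b"><b>bold</b>2</n1>')

value_a <- NULL
value_b <- NULL
nodes <- xml_find_all(doc, '//n1')

# Visit each n1 and branch on its class attribute, as the PHP loop does
for (i in seq_along(nodes)) {
  node <- nodes[[i]]
  switch(xml_attr(node, "class"),
    a = { value_a <- xml_text(node) },
    b = { value_b <- xml_text(node) }
  )
}

value_a  # "1"
value_b  # "bold2"
```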

I hope this helps. Regards!
