python - Conceptual conflict between .next_element and .previous_element in BeautifulSoup4

Question

I was just going through the BS4 documentation and got confused going back and forth over the HTML family tree.

last_a_tag = soup.find("a", id="link3")
last_a_tag
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
last_a_tag.next_element
# u'Tillie'  
last_a_tag.previous_element
# u' and\n'  ## up to here it is easy to understand
last_a_tag.previous_element.next_element
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

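This is where the conflict comes up in my mind. Going by the .previous_element concept, last_a_tag.previous_element.next_element should give <a class="sister" href="http://example.com/tillie" id="link3">, so why does it give the complete element as shown above?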

Edit

last_a_tag.previous_element
# u' and\n'  <~~Perfect
last_a_tag.previous_element.next_element
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

Why does it not give only the following?

#<a class="sister" href="http://example.com/tillie" id="link3">

How does the part below come to be included? Tillie</a> <~~ this is where I am confused

Please help me understand.

score 2 · Accepted Answer

You are still looking at a reference to the tag; when it is printed, all the children it contains are printed as well.

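A tag is more than just the opening <a ...> element; it also includes any child elements and the closing element. You still have to go through .next_element (which would be u'Tillie') to reach those children in the tree.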

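Navigating the tree does not move between pieces of opening and closing text; it moves between the elements in the tree. The original XML/HTML document defines those elements in a specific order, but that is not what you are looking at here. You are looking at a nested structure of tags and text that fit inside other tags, all the way up to the root.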
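To make that concrete, here is a minimal sketch using the snippet from the question. It assumes the bs4 package is installed and uses the built-in "html.parser" backend; the variable names previous, round_trip, same_object, rendered and inner_text are only illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<p class="story">Once upon a time there were three little sisters; and their names were\n'
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,\n'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and\n'
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;\n'
    'and they lived at the bottom of a well.</p>'
)
soup = BeautifulSoup(html, "html.parser")

last_a_tag = soup.find("a", id="link3")
previous = last_a_tag.previous_element   # the text node just before the tag
round_trip = previous.next_element       # one step forward again

# Stepping back and then forward lands on the very same Tag object.
same_object = round_trip is last_a_tag

# Converting a Tag to text serializes the whole subtree:
# start tag, children, and end tag.
rendered = str(last_a_tag)

# The text inside the tag is a separate node, one step further along.
inner_text = last_a_tag.next_element
```

So there is no node that stands for only the start tag; the closest thing to "just the tag" is the Tag object itself, whose printed form happens to include its children.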

So the following HTML structure:

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

becomes a structure like this:

p
\
  a
  \
    "Elsie"
  ", "
  a
  \
    "Lacie"
  " and "
  a
  \
    "Tillie"
  "; and they lived at the bottom of a well."

(simplified to remove a lot of whitespace).
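One way to picture the order that .next_element follows is a depth-first, parent-before-children walk of that tree. The sketch below is a toy model in plain Python, not BeautifulSoup's actual implementation; the Node class and document_order function are hypothetical names introduced only for illustration.

```python
class Node:
    """A toy tree node: a label plus an ordered list of children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def document_order(node):
    """Yield nodes in pre-order: the node itself, then each child subtree."""
    yield node
    for child in node.children:
        yield from document_order(child)


# The simplified tree from the answer, with text nodes shown in quotes.
tree = Node("p", [
    Node("a", [Node('"Elsie"')]),
    Node('", "'),
    Node("a", [Node('"Lacie"')]),
    Node('" and "'),
    Node("a", [Node('"Tillie"')]),
    Node('"; and they lived at the bottom of a well."'),
])

order = [n.label for n in document_order(tree)]
last_a = len(order) - 1 - order[::-1].index("a")  # position of the last "a"
before_last_a = order[last_a - 1]
after_last_a = order[last_a + 1]
```

In this model the neighbour before the last a node is the text " and " and the neighbour after it is its own child "Tillie", which matches what the question observed.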

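If you have a reference to the last a element, the previous element in that set is the text " and ", and the next one is "Tillie". After "Tillie" comes the text "; and they lived at the bottom of a well.". Before the text " and " comes the text "Lacie", and so on.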
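The same walk can be checked against the real parser. This is a sketch under the same assumptions as the question (bs4 installed, built-in "html.parser" backend); describe, forward and backward are hypothetical helper names.

```python
from bs4 import BeautifulSoup, Tag

html = (
    '<p class="story">Once upon a time there were three little sisters; and their names were\n'
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,\n'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and\n'
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;\n'
    'and they lived at the bottom of a well.</p>'
)
soup = BeautifulSoup(html, "html.parser")
last_a_tag = soup.find("a", id="link3")


def describe(node):
    """Show tags by name only, and text nodes by their exact string."""
    if isinstance(node, Tag):
        return "<%s>" % node.name
    return str(node)


# Walk forward from the tag itself until there is nothing left.
forward = []
node = last_a_tag
while node is not None:
    forward.append(describe(node))
    node = node.next_element

# Walk backward, starting from the node just before the tag.
backward = []
node = last_a_tag.previous_element
while node is not None:
    backward.append(describe(node))
    node = node.previous_element
```

Here forward starts with the tag, then "Tillie", then the trailing text, while backward starts with " and\n", then "Lacie", then the second a tag. The whitespace differs from the simplified tree because the real document keeps its newlines.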
