python - How do I select only the parent if the child is already contained in a Beautiful Soup findAll result?

Question

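If a child tag is already contained in the search results (inside its matched parent), how can I get BeautifulSoup to select only the parent? My logic may replace the parent tag, so I don't want the child to be selected again.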

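    import re
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(string, "html.parser")  # string: the input HTML
    my_span_tags = soup.find_all('span', myattrib=re.compile(''))
    # Loop over all matching span tags
    for each in my_span_tags:
        # each.replace_with(...) or each.unwrap() (replaceWithChildren), as required
        pass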

HTML example

A single matching span works fine:

    <span myattrib="1"> Foo </span>

A nested pair causes trouble:

    <span myattrib="1"> Foo <span myattrib="1"> Foo </span> </span>

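Operating on the child causes trouble because, for example, its parent has already been changed, and it throws AttributeError: 'NoneType' object has no attribute 'index'. There is a separate question about that specific error: Problem using replaceWith to replace HTML tags with BeautifulSoup on Python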
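One defensive option is to skip any matched tag that is no longer attached to the document, because an earlier replacement already removed one of its ancestors. The sketch below assumes bs4 with the built-in "html.parser"; the helper name is_attached and the "[replaced]" string are hypothetical stand-ins for your own replacement logic. It relies on the fact that a tag still in the tree eventually reaches the BeautifulSoup object when walking up .parents, while a child of an extracted parent does not.

```python
import re
from bs4 import BeautifulSoup

def is_attached(tag, soup):
    """Hypothetical helper: True if tag is still reachable from the soup root."""
    # .parents walks upward; a tag still in the tree eventually yields soup itself
    return any(ancestor is soup for ancestor in tag.parents)

html = '<span myattrib="1"> Foo <span myattrib="1"> Foo </span> </span>'
soup = BeautifulSoup(html, "html.parser")

processed = []
for each in soup.find_all("span", myattrib=re.compile("")):
    if not is_attached(each, soup):
        # An earlier replacement detached this tag's ancestor; skip it
        continue
    each.replace_with("[replaced]")  # hypothetical replacement logic
    processed.append(each)

print(len(processed))  # 1
print(soup)            # [replaced]
```

The inner span is still in the result list, but it is skipped because its parent was extracted by the first replacement.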

My question is: if the parent has already been selected in BS, how can I exclude its children?

Currently, the selection looks like this Python list:

    [<span myattrib="1"> Foo <span myattrib="1"> Foo </span></span> (parent with child), <span myattrib="1"> Foo </span> (child)]

Notice the duplicate I want to avoid?
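A minimal sketch of one way to drop the duplicates, assuming bs4 with "html.parser": keep only the matches that have no ancestor in the same result set. Identity (id()) is used for the membership test because bs4 compares Tag objects by their content, so two identical-looking spans would otherwise be treated as equal. The variable names are illustrative, not from the original post.

```python
import re
from bs4 import BeautifulSoup

html = ('<span myattrib="1"> Foo </span> works fine. '
        '<span myattrib="1"> Foo <span myattrib="1"> Foo </span> </span>')
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all("span", myattrib=re.compile(""))
match_ids = {id(tag) for tag in matches}

# Keep only tags with no ancestor in the result set (the outermost matches)
outermost = [
    tag for tag in matches
    if not any(id(ancestor) in match_ids for ancestor in tag.parents)
]

print(len(matches), len(outermost))  # 3 2
```

For this particular query, checking tag.find_parent("span", myattrib=True) is None expresses the same test; the id-set version works unchanged for any find_all query.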

score 0 · Accepted Answer

If your current selection picks the children, you can use BeautifulSoup's .parent attribute to get the parent of the currently selected item.
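Applied to the question, .parent can be used to climb from each match to its outermost matching ancestor and then keep each parent only once. This is a sketch under assumptions: is_match and outermost_match are hypothetical helpers mirroring the question's filter, the climb stops at the first non-matching parent (so a matching ancestor above a non-matching tag is not reached), and bs4 with "html.parser" is assumed.

```python
from bs4 import BeautifulSoup

def is_match(tag):
    # Hypothetical predicate mirroring the question: <span> carrying myattrib
    return tag.name == "span" and tag.has_attr("myattrib")

def outermost_match(tag):
    # Climb with .parent (an attribute, not a method) while the parent also matches
    while tag.parent is not None and is_match(tag.parent):
        tag = tag.parent
    return tag

html = '<span myattrib="1"> Foo <span myattrib="1"> Foo </span> </span>'
soup = BeautifulSoup(html, "html.parser")

tops = {}
for tag in soup.find_all(is_match):
    top = outermost_match(tag)
    tops[id(top)] = top  # keyed by identity so each parent is kept once

print(len(tops))  # 1
```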
