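How can I make BeautifulSoup select only the parent when its child is also included in the search results? My logic may replace the parent tag, so I don't want the child to be selected again afterwards.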
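import re
from bs4 import BeautifulSoup  # assuming bs4; BeautifulSoup 3 uses `from BeautifulSoup import BeautifulSoup`

soup = BeautifulSoup(string)  # `string` holds the HTML to parse
my_span_tags = soup.findAll('span', myattrib=re.compile(''))  # every span that has a myattrib attribute
# Loop over all matching span tags
for each in my_span_tags:
    # replaceWith or replaceWithChildren, as required
    pass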
Sample HTML
<span myattrib="1"> Foo </span> works fine.
<span myattrib="1"> Foo <span myattrib="1"> Foo </span> </span>
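Operating on the child causes trouble because, by that point, the parent has already been changed, and it throws AttributeError: 'NoneType' object has no attribute 'index'. There is an existing question about that specific error: Problem using replaceWith to replace HTML tags with BeautifulSoup on Python
My question is: how do I exclude the child if its parent has already been selected by BS?
Currently, the selection looks like this Python list: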
[<span myattrib="1"> Foo <span myattrib="1"> Foo </span></span>(Parent with Child),<span myattrib="1"> Foo </span>(Child)]
Notice the duplication I want to avoid?
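To make the goal concrete, here is a minimal sketch of one possible way to keep only the outermost matches. It assumes the bs4 package (not necessarily the version used above), uses `attrs={"myattrib": True}` in place of `re.compile('')` to match any value of the attribute, and checks each match with `find_parent` to see whether it sits inside another matching span. The names `html`, `all_spans`, and `outermost` are made up for illustration.

```python
from bs4 import BeautifulSoup

# The two sample snippets from the question, placed side by side.
html = (
    '<span myattrib="1"> Foo </span>'
    '<span myattrib="1"> Foo <span myattrib="1"> Foo </span> </span>'
)

soup = BeautifulSoup(html, "html.parser")

# Every span carrying a myattrib attribute, nested ones included.
all_spans = soup.find_all("span", attrs={"myattrib": True})

# Keep a span only if none of its ancestors is itself a matching span.
outermost = [
    tag for tag in all_spans
    if tag.find_parent("span", attrs={"myattrib": True}) is None
]

print(len(all_spans))   # 3
print(len(outermost))   # 2

# Replacing the outermost tags is now safe: no nested match is left
# in the list to be visited after its parent has gone.
for tag in outermost:
    tag.replace_with(tag.get_text())
```

If the nested spans also need processing, a different strategy would be required (for example, re-querying the tree after each replacement); this sketch only covers the case where handling the parent is enough.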