Example:

import bs4

html = '''
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to 
best design and develop Android apps with security in mind. The book explores 
techniques that developers can use to build additional layers of security into 
their apps beyond the security controls provided by Android itself.             
<p class="scroll-down">∨ <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> ∨</p></div>
'''
soup = bs4.BeautifulSoup(html)

How do I get the following from soup (a BeautifulSoup object)?

<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to 
best design and develop Android apps with security in mind. The book explores 
techniques that developers can use to build additional layers of security into 
their apps beyond the security controls provided by Android itself.             
</div>

1 Answer

Just search for it:

soup.find('p', class_='scroll-down')

I used the class to narrow the search, but since there are no other p elements here, that is somewhat redundant.

If you need to remove the tag instead, first find it with the method above, then call .extract() on it to remove it from the document:

>>> soup.find('p', class_='scroll-down').extract()
<p class="scroll-down"> <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> </p>
>>> print(soup)

<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to 
best design and develop Android apps with security in mind. The book explores 
techniques that developers can use to build additional layers of security into 
their apps beyond the security controls provided by Android itself.             
</div>

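Two things to note: the .extract() method returns the removed tag, so you can save it for later use. The tag is also removed from the document completely; if you still need it there, you will have to re-add it manually later.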
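As a minimal sketch of saving the extracted tag and putting it back later: the shortened HTML string and the variable names below are made up for illustration, and the "html.parser" argument is my assumption (the question does not name a parser).

```python
import bs4

# Shortened, hypothetical stand-in for the HTML in the question.
html = '<div class="desc"><em>Title</em> text <p class="scroll-down">more</p></div>'
soup = bs4.BeautifulSoup(html, "html.parser")

# extract() detaches the tag from the tree and returns it.
removed = soup.find("p", class_="scroll-down").extract()
without_p = str(soup)

# The detached tag is still a usable Tag object; append() re-attaches it
# as the last child of the div.
soup.div.append(removed)
with_p = str(soup)

print(without_p)
print(with_p)
```

Here the tag goes back in as the last child of the div, which happens to be where it was. To restore it at some other position you would need something like insert() with an index.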

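Alternatively, you can use the .decompose() method to remove the tag from the document entirely, without returning a reference to it. The tag is then gone for good.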
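For comparison, a small sketch of the .decompose() route, again with a shortened, made-up HTML string and assuming the "html.parser" backend:

```python
import bs4

# Shortened, hypothetical stand-in for the HTML in the question.
html = '<div class="desc">text <p class="scroll-down">more</p></div>'
soup = bs4.BeautifulSoup(html, "html.parser")

tag = soup.find("p", class_="scroll-down")
result = tag.decompose()  # destroys the tag; nothing useful is returned

print(result)
print(soup)
print(soup.find("p", class_="scroll-down"))
```

Since decompose() gives nothing back, use it only when you are sure you will not need the tag again.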

answered 2012-11-21T12:50:42.843