python - How to find all <li> within a specific <ul> class?

Question

Environment:

Beautiful Soup 4

Python 2.7.5

Logic:

'find_all' <li> instances that are within a <ul> of class my_class, e.g.:

<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>

Clarification: I only need the "text" between the <li> tags.

Python code:

(The find_all below is incorrect; I only included it for context.)

from bs4 import BeautifulSoup, Comment
import re

# open original file
fo = open('file.php', 'r')
# convert to string
fo_string = fo.read()
# close original file
fo.close()
# create beautiful soup object from fo_string
bs_fo_string = BeautifulSoup(fo_string, "lxml")
# get rid of html comments
my_comments = bs_fo_string.findAll(text=lambda text:isinstance(text, Comment))
[my_comment.extract() for my_comment in my_comments]

my_li_list = bs_fo_string.find_all('ul', 'my_class')

print my_li_list

score 18 · Accepted Answer

Like this?

>>> html = """<ul class='my_class'>
... <li>thing one</li>
... <li>thing two</li>
... </ul>"""
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(html)
>>> for ultag in soup.find_all('ul', {'class': 'my_class'}):
...     for litag in ultag.find_all('li'):
...             print litag.text
... 
thing one
thing two

Explanation:

soup.find_all('ul', {'class': 'my_class'}) finds all ul tags whose class is my_class.

We then find all li tags inside each of those ul tags and print each li tag's text.
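For readers on Python 3, the same two-step search can be sketched as below. This is a minimal sketch, not the answerer's code: the extra other_class list, the texts variable, and the choice of the "html.parser" backend are assumptions added for illustration. It uses class_, the keyword form of the {'class': ...} filter in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Sample markup; the second <ul> is a hypothetical addition to show
# that only the list with class my_class is matched.
html = """<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>
<ul class='other_class'>
<li>not wanted</li>
</ul>"""

# "html.parser" ships with Python, so no lxml install is assumed.
soup = BeautifulSoup(html, "html.parser")

texts = []
# Step 1: find every <ul> whose class is my_class.
for ultag in soup.find_all("ul", class_="my_class"):
    # Step 2: search for <li> only inside that <ul>.
    for litag in ultag.find_all("li"):
        texts.append(litag.get_text())

print(texts)  # ['thing one', 'thing two']
```

Scoping the second find_all to ultag rather than soup is what keeps li tags from other lists out of the result.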

score 2 · Answer

This does the trick with BeautifulSoup 3; I don't have 4 on this machine.

>>> [li.string for li in bs_fo_string.find('ul', {'class': 'my_class'}).findAll('li')]
[u'thing one', u'thing two']

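The idea is to first search for the ul with the class 'my_class', then find all li elements within that ul.

If you have other ul elements with the same class, you may also want to use findAll for the ul search and change the list comprehension to a nested one.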
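The nested list comprehension mentioned above might look like the following under Beautiful Soup 4 and Python 3. This is a hedged sketch: the two-list sample markup and the items name are assumptions for illustration, and find_all is the bs4 spelling of findAll. It uses get_text() where the one-liner above uses .string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two lists sharing the same class.
html = """<ul class='my_class'><li>thing one</li><li>thing two</li></ul>
<ul class='my_class'><li>thing three</li></ul>"""

soup = BeautifulSoup(html, "html.parser")

# Outer loop: every matching <ul>; inner loop: every <li> inside it.
items = [li.get_text()
         for ul in soup.find_all("ul", {"class": "my_class"})
         for li in ul.find_all("li")]

print(items)  # ['thing one', 'thing two', 'thing three']
```

Unlike find, which returns only the first match (or None if nothing matches), find_all returns a list, so this form also handles the case where no ul has the class.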
