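You could try something like this. It does essentially what you did above: first iterate over every td with class section, then over every span inside each one, and print its text. It also prints each span's class, in case you need to be stricter about which spans you keep: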
In [1]: from bs4 import BeautifulSoup
In [2]: html = '...'  # Your HTML here
In [3]: soup = BeautifulSoup(html)
In [4]: for td in soup.find_all('td', {'class': 'section'}):
   ...:     for span in td.find_all('span'):
   ...:         print span.attrs['class'], span.text
   ...:
['username'] xxUsername
['comment']
A test comment
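The session above uses the Python 2 print statement. For reference, here is a minimal self-contained Python 3 sketch of the same loop. The HTML is hypothetical, reconstructed from the output shown above, and the helper name section_spans is my own; the explicit 'html.parser' argument and get_text(strip=True) are choices made for this sketch, not something the original session used.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reconstructed from the output shown above.
html = (
    '<table><tr>'
    '<td class="section">'
    '<span class="username">xxUsername</span>'
    '<span class="comment">\nA test comment\n</span>'
    '</td>'
    '<td class="other"><span class="username">ignored</span></td>'
    '</tr></table>'
)

soup = BeautifulSoup(html, 'html.parser')


def section_spans(soup):
    """Return (classes, text) pairs for every span inside a td.section cell."""
    pairs = []
    for td in soup.find_all('td', class_='section'):
        for span in td.find_all('span'):
            # .get() avoids a KeyError when a span has no class attribute
            classes = span.get('class', [])
            pairs.append((classes, span.get_text(strip=True)))
    return pairs


pairs = section_spans(soup)
for classes, text in pairs:
    print(classes, text)
```

Because the text is stripped here, the comment prints on the same line as its class rather than on the following line.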
Or, as a one-liner that is more complicated than it needs to be, storing everything back in your list:
In [5]: results = [span.text for td in soup.find_all('td', {'class': 'section'}) for span in td.find_all('span')]
In [6]: results
Out[6]: [u'xxUsername', u'\nA test comment\n']
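If the nested comprehension feels heavy, a CSS selector may be a shorter way to express the same traversal. This is a sketch under assumptions: the HTML is hypothetical (reconstructed from the output above), and select() relies on the selector support bundled with recent Beautiful Soup 4 releases.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reconstructed from the output shown above.
html = (
    '<table><tr><td class="section">'
    '<span class="username">xxUsername</span>'
    '<span class="comment">\nA test comment\n</span>'
    '</td></tr></table>'
)

soup = BeautifulSoup(html, 'html.parser')

# 'td.section span' matches every span that is a descendant of a td
# whose class list contains "section".
results = [span.text for span in soup.select('td.section span')]
print(results)
```

One difference worth knowing: select() returns each matching span once, whereas the nested find_all() version would repeat a span if section cells were nested inside one another.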
Or, along the same lines, a dictionary whose keys are tuples of the classes and whose values are the text itself:
In [8]: results = dict((tuple(span.attrs['class']), span.text) for td in soup.find_all('td', {'class': 'section'}) for span in td.find_all('span'))
In [9]: results
Out[9]: {('comment',): u'\nA test comment\n', ('username',): u'xxUsername'}
Assuming this one is closer to what you want, I'd suggest rewriting it as:
In [10]: results = {}
In [11]: for td in soup.find_all('td', {'class': 'section'}):
   ....:     for span in td.find_all('span'):
   ....:         results[tuple(span.attrs['class'])] = span.text
   ....:
In [12]: results
Out[12]: {('comment',): u'\nA test comment\n', ('username',): u'xxUsername'}
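One caveat with keying the dictionary by class tuple: if two spans share the same classes, the later one silently replaces the earlier one. The sketch below shows that behaviour and one possible alternative using collections.defaultdict. The pairs list is hypothetical sample data shaped like what the loop above sees, and the second comment entry is invented purely to show a repeated class.

```python
from collections import defaultdict

# Hypothetical (classes, text) pairs; the second 'comment' entry is
# invented to demonstrate what happens with a repeated class.
pairs = [
    (['username'], 'xxUsername'),
    (['comment'], '\nA test comment\n'),
    (['comment'], '\nAnother comment\n'),
]

# Plain dict: a later span with the same classes replaces the earlier one.
overwritten = {}
for classes, text in pairs:
    overwritten[tuple(classes)] = text

# defaultdict(list): every text is kept under its class tuple.
grouped = defaultdict(list)
for classes, text in pairs:
    grouped[tuple(classes)].append(text.strip())

print(overwritten[('comment',)])
print(grouped[('comment',)])
```

If each section cell only ever holds one span per class, the plain dictionary is enough and this extra step is unnecessary.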