I have some HTML that looks like this:
<tr>
<td>some text</td>
<td>some other text</td>
<td>some <b>problematic</b> other <br /> text</td>
</tr>
and some Python that tries to get each cell and print its inner value:
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(data, convertEntities=BeautifulSoup.HTML_ENTITIES)
for row in soup.findAll('tr'):
    print repr(row)  # this prints the whole 'tr' element text just fine.
    for col in row.contents:
        print col.string
So the full row prints the captured HTML correctly, but 'col' prints None for the last element:
some text
some other text
None
I'm not familiar with BeautifulSoup or Python, but it seems the inner tags in the last element are causing a parsing problem?
Thanks
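
For reference, the None is probably not a parsing failure: in BeautifulSoup, `.string` is only set when a tag has a single child, and the last cell has several (text, `<b>`, more text, `<br />`, more text). The text has to be gathered from every text node inside the cell instead. Below is a minimal sketch of that idea. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages, and it is written for Python 3. The names `CellTextCollector` and `cell_texts` are hypothetical helpers made up for this illustration, not part of any library.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Hypothetical helper: gathers the text of every <td>, nested tags included."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one list of cell strings per <tr>
        self._parts = None   # text pieces of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:          # tolerate a <td> with no enclosing <tr>
                self.rows.append([])
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "td" and self._parts is not None:
            # Join all text pieces and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.rows[-1].append(text)
            self._parts = None

    def handle_data(self, data):
        # Called for every text node, including the one inside <b>...</b>.
        if self._parts is not None:
            self._parts.append(data)


def cell_texts(html):
    """Return a list of rows, each a list of the cells' full text."""
    parser = CellTextCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


data = """<tr>
<td>some text</td>
<td>some other text</td>
<td>some <b>problematic</b> other <br /> text</td>
</tr>"""

for row in cell_texts(data):
    for text in row:
        print(text)
```

With the sample row above, the third cell comes out as "some problematic other text" instead of None. The same approach should carry over to BeautifulSoup by joining all text nodes under each `td` rather than reading `.string`.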