python - Reading an element from HTML

Question

I have the following HTML:

<tr style='background:#DDDDDD;'>
    <td><b>ASD</b></td>
    <td colspan='3'>1231</td>
</tr>

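This row is not repeated anywhere on the page, so it is unique. I want to put the content of the cell (1231) into a variable. I tried using HTML.parser, but it didn't work.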

score 0 · Accepted Answer

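Take a look at BeautifulSoup; it is great for this:

# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)  # feed your HTML page to BeautifulSoup

# Find the text node "ASD", then the first <td> that follows it
pleaseFind = soup.find(text="ASD")
whatINeed = pleaseFind.findNext('td')

print whatINeed.text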
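
If installing BeautifulSoup is not an option, the same idea (find the label text, then take the next cell) can be sketched with the standard library alone. The following is a minimal Python 3 sketch, not a definitive implementation: the class name `CellAfterLabel` and its attributes are made up for illustration, and it assumes the label appears as the complete text of a node that comes before the cell you want.

```python
from html.parser import HTMLParser


class CellAfterLabel(HTMLParser):
    """Collect the text of the first <td> opened after a given label text.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.seen_label = False  # True once the label text has been seen
        self.in_target = False   # True while inside the cell we want
        self.value = None        # text of the target cell, once found

    def handle_starttag(self, tag, attrs):
        # The first <td> opened after the label is the one we want.
        if tag == "td" and self.seen_label and self.value is None:
            self.in_target = True
            self.value = ""

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_target = False

    def handle_data(self, data):
        if self.in_target:
            self.value += data
        elif data.strip() == self.label:
            self.seen_label = True


html = """
<tr style='background:#DDDDDD;'>
    <td><b>ASD</b></td>
    <td colspan='3'>1231</td>
</tr>
"""

parser = CellAfterLabel("ASD")
parser.feed(html)
parser.close()

cell_value = parser.value.strip()
print(cell_value)
```

If the label is never found, `parser.value` stays `None`, so real code should check for that before calling `.strip()`.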

score 0 · Accepted Answer

You can use urllib2 (you don't have to install anything new, at least with the Windows builds of Python): http://docs.python.org/2/howto/urllib2.html

Example:

import urllib2

response = urllib2.urlopen('your URL')
html = response.read()
# html is a string containing everything on your page

# This (it could be a bit cleaner) finds the substring "<td colspan='3'>" and
# takes everything between the end of it and the next "</td>"
marker = "<td colspan='3'>"
pos = html.find(marker)
start = pos + len(marker)
print html[start:html.find("</td>", start)]
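
For readers on Python 3, where `urllib2` became `urllib.request` and `print` is a function, the slicing step might look like the sketch below. The helper name `text_between` is hypothetical, and the sample string stands in for a downloaded page; with a real page you would first decode the bytes returned by `urllib.request.urlopen(...).read()`.

```python
def text_between(html, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker.

    Returns None if either marker is missing. Hypothetical helper,
    written only to illustrate the slicing approach.
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # move past the opening tag
    end = html.find(end_marker, start)  # search only after the opening tag
    if end == -1:
        return None
    return html[start:end]


# Stand-in for the page contents
sample = """
<tr style='background:#DDDDDD;'>
    <td><b>ASD</b></td>
    <td colspan='3'>1231</td>
</tr>
"""

value = text_between(sample, "<td colspan='3'>", "</td>")
print(value)
```

This only works while the opening tag is written exactly as in the marker string; a change in quoting or attribute order would make `find` miss it, which is the usual argument for using a real HTML parser instead.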
