How can I parse the HTML table results using the following code? An example of the HTML is given further below.
import sys
from io import BytesIO

import requests
from lxml import etree


def http_request():
    url = "http://somehost/somehtml.html"
    try:
        r = requests.get(url, auth=("theUser", "thepass"))
        r.raise_for_status()  # requests raises HTTPError only when asked to
    except requests.RequestException:
        sys.exit(1)
    # r.content is raw bytes (r.encoding only affects r.text),
    # so the ISO-8859-1 hint is given to the parser instead
    parse_result(r.content)


def parse_result(result):
    parser = etree.HTMLParser(encoding="ISO-8859-1")
    tree = etree.parse(BytesIO(result), parser)
    # Here should be the logic to parse the HTML result :)


if __name__ == "__main__":
    http_request()
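One way to fill in the placeholder in `parse_result` is sketched below. It assumes, as in the sample HTML, that every `<table>` holds rows of exactly two cells (a bold label and a value) and that `result` is the raw bytes from `r.content`; the list-of-dicts return value is a hypothetical shape chosen for illustration.

```python
from io import BytesIO

from lxml import etree


def parse_result(result):
    """Return one dict per <table>, mapping each label cell to its value cell.

    Assumes (hypothetically) that every row is a label/value pair of <td>s.
    """
    parser = etree.HTMLParser(encoding="ISO-8859-1")
    tree = etree.parse(BytesIO(result), parser)
    records = []
    for table in tree.getroot().iter("table"):
        record = {}
        # iter() searches all descendants, so it works with or without <tbody>
        for row in table.iter("tr"):
            cells = row.findall("td")
            if len(cells) != 2:
                continue  # skip rows that are not label/value pairs
            # itertext() also collects text nested in children such as <B>
            label = "".join(cells[0].itertext()).strip()
            value = "".join(cells[1].itertext()).strip()
            record[label] = value
        records.append(record)
    return records
```

On the sample page this gives one dict per table, such as `{"name": "result name a", "inUse": "false"}`. `itertext()` is used because the label text sits inside the `<B>` child rather than directly in the `<td>`, so `cell.text` would be `None` for the label cells.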
Here is the HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 25 March 2009), see www.w3.org" />
<title></title>
</head>
<body>
<table border="1">
<tr>
<td valign="top"><B>name</B></td>
<td>result name a</td>
</tr>
<tr>
<td valign="top"><B>inUse</B></td>
<td>false</td>
</tr>
</table>
<table border="1">
<tr>
<td valign="top"><B>name</B></td>
<td>result name b</td>
</tr>
<tr>
<td valign="top"><B>inUse</B></td>
<td>false</td>
</tr>
</table>
<table border="1">
<tr>
<td valign="top"><B>name</B></td>
<td>result name c</td>
</tr>
<tr>
<td valign="top"><B>inUse</B></td>
<td>true</td>
</tr>
</table>
</body>
</html>
The expected result is to retrieve the values of the name and inUse fields, e.g. "result name a" and "false" for the first table.
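If lxml is not available, the standard library's `html.parser` can collect the same fields; this is a swapped-in technique, not the lxml approach from the question. The sketch assumes the page text is already decoded to `str` (for example `r.content.decode("ISO-8859-1")`), and `TableCellCollector` and `name_and_in_use` are hypothetical names. It returns `(name, in_use)` tuples with the `"true"`/`"false"` text turned into booleans.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the stripped text of every <td>, grouped per <table>."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of cell strings per <table>
        self._cell = None  # text parts of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.tables[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        # Text inside nested tags such as <B> also lands here
        if self._cell is not None:
            self._cell.append(data)


def name_and_in_use(html_text):
    """Return (name, in_use) tuples; a missing inUse field counts as False."""
    collector = TableCellCollector()
    collector.feed(html_text)
    collector.close()
    results = []
    for cells in collector.tables:
        # Cells alternate label, value: ["name", "result name a", "inUse", "false"]
        fields = dict(zip(cells[0::2], cells[1::2]))
        results.append((fields.get("name"), fields.get("inUse") == "true"))
    return results
```

Because `html.parser` passes tag names to the handlers in lowercase, the uppercase `<B>` tags need no special handling.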