python - ElementTree XML parsing and urllib2.urlopen

Question

I am opening a URL as follows:

response = urllib2.urlopen(url, data, timeout=_TIMEOUT)

Calling response.read() on it gives the following output:

<XMLlookup licenseid="X4X6X42" reason="OK" status="1" />

But when I try to parse it with ElementTree, like this:

print response.read()
t = ET.parse(response)
r = t.getroot()
print r.attrib.get('status')

I get the following error:

File "<string>", line 62, in parse
File "<string>", line 38, in parse
cElementTree.ParseError: no element found: line 1, column 0

But when I remove the response.read() line, the code works fine. What am I doing wrong?

score 5 · Accepted Answer

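You can only read the response once, because it is a file-like object (an addinfourl, in fact). Since read() always consumes the whole text, any subsequent call returns an empty string.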
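The exhaustion is easy to reproduce without a network call. The following is a minimal sketch, assuming Python 3 and using io.BytesIO as a stand-in for the object returned by urlopen; the stand-in and the variable names are illustrative and not part of the original code.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the HTTP response (assumption): a file-like object
# that, like the real response, can only be consumed once.
payload = b'<XMLlookup licenseid="X4X6X42" reason="OK" status="1" />'
response = io.BytesIO(payload)

first = response.read()    # consumes the whole stream
second = response.read()   # nothing is left, so this is empty

# Handing the exhausted stream to ET.parse fails, because the parser
# sees no input at all.
try:
    ET.parse(response)
    error = None
except ET.ParseError as exc:
    error = str(exc)

print(repr(first))
print(repr(second))
print(error)
```

The second read returns an empty byte string, which is why the parser in the question complains at line 1, column 0: it never sees any element.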

So either do not call read before using ET.parse(response), or store the result in a string and pass that to ET:

txt = response.read()
# do what you want with txt (without changing it)
t = ET.fromstring(txt)
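Both options can be sketched side by side. This assumes Python 3, and fake_response is a hypothetical helper that stands in for urlopen by returning a fresh one-shot stream. Note that ET.parse returns an ElementTree (so getroot() is needed), while ET.fromstring returns the root Element directly.

```python
import io
import xml.etree.ElementTree as ET

PAYLOAD = b'<XMLlookup licenseid="X4X6X42" reason="OK" status="1" />'


def fake_response():
    """Hypothetical stand-in for urlopen(): a fresh one-shot stream."""
    return io.BytesIO(PAYLOAD)


# Option 1: hand the unread stream to ET.parse, which returns an ElementTree.
tree = ET.parse(fake_response())
status_from_stream = tree.getroot().attrib.get('status')

# Option 2: read once, keep the bytes, and parse them with ET.fromstring,
# which returns the root Element itself (no getroot() call).
txt = fake_response().read()
root = ET.fromstring(txt)
status_from_bytes = root.attrib.get('status')

print(status_from_stream, status_from_bytes)
```

Option 2 is the one to pick if the raw text is also needed for logging or printing, since the bytes stay available in txt.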

score 4 · Accepted Answer


You need to use:

t = ET.fromstring(response.read())

answered 2014-11-11T22:02:33.127

score 2 · Accepted Answer

Instead of

response.read()
t = ET.parse(response)
r = t.getroot()

try

resp = response.read()
r = ET.fromstring(resp)  # fromstring returns the root Element, so no getroot()

or

r = ET.fromstring(response.read())

Also note that not all HTML can be parsed as XML. If your request returns XHTML you will be fine, but otherwise you will get a ParseError much like the one you are seeing.
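To illustrate that last point, here is a minimal sketch, assuming Python 3 and a made-up HTML fragment: markup that is legal HTML but leaves a tag unclosed is rejected by the XML parser, while its XHTML form parses.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: <br> is never closed, which is legal HTML but not XML.
html = '<html><body><p>hello<br></p></body></html>'

try:
    ET.fromstring(html)
    parsed = True
    message = None
except ET.ParseError as exc:
    parsed = False
    message = str(exc)

# The XHTML form of the same fragment closes every tag and parses fine.
xhtml = '<html><body><p>hello<br /></p></body></html>'
root = ET.fromstring(xhtml)

print(parsed, message)
print(root.tag)
```

For real-world HTML, a dedicated HTML parser is usually a better fit than ElementTree.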
