I am opening a URL using the following:

response = urllib2.urlopen(url, data, timeout=_TIMEOUT)

and when I call response.read(), it gives the following output:

<XMLlookup licenseid="X4X6X42" reason="OK" status="1" />

But when I try to parse it with ElementTree, as shown here:

print response.read()
t = ET.parse(response)
r = t.getroot()
print r.attrib.get('status')

I get the following error message:

File "<string>", line 62, in parse
File "<string>", line 38, in parse
cElementTree.ParseError: no element found: line 1, column 0

But when I remove the response.read() line, the code works fine. What am I doing wrong?

3 Answers

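You can only read the response once, because it is a file-like object (actually an addinfourl). Since read always consumes the entire text, any subsequent call to read returns an empty string.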

So either don't call read before using ET.parse(response), or store the result in a string and use that for ET:

txt = response.read()
# do what you want with txt (without changing it)
t = ET.fromstring(txt)  # fromstring returns the root Element itself, so no getroot() call is needed
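
The read-once behaviour can be reproduced without a network call. The sketch below is written for Python 3 and uses io.BytesIO as a stand-in for the urlopen response (an assumption; the real object is an addinfourl), with the XML from the question as the payload:

```python
import io
import xml.etree.ElementTree as ET

# Payload copied from the question; BytesIO stands in for the HTTP response.
PAYLOAD = b'<XMLlookup licenseid="X4X6X42" reason="OK" status="1" />'

response = io.BytesIO(PAYLOAD)

first = response.read()   # consumes the whole stream
second = response.read()  # nothing is left, so this is empty

# Parsing the exhausted stream fails, as in the question.
try:
    ET.parse(response)
    parse_failed = False
except ET.ParseError:
    parse_failed = True

# Parsing the saved bytes works; fromstring returns the root Element.
root = ET.fromstring(first)
status = root.attrib.get('status')
```

Here the second read returns an empty value and ET.parse raises ParseError, while parsing the saved bytes still yields the status attribute.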
answered 2014-11-11T22:06:08.630

You need to use:

t = ET.fromstring(response.read())
answered 2014-11-11T22:02:33.127

instead of

response.read()
t = ET.parse(response)
r = t.getroot()

try

resp = response.read()
# fromstring returns the root Element directly (it has no getroot method)
r = ET.fromstring(resp)

or

# fromstring returns the root Element directly (it has no getroot method)
r = ET.fromstring(response.read())
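
If you would rather keep the ET.parse and getroot() calls from the question, one option is to read the body once and parse an in-memory copy. This is only a sketch for Python 3: parse_buffered is a hypothetical helper name, and io.BytesIO stands in for the urlopen response.

```python
import io
import xml.etree.ElementTree as ET

def parse_buffered(response):
    """Read the response once, then parse from an in-memory copy."""
    body = response.read()
    tree = ET.parse(io.BytesIO(body))  # parse() returns an ElementTree
    return body, tree.getroot()

# Stand-in for the urlopen response (an assumption for this sketch).
fake_response = io.BytesIO(
    b'<XMLlookup licenseid="X4X6X42" reason="OK" status="1" />'
)
body, root = parse_buffered(fake_response)
```

The raw body stays available for printing or logging, and root is the same element that ET.fromstring(body) would return.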

Also, you should note that not all HTML is parsable as XML. If your request returns XHTML then you will be fine, but otherwise you will get a very similar error to what you are seeing.
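
To illustrate the point about HTML, here is a minimal Python 3 sketch that guards the parse with a try/except. The helper name status_or_none and the sample HTML string are made up for the example; the XML string is the one from the question.

```python
import xml.etree.ElementTree as ET

def status_or_none(payload):
    """Return the root's 'status' attribute, or None if payload is not well-formed XML."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return None
    return root.attrib.get('status')

xml_ok = '<XMLlookup licenseid="X4X6X42" reason="OK" status="1" />'
html_bad = '<html><body><p>Hello<br></p></body></html>'  # <br> is never closed

ok_result = status_or_none(xml_ok)
bad_result = status_or_none(html_bad)
```

The unclosed br tag is valid in HTML but not in XML, so the second call hits ParseError and returns None instead of crashing.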

answered 2014-11-11T22:01:28.860