This works for me:


import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'http://example.com'
# this url points to a `xml` page
tree = ET.parse(urlopen(url))

However, when I switch to requests, something goes wrong:


import requests
import xml.etree.ElementTree as ET
url = 'http://example.com'
# this url points to a `xml` page
tree = ET.parse(requests.get(url))

The traceback looks like this:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 tree = ET.parse(requests.get(url, proxies={'http': '192.168.235.36:7788'}))

/usr/lib/python2.7/xml/etree/ElementTree.py in parse(source, parser)
   1180 def parse(source, parser=None):
   1181     tree = ElementTree()
-> 1182     tree.parse(source, parser)
   1183     return tree
   1184 

/usr/lib/python2.7/xml/etree/ElementTree.py in parse(self, source, parser)
    645         close_source = False
    646         if not hasattr(source, "read"):
--> 647             source = open(source, "rb")
    648             close_source = True
    649         try:

TypeError: coercing to Unicode: need string or buffer, Response found


So my question is: what is wrong with the way I am using requests, and how can I make ET work with requests?

2 Answers

You are passing the requests response object to ElementTree; you want to pass in the raw file object instead:

r = requests.get(url, stream=True)
ET.parse(r.raw)

.raw returns the "file-like" socket object, which ElementTree.parse() reads from just as it would read from the urllib2 response (which is itself a file-like object).

A concrete example:

>>> r = requests.get('http://www.enetpulse.com/wp-content/uploads/sample_xml_feed_enetpulse_soccer.xml', stream=True)
>>> tree = ET.parse(r.raw)
>>> tree
<xml.etree.ElementTree.ElementTree object at 0x109dadc50>
>>> tree.getroot().tag
'spocosy'
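
To see why the original call fails and why .raw works, here is a minimal offline sketch in Python 3. FakeResponse is a hypothetical stand-in for a requests Response (it is not part of requests); it only mimics the fact that a Response has no read() method while its .raw attribute does. The sample XML is made up.

```python
import io
import xml.etree.ElementTree as ET

XML_BYTES = b"<?xml version='1.0' encoding='utf-8'?><feed><item id='1'/></feed>"


class FakeResponse:
    """Hypothetical stand-in for a requests Response: no read(), but .raw has one."""

    def __init__(self, body):
        self.content = body          # the whole body as bytes
        self.raw = io.BytesIO(body)  # file-like object with a read() method


resp = FakeResponse(XML_BYTES)

# ET.parse() looks for a read() method. Without one it treats the argument
# as a filename and hands it to open(), which raises TypeError.
try:
    ET.parse(resp)
    parse_failed = False
except TypeError:
    parse_failed = True

# Passing the file-like attribute works, just like passing a urlopen() result.
tree = ET.parse(resp.raw)
root_tag = tree.getroot().tag
```

The same hasattr(source, "read") check is visible in the traceback from the question, which is why any object without read() ends up in open().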

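If the URL serves a compressed response, the raw socket (as with urllib2) returns the compressed data undecoded; in that case, use the ET.fromstring() method on the binary response content instead: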

r = requests.get(url)
ET.fromstring(r.content)
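
The compressed case can be reproduced without a network. In this sketch (Python 3, standard library only) the gzip-compressed bytes play the role of what the raw stream would deliver, and gzip.decompress() stands in for the decoding that requests performs when you read r.content. The sample XML is made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET

xml_body = b"<feed><item id='1'/></feed>"  # made-up sample data
compressed = gzip.compress(xml_body)       # what a gzip-encoded response carries

# Feeding the still-compressed stream to the parser fails: it is not XML yet.
try:
    ET.parse(io.BytesIO(compressed))
    raw_parse_ok = True
except ET.ParseError:
    raw_parse_ok = False

# Decoding first (r.content does this for you) gives bytes fromstring() accepts.
root = ET.fromstring(gzip.decompress(compressed))
```

If you would rather keep streaming, urllib3's raw response also has a decode_content attribute, so setting r.raw.decode_content = True before ET.parse(r.raw) should have a similar effect; treat that as an option to verify against your installed version rather than a guarantee.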
answered 2013-06-05T07:15:18.953

You are not giving ElementTree the response text but the requests Response object itself, which is why you get the TypeError: need string or buffer, Response found. Do this instead:

r = requests.get(url)
tree = ET.fromstring(r.text)
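
One detail worth noting when switching from ET.parse() to ET.fromstring(): the two return different types. A small sketch in Python 3 with made-up XML, where io.BytesIO stands in for a file-like response:

```python
import io
import xml.etree.ElementTree as ET

xml_body = b"<feed><item id='1'/></feed>"  # made-up sample data

tree = ET.parse(io.BytesIO(xml_body))  # an ElementTree; use getroot() for the root
root = ET.fromstring(xml_body)         # already the root Element

same_tag = tree.getroot().tag == root.tag

# Wrap the Element if later code expects an ElementTree (for example, to call write()).
wrapped = ET.ElementTree(root)
```

So code that previously called tree.getroot() after ET.parse() can use the fromstring() result directly.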
answered 2013-06-05T07:14:22.747