I'm confused as to why I can't download the entire contents of some JSON responses from FriendFeed using urllib2.

>>> import urllib2
>>> stream = urllib2.urlopen('http://friendfeed.com/api/room/the-life-scientists/profile?format=json')
>>> stream.headers['content-length']
'168928'
>>> data = stream.read()
>>> len(data)
61058
>>> # We can see here that I did not retrieve the full JSON
... # given that the stream doesn't end with a closing }
... 
>>> data[-40:]
'ce2-003048343a40","name":"Vincent Racani'

How can I retrieve the full response with urllib2?

4 Answers

The best way to get all of the data:

import urllib2

fp = urllib2.urlopen("http://www.example.com/index.cfm")

response = ""
while 1:
    data = fp.read()
    if not data:  # read() returns an empty string at EOF
        break
    response += data

print response

The reason is that .read(), given the nature of sockets, is not guaranteed to return the entire response in a single call. I believe this is discussed in the documentation (maybe for urllib), but I can't find it.
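The accumulate-until-EOF pattern above can be demonstrated without a network. Here is a minimal sketch in which ChunkedStream is a hypothetical stand-in (not a real urllib2 response) whose read() returns at most a few bytes per call, the way a socket-backed stream may return short reads:

```python
class ChunkedStream(object):
    """Hypothetical stand-in for a socket-backed response: each read()
    returns at most `chunk` bytes, so a single read() is not the whole body."""
    def __init__(self, payload, chunk=10):
        self.payload = payload
        self.pos = 0
        self.chunk = chunk

    def read(self):
        data = self.payload[self.pos:self.pos + self.chunk]
        self.pos += len(data)
        return data

def read_all(stream):
    """Keep reading until read() returns an empty string (EOF)."""
    parts = []
    while True:
        data = stream.read()
        if not data:
            break
        parts.append(data)
    return "".join(parts)

stream = ChunkedStream("x" * 95)
print(len(read_all(stream)))  # prints 95, even though each read() yields at most 10 bytes
```

A single `stream.read()` here would return only 10 characters; the loop recovers all 95, which is exactly why the answer above accumulates until an empty read.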

answered 2009-12-01T05:25:21.097
Use tcpdump (or something like it) to monitor the actual network interactions - then you can analyze why the site breaks for certain client libraries. Make sure you repeat the request several times by scripting the test, so you can see whether the problem is consistent:

import urllib2
url = 'http://friendfeed.com/api/room/friendfeed-feedback/profile?format=json'
stream = urllib2.urlopen(url)
expected = int(stream.headers['content-length'])
data = stream.read()
datalen = len(data)
print expected, datalen, expected == datalen

The site has worked consistently for me, so I can't give an example of a failure :)

answered 2010-11-24T14:43:30.790
Keep calling stream.read() until it's done:

while True:
    data = stream.read()
    if not data:
        break
    # ... do stuff with data
answered 2009-12-01T04:54:09.603
readlines() also works.
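That works because readlines() keeps reading internally until EOF, so joining its result recovers the complete body. A minimal sketch, using io.BytesIO as a stand-in for the urllib2 response:

```python
import io

# io.BytesIO stands in for a file-like HTTP response object.
# readlines() drains the stream to EOF; joining the lines gives the full body.
stream = io.BytesIO(b"line one\nline two\nline three")
data = b"".join(stream.readlines())
print(len(data))  # prints 28: the whole payload, newlines included
```

Note that this buffers the whole response in memory as a list of lines first, so the read() loop in the accepted answer is the more direct approach for large payloads.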

answered 2009-12-01T05:03:03.263