python - Garbled text returned when opening a URL in Python 2.7

Question

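I want to open a StackExchange API (search endpoint) URL and parse the result [0]. The documentation says all results are in JSON format [1]. When I open the URL in my web browser, the results look fine [2]. However, when I try to open it from a Python program, it returns encoded text that I cannot parse. Here is a snippet: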

á¬ôŸ?ÍøäÅ€ˆËç?bçÞIË
¡ëf)j´ñ‚TF8¯KÚpr®´Ö©iUizEÚD +¦¯÷tgNÈÃ‘.G¾LPUç?Ñ‘Ù~]ŒäÖÂ9Ÿð1£µ$JNóa?Z&Ÿtž'³Ðà#Í°¬õÅj5ŸE÷*æJî”Ï&gt;íÓé’çÔqQI’†ksS™¾þEíqÝýly

The program I use to open the URL is below. What am I doing wrong?

import urllib
import urllib2

def open_url(query):
    ''' Opens a URL and prints the raw response body '''
    request = urllib2.Request(query)
    response = urllib2.urlopen(request)
    text = response.read()
    #results = json.loads(text)
    print text


title = 'openRawResource, AssetManager.AssetInputStream throws IOException on read of larger files'

page1_query = stackoverflow_search_endpoint % (1, urllib.quote_plus(title), access_token, key)

[0] https://api.stackexchange.com/2.1/search/advanced?page=1&pagesize=100&order=desc&sort=relevance&q=openRawResource%2C+AssetManager.AssetInputStream+throws+IOException+on+read+of+larger+files&site=stackoverflow&access_token=******&key=******

[1] https://api.stackexchange.com/docs

[2] http://hastebin.com/qoxaxahaxa.sm

Solution

I found the solution. Here is how to do it.

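import gzip
import json
import urllib2
from StringIO import StringIO

request = urllib2.Request(query)
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
if response.info().get('Content-Encoding') == 'gzip':
    # The body is gzip-compressed: wrap it in a file-like object and decompress.
    buf = StringIO(response.read())
    data = gzip.GzipFile(fileobj=buf).read()
else:
    data = response.read()
result = json.loads(data)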

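I can't post the full output because it is too large. Many thanks to Evert and Kristaps for pointing out the decompression issue and the need to set the header on the request. There is also a similar question worth reading [3].

[3] Does python urllib2 automatically uncompress gzip data fetched from webpage?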
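
To try the decompression step without calling the API, here is a minimal sketch that builds a gzip-compressed JSON body locally and decodes it the same way. The names decode_body and fake_gzip_body are hypothetical helpers introduced only for this illustration, and the sample payload is made up. It uses io.BytesIO in place of StringIO so it should run on both Python 2.7 and Python 3.

```python
import gzip
import io
import json


def decode_body(raw, content_encoding):
    """Parse a JSON response body, gunzipping it first if the header says so."""
    if content_encoding == 'gzip':
        # Same idea as StringIO + GzipFile in the snippet above.
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return json.loads(raw.decode('utf-8'))


def fake_gzip_body(payload):
    """Hypothetical helper: gzip-compress a JSON payload to mimic an API body."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as f:
        f.write(json.dumps(payload).encode('utf-8'))
    return buf.getvalue()


# Made-up payload standing in for a real API response.
body = fake_gzip_body({'items': [], 'has_more': False})
parsed = decode_body(body, 'gzip')
```

In the real program, raw would be response.read() and content_encoding would be response.info().get('Content-Encoding').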

score 2 · Accepted Answer

The next paragraph of the documentation says:

Additionally, all API responses are compressed. The Content-Encoding header is always set, but some proxies will strip this out. The proper way to decode API responses can be found here.

Your output does look like it may be compressed. Browsers automatically decompress data (depending on the Content-Encoding), so you would need to look at the header and do the same: results = json.loads(zlib.decompress(text)) or something similar.

Do check the "here" link in that passage of the documentation as well.
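
One caveat on the zlib route: with its default settings, zlib.decompress expects a zlib header and rejects gzip-wrapped data, so the call needs wbits set to 16 + zlib.MAX_WBITS. Since the documentation warns that proxies may strip Content-Encoding, the sketch below also checks the two-byte gzip magic number instead of relying on the header. This is an illustration under those assumptions, not the API's recommended method; decompress_if_gzipped is a hypothetical name and the sample body is made up.

```python
import gzip
import io
import zlib

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def decompress_if_gzipped(raw):
    """Gunzip raw bytes when they start with the gzip magic number."""
    if raw[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    return raw


# Build a small gzip-compressed sample standing in for an API response body.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as f:
    f.write(b'{"items": []}')
compressed = buf.getvalue()

text = decompress_if_gzipped(compressed)
```

The decompressed bytes can then be passed to json.loads as in the question's commented-out line.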
