python-3.x - How to download a file using urllib3?

Question

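This builds on another question on this site: What's the best way to download file using urllib3. However, I cannot comment there, so I am asking a separate question:

How do I download a (larger) file with urllib3?

I tried the same code that works with urllib2 (Download file from web in Python 3), but it fails with urllib3: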

http = urllib3.PoolManager()

with http.request('GET', url) as r, open(path, 'wb') as out_file:       
    #shutil.copyfileobj(r.data, out_file) # this writes a zero file
    shutil.copyfileobj(r.data, out_file)

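This fails with an error saying that the 'bytes' object has no attribute 'read'.

I then tried the code from that question, but it gets stuck in an infinite loop because data is always '0':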

http = urllib3.PoolManager()
r = http.request('GET', url)

with open(path, 'wb') as out:
    while True:
        data = r.read(4096)         
        if data is None:
            break
        out.write(data)
r.release_conn()

However, if I read everything into memory, the file downloads correctly:

http = urllib3.PoolManager()
r = http.request('GET', url)
with open(path, 'wb') as out:
    out.write(r.data)  # r.data holds the entire response body in memory

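I do not want to do this, as I might be downloading very large files. Unfortunately the urllib documentation does not cover the best practice for this topic.

(Also, please don't suggest requests or urllib2, because they are not flexible enough when it comes to self-signed certificates.)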

score 11 · Accepted Answer

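You were very close. The missing piece is setting preload_content=False (this will be the default in an upcoming version). You can also treat the response itself as a file-like object instead of going through the .data attribute (which is a magic property that will hopefully be deprecated someday).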

- with http.request('GET', url) ...
+ with http.request('GET', url, preload_content=False) ...

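This code should work:

import shutil
import urllib3

http = urllib3.PoolManager()
# preload_content=False leaves the body unread so it can be streamed to disk
with http.request('GET', url, preload_content=False) as r, open(path, 'wb') as out_file:
    shutil.copyfileobj(r, out_file)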

urllib3's response objects also respect the io interface, so you can also do...

import io
response = http.request(..., preload_content=False)
buffered_response = io.BufferedReader(response, 2048)
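One thing a buffered wrapper like this makes convenient is reading the body line by line. The following is a minimal sketch, not part of the original answer: iter_lines is a hypothetical helper name, and it assumes the object passed in is a readable binary stream, such as a urllib3 response requested with preload_content=False.

```python
import io


def iter_lines(raw, buffer_size=2048):
    """Yield lines (as bytes, newline included) from a readable binary stream.

    `raw` is assumed to be file-like, e.g. a urllib3 response obtained with
    preload_content=False. `iter_lines` is an illustrative name, not a
    urllib3 API.
    """
    buffered = io.BufferedReader(raw, buffer_size)
    # Iterating a BufferedReader calls readline() until the stream is exhausted
    for line in buffered:
        yield line
```

With urllib3 this could be used as `for line in iter_lines(http.request('GET', url, preload_content=False)): ...`, which keeps only about buffer_size bytes plus the current line in memory.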

As long as you add preload_content=False to any of your three attempts and treat the response as a file-like object, they should all work.
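For the read() loop from the question, the end-of-stream check matters as well: a file-like read() returns an empty bytes object at the end of the data rather than None, so a test for `data is None` never breaks out of the loop. Below is a minimal sketch of that loop with both changes applied. The names copy_in_chunks and download are hypothetical helpers introduced here for illustration, and the import is placed inside download only so that the copy helper can be used without urllib3.

```python
def copy_in_chunks(src, dst, chunk_size=4096):
    """Copy src to dst in fixed-size chunks; return the number of bytes written."""
    total = 0
    while True:
        data = src.read(chunk_size)
        if not data:  # b'' marks the end of the stream; it is never None here
            break
        dst.write(data)
        total += len(data)
    return total


def download(url, path, chunk_size=4096):
    """Hypothetical helper: stream the body of url into the file at path."""
    import urllib3  # local import so copy_in_chunks works without urllib3

    http = urllib3.PoolManager()
    # preload_content=False keeps the body unread until we call read()
    r = http.request('GET', url, preload_content=False)
    try:
        with open(path, 'wb') as out:
            return copy_in_chunks(r, out, chunk_size)
    finally:
        # Return the connection to the pool even if the copy fails
        r.release_conn()
```

Only one chunk of chunk_size bytes is held in memory at a time, which is the point of avoiding r.data for large files.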

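"Unfortunately the urllib documentation does not cover the best practice for this topic."

You are totally right. I hope you will consider helping us document this use case by sending a pull request here: https://github.com/shazow/urllib3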
