python - Stream large binary files with urllib2 to file

Question

I use the following code to stream large files from the Internet into a local file:

fp = open(file, 'wb')
req = urllib2.urlopen(url)
for line in req:
    fp.write(line)
fp.close()

This works, but it downloads quite slowly. Is there a faster way? (The files are large, so I don't want to keep them in memory.)

score 112 · Accepted Answer

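There's no reason to work line by line (that means small chunks AND requires Python to find the line ends for you!-); just read it in bigger chunks, e.g.: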

# from urllib2 import urlopen # Python 2
from urllib.request import urlopen # Python 3

response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            break
        f.write(chunk)

Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.
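One way to run that experiment is to time the same copy loop with a few chunk sizes. The following is only a sketch: `stream_to_file` and `time_chunk_sizes` are hypothetical helper names, and an in-memory `io.BytesIO` stands in for the network response so the script runs offline. For a real measurement you would pass something like `lambda: urlopen(url)` as `make_source`, and network latency would likely dominate the numbers.

```python
import io
import time


def stream_to_file(src, dst, chunk_size=16 * 1024):
    """Copy src to dst in fixed-size chunks; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def time_chunk_sizes(make_source, sizes):
    """Run one copy per chunk size and return {chunk_size: seconds}."""
    timings = {}
    for size in sizes:
        src = make_source()  # fresh stream for every run
        dst = io.BytesIO()   # stand-in for the output file (assumption)
        start = time.perf_counter()
        stream_to_file(src, dst, size)
        timings[size] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # 4 MiB of dummy data instead of a real download (assumption).
    payload = b"\x00" * (4 * 1024 * 1024)
    results = time_chunk_sizes(lambda: io.BytesIO(payload),
                               [1024, 16 * 1024, 256 * 1024])
    for size, seconds in sorted(results.items()):
        print("%7d bytes per read: %.4f s" % (size, seconds))
```

The timings depend on the machine and the source, so treat a single run as a rough hint and repeat it a few times before settling on a value.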

score 69 · Answer

You can also use shutil:

import shutil
try:
    from urllib.request import urlopen # Python 3
except ImportError:
    from urllib2 import urlopen # Python 2

def get_large_file(url, file, length=16*1024):
    req = urlopen(url)
    with open(file, 'wb') as fp:
        shutil.copyfileobj(req, fp, length)
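The function above leaves the response object open after the copy. A possible variant, sketched here under the assumption that you want the connection closed even when the copy fails, wraps the response in `contextlib.closing`, which works on both Python 2 and Python 3. The name `download_to_file` is hypothetical.

```python
import shutil
from contextlib import closing

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def download_to_file(url, path, length=16 * 1024):
    """Stream url to path in chunks of `length` bytes.

    closing() calls response.close() on exit, so the connection is
    released even if the copy raises.
    """
    with closing(urlopen(url)) as response:
        with open(path, 'wb') as fp:
            shutil.copyfileobj(response, fp, length)
```

`shutil.copyfileobj` runs the same read/write loop as the chunked answer, so memory use stays bounded by `length`.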

score 6 · Answer

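I used to use the mechanize module and its Browser.retrieve() method. In the past it took 100% CPU and downloaded things very slowly, but a recent release fixed this bug and it now works very quickly.

Example: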

import mechanize
browser = mechanize.Browser()
browser.retrieve('http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.32-rc1.tar.bz2', 'Downloads/my-new-kernel.tar.bz2')

Mechanize is based on urllib2, so urllib2 may have a similar method too... but I can't find one right now.

score 4 · Answer

You can use urllib.urlretrieve() (urllib.request.urlretrieve() in Python 3) to download files:

Example:

try:
    from urllib import urlretrieve  # Python 2
except ImportError:
    from urllib.request import urlretrieve  # Python 3

url = "http://www.examplesite.com/myfile"
urlretrieve(url, "./local_file")
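For large files it can help to see progress while `urlretrieve` runs. It accepts a `reporthook` callable that receives the number of blocks transferred so far, the block size in bytes, and the total size (which may be -1 or 0 when the server does not report one). The sketch below uses hypothetical helper names (`percent_done`, `print_progress`, `download_with_progress`). Note that the Python 3 documentation lists `urlretrieve` under its legacy interface, so the chunked `urlopen` approaches may be the safer long-term choice.

```python
try:
    from urllib import urlretrieve  # Python 2
except ImportError:
    from urllib.request import urlretrieve  # Python 3


def percent_done(block_num, block_size, total_size):
    """Return progress as a percentage, or None if the total size is unknown."""
    if total_size <= 0:
        return None
    # The last block is usually partial, so cap the estimate at 100.
    return min(100.0, block_num * block_size * 100.0 / total_size)


def print_progress(block_num, block_size, total_size):
    """reporthook for urlretrieve: print the percentage when it is known."""
    pct = percent_done(block_num, block_size, total_size)
    if pct is not None:
        print("%.1f%%" % pct)


def download_with_progress(url, path):
    """Download url to path, reporting progress after each block."""
    return urlretrieve(url, path, reporthook=print_progress)
```

A call such as `download_with_progress(url, "./local_file")` prints one line per block, so for very large files you may want to report only every Nth call.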
