python - How to download only the first x bytes of data in Python

Question

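Situation: The file to be downloaded is large (>100MB). Downloading it takes quite a while, especially over a slow internet connection.

Problem: However, I only need the file header (the first 512 bytes), which determines whether the whole file needs to be downloaded at all.

Question: Is there a way to download only the first 512 bytes of the file?

Additional information: The download is currently done with urllib.urlretrieve in Python 2.7.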

score 2 · Accepted Answer

I think curl and head would work better here than a Python solution:

curl https://my.website.com/file.txt | head -c 512 > header.txt

Edit: Also, if you absolutely must do this from a Python script, you can use subprocess to run the curl command piped into head.
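A rough sketch of the subprocess approach, written for Python 3 rather than the asker's 2.7. The helper name first_bytes_via_pipeline and its parameters are made up for illustration, and it assumes head is available on the PATH:

```python
import subprocess


def first_bytes_via_pipeline(producer_cmd, n=512):
    """Return the first n bytes written to stdout by producer_cmd.

    Hypothetical helper, roughly equivalent to `producer_cmd | head -c n`.
    """
    producer = subprocess.Popen(
        producer_cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    head = subprocess.Popen(
        ["head", "-c", str(n)], stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so the producer sees a broken pipe
    # once head has read enough and exits.
    producer.stdout.close()
    data, _ = head.communicate()
    # Stop the producer if it is still running, then reap it.
    producer.terminate()
    producer.wait()
    return data
```

With curl installed, a call such as first_bytes_via_pipeline(["curl", "-s", "https://my.website.com/file.txt"]) should return the same bytes that the shell pipeline writes to header.txt.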

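Edit 2: For a pure Python solution: the urlopen function (urllib2.urlopen in Python 2 and urllib.request.urlopen in Python 3) returns a file-like stream, and you can call read on that stream with the number of bytes you want. For example, urllib2.urlopen(my_url).read(512) will return the first 512 bytes of my_url.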
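The same idea as a small Python 3 function, as a minimal sketch. The name read_header is an assumption made for illustration; read(n) returns up to n bytes, so the result can be shorter if the resource itself is shorter:

```python
from urllib.request import urlopen


def read_header(url, n=512):
    """Return up to the first n bytes of the resource at url."""
    # The with-block closes the response on exit, so this function
    # does not read the rest of the body.
    with urlopen(url) as response:
        return response.read(n)
```

The returned bytes can then be inspected to decide whether the full download is worth starting.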

score 0 · Accepted Answer

If the URL you are trying to read responds with a Content-Length header, you can get the file size with urllib2 in Python 2.

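import urllib2

def get_file_size(url):
    # Send a HEAD request so only the headers are transferred, not the body.
    request = urllib2.Request(url)
    request.get_method = lambda: 'HEAD'
    response = urllib2.urlopen(request)
    # getheader returns None if the server sent no Content-Length,
    # in which case int(length) raises a TypeError.
    length = response.headers.getheader("Content-Length")
    return int(length)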

You can call this function to get the length and compare it against some threshold to decide whether to download.

if get_file_size("http://stackoverflow.com") < 1000000:
    pass  # Download

(Note that the Python 3 implementation is slightly different:)

from urllib import request

def get_file_size(url):
    # Same HEAD request as in the Python 2 version, via urllib.request.
    r = request.Request(url, method='HEAD')
    response = request.urlopen(r)
    length = response.getheader("Content-Length")
    return int(length)
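Since not every server sends Content-Length, a more defensive Python 3 variant might look like the sketch below. The names get_file_size_or_none and is_small_enough, and the default limit, are assumptions made for illustration and are not part of the answer above:

```python
from urllib.request import Request, urlopen


def get_file_size_or_none(url):
    """Return the Content-Length reported for url, or None if it is absent."""
    req = Request(url, method="HEAD")
    with urlopen(req) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def is_small_enough(url, limit=1000000):
    """Return True only when the server reports a size below limit bytes."""
    size = get_file_size_or_none(url)
    return size is not None and size < limit
```

Treating an unknown size as "not small enough" is just one possible policy; a caller could equally fall back to reading the first 512 bytes in that case.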
