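How can I seek to a particular position in a remote (HTTP) file so that I download only that part?
I want to seek to 4 and download 3 bytes from there, so I would have: 456
Also, how can I check whether a remote file exists? I tried os.path.isfile(), but it returns False when I pass it a remote file URL.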
If you are downloading the remote file over HTTP, you need to set the Range header:
    myUrlclass.addheader("Range","bytes=%s-" % (existSize))
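To make the header format concrete, here is a minimal Python 3 sketch (urllib.request is the Python 3 successor to urllib2). Range offsets are zero-based and the end offset is inclusive, so "start at the 4th byte and take 3 bytes" becomes bytes=3-5. The helper names range_header and build_range_request are made up for illustration, and the example URL is a placeholder.

```python
import urllib.request


def range_header(offset, length=None):
    """Return a Range header value for `length` bytes starting at `offset`.

    `offset` is zero-based. With length=None the range is open-ended
    ("everything from offset onwards"), which is what a resumed download uses.
    """
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if length is None:
        return "bytes=%d-" % offset
    if length <= 0:
        raise ValueError("length must be > 0")
    # The end position in a byte range is inclusive, hence the -1.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def build_range_request(url, offset, length=None):
    """Build (but do not send) a request carrying the Range header."""
    return urllib.request.Request(
        url, headers={"Range": range_header(offset, length)}
    )


# Hypothetical usage; http://example.com/file.bin is a placeholder URL:
#   with urllib.request.urlopen(build_range_request(url, 3, 3)) as resp:
#       data = resp.read()
```

Passing only an offset gives the open-ended form used in the one-liner above, which asks for everything from existSize to the end of the file.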
    import urllib
    import urllib2

    class RangeError(IOError):
        """Raised when the server cannot satisfy the requested range."""

    class HTTPRangeHandler(urllib2.BaseHandler):
        """Handler that enables HTTP Range headers.

        This was extremely simple. The Range header is a HTTP feature to
        begin with so all this class does is tell urllib2 that the
        "206 Partial Content" response from the HTTP server is what we
        expected.

        Example:
            import urllib2
            import byterange

            range_handler = byterange.HTTPRangeHandler()
            opener = urllib2.build_opener(range_handler)

            # install it
            urllib2.install_opener(opener)

            # create Request and set Range header
            req = urllib2.Request('http://www.python.org/')
            req.add_header('Range', 'bytes=30-50')
            f = urllib2.urlopen(req)
        """

        def http_error_206(self, req, fp, code, msg, hdrs):
            # 206 Partial Content Response
            r = urllib.addinfourl(fp, hdrs, req.get_full_url())
            r.code = code
            r.msg = msg
            return r

        def http_error_416(self, req, fp, code, msg, hdrs):
            # HTTP's Range Not Satisfiable error
            raise RangeError('Requested Range Not Satisfiable')
Update: the "better implementation" has moved to GitHub: excid3/urlgrabber, in the byterange.py file.
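On Python 3, urllib.request already treats any 2xx status (206 included) as success, so a custom handler should not be needed there. The following is a rough Python 3 sketch of the same idea, not the urlgrabber code: fetch_range and its opener parameter are names invented for this example, and RangeError is redefined locally so the snippet stands alone.

```python
import urllib.error
import urllib.request


class RangeError(IOError):
    """Raised when the server answers 416 Range Not Satisfiable."""


def fetch_range(url, start, end, opener=urllib.request.urlopen):
    """Return bytes start..end (inclusive, zero-based) of the resource at url.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    that the function can be exercised without network access.
    """
    req = urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (start, end)}
    )
    try:
        with opener(req) as resp:
            status = resp.status
            body = resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # The requested range lies outside the file.
            raise RangeError("Requested Range Not Satisfiable") from err
        raise
    if status == 206:
        # Partial Content: the body is exactly the requested slice.
        return body
    # A plain 200 means the server ignored the Range header and sent the
    # whole file, so cut the slice out locally.
    return body[start:end + 1]
```

The 200 fallback still downloads the whole file, so it only keeps the result correct; it does not save bandwidth.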
I highly recommend the requests library. It is easily the best HTTP library I have ever used. In particular, to accomplish what you describe, you would do something like this:
    import requests
    url = "http://www.sffaudio.com/podcasts/ShellGameByPhilipK.Dick.pdf"
    # Retrieve bytes between offsets 3 and 5 (inclusive).
    r = requests.get(url, headers={"range": "bytes=3-5"})
    # If a 4XX client error or a 5XX server error is encountered, we raise it.
    r.raise_for_status()
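When a server honours a range request it replies with status 206 and a Content-Range header such as "bytes 3-5/1000", which tells you which bytes you actually received and how long the whole file is. A small sketch for reading that header follows; parse_content_range is a hypothetical helper, not part of requests.

```python
import re

# Matches e.g. "bytes 3-5/1000" or "bytes 3-5/*" (total length unknown).
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Parse a Content-Range header value into (first, last, total).

    `first` and `last` are inclusive zero-based offsets; `total` is the
    full size of the resource, or None when the server sends "*".
    """
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range value: %r" % (value,))
    first = int(match.group(1))
    last = int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total
```

With the snippet above you would call it as parse_content_range(r.headers["Content-Range"]) once r.status_code is 206; a status of 200 means the server ignored the header and sent the whole file.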
AFAIK, this is not possible using fseek() or anything similar. You need to use the HTTP Range header to achieve this. The server may or may not support this header, so your mileage may vary.
    import urllib2
    myHeaders = {'Range': 'bytes=0-9'}
    req = urllib2.Request('http://www.promotionalpromos.com/mirrors/gnu/gnu/bash/bash-1.14.3-1.14.4.diff.gz', headers=myHeaders)
    partialFile = urllib2.urlopen(req)
    s2 = partialFile.read()
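Since support for Range varies by server, and the question also asks how to check whether a remote file exists, one option is to send a HEAD request first. This is a sketch in Python 3 under some assumptions: probe and its opener parameter are invented names, only 404 and 410 are treated as "does not exist", and a missing Accept-Ranges header is taken as "no range support" even though some servers honour ranges without advertising them.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen):
    """Return (exists, accepts_ranges) for url using a HEAD request.

    `opener` defaults to urllib.request.urlopen and is injectable only so
    the function can be tried out without a network connection.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req) as resp:
            advertised = resp.headers.get("Accept-Ranges", "")
            return True, advertised.strip().lower() == "bytes"
    except urllib.error.HTTPError as err:
        if err.code in (404, 410):
            # Not Found / Gone: the file is not there.
            return False, False
        # Anything else (403, 500, ...) says nothing definite, so re-raise.
        raise
```

os.path.isfile() only looks at the local filesystem, which is why it returns False for a URL; a HEAD request asks the server without downloading the body.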
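EDIT: This is of course assuming that by "remote file" you mean a file stored on an HTTP server...
If the file you want is on an FTP server, FTP only lets you specify a start offset, not a range. If that is what you want, the following code should do it (not tested!)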
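    import ftplib
    fileToRetrieve = 'somefile.zip'
    fromByte = 15
    ftp = ftplib.FTP('ftp.someplace.net')
    ftp.login()  # anonymous login; pass a user name and password if the server needs them
    outFile = open('partialFile', 'wb')
    # rest= makes the server start sending at byte offset fromByte
    ftp.retrbinary('RETR '+ fileToRetrieve, outFile.write, rest=str(fromByte))
    outFile.close()
    ftp.quit()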
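Because FTP only offers a start offset, one way to approximate a range is to stop the transfer yourself once enough bytes have arrived. The sketch below does that by raising from the retrbinary callback; like the code above it has not been run against a real server, fetch_ftp_range is a made-up name, and it assumes ftp is an already logged-in ftplib.FTP object.

```python
import ftplib  # the `ftp` argument is expected to be an ftplib.FTP instance


class _EnoughBytes(Exception):
    """Internal signal used to break out of retrbinary early."""


def fetch_ftp_range(ftp, path, offset, length):
    """Return up to `length` bytes of `path`, starting at byte `offset`."""
    chunks = []
    received = 0

    def collect(block):
        nonlocal received
        chunks.append(block)
        received += len(block)
        if received >= length:
            raise _EnoughBytes()

    try:
        # rest=offset sends a REST command so the server skips ahead.
        ftp.retrbinary("RETR " + path, collect, rest=offset)
    except _EnoughBytes:
        # The transfer was cut short, so the control connection is left
        # mid-conversation. Closing it is the simple, safe option.
        ftp.close()
    # The last block may overshoot, so trim to the requested length.
    return b"".join(chunks)[:length]
```

The connection is not reusable after an early stop, so open a new one for the next request.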
You can use httpio to access remote HTTP files as if they were local:
    pip install httpio

    import zipfile
    import httpio
    url = "http://some/large/file.zip"
    with httpio.open(url) as fp:
        zf = zipfile.ZipFile(fp)
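To show the general idea behind this kind of wrapper (this is not httpio's actual implementation), here is a minimal Python 3 sketch of a seekable, read-only file object that turns each read into one range fetch. RangeReader and http_fetcher are names invented for this example, and it assumes you already know the file size, for instance from the Content-Length of a HEAD request.

```python
import io
import urllib.request


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object backed by a fetch(start, end) callable.

    `fetch` must return exactly the bytes start..end (inclusive), just like
    an HTTP "Range: bytes=start-end" request that the server honours.
    """

    def __init__(self, size, fetch):
        super().__init__()
        self._size = size
        self._fetch = fetch
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        elif whence == io.SEEK_END:
            target = self._size + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if target < 0:
            raise OSError("negative seek position")
        self._pos = target  # seeking only moves the cursor; no request is made
        return self._pos

    def readinto(self, buffer):
        count = min(len(buffer), self._size - self._pos)
        if count <= 0:
            return 0  # at or past the end of the file
        data = self._fetch(self._pos, self._pos + count - 1)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


def http_fetcher(url):
    """Build a fetch callable that issues one Range request per read."""
    def fetch(start, end):
        headers = {"Range": "bytes=%d-%d" % (start, end)}
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                raise OSError("server did not honour the Range header")
            return resp.read()
    return fetch
```

Something like zipfile.ZipFile(RangeReader(size, http_fetcher(url))) would then read only the parts of the archive it needs. This sketch makes one request per read and does no caching, so a real library is likely a better choice for anything beyond experiments.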