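Headers are part of a request, just as the URL is. When you pass a URL string to a urllib.request function, Python creates the request for you.
Create a Request object yourself, add the headers to it, and use it in place of the string URL: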
request = Request(urlunparse(parsed), headers={'User-Agent': 'My own agent string'})
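
The line above depends on names that are imported and defined elsewhere. A minimal self-contained sketch might look like the following; the URL and the extra `Accept` header are hypothetical values chosen only for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlparse, urlunparse
from urllib.request import Request

# Hypothetical URL, used only for illustration.
url = 'https://example.com/some/file.txt'
parsed = urlparse(url)

# Build the request with the headers attached; nothing is sent yet.
request = Request(urlunparse(parsed), headers={'User-Agent': 'My own agent string'})

# Headers can also be added after construction (this one is an assumption).
request.add_header('Accept', '*/*')

# Request stores header names capitalized, so the key becomes 'User-agent'.
print(request.full_url)
print(request.get_header('User-agent'))
```

Passing `request` to `urlopen()` would then send both headers along with the request.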
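However, urlretrieve() is marked as a "legacy API" in the code and does not support Request objects. Copying the function and removing the few lines that support 'file://' URLs is simple enough: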
import contextlib
import tempfile
from urllib.error import ContentTooShortError
from urllib.request import urlopen
_url_tempfiles = []
def urlretrieve(url, filename=None, reporthook=None, data=None):
    with contextlib.closing(urlopen(url, data)) as fp:
        headers = fp.info()

        # Handle temporary file setup.
        if filename:
            tfp = open(filename, 'wb')
        else:
            tfp = tempfile.NamedTemporaryFile(delete=False)
            filename = tfp.name
            _url_tempfiles.append(filename)

        with tfp:
            result = filename, headers
            bs = 1024*8
            size = -1
            read = 0
            blocknum = 0
            if "content-length" in headers:
                size = int(headers["Content-Length"])

            if reporthook:
                reporthook(blocknum, bs, size)

            while True:
                block = fp.read(bs)
                if not block:
                    break
                read += len(block)
                tfp.write(block)
                blocknum += 1
                if reporthook:
                    reporthook(blocknum, bs, size)

    if size >= 0 and read < size:
        raise ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes"
            % (read, size), result)

    return result
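
The function above calls `reporthook(blocknum, bs, size)` once before the first read and once after each block. As a rough sketch of what such a hook could look like, the helper below (`make_progress_hook` is a hypothetical name, not part of urllib) records one progress line per call. The loop only simulates the calls for a 20000-byte download, assuming every read returns a full block, so no network access is involved.

```python
def make_progress_hook(log):
    """Return a reporthook that appends one progress line per call to `log`."""
    def hook(blocknum, bs, size):
        # blocknum blocks of bs bytes have been transferred so far; size is
        # the Content-Length value, or -1 when the server did not send one.
        received = blocknum * bs
        if size >= 0:
            received = min(received, size)  # the last block is usually partial
            percent = received * 100 // size if size else 100
            log.append('%d/%d bytes (%d%%)' % (received, size, percent))
        else:
            log.append('%d bytes' % received)
    return hook

# Simulate the calls made for a 20000-byte download in 8192-byte blocks.
log = []
hook = make_progress_hook(log)
for blocknum in range(4):
    hook(blocknum, 1024 * 8, 20000)
print('\n'.join(log))
```

With the function defined above, the hook would be passed as `urlretrieve(request, 'file.txt', reporthook=hook)`, where `request` is a Request object carrying the custom headers and `'file.txt'` is a placeholder filename.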