I am using Python 3.3.1. I have written a function called download_file() that downloads files and saves them to disk.
#!/usr/bin/python3
# -*- coding: utf8 -*-

import datetime
import os
import urllib.error
import urllib.request


def download_file(*urls, download_location=os.getcwd(), debugging=False):
    """Downloads the files provided as multiple url arguments.

    Provide the url for files to be downloaded as strings. Separate the
    files to be downloaded by a comma.

    The function would download the files and save them in the folder
    provided as keyword-argument for download_location. If
    download_location is not provided, then the files would be saved in
    the current working directory. Folder for download_location would be
    created if it doesn't already exist. Do not worry about trailing
    slash at the end for download_location. The code would take care of
    it for you.

    If the download encounters an error it would alert about it and
    provide the information about the Error Code and Error Reason (if
    received from the server).

    Normal Usage:
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php')
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test')
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test/')

    In Debug Mode, files are not downloaded, nor is there any
    attempt to establish the connection with the server. It just prints
    out the filename and its url that would have been attempted to be
    downloaded in Normal Mode.

    By Default, Debug Mode is inactive. In order to activate it, we
    need to supply a keyword-argument as 'debugging=True', like:
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      debugging=True)
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test',
                      debugging=True)

    """
    # Append a trailing slash at the end of download_location if not
    # already present
    if download_location[-1] != '/':
        download_location = download_location + '/'

    # Create the folder for download_location if not already present
    os.makedirs(download_location, exist_ok=True)

    # Other variables
    time_format = '%Y-%b-%d %H:%M:%S'  # '2000-Jan-01 22:10:00'

    # "Request Headers" information for the file to be downloaded
    accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    accept_encoding = 'gzip, deflate'
    accept_language = 'en-US,en;q=0.5'
    connection = 'keep-alive'
    user_agent = ('Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) '
                  'Gecko/20100101 Firefox/20.0')
    headers = {'Accept': accept,
               'Accept-Encoding': accept_encoding,
               'Accept-Language': accept_language,
               'Connection': connection,
               'User-Agent': user_agent,
               }

    # Loop through all the files to be downloaded
    for url in urls:
        filename = os.path.basename(url)
        if not debugging:
            try:
                request_sent = urllib.request.Request(url, None, headers)
                response_received = urllib.request.urlopen(request_sent)
            except urllib.error.URLError as error_encountered:
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- The file could not be downloaded.')
                if hasattr(error_encountered, 'code'):
                    print(' ' * 22, 'Error Code -', error_encountered.code)
                if hasattr(error_encountered, 'reason'):
                    print(' ' * 22, 'Reason -', error_encountered.reason)
            else:
                read_response = response_received.read()
                output_file = download_location + filename
                with open(output_file, 'wb') as downloaded_file:
                    downloaded_file.write(read_response)
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- Downloaded successfully.')
        else:
            print(datetime.datetime.now().strftime(time_format),
                  ': Debugging :', filename, 'would be downloaded from :\n',
                  ' ' * 21, url)
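This function works fine for downloading PDFs, images, and other binary formats, but it has trouble with text documents such as HTML files. I suspect the problem is related to this line near the end: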
with open(output_file, 'wb') as downloaded_file:
So I also tried opening the file in wt mode, and in plain w mode as well, but neither solves the problem.
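To make the file-mode experiment concrete, here is a minimal sketch that runs without any network access. The payload bytes and the temporary path are placeholders invented for illustration; they stand in for whatever response_received.read() returns. It suggests that 'wb' stores the bytes exactly as received, while text mode does not accept bytes at all, so the open() mode alone is unlikely to be what alters the content.

```python
import os
import tempfile

# Hypothetical stand-in for the bytes returned by response_received.read().
payload = '<html>caf\u00e9</html>'.encode('utf-8')


def write_binary(path, data):
    """Write bytes unchanged, the way the 'wb' branch of download_file() does."""
    with open(path, 'wb') as handle:
        handle.write(data)


def write_text(path, data):
    """Try to write the same bytes in text mode; return the error, if any."""
    try:
        with open(path, 'w') as handle:
            handle.write(data)
    except TypeError as error:
        return error
    return None


target = os.path.join(tempfile.mkdtemp(), 'index.html')

# Binary mode: what is read back is byte-for-byte what was written.
write_binary(target, payload)
with open(target, 'rb') as handle:
    round_trip = handle.read()

# Text mode: Python 3 refuses bytes outright instead of converting them.
text_mode_error = write_text(target, payload)

print(round_trip == payload)
print(type(text_mode_error).__name__)
```

If the saved file already looks wrong in 'wb' mode, the bytes were most likely in that state when they arrived, which points away from open() and towards the response itself.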
Another cause might be the encoding, so I also added this as the second line of the script:
# -*- coding: utf8 -*-
But it still does not work. What could the problem be, and how can I make the function work for both text and binary files?
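One possibility worth checking, offered as an assumption rather than a confirmed diagnosis: the request headers in download_file() advertise Accept-Encoding: gzip, deflate, and urllib.request does not decompress response bodies on its own. A server that honours that header may therefore send compressed HTML, which is then written to disk as-is. The sketch below uses a hypothetical helper, decode_body(), that undoes the compression based on the value of the Content-Encoding response header. The sample bytes are made up so the block runs offline.

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Return the body with gzip/deflate compression undone (hypothetical helper).

    raw is the bytes object read from the response; content_encoding is
    the value of the Content-Encoding header, or None if it was absent.
    """
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        return gzip.decompress(raw)
    if encoding == 'deflate':
        try:
            # Standard case: zlib-wrapped deflate stream.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No (or unknown) encoding: leave the bytes alone, e.g. PDFs and images.
    return raw


# Offline demonstration with made-up content.
sample = b'<html><body>Hello</body></html>'
compressed = gzip.compress(sample)

restored = decode_body(compressed, 'gzip')
untouched = decode_body(sample, None)

print(restored == sample)
print(untouched == sample)
```

Inside download_file(), the header value could be obtained with response_received.info().get('Content-Encoding') and the result of decode_body() written in 'wb' mode, which keeps a single code path for text and binary files. Dropping 'Accept-Encoding' from the headers dictionary would be a simpler alternative if compression is not needed.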
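An example that does not work:
>>> download_file("http://docs.python.org/3/tutorial/index.html")
When I open the saved file in Gedit, it is displayed as:
(screenshot missing)
Likewise when it is opened in Firefox:
(screenshot missing)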
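When a saved page looks unreadable in an editor or browser, inspecting its first bytes can narrow things down. The following is a small diagnostic sketch, not part of the original function: looks_gzipped() is a hypothetical helper that checks for the two-byte gzip signature, and the sample data is generated locally instead of being read from a downloaded file.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b'\x1f\x8b'


def looks_gzipped(data):
    """Return True if data begins with the gzip signature (hypothetical helper)."""
    return data[:2] == GZIP_MAGIC


# Made-up sample content standing in for a downloaded page.
plain_page = b'<!DOCTYPE html><html></html>'
compressed_page = gzip.compress(plain_page)

plain_is_gzip = looks_gzipped(plain_page)
compressed_is_gzip = looks_gzipped(compressed_page)

print(plain_is_gzip)
print(compressed_is_gzip)
```

To apply the same check to a real download, read the saved file in 'rb' mode and pass its contents to looks_gzipped(). A True result would suggest the file on disk is still compressed.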