python - PycURL attachments and progress functions

Question

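I'm working on a small project against an API: you send it a request, it returns a response with a zip file attached, and you can then download that zip file. My first attempt at automating the download used setopt(curl.WRITEDATA, fp), but that crashed my Python script every time I tried it. I then changed tactics and used WRITEFUNCTION to write the data into a buffer, which I then wrote to a file; that works consistently.

That was all fine, but then I wanted to add a progress bar to show how much of the file has been downloaded and give the user some feedback. This is where things got weird: the progress bar now reaches 100% within a second, while the zip file has not finished downloading. When I changed the progress function to just print the size of the file being downloaded, it reported something on the order of 100 bytes (far smaller than the zip file). Is there any way to use the functions in pycurl (and curl underneath it) to track the progress of the attachment download rather than of the request itself?

Also, if anyone can help with the WRITEDATA problem, that would be appreciated; I suspect the two issues may be related.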

score 5 · Accepted Answer

The following code downloads a file using pycurl and displays the current progress (as text):

import pycurl
# for displaying the output text
from sys import stderr as STREAM

# replace with your own url and path variables
url = "http://ovh.net/files/100Mb.dat"
path = 'test_file.dat'

# use kiB's
kb = 1024

# callback function for c.XFERINFOFUNCTION
def status(download_t, download_d, upload_t, upload_d):
    STREAM.write('Downloading: {}/{} kiB ({}%)\r'.format(
        str(int(download_d/kb)),
        str(int(download_t/kb)),
        str(int(download_d/download_t*100) if download_t > 0 else 0)
    ))
    STREAM.flush()

# download file using pycurl
with open(path, 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, f)
    # display progress
    c.setopt(c.NOPROGRESS, False)
    c.setopt(c.XFERINFOFUNCTION, status)
    c.perform()
    c.close()

# keeps progress onscreen after download completes
print()

The output should look like this:

Downloading: 43563/122070 kiB (35%)

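If you want an actual progress bar, that can be done too, but it takes a bit more work.

The following code uses the tqdm package to generate a progress bar. It updates in real time as the file downloads and shows the download speed and estimated time remaining. Because the bar is given the total size up front, the requests package is also needed to look that size up. The way tqdm is updated is also the reason the total_dl_d variable is a list rather than an integer.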

import pycurl
# needed to predict total file size
import requests
# progress bar
from tqdm import tqdm

# replace with your own url and path variables
url = "http://ovh.net/files/10Mb.dat"
path = 'test_file.dat'

# show progress % and amount in bytes
r = requests.get(url)
total_size = int(r.headers.get('content-length', 0))
block_size = 1024

# create a progress bar and update it manually
with tqdm(total=total_size, unit='iB', unit_scale=True) as pbar:
    # store the total downloaded so far in a list (lists are mutable, so the callback can update it in place)
    total_dl_d = [0]
    def status(download_t, download_d, upload_t, upload_d, total=total_dl_d):
        # increment the progress bar
        pbar.update(download_d - total[0])
        # update the total dl'd amount
        total[0] = download_d

    # download file using pycurl
    with open(path, 'wb') as f:
        c = pycurl.Curl()
        c.setopt(c.URL, url)
        c.setopt(c.WRITEDATA, f)
        # follow redirects:
        c.setopt(c.FOLLOWLOCATION, True)
        # custom progress bar
        c.setopt(c.NOPROGRESS, False)
        c.setopt(c.XFERINFOFUNCTION, status)
        c.perform()
        c.close()

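An explanation of the likely causes of the problems described:

(No code was provided in the question, so I have had to guess at what exactly caused the problems above...)

Based on the variable name (fp, i.e. file_path)...
The file write (WRITEDATA) problem was probably caused by supplying a file path (str) rather than a file object (io.BufferedWriter).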
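To make the path-versus-file-object distinction concrete, here is a minimal sketch. The helper names require_writable and download are made up for illustration and are not part of pycurl; the guard simply assumes that the crash came from passing a str, which is only my guess from the question.

```python
import io


def require_writable(target):
    """Return target if it looks like an open file object, else raise TypeError.

    Hypothetical guard, not a pycurl function.
    """
    if isinstance(target, (str, bytes)):
        raise TypeError("expected an open file object, got a path: %r" % (target,))
    if not callable(getattr(target, "write", None)):
        raise TypeError("object has no write() method: %r" % (target,))
    return target


def download(url, path):
    """Download url to path, handing pycurl an open binary file."""
    import pycurl  # imported here so require_writable works without pycurl

    with open(path, "wb") as f:
        c = pycurl.Curl()
        try:
            c.setopt(c.URL, url)
            # pass the open file object, not the path string
            c.setopt(c.WRITEDATA, require_writable(f))
            c.perform()
        finally:
            c.close()


# The guard accepts file-like objects and rejects a bare path string.
buffer = require_writable(io.BytesIO())
try:
    require_writable("download.zip")
except TypeError as exc:
    print("rejected:", exc)
```

Opening the file in 'wb' mode inside a with block also makes sure it is flushed and closed once perform() returns.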

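From my own experience...
The XFERINFOFUNCTION callback is called repeatedly while the file downloads. It only receives the total file size and the total amount downloaded so far as arguments; it does not compute the delta (difference) since the previous call. The progress bar problem described ("the progress bar reaches 100% within a second, while the zip file has not finished downloading") was probably caused by passing the (downloaded) total to update where an increment was expected. If the progress bar is advanced by the running total on every call, it does not reflect the amount actually downloaded; it shows a much larger number. It then overshoots 100% and misbehaves in all sorts of ways.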
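The total-versus-increment point can be sketched without any network access. make_delta_callback below is a hypothetical helper (not part of pycurl or tqdm) that turns the cumulative download_d values into increments; the loop at the end only simulates the calls libcurl would make.

```python
def make_delta_callback(update):
    """Return an XFERINFOFUNCTION-style callback that reports increments.

    `update` is any callable taking the number of newly downloaded bytes,
    for example the update method of a tqdm bar.
    """
    last = 0  # cumulative byte count seen on the previous call

    def status(download_t, download_d, upload_t, upload_d):
        nonlocal last
        delta = download_d - last
        if delta > 0:  # skip calls that bring no new data
            update(delta)
            last = download_d

    return status


# Simulate the callback being invoked with cumulative totals.
received = []
callback = make_delta_callback(received.append)
for downloaded in (0, 100, 100, 350, 1000):
    callback(1000, downloaded, 0, 0)

print(received)       # [100, 250, 650]
print(sum(received))  # 1000
```

With a real transfer, this would presumably be wired up as c.setopt(c.XFERINFOFUNCTION, make_delta_callback(pbar.update)). Using nonlocal inside a closure is an alternative to keeping the running total in a one-element list.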


score 0 · Accepted Answer

Regarding @Elliot G.'s tqdm answer: we should fetch only the headers rather than the whole body, like this:

# show progress % and amount in bytes
r = requests.head(url)   # <==============================
total_size = int(r.headers.get('content-length', 0))
block_size = 1024

I don't have enough reputation to comment on @Elliot G.'s answer, so I'm posting this as an answer instead.
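A slightly more defensive sketch of the same idea follows. The helper names content_length and remote_size are made up for illustration. One detail worth knowing: requests.head does not follow redirects by default, so allow_redirects=True is passed here on the assumption that the download itself uses FOLLOWLOCATION.

```python
def content_length(headers):
    """Return Content-Length from a header mapping, or 0 if absent or invalid."""
    try:
        return max(int(headers.get("content-length", 0)), 0)
    except (TypeError, ValueError):
        return 0


def remote_size(url, timeout=10):
    """Ask the server for the file size with a HEAD request (no body is transferred)."""
    import requests  # imported here so content_length works without requests

    r = requests.head(url, allow_redirects=True, timeout=timeout)
    return content_length(r.headers)


# The parsing helper can be tried without touching the network.
print(content_length({"content-length": "10485760"}))  # 10485760
print(content_length({}))                              # 0
print(content_length({"content-length": "n/a"}))       # 0
```

Not every server reports Content-Length in response to HEAD, so a result of 0 should be treated as "size unknown" rather than as an empty file.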
