python - Using tqdm on a for loop inside a function to check progress

Question

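I am using a for loop to iterate over a large set of files in a directory tree.

While doing this, I want to monitor progress with a progress bar in the console, so I decided to use tqdm.

Currently, my code looks like this: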

for dirPath, subdirList, fileList in tqdm(os.walk(target_dir)):
    sleep(0.01)
    dirName = dirPath.split(os.path.sep)[-1]
    for fname in fileList:
        *****

Output:

Scanning Directory....
43it [00:23, 11.24 it/s]

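So my problem is that it does not show a progress bar. I would like to know how to use tqdm correctly and to understand better how it works. Also, are there any alternatives to tqdm that could be used here?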

score 7 · Accepted Answer

You cannot show a completion percentage unless you know what "done" means.

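While os.walk is running, it does not know how many files and folders it will end up iterating over: the object returned by os.walk has no __len__. To count them, it would have to look all the way down the directory tree and enumerate every file and folder. In other words, for os.walk to tell you how many items it will produce, it would have to do all of its work twice, which is inefficient.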
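
To make the point concrete, here is a minimal sketch using only the standard library. The temporary directory and the file names are made up for the demonstration. It shows that the object returned by os.walk has no length, and that counting the files requires a full pass over the tree.

```python
import os
import tempfile

# Build a tiny throwaway tree: one subdirectory and two files (hypothetical names)
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(demo_dir, rel), "w").close()

    walker = os.walk(demo_dir)
    has_len = hasattr(walker, "__len__")  # generators define no __len__

    try:
        len(walker)
        len_failed = False
    except TypeError:
        len_failed = True

    # The only way to learn the size is to run the whole walk once
    file_count = sum(len(files) for _, _, files in os.walk(demo_dir))

print(has_len, len_failed, file_count)
```

Because tqdm cannot find a length, it falls back to the plain counter shown in the question.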

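If you are dead set on showing a progress bar, you could spool the data into an in-memory list: list(os.walk(target_dir)). I do not recommend this. If you are traversing a large directory tree, it could consume a lot of memory. Worse, if followlinks is True and you have a cyclic directory structure (children linking back to their parents), it could loop forever until you run out of RAM.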

score 3 · Accepted Answer

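As explained in the documentation, this is because you need to provide a progress indicator. Depending on what you do with the files, you can use either the file count or the file sizes.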
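
For the file-size option, a minimal sketch might look like the following. The helper names iter_files and process_by_size and the demo file names are hypothetical, not part of the answer. It assumes every file can be stat'ed, and it relies on tqdm's total, unit and unit_scale parameters.

```python
import os
import tempfile

from tqdm import tqdm


def iter_files(folder):
    """Yield the path of every file below folder."""
    for dirpath, _dirs, files in os.walk(folder):
        for name in files:
            yield os.path.join(dirpath, name)


def process_by_size(folder):
    """Advance the bar by each file's size; return the bytes processed."""
    # First pass: add up the sizes so the bar knows its total
    total_bytes = sum(os.path.getsize(p) for p in iter_files(folder))
    done = 0
    with tqdm(total=total_bytes, unit="B", unit_scale=True) as pbar:
        for path in iter_files(folder):
            size = os.path.getsize(path)
            # ... the real per-file work would go here ...
            pbar.update(size)
            done += size
    return done


# Small demo on a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as demo_dir:
    for i, payload in enumerate([b"a" * 10, b"b" * 20, b"c" * 30]):
        with open(os.path.join(demo_dir, f"file{i}.bin"), "wb") as fh:
            fh.write(payload)
    processed = process_by_size(demo_dir)
```

Sizing the bar in bytes tends to give a steadier estimate when file sizes vary a lot, but it costs an extra stat call per file.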

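Other answers suggest converting the os.walk() generator to a list so that you get a __len__ attribute. However, this will cost you a lot of memory, depending on the total number of files you have.

Another possibility is to precompute: you first walk the whole file tree once and count the total number of files (without keeping the list of files, only the count!), and then you walk it again and give tqdm the file count you precomputed: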

import os
from time import sleep

from tqdm import tqdm

def walkdir(folder):
    """Walk through every file in a directory tree, yielding absolute paths."""
    for dirpath, dirs, files in os.walk(folder):
        for filename in files:
            yield os.path.abspath(os.path.join(dirpath, filename))

# Precomputing files count (no total is known yet, so tqdm shows only a counter)
filescount = 0
for _ in tqdm(walkdir(target_dir)):
    filescount += 1

# Computing for real, now with a known total
for filepath in tqdm(walkdir(target_dir), total=filescount):
    sleep(0.01)
    # etc...

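Note that I defined a walkdir function that wraps os.walk: since you are working on files rather than directories, it is better to define a function that advances file by file rather than directory by directory.

You can get the same result without the walkdir wrapper, but it is a bit more complicated, because you have to restore the last progress bar state after each subfolder you walk: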

# Precomputing
filescount = 0
for dirPath, subdirList, fileList in tqdm(os.walk(target_dir)):
    filescount += len(fileList)

# Computing for real
last_state = 0
for dirPath, subdirList, fileList in os.walk(target_dir):
    sleep(0.01)
    dirName = dirPath.split(os.path.sep)[-1]
    for fname in tqdm(fileList, total=filescount, initial=last_state):
        pass  # do whatever you want here...
    # Update last state to resume the progress bar
    last_state += len(fileList)

score 2 · Accepted Answer

Here is a more concise way to precompute the number of files and then show a status bar over the files:

import os

from tqdm import tqdm

file_count = sum(len(files) for _, _, files in os.walk(folder))  # Get the number of files
with tqdm(total=file_count) as pbar:  # Do tqdm this way
    for root, dirs, files in os.walk(folder):  # Walk the directory
        for name in files:
            pbar.update(1)  # Increment the progress bar
            # Process the file in the walk

score 2 · Accepted Answer

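This is because tqdm does not know how long the result of os.walk will be: it is a generator, so len cannot be called on it. You can work around this by converting os.walk(target_dir) to a list first: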

for dirPath, subdirList, fileList in tqdm(list(os.walk(target_dir))):

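From the documentation of the tqdm module:

len(iterable) is used if possible. As a last resort, only basic progress statistics are displayed (no ETA, no progressbar).

But len(os.walk(target_dir)) is not possible, so there is no ETA or progress bar.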
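
As a rough check of that behaviour, the sketch below compares the total attribute of two tqdm bars, one wrapping the generator and one wrapping the list. The temporary directory and the in-memory output buffer are assumptions made only to keep the demonstration small and quiet.

```python
import io
import os
import tempfile

from tqdm import tqdm

with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "sub"))  # tree with two directories
    sink = io.StringIO()  # send bar output to memory instead of the console

    # Generator: tqdm cannot call len(), so no total is recorded
    lazy_bar = tqdm(os.walk(demo_dir), file=sink)
    lazy_total = lazy_bar.total
    lazy_bar.close()

    # List: len() works, so tqdm can draw a bar and estimate an ETA
    eager_bar = tqdm(list(os.walk(demo_dir)), file=sink)
    eager_total = eager_bar.total
    eager_bar.close()

print(lazy_total, eager_total)
```

Note that with os.walk the list length is the number of directories visited, not the number of files.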

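As Benjamin pointed out, using list does use some memory, but not too much. Spooling a directory of about 190,000 files caused Python to use about 65MB of memory with this code on my Windows 10 machine.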
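
If you want to check the cost on your own tree, one possible approach is to trace the allocation with the standard tracemalloc module. This is only a sketch: the helper name walk_list_cost is hypothetical, and tracemalloc reports memory allocated by Python objects, which will not necessarily match a process-level figure like the one quoted above.

```python
import os
import tempfile
import tracemalloc


def walk_list_cost(folder):
    """Return (directories_listed, peak_bytes) for list(os.walk(folder))."""
    tracemalloc.start()
    entries = list(os.walk(folder))
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(entries), peak


# Demo on a throwaway tree: the root plus three subdirectories
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a", "b", "c"):
        os.mkdir(os.path.join(demo_dir, name))
    n_dirs, peak_bytes = walk_list_cost(demo_dir)

print(f"{n_dirs} directories, peak {peak_bytes} bytes traced")
```

Pointing walk_list_cost at the real target directory gives a rough idea of whether the list approach is affordable there.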

score 1 · Accepted Answer

With tqdm you can show progress over the files in a directory path like this (one bar per directory visited):

import os

from tqdm import tqdm

target_dir = os.path.join(os.getcwd(), "..Your path name")  # it has 212 files
for r, d, f in os.walk(target_dir):
    for file in tqdm(f, total=len(f)):
        filepath = os.path.join(r, file)
        # f'Your operation on file..{filepath}'

20%|█████████████████████ | 42/212 [05:07<17:58, 6.35s/it]

This is the kind of progress output you will get.

score 0 · Accepted Answer

Here is my solution to a similar problem:

    import os
    import time

    import tqdm

    for root, dirs, files in os.walk(local_path):
        # One bar per directory, sized by the number of files in it
        for fname in tqdm.tqdm(files):
            time.sleep(0.1)
            full_fname = os.path.join(root, fname)
