python - Python os.walk to a certain level

Question

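I want to build a program that uses some basic code to read through a folder and tell me how many files it contains. Here is how I do that currently: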

import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)

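This works great until there are multiple folders inside the "main" folder, because poor folder/file management means it can return a long, junk-filled list of files. So I would like to go down to the second level at most. Example: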

Main Folder
---file_i_want
---file_i_want
---Sub_Folder
------file_i_want <--*
------file_i_want <--*
------Sub_Folder_2
---------file_i_dont_want
---------file_i_dont_want

I know how to go only to the first level, using a break or del dirs[:], taken from this post and also this post.

import os
import pandas as pd

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
        del dirs[:] # or a break here. does the same thing.

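But no matter how I search, I can't find out how to go two levels deep. Maybe I am just not understanding the other posts on the topic? I was thinking of something like del dirs[:2], but to no avail. Can someone guide me or explain how to accomplish this?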

score 25 · Accepted Answer

You could do it like this:

import os

depth = 2

# [1] abspath() already acts like normpath() and strips any trailing os.sep;
#     the trailing os.sep must be gone for the slicing below to be accurate.
# [2] abspath() also resolves "/../", "////" and ".", which os.walk would otherwise return literally.
# [3] expanduser() expands ~
# [4] expandvars() expands $HOME
stuff = os.path.abspath(os.path.expanduser(os.path.expandvars(stuff)))

for root,dirs,files in os.walk(stuff):
    if root[len(stuff):].count(os.sep) < depth:
        for f in files:
            print(os.path.join(root,f))

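The key is: if root[len(stuff):].count(os.sep) < depth

The slice removes stuff from the front of root, so the result is relative to stuff. Then just count the number of path separators that remain.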
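To make the slice-and-count step concrete, here is a small sketch that computes the relative depth for a few paths. The helper name relative_depth and the sample paths are made up for illustration, and nothing here touches the file system.

```python
import os


def relative_depth(top, root):
    """Count the separators left in root after stripping the top prefix."""
    return root[len(top):].count(os.sep)


# Hypothetical paths, built with os.path.join so the separator matches the platform.
top = "main"
sub = os.path.join(top, "Sub_Folder")
sub2 = os.path.join(sub, "Sub_Folder_2")

for path in (top, sub, sub2):
    print(path, relative_depth(top, path))
```

With depth = 2, the first two paths pass the `< depth` test (relative depths 0 and 1), while Sub_Folder_2 has relative depth 2 and is filtered out.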

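The depth works like -maxdepth in the Linux find command: with depth 0 the loop prints nothing (find -maxdepth 0 looks only at the starting point itself), depth 1 only scans files in the first level, and depth 2 also scans files in the sub-directories.

Of course, this still walks the full directory tree, but unless the tree is very deep that will work.

Another solution is to use os.listdir recursively (with a directory check) and a maximum recursion level, but that is a bit more involved if you don't need it. Since it's not that hard, here is one implementation: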

def scanrec(root):
    rval = []

    def do_scan(start_dir,output,depth=0):
        for f in os.listdir(start_dir):
            ff = os.path.join(start_dir,f)
            if os.path.isdir(ff):
                if depth<2:
                    do_scan(ff,output,depth+1)
            else:
                output.append(ff)

    do_scan(root,rval,0)
    return rval

print(scanrec(stuff))  # prints the files at most 2 directory levels below stuff

Note: os.listdir followed by os.path.isdir costs two stat calls per entry, so this is not optimal. From Python 3.5 onward, os.scandir can avoid that double call.
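As a rough sketch of the os.scandir variant mentioned in the note, the function below mirrors scanrec but reads the entry type from each os.DirEntry. The name scan_limited and the max_depth parameter are illustrative choices, not part of the original answer, and the with statement on os.scandir assumes Python 3.6 or later.

```python
import os


def scan_limited(root, max_depth=2):
    """Return file paths found at most max_depth directory levels below root."""
    found = []

    def do_scan(start_dir, depth):
        # os.scandir yields DirEntry objects that usually know their type
        # without an extra stat call.
        with os.scandir(start_dir) as entries:
            for entry in entries:
                if entry.is_dir():
                    if depth < max_depth:
                        do_scan(entry.path, depth + 1)
                else:
                    found.append(entry.path)

    do_scan(root, 0)
    return found
```

Calling scan_limited(stuff) should return the same files as scanrec(stuff), possibly in a different order, since neither os.listdir nor os.scandir guarantees ordering.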

score 7 · Answer

You can count the separators and, once you are two levels deep, delete the contents of dirs so that walk does not recurse any deeper:

import os

MAX_DEPTH = 2
folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
        if root.count(os.sep) - stuff.count(os.sep) == MAX_DEPTH - 1:
            del dirs[:]

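The Python documentation states the following about this behavior:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.

Note that you need to take into account the separators already present in the entries of folders. For example, when y:\path1 is walked, the first root is y:\path1, which already contains a separator, but you don't want to stop recursion there. That is why stuff.count(os.sep) is subtracted.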
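The pruning approach can be wrapped in a reusable generator. The sketch below is one possible way to do it; the name walk_max_depth is hypothetical, and it assumes max_depth is at least 1 and that top is not itself a filesystem root such as / or Y:\ (the separator count is off by one in that case).

```python
import os


def walk_max_depth(top, max_depth):
    """Yield (root, dirs, files) like os.walk, visiting at most max_depth levels."""
    top = os.path.normpath(top)  # drop any trailing separator before counting
    base = top.count(os.sep)
    for root, dirs, files in os.walk(top, topdown=True):
        yield root, dirs, files
        if root.count(os.sep) - base >= max_depth - 1:
            del dirs[:]  # prune in place so os.walk stops descending
```

With this helper, the loop from the question would become `for root, dirs, files in walk_max_depth(stuff, 2):`, and the normpath call keeps a trailing separator in a folder name from skewing the count.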
