python - What is the Python way to walk a directory tree?

Question

I feel that assigning files and folders and doing the += [item] part is a bit hackish. Any suggestions? I'm using Python 3.2.

from os import *
from os.path import *

def dir_contents(path):
    contents = listdir(path)
    files = []
    folders = []
    for i, item in enumerate(contents):
        if isfile(contents[i]):
            files += [item]
        elif isdir(contents[i]):
            folders += [item]
    return files, folders

score 39 · Accepted Answer

Take a look at the os.walk function which returns the path along with the directories and files it contains. That should considerably shorten your solution.
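As a rough sketch of what that could look like for the function in the question: os.walk yields (dirpath, dirnames, filenames) tuples, top-down by default, so the first tuple describes the starting directory itself. The name dir_contents comes from the question; taking only the first tuple with next() is my assumption about what a non-recursive listing should do, and all_files is a hypothetical name for the recursive variant.

```python
import os

def dir_contents(path):
    # os.walk yields (dirpath, dirnames, filenames) for each directory,
    # starting with `path` itself; only that first tuple is used here.
    # If `path` cannot be listed, os.walk yields nothing and next() raises
    # StopIteration.
    _, folders, files = next(os.walk(path))
    return files, folders

def all_files(path):
    # Recursive variant: join each file name with the directory it was found in.
    for dirpath, _, filenames in os.walk(path):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

dir_contents('.') would then return the file and folder names of the current directory, while all_files('.') lazily produces a path for every file below it.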

score 28 · Accepted Answer

os.walk and os.scandir are great options; however, I've been using pathlib more and more, and with pathlib you can use the .glob() method:

from pathlib import Path

root_directory = Path(".")
for path_object in root_directory.glob('**/*'):
    if path_object.is_file():
        print(f"hi, I'm a file: {path_object}")
    elif path_object.is_dir():
        print(f"hi, I'm a dir: {path_object}")

score 21 · Accepted Answer

For anyone looking for a solution using pathlib (Python >= 3.4):

from pathlib import Path

def walk(path): 
    for p in Path(path).iterdir(): 
        if p.is_dir(): 
            yield from walk(p)
            continue
        yield p.resolve()

# recursively traverse all files from current directory
for p in walk(Path('.')): 
    print(p)

# the function returns a generator so if you need a list you need to build one
all_files = list(walk(Path('.')))

However, as mentioned above, this does not preserve the top-down ordering given by os.walk.
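If the top-down order matters, one possible approach is to yield each directory before descending into it. The sketch below is only an illustration: walk_top_down is a hypothetical name, the sorting is my own addition to make the output deterministic (os.walk does not sort), and symlinked directories are listed but not entered, which mirrors the default behaviour of os.walk. Newer Python versions (3.12 and later) also ship a built-in Path.walk method.

```python
from pathlib import Path

def walk_top_down(root):
    # Yield (directory, subdirectories, files) for `root` first and only then
    # descend, loosely mirroring the order of os.walk(topdown=True).
    root = Path(root)
    entries = sorted(root.iterdir())  # sorting is an assumption, for stable output
    dirs = [p for p in entries if p.is_dir()]
    files = [p for p in entries if not p.is_dir()]
    yield root, dirs, files
    for d in dirs:
        # Do not descend into symlinked directories, to avoid cycles.
        if not d.is_symlink():
            yield from walk_top_down(d)
```

Each yielded tuple holds Path objects rather than bare names, so the caller does not need to join anything.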

score 4 · Accepted Answer

Indeed, using

items += [item]

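is bad for many reasons...

The append method was made exactly for this (appending one element to the end of a list).
You are creating a temporary one-element list just to throw it away. While raw speed should not be your first concern when using Python (otherwise you are using the wrong language), wasting speed for no reason still doesn't seem right.
You are relying on a small asymmetry of the Python language: for list objects, writing a += b is not the same as writing a = a + b, because the former modifies the object in place while the latter allocates a new list, and this can have different semantics if the object a is also reachable in other ways. In your specific code this doesn't seem to be the case, but it could become a problem later when someone else (or yourself in a few years, which is the same thing) has to modify the code. Python even has an extend method, with a less subtle syntax, made specifically for the case where you want to modify a list object in place by adding the elements of another list at the end.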

Also, as others have noted, your code seems to be trying to do what os.walk already does...
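The difference between a += b and a = a + b for lists is easiest to see when a second name refers to the same list. The following is a small illustration only; the variable names are made up for the example.

```python
a = [1, 2]
alias = a            # a second name for the same list object

a += [3]             # in place: the list that `alias` refers to changes too
in_place_result = list(alias)

a = a + [4]          # builds a new list and rebinds `a`; `alias` is left behind
rebound_result = list(alias)

files = []
files.append("x.txt")              # add a single element
files.extend(["y.txt", "z.txt"])   # add every element of another list, in place
```

After the first statement alias sees the new element; after the second it does not, because a now names a different list.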

score 3 · Accepted Answer

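from os import listdir
from os.path import isfile, join

def dir_contents(path):
    files, folders = [], []
    for p in listdir(path):
        # Join with path so the check also works outside the current directory.
        if isfile(join(path, p)):
            files.append(p)
        else:
            folders.append(p)
    return files, folders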

score 3 · Accepted Answer

Since Python 3.4 there is the new pathlib module. So, to get all the directories and files, you can do:

from pathlib import Path

dirs = [str(item) for item in Path(path).iterdir() if item.is_dir()]
files = [str(item) for item in Path(path).iterdir() if item.is_file()]
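The two comprehensions above each list the directory once. If that matters, both lists can be filled in a single pass. This is only a sketch, and the helper name split_entries is hypothetical.

```python
from pathlib import Path

def split_entries(path):
    # Walk the directory listing once and sort each entry into one of two lists.
    dirs, files = [], []
    for item in Path(path).iterdir():
        if item.is_dir():
            dirs.append(str(item))
        elif item.is_file():
            files.append(str(item))
    return dirs, files
```

Entries that are neither a directory nor a regular file (a broken symlink, for example) end up in neither list, just as with the two comprehensions.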

score 3 · Accepted Answer

If you want to iterate recursively over all the files, including those in subfolders, I believe this is the best way.

import os

def get_files(root):
    for fd, subfds, fns in os.walk(root):
        for fn in fns:
            yield os.path.join(fd, fn)

## now this will print the path of every file under the current directory

for fn in get_files('.'):
    print(fn)

score 1 · Accepted Answer

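Instead of the built-in os.walk and os.path.walk, I use something derived from this piece of code that I found suggested elsewhere. I had originally linked to it but have replaced the link with the inlined source: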

import os
import stat

class DirectoryStatWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename and additional file information

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                st = os.stat(fullname)
                mode = st[stat.ST_MODE]
                if stat.S_ISDIR(mode) and not stat.S_ISLNK(mode):
                    self.stack.append(fullname)
                return fullname, st

if __name__ == '__main__':
    for file, st in DirectoryStatWalker("/usr/include"):
        print(file, st[stat.ST_SIZE])

It walks the directories recursively and is quite efficient and easy to read.

score 0 · Accepted Answer

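I found this question while googling for the same information.

I am posting here the smallest, clearest code that I found at http://www.pythoncentral.io/how-to-traverse-a-directory-tree-in-python-guide-to-os-walk/ (rather than just posting the URL, in case the link goes dead).

The page has some useful information and also points to a few other relevant pages.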

# Import the os module, for the os.walk function
import os

# Set the directory you want to start from
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)

score 0 · Accepted Answer

Try using the append method.

answered 2011-07-10T05:37:17.510

score 0 · Accepted Answer

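I haven't tested this extensively yet, but I believe this will expand the os.walk generator, join the dirnames to all the file paths, and flatten the resulting list, giving a straight list of the concrete files in your search path.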

import itertools
import os

def find(input_path):
    return itertools.chain(
        *list(
            list(os.path.join(dirname, fname) for fname in files)
            for dirname, _, files in os.walk(input_path)
        )
    )
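The same flattening can stay lazy with itertools.chain.from_iterable, which avoids building the intermediate lists before chaining. This is a sketch of an alternative, not the answer's original code, and find_lazy is a hypothetical name.

```python
import itertools
import os

def find_lazy(input_path):
    # For every directory visited by os.walk, produce a generator of joined
    # file paths; chain.from_iterable flattens them one item at a time.
    return itertools.chain.from_iterable(
        (os.path.join(dirname, fname) for fname in files)
        for dirname, _, files in os.walk(input_path)
    )
```

Because nothing is materialised up front, iteration can start before the whole tree has been walked.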

score 0 · Accepted Answer

Since Python >= 3.4 there is the generator method Path.rglob. So, to process all the paths under some/starting/path, just do something like this:

from pathlib import Path

path = Path('some/starting/path') 
for subpath in path.rglob('*'):
    # do something with subpath
    print(subpath)

To get all the subpaths in a list, do list(path.rglob('*')). To get only the files with the sql extension, do list(path.rglob('*.sql')).
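One thing to keep in mind is that rglob matches directories as well as files, so a directory that happens to be named something.sql would also be returned by the pattern above. A possible way to keep only regular files is sketched below; files_with_suffix is a hypothetical helper name, and the sorting is only there to make the result order predictable.

```python
from pathlib import Path

def files_with_suffix(root, suffix):
    # rglob('*') yields every path below root, directories included,
    # so filter on is_file() and compare the suffix explicitly.
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix == suffix
    )
```

For example, files_with_suffix('some/starting/path', '.sql') would return a sorted list of Path objects for the .sql files only.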
