python - Python loop optimization - iterate dirs 3 levels deep and delete

Question

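Hi, I have the following program.

Questions: - How can I make it more elegant, more readable, and more compact? - What can I do to extract the common loops into another method?

Assumptions:

The directories under the given rootDir are organized as shown below.

What the proc does:

If the input is 200, it deletes all DIRS that are OLDER than 200 days. This is NOT based on modification time, but on the directory structure and directory names [I will later delete each older directory by brute force with "rm -Rf"].

Example directory structure: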

-2009(year dirs) [will force delete dirs e.g "rm -Rf" later]
-2010
  -01...(month dirs)
  -05 ..
      -01.. (day dirs)
         -many files. [I won't check mtime at file level - takes more time]
      -31
  -12
-2011
-2012 ...

The code I have:

def get_dirs_to_remove(dir_path, olderThanDays):
    today = datetime.datetime.now();
    oldestDayToKeep = today + datetime.timedelta(days= -olderThanDays) 
    oldKeepYear = int(oldestDayToKeep.year)
    oldKeepMonth =int(oldestDayToKeep.month);
    oldKeepDay = int(oldestDayToKeep.day);
    for yearDir in os.listdir(dirRoot):
        #iterate year dir
        yrPath = os.path.join(dirRoot, yearDir);
        if(is_int(yearDir) == False):
            problemList.append(yrPath); # can't convert year to an int, store and report later
            continue

        if(int(yearDir) < oldKeepYear):
                print "old Yr dir: " + yrPath
                #deleteList.append(yrPath); # to be bruteforce deleted e.g "rm -Rf"
                yield yrPath;
                continue
        elif(int(yearDir) == oldKeepYear):
            # iterate month dir
            print "process Yr dir: " + yrPath
            for monthDir in os.listdir(yrPath):
                monthPath = os.path.join(yrPath, monthDir)
                if(is_int(monthDir) == False):
                    problemList.append(monthPath);
                    continue
                if(int(monthDir) < oldKeepMonth):
                        print "old month dir: " + monthPath
                        #deleteList.append(monthPath);
                        yield monthPath;
                        continue
                elif (int(monthDir) == oldKeepMonth):
                    # iterate Day dir
                    print "process Month dir: " + monthPath
                    for dayDir in os.listdir(monthPath):
                        dayPath = os.path.join(monthPath, dayDir)
                        if(is_int(dayDir) == False):
                            problemList.append(dayPath);
                            continue
                        if(int(dayDir) < oldKeepDay):
                            print "old day dir: " + dayPath
                            #deleteList.append(dayPath);
                            yield dayPath
                            continue
print [ x for x in get_dirs_to_remove(dirRoot, olderThanDays)]
print "probList" %  problemList # how can I get this list also from the same proc?

score 1 · Accepted Answer

This actually looks pretty good, except for one big thing, which is mentioned in this comment:

print "probList" %  problemList # how can I get this list also from the same proc?

It sounds like you're storing problemList in a global variable or something similar, and you'd like to fix that. Here are a few ways to do it:

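Yield both the files to delete and the problem files: for example, yield a tuple whose first member says which kind it is and whose second member is the path to act on.
Take problemList as a parameter. Remember that lists are mutable, so anything appended to the argument will be visible to the caller.
yield the problemList at the end. This means you need to restructure the way you use the generator, because it is no longer just a simple iterator.
Code the generator as a class instead of a function, and store problemList as a member variable.
Peek at the internal generator information and cram problemList in there so that the caller can retrieve it.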
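
As a rough illustration of the first option, here is a minimal sketch, assuming Python 3 and only the year level. The names classify_year_dirs and is_int and the "remove"/"problem" tags are made up for this example; the question does not show how is_int is defined, so a simple version is assumed.

```python
def is_int(s):
    """Assumed helper: True if s parses as an integer."""
    try:
        int(s)
        return True
    except ValueError:
        return False

def classify_year_dirs(names, oldest_year_to_keep):
    """Yield (kind, name) pairs instead of appending to a global list."""
    for name in names:
        if not is_int(name):
            yield ("problem", name)
        elif int(name) < oldest_year_to_keep:
            yield ("remove", name)

# The caller splits the single stream back into two lists.
to_remove, problems = [], []
for kind, name in classify_year_dirs(["2009", "2010", "junk", "2012"], 2010):
    (to_remove if kind == "remove" else problems).append(name)
print(to_remove, problems)
```

The generator stays a plain iterator, and the caller decides what to do with each kind of entry.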

Meanwhile, there are a few ways to make the code more compact and readable.

The simplest one:

print [ x for x in get_dirs_to_remove(dirRoot, olderThanDays)]

This list comprehension does exactly the same thing as iterating the original generator, so you can write it more simply as:

print list(get_dirs_to_remove(dirRoot, olderThanDays))

As for the algorithm itself, you could partition the result of listdir, and then just work with the partitioned lists. You could do this lazily:

yearDirs = os.listdir(dirRoot)
problemList.extend(yearDir for yearDir in yearDirs if not is_int(yearDir))
# guard with is_int so that int() never sees a non-numeric name
yield from (yearDir for yearDir in yearDirs if is_int(yearDir) and int(yearDir) < oldKeepYear)
for year in (yearDir for yearDir in yearDirs if is_int(yearDir) and int(yearDir) == oldKeepYear):
    # next level down

Or strictly (eagerly):

yearDirs = os.listdir(dirRoot)
problems, older, eq, newer = partitionDirs(yearDirs, oldKeepYear)
problemList.extend(problems)
yield from older
for year in eq:
    # next level down

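The latter probably makes more sense, especially given that yearDirs is already a list and is unlikely to be very big anyway.

Of course you need to write that partitionDirs function, but the nice thing is that you can use it again at the month and day levels. And it's pretty simple. In fact, I might actually do the partitioning by sorting, because it makes the logic so obvious, even if it's more verbose: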

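def partitionDirs(dirs, keyvalue):
    problems = [d for d in dirs if not is_int(d)]
    values = sorted((d for d in dirs if is_int(d)), key=int)
    older, eq, newer = partitionSortedListAt(values, keyvalue, key=int)
    return problems, older, eq, newer

If you look around (maybe search for "python partition sorted list"?), you can find lots of ways to implement the partitionSortedListAt function, but here is a sketch of something that I think is easy to understand for someone who hasn't thought about the problem this way:

import bisect

def partitionSortedListAt(vals, keyvalue, key=int):
    keys = [key(v) for v in vals]             # compare as ints, keep the original names
    lo = bisect.bisect_left(keys, keyvalue)   # first index with key >= keyvalue
    hi = bisect.bisect_right(keys, keyvalue)  # first index with key > keyvalue
    return vals[:lo], vals[lo:hi], vals[hi:]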

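If you search for "python split predicate", you can also find other ways to implement the initial split. Keep in mind, though, that most people are either concerned with being able to partition arbitrary iterables (which you don't need here) or, rightly or not, worried about efficiency (which you don't care about here either). So don't look for the answer that someone says is "best"; look at all of the answers and pick the one that seems most readable to you.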
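
For reference, one of the simpler variants such a search tends to turn up is a single-pass split on a predicate. The sketch below is only an illustration of that idea, assuming Python 3; the name partition and the simple is_int helper are assumptions, not something taken from the question.

```python
def is_int(s):
    """Assumed helper: True if s parses as an integer."""
    try:
        int(s)
        return True
    except ValueError:
        return False

def partition(pred, items):
    """Split items into (matching, non_matching) in one pass, keeping order."""
    matching, non_matching = [], []
    for item in items:
        (matching if pred(item) else non_matching).append(item)
    return matching, non_matching

numeric, problems = partition(is_int, ["12", "01", "tmp", "05"])
older, rest = partition(lambda name: int(name) < 5, numeric)
print(numeric, problems, older, rest)
```

The first call separates the unparseable names, and the second call can then use int() safely on what remains.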

Finally, you may notice that you end up with three levels that look almost identical:

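yearDirs = os.listdir(dirRoot)
problems, older, eqYears, newer = partitionDirs(yearDirs, oldKeepYear)
problemList.extend(os.path.join(dirRoot, d) for d in problems)
yield from (os.path.join(dirRoot, d) for d in older)
for year in eqYears:
    yearPath = os.path.join(dirRoot, year)
    monthDirs = os.listdir(yearPath)
    problems, older, eqMonths, newer = partitionDirs(monthDirs, oldKeepMonth)
    problemList.extend(os.path.join(yearPath, d) for d in problems)
    yield from (os.path.join(yearPath, d) for d in older)
    for month in eqMonths:
        monthPath = os.path.join(yearPath, month)
        dayDirs = os.listdir(monthPath)
        problems, older, eqDays, newer = partitionDirs(dayDirs, oldKeepDay)
        problemList.extend(os.path.join(monthPath, d) for d in problems)
        # the oldest day to keep (eqDays) is kept, matching the question's code
        yield from (os.path.join(monthPath, d) for d in older)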

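You could simplify this further with recursion: pass down the path so far and a list of the further levels to check, and you can cut this to roughly half the length. Whether that is more readable depends on how well you manage to encode the information to pass down and to yield from appropriately. Here is a sketch of the idea: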

def doLevel(pathSoFar, dateComponentsLeft):
    if not dateComponentsLeft:
        return
    dirs = os.listdir(pathSoFar)
    problems, older, eq, newer = partitionDirs(dirs, dateComponentsLeft[0])
    problemList.extend(os.path.join(pathSoFar, d) for d in problems)
    yield from (os.path.join(pathSoFar, d) for d in older)
    if eq:
        yield from doLevel(os.path.join(pathSoFar, eq[0]), dateComponentsLeft[1:])

# in the body of get_dirs_to_remove:
yield from doLevel(dirRoot, [oldKeepYear, oldKeepMonth, oldKeepDay])

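If you're on an older Python version that doesn't have yield from, the earlier snippets are almost trivial to convert; the recursive version as written will be uglier and more painful. But there is really no way to avoid that when dealing with recursive generators, because a sub-generator cannot "yield through" the generator that calls it.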
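
To make the conversion concrete, here is a small self-contained sketch of a recursive generator written without yield from. The function count_down is a toy invented for this example; the point is only the explicit loop, which stands in for yield from when all you need is plain iteration (it does not forward send() or throw()).

```python
def count_down(n):
    """Recursive generator that re-yields the sub-generator's values by hand."""
    if n == 0:
        return
    yield n
    # Stands in for `yield from count_down(n - 1)` on Pythons older than 3.3:
    for value in count_down(n - 1):
        yield value

print(list(count_down(3)))
```

Every level of recursion needs its own loop like this, which is where the extra ugliness in the recursive version comes from.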

score 1 · Accepted Answer

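I would suggest not using generators unless you are absolutely sure you need them. In this case, you don't need them.

In the code below, newer_list isn't strictly needed. While categorizeSubdirs could be made recursive, I don't feel that the increase in complexity is worth the savings in repetition (but that's just a matter of personal style; I only use recursion when it's unclear how many levels of recursion are needed, or when the number is fixed but large; three isn't enough, IMO).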


import datetime
import os

def categorizeSubdirs(keep_int, base_path):
    older_list = []
    equal_list = []
    newer_list = []
    problem_list = []

    for subdir_str in os.listdir(base_path):
        subdir_path = os.path.join(base_path, subdir_str)
        try:
            # parse the directory name, not the full path
            subdir_int = int(subdir_str)
        except ValueError:
            problem_list.append(subdir_path)
        else:
            if subdir_int < keep_int:
                older_list.append(subdir_path)
            elif subdir_int > keep_int:
                newer_list.append(subdir_path)
            else:
                equal_list.append(subdir_path)

    # Note that for your case, you don't need newer_list, 
    # and it's not clear if you need problem_list
    return older_list, equal_list, newer_list, problem_list

def get_dirs_to_remove(dir_path, olderThanDays):
    oldest_dt = datetime.datetime.now() + datetime.timedelta(days=-olderThanDays)
    remove_list = []
    problem_list = []

    olderYear_list, equalYear_list, newerYear_list, problemYear_list = categorizeSubdirs(oldest_dt.year, dir_path)
    remove_list.extend(olderYear_list)
    problem_list.extend(problemYear_list)

    for equalYear_path in equalYear_list:
        olderMonth_list, equalMonth_list, newerMonth_list, problemMonth_list = categorizeSubdirs(oldest_dt.month, equalYear_path)
        remove_list.extend(olderMonth_list)
        problem_list.extend(problemMonth_list)

        for equalMonth_path in equalMonth_list:
            olderDay_list, equalDay_list, newerDay_list, problemDay_list = categorizeSubdirs(oldest_dt.day, equalMonth_path)
            remove_list.extend(olderDay_list)
            problem_list.extend(problemDay_list)

    return remove_list, problem_list

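The three nested loops at the end could be made less repetitive at the cost of code complexity. I don't think it's worth it, though reasonable people can disagree. All else being equal, I prefer simpler code to cleverer code; as they say, reading code is harder than writing it, so if you write the most clever code you can, you won't be clever enough to read it. :/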
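
For readers who want to judge that trade-off themselves, here is one possible way to collapse the three levels into a single loop. This is only a sketch assuming Python 3: categorize is a compact stand-in for categorizeSubdirs, dirs_to_remove takes the cutoff as a fixed [year, month, day] list instead of computing it from today's date, and the demo tree is built in a temporary directory.

```python
import os
import tempfile

def categorize(keep_int, base_path):
    """Compact stand-in for categorizeSubdirs (newer dirs are simply ignored)."""
    older, equal, problems = [], [], []
    for name in os.listdir(base_path):
        path = os.path.join(base_path, name)
        try:
            value = int(name)
        except ValueError:
            problems.append(path)
            continue
        if value < keep_int:
            older.append(path)
        elif value == keep_int:
            equal.append(path)
    return older, equal, problems

def dirs_to_remove(root, keep_parts):
    """Descend one level per entry in keep_parts (year, month, day)."""
    remove, problems = [], []
    frontier = [root]
    for keep_int in keep_parts:
        next_frontier = []
        for base in frontier:
            older, equal, bad = categorize(keep_int, base)
            remove.extend(older)
            problems.extend(bad)
            next_frontier.extend(equal)
        frontier = next_frontier
    return remove, problems

# Build a tiny throwaway tree and use a fixed cutoff of 2010-05-15.
root = tempfile.mkdtemp()
for rel in ["2009", "2010/04", "2010/05/14", "2010/05/15", "2010/06", "2011", "notes"]:
    os.makedirs(os.path.join(root, *rel.split("/")))
remove, problems = dirs_to_remove(root, [2010, 5, 15])
print(sorted(os.path.relpath(p, root) for p in remove))
```

The loop is shorter, but the year/month/day meaning of each level is now implicit in the order of keep_parts, which is the kind of extra cleverness the paragraph above argues against.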
