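Checking each file name in turn to find the next available one works fine for a small number of files, but quickly gets slower as the number of files grows.
Here is a version that finds the next available file name in log(n) time, i.e. with O(log n) calls to os.path.exists. It assumes the files are numbered contiguously from 1. If the sequence has gaps, it still returns a free name, but not necessarily the lowest one: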
import os

def next_path(path_pattern):
    """
    Finds the next free path in a sequentially named list of files

    e.g. path_pattern = 'file-%s.txt':

    file-1.txt
    file-2.txt
    file-3.txt

    Runs in log(n) time where n is the number of existing files in sequence
    """
    i = 1

    # First do an exponential search
    while os.path.exists(path_pattern % i):
        i = i * 2

    # Result lies somewhere in the interval (i/2..i]
    # We call this interval (a..b] and narrow it down until a + 1 = b
    a, b = (i // 2, i)
    while a + 1 < b:
        c = (a + b) // 2  # interval midpoint
        a, b = (c, b) if os.path.exists(path_pattern % c) else (a, c)

    return path_pattern % b
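To check the log(n) claim without touching the disk, here is a minimal sketch that runs the same exponential + binary search against an in-memory set and counts how many existence checks it makes. The names next_free_index and count_probes are hypothetical, and the sketch assumes the existing files occupy indices 1..n with no gaps.

```python
from typing import Callable, Tuple


def next_free_index(exists: Callable[[int], bool]) -> int:
    """Return the smallest i >= 1 for which exists(i) is False.

    Assumes exists(i) is True exactly for i in 1..n (no gaps).
    """
    i = 1
    # Exponential search: double i until we land on a free index.
    while exists(i):
        i *= 2
    # The answer lies in the interval (i // 2, i]; binary-search it.
    a, b = i // 2, i
    while a + 1 < b:
        c = (a + b) // 2  # interval midpoint
        if exists(c):
            a = c
        else:
            b = c
    return b


def count_probes(n: int) -> Tuple[int, int]:
    """Return (next free index, number of exists() calls) for n taken indices."""
    taken = set(range(1, n + 1))  # stand-in for the files on disk
    calls = 0

    def exists(i: int) -> bool:
        nonlocal calls
        calls += 1
        return i in taken

    return next_free_index(exists), calls


if __name__ == "__main__":
    for n in (0, 1, 10, 1000, 10000):
        print(n, count_probes(n))
```

For n = 10,000 this returns index 10,001 after 28 probes, whereas the linear scan would check all 10,001 candidate names.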
To measure the speedup, I wrote a small test loop that creates about 10,000 files:
for i in range(1, 10000):
    with open(next_path('file-%s.foo'), 'w'):
        pass
For comparison, I also implemented the naive approach:
def next_path_naive(path_pattern):
    """
    Naive (slow) version of next_path
    """
    i = 1
    while os.path.exists(path_pattern % i):
        i += 1
    return path_pattern % i
Here are the results, as reported by the shell's time command:
Fast version:
real 0m2.132s
user 0m0.773s
sys 0m1.312s
Naive version:
real 2m36.480s
user 1m12.671s
sys 1m22.425s
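Finally, note that both approaches are vulnerable to race conditions if multiple processes try to create files in the sequence at the same time: two of them can pick the same "free" name before either has created the file.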