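I had the same problem: I wanted my code to accept a filename and return a file handle usable in a with statement, with automatic decompression and so on.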
In my case I'm willing to trust the file extension, and I only need to handle gzip and bzip2 files.
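import bz2
import gzip

def open_by_suffix(filename):
    # Trust the extension. Everything is opened in text mode ('rt') so that
    # iterating over the handle yields str lines whatever the format.
    if filename.endswith('.gz'):
        return gzip.open(filename, 'rt')
    elif filename.endswith('.bz2'):
        return bz2.open(filename, 'rt')
    else:
        return open(filename, 'r')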
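The same suffix dispatch can also be written as a lookup table, which makes adding another format a one-line change. This is a sketch, not part of the original answer. The `OPENERS` name is hypothetical, and two details are assumptions: the extension is lowercased (so `.GZ` also matches) and `os.path.splitext` is used instead of `endswith`. The demo writes small temporary files and checks that each one reads back as the same text.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical lookup table: file extension -> function that opens it.
# Anything not listed falls back to the built-in open().
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
}


def open_by_suffix(filename):
    """Open filename in text mode, decompressing based on its extension."""
    _, ext = os.path.splitext(filename)
    opener = OPENERS.get(ext.lower(), open)  # assumption: case-insensitive match
    # 'rt' is accepted by gzip.open, bz2.open and the built-in open alike.
    return opener(filename, "rt")


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    text = "hello\nworld\n"
    for ext, writer in ((".gz", gzip.open), (".bz2", bz2.open), (".txt", open)):
        path = os.path.join(tmpdir, "sample" + ext)
        with writer(path, "wt") as out:
            out.write(text)
        with open_by_suffix(path) as f:
            print(os.path.basename(path), f.read() == text)
```

The table approach keeps the fallback (plain `open`) in a single place, and the suffix-to-opener mapping is easy to test on its own.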
If we don't trust the filename, we can compare the first bytes of the file against known magic strings (modified from https://stackoverflow.com/a/13044946/117714):
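import bz2
import gzip

# Magic bytes at the start of each format: gzip begins with 1f 8b 08,
# bzip2 with "BZh" (42 5a 68). Keys must be bytes, not str.
magic_dict = {
    b"\x1f\x8b\x08": (gzip.open, 'rt'),
    b"\x42\x5a\x68": (bz2.open, 'rt'),
}

max_len = max(len(x) for x in magic_dict)

def open_by_magic(filename):
    # Read the header in binary mode so it can be compared with the bytes keys.
    with open(filename, 'rb') as f:
        file_start = f.read(max_len)
    for magic, (fn, flag) in magic_dict.items():
        if file_start.startswith(magic):
            return fn(filename, flag)
    return open(filename, 'r')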
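The detection step can be pulled out into a pure function on bytes, which makes the magic-number comparison easy to test without touching the filesystem. This is a minimal sketch. The names `sniff_format`, `MAGIC` and `MAX_LEN` and the returned labels are hypothetical. The magic values are the same ones used above: gzip's `1f 8b` ID bytes plus `08` for the deflate method, and bzip2's ASCII `BZh`.

```python
import bz2
import gzip

# Hypothetical table: leading bytes -> format label.
MAGIC = {
    b"\x1f\x8b\x08": "gzip",   # gzip ID bytes + compression method 8 (deflate)
    b"BZh": "bzip2",           # same bytes as b"\x42\x5a\x68"
}
MAX_LEN = max(len(m) for m in MAGIC)


def sniff_format(header):
    """Return 'gzip', 'bzip2' or 'plain' for the first bytes of a file."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return "plain"


if __name__ == "__main__":
    print(sniff_format(gzip.compress(b"data")[:MAX_LEN]))  # gzip
    print(sniff_format(bz2.compress(b"data")[:MAX_LEN]))   # bzip2
    print(sniff_format(b"just text"))                      # plain
```

The header must be read as bytes (mode `'rb'`). In Python 3, calling `str.startswith` with a bytes prefix raises `TypeError` rather than matching.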
Usage:
# Like cat: print every line of every file
for filename in filenames:
    with open_by_suffix(filename) as f:
        for line in f:
            print(line, end='')
Your use case would look like this:
for f in files:
    with open_by_suffix(f) as handle:
        process_file_contents(handle)
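`process_file_contents` is not defined in the answer. As a hypothetical stand-in, here is one that counts non-blank lines. Because it only iterates over the handle, the same function works whether the handle came from `open`, `gzip.open(..., 'rt')` or `bz2.open(..., 'rt')`. The demo uses in-memory buffers (`io.StringIO`, and `io.BytesIO` passed to `gzip.open`) instead of real files.

```python
import gzip
import io


def process_file_contents(handle):
    """Hypothetical stand-in: count the non-blank lines in a text handle."""
    # Only iteration is used, so plain and decompressed handles behave the same.
    return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    plain = io.StringIO("a\n\nb\n")
    compressed = io.BytesIO(gzip.compress(b"a\n\nb\n"))
    with gzip.open(compressed, "rt") as gz:
        print(process_file_contents(plain), process_file_contents(gz))  # 2 2
```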