python - Python style question on reading small files

Question

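What is the most Pythonic way to read in a named file, strip out lines that are empty, contain only whitespace, or have # as the first character, and then process the remaining lines? Assume it all fits easily in memory.

Note: it's not that this is hard to do. What I'm asking for is the most Pythonic way. I've been writing a lot of Ruby and Java and have lost my feel.

Here's a strawman: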

file_lines = [line.strip() for line in open(config_file, 'r').readlines() if len(line.strip()) > 0]
for line in file_lines:
  if line[0] == '#':
    continue
  # Do whatever with line here.

I'm interested in concision, but not at the cost of becoming hard to read.

score 5 · Accepted Answer

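Generators are a great fit for this kind of task. They are readable, they keep concerns cleanly separated, and they are efficient in both memory use and time.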

def RemoveComments(lines):
    for line in lines:
        if not line.strip().startswith('#'):
            yield line

def RemoveBlankLines(lines):
    for line in lines:
        if line.strip():
            yield line

Now apply these to your file:

# Process stands for whatever per-line handling you need.
with open('myfile', 'r') as filehandle:
    for line in RemoveComments(RemoveBlankLines(filehandle)):
        Process(line)

In this case it's pretty clear that the two generators could be merged into one, but I've left them separate to show how they compose.
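For comparison, a merged version might look like the minimal sketch below. The name `meaningful_lines` is hypothetical (it does not come from the answer), and the function accepts any iterable of strings, so it works on an open file as well as on a plain list.

```python
def meaningful_lines(lines):
    """Yield lines that are neither blank nor comments (hypothetical helper)."""
    for line in lines:
        stripped = line.strip()
        # Skip whitespace-only lines and lines whose first non-blank character
        # is '#', matching RemoveBlankLines and RemoveComments combined.
        if stripped and not stripped.startswith('#'):
            yield line


sample = ["one\n", "\n", "  # comment\n", "   \n", "two\n"]
print(list(meaningful_lines(sample)))  # ['one\n', 'two\n']
```

Like the two original generators, this yields the lines unmodified, so any stripping is left to the caller.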

score 3 · Accepted Answer

lines = [r for r in open(thefile) if not r.isspace() and r[0] != '#']

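The .isspace() string method is by far the best way to test whether a string consists entirely of whitespace; there is no need for contortions such as len(r.strip()) == 0 (ech ;-).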
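A quick check of how `str.isspace()` behaves on the kinds of strings a file yields; the sample strings are made up for illustration. One edge case worth knowing is that the empty string is not considered whitespace. That is harmless here, because iterating over a file never produces an empty string, which is also why indexing `r[0]` is safe in the one-liner.

```python
samples = ["\n", "   \t\n", "", "  x\n", "# comment\n"]
for s in samples:
    # Prints True for the first two samples and False for the rest.
    print(repr(s), s.isspace())

# The same filter as the one-liner, applied to an in-memory list instead of a file.
rows = ["one\n", "\n", "   \n", "#skip\n", "  #kept: '#' is not the first character\n"]
kept = [r for r in rows if not r.isspace() and r[0] != '#']
print(kept)
```

Note that an indented `#` line survives this filter, since only the very first character is tested.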

score 2 · Accepted Answer

for line in open("file"):
    sline=line.strip()
    if sline and not sline[0]=="#" :
       print(sline)

Output

$ cat file
one
#
  #

two

three
$ ./python.py
one
two
three

score 1 · Accepted Answer

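This matches the description, i.e.

strip out lines that are empty, contain only whitespace, or have # as the first character, and then process the remaining lines

So lines that begin or end with whitespace are passed through untouched.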

with open("config_file","r") as fp:
    data = (line for line in fp if line.strip() and not line.startswith("#"))
    for item in data:
        print(repr(item))

score 1 · Accepted Answer

I'd use this:

processed = [process(line.strip())
             for line in open(config_file, 'r')
             if line.strip() and not line.strip().startswith('#')]

The only ugliness I see here is all the repeated stripping. Getting rid of it makes the code a bit more complicated:

processed = [process(line)
             for line in (line.strip() for line in open(config_file, 'r'))
             if line and not line.startswith('#')]

score 1 · Accepted Answer

I like Paul Hankin's idea, but I'd do it a bit differently:

from itertools import ifilter, ifilterfalse, imap

with open(r'c:\temp\testfile.txt', 'rb') as f:
    s1 = ifilterfalse(str.isspace, f)
    s2 = ifilter(lambda x: not x.startswith('#'), s1)
    s3 = imap(str.rstrip, s2)
    print "\n".join(s3)

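I'd probably only do this, rather than one of the more obvious approaches suggested here, if I were worried about memory usage. I might also define an iscomment function to get rid of the lambda.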
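The `ifilter`, `ifilterfalse`, and `imap` names exist only in Python 2's `itertools`. In Python 3 the built-in `filter` and `map` are already lazy, and `ifilterfalse` became `itertools.filterfalse`. A rough Python 3 sketch of the same pipeline, using an `iscomment` helper of the kind mentioned above, might look like this. An in-memory `io.StringIO` with made-up contents stands in for the file; a real file would need to be opened in text mode rather than `'rb'` for the `str` methods to apply.

```python
import io
from itertools import filterfalse


def iscomment(line):
    """Return True if '#' is the very first character of the line."""
    return line.startswith('#')


# Stand-in for open(path) in text mode; the contents are illustrative only.
f = io.StringIO("one  \n\n   \n#comment\ntwo\n")

s1 = filterfalse(str.isspace, f)   # drop whitespace-only lines
s2 = filterfalse(iscomment, s1)    # drop comment lines
s3 = map(str.rstrip, s2)           # strip trailing whitespace and the newline
result = list(s3)
print("\n".join(result))
```

Each stage is lazy, so nothing is read from the file until the final `list` call consumes the pipeline.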

score 0 · Accepted Answer

The file is small, so performance isn't really an issue. I'd go for clarity over concision:

fp = open('file.txt')
for line in fp:
    line = line.strip()
    if line and not line.startswith('#'):
        pass  # process the line here
fp.close()

You can wrap this in a function if you like.

score 0 · Accepted Answer

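Using a slightly newer idiom (or, on Python 2.5, with from __future__ import with_statement) you can do the following, which has the advantage of cleaning up safely while still being quite concise.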

with open('file.txt') as fp:
    for line in fp:
        line = line.strip()
        if not line or line[0] == '#':
            continue

        # rest of processing here

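Note that stripping the line first means the check for '#' actually rejects lines where it is the first non-blank character, not merely lines with it "as the first character". That is easy to change if you want to be strict about the requirement.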
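If the stricter reading is wanted, one possible modification is to test for '#' on the raw line before stripping. The sketch below is only an assumption about what such a change could look like; the function name `config_lines` and its `strict` flag are hypothetical.

```python
def config_lines(lines, strict=False):
    """Yield stripped, non-blank, non-comment lines (hypothetical helper).

    With strict=True only a '#' in the first column marks a comment;
    otherwise a '#' as the first non-blank character does.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        candidate = raw if strict else line
        if candidate.startswith('#'):
            continue
        yield line


sample = ["one\n", "# comment\n", "  # indented\n", "\n", "two\n"]
print(list(config_lines(sample)))               # ['one', 'two']
print(list(config_lines(sample, strict=True)))  # ['one', '# indented', 'two']
```

It takes any iterable of lines, so an open file object from the `with` block can be passed in directly.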
