I am reading a file with Python, and the file contains sections marked off by lines starting with the "#" character:
#HEADER1, SOME EXTRA INFO
data first section
1 2
1 233
...
// THIS IS A COMMENT
#HEADER2, SECOND SECTION
452
134
// ANOTHER COMMENT
...
#HEADER3, THIRD SECTION
I have written the following code to read the file:
with open(filename) as fh:
    enumerated = enumerate(iter(fh.readline, ''), start=1)
    for lino, line in enumerated:
        # handle special section
        if line.startswith('#'):
            print("="*40)
            print(line)
            while True:
                start = fh.tell()
                lino, line = next(enumerated)
                if line.startswith('#'):
                    fh.seek(start)
                    break
                print("[{}] {}".format(lino,line))
The output is:
========================================
#HEADER1, SOME EXTRA INFO
[2] data first section
[3] 1 2
[4] 1 233
[5] ...
[6] // THIS IS A COMMENT
========================================
#HEADER2, SECOND SECTION
[9] 452
[10] 134
[11] // ANOTHER COMMENT
[12] ...
========================================
#HEADER3, THIRD SECTION
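As you can see, the line counter lino is no longer valid, because I am using seek. Decrementing it before breaking out of the loop does not help either, because the counter is incremented on every call to next. Is there an elegant way to solve this in Python 3.x? Also, is there a better way to deal with StopIteration than putting a pass statement in an except block?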
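For illustration, here is a minimal sketch of one possible direction, not a definitive answer. It assumes the lines can be consumed strictly forward, so instead of calling seek it keeps the line that ended the inner loop and reuses it as the next header. The built-in next accepts a default value, which is returned at end of input instead of raising StopIteration. The names SAMPLE and dump_sections are made up for this example.

```python
import io

# Hypothetical sample input, shaped like the file shown above.
SAMPLE = (
    "#HEADER1, SOME EXTRA INFO\n"
    "data first section\n"
    "1 2\n"
    "// THIS IS A COMMENT\n"
    "#HEADER2, SECOND SECTION\n"
    "452\n"
)

def dump_sections(fh):
    """Return the report lines, numbering every line of the file exactly once."""
    report = []
    numbered = enumerate(fh, start=1)
    # next() with a default yields None at end of input, so no try/except.
    item = next(numbered, None)
    while item is not None:
        lineno, line = item
        if line.startswith("#"):
            report.append("=" * 40)
            report.append(line.rstrip("\n"))
            # Consume the body of this section.
            item = next(numbered, None)
            while item is not None and not item[1].startswith("#"):
                report.append("[{}] {}".format(*item).rstrip("\n"))
                item = next(numbered, None)
            # item now holds the next header (or None); no seek is needed.
        else:
            # Line before the first header: skip it.
            item = next(numbered, None)
    return report

report = dump_sections(io.StringIO(SAMPLE))
print("\n".join(report))
```

Because the file is never rewound, the counter produced by enumerate stays in step with the actual line number, and the second header keeps its real position in the file.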
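UPDATE
So far I have adopted an implementation based on the suggestion by @Dunes. I had to change it slightly so that I can peek ahead to see whether a new section is starting. I don't know whether there is a better way to do this, so please chime in with comments: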
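class EnumeratedFile:
    def __init__(self, fh, lineno_start=1):
        self.fh = fh
        self.lineno = lineno_start
    def __iter__(self):
        return self
    def __next__(self):
        result = self.lineno, self.fh.readline()
        if result[1] == '':
            raise StopIteration
        self.lineno += 1
        return result
    def mark(self):
        # remember the current line number and file position
        self.marked_lineno = self.lineno
        self.marked_file_position = self.fh.tell()
    def recall(self):
        # go back to the position saved by mark()
        self.lineno = self.marked_lineno
        self.fh.seek(self.marked_file_position)
    def section(self):
        # peek at the next character without consuming it
        pos = self.fh.tell()
        char = self.fh.read(1)
        self.fh.seek(pos)
        # False at the next '#' header and also at end of file, where
        # read() returns '', so a following next() cannot raise StopIteration
        return char not in ('#', '')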
The file is then read and each section is processed as follows:
# create enumerated object
e = EnumeratedFile(fh)
header = ""
for lineno, line in e:
    print("[{}] {}".format(lineno, line))
    header = line.rstrip()
    # HEADER1
    if header.startswith("#HEADER1"):
        # process header 1 lines
        while e.section():
            # get node line
            lineno, line = next(e)
            # do whatever needs to be done with the line
    elif header.startswith("#HEADER2"):
        # etc.
        pass
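As a point of comparison, the per-header dispatch above could also be written without peeking at all. The following is only a sketch under assumptions: it buffers each section body in memory, it drops any lines that appear before the first header, and the name iter_sections is hypothetical rather than part of @Dunes's suggestion.

```python
import io

def iter_sections(fh):
    """Yield (header_lineno, header, body) for each '#' section.

    body is a list of (lineno, text) pairs with the real line numbers.
    """
    header_lineno, header, body = None, None, []
    for lineno, line in enumerate(fh, start=1):
        text = line.rstrip("\n")
        if text.startswith("#"):
            # A new header closes the previous section, if there was one.
            if header is not None:
                yield header_lineno, header, body
            header_lineno, header, body = lineno, text, []
        elif header is not None:
            body.append((lineno, text))
    # Emit the last section once the input is exhausted.
    if header is not None:
        yield header_lineno, header, body

# Hypothetical sample input for the demonstration.
sample = io.StringIO(
    "#HEADER1, SOME EXTRA INFO\n1 2\n1 233\n#HEADER2, SECOND SECTION\n452\n"
)
sections = list(iter_sections(sample))
for header_lineno, header, body in sections:
    if header.startswith("#HEADER1"):
        print("header 1 at line", header_lineno, "with", len(body), "lines")
    elif header.startswith("#HEADER2"):
        print("header 2 at line", header_lineno, "with", len(body), "lines")
```

The trade-off is memory: a whole section is held in a list before it is handed over, which is fine for small sections but may not suit very large ones.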