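A few comments:
- I'm not sure why `error_logs` is a set rather than a list.
- Using `readlines()` reads the entire file into memory, which is inefficient for large files. You should be able to iterate over the file one line at a time.
- `exp` (which you use in `re.search`) isn't defined anywhere, but I assume it's defined elsewhere in your code.

Anyway, here is complete code that does what you want without reading the whole file into memory. It also preserves the order of the input lines.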

    import re
    from collections import deque

    exp = r'\d'  # matches digits; change to what you need

    def error_finder(filepath, context_lines=4):
        error_logs = []
        # holds at most context_lines of the most recent non-matching lines
        buffer = deque(maxlen=context_lines)
        lines_after = 0
        # the with block closes the file even if an exception is raised
        with open(filepath, 'r') as source:
            for line in source:
                line = line.strip()
                if re.search(exp, line):
                    # add previous lines first
                    error_logs.extend(buffer)
                    # clear the buffer
                    buffer.clear()
                    # add current line
                    error_logs.append(line)
                    # schedule lines that follow to be added too
                    lines_after = context_lines
                elif lines_after > 0:
                    # a matching line occurred recently, so this is trailing context
                    lines_after -= 1
                    error_logs.append(line)
                else:
                    buffer.append(line)
        # maybe do something with error_logs? I'll just return it
        return error_logs
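
If you would rather not build the whole result list either, the same buffering idea can be written as a generator. The following is only a sketch of that variant, not the code above: the name `iter_with_context` and its `lines` and `pattern` parameters are hypothetical, and it assumes any iterable of strings (an open file, a list) is an acceptable input.

```python
import re
from collections import deque


def iter_with_context(lines, pattern, context_lines=4):
    """Yield each matching line plus up to context_lines lines around it.

    Hypothetical helper for illustration; `lines` is any iterable of strings.
    """
    regex = re.compile(pattern)
    before = deque(maxlen=context_lines)  # recent lines not yet emitted
    lines_after = 0                       # trailing context lines still owed
    for raw in lines:
        line = raw.rstrip("\n")
        if regex.search(line):
            yield from before             # leading context, oldest first
            before.clear()
            yield line
            lines_after = context_lines
        elif lines_after > 0:
            lines_after -= 1
            yield line
        else:
            before.append(line)


sample = ["a", "b", "c", "error 1", "d", "e", "f", "g"]
result = list(iter_with_context(sample, r"\d", context_lines=1))
print(result)
```

Because it accepts any iterable, you could pass an open file object directly (for example inside a `with open(...)` block) and process matches as they are found, so neither the input nor the output has to fit in memory at once.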