python - Include the lines surrounding a text-file match in the output using Python 2.7.3

Question

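I've been working on a program that helps with log analysis. It uses regular expressions to find error or failure messages and prints them to a new .txt file. However, it would be much more useful if the program also included the 4 lines above and the 4 lines below each match. I can't figure out how to do this! Here is part of the existing program: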

import re

def error_finder(filepath):
    source = open(filepath, "r").readlines()
    error_logs = set()
    my_data = []
    for line in source:
        line = line.strip()
        if re.search(exp, line):  # exp: the error/failure regex, defined elsewhere (not shown)
            error_logs.add(line)

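I assume something needs to be added after the last line, but I've been working on this for a while and either haven't fully applied myself or just can't figure it out.

Any advice or help with this is appreciated.

Thanks!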

Answer

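Why Python? grep can print the lines around each match directly; -C4 adds 4 lines of context before and after every match:

grep -C4 '^your_regex$' logfile > outfile.txt

The ^ and $ anchors make the pattern match whole lines only; leave them out to match anywhere in a line, as re.search does. grep also uses POSIX regex syntax, so a Python pattern may need adjusting (GNU grep's -P option accepts Perl-style patterns such as \d).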

answered 2013-04-01T14:43:03.330

Answer

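A few comments:

- I'm not sure why error_logs is a set rather than a list. A set drops duplicate lines and does not keep them in file order.
- Using readlines() reads the entire file into memory, which is inefficient for large files. You can iterate over the file one line at a time instead.
- exp (the pattern you pass to re.search) isn't defined anywhere, but I assume it's defined elsewhere in your code.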
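To illustrate the readlines() point: a file object is itself an iterator over lines, so looping over it reads the file incrementally instead of loading it all at once. A minimal sketch, written to avoid Python-3-only syntax; the function name matching_lines and its pattern argument are illustrative, not from the question:

```python
import re


def matching_lines(filepath, pattern):
    """Yield (line_number, line) for every line of filepath that matches pattern.

    Looping over the file object reads it line by line, so memory use does
    not grow with the size of the log (unlike readlines()).
    """
    regex = re.compile(pattern)
    with open(filepath, 'r') as source:
        # enumerate(..., 1) numbers lines starting at 1, like an editor would
        for number, line in enumerate(source, 1):
            line = line.rstrip('\n')
            if regex.search(line):
                yield number, line
```

Because it is a generator, a caller can write results out as they are found, e.g. looping over matching_lines('app.log', exp) and writing each line to the output file.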

Anyway, here is complete code that does what you want without reading the whole file into memory. It also preserves the order of the input lines.

import re
from collections import deque

exp = r'\d'
# matches any line containing a digit; change to what you need

def error_finder(filepath, context_lines = 4):
  source = open(filepath, 'r')
  error_logs = []

  buffer = deque(maxlen=context_lines)
  lines_after = 0

  for line in source:
    line = line.strip()
    if re.search(exp, line):
      # add previous lines first
      for prev_line in buffer:
        error_logs.append(prev_line)
      # clear the buffer
      buffer.clear()
      # add current line
      error_logs.append(line)
      # schedule lines that follow to be added too
      lines_after = context_lines
    elif lines_after > 0:
      # a line that matched the regex came not so long ago
      lines_after -= 1
      error_logs.append(line)
    else:
      buffer.append(line)

  source.close()  # release the file handle once the loop is done
  # maybe do something with error_logs? I'll just return it
  return error_logs
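One limitation of returning a flat list is that two unrelated matches far apart in the log run together. The sketch below is a hypothetical variant (write_error_context is not part of the original answer) that uses the same deque technique, writes straight to the output .txt file the question mentions, and puts a '--' line between groups that are not adjacent in the input, similar in spirit to grep -C. It tracks input line numbers only to detect those gaps.

```python
import re
from collections import deque


def write_error_context(in_path, out_path, pattern, context_lines=4):
    """Copy each matching line of in_path, plus up to context_lines lines
    before and after it, to out_path. Groups that are not adjacent in the
    input are separated by a '--' line."""
    regex = re.compile(pattern)
    before = deque(maxlen=context_lines)  # recent lines not yet written
    lines_after = 0       # trailing context lines still to copy
    last_written = None   # input line number of the last line written
    with open(in_path, 'r') as source, open(out_path, 'w') as out:
        for number, line in enumerate(source, 1):
            line = line.rstrip('\n')
            if regex.search(line):
                # The buffer holds the consecutive lines just before this one,
                # so the first of them has line number number - len(before).
                first = number - len(before)
                if last_written is not None and first > last_written + 1:
                    out.write('--\n')  # gap since the previous group
                for prev in before:
                    out.write(prev + '\n')
                before.clear()
                out.write(line + '\n')
                last_written = number
                lines_after = context_lines
            elif lines_after > 0:
                # still inside the trailing context of a recent match
                out.write(line + '\n')
                last_written = number
                lines_after -= 1
            else:
                before.append(line)
```

For example, write_error_context('app.log', 'errors.txt', r'ERROR|FAILED') would copy each line containing ERROR or FAILED together with up to 4 lines on either side (both file names are placeholders).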

Answer

I'd suggest looping over indices instead of using a for-each loop. Try this:

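error_logs = []
for i in range(len(source)):  # source is the list returned by readlines()
    line = source[i].strip()
    if re.search(exp, line):
        # clamp at 0: a negative slice start would count from the end of the file
        error_logs.append((line, max(i - 4, 0), i + 4))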

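Each entry in error_logs is then ('line of error', index of the first context line, index of the last context line), so you can later fetch the surrounding lines from source with a slice such as source[start:end + 1].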
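A sketch of that "fetch them later" step, assuming source is the list from readlines(): clamp each range to the file boundaries and merge ranges that overlap or touch, so two errors a few lines apart do not print their shared lines twice. context_blocks and its parameters are hypothetical names; it uses enumerate, which yields the same index i as range(len(source)).

```python
import re


def context_blocks(lines, pattern, context_lines=4):
    """Return a list of blocks; each block is a list of lines containing one
    or more matches plus up to context_lines lines on either side."""
    regex = re.compile(pattern)
    ranges = []  # [start, end] index pairs, end inclusive
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(i - context_lines, 0)               # clamp at top of file
            end = min(i + context_lines, len(lines) - 1)    # clamp at bottom
            if ranges and start <= ranges[-1][1] + 1:
                # overlaps or touches the previous range: extend it
                ranges[-1][1] = max(ranges[-1][1], end)
            else:
                ranges.append([start, end])
    return [[ln.rstrip('\n') for ln in lines[s:e + 1]] for s, e in ranges]
```

Merging is what keeps the output in file order without duplicates; each block can then be written to the output file with a separator line between blocks.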
