python - What is the fastest way to extract entries between two dates from a log file in Python?

Question

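I have log files in which the date and time appear at a different position in each file. I want to print the entries that fall between two dates, across the various log files, to standard output. For example: I want the entries from the previous 24 hours, or the entries from the previous week.

What is an efficient way to achieve this in Python?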

Kind regards, 亨德

score 2 · Accepted Answer

Something like this:

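from datetime import datetime, timedelta

def extract_date(line):
    """Return a datetime parsed from the start of a log line."""
    fmt = '%Y-%m-%d %H:%M:%S'
    # The timestamp spans the first two space-separated fields (date and time);
    # adapt this parser to your own log format.
    return datetime.strptime(' '.join(line.split(' ')[:2]), fmt)

# Use datetime (not date) so the bounds compare cleanly with parsed timestamps.
end_date = datetime.now()
start_date = end_date - timedelta(days=7)

with open('logfile.log') as f:
    lines = (line for line in f if start_date < extract_date(line) < end_date)
    # ...
    print(list(lines))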
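
If each log file is already in chronological order (an assumption; the question does not say so), the scan can stop at the first entry past the end date rather than reading the rest of the file. A minimal sketch of that idea, where `parse_timestamp` and `entries_between` are hypothetical names and the timestamp is assumed to occupy the first 19 characters of every line:

```python
from datetime import datetime
from itertools import dropwhile, takewhile

FMT = '%Y-%m-%d %H:%M:%S'  # assumed timestamp layout

def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' of a line (assumed position)."""
    return datetime.strptime(line[:19], FMT)

def entries_between(lines, start, end):
    """Return an iterator over lines with start <= timestamp <= end.

    Assumes `lines` is sorted chronologically.
    """
    # Skip everything before the window, then stop at the first line after it.
    after_start = dropwhile(lambda l: parse_timestamp(l) < start, lines)
    return takewhile(lambda l: parse_timestamp(l) <= end, after_start)
```

An open file object can be passed as `lines`. Lines that do not begin with a timestamp (for example, traceback continuations) would raise `ValueError` here and need separate handling.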

score 1 · Accepted Answer

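Your question rang a bell. From the documentation for heapq.merge:

Merge multiple sorted inputs into a single sorted output (for example, merge timestamped entries from multiple log files). Returns an iterator over the sorted values.

As eyquem said, your question is vague, but once you have parsed the log files (and possibly normalized them so that they sort together), heapq.merge sounds like a good tool.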
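
A minimal sketch of how that could look, assuming Python 3.5 or later (for the `key` argument of `heapq.merge`), that every source is already sorted by time, and that each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp. The names `timestamp` and `merge_logs` are hypothetical:

```python
import heapq
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'  # assumed timestamp layout

def timestamp(line):
    """Parse the leading timestamp of a line (assumed to be the first 19 chars)."""
    return datetime.strptime(line[:19], FMT)

def merge_logs(*sources):
    """Lazily merge several iterables of lines, each already sorted by time."""
    return heapq.merge(*sources, key=timestamp)
```

Open file objects can be passed directly as sources, so the merged stream is produced line by line without loading any file fully into memory.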

score 0 · Accepted Answer

Let's start with a single log file. The goal is to print all log entries between two dates to standard output. This is what I have so far:

import re

# Zero-padded 'YYYY-MM-DD HH:MM:SS' strings sort in chronological order,
# so they can be compared as plain strings.
startDate = "2013-08-31 06:00:00"
endDate = "2013-09-01 05:59:59"

date_re = re.compile(r'(\d+-\d+-\d+ \d+:\d+:\d+)')
with open("logfile.log", "r") as fh:
    for line in fh:  # iterate lazily instead of loading the whole file
        match = date_re.search(line)
        if match:
            matchDate = match.group(1)
            if startDate <= matchDate <= endDate:
                print(match.string.strip())
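
The hard-coded bounds above could be derived for the "previous 24 hours" or "previous week" cases from the question. A small sketch, assuming the log timestamps use the same zero-padded format as the strings above; `window_bounds` is a hypothetical helper name:

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%d %H:%M:%S'  # assumed to match the timestamps in the log

def window_bounds(now, **delta):
    """Return (start, end) strings for the period of length `delta` ending at `now`.

    `delta` is forwarded to timedelta, e.g. hours=24 or weeks=1.
    """
    start = now - timedelta(**delta)
    return start.strftime(FMT), now.strftime(FMT)
```

For example, `startDate, endDate = window_bounds(datetime.now(), hours=24)` would give bounds for the previous 24 hours, and `weeks=1` for the previous week.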
