If "A" fits into memory, you can try reversing the lookup: hold "A" in memory and scan "B" sequentially.
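A minimal sketch of that reversed lookup, assuming lines in A look like '[timestamp] uniq_id filesize' and lines in B look like '[timestamp] uniq_id'. The function name time_differences and the dict seen_in_a are illustrative, not part of any library, and the sketch assumes each uniq_id appears at most once in A (a later duplicate would overwrite an earlier one).

```python
from datetime import datetime

TIME_FORMAT = '[%Y-%m-%d %H:%M:%S]'


def time_differences(lines_a, lines_b):
    """Yield (uniq_id, timedelta) for every id in B that also appears in A."""
    # Build the in-memory index from A: uniq_id -> parsed timestamp.
    seen_in_a = {}
    for line in lines_a:
        timestamp, uniq_id, _filesize = line.rstrip('\n').rsplit(' ', 2)
        seen_in_a[uniq_id] = datetime.strptime(timestamp, TIME_FORMAT)

    # Stream B one line at a time; only A's index is held in memory.
    for line in lines_b:
        timestamp, uniq_id = line.rstrip('\n').rsplit(' ', 1)
        ts_a = seen_in_a.get(uniq_id)
        if ts_a is not None:
            ts_b = datetime.strptime(timestamp, TIME_FORMAT)
            yield uniq_id, abs(ts_a - ts_b)


# Sample data; in practice these would be open file objects.
a = ['[2012-09-12 12:23:33] SOME_UNIQ_ID filesize']
b = ['[2012-09-12 13:23:33] SOME_UNIQ_ID']
for uniq_id, delta in time_differences(a, b):
    print(uniq_id, 'has a difference of', delta)
```

Because both arguments are only iterated, passing two open files works the same way as passing the sample lists.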
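Otherwise, load the log files into an SQLite3 database with two tables (log_a, log_b) containing (timestamp, uniq_id, rest_of_line), then run an SQL join on uniq_id and do whatever processing you need on the results. This keeps the memory overhead low and lets the SQL engine do the join, but it does of course mean effectively duplicating the log files on disk (which is generally not an issue on most systems).
Example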
import sqlite3
from datetime import datetime

db = sqlite3.connect(':memory:')

# Load log A; each line is "[timestamp] uniq_id filesize"
db.execute('create table log_a (timestamp, uniq_id, filesize)')
a = ['[2012-09-12 12:23:33] SOME_UNIQ_ID filesize']
for line in a:
    # Splitting from the right keeps the space inside the timestamp intact
    timestamp, uniq_id, filesize = line.rsplit(' ', 2)
    db.execute('insert into log_a values(?, ?, ?)', (timestamp, uniq_id, filesize))
db.commit()

# Load log B; each line is "[timestamp] uniq_id"
db.execute('create table log_b (timestamp, uniq_id)')
b = ['[2012-09-12 13:23:33] SOME_UNIQ_ID']
for line in b:
    timestamp, uniq_id = line.rsplit(' ', 1)
    db.execute('insert into log_b values(?, ?)', (timestamp, uniq_id))
db.commit()

TIME_FORMAT = '[%Y-%m-%d %H:%M:%S]'
# With "using (uniq_id)" the shared column appears once, so each row is
# (log_a.timestamp, uniq_id, filesize, log_b.timestamp)
for matches in db.execute('select * from log_a join log_b using (uniq_id)'):
    log_a_ts = datetime.strptime(matches[0], TIME_FORMAT)
    log_b_ts = datetime.strptime(matches[3], TIME_FORMAT)
    print(matches[1], 'has a difference of', abs(log_a_ts - log_b_ts))
    # prints: SOME_UNIQ_ID has a difference of 1:00:00
    # (1:00:00 is how datetime.timedelta(0, 3600) is displayed)
Note:
- The argument to .connect on sqlite3 should be a filename
- a and b should be your files
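Putting those two notes together, one possible sketch is shown here. It assumes a fresh database file and the same line formats as the example; the names load_logs and parse, the index name log_b_uniq_id, and the paths passed in are all hypothetical. The index on the join column is an addition that the example does not use.

```python
import sqlite3


def parse(lines, fields):
    """Split each non-blank log line into `fields` columns from the right."""
    for line in lines:
        if line.strip():
            yield tuple(line.rstrip('\n').rsplit(' ', fields - 1))


def load_logs(db_path, path_a, path_b):
    """Load both log files into an on-disk SQLite database and return it."""
    db = sqlite3.connect(db_path)  # a filename, so the tables live on disk
    db.execute('create table log_a (timestamp, uniq_id, filesize)')
    db.execute('create table log_b (timestamp, uniq_id)')
    # File objects are iterated lazily, so neither log is read into memory.
    with open(path_a) as a:
        db.executemany('insert into log_a values (?, ?, ?)', parse(a, 3))
    with open(path_b) as b:
        db.executemany('insert into log_b values (?, ?)', parse(b, 2))
    # Index the join column so the join can look rows up instead of scanning.
    db.execute('create index log_b_uniq_id on log_b (uniq_id)')
    db.commit()
    return db
```

The join query from the example can then be run unchanged against the returned connection.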