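I'm very new to Python. I've found answers to most of my questions here, but this one has me stumped.
I'm processing log files with Python. Normally every line starts with a date/time stamp, for example: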
[1/4/13 18:37:37:848 PST]
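99% of the time I can read the file line by line, look for the items of interest and process them accordingly. Occasionally, though, an entry in the log file contains a message with carriage return/newline characters in it, so it spans multiple lines.
Is there an easy way to read the file "between timestamps", so that the multiple lines are combined into a single read when this happens? For example: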
[1/4/13 18:37:37:848 PST] A log entry
[1/4/13 18:37:37:848 PST] Another log entry
[1/4/13 18:37:37:848 PST] A log entry that somehow
got some new line
characters mixed in
[1/4/13 18:37:37:848 PST] The last log entry
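would be read as four lines instead of the six it is read as now.
Thanks in advance for any help.
Chris
Update....
myTestFile.log contains the exact text above, and this is my script: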
import sys, getopt, os, re

sourceFolder = 'C:/MaxLogs'
logFileName = sourceFolder + "/myTestFile.log"
lines = []

def timestamp_split(file):
    pattern = re.compile("\[(0?[1-9]|[12][0-9]|3[01])(\/)(0?[1-9]|[12][0-9]|3[01])(\/)([0-9]{2})(\ )")
    current = []
    for line in file:
        if not re.match(pattern,line):
            if current:
                yield "".join(current)
            current == [line]
        else:
            current.append(line)
    yield "".join(current)

print "--- START ----"
with open(logFileName) as file:
    for entry in timestamp_split(file):
        print entry
        print "- Record Separator -"
print "--- DONE ----"
When I run it, I get this:
--- START ----
[1/4/13 18:37:37:848 PST] A log entry
[1/4/13 18:37:37:848 PST] Another log entry
[1/4/13 18:37:37:848 PST] A log entry that somehow
- Record Separator -
[1/4/13 18:37:37:848 PST] A log entry
[1/4/13 18:37:37:848 PST] Another log entry
[1/4/13 18:37:37:848 PST] A log entry that somehow
- Record Separator -
[1/4/13 18:37:37:848 PST] A log entry
[1/4/13 18:37:37:848 PST] Another log entry
[1/4/13 18:37:37:848 PST] A log entry that somehow
[1/4/13 18:37:37:848 PST] The last log entry
- Record Separator -
--- DONE ----
I seem to be iterating over the lines too many times. What I was expecting (hoping for) is:
--- START ----
[1/4/13 18:37:37:848 PST] A log entry
- Record Separator -
[1/4/13 18:37:37:848 PST] Another log entry
- Record Separator -
[1/4/13 18:37:37:848 PST] A log entry that somehow got some new line characters mixed in
- Record Separator -
[1/4/13 18:37:37:848 PST] The last log entry
- Record Separator -
--- DONE ----
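One detail of the expected output above: the multi-line entry is shown folded onto a single line. Joining the raw lines with "".join would not do that, because each line read from the file keeps its trailing newline. Below is a minimal sketch of one way to fold an entry, assuming it is acceptable to collapse every run of whitespace (not only newlines) into a single space. The name fold_entry is a hypothetical helper, not something from the script.

```python
def fold_entry(entry):
    """Collapse every run of whitespace, newlines included, into one space."""
    # str.split() with no arguments splits on any whitespace and drops
    # empty strings, so leading/trailing whitespace disappears as well.
    return " ".join(entry.split())

# Sample entry copied from the log excerpt in the question.
raw = (
    "[1/4/13 18:37:37:848 PST] A log entry that somehow\n"
    "got some new line\n"
    "characters mixed in\n"
)
folded = fold_entry(raw)
print(folded)
```

If the original spacing inside a message matters, replacing only "\r" and "\n" would be a more conservative choice.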
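As discussed in the comments, I accidentally left the "not" in the comparison against the regex pattern while testing. If I remove it, I get all of the partial lines, which confuses me even more!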
--- START ----
got some new line
characters mixed in
- Record Separator -
got some new line
characters mixed in
- Record Separator -
--- DONE ----
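For reference, two things in the posted script appear to explain both outputs. First, `current == [line]` is a comparison whose result is thrown away, so `current` is never reset and keeps growing (or stays empty). Second, the buffered entry should be yielded when a line does match the timestamp pattern, since that is where a new entry begins. A minimal sketch of a corrected generator follows. It assumes a simplified pattern ("[M/D/YY " at the start of a line) and uses an in-memory list in place of the file, both for illustration only. It uses print() as a function so that it runs on Python 3.

```python
import re

# Assumption: a simplified stand-in for the pattern in the question.
# It matches "[M/D/YY " at the start of a line without validating ranges.
TIMESTAMP = re.compile(r"\[\d{1,2}/\d{1,2}/\d{2} ")

def timestamp_split(lines):
    """Yield one string per log entry, grouping continuation lines."""
    current = []
    for line in lines:
        # A timestamp marks the start of a new entry, so flush the old one.
        if TIMESTAMP.match(line) and current:
            yield "".join(current)
            current = []  # assignment (=), not comparison (==)
        current.append(line)
    if current:  # flush the final entry, if there is one
        yield "".join(current)

# In-memory copy of the sample log; a file object would work the same way.
sample = [
    "[1/4/13 18:37:37:848 PST] A log entry\n",
    "[1/4/13 18:37:37:848 PST] Another log entry\n",
    "[1/4/13 18:37:37:848 PST] A log entry that somehow\n",
    "got some new line\n",
    "characters mixed in\n",
    "[1/4/13 18:37:37:848 PST] The last log entry\n",
]

entries = list(timestamp_split(sample))
for entry in entries:
    print(entry, end="")  # each entry already ends with a newline
    print("- Record Separator -")
```

With the sample above this yields four entries, the third of which still contains its embedded newlines. Any lines that appear before the first timestamp are grouped into an entry of their own rather than dropped.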