1

我有格式为的日志消息

[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level]    Message Desciption : This is the message.

我想在表单中创建字典

{'time stamp': 2013-Mar-05 18:21:45.415053, 'ThreadId': 4139, 'Module name': ModuleA , 'Message Description': My Message, 'Message' : This is the message }

我尝试在空格上使用 split 来拆分日志消息,然后我可以选择标记并制作列表。像这样的东西:

for i in line1.split(" "):

这将给出这样的令牌

['2013-Mar-05', '18:21:45.415053]', '(ThreadID)', '<Module name>', '[Logging level]',    'Message Desciption', ':', 'This is the message.']

然后挑选令牌并放入所需的列表中。

在这种情况下,有没有更好的方法来提取令牌。这里有一个模式,就像time stamp将在[]括号中,threadId将在里面()module name将在里面<>。我们可以利用这些信息直接提取代币吗?

4

5 回答 5

2

这是与@Oli 非常相似的答案,但是正则表达式更具可读性,我使用groupdict()因此无需形成新字典,因为它是由正则表达式创建的。日志字符串从左到右解析,消耗每个匹配项。

fmt = re.compile(
      r'\[(?P<timestamp>.+?)\]\s+' # Save everything within [] to group timestamp
      r'\((?P<thread_id>.+?)\)\s+' # Save everything within () to group thread_id
      r'\<(?P<module_name>.+?)\>\s+' # Save everything within <> to group module_name
      r'\[(?P<log_level>.+?)\]\s+' # Save everything within [] to group to log_level
      r'(?P<message_desc>.+?)(\s:\s|$)' # Save everything before \s:\s or end of line to           group message_desc,
      r'(?P<message>.+$)?' # if there was a \s:\s, save everything after it to group   message. This last group is optional
      )

log = '[2013-Mar-05 18:21:45.415053] (4139) <ModuleA> [DEBUG]  Message Desciption : An example message!'

match = fmt.search(log)

print match.groupdict()

例子:

log = '[2013-Mar-05 18:21:45.415053] (4139) <ModuleA> [DEBUG]  Message Desciption : An       example message!'
match = fmt.search(log)

print match.groupdict() 
{'log_level': 'DEBUG',
 'message': 'An example message!',
 'module_name': 'ModuleA',
 'thread_id': '4139',
 'timestamp': '2013-Mar-05 18:21:45.415053'}

此答案评论中的第一个测试字符串示例

log = '[2013-Mar-05 18:21:45.415053] (0x7aa5e3a0) <Logger> [Info] Opened settings file : /usr/local/ABC/ABC/var/loggingSettings.ini'

match = fmt.search(log)

print match.groupdict()
{'log_level': 'Info',
 'message': '/usr/local/ABC/ABC/var/loggingSettings.ini',
 'message_desc': 'Opened settings file',
 'module_name': 'Logger',
 'thread_id': '0x7aa5e3a0',
 'timestamp': '2013-Mar-05 18:21:45.415053'}

此答案评论中的第二个测试字符串示例:

log = '[2013-Mar-05 18:21:45.415053] (0x7aa5e3a0) <Logger> [Info] Creating a new settings file'

match = fmt.search(log)

print match.groupdict()
{'log_level': 'Info',
 'message': None,
 'message_desc': 'Creating a new settings file',
 'module_name': 'Logger',
 'thread_id': '0x7aa5e3a0',
 'timestamp': '2013-Mar-05 18:21:45.415053'}

编辑:已修复以使用 OP 的示例。

于 2013-03-06T11:16:43.850 回答
1

使用正则表达式,希望对您有所帮助!

import re

string = '[2013-Mar-05 18:21:45.415053] (4444) <Module name> [Logging level]  Message Desciption : This is the message.'

regex = re.compile(r'\[(?P<timestamp>[^\]]*?)\] \((?P<threadid>[^\)]*?)\) \<(?P<modulename>[^\>]*?)\>[^:]*?\:(?P<message>.*?)$')

for match in regex.finditer(string):
    dict = {'timestamp': match.group("timestamp"), 'threadid': match.group("threadid"), 'modulename': match.group('modulename'), 'message': match.group('message')}

print dict

输出:

{'timestamp': '2013-Mar-05 18:21:45.415053', 'message': ' This is the message.', 'modulename': 'Module name', 'threadid': '4444'}

说明:我正在使用组来标记我的正则表达式的一部分,以便以后在脚本中使用。有关更多信息,请参阅http://docs.python.org/2/library/re.html。基本上我从左到右遍历这条线,寻找分隔符 [,<,( 等。

于 2013-03-06T10:12:30.330 回答
0

如果您有一致的日志格式,为什么不将宏用于索引?

例子

DATE = 0
TIME = 1
TID = 2
MODULE = 3
LOG_LVL = 4
MESSAGE = 5 (or more like 7)

log = ['2013-Mar-05', '18:21:45.415053]', '(ThreadID)', '<Module name>', '[Logging level]',    'Message Desciption', ':', 'This is the message.']

然后只需使用 log[DATE] 访问或不访问?最终,在使用基于索引的访问之前,在要拼接在一起的块上使用“”.join。然后,您可以随心所欲地填充您的字典。

它不像 Oli 的解决方案那样简洁,但它可以完成工作:)

于 2013-03-06T10:44:21.277 回答
0

虽然在这种情况下使用re更简单,但如果你不想使用它,
试试这个,

string = '[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level]    Message Desciption : This is the message.'

# the main function, return the items between start and end.
def get_between(start, end, string):
    in_between = 0
    c_str = ''
    items = []
    indexes = []
    for i in range(len(string)):
        char = string[i]
        if char == start:
            if in_between == 0: indexes.append(i) # if starting bracket
            in_between += 1
        elif char == end:
            in_between -= 1
            if in_between == 0: indexes.append(i) # if ending bracket
        elif in_between > 0:
            c_str += char
        if in_between == 0 and c_str != '': # after ending bracket
            items.append(c_str)
            c_str = ''
    return items, indexes

# As both Time Stamp, and Logging Level are between []s,
# And as message comes after Logging Level,
data,last_indexes = get_between('[',']',string)
time_stamp, logging = data
# We only want the first item in the first list
thread_id = get_between('(',')',string)[0][0]
module = get_between('<','>',string)[0][0]

last = max(last_indexes)
# extracting the message    
message = ''.join(string[last+1:].split(':')[1:]).strip()

mydict = {'Time':time_stamp, 'Thread ID':thread_id,'Module':module,'Logging Level':logging,'Message':message}
print mydict

在这里,我们得到两个“分类器”之间的字符并使用它们......

于 2013-03-06T10:09:50.867 回答
0

下面的呢?(评论解释了发生了什么)

log = '[2013-Mar-05 18:21:45.415053] (ThreadID) <Module name> [Logging level]    Message Description : This is the message.'

# Define functions on how to proces the different kinds of tokens
time_stamp = logging_level = lambda x: x.strip('[ ]')
thread_ID = lambda x: x.strip('( )')
module_name = lambda x: x.strip('< >')
message_description = message = lambda x: x

# Names of the tokens used to make the dictionary keys
keys = ['time stamp', 'ThreadId',
        'Module name', 'Logging level',
        'Message Description', 'Message']
# Define functions on how to process the message
funcs = [time_stamp, thread_ID,
         module_name, logging_level,
         message_description, message]
# Define the tokens at which to split the message
split_on = [']', ')', '>', ']', ':']

msg_dict = {}

for i in range(len(split_on)):
    # Split up the log one token at a time
    temp, log = log.split(split_on[i], 1)
    # Process the token using the defined function
    msg_dict[keys[i]] = funcs[i](temp) 

msg_dict[keys[i]] = funcs[i](log) # Process the last token
print msg_dict
于 2013-03-06T10:03:25.670 回答