-5

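Hi, the code below lets me extract some specific information from my data. I'm hoping someone can help me write it more properly using a while loop, so that I can do this for many lines; right now I only have two lines of data. I'm a beginner, so if anyone can help, please explain, so that I can learn rather than just copy and paste =)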

#!/usr/bin/env python
# -*- coding: utf-8 -*-


import re 

tableau = []

data = "00:02:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxx@x.fr mid:6499"

result1 = {}
i = re.findall(r"^.[^\ ]*", data ) 
j = re.findall(r"\d+$", data ) 
k = re.findall(r"O:[^\ ]*", data ) 
r = re.findall(r"R:[^\ ]*", data )

result1 = {'Heure':i,'MID':j,'Source':k,'Destination':r} 

data = "00:03:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxxx@xxxxx.fr mid:6599"

result2 = {}
i = re.findall(r"^.[^\ ]*", data ) 
j = re.findall(r"\d+$", data ) 
k = re.findall(r"O:[^\ ]*", data ) 
r = re.findall(r"R:[^\ ]*", data )

result2 = {'Heure':i,'MID':j,'Source':k,'Destination':r} 

tableau.append(result1)
tableau.append(result2)

print(tableau)
4 Answers

6

This is actually better done with a for loop:

import re

data1 = "00:02:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxx@x.fr mid:6499"
data2 = "00:03:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxxx@xxxxx.fr mid:6599"
data_list = [ data1, data2 ] #store the data in a list so we can iterate over it
tableau = [] #create a list to hold our output
for data in data_list:  #iterate over the list, getting 1 "data" at a time
    #extract info we want
    i = re.findall(r"^.[^\ ]*", data ) 
    j = re.findall(r"\d+$", data ) 
    k = re.findall(r"O:[^\ ]*", data ) 
    r = re.findall(r"R:[^\ ]*", data )

    #create dictionary and append it to tableau
    tableau.append({'Heure':i,'MID':j,'Source':k,'Destination':r})

More advanced users would probably use a function here, one that takes a string as input and returns a dictionary of the desired data:

def extract(data):
    i = re.findall(r"^.[^\ ]*", data ) 
    j = re.findall(r"\d+$", data ) 
    k = re.findall(r"O:[^\ ]*", data ) 
    r = re.findall(r"R:[^\ ]*", data )
    return {'Heure':i,'MID':j,'Source':k,'Destination':r}

Now you can use it in a list comprehension:

tableau = [extract(data) for data in data_list]
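
Note that re.findall returns a list, so every value in the dictionaries built above is a one-element list such as ['6499']. If plain strings are preferred, a minimal sketch along the following lines might work. It uses re.search instead of re.findall, and the helper name extract_scalars is made up for illustration:

```python
import re

def extract_scalars(data):
    """Return a dict of plain strings for one line (hypothetical variant of extract).

    A field that cannot be found is stored as None.
    """
    patterns = {
        'Heure': r"^[^ ]+",         # first space-free token (the time)
        'MID': r"\d+$",             # digits at the very end of the line
        'Source': r"O:[^ ]*",       # token starting with O:
        'Destination': r"R:[^ ]*",  # token starting with R:
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, data)
        result[key] = match.group(0) if match else None
    return result

data_list = [
    "00:02:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxx@x.fr mid:6499",
    "00:03:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxxx@xxxxx.fr mid:6599",
]
tableau = [extract_scalars(data) for data in data_list]
print(tableau)
```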

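From the comments, it seems you are reading the lines of data from a file. That's even better (who wants to type in all those strings?). Now we can shorten this to: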

with open('filename') as fin:
    tableau = [extract(data) for data in fin]

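The with statement introduces another Python construct (a context manager). It's a bit more complicated, but it is the preferred way to open files. For file objects, it is functionally equivalent to: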

fin = open('filename')
tableau = ...
fin.close()
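
Strictly speaking, with also closes the file when an exception is raised inside the block, so a closer manual equivalent wraps the work in try/finally. The sketch below is only an illustration: it repeats extract so that it runs on its own, and it writes a temporary file whose name and contents are assumptions made for the example.

```python
import os
import re
import tempfile

def extract(data):
    # Same extraction as above, repeated so this block is self-contained.
    return {
        'Heure': re.findall(r"^.[^\ ]*", data),
        'MID': re.findall(r"\d+$", data),
        'Source': re.findall(r"O:[^\ ]*", data),
        'Destination': re.findall(r"R:[^\ ]*", data),
    }

# Write two sample lines to a temporary file (illustrative data only).
with tempfile.NamedTemporaryFile(mode='w', suffix='.log', delete=False) as tmp:
    tmp.write("00:02:12.935 mta Messages I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxx@x.fr mid:6499\n")
    tmp.write("00:03:12.935 mta Messages I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxxx@xxxxx.fr mid:6599\n")
    path = tmp.name

# Manual version: the finally clause guarantees the file gets closed.
fin = open(path)
try:
    tableau_manual = [extract(line) for line in fin]
finally:
    fin.close()

# Context-manager version: same effect, less bookkeeping.
with open(path) as fin:
    tableau_with = [extract(line) for line in fin]

print(tableau_manual == tableau_with)
os.remove(path)
```

Each line read from a file still ends with a newline; the patterns used here cope with that because $ also matches just before a trailing newline.
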
answered 2013-04-17T12:49:04.107
3

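Here. This parses your data in a more efficient way: it uses a function, and you can simply pass it a list of data. If you want to turn it into a generator, that's easy too.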

import re

def parser(data):
    result = []
    for p in data:
        ms = re.match(r'(\S+).*?(O:\S+).*(R:\S+).*mid:(\d+)', p)
        if not ms:
            continue
        result.append({'Heure':ms.group(1), 'Source':ms.group(2), 'Destination':ms.group(3), 'MID':ms.group(4)})
    return result


data = ["00:02:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxx@x.fr mid:6499",
        "00:03:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxxx@xxxxx.fr mid:6599"]

print(parser(data))

Result:

>>> 
[{'Source': 'O:NVS:FAXG3/', 'Destination': 'R:NVS:SMTP.0/xxxx@x.fr', 'Heure': '00:02:12.935', 'MID': '6499'},
{'Source': 'O:NVS:FAXG3/', 'Destination': 'R:NVS:SMTP.0/xxxxx@xxxxx.fr', 'Heure': '00:03:12.935', 'MID': '6599'}]

As a generator:

import re

def parser(data):
    for p in data:
        ms = re.match(r'(\S+).*?(O:\S+).*(R:\S+).*mid:(\d+)', p)
        if not ms:
            continue
        yield {'Heure':ms.group(1), 'Source':ms.group(2), 'Destination':ms.group(3), 'MID':ms.group(4)}       

data = ["00:02:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxx@x.fr mid:6499",
        "00:03:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxxx@xxxxx.fr mid:6599"]

for r in parser(data):
    print(r)

Result:

>>> 
{'Source': 'O:NVS:FAXG3/', 'Destination': 'R:NVS:SMTP.0/xxxx@x.fr', 'Heure': '00:02:12.935', 'MID': '6499'}
{'Source': 'O:NVS:FAXG3/', 'Destination': 'R:NVS:SMTP.0/xxxxx@xxxxx.fr', 'Heure': '00:03:12.935', 'MID': '6599'}

Using my regular expression with the idea from @mgilson's answer:

def extract(data):
    ms = re.match(r'(\S+).*?(O:\S+).*(R:\S+).*mid:(\d+)', data)
    if not ms:
        raise Exception('Could not extract data')
    return {'Heure':ms.group(1), 'Source':ms.group(2), 'Destination':ms.group(3), 'MID':ms.group(4)}

tableau = [extract(data) for data in data_list] 
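
One possible refinement, offered as a sketch rather than as part of the answer above: compiling the pattern once and giving the groups names lets groupdict() build the dictionary directly. The names LINE_RE and extract_named are invented for this example, and ValueError is just one reasonable choice of exception.

```python
import re

# Same pattern as above, with named groups; compiled once and reused.
LINE_RE = re.compile(
    r'(?P<Heure>\S+).*?(?P<Source>O:\S+).*(?P<Destination>R:\S+).*mid:(?P<MID>\d+)'
)

def extract_named(data):
    """Return the four fields of one log line as a dict (hypothetical helper)."""
    ms = LINE_RE.match(data)
    if not ms:
        raise ValueError('Could not extract data from: %r' % data)
    return ms.groupdict()

line = "00:02:12.935 mta         Messages       I Doc O:NVS:FAXG3/ R:NVS:SMTP.0/xxxx@x.fr mid:6499"
record = extract_named(line)
print(record)
```
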
answered 2013-04-17T12:57:04.283
0

I don't think while is the best way to do what you want. Perhaps you could use

for data in dataArray: 

where dataArray contains your data strings.
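
To illustrate the difference, here is a small sketch with placeholder strings (the contents of dataArray are made up). The while version has to manage an index by hand; the for version does not.

```python
dataArray = ["first line", "second line", "third line"]  # placeholder data

# while loop: the index must be initialised, tested and incremented manually.
seen_while = []
index = 0
while index < len(dataArray):
    seen_while.append(dataArray[index])
    index += 1

# for loop: Python hands over each element in turn.
seen_for = []
for data in dataArray:
    seen_for.append(data)

print(seen_while == seen_for)
```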

answered 2013-04-17T12:51:35.093
0

Thanks to Wooble for inspiring this While function and the examples. The idea got me thinking about how it could be done.

>>> def While(function, *args, **kwargs):
    while function(*args, **kwargs): pass


>>> def unstack(array):
    print(array.pop())
    return array

>>> While(unstack, ['world!', 'there', 'Hello'])
Hello
there
world!

>>> def fib(state):
    state.append(sum(state))
    print(state.pop(0))
    return state[0] < 1000

>>> While(fib, [0, 1])
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
>>> 

Generators are rather nice too, so I also created a WhileGenerator to satisfy my curiosity.

>>> def WhileGenerator(function, *args, **kwargs):
    iterator = iter(function(*args, **kwargs))
    while next(iterator):
        yield next(iterator)


>>> import operator, functools, itertools
>>> for value in WhileGenerator(lambda a, b: functools.reduce(operator.add,
        itertools.zip_longest(a, b)),
        (True, True, True, False),
        'Hello there world!'.split()):
    print(value)


Hello
there
world!
>>> def fib_gen(state, limit):
    while True:
        yield state[0] < limit
        state.append(sum(state))
        yield state.pop(0)


>>> for value in WhileGenerator(fib_gen, [0, 1], 1000):
    print(value)


0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
>>> 
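
For everyday use, the standard library already offers something similar: itertools.takewhile yields items from an iterable for as long as a predicate holds. A minimal sketch of the Fibonacci example written that way follows; fib_numbers is a hypothetical helper, not part of the code above.

```python
import itertools

def fib_numbers():
    """Yield the Fibonacci numbers 0, 1, 1, 2, 3, ... without end."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Stop consuming the endless generator at the first value that is not below 1000.
below_1000 = list(itertools.takewhile(lambda n: n < 1000, fib_numbers()))
print(below_1000)
```
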
answered 2013-04-17T15:17:48.667