I want to parse a log file in the following format. It contains many duplicate URLs, and I want to compute the total size across all entries as well as the combined size for each particular URL. What is the best way to do this?

/images/img81a.jpg 6620
/images/img88a.jpg 6990
/images/img80b.jpg 5909 
/images/swb-30-furniture.gif 6216 
/images/button-arrow.png 498
/images/button-arrow-down.png 484 
/images/img81a.jpg 6620 
/images/img80b.jpg 5909 
/images/back-to-top_off.gif 1506 
/images/new-logo.gif 3377 
/images/img81a.jpg 6620        

Result:

Total size computation: 11503

Size computation for all URLs of a particular type:

 /images/img81a.jpg 19860
 /images/img80b.jpg 11818
 likewise ...

I appended all the size values to a list and summed them to get the total size, but I guess the per-URL computation needs a two-dimensional dictionary, which I have not managed to create. Something like this (a sketch of one way to build it follows the example):

 a['/images/img81a.jpg'][6620] = 3
 a['/images/img88a.jpg'][6990] = 1
 a['/images/img80b.jpg'][5909] = 2
 likewise ...
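
A minimal sketch of the kind of nested count structure I mean, using collections.defaultdict (the file name log.txt is just an assumption):

from collections import defaultdict

# url -> size -> number of occurrences
a = defaultdict(lambda: defaultdict(int))

with open('log.txt') as f:
    for line in f:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        url, size = parts[0], int(parts[1])
        a[url][size] += 1

# a['/images/img81a.jpg'][6620] == 3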

1 Answer

Assuming you have all your lines in a file:

with open('log.txt') as f:
    dico = dict()
    total_value = 0
    for line in f:
        # Feed the dictionary
        split_array = line.split()
        if len(split_array) < 2:
            continue  # skip blank or malformed lines
        possible_key = split_array[0]
        value = int(split_array[1])

        # If the url has already been seen, add to its running total;
        # otherwise dict.get falls back to 0 (note: the default is
        # passed positionally, not as default=0)
        dico[possible_key] = dico.get(possible_key, 0) + value

        # Update the global sum
        total_value = total_value + value

Usage:

dico['/images/img81a.jpg'] => 19860
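
As a follow-up, a small sketch for reporting both results, assuming the dico and total_value built above (Python 3 print):

# Print each url's combined size, then the grand total
for url, size_sum in sorted(dico.items()):
    print(url, size_sum)
print('Total size:', total_value)

Using dict.get with a positional default avoids a separate key-existence check; collections.Counter would work equally well here.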
answered 2013-02-04T23:44:28.107