import gzip

gzip_files = ["complete-credit-ctrl-txn-SE06_2013-07-17-00.log.gz",
              "complete-credit-ctrl-txn-SE06_2013-07-17-01.log.gz"]

def input_func():
    num = input("Enter the number of MIN series digits: ")
    return num

for name in gzip_files:
    f = gzip.open(name, 'rb')
    file_content = f.read()
    f.close()
    digit = input_func()
    file_content = file_content.split('[')
    series = []  # list of MIN prefixes
    for line in file_content:
        MIN = line.split('|')[13:15]
        for x in MIN:
            # keep only the first `digit` characters of the first field
            series.append(x[:digit])
            break

    # count the number of occurrences in the list named series
    for s in series:
        print s
    # end count

Result:

63928
63928
63929
63929
63928
63928

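This is only part of the result. The actual output is a very long list. What I want now is to list only the unique numbers, each with the number of times it appears in the list. So:

63928 = 4
63929 = 2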

3 Answers


I would use the collections.Counter class here.

>>> a = [1, 1, 1, 2, 3, 4, 4, 5]
>>> from collections import Counter
>>> Counter(a)
Counter({1: 3, 4: 2, 2: 1, 3: 1, 5: 1})

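Just pass your series variable to Counter and you get a dictionary-like object whose keys are the unique elements and whose values are the number of times each one occurs in the list.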
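As a minimal sketch of that step, assuming `series` holds string prefixes like the ones in the question's output (the sample list below is copied from that excerpt, not from the real log files), `most_common()` returns the pairs ordered by count:

```python
from collections import Counter

# Sample data taken from the excerpt of the question's output (assumed to be strings).
series = ["63928", "63928", "63929", "63929", "63928", "63928"]

counts = Counter(series)

# most_common() yields (element, count) pairs, highest count first.
for item, count in counts.most_common():
    print("%s = %d" % (item, count))
# 63928 = 4
# 63929 = 2
```

The single-argument `print(...)` form is used so the sketch runs on both Python 2.7 and Python 3.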

collections.Counter was introduced in Python 2.7. For versions older than 2.7, use the following list comprehension:

>>> [(elem, a.count(elem)) for elem in set(a)]
[(1, 3), (2, 1), (3, 1), (4, 2), (5, 1)]

You can then convert it to a dictionary for easier access:

>>> dict((elem, a.count(elem)) for elem in set(a))
{1: 3, 2: 1, 3: 1, 4: 2, 5: 1}
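If the same script has to run on interpreters with and without Counter, the two approaches can be combined. The following is only a sketch; `count_occurrences` is a hypothetical helper name, not part of any library:

```python
try:
    from collections import Counter  # available from Python 2.7 / 3.1
except ImportError:
    Counter = None  # older interpreter: fall back to list.count below

def count_occurrences(items):
    """Return a plain dict mapping each unique item to its count."""
    if Counter is not None:
        return dict(Counter(items))
    # list.count rescans the whole list for every unique element,
    # so this fallback is fine for short lists but slow for very long ones.
    return dict((elem, items.count(elem)) for elem in set(items))

a = [1, 1, 1, 2, 3, 4, 4, 5]
result = count_occurrences(a)
print(result)
```

Both branches produce the same mapping, so the rest of the program does not need to know which one was used.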
answered 2013-07-24T05:39:12.203

You can use a Counter() for this.

So this will print what you need:

from collections import Counter
c = Counter(series)
for item,count in c.items():
    print "%s = %s" % (item,count)
answered 2013-07-24T05:38:08.830

Build a dictionary with the unique numbers as keys and their total number of occurrences as values:

d = {} #instantiate dictionary

for s in series:
    # set default key and value if key does not exist in dictionary
    d.setdefault(s, 0)
    # increment by 1 for every occurrence of s
    d[s] += 1 
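A closely related variant, shown here only as a sketch, lets collections.defaultdict supply the starting value instead of calling setdefault on every iteration. The sample `series` list is copied from the excerpt in the question and assumed to contain strings:

```python
from collections import defaultdict

# Sample data taken from the excerpt of the question's output (assumed to be strings).
series = ["63928", "63928", "63929", "63929", "63928", "63928"]

d = defaultdict(int)  # a missing key starts at int() == 0

for s in series:
    d[s] += 1  # increment by 1 for every occurrence of s

# Print in sorted key order so the output is stable.
for key in sorted(d):
    print("%s = %d" % (key, d[key]))
# 63928 = 4
# 63929 = 2
```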

If the problem were more complex, a map reduce (a.k.a. map fold) implementation might be appropriate.

MapReduce: https://en.wikipedia.org/wiki/MapReduce

Python map function: http://docs.python.org/2/library/functions.html#map

Python reduce function: http://docs.python.org/2/library/functions.html#reduce
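For illustration, the same count can be expressed in that map/fold style with the built-in map and functools.reduce. This is a small in-memory sketch, not a distributed MapReduce job; `to_pair` and `fold` are hypothetical helper names and the sample list is copied from the question's output:

```python
from functools import reduce  # also a builtin on Python 2; this import works on 2.6+ and 3

# Sample data taken from the excerpt of the question's output (assumed to be strings).
series = ["63928", "63928", "63929", "63929", "63928", "63928"]

def to_pair(item):
    # "map" step: emit a (key, 1) pair for every element
    return (item, 1)

def fold(acc, pair):
    # "reduce" step: add each pair's count to the running total for its key
    key, n = pair
    acc[key] = acc.get(key, 0) + n
    return acc

totals = reduce(fold, map(to_pair, series), {})
print(totals)
```

For a single list that fits in memory this is more machinery than the plain loop above; the split into a map step and a fold step mainly pays off when the steps are distributed or composed with other transformations.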

answered 2013-07-24T06:03:56.850