python - Find the item with maximum occurrences in a list

Question

In Python, I have a list:

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

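I want to identify the item that occurs the highest number of times. I am able to solve it, but I need the fastest way to do so. I know there is a nice Pythonic answer to this.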

score 166 · Accepted Answer

I'm surprised no one has mentioned the simplest solution, max() with list.count as the key:

max(lst,key=lst.count)

Example:

>>> lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
>>> max(lst,key=lst.count)
4

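This works in Python 3 or 2, but note that it only returns the most frequent item and not also the frequency. Also, in the case of a draw (i.e. several items tied for most frequent), only a single item is returned.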
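A small sketch of the two caveats above. `most_frequent` is a hypothetical helper name, not part of the standard library. It returns the count alongside the item, and the second call shows that a tie goes to whichever tied item appears first in the list.

```python
def most_frequent(lst):
    """Return (item, count) for the most frequent item in lst."""
    item = max(lst, key=lst.count)   # max() keeps the first item with the highest count
    return item, lst.count(item)

print(most_frequent([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]))
print(most_frequent([7, 7, 4, 4, 1]))   # 7 and 4 tie; 7 comes first in the list
```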

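Although the time complexity of using max() is worse than that of Counter.most_common(1), as PM 2Ring comments, the approach benefits from a fast C implementation. I find it is the fastest approach for short lists but slower for larger ones (Python 3.6 timings shown in IPython 5.3):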

In [1]: from collections import Counter
   ...: 
   ...: def f1(lst):
   ...:     return max(lst, key = lst.count)
   ...: 
   ...: def f2(lst):
   ...:     return Counter(lst).most_common(1)
   ...: 
   ...: lst0 = [1,2,3,4,3]
   ...: lst1 = lst0[:] * 100
   ...: 

In [2]: %timeit -n 10 f1(lst0)
10 loops, best of 3: 3.32 us per loop

In [3]: %timeit -n 10 f2(lst0)
10 loops, best of 3: 26 us per loop

In [4]: %timeit -n 10 f1(lst1)
10 loops, best of 3: 4.04 ms per loop

In [5]: %timeit -n 10 f2(lst1)
10 loops, best of 3: 75.6 us per loop

score 126 · Accepted Answer

from collections import Counter
most_common,num_most_common = Counter(L).most_common(1)[0] # 4, 6 times

For older Python versions (< 2.7), you can use this recipe to create the Counter class.
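To make the indexing in the one-liner above explicit, here is a short sketch of what most_common(1) hands back before the [0] is applied. The variable names `counts`, `top`, `item` and `count` are mine.

```python
from collections import Counter

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
counts = Counter(L)

top = counts.most_common(1)   # a list holding a single (item, count) tuple
item, count = top[0]          # unpack that tuple
print(top, item, count)
```

Without the [0] you get the one-element list rather than the tuple, which is easy to miss when comparing results between approaches.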

score 33 · Accepted Answer

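In your question, you asked for the fastest way to do it. As has been demonstrated repeatedly, particularly with Python, intuition is not a reliable guide: you need to measure.

Here's a simple test of several different implementations: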

import sys
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit

L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]

def max_occurrences_1a(seq=L):
    "dict iteritems"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_1b(seq=L):
    "dict items"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def max_occurrences_2(seq=L):
    "defaultdict iteritems"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_3a(seq=L):
    "sort groupby generator expression"
    return max(((k, sum(1 for i in g)) for k, g in groupby(sorted(seq))), key=itemgetter(1))

def max_occurrences_3b(seq=L):
    "sort groupby list comprehension"
    return max([(k, sum(1 for i in g)) for k, g in groupby(sorted(seq))], key=itemgetter(1))

def max_occurrences_4(seq=L):
    "counter"
    return Counter(seq).most_common(1)[0]

versions = [max_occurrences_1a, max_occurrences_1b, max_occurrences_2, max_occurrences_3a, max_occurrences_3b, max_occurrences_4]

print sys.version, "\n"

for vers in versions:
    print vers.__doc__, vers(), timeit(vers, number=20000)

The results on my machine:

2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] 

dict iteritems (4, 6) 0.202214956284
dict items (4, 6) 0.208412885666
defaultdict iteritems (4, 6) 0.221301078796
sort groupby generator expression (4, 6) 0.383440971375
sort groupby list comprehension (4, 6) 0.402786016464
counter (4, 6) 0.564319133759

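So it appears that the Counter solution is not the fastest. And, in this case at least, groupby is faster. defaultdict is good, but you pay a little for its convenience; it is slightly faster to use a regular dict with get.

What happens if the list is much bigger? Adding L *= 10000 to the test above and reducing the repeat count to 200: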

dict iteritems (4, 60000) 10.3451900482
dict items (4, 60000) 10.2988479137
defaultdict iteritems (4, 60000) 5.52838587761
sort groupby generator expression (4, 60000) 11.9538850784
sort groupby list comprehension (4, 60000) 12.1327362061
counter (4, 60000) 14.7495789528

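Now defaultdict is the clear winner. So perhaps the cost of the 'get' method and the loss of the in-place add add up (an examination of the generated code is left as an exercise).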
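For anyone who wants to try that exercise, a minimal sketch using the standard dis module follows. The function names are mine, and the bytecode you see depends on the interpreter version, so it will not match what Python 2.7 generated for the timings above.

```python
import dis
from collections import defaultdict

def count_with_get(seq):
    c = {}
    for item in seq:
        c[item] = c.get(item, 0) + 1   # attribute lookup plus a call on every iteration
    return c

def count_with_defaultdict(seq):
    c = defaultdict(int)
    for item in seq:
        c[item] += 1                   # in-place add, no explicit method call in the loop
    return c

# Print the bytecode of both loops so they can be compared side by side
dis.dis(count_with_get)
dis.dis(count_with_defaultdict)
```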

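But with the modified test data, the number of unique item values did not change, so presumably dict and defaultdict have an advantage there over the other implementations. So what happens if we use the bigger list but substantially increase the number of unique items? Replacing the initialization of L with: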

LL = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
L = []
for i in xrange(1,10001):
    L.extend(l * i for l in LL)

dict iteritems (2520, 13) 17.9935798645
dict items (2520, 13) 21.8974409103
defaultdict iteritems (2520, 13) 16.8289561272
sort groupby generator expression (2520, 13) 33.853593111
sort groupby list comprehension (2520, 13) 36.1303369999
counter (2520, 13) 22.626899004

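So now Counter is clearly faster than the groupby solutions but still slower than the iteritems versions of dict and defaultdict.

The point of these examples isn't to produce an optimal solution. The point is that there often isn't one optimal general solution. Plus there are other performance criteria. The memory requirements will differ substantially among the solutions and, as the size of the input goes up, memory requirements may become the overriding factor in algorithm selection.

Bottom line: it all depends and you need to measure.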
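The test script above is Python 2 (print statements, iteritems). If you want to repeat the measurement on Python 3, something along these lines should work. It is a cut-down sketch with only three of the variants, the function names are mine, and the timings will of course differ from the 2.7.2 numbers quoted.

```python
import sys
from collections import Counter, defaultdict
from operator import itemgetter
from timeit import timeit

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def with_dict(seq=L):
    "dict items"
    c = {}
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def with_defaultdict(seq=L):
    "defaultdict items"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.items(), key=itemgetter(1))

def with_counter(seq=L):
    "counter"
    return Counter(seq).most_common(1)[0]

print(sys.version, "\n")
for func in (with_dict, with_defaultdict, with_counter):
    # timeit accepts a zero-argument callable as the statement to time
    print(func.__doc__, func(), timeit(func, number=20000))
```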

score 17 · Accepted Answer

Here is a defaultdict solution that will work with Python versions 2.5 and above:

from collections import defaultdict

L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
d = defaultdict(int)
for i in L:
    d[i] += 1
result = max(d.iteritems(), key=lambda x: x[1])
print result
# (4, 6)
# The number 4 occurs 6 times

Note that if L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 7, 7, 7, 7, 7, 56, 6, 7, 67], then there are six 4s and six 7s. However, the result will be (4, 6), i.e. six 4s.

score 7 · Accepted Answer

If you're using Python 3.8 or above, you can use either statistics.mode() to return the first mode encountered or statistics.multimode() to return all the modes.

>>> import statistics
>>> data = [1, 2, 2, 3, 3, 4] 
>>> statistics.mode(data)
2
>>> statistics.multimode(data)
[2, 3]

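If the list is empty, statistics.mode() throws a statistics.StatisticsError and statistics.multimode() returns an empty list.

Note that before Python 3.8, statistics.mode() (introduced in 3.4) would additionally throw a statistics.StatisticsError if there was not exactly one most common value.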
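A minimal sketch of handling the empty-list case, assuming Python 3.8 or later. `safe_mode` is a hypothetical wrapper, and returning None for empty input is just one possible convention.

```python
import statistics

def safe_mode(data):
    """Return the first mode of data, or None when data is empty."""
    try:
        return statistics.mode(data)
    except statistics.StatisticsError:
        return None

print(safe_mode([1, 2, 2, 3, 3, 4]))
print(safe_mode([]))
print(statistics.multimode([]))   # no exception here, just an empty list
```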

score 2 · Accepted Answer

Perhaps the most_common() method

Answered 2011-08-08T19:20:15.990

score 2 · Accepted Answer

A simple way without any libraries or collections

def mcount(l):
  n = []                  #To store count of each elements
  for x in l:
      count = 0
      for i in range(len(l)):
          if x == l[i]:
              count+=1
      n.append(count)
  a = max(n)              #largest in counts list
  for i in range(len(n)):
      if n[i] == a:
          return(l[i],a)  #element,frequency
  return                  #if something goes wrong
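The nested loops above compare every element with every other element, so the work grows quadratically with the list length. As a sketch of the same idea that still avoids imports, a plain dict can hold the counts instead. `mcount_linear` is a hypothetical name, the elements are assumed to be hashable, and returning None for an empty list is my choice rather than the behaviour of the code above.

```python
def mcount_linear(items):
    """Return (element, frequency) for the most frequent element, or None if items is empty."""
    if not items:
        return None
    counts = {}
    for x in items:
        counts[x] = counts.get(x, 0) + 1   # one dictionary update per element
    best = items[0]
    for x in items:                        # scan in list order so ties go to the earliest element
        if counts[x] > counts[best]:
            best = x
    return best, counts[best]

print(mcount_linear([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]))
```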

score 1 · Accepted Answer

I obtained the best results with the groupby function from the itertools module, using Python 3.5.2:

from itertools import groupby

a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def occurrence():
    occurrence, num_times = 0, 0
    # groupby only merges consecutive equal items; it works here because the 4s are adjacent
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        if val >= occurrence:
            occurrence, num_times =  key, val
    return occurrence, num_times

occurrence, num_times = occurrence()
print("%d occurred %d times which is the highest number of times" % (occurrence, num_times))

Output:

4 occurred 6 times which is the highest number of times

Tested with timeit from the timeit module.

I used this script for my test, with number = 20000:

from itertools import groupby

def occurrence():
    a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
    occurrence, num_times = 0, 0
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        if val >= occurrence:
            occurrence, num_times =  key, val
    return occurrence, num_times

if __name__ == '__main__':
    from timeit import timeit
    print(timeit("occurrence()", setup = "from __main__ import occurrence",  number = 20000))

Output (the best one):

0.1893607140000313
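One caveat worth illustrating: groupby groups runs of consecutive equal items, so on a list where the repeats are scattered it reports the longest run, not the most frequent item. The sketch below shows the difference; the function names and the `scattered` list are mine, and sorting assumes the items are mutually comparable.

```python
from itertools import groupby

def longest_run(seq):
    """Return (item, length) for the longest run of consecutive equal items."""
    return max(((key, len(list(group))) for key, group in groupby(seq)),
               key=lambda pair: pair[1])

def most_frequent_sorted(seq):
    """Sort first so that equal items are adjacent, then count the runs."""
    return longest_run(sorted(seq))

scattered = [4, 1, 4, 2, 4, 3, 3]
print(longest_run(scattered))           # only sees the run of two 3s
print(most_frequent_sorted(scattered))  # counts all three 4s
```

Sorting adds its own cost, which is part of why the sort/groupby variants look slower in the other benchmark on this page.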

score 1 · Accepted Answer

Simple and best code:

def max_occ(lst,x):
    count=0
    for i in lst:
        if (i==x):
            count=count+1
    return count

lst=[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
x=max(lst,key=lst.count)
print(x,"occurs ",max_occ(lst,x),"times")

Output: 4 occurs 6 times

score 1 · Accepted Answer

If you are using numpy in your solution, use the following for faster computation:

import numpy as np
x = np.array([2,5,77,77,77,77,77,77,77,9,0,3,3,3,3,3])
y = np.bincount(x,minlength = max(x))
y = np.argmax(y)   
print(y)  #outputs 77

score 0 · Accepted Answer

I want to propose another solution that looks nice and is fast for short lists.

def mc(seq=L):
    "max/count"
    max_element = max(seq, key=seq.count)
    return (max_element, seq.count(max_element))

You can benchmark it with the code provided by Ned Deily, which gives the following results for the smallest test case:

3.5.2 (default, Nov  7 2016, 11:31:36) 
[GCC 6.2.1 20160830] 

dict iteritems (4, 6) 0.2069783889998289
dict items (4, 6) 0.20462976200065896
defaultdict iteritems (4, 6) 0.2095775119996688
sort groupby generator expression (4, 6) 0.4473949929997616
sort groupby list comprehension (4, 6) 0.4367636879997008
counter (4, 6) 0.3618192010007988
max/count (4, 6) 0.20328268999946886

But beware: it is inefficient and therefore gets really slow for large lists!

score 0 · Accepted Answer

The following is the solution I came up with for the case where several characters in a string all share the highest frequency.

mystr = input("enter string: ")
#define dictionary to store characters and their frequencies
mydict = {}
#get the unique characters
unique_chars = sorted(set(mystr),key = mystr.index)
#store the characters and their respective frequencies in the dictionary
for c in unique_chars:
    ctr = 0
    for d in mystr:
        if d != " " and d == c:
            ctr = ctr + 1
    mydict[c] = ctr
print(mydict)
#store the maximum frequency
max_freq = max(mydict.values())
print("the highest frequency of occurence: ",max_freq)
#print all characters with highest frequency
print("the characters are:")
for k,v in mydict.items():
    if v == max_freq:
        print(k)

Input: "hello people"

Output:

{'o': 2, 'p': 2, 'h': 1, ' ': 0, 'e': 3, 'l': 3}
the highest frequency of occurence:  3
the characters are:
e
l
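For comparison, roughly the same result can be had with collections.Counter. This is only a sketch: `most_frequent_chars` is a hypothetical name, spaces are skipped to mirror the check in the loop above, and the order of tied characters relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import Counter

def most_frequent_chars(text):
    """Return (characters, frequency) for all non-space characters tied for the top count."""
    counts = Counter(ch for ch in text if ch != " ")
    if not counts:
        return [], 0
    max_freq = max(counts.values())
    # Keep every character whose count equals the maximum, in first-seen order
    return [ch for ch, n in counts.items() if n == max_freq], max_freq

print(most_frequent_chars("hello people"))
```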

score 0 · Accepted Answer

My (simple) code (three months into learning Python):

def more_frequent_item(lst):
    new_lst = []
    times = 0
    for item in lst:
        count_num = lst.count(item)
        new_lst.append(count_num)
        times = max(new_lst)
    key = max(lst, key=lst.count)
    print("In the list: ")
    print(lst)
    print("The most frequent item is " + str(key) + ". Appears " + str(times) + " times in this list.")


more_frequent_item([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67])

The output will be:

In the list: 
[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
The most frequent item is 4. Appears 6 times in this list.

score -3 · Accepted Answer

Perhaps something like this:

testList = [1, 2, 3, 4, 2, 2, 1, 4, 4]
print(max(set(testList), key = testList.count))
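Going through set() means count() is called once per distinct value rather than once per element. The sketch below makes that visible with a list subclass that tallies the calls; `CountingList` exists purely for illustration. Note that with a set, which of several tied items is returned depends on the set's iteration order.

```python
class CountingList(list):
    """A list that records how many times .count() is called (illustration only)."""
    calls = 0

    def count(self, value):
        self.calls += 1
        return super().count(value)

data = [1, 2, 3, 4, 2, 2, 1, 4, 4]

over_list = CountingList(data)
max(over_list, key=over_list.count)      # one count() call per element

over_set = CountingList(data)
max(set(over_set), key=over_set.count)   # one count() call per distinct value

print(over_list.calls, over_set.calls)
```

Each count() call still scans the whole list, so this helps most when the list has few distinct values.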
