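In Python, I have a list:
L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
I want to identify the item that occurs the most times. I am able to solve it, but I need the fastest way to do so. I know there is a nice Pythonic answer to this.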
I'm surprised no one has mentioned the simplest solution, max() with list.count as the key:
max(lst,key=lst.count)
Example:
>>> lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
>>> max(lst,key=lst.count)
4
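This works in Python 3 or 2, but note that it only returns the most frequent item, not its frequency. Also, in the case of a draw (i.e. joint most frequent items), only a single item is returned.
Although the time complexity of using max() is worse than that of Counter.most_common(1), as PM 2Ring comments, the approach benefits from a fast C implementation. I find it is fastest for short lists but slower for larger ones (Python 3.6 timings shown in IPython 5.3):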
In [1]: from collections import Counter
   ...:
   ...: def f1(lst):
   ...:     return max(lst, key = lst.count)
   ...:
   ...: def f2(lst):
   ...:     return Counter(lst).most_common(1)
   ...:
   ...: lst0 = [1,2,3,4,3]
   ...: lst1 = lst0[:] * 100
   ...:
In [2]: %timeit -n 10 f1(lst0)
10 loops, best of 3: 3.32 us per loop
In [3]: %timeit -n 10 f2(lst0)
10 loops, best of 3: 26 us per loop
In [4]: %timeit -n 10 f1(lst1)
10 loops, best of 3: 4.04 ms per loop
In [5]: %timeit -n 10 f2(lst1)
10 loops, best of 3: 75.6 us per loop
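To repeat this comparison outside IPython, something like the following could be used. It is only a sketch: the repeat count of 100 is an arbitrary choice, f2 is adjusted here to return just the item so both functions are comparable, and no timings are shown because they depend on the machine and Python version.

```python
from collections import Counter
from timeit import timeit

def f1(lst):
    # list.count is called once per element, so this is quadratic in len(lst)
    return max(lst, key=lst.count)

def f2(lst):
    # one pass to build the counts, then pick the top (item, count) pair
    return Counter(lst).most_common(1)[0][0]

short = [1, 2, 3, 4, 3]
long_ = short * 100

for name, data in (("short", short), ("long", long_)):
    t1 = timeit(lambda: f1(data), number=100)
    t2 = timeit(lambda: f2(data), number=100)
    print(f"{name}: max/count {t1:.6f}s, Counter {t2:.6f}s")
```

On most setups the short list should favour max/count and the long list should favour Counter, but it is worth checking on your own data.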
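In your question, you asked for the fastest way to do it. As has been demonstrated repeatedly, particularly with Python, intuition is not a reliable guide: you need to measure.
Here's a simple test of several different implementations: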
# Python 2 script (print statement, dict.iteritems)
import sys
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit

L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]

def max_occurrences_1a(seq=L):
    "dict iteritems"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_1b(seq=L):
    "dict items"
    c = dict()
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def max_occurrences_2(seq=L):
    "defaultdict iteritems"
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.iteritems(), key=itemgetter(1))

def max_occurrences_3a(seq=L):
    "sort groupby generator expression"
    return max(((k, sum(1 for i in g)) for k, g in groupby(sorted(seq))), key=itemgetter(1))

def max_occurrences_3b(seq=L):
    "sort groupby list comprehension"
    return max([(k, sum(1 for i in g)) for k, g in groupby(sorted(seq))], key=itemgetter(1))

def max_occurrences_4(seq=L):
    "counter"
    return Counter(seq).most_common(1)[0]

versions = [max_occurrences_1a, max_occurrences_1b, max_occurrences_2, max_occurrences_3a, max_occurrences_3b, max_occurrences_4]

print sys.version, "\n"

for vers in versions:
    print vers.__doc__, vers(), timeit(vers, number=20000)
The results on my machine:
2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
dict iteritems (4, 6) 0.202214956284
dict items (4, 6) 0.208412885666
defaultdict iteritems (4, 6) 0.221301078796
sort groupby generator expression (4, 6) 0.383440971375
sort groupby list comprehension (4, 6) 0.402786016464
counter (4, 6) 0.564319133759
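So it appears that the Counter solution is not the fastest. And, in this case at least, groupby is faster than Counter. defaultdict is good, but you pay a little for its convenience; a regular dict with get is slightly faster.
What happens if the list is bigger? Adding L *= 10000 to the test above and reducing the repeat count to 200: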
dict iteritems (4, 60000) 10.3451900482
dict items (4, 60000) 10.2988479137
defaultdict iteritems (4, 60000) 5.52838587761
sort groupby generator expression (4, 60000) 11.9538850784
sort groupby list comprehension (4, 60000) 12.1327362061
counter (4, 60000) 14.7495789528
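Now defaultdict is the clear winner. So perhaps the cost of the get method and the loss of the in-place add add up (an examination of the generated code is left as an exercise).
But with the modified test data, the number of unique item values did not change, so presumably dict and defaultdict have an advantage there over the other implementations. So what happens if we use the bigger list but substantially increase the number of unique items? Replacing the initialization of L with:
LL = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
L = []
for i in xrange(1,10001):
    L.extend(l * i for l in LL)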
dict iteritems (2520, 13) 17.9935798645
dict items (2520, 13) 21.8974409103
defaultdict iteritems (2520, 13) 16.8289561272
sort groupby generator expression (2520, 13) 33.853593111
sort groupby list comprehension (2520, 13) 36.1303369999
counter (2520, 13) 22.626899004
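So now Counter is clearly faster than the groupby solutions, but still slower than the iteritems versions of dict and defaultdict.
The point of these examples isn't to produce an optimal solution. The point is that there often isn't one optimal general solution. Plus there are other performance criteria. The memory requirements will differ substantially among the solutions and, as the size of the input goes up, memory requirements may become the overriding factor in algorithm selection.
Bottom line: it all depends, and you need to measure.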
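The test script above is written for Python 2. For re-running the measurement on Python 3, a rough port might look like the sketch below. The function names, the repeat count of 2000, and the use of tracemalloc to get a feel for peak memory are my own choices for illustration, not part of the original test, and no numbers are shown since they will differ per machine.

```python
import tracemalloc
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter
from timeit import timeit

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def with_dict(seq):
    c = {}
    for item in seq:
        c[item] = c.get(item, 0) + 1
    return max(c.items(), key=itemgetter(1))

def with_defaultdict(seq):
    c = defaultdict(int)
    for item in seq:
        c[item] += 1
    return max(c.items(), key=itemgetter(1))

def with_groupby(seq):
    # sorting first makes equal items adjacent, which groupby requires
    return max(((k, sum(1 for _ in g)) for k, g in groupby(sorted(seq))),
               key=itemgetter(1))

def with_counter(seq):
    return Counter(seq).most_common(1)[0]

def peak_bytes(func, seq):
    """Peak memory allocated while func(seq) runs, as traced by tracemalloc."""
    tracemalloc.start()
    func(seq)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

for func in (with_dict, with_defaultdict, with_groupby, with_counter):
    seconds = timeit(lambda: func(L), number=2000)
    print(func.__name__, func(L), f"{seconds:.4f}s", peak_bytes(func, L), "bytes peak")
```

Swapping in a larger L, or one with many distinct values, shows how the ranking shifts with the shape of the data.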
Here is a defaultdict solution that will work with Python versions 2.5 and above:
from collections import defaultdict

L = [1,2,45,55,5,4,4,4,4,4,4,5456,56,6,7,67]
d = defaultdict(int)
for i in L:
    d[i] += 1
result = max(d.iteritems(), key=lambda x: x[1])
print result
# (4, 6)
# The number 4 occurs 6 times
Note that if L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 7, 7, 7, 7, 7, 56, 6, 7, 67], then there are six 4s and six 7s. However, the result will be (4, 6), i.e. six 4s.
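If every tied item is wanted rather than just one of them, the same counting dictionary can be filtered by the highest count. A minimal Python 3 sketch (so items() instead of iteritems()); the variable names best and tied are just for illustration:

```python
from collections import defaultdict

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 7, 7, 7, 7, 7, 56, 6, 7, 67]

d = defaultdict(int)
for i in L:
    d[i] += 1

best = max(d.values())  # highest count seen
# On Python 3.7+ dicts keep insertion order, so ties come out in first-seen order.
tied = [item for item, n in d.items() if n == best]
print(tied, best)  # [4, 7] 6
```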
If you're using Python 3.8 or above, you can use either statistics.mode() to return the first mode encountered or statistics.multimode() to return all the modes.
>>> import statistics
>>> data = [1, 2, 2, 3, 3, 4]
>>> statistics.mode(data)
2
>>> statistics.multimode(data)
[2, 3]
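If the list is empty, statistics.mode() throws a statistics.StatisticsError and statistics.multimode() returns an empty list.
Note that before Python 3.8, statistics.mode() (introduced in 3.4) would additionally throw a statistics.StatisticsError if there is not exactly one most common value.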
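One way to cope with that exception is a small wrapper. This is just a sketch; the helper name safe_mode and its default parameter are invented here and are not part of the statistics module:

```python
import statistics

def safe_mode(data, default=None):
    """Return statistics.mode(data), or default if it raises StatisticsError."""
    try:
        return statistics.mode(data)
    except statistics.StatisticsError:
        # Raised for empty data; before Python 3.8 also when there is a tie.
        return default

print(safe_mode([1, 2, 2, 3, 3, 4]))  # 2 on Python 3.8+
print(safe_mode([]))                  # None
print(statistics.multimode([]))       # [] (multimode needs Python 3.8+)
```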
Perhaps the most_common() method (of collections.Counter).
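For reference, a short sketch of how that could look with the list from the question. most_common(n) returns a list of (item, count) pairs, so the single top pair is at index 0:

```python
from collections import Counter

L = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

counts = Counter(L)
item, freq = counts.most_common(1)[0]  # raises IndexError if L is empty
print(item, freq)  # 4 6
```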
A simple way without any libraries or sets:
def mcount(l):
    n = []  # To store count of each elements
    for x in l:
        count = 0
        for i in range(len(l)):
            if x == l[i]:
                count += 1
        n.append(count)
    a = max(n)  # largest in counts list
    for i in range(len(n)):
        if n[i] == a:
            return (l[i], a)  # element, frequency
    return  # if something goes wrong
I obtained the best results with groupby from the itertools module, using this function with Python 3.5.2:
from itertools import groupby

a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]

def occurrence():
    occurrence, num_times = 0, 0
    # groupby only groups consecutive equal items, so equal values must be adjacent in a
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        # compare against the best count so far, not against the previous key
        if val >= num_times:
            occurrence, num_times = key, val
    return occurrence, num_times

occurrence, num_times = occurrence()
print("%d occurred %d times which is the highest number of times" % (occurrence, num_times))
Output:
4 occurred 6 times which is the highest number of times
Tested with timeit from the timeit module.
I used this script for my test, with number = 20000:
from itertools import groupby

def occurrence():
    a = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
    occurrence, num_times = 0, 0
    for key, values in groupby(a, lambda x : x):
        val = len(list(values))
        # compare against the best count so far, not against the previous key
        if val >= num_times:
            occurrence, num_times = key, val
    return occurrence, num_times

if __name__ == '__main__':
    from timeit import timeit
    print(timeit("occurrence()", setup = "from __main__ import occurrence", number = 20000))
Output (the best one):
0.1893607140000313
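One caveat: groupby only merges equal items that sit next to each other, so the function above relies on all the 4s being adjacent in the list. For input where equal values are scattered, sorting first is one way around that. A sketch, with a made-up function name, and note that the sort adds its own cost to the timing:

```python
from itertools import groupby

def most_frequent_sorted(seq):
    """Return (item, count) for the most frequent item; (None, 0) if seq is empty."""
    best_item, best_count = None, 0
    # sorted() puts equal items next to each other so each value forms one run
    for key, run in groupby(sorted(seq)):
        count = sum(1 for _ in run)
        if count > best_count:
            best_item, best_count = key, count
    return best_item, best_count

print(most_frequent_sorted([4, 1, 4, 2, 4, 1]))  # (4, 3)
```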
Simple and best code:
def max_occ(lst, x):
    count = 0
    for i in lst:
        if i == x:
            count = count + 1
    return count

lst = [1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
x = max(lst, key=lst.count)
print(x, "occurs", max_occ(lst, x), "times")
Output: 4 occurs 6 times
If you are using numpy in your solution, for faster computation use this:
import numpy as np
x = np.array([2,5,77,77,77,77,77,77,77,9,0,3,3,3,3,3])
# np.bincount only accepts non-negative integers
y = np.bincount(x,minlength = max(x))
y = np.argmax(y)
print(y) #outputs 77
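I want to throw in another solution that looks nice and is fast for short lists.
def mc(seq=L):
    "max/count"
    max_element = max(seq, key=seq.count)
    return (max_element, seq.count(max_element))
You can benchmark this with the code provided by Ned Deily, which gives these results for the smallest test case: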
3.5.2 (default, Nov 7 2016, 11:31:36)
[GCC 6.2.1 20160830]
dict iteritems (4, 6) 0.2069783889998289
dict items (4, 6) 0.20462976200065896
defaultdict iteritems (4, 6) 0.2095775119996688
sort groupby generator expression (4, 6) 0.4473949929997616
sort groupby list comprehension (4, 6) 0.4367636879997008
counter (4, 6) 0.3618192010007988
max/count (4, 6) 0.20328268999946886
But be aware that it is inefficient (seq.count is called once for every element) and thus gets really slow for large lists!
Here is the solution I came up with for the case where several characters in a string all have the highest frequency.
mystr = input("enter string: ")
# define dictionary to store characters and their frequencies
mydict = {}
# get the unique characters
unique_chars = sorted(set(mystr), key=mystr.index)
# store the characters and their respective frequencies in the dictionary
for c in unique_chars:
    ctr = 0
    for d in mystr:
        if d != " " and d == c:
            ctr = ctr + 1
    mydict[c] = ctr
print(mydict)
# store the maximum frequency
max_freq = max(mydict.values())
print("the highest frequency of occurrence:", max_freq)
# print all characters with highest frequency
print("the characters are:")
for k, v in mydict.items():
    if v == max_freq:
        print(k)
Input: "hello people"
Output:
{'o': 2, 'p': 2, 'h': 1, ' ': 0, 'e': 3, 'l': 3}
the highest frequency of occurrence: 3
the characters are:
e
l
My (simple) code (three months studying Python):
def more_frequent_item(lst):
    new_lst = []
    times = 0
    for item in lst:
        count_num = lst.count(item)
        new_lst.append(count_num)
    times = max(new_lst)
    key = max(lst, key=lst.count)
    print("In the list: ")
    print(lst)
    print("The most frequent item is " + str(key) + ". Appears " + str(times) + " times in this list.")

more_frequent_item([1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67])
The output will be:
In the list:
[1, 2, 45, 55, 5, 4, 4, 4, 4, 4, 4, 5456, 56, 6, 7, 67]
The most frequent item is 4. Appears 6 times in this list.
Maybe something like this:
testList = [1, 2, 3, 4, 2, 2, 1, 4, 4]
print(max(set(testList), key = testList.count))
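Wrapping the list in set() means count() runs once per distinct value instead of once per element. The small sketch below illustrates that, under the assumption that the key function is called once per item passed to max(). In this sample list 2 and 4 are tied at three occurrences each, and which one comes back depends on the set's iteration order, so the winner is not hard-coded here:

```python
test_list = [1, 2, 3, 4, 2, 2, 1, 4, 4]

winner = max(set(test_list), key=test_list.count)
# 2 and 4 both occur three times; the result depends on set iteration order.
print(winner, test_list.count(winner))

# Number of count() calls made by each variant
calls_without_set = len(test_list)    # one per element
calls_with_set = len(set(test_list))  # one per distinct value
print(calls_without_set, calls_with_set)  # 9 4
```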