python - collections.Counter: most_common including equal counts

Question

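In collections.Counter, the method most_common(n) returns only the n most frequent items in a list. That is exactly what I need, but I also need to include any items whose counts are tied with the last one returned.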

from collections import Counter
test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
-->Counter({'A': 3, 'C': 2, 'B': 2, 'D': 2, 'E': 1, 'G': 1, 'F': 1, 'H': 1})
test.most_common(2)
-->[('A', 3), ('C', 2)]

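I need [('A', 3), ('B', 2), ('C', 2), ('D', 2)], because in this case B, C and D all have the same count as the item at n=2. My real data is DNA code and can be very large, so I need this to be reasonably efficient.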

score 10 · Accepted Answer

You can do it like this:

from collections import Counter
from itertools import takewhile

def get_items_upto_count(dct, n):
  data = dct.most_common()
  val = data[n-1][1]  # count of the n-th most common item (index n-1)
  # Now collect all items whose count is greater than or equal to `val`.
  # `data` is sorted by count, so takewhile can stop at the first smaller one.
  return list(takewhile(lambda x: x[1] >= val, data))

test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])

print(get_items_upto_count(test, 2))
# [('A', 3), ('C', 2), ('B', 2), ('D', 2)]  (order among ties may vary by Python version)
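
If sorting the whole counter is a concern for very large inputs, one possible variation is to find the cut-off count with heapq.nlargest and then filter on it. This is only a minimal sketch and not part of the answer above: the name most_common_with_ties is hypothetical, and it assumes the keys can be compared with each other (as strings can) so that tied items come out in a predictable order.

```python
import heapq
from collections import Counter

def most_common_with_ties(counter, n):
    """Return the n most common items plus any items tied with the n-th count."""
    if n <= 0 or not counter:
        return []
    # The smallest of the n largest counts is the cut-off.
    # nlargest avoids sorting every count when n is small.
    threshold = heapq.nlargest(n, counter.values())[-1]
    # Keep every item at or above the cut-off.
    kept = [(item, count) for item, count in counter.items() if count >= threshold]
    # Order by count (descending), then by key (assumes comparable keys).
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept

test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
print(most_common_with_ties(test, 2))
# [('A', 3), ('B', 2), ('C', 2), ('D', 2)]
```

If n is larger than the number of distinct items, the cut-off becomes the smallest count and every item is returned.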

score 0 · Accepted Answer

For smaller collections, just write a simple generator:

>>> test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
>>> g=(e for e in test.most_common() if e[1]>=2)
>>> list(g)
[('A', 3), ('D', 2), ('C', 2), ('B', 2)]

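For larger collections, use ifilter from itertools (or just the built-in filter on Python 3):

>>> from itertools import ifilter  # Python 2 only; on Python 3 use filter
>>> list(ifilter(lambda t: t[1]>=2, test.most_common()))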
[('A', 3), ('C', 2), ('B', 2), ('D', 2)]

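Or, since the result of most_common is already sorted, just use a for loop inside a generator and break as soon as the desired condition no longer holds: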

def fc(d, f):
    for t in d.most_common():
        if not f(t[1]): 
            break
        yield t

>>> list(fc(test, lambda e: e>=2)) 
[('A', 3), ('B', 2), ('C', 2), ('D', 2)]
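
The examples in this answer hard-code the count threshold 2, whereas the question starts from n. As a hedged sketch of how the two could be connected, the wrapper below looks up the count of the n-th item and passes it to the fc generator. The name top_n_with_ties is hypothetical and not from the answer.

```python
from collections import Counter

def fc(d, f):
    # Yield (item, count) pairs in descending count order until f fails.
    for t in d.most_common():
        if not f(t[1]):
            break
        yield t

def top_n_with_ties(d, n):
    """Hypothetical wrapper: derive the cut-off count from n, then reuse fc."""
    if n <= 0:
        return []
    ranked = d.most_common(n)
    if not ranked:
        return []
    cutoff = ranked[-1][1]  # count of the n-th most common item
    return list(fc(d, lambda count: count >= cutoff))

test = Counter(["A","A","A","B","B","C","C","D","D","E","F","G","H"])
print(top_n_with_ties(test, 2))
```

The order of tied items in the output follows whatever order most_common gives them, which depends on the Python version.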
