3

For example, the list to_be contains: 3 of "a", 4 of "b", 3 of "c", 5 of "d"...

to_be = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d", ...]

Now I want it to look like this:

done = ["a", "b", "c", "d", ... , "a", "b", "c", "d", ... , "b", "d", ...] (note: some items occur more often than others, but the result must still follow a pre-defined order, alphabetical for example)

What is the fastest way to do this?

4

5 Answers

12

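Assuming I understand what you want, it can be done relatively easily by combining itertools.zip_longest, itertools.groupby and itertools.chain.from_iterable().

We first group the items (the "a"s, the "b"s, and so on), zip the groups to get them in the order you want (one from each group), use chain to produce a single list, and then remove the None values introduced by zipping.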

>>> [item for item in itertools.chain.from_iterable(itertools.zip_longest(*[list(x) for _, x in itertools.groupby(to_be)])) if item]
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']

You may want to split out some of the list comprehensions to make it more readable, however:

>>> groups = itertools.zip_longest(*[list(x) for _, x in itertools.groupby(to_be)])
>>> [item for item in itertools.chain.from_iterable(groups) if item]
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']
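To make the three stages easier to follow, here is a minimal sketch that keeps each intermediate value. The names runs, rounds and done are only illustrative, and it assumes the input is already sorted so that equal items are adjacent:

```python
import itertools

to_be = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d"]

# Step 1: one list per run of equal items (relies on the input being sorted).
runs = [list(group) for _, group in itertools.groupby(to_be)]

# Step 2: take one item from each run per round; shorter runs are padded with None.
rounds = list(itertools.zip_longest(*runs))

# Step 3: flatten the rounds into a single list and drop the padding.
done = [item for item in itertools.chain.from_iterable(rounds) if item is not None]

print(rounds[3])  # the fourth round, where "a" and "c" have run out
print(done)
```

The fourth round is where the padding first shows up, since the "a" and "c" runs only have three items each.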

(The version given is for 3.x; for 2.x, you will want izip_longest().)

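As always, if you expect empty strings, 0, and so on, you will want to test if item is not None instead, and if you need to keep None values intact, create a sentinel object and check for identity against it.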
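A sketch of the sentinel variant might look like the following. The function name interleave_runs and the variable missing are made up for illustration; the only library feature relied on is the fillvalue argument of itertools.zip_longest:

```python
import itertools

def interleave_runs(items):
    """Round-robin the runs of equal items, keeping falsy values and None."""
    missing = object()  # unique sentinel; it cannot be equal to any real item
    runs = [list(group) for _, group in itertools.groupby(items)]
    rounds = itertools.zip_longest(*runs, fillvalue=missing)
    # Compare by identity so that 0, "" and None all survive the filter.
    return [item for item in itertools.chain.from_iterable(rounds)
            if item is not missing]

result = interleave_runs([0, 0, None, None, None, "", ""])
print(result)
```

Because the padding is a fresh object(), real None values in the data are kept instead of being filtered out.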

You can also use the roundrobin() recipe given in the documentation as an alternative to zipping, which makes it as simple as:

>>> list(roundrobin(*[list(x) for _, x in itertools.groupby(to_be)]))
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']
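roundrobin() is not importable from itertools; it has to be copied into your own code. The sketch below is modeled on one version of the recipe from the itertools documentation (the exact recipe text differs between Python versions, so treat this as an approximation rather than the canonical source):

```python
from itertools import cycle, groupby, islice

def roundrobin(*iterables):
    """roundrobin('ABC', 'D', 'EF') yields A D E B F C."""
    num_active = len(iterables)
    # Cycle over the bound __next__ methods of each iterator.
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next_item in nexts:
                yield next_item()
        except StopIteration:
            # The iterator that just ran out is dropped from the cycle.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

to_be = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d"]
done = list(roundrobin(*[list(x) for _, x in groupby(to_be)]))
print(done)
```

Unlike the zip_longest() version, nothing is padded, so there is no None or sentinel to filter out afterwards.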

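As a final note, the observant may notice that I am making lists from the groupby() generator, which may seem wasteful. The reason comes from the docs:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.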
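A small sketch of what goes wrong without the list() call; the variable names are illustrative only:

```python
import itertools

data = ["a", "a", "b", "b", "b"]

# Eager: each group is copied into a list before groupby() advances.
eager = [list(group) for _, group in itertools.groupby(data)]

# Lazy: the group iterators are collected first and only read afterwards,
# by which point groupby() has already moved past them.
stale = [group for _, group in itertools.groupby(data)]
first_stale_group = list(stale[0])

print(eager)              # the groups as expected
print(first_stale_group)  # empty: the first group's items are no longer reachable
```

This is why passing the raw groups straight to zip_longest() does not work: unpacking them with * advances groupby() to the end before any group is read.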

answered 2012-11-15T04:05:44.163
2
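import collections

to_be = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d"]
counts = collections.Counter(to_be)
answer = []
while counts:
    # Emit one of each remaining item, in sorted order.
    answer.extend(sorted(counts))
    for k in counts:
        counts[k] -= 1
    # Drop exhausted items (items() is the Python 3 spelling of iteritems()).
    counts = {k: v for k, v in counts.items() if v > 0}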

Now, answer looks like this:

['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']
answered 2012-11-15T04:07:39.463
1

I'm not sure whether this is the fastest, but here is my attempt:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> def sort_key(a):
...     d[a] += 1
...     return d[a],a
...

>>> sorted(to_be,key=sort_key)
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']

Wrapped in a function:

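from collections import defaultdict

def weird_sort(x):
    d = defaultdict(int)  # occurrences of each item seen so far
    def sort_key(a):
        d[a] += 1
        # Sort by occurrence number first, then by the item itself.
        return (d[a], a)
    return sorted(x, key=sort_key)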

Of course, this requires the elements of the iterable to be hashable.
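To see why this key produces the round-robin order, it can help to look at the keys themselves. The helper occurrence_keys below is hypothetical and only mirrors what sort_key computes for each element, in input order:

```python
from collections import defaultdict

def occurrence_keys(items):
    """Pair every item with its running occurrence number."""
    seen = defaultdict(int)
    keys = []
    for item in items:
        seen[item] += 1
        keys.append((seen[item], item))
    return keys

keys = occurrence_keys(["a", "a", "b", "b", "b", "c"])
# Sorting the (occurrence, item) tuples groups all first occurrences,
# then all second occurrences, and so on, each group in item order.
ordered = [item for _, item in sorted(keys)]

print(keys)
print(ordered)
```

The counter is a dict keyed by the items, which is where the hashability requirement comes from.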

answered 2012-11-15T04:07:45.227
0

A little more elegant than Lattyware's:

import collections
def rearrange(l):
    counts = collections.Counter(l)
    output = []
    while sum(counts.values()) > 0:
        # Emit one of each item that still has occurrences left.
        output.extend(sorted(k for k, v in counts.items() if v > 0))
        for k in counts:
            counts[k] = counts[k] - 1 if counts[k] > 0 else 0
    return output
answered 2012-11-15T04:08:03.643
0

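Doing this "by hand, with a state machine" should be more efficient, but for relatively small lists (<5000) you should have no trouble leaning on Python's strengths to do it: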

to_be = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d","e", "e"]


def do_it(lst):
    lst = lst[:]
    result = []

    while True:
        group = set(lst)
        result.extend(sorted(group))
        for element in group:
            del lst[lst.index(element)]
        if not lst:
            break
    return result

done = do_it(to_be)

The "big O" complexity of the function above should be very large. I haven't tried to work it out.

answered 2012-11-15T04:08:07.590