python - 具有相邻差异的python排序

Question

我有一个项目清单

item = [a,a,a,b,b,b,b,c,c,c,c,c,e,e,e,e,e,e]

我想用混合顺序对其进行排序，所以相邻允许最大重复两次，比如

[a,a,b,a,b,b,c,c,b,b,c,e,c,c,e,e,e,e,e]

因为没有更多的项目可以用 e 洗牌，所以 e 将保持与相邻的重复。

有什么快速的方法来排序吗？

编辑

为了清楚起见，举一个现实生活中的例子，在笔记本电脑类别中，我有 100 种来自 IBM 的产品，10 种来自 Acer 的产品，6 种来自 Apple 的产品，我想对相同的品牌进行分类以尽可能混合。

例如，我有未排序的列表

[{brand:"ibm", "id":1},{brand:"ibm", "id":2},{brand:"ibm", "id":3},{brand:"ibm", "id":4},{brand:"ibm", "id":5},{brand:"ibm", "id":6},{brand:"acer", "id":7},{brand:"acer", "id":8},{brand:"acer", "id":9},{brand:"acer", "id":10},{brand:"apple", "id":11},{brand:"apple", "id":12}]

目标结果，只要同一个品牌不相邻，比如前10个都是同一个品牌，但2-3个同品牌相邻就可以，

[{brand:"ibm", "id":1},,{brand:"acer", "id":7},{brand:"ibm", "id":2},{brand:"ibm", "id":3},{brand:"acer", "id":8},{brand:"apple", "id":12}{brand:"ibm", "id":4},{brand:"acer", "id":9},{brand:"ibm", "id":5},{brand:"ibm", "id":6},{brand:"acer", "id":10}]

最好不要使用随机，而是使用确定性排序，所以每次用户仍然看到相同的顺序，但这不是必须的，因为它可以保存到缓存中。

谢谢

score 6 · Accepted Answer

第二次编辑

好的，现在我明白了。当它真的不是那样时，你让这听起来像一个洗牌。这是一个答案，涉及更多一点。

首先我要介绍一下pprint。这只是print很好地格式化事物的一个版本：

from pprint import pprint
pprint(items)
#>>> [{'brand': 'ibm', 'id': 1},
#>>>  {'brand': 'ibm', 'id': 2},
#>>>  {'brand': 'ibm', 'id': 3},
#>>>  {'brand': 'ibm', 'id': 4},
#>>>  {'brand': 'ibm', 'id': 5},
#>>>  {'brand': 'ibm', 'id': 6},
#>>>  {'brand': 'acer', 'id': 7},
#>>>  {'brand': 'acer', 'id': 8},
#>>>  {'brand': 'acer', 'id': 9},
#>>>  {'brand': 'acer', 'id': 10},
#>>>  {'brand': 'apple', 'id': 11},
#>>>  {'brand': 'apple', 'id': 12}]

有了这个，我们开始吧。

我们要按品牌对商品进行分组：

from collections import defaultdict

brand2items = defaultdict(list)

for item in items:
    brand2items[item["brand"]].append(item)

pprint(brand2items)
#>>> {'acer': [{'brand': 'acer', 'id': 7},
#>>>           {'brand': 'acer', 'id': 8},
#>>>           {'brand': 'acer', 'id': 9},
#>>>           {'brand': 'acer', 'id': 10}],
#>>>  'apple': [{'brand': 'apple', 'id': 11}, {'brand': 'apple', 'id': 12}],
#>>>  'ibm': [{'brand': 'ibm', 'id': 1},
#>>>          {'brand': 'ibm', 'id': 2},
#>>>          {'brand': 'ibm', 'id': 3},
#>>>          {'brand': 'ibm', 'id': 4},
#>>>          {'brand': 'ibm', 'id': 5},
#>>>          {'brand': 'ibm', 'id': 6}]}

然后我们可以获取值，因为我们不关心密钥：

items_by_brand = list(brand2items.values())

pprint(items_by_brand)
#>>> [[{'brand': 'apple', 'id': 11}, {'brand': 'apple', 'id': 12}],
#>>>  [{'brand': 'ibm', 'id': 1},
#>>>   {'brand': 'ibm', 'id': 2},
#>>>   {'brand': 'ibm', 'id': 3},
#>>>   {'brand': 'ibm', 'id': 4},
#>>>   {'brand': 'ibm', 'id': 5},
#>>>   {'brand': 'ibm', 'id': 6}],
#>>>  [{'brand': 'acer', 'id': 7},
#>>>   {'brand': 'acer', 'id': 8},
#>>>   {'brand': 'acer', 'id': 9},
#>>>   {'brand': 'acer', 'id': 10}]]

现在我们要交错结果。基本思想是我们希望更频繁地从最大的池中获取，因为它需要最长的时间才能耗尽。所以每次迭代我们都想取其中最长的pop一个，只是我们不想重复。我们可以通过取两个不同的组（最大的两个）并将它们的结果交错来做到这一点。

当所有组都没有任何物品时，我们停止。

from heapq import nlargest

shufflatored = []
while any(items_by_brand):
    items1, items2 = nlargest(2, items_by_brand, key=len)

    if items1: shufflatored.append(items1.pop())
    if items2: shufflatored.append(items2.pop())

该heapq模块是一个鲜为人知但血腥辉煌的模块。事实上，通过相当多的努力，这可以通过保持items_by_brand为堆来提高效率。然而，这并不值得付出努力，因为用于处理堆的其他工具不需要keys，这需要模糊的解决方法。

就是这样了。如果你想允许加倍，你可以替换

    if items1: shufflatored.append(items1.pop())
    if items2: shufflatored.append(items2.pop())

和

    if items1: shufflatored.append(items1.pop())
    if items1: shufflatored.append(items1.pop())
    if items2: shufflatored.append(items2.pop())
    if items2: shufflatored.append(items2.pop())

！

编辑

你想要确定性的东西吗？那你为什么不这么说呢？

lst = list(range(20))

lst[::2], lst[1::2] = lst[1::2], lst[::2]

lst
#>>> [1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, 17, 16, 19, 18]

魔术，不是吗？

希望您了解这种就地交换值的方法：

a = 1
b = 2

a, b = b, a

a
#>>> 2

b
#>>> 1

嗯，lst[::2]其他的都是值吗

lst[::2]
#>>> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

并且lst[1::2]是所有其他值，

lst[1::2]
#>>> [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

所以lst[::2], lst[1::2] = lst[1::2], lst[::2]将所有其他值与其他所有值交换！

import random

items = [1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4,4]

[
    iv[1] for iv in
    sorted(
        enumerate(items),
        key=lambda iv: iv[0]+random.choice([-1, 1])
    )
]

#>>> [1, 1, 2, 1, 2, 2, 3, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]

[
    iv[1] for iv in
    sorted(
        enumerate(range(20)),
        key=lambda iv: iv[0]+random.choice([-1, 1])
    )
]
#>>> [0, 2, 1, 4, 3, 5, 6, 7, 9, 8, 11, 10, 12, 14, 13, 15, 17, 16, 18, 19]

这是随机洗牌，因此第一个列表不会显示大部分洗牌。选择的结果是手工挑选的所有可能性。

基本上，这个算法需要一个列表并索引它：

  items a b c d e f g h i j
indexes 0 1 2 3 4 5 6 7 8 9

然后它按索引 + 随机选择排序[-1, 1]：

  items a b c d e f g h i j
indexes 0 1 2 3 4 5 6 7 8 9
sort by 1 0 3 2 5 4 5 6 9 8

并导致

  items b a d c f e g h j i
indexes 1 0 3 2 5 4 6 7 9 8
sort by 0 1 2 3 4 5 5 6 8 9

它是洗牌的。要改变 shuffle 的类型，比如让它或多或少地 shuffle，改变 list 的细节[-1, 1]。您也可以尝试[-1, 0, 1],[0, 1]和其他变体。

算法步骤：

indexed = enumerate(items)

shuffled = sorted(indexed, key=lambda iv: iv[0]+random.choice([-1, 1]))

# Remove the index, extract the values out again
result = [iv[1] for iv in shuffled]

现在，效率。

如果您非常精明，您可能会意识到排序是传统的O(n log n)。Python 使用了 TimSort，一个很棒的排序算法。尽管任何比较排序（也称为比较值的排序）都必须具有至少的上限O(n log n)，但它们也可以具有低至O(n)！

这是因为只要您检查它是否已排序，对已经排序的列表进行排序是微不足道的。TimSort 有一个本地化的“排序”概念，它会很快检测到值何时排序。这意味着因为它们只是稍微打乱了，所以 TimSort 会执行更接近列表的“打乱”的O(kn)地方k，这比log n!

python - 具有相邻差异的python排序

编辑

1 回答 1

第二次编辑

编辑

Related

Reference