python - Slicing a list into n nearly-equal-length partitions

Question

I'm looking for a fast, clean, pythonic way to divide a list into exactly n nearly-equal partitions.

partition([1,2,3,4,5],5)->[[1],[2],[3],[4],[5]]
partition([1,2,3,4,5],2)->[[1,2],[3,4,5]] (or [[1,2,3],[4,5]])
partition([1,2,3,4,5],3)->[[1,2],[3,4],[5]] (there are other ways to slice this one too)

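Several answers under "Iteration over list slices" come very close to what I want, except that they focus on the size of each list, whereas I care about the number of lists (some of them also pad with None). These are trivially converted, obviously, but I'm looking for a best practice.

Similarly, people have pointed out great solutions under "How do you split a list into evenly sized chunks?" for a very similar problem, but I'm more interested in the number of partitions than in the specific size, as long as the sizes are within 1 of each other. Again, this is trivially convertible, but I'm looking for a best practice.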
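
The requirement above can be written down as a small check: exactly n parts, the same total number of elements, and part lengths that differ by at most one. A minimal sketch, assuming Python 3; the helper name `is_balanced` is made up here for illustration and does not come from the question or any answer:

```python
def is_balanced(parts, lst, n):
    """Return True if parts looks like a split of lst into n near-equal pieces."""
    lengths = [len(p) for p in parts]
    return (
        len(parts) == n                        # exactly n partitions
        and sum(lengths) == len(lst)           # total element count matches
        and max(lengths) - min(lengths) <= 1   # sizes within 1 of each other
    )


print(is_balanced([[1, 2], [3, 4], [5]], [1, 2, 3, 4, 5], 3))  # True
print(is_balanced([[1, 2, 3, 4], [5]], [1, 2, 3, 4, 5], 2))    # False
```

The check only compares counts, so it accepts any ordering of the elements; it also assumes n is at least 1.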

score 32 · Accepted Answer

Just a different take, which only works if [[1,3,5],[2,4]] is an acceptable partition in your example.

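def partition(lst, n):
    # Part i takes every n-th element, starting at index i.
    # range() is used instead of xrange() so this runs on Python 2 and 3.
    return [lst[i::n] for i in range(n)]

This satisfies the example mentioned in @Daniel Stutzbach's answer:

partition(list(range(105)), 10)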
# [[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
# [1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 101],
# [2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102],
# [3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103],
# [4, 14, 24, 34, 44, 54, 64, 74, 84, 94, 104],
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
# [6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
# [7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
# [8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
# [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]]
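
To make the size claim concrete, here is a Python 3 sketch of the same stride-slicing idea together with a look at the part lengths. The name `partition_strided` is chosen here for illustration only. Note that neighbouring input elements land in different parts:

```python
def partition_strided(lst, n):
    # Part i takes elements i, i + n, i + 2n, ...
    return [lst[i::n] for i in range(n)]


parts = partition_strided(list(range(105)), 10)
print([len(p) for p in parts])
# [11, 11, 11, 11, 11, 10, 10, 10, 10, 10]
print(partition_strided([1, 2, 3, 4, 5], 2))
# [[1, 3, 5], [2, 4]]
```

The first `len(lst) % n` parts get one extra element, so the sizes never differ by more than one.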

score 29 · Accepted Answer

Here's a version that's similar to Daniel's: it divides as evenly as possible, but puts all the larger partitions at the start:

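def partition(lst, n):
    # q is the base partition size; the first r partitions get one extra item.
    q, r = divmod(len(lst), n)
    # n + 1 slice points; range() runs on both Python 2 and 3.
    indices = [q*i + min(i, r) for i in range(n+1)]
    return [lst[indices[i]:indices[i+1]] for i in range(n)]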

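It also avoids the use of floating-point arithmetic, since that always makes me uncomfortable. :)

Edit: an example, just to show the contrast with Daniel Stutzbach's solution

>>> print([len(x) for x in partition(range(105), 10)])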
[11, 11, 11, 11, 11, 10, 10, 10, 10, 10]
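
The slice points behind that result can be inspected directly. A small sketch, with the index computation pulled out into a helper named `slice_points` (a name invented here, not part of the answer):

```python
def slice_points(length, n):
    # q is the base part size; the first r parts get one extra element.
    q, r = divmod(length, n)
    return [q * i + min(i, r) for i in range(n + 1)]


print(slice_points(105, 10))
# [0, 11, 22, 33, 44, 55, 65, 75, 85, 95, 105]
print(slice_points(5, 3))
# [0, 2, 4, 5]
```

For 105 items and 10 parts, q is 10 and r is 5, so the gaps are 11 for the first five parts and 10 for the rest.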

score 23 · Accepted Answer

def partition(lst, n):
    division = len(lst) / float(n)
    return [ lst[int(round(division * i)): int(round(division * (i + 1)))] for i in xrange(n) ]

>>> partition([1,2,3,4,5],5)
[[1], [2], [3], [4], [5]]
>>> partition([1,2,3,4,5],2)
[[1, 2, 3], [4, 5]]
>>> partition([1,2,3,4,5],3)
[[1, 2], [3, 4], [5]]
>>> partition(range(105), 10)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]

Python 3 version:

def partition(lst, n):
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))] for i in range(n)]
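
One thing to keep in mind when comparing the two versions: Python 3's built-in `round` sends exact halves to the nearest even integer, while Python 2 rounds them away from zero. The split points can therefore differ from the Python 2 session above, although the parts stay within one element of each other in these examples. A short sketch (the name `partition_rounded` is used here only to keep it apart from the answer's function):

```python
def partition_rounded(lst, n):
    division = len(lst) / n
    return [lst[round(division * i):round(division * (i + 1))]
            for i in range(n)]


print(round(2.5))                             # 2 (ties go to the even integer)
print(partition_rounded([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4, 5]]
print([len(p) for p in partition_rounded(list(range(105)), 10)])
# [10, 11, 11, 10, 10, 11, 11, 10, 10, 11]
```

Both [[1, 2], [3, 4, 5]] and [[1, 2, 3], [4, 5]] are listed as acceptable in the question, so the difference is cosmetic here.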

score 4 · Accepted Answer

Here's one way to do it.

def partition(lst, n):
    increment = len(lst) / float(n)
    last = 0
    i = 1
    results = []
    while last < len(lst):
        idx = int(round(increment * i))
        results.append(lst[last:idx])
        last = idx
        i += 1
    return results

If len(lst) cannot be evenly divided by n, this version distributes the extra items at roughly equal intervals. For example:

>>> print [len(x) for x in partition(range(105), 10)]
[11, 10, 11, 10, 11, 10, 11, 10, 11, 10]

If you don't mind all of the 11s being at the beginning or the end, the code could be simpler.
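
As a side note, a similar spreading of the extra items can be obtained with integer arithmetic only, by taking slice point i to be the floor of i * len(lst) / n. This is a different technique from the loop above, sketched here for Python 3 under the assumption that n is at least 1; `partition_spread` is a name invented for illustration:

```python
def partition_spread(lst, n):
    # Slice point i is floor(i * len(lst) / n), computed with integers only.
    size = len(lst)
    points = [i * size // n for i in range(n + 1)]
    return [lst[points[i]:points[i + 1]] for i in range(n)]


print([len(p) for p in partition_spread(list(range(105)), 10)])
# [10, 11, 10, 11, 10, 11, 10, 11, 10, 11]
```

Consecutive floors differ by either the floor or the ceiling of len(lst) / n, so the part sizes stay within one of each other and the result always has exactly n parts.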

score 0 · Accepted Answer

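This answer provides a function split(list_, n, max_ratio) for people who want to split their list into n pieces whose lengths differ by at most a ratio of max_ratio. It allows for more variation than the asker's "at most 1 difference in piece length".

It works by sampling n piece lengths within the desired ratio range [1, max_ratio) and placing them one after another to form a "broken stick" that has the right distances between its "break points" but the wrong total length. Scaling the broken stick to the desired length gives the approximate positions of the break points we want. To get integer break points, a subsequent rounding step is needed.

Unfortunately, the rounding can conspire to make pieces too short and push you past max_ratio. See the bottom of this answer for an example.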

import random

def splitting_points(length, n, max_ratio):
    """n+1 slice points [0, ..., length] for n random-sized slices.

    max_ratio is the largest allowable ratio between the largest and the
    smallest part.
    """
    ratios = [random.uniform(1, max_ratio) for _ in range(n)]
    normalized_ratios = [r / sum(ratios) for r in ratios]
    cumulative_ratios = [
        sum(normalized_ratios[0:i])
        for i in range(n+1)
    ]
    scaled_distances = [
        int(round(r * length))
        for r in cumulative_ratios
    ]

    return scaled_distances


def split(list_, n, max_ratio):
    """Slice a list into n randomly-sized parts.

    max_ratio is the largest allowable ratio between the largest and the
    smallest part.
    """

    # Pass max_ratio through; the original referenced an undefined name `ratio`.
    points = splitting_points(len(list_), n, max_ratio)

    return [
        list_[ points[i] : points[i+1] ]
        for i in range(n)
    ]

You can try it out like so:

for _ in range(10):
    parts = split('abcdefghijklmnopqrstuvwxyz', 4, 2)
    print([(len(part), part) for part in parts])

Example of a bad result:

parts = split('abcdefghijklmnopqrstuvwxyz', 10, 2)

# lengths range from 1 to 4, not 2 to 4
[(3, 'abc'),  (3, 'def'), (1, 'g'),
 (4, 'hijk'), (3, 'lmn'), (2, 'op'),
 (2, 'qr'),  (3, 'stu'),  (2, 'vw'),
 (3, 'xyz')]
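
The achieved ratio of a given split can be measured after the fact. A minimal sketch, assuming Python 3 and that no part is empty (an empty part would cause a division by zero); `length_ratio` is a hypothetical helper, not part of the answer's code:

```python
def length_ratio(parts):
    """Ratio between the longest and the shortest part."""
    lengths = [len(p) for p in parts]
    return max(lengths) / min(lengths)


# The parts from the bad result shown above.
bad = ['abc', 'def', 'g', 'hijk', 'lmn', 'op', 'qr', 'stu', 'vw', 'xyz']
print(length_ratio(bad))  # 4.0, above the requested max_ratio of 2
```

A caller who needs the bound to hold strictly could resample until `length_ratio(parts) <= max_ratio`, at the cost of an unpredictable number of retries.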
