python - How can I vectorize this Python counting sort so it is as fast as possible?

Question

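I am trying to write a counting sort in Python that beats the built-in Timsort in certain situations. Right now it outperforms the built-in sorted function, but only for very large arrays (1 million integers or longer; I haven't tried more than 10 million) and only when the range of values is no larger than 10,000. Even then the win is narrow: the counting sort only wins by a significant margin on random lists tailored specifically to it.

I have read that vectorizing Python code can give astounding performance gains, but I don't really understand how to do it or how it applies here. I would like to know how to vectorize this code to speed it up, and any other performance suggestions are welcome.

Current fastest version using only Python and the stdlib: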

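from itertools import chain, repeat

def untimed_countsort(unsorted_list):
    # Tally how many times each value occurs.
    counts = {}
    for num in unsorted_list:
        try:
            counts[num] += 1
        except KeyError:
            counts[num] = 1

    # Emit each value in ascending order, repeated by its count.
    # counts.get(num, 0) skips values in the range that never occur;
    # range is lazy on Python 3 (use xrange on Python 2).
    sorted_list = list(
        chain.from_iterable(
            repeat(num, counts.get(num, 0))
            for num in range(min(counts), max(counts) + 1)))
    return sorted_list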

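Raw speed is all that counts here, so trading more space for speed is completely fair game.
I realize the code is already fairly short and clear, so I don't know how much room there is left to make it faster.
If anyone has a change that makes the code shorter without making it slower, that would be great too.
Execution time is down by nearly 80%! It is now three times as fast as Timsort in my current tests!

The absolute fastest way to do this, by a long shot, is this one-liner with numpy: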

import numpy

def np_sort(unsorted_np_array):
    return numpy.repeat(numpy.arange(1+unsorted_np_array.max()), numpy.bincount(unsorted_np_array))

This runs about 10-15 times faster than the pure Python version, and about 40 times faster than Timsort. It takes a numpy array as input and returns a numpy array.
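
The speedups quoted above depend on the machine, the input size, and the value range, so they are worth measuring locally. Below is a minimal timing sketch, not the asker's original benchmark. The helper name `compare_timings` and its parameters (`size`, `upper`, `repeats`, `seed`) are hypothetical, and `numpy.sort` is included only as an extra baseline that the question does not mention.

```python
import timeit

import numpy as np


def np_sort(unsorted_np_array):
    # Same one-liner as in the question: count each value, then expand the counts.
    return np.repeat(np.arange(1 + unsorted_np_array.max()),
                     np.bincount(unsorted_np_array))


def compare_timings(size=1_000_000, upper=10_000, repeats=3, seed=0):
    """Return the best-of-`repeats` wall time in seconds for each approach."""
    rng = np.random.default_rng(seed)
    data = rng.integers(0, upper, size=size)  # non-negative ints in [0, upper)
    as_list = data.tolist()                   # plain list for the built-in sorted()

    # Sanity check: the counting sort must agree with a reference sort.
    assert np.array_equal(np_sort(data), np.sort(data))

    def best(func):
        return min(timeit.repeat(func, number=1, repeat=repeats))

    return {
        "sorted(list)": best(lambda: sorted(as_list)),
        "np_sort(array)": best(lambda: np_sort(data)),
        "numpy.sort(array)": best(lambda: np.sort(data)),
    }


if __name__ == "__main__":
    for name, seconds in compare_timings().items():
        print(f"{name:>18}: {seconds:.4f} s")
```

Note that `sorted` is timed on a Python list and `np_sort` on a numpy array, so the comparison leaves out any cost of converting between the two.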

score 9 · Accepted Answer

With numpy, this function can be reduced to the following:

import numpy

def countsort(unsorted):
    # Input must hold non-negative integers (a numpy.bincount requirement).
    unsorted = numpy.asarray(unsorted)
    return numpy.repeat(numpy.arange(1+unsorted.max()), numpy.bincount(unsorted))

When I tried it on 100,000 random integers from the interval [0, 10000), it ran about 40 times faster. bincount does the counting, and repeat converts the counts into a sorted array.
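
To make the bincount/repeat step concrete, here is a small worked sketch. It also handles negative values by shifting the data so the minimum maps to bin 0, since `numpy.bincount` only accepts non-negative integers. The function name `countsort_offset` is hypothetical and the offset trick is an addition, not part of the answer above.

```python
import numpy as np


def countsort_offset(unsorted):
    """Counting sort for integer data that may include negative values."""
    values = np.asarray(unsorted)
    if values.size == 0:
        return values.copy()
    lowest = values.min()
    # Shift so the smallest value lands in bin 0; counts[i] is then the
    # number of occurrences of (lowest + i).
    counts = np.bincount(values - lowest)
    # Repeat each candidate value by its count; values with a count of 0 vanish.
    return np.repeat(np.arange(lowest, lowest + counts.size), counts)


data = [3, -1, 0, 3, 2, -1, 3]
shifted = np.asarray(data) - min(data)  # [4 0 1 4 3 0 4]
counts = np.bincount(shifted)           # [2 1 0 1 3]
result = countsort_offset(data)         # [-1 -1  0  2  3  3  3]
```

The size of the counts array grows with the spread between the smallest and largest value, which is why this approach only pays off when the value range is small relative to the number of elements.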

score 1 · Answer

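Leaving your algorithm aside, it helps to get rid of most of your pure Python loops (which are quite slow) and turn them into comprehensions or generators (generally faster than regular for blocks). Also, if you need to build a list made up of identical elements, the [x]*n syntax is probably the fastest way to do it. sum is used to flatten the list of lists.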

from collections import defaultdict

def countsort(unsorted_list):
    lmin, lmax = min(unsorted_list), max(unsorted_list) + 1
    counts = defaultdict(int)
    for j in unsorted_list:
        counts[j] += 1
    # sum needs an empty-list start value to concatenate lists
    # (use xrange instead of range on Python 2).
    return sum([[num]*counts[num] for num in range(lmin, lmax) if num in counts], [])

Note that this is not vectorized, nor does it use numpy.
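
One caveat with flattening through sum: each addition copies the list accumulated so far, so the cost can grow quadratically with the output length. A possible stdlib-only alternative, sketched here under the assumption of Python 3, keeps the [x]*n idea but flattens with itertools.chain in a single pass. The name `countsort_chain` is hypothetical, and using `collections.Counter` in place of the explicit counting loop is a swapped-in choice rather than the answerer's code.

```python
from collections import Counter
from itertools import chain


def countsort_chain(unsorted_list):
    """Pure-Python counting sort that flattens in one linear pass."""
    if not unsorted_list:
        return []
    # Counter tallies every value; looking up a missing key returns 0
    # without inserting it, so gaps in the range yield empty lists.
    counts = Counter(unsorted_list)
    return list(chain.from_iterable(
        [num] * counts[num]
        for num in range(min(counts), max(counts) + 1)))


print(countsort_chain([5, 3, 3, 9, 5]))  # [3, 3, 5, 5, 9]
```

Whether this is faster than the explicit loop depends on the interpreter version and the data, so it is worth timing on your own inputs.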
