algorithm - 在哪里使用哪种排序算法？

Question

有多种排序算法可用。时间复杂度为 O(n^2) 的排序算法可能更适合 O(nlogn)，因为它是就地的或稳定的。例如：

对于有些排序的东西，插入排序很好。
对几乎排序的数组应用快速排序是愚蠢的。
堆排序在 O(nlogn) 时很好，但不稳定。
合并排序不能用于嵌入式系统，因为在最坏的情况下它需要 O(n) 的空间复杂度。

我想知道哪种排序算法适合什么条件。

哪种排序算法最适合按字母顺序对名称进行排序？
哪种排序算法最适合排序更少的整数？
哪种排序算法最适合排序较少但范围可能很大（98767 - 6734784）的整数？
哪种排序算法最适合对数十亿个整数进行排序？
哪种排序算法最适合在空间和时间都是约束的嵌入式系统或实时系统中进行排序？

请为这些类型的比较建议这些/其他情况、书籍或网站。

score 2 · Accepted Answer

好吧，没有灵丹妙药 - 但这里有一些经验法则：

当元素的范围（让它成为U）与元素的数量（）相比相对较小时，基数排序/计数排序通常很好U<<n（可能适合您的情况 2,4）
插入排序适用于小型（例如n<30）列表，甚至比O(nlogn)算法（经验上）更快。实际上，您可以通过在以下O(nlogn)情况下切换到插入排序来优化自顶向下算法n<30
基数排序的变体也可能是按字母顺序排序字符串的不错选择，因为它是O(|S|*n)，而正常的基于比较的算法是O(|S|*nlogn) [where |S|is the length of your string]。（适合您的情况 1）
在排序输入非常大，太大而无法合并的情况下，使用外部排序的方法是 - 这是一种变体或合并排序，它可以最大限度地减少磁盘读取/写入的次数，并确保这些是按顺序完成的- 因为它极大地提高了性能。（可能适合案例 4）
对于一般大小写排序，快速排序和 timsort（用于 java）提供了良好的性能。

score 0 · Accepted Answer

Merge sort can't be used in embedded systems as in worst case it requires O(n) of space complexity.

You may be interested in the stable_sort function from C++. It tries to allocate the extra space for a regular merge sort, but if that fails it does an in-place stable merge sort with inferior time complexity (n * ((log n)^2) instead of n * (log n)). If you can read C++ you can look at the implementation in your favourite standard library, otherwise I expect you can find the details explained somewhere in language-agnostic terms.

There's a body of academic literature about in-place stable sorting (and in particular in-place merging).

So in C++ the rule of thumb is easy, "use std::stable_sort if you need a stable sort, otherwise use std::sort". Python makes it even easier again, the rule of thumb is "use sorted".

In general, you will find that a lot of languages have fairly clever built-in sort algorithms, and you can use them most of the time. It's rare that you'll need to implement your own to beat the standard library. If you do need to implement your own, there isn't really any substitute for pulling out the textbooks, implementing a few algorithms with as many tricks as you can find, and testing them against each other for the specific case you're worried about for which you need to beat the library function.

Most of the "obvious" advice that you might be hoping for in response to this question is already incorporated into the built-in sort functions of one or more common programming languages. But to answer your specific questions:

Which sorting algo is best for sorting names in alphabetical order?

A radix sort might edge out standard comparison sorts like C++ sort, but that might not be possible if you're using "proper" collation rules for names. For example, "McAlister" used to be alphabetized the same as "MacAlister", and "St. John" as "Saint John". But then programmers came along and wanted to just sort by ASCII value rather than code a lot of special rules, so most computer systems don't use those rules any more. I find Friday afternoon is a good time for this kind of feature ;-) You can still use a radix sort if you do it on the letters of the "canonicalized" name rather than the actual name.

"Proper" collation rules in languages other than English are also entertaining. For example in German "Grüber" sorts like "Grueber", and therefore comes after "Gruber" but before "Gruhn". In English the name "Llewellyn" comes after "Lewis", but I believe in Welsh (using the exact same alphabet but different traditional collation rules) it comes before.

For that reason, it's easier to talk about optimizing string sorts than it is to actually do it. Sorting strings "properly" requires being able to plug in locale-specific collation rules, and if you move away from a comparison sort then you might have to re-write all your collation code.

Which sorting algo is best for sorting less integers?

For a small number of small values maybe a counting sort, but Introsort with a switch to insertion sort when the data gets small enough (20-30 elements) is pretty good. Timsort is especially good when the data isn't random.

Which sorting algo is best for sorting less integers but may be large in range (98767 – 6734784)?

The large range rules out counting sort, so for a small number of widely-ranged integers, Introsort/Timsort.

Which sorting algo is best for sorting billions of integers?

If by "billions" you mean "too many to fit in memory" then that changes the game a bit. Probably you want to divide the data into chunks that do fit in memory, Intro/Tim sort each one, then do an external merge. Of if you're on a 64 bit machine sorting 32 bit integers, you could consider counting sort.

Which sorting algo is best for sorting in embedded systems or real time systems where space and time both are constraints?

Probably Introsort.

For somewhat sorted things insertion sort is good.

True, and Timsort takes advantage of the same situation.

Applying quick sort on nearly sorted array is foolishness.

False. Nobody uses the plain QuickSort originally published by Hoare, you can make better choices of pivot that make the killer cases much less obvious than "sorted data". To deal with the bad cases thoroughly there is Introsort.

Heap sort is good with O(nlogn) but not stable.

True, but Introsort is better (and also not stable).

Merge sort can't be used in embedded systems as in worst case it requires O(n) of space complexity.

Handle this by allowing for somewhat slower in-place merging like std::stable_sort does.

algorithm - 在哪里使用哪种排序算法？

2 回答 2

Related

Reference