Another possibility to consider (there's a code sketch after this list):
- Start doing a (manual) insertion sort
- As you go, count the number of inversions you perform
- After you've done some small fixed number of insertions, compare the number of inversions you counted against the maximum number that would have occurred if the data had been reverse-sorted up to that point (after insertion-sorting the first k elements that maximum is k(k-1)/2, since the i-th element inserted can move past at most i-1 others):
- If the ratio is close to 0, then (probably) the data is nearly sorted. Finish the insertion sort, which performs extremely well on nearly-sorted data. If you don't like the sound of "probably", then keep counting inversions as you go, and be ready to fall back to Timsort if the count drifts above some threshold.
- If the ratio is close to 1, then (probably) the data is nearly reverse-sorted, and you have a small number of sorted elements at the start. Move them to the end, reverse them, and finish the insertion sort with a reversed comparator.
- Otherwise the data is random; use your favourite sorting algorithm. I'd say Timsort, but since that does well on nearly-sorted data there must be some other algorithm that does at least a tiny bit better than Timsort on uniformly-shuffled data. Probably plain merge sort without the Tim.
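For concreteness, here is a minimal Python sketch of the plan above. Everything in it is a placeholder choice of mine: the helper names, the probe size k=16, and the 0.25/0.75 cut-offs. The final `reverse()` in the reverse-sorted branch is the flip back to ascending order that the description above leaves implicit.

```python
def insertion_sort(a, descending=False):
    """Plain in-place insertion sort; descending=True sorts high-to-low."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and ((a[j] < x) if descending else (a[j] > x)):
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def probe_and_sort(a, k=16, low=0.25, high=0.75):
    """Insertion-sort the first k elements while counting inversions,
    then pick a strategy from the ratio of inversions seen to the
    maximum possible, k*(k-1)/2."""
    k = min(k, len(a))
    inversions = 0
    for i in range(1, k):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:  # each shift undoes one inversion
            a[j + 1] = a[j]
            j -= 1
            inversions += 1
        a[j + 1] = x
    max_inv = k * (k - 1) // 2
    ratio = inversions / max_inv if max_inv else 0.0

    if ratio <= low:
        # Probably nearly sorted: just finish the insertion sort
        # (re-scanning the already sorted prefix costs almost nothing).
        insertion_sort(a)
    elif ratio >= high:
        # Probably nearly reverse-sorted: move the sorted prefix to the
        # end, reversed, finish an insertion sort with the comparator
        # reversed, then flip the whole array to ascending order.
        a[:] = a[k:] + a[k - 1::-1]
        insertion_sort(a, descending=True)
        a.reverse()
    else:
        # Looks random: fall back to a general-purpose sort
        # (Python's built-in happens to be Timsort).
        a.sort()
```

On uniformly shuffled data the expected ratio is 0.5, so cut-offs like 0.25/0.75 leave a comfortable margin on both sides.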
The "small fixed number" can be a number for which insertion sort is fairly fast even in bad cases. I would guess 10-20 or so. It's possible to work out the probability of a false positive in uniformly shuffled data for any given number of insertions and any given threshold of "close to 0/1", but I'm too lazy.
You say the first and last few array elements typically buck the trend; if so, you could exclude them from the initial test insertion sort.
Obviously this approach is somewhat inspired by Timsort. But whereas Timsort is extremely optimized for data that contains runs, I'm trying to apply that kind of heavy optimization only to data that's close to being one big run (in either direction). Another feature of Timsort is that it's well tested; I don't claim to share that.