algorithm - 如何优化快速排序

Question

我正在尝试制定一个有效的quicksort算法。它可以正常工作，但是当元素数量很大并且数组的某些部分是预先排序的时需要很长时间才能运行。我正在查找 Wikipedia 上的文章quicksort，我发现它是这样写的：

为了确保最多使用 O(log N) 空间，首先递归到数组的较小一半，然后使用尾调用递归到另一个。

使用插入排序，它具有较小的常数因子，因此在小数组上速度更快，用于对这种小数组的调用（即长度小于实验确定的阈值 t）。这可以通过使此类数组不排序并在最后运行单个插入排序过程来实现，因为插入排序可以有效地处理几乎已排序的数组。每个小段的单独插入排序会增加启动和停止许多小排序的开销，但避免浪费精力比较多个段边界的键，由于快速排序过程的工作，哪些键将是有序的。它还改进了缓存的使用。

我目前正在对两个分区进行递归。知道如何实施第一个技巧吗？首先递归到数组的较小一半，然后使用尾调用递归到另一个是什么意思？其次，如何实现insertion-sort内部快速排序？它会始终提高效率，还是仅在数组的某些部分被预先排序时？如果是第二种情况，那我当然不知道什么时候会发生。那么我什么时候应该包括insertion-sort？

score 13 · Accepted Answer

在 Quick-sort 中，您选择一个将数组分隔为两半的随机枢轴，大多数情况下一个可能更小，

例如，数组大小为 100，pivot 将数组分隔为 40 / 60，40 是较小的大小。

让我们假设您决定使用插入排序为 10 的阈值大小，您需要继续通过 pivot 递归拆分数组，每当一半的一个变得小于或等于 10 时，您可以使用行为类似的插入排序O(n) 在小尺寸数组上。

考虑到如果您的数组被反向排序（最坏的情况），插入排序将表现不佳。

至于递归的东西，您只需要修改快速排序递归的停止情况 -> 数组大小 <= 10 停止递归并使用插入排序对所有数组（在此递归步骤中要小得多）进行排序。

通过尾递归，它们的意思是对前半部分做所有你需要的事情，然后对较小的一半调用插入排序作为最后一个方法，它用于节省空间。

  Quick-sort()
      choose a pivot
      move the smaller elements from left
      move the bigger elements from right
      quick-sort on the bigger half of the array

      if half is less then X
         only then do an insertion sort on the other half <- this is a tail recursion insertion sort 
      else
         quick sort on this half also

据我所知，第二个优化建议不要对每个递归步骤使用插入排序，但要记住对其进行约束的索引，然后在一批中调用插入排序，将所有切片中的项目连接起来，这将确保提高缓存的使用，但实现起来稍微困难一些，

score 8 · Accepted Answer

有多种方法可以使标准快速排序更有效。要实现您帖子中的第一个提示，您应该编写如下内容：

void quicksort(int * tab, int l, int r)
{
   int q;
   while(l < r)
   {
      q = partition(tab, l, r);
      if(q - l < r - q) //recurse into the smaller half
      {
         quicksort(tab, l, q - 1);
         l = q + 1;
      } else
      {
         quicksort(tab, q + 1, r);
         r = q - 1;
      }
   }
}

希望这足够清楚。下一步将是实现您自己的堆栈（或使用您正在使用的任何语言的一些内置），而不是使用递归调用。示例（伪）代码：

void quicksort2(int * tab, int l, int r)
{
    int le, ri, q;
    init stack;
    push(l, r, stack);
    while(!empty(stack))
    {
        //take the top pair of values from the stack and set them to le and ri
        pop(le, ri, stack);
        if(le >= ri)
            continue;
        q = partition(tab, le, ri);
        if(q - le < ri - q) //smaller half goes first
        {
            push(le, q - 1, stack);
            push(q + 1, ri, stack);
        } else
        {
            push(q + 1, ri, stack);
            push(le, q - 1, stack);
        }
    }
    delete stack;
}

然后，您可以继续实施您帖子中的其他提示。为此，您应该设置一些任意常量，我们将其称为 CUT_OFF，大约为 20。这将告诉您的算法何时应该切换到插入排序。更改前面的示例应该相当容易（添加一个 if 语句），以便在到达 CUT_OFF 点后切换到插入排序，所以我将留给您。

至于分区方法，我建议使用 Lomuto 分区而不是 Hoare。

但是，如果您的数据已经预先排序，那么您可以考虑完全使用不同的算法。根据我的经验，如果您的数据是预先排序的，那么在链表上实现的自然系列合并排序是一个非常好的选择。

score 5 · Accepted Answer

我前段时间写了一个基于快速排序的算法，你可以在那里找到（实际上它是一种选择算法，但也可以用作排序算法）：

http://processors.wiki.ti.com/index.php/Top_K_Elements_Sort_For_DSP_With_Index_Ordering

我从这次经历中学到的教训如下：

仔细调整算法的分区循环。这通常被低估了，但是如果您注意编写编译器/CPU 将能够进行软件流水线的循环，您确实会获得显着的性能提升。仅此一项就在 CPU 周期中赢得了约 50% 的胜利。
手工编码小类给你一个主要的性能胜利。当分区中要排序的元素数量低于 8 个元素时，不要费心尝试递归，而是仅使用 ifs 和 swaps 实现硬编码排序（查看此代码中的 fast_small_sort 函数） . 这可以在 CPU 周期中获得大约 50% 的优势，从而使快速排序具有与编写良好的“合并排序”相同的实际性能。
当检测到“差”的枢轴选择时，花时间选择更好的枢轴值。我的实现开始使用“中位数”算法进行枢轴选择，只要枢轴选择导致一侧低于要排序的剩余元素的 16%。这是一种针对快速排序最坏情况性能的缓解策略，有助于确保在实践中上限也是 O(n*log(n)) 而不是 O(n^2)。
优化具有许多相等值的数组（需要时）。如果要排序的数组有很多相等的值，则值得优化，因为它会导致糟糕的枢轴选择。在我的代码中，我通过计算所有等于枢轴值的数组条目来做到这一点。这使我能够以更快的方式处理数组中的枢轴和所有相等的值，并且在不适用时不会降低性能。这是针对最坏情况性能的另一种缓解策略，它通过大幅降低最大递归级别来帮助减少最坏情况下的堆栈使用。

我希望这会有所帮助，劳伦特。

score 1 · Accepted Answer

I've recently have found this optimization. It works faster than std::sort. It uses selection sort on small arrays and median-of-3 as partitioning element.

This is my C++ implementation:

const int CUTOFF = 8;

template<typename T>
bool less (T &v, T &w)
{
    return (v < w);
}

template<typename T>
bool eq (T &v, T &w)
{
    return w == v;
}

template <typename T>
void swap (T *a, T *b)
{
    T t = *a;
    *a = *b;
    *b = t;
}

template<typename T>
void insertionSort (vector<T>& input, int lo, int hi) 
{
    for (int i = lo; i <= hi; ++i)
    {
        for (int j = i; j > lo && less(input[j], input[j-1]); --j)
        {
            swap(&input[j], &input[j-1]);
        }
    }
}


template<typename T>
int median3 (vector<T>& input, int indI, int indJ, int indK)
{
    return (less(input[indI], input[indJ]) ?
            (less(input[indJ], input[indK]) ? indJ : less(input[indI], input[indK]) ? indK : indI) :
            (less(input[indK], input[indJ]) ? indJ : less(input[indK], input[indI]) ? indK : indI));
}


template <typename T>
void sort(vector<T>& input, int lo, int hi) 
{ 
    int lenN = hi - lo + 1;

    // cutoff to insertion sort
    if (lenN <= CUTOFF) 
    {
        insertionSort(input, lo, hi);
        return;
    }

    // use median-of-3 as partitioning element
    else if (lenN <= 40) 
    {
        int median = median3(input, lo, lo + lenN / 2, hi);
        swap(&input[median], &input[lo]);
    }

    // use Tukey ninther as partitioning element
    else  
    {
        int eps = lenN / 8;
        int mid = lo + lenN / 2;
        int mFirst = median3(input, lo, lo + eps, lo + eps + eps);
        int mMid = median3(input, mid - eps, mid, mid + eps);
        int mLast = median3(input, hi - eps - eps, hi - eps, hi); 
        int ninther = median3(input, mFirst, mMid, mLast);
        swap(&input[ninther], &input[lo]);
    }

    // Bentley-McIlroy 3-way partitioning
    int iterI = lo, iterJ = hi + 1;
    int iterP = lo, iterQ = hi + 1;

    for (;; ) 
    {
        T v = input[lo];
        while (less(input[++iterI], v))
        {
            if (iterI == hi) 
                break;
        }
        while (less(v, input[--iterJ]))
        {
            if (iterJ == lo)    
                break;
        }
        if (iterI >= iterJ) 
            break;
        swap(&input[iterI], &input[iterJ]);
        if (eq(input[iterI], v)) 
            swap(&input[++iterP], &input[iterI]);
        if (eq(input[iterJ], v)) 
            swap(&input[--iterQ], &input[iterJ]);
    }
    swap(&input[lo], &input[iterJ]);

    iterI = iterJ + 1;
    iterJ = iterJ - 1;
    for (int k = lo + 1; k <= iterP; ++k) 
    {
        swap(&input[k], &input[iterJ--]);
    }
    for (int k = hi  ; k >= iterQ; --k)
    {
        swap(&input[k], &input[iterI++]);
    }

    sort(input, lo, iterJ);
    sort(input, iterI, hi);
}

score 1 · Accepted Answer

尾递归就是把一个递归调用变成一个循环。对于 QuickSort，它类似于：

QuickSort(SortVar)                                                                     
   Granularity = 10                                                            
   SortMax = Max(SortVar)
   /* Put an element after the last with a higher key than all other elements 
      to avoid that the inner loop goes on forever */
   SetMaxKey(SortVar, SortMax+1)

   /* Push the whole interval to sort on stack */               
   Push 1 SortMax                                                              
   while StackSize() > 0                                                       
      /* Pop an interval to sort from stack */
      Pop SortFrom SortTo                                                     

      /* Tail recursion loop */                           
      while SortTo - SortFrom >= Granularity                                

         /* Find the pivot element using median of 3 */                            
         Pivot = Median(SortVar, SortFrom, (SortFrom + SortTo) / 2, SortTo)             
         /* Put the pivot element in front */                                     
         if Pivot > SortFrom then Swap(SortVar, SortFrom, Pivot)

         /* Place elements <=Key to the left and elements >Key to the right */           
         Key = GetKey(SortVar, SortFrom)                                                
         i = SortFrom + 1                                                      
         j = SortTo                                                            
         while i < j                                                        
            while GetKey(SortVar, i) <= Key; i = i + 1; end                          
            while GetKey(SortVar, j) > Key; j = j - 1; end                           
            if i < j then Swap(SortVar, i, j)                                       
         end                                                                   

         /* Put the pivot element back */                            
         if GetKey(SortVar, j) < Key then Swap(SortVar, SortFrom, j)                                         

         if j - SortFrom < SortTo - j then                                  
            /* The left part is smallest - put it on stack */                     
            if j - SortFrom > Granularity then Push SortFrom j-1               
            /* and do tail recursion on the right part */                           
            SortFrom = j + 1                                                   
         end                                                                   
         else
            /* The right part is smallest - put it on stack */                       
            if SortTo - j > Granularity then Push j+1 SortTo                   
            /* and do tail recursion on the left part */                         
            SortTo = j - 1                                                     
         end                                                                   
      end                                                                      
   end                                                                         

   /* Run insertionsort on the whole array to sort the small intervals */    
   InsertionSort(SortVar)                                                          
return

此外，没有理由在小间隔上调用 InsertionSort，因为当 QuickSort 完成时，数组被粗略排序，因此只剩下小间隔需要排序。而这正是 InsertionSort 的完美案例。

如果您没有堆栈，则可以改用递归 - 但保留尾递归：

QuickSort(SortVar, SortFrom, SortTo)                                                                     
   Granularity = 10                                                            

   /* Tail recursion loop */                           
   while SortTo - SortFrom >= Granularity                                

      /* Find the pivot element using median of 3 */                            
      Pivot = Median(SortVar, SortFrom, (SortFrom + SortTo) / 2, SortTo)             
      /* Put the pivot element in front */                                     
      if Pivot > SortFrom then Swap(SortVar, SortFrom, Pivot)

      /* Place elements <=Key to the left and elements >Key to the right */           
      Key = GetKey(SortVar, SortFrom)                                                
      i = SortFrom + 1                                                      
      j = SortTo                                                            
      while i < j                                                        
         while GetKey(SortVar, i) <= Key; i = i + 1; end                          
         while GetKey(SortVar, j) > Key; j = j - 1; end                           
         if i < j then Swap(SortVar, i, j)                                       
      end                                                                   

      /* Put the pivot element back */                            
      if GetKey(j) < Key then Swap(SortVar, SortFrom, j)                                         

      if j - SortFrom < SortTo - j then                                  
         /* The left part is smallest - recursive call */                     
         if j - SortFrom > Granularity then QuickSort(SortVar, SortFrom, j-1)           
         /* and do tail recursion on the right part */                           
         SortFrom = j + 1                                                   
      end                                                                   
      else
         /* The right part is smallest - recursive call */                       
         if SortTo - j > Granularity then QuickSort(SortVar, j+1, SortTo)                   
         /* and do tail recursion on the left part */                         
         SortTo = j - 1                                                     
      end                                                                   
   end                                                                         

   /* Run insertionsort on the whole array to sort the small intervals */    
   InsertionSort(SortVar)                                                          
return

score 1 · Accepted Answer

您可以看一下 TimSort，对于非完全随机数据，它比快速排序表现更好（它们具有相同的渐近复杂度，但 TimSort 具有较低的常数）

algorithm - 如何优化快速排序

6 回答 6

Related

Reference