algorithm - What is the fastest way to sort an array of 7 integers?

Question

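This is part of a program that analyzes poker odds, specifically for Texas Hold'em. I have a program I'm happy with, but it needs some small optimizations to be perfect.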

I use this type (among others, of course):

  type
    T7Cards = array[0..6] of integer;

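Two things about this array may matter when deciding how to sort it:

Every item is a value from 0 to 51. No other values are possible.
There are no duplicates. Ever.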

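With this information, what is the absolute fastest way to sort this array? I use Delphi, so Pascal code would be best, but I can read C and pseudocode, albeit a bit more slowly :-)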

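At the moment I use quicksort, but the funny thing is that it is almost no faster than bubble sort! Probably because of the small number of items. Sorting accounts for almost 50% of the method's total running time.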

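EDIT:

Mason Wheeler asked why the optimization is necessary. One reason is that the method will be called 2,118,760 times.

Basic poker information: every player is dealt two cards (the pocket), and then five cards are dealt to the table (the first 3 are called the flop, the next is the turn, and the last is the river). Each player picks the best five cards to make up their hand.

If I have two cards in the pocket, P1 and P2, I use the following loops to generate all possible combinations: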

for C1 := 0 to 51-4 do
  if (C1<>P1) and (C1<>P2) then
     for C2 := C1+1 to 51-3 do
       if (C2<>P1) and (C2<>P2) then
         for C3 := C2+1 to 51-2 do
           if (C3<>P1) and (C3<>P2) then
             for C4 := C3+1 to 51-1 do
               if (C4<>P1) and (C4<>P2) then
                 for C5 := C4+1 to 51 do
                   if (C5<>P1) and (C5<>P2) then
                   begin
                     //This code will be executed 2 118 760 times
                     inc(ComboCounter[GetComboFromCards([P1,P2,C1,C2,C3,C4,C5])]);
                   end;

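As I write this I notice one more thing: the last five elements of the array are always sorted, so it is just a matter of putting the first two elements into the right positions in the array. That should simplify things a bit.

So the new question is: what is the fastest way to sort an array of 7 integers when the last 5 elements are already sorted? I believe this could be solved with a couple (?) of ifs and swaps :-)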

score 14 · Accepted Answer

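For a very small set, insertion sort can usually beat quicksort because it has very low overhead.

WRT your edit: if the data is already mostly in sorted order (the last 5 elements are already sorted), insertion sort is definitely the way to go. On an almost-sorted data set it will beat quicksort every time, even for large sets. (Especially for large sets! This is insertion sort's best case and quicksort's worst case.)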
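As a rough illustration of that advice, here is a minimal C sketch of an insertion sort specialised to this case. It assumes the layout from the question (indices 2..6 already ascending); the function name `insert_front_two` is made up for this example.

```c
/* Insertion sort for a 7-card hand whose last five entries (indices 2..6)
   are already ascending. The two leading cards are handled right to left,
   so each one is inserted into a tail that is already sorted. */
static void insert_front_two(int cards[7]) {
    for (int i = 1; i >= 0; --i) {
        int x = cards[i];
        int j = i;
        /* Shift smaller tail entries one slot left until x fits. */
        while (j + 1 < 7 && cards[j + 1] < x) {
            cards[j] = cards[j + 1];
            ++j;
        }
        cards[j] = x;
    }
}
```

Calling it on `{40, 12, 3, 17, 25, 33, 51}` leaves the array as `{3, 12, 17, 25, 33, 40, 51}`. In the worst case each of the two cards is compared against every tail entry, so the cost stays small and predictable.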

score 7 · Accepted Answer

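I don't know how you are implementing this, but what you could do is have an array of 52 instead of 7, and insert each card directly into its slot when you get it, since there can never be duplicates. That way you never have to sort the array. This might be faster, depending on how it is used.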

score 6 · Accepted Answer

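I don't know that much about Texas Hold'em: does it matter what suit P1 and P2 are, or does it only matter whether they are of the same suit? If only suit(P1)==suit(P2) matters, then you could separate the two cases; there are only 13x12/2 different possibilities for P1/P2, and you can easily precalculate a table for each of the two cases.

Otherwise, I would suggest something like this: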

(* C1 < C2 < P1 *)
for C1 := 0 to P1-2 do
   for C2 := C1+1 to P1-1 do
   begin
      Cards[0] := C1;
      Cards[1] := C2;
      Cards[2] := P1;
      (* generate C3...C5 *)
   end;

(* C1 < P1 < C2 *)
for C1 := 0 to P1-1 do
   for C2 := P1+1 to 51 do
   begin
      Cards[0] := C1;
      Cards[1] := P1;
      Cards[2] := C2;
      (* generate C3...C5 *)
   end;

(* P1 < C1 < C2 *)
for C1 := P1+1 to 51 do
   for C2 := C1+1 to 51 do
   begin
      Cards[0] := P1;
      Cards[1] := C1;
      Cards[2] := C2;
      (* generate C3...C5 *)
   end;

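(This is just a demonstration for one card, P1; you would have to extend it for P2, but I think that is straightforward. It will be a lot of typing, though...) That way, the sorting takes no time at all. The generated permutations are already ordered.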

score 4 · Accepted Answer

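There are only 5040 permutations of 7 elements. You can programmatically generate a program that finds the permutation your input represents in a minimal number of comparisons. It will be a big tree of if-then-else instructions, each comparing a fixed pair of elements, for example if (a[3]<=a[6]).

The tricky part is deciding which two elements to compare at a particular inner node. For this, you have to take into account the results of the comparisons in the ancestor nodes from the root to that node (for example a[0]<=a[1], not a[2]<=a[6], a[2]<=a[5]) and the set of possible permutations that satisfy those comparisons. Compare the pair of elements that splits the set into parts as equal as possible (minimize the size of the larger part).

Once you have the permutation, it is trivial to sort it with a minimal set of swaps.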

score 4 · Accepted Answer

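Since the last 5 items are already sorted, the code can be written just to reposition the first 2 items. Since you are using Pascal, I have written and tested a sorting algorithm that can execute 2,118,760 times in about 62 milliseconds.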

procedure SortT7Cards(var Cards: T7Cards);
const
  CardsLength = Length(Cards);
var
  I, J, V: Integer;
  V1, V2: Integer;
begin
  // Last 5 items will always be sorted, so we want to place the first two into
  // the right location.
  V1 := Cards[0];
  V2 := Cards[1];
  if V2 < V1 then
  begin
    I := V1;
    V1 := V2;
    V2 := I;
  end;

  J := 0;
  I := 2;
  while I < CardsLength do
  begin
    V := Cards[I];
    if V1 < V then
    begin
      Cards[J] := V1;
      Inc(J);
      Break;
    end;
    Cards[J] := V;
    Inc(J);
    Inc(I);
  end;
  while I < CardsLength do
  begin
    V := Cards[I];
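    if V2 < V then
    begin
      Cards[J] := V2;
      // Both cards are placed and the rest of the tail is already in position.
      // Exit rather than Break, so the fix-up below cannot overwrite Cards[J].
      Exit;
    end;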
    Cards[J] := V;
    Inc(J);
    Inc(I);
  end;
  if J = (CardsLength - 2) then
  begin
    Cards[J] := V1;
    Cards[J + 1] := V2;
  end
  else if J = (CardsLength - 1) then
  begin
    Cards[J] := V2;
  end;
end;

score 2 · Accepted Answer

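Use min-sort. Search for the minimum and maximum elements in one pass and put them into the result array. Repeat three times. (EDIT: No, I won't try to measure the speed theoretically :_))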

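var
  cards, result: array[0..6] of integer;
  i, n, imin, imax: integer;

begin
  for n := 0 to 2 do
  begin
    imin := -1;
    imax := -1;
    for i := 0 to 6 do
      if cards[i] >= 0 then   { skip cards that were already taken }
      begin
        if (imin < 0) or (cards[i] < cards[imin]) then imin := i;
        if (imax < 0) or (cards[i] > cards[imax]) then imax := i;
      end;
    result[n] := cards[imin];
    result[6 - n] := cards[imax];
    cards[imin] := -1;        { mark both as used }
    cards[imax] := -1;
  end;
  for i := 0 to 6 do          { the one remaining card is the median }
    if cards[i] >= 0 then
    begin
      result[3] := cards[i];
      break;
    end;
  { result is sorted here; cards now holds only the -1 markers and the median }
end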

score 2 · Accepted Answer

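Here is the fastest method: since the 5-card list is already sorted, sort the two-card list (a compare and swap), and then merge the two lists, which is O(k * (5+2)). In this case k will normally be 5: the loop test (1), the compare (2), the copy (3), the input-list increment (4) and the output-list increment (5). That is 35 + 2.5. Throw in the loop initialization and you get 41.5 statements in total.

You could also unroll the loops, which would save you maybe 8 statements of execution, but it would make the whole routine about 4-5 times longer, which may hurt your instruction cache hit ratio.

Given P(0 to 2), C(0 to 5), copying to H(0 to 6), with C() already sorted (ascending):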

If P(0) > P(1) Then
    // Swap:
    T = P(0)
    P(0) = P(1)
    P(1) = T
    // 1stmt + (3stmt * 50%) = 2.5stmt
End If

P(2), C(5) = 53    // Note these are end-of-list flags
k = 0     // P() index
j = 0     // H() index
i = 0     // C() index
// 4 stmt

Do While j < 7
    If P(k) < C(i) Then
        H(j) = P(k)
        k = k+1
    Else
        H(j) = C(i)
        i = i+1
    End If
    j = j+1
    // 5stmt * 7loops = 35stmt
Loop
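For readers who prefer compilable code, the same merge can be sketched in C roughly as follows. This is only an illustration: the name `merge_2_5` and its signature are invented here, and it uses an explicit `i == 5` check instead of a sentinel on the board side.

```c
#include <stddef.h>

/* Merge the two pocket cards with five already-ascending board cards.
   Hypothetical helper: the name and signature are illustrative only. */
static void merge_2_5(const int pocket[2], const int board[5], int out[7]) {
    int lo = pocket[0], hi = pocket[1];
    if (lo > hi) { int t = lo; lo = hi; hi = t; }   /* order the pair */

    int p[3] = { lo, hi, 53 };   /* 53 = sentinel, larger than any card */
    size_t k = 0, i = 0;
    for (size_t j = 0; j < 7; ++j) {
        /* Once the board is exhausted (i == 5) only pocket cards remain;
           otherwise copy whichever head is smaller. */
        if (i == 5 || p[k] < board[i])
            out[j] = p[k++];
        else
            out[j] = board[i++];
    }
}
```

Because the result goes to a separate array, the caller keeps the original pocket and board values, which may or may not matter for the surrounding enumeration loop.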

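Note that this is faster than the other algorithm that would be "fastest" if you had to truly sort all 7 cards: use a 52-bit mask, set the bit for each of the 7 cards within the range of all 52 possible cards, and then scan the mask in order looking for the 7 bits that are set. That takes 60-120 statements at best (but is still faster than any other sorting approach).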

score 2 · Accepted Answer

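For seven numbers, the most efficient algorithm that exists with respect to the number of comparisons is Ford-Johnson's. In fact, Wikipedia references a paper, easily found on Google, that claims Ford-Johnson's is the best for up to 47 numbers. Unfortunately, references to Ford-Johnson's are not all that easy to find, and the algorithm uses some complex data structures.

It appears in The Art of Computer Programming, Volume 3, by Donald Knuth, if you have access to that book.

There is a paper that describes FJ and a more memory-efficient version.

At any rate, because of the memory overhead of that algorithm, I doubt it would be worth your while for integers, as the cost of comparing two integers is rather cheap compared with the cost of allocating memory and manipulating pointers.

Now, you mentioned that 5 cards are already sorted and you just need to insert two. You can do this with insertion sort most efficiently like this: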

Order the two cards so that P1 > P2
Insert P1 going from the high end to the low end
(list) Insert P2 going from after P1 to the low end
(array) Insert P2 going from the low end to the high end

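How you do that will depend on the data structure. With an array you will be swapping each element, so place P1 in the 1st position and P2 in the 7th (ordered high to low), and then swap P1 up, and then P2 down. With a list, you just need to fix the pointers as appropriate.

However, once more, because of the particulars of your code, it really is best to follow nikie's suggestion and generate the for loops appropriately for every variation in which P1 and P2 can appear in the list.

For example, sort P1 and P2 so that P1 < P2. Let Po1 and Po2 be the positions, from 0 to 6, of P1 and P2 in the list. Then do this: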

Loop Po1 from 0 to 5
Loop Po2 from Po1 + 1 to 6
If (Po2 == 1) C1start := P2 + 1; C1end := 51 - 4
If (Po1 == 0 && Po2 == 2) C1start := P1+1; C1end := P2 - 1
If (Po1 == 0 && Po2 > 2) C1start := P1+1; C1end := 51 - 5
If (Po1 > 0) C1start := 0; C1end := 51 - 6
for C1 := C1start to C1end
  // Repeat logic to compute C2start and C2end
  // C2 can begin at C1+1, P1+1 or P2+1
  // C2 can finish at P1-1, P2-1, 51 - 3, 51 - 4 or 51 -5
  etc

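You then call a function passing Po1, Po2, P1, P2, C1, C2, C3, C4, C5, and have that function return all possible permutations based on Po1 and Po2 (that is 36 combinations).

Personally, I think this is the fastest you can get. You completely avoid having to order anything, because the data will be pre-ordered. You incur some comparisons anyway to compute the starts and ends, but their cost is minimized since most of them will be in the outermost loops, so they won't be repeated much. They could even be optimized further at the cost of more code duplication.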

score 1 · Accepted Answer

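For 7 elements, there are only a few options. You can easily write a generator that produces a method to sort all possible combinations of 7 elements. Something like this method for 3 elements: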

if a[1] < a[2] {
    if a[2] < a[3] {
        // nothing to do, a[1] < a[2] < a[3]
    } else {
         if a[1] < a[3] {
             // correct order should be a[1], a[3], a[2]
             swap a[2], a[3]
         } else {
             // correct order should be a[3], a[1], a[2]
             swap a[2], a[3]
             swap a[1], a[2]
         }
    }
} else {
    // here we know that a[1] >= a[2]
    ...
}

Of course the method for 7 elements will be bigger, but it is not that hard to generate.
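To show what a complete tree of this kind looks like, here is the 3-element case written out in full as a C sketch. It uses zero-based indices (unlike the 1-based outline above), and the helper names `swap_int` and `sort3` are invented for the example.

```c
/* Exchange two ints. */
static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Sort three distinct ints with a fixed decision tree: at most three
   comparisons and at most two swaps, no loops. Each comment names the
   sorted order in terms of the values as they were on entry. */
static void sort3(int a[3]) {
    if (a[0] < a[1]) {
        if (a[1] < a[2]) {
            /* a0 < a1 < a2: already sorted */
        } else if (a[0] < a[2]) {
            swap_int(&a[1], &a[2]);            /* order was a0, a2, a1 */
        } else {
            swap_int(&a[1], &a[2]);            /* order was a2, a0, a1 */
            swap_int(&a[0], &a[1]);
        }
    } else {
        if (a[0] < a[2]) {
            swap_int(&a[0], &a[1]);            /* order was a1, a0, a2 */
        } else if (a[1] < a[2]) {
            swap_int(&a[0], &a[1]);            /* order was a1, a2, a0 */
            swap_int(&a[1], &a[2]);
        } else {
            swap_int(&a[0], &a[2]);            /* order was a2, a1, a0 */
        }
    }
}
```

The tree has one leaf per permutation (6 here, 5040 for seven elements), which is why the 7-element version is better produced by a generator than typed by hand.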

score 1 · Accepted Answer

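The code below is close to optimal. It could be made better by composing a list to traverse while building the tree, but I'm out of time right now. Cheers!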

object Sort7 {
  def left(i: Int) = i * 4
  def right(i: Int) = i * 4 + 1
  def up(i: Int) = i * 4 + 2
  def value(i: Int) = i * 4 + 3

  val a = new Array[Int](7 * 4)
  def reset = {
    0 until 7 foreach { 
      i => {
        a(left(i)) = -1
        a(right(i)) = -1
        a(up(i)) = -1
        a(value(i)) = scala.util.Random.nextInt(52)
      }
    }
  }

  def sortN(i : Int) {
    var index = 0
    def getNext = if (a(value(i)) < a(value(index))) left(index) else right(index)
    var next = getNext
    while(a(next) != -1) {
      index = a(next)
      next = getNext
    }
    a(next) = i
    a(up(i)) = index
  }

  def sort = 1 until 7 foreach (sortN(_))

  def print {
    traverse(0)
    def traverse(i: Int): Unit = {
      if (i != -1) {
        traverse(a(left(i)))
        println(a(value(i)))
        traverse(a(right(i)))
      }
    }
  }
}

score 1 · Accepted Answer

In pseudocode:

int64 temp = 0;
int index, bit_position;

for index := 0 to 6 do
   temp |= (int64)1 << cards[index];   // shift a 64-bit one, not a 32-bit int

for index := 0 to 6 do
begin
   bit_position = find_first_set(temp);   // index of the lowest set bit
   temp &= ~((int64)1 << bit_position);
   cards[index] = bit_position;
end;

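This is an application of bucket sort, which should generally be faster than any of the comparison sorts that were suggested.

Note: The second part could also be implemented by iterating over the bits in linear time, but in practice it may not be faster: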

index = 0;
for bit_position := 0 to 51 do
begin
   if (temp & ((int64)1 << bit_position)) != 0 then
   begin
      cards[index] = bit_position;
      index++;
   end;
end;
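A compilable C version of the first variant might look like the sketch below. The names `lowest_set_bit` and `sort7_bitmask` are made up for this example; `__builtin_ctzll` is a GCC/Clang builtin, so a plain loop is provided as a fallback for other compilers. The sketch relies on the seven cards being distinct and in 0..51, as stated in the question.

```c
#include <stdint.h>

/* Index of the lowest set bit of a non-zero mask. */
static int lowest_set_bit(uint64_t m) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(m);               /* GCC/Clang builtin */
#else
    int n = 0;
    while (!(m & 1)) { m >>= 1; ++n; }        /* portable fallback */
    return n;
#endif
}

/* Sort 7 distinct card values in 0..51: set one bit per card in a 64-bit
   mask, then read the set bits back from lowest to highest. */
static void sort7_bitmask(int cards[7]) {
    uint64_t mask = 0;
    for (int i = 0; i < 7; ++i)
        mask |= (uint64_t)1 << cards[i];      /* 64-bit shift, not int */
    for (int i = 0; i < 7; ++i) {
        cards[i] = lowest_set_bit(mask);
        mask &= mask - 1;                     /* clear the bit just read */
    }
}
```

With duplicate cards the mask would hold fewer than seven bits and the second loop would query an empty mask, so the no-duplicates guarantee is doing real work here.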

score 1 · Accepted Answer

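Assuming that you need an array of cards at the end of it.

Map the original cards to bits in a 64-bit integer (or any integer with >= 52 bits).

If during the initial mapping the array turns out to be sorted, don't change it.

Partition the integer into nibbles; each nibble corresponds to a value from 0x0 to 0xf.

Use the nibbles as indices into corresponding sorted sub-arrays. You will need 13 sets of 16 sub-arrays (or just 16 sub-arrays and a second indirection, or do the bit operations rather than looking the answer up; which is faster will vary by platform).

Concatenate the non-empty sub-arrays into the final array.

You could use units larger than nibbles if you want; bytes would give 7 sets of 256 arrays and make it more likely that the non-empty arrays require concatenating.

This assumes that branches are expensive and cached array accesses are cheap.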

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

// for general case of 7 from 52, rather than assuming last 5 sorted
// note: the >> 1 in sort7 maps card c to bit c-1, so cards are numbered 1..52 here, not 0..51
uint32_t card_masks[16][5] = {
    { 0, 0, 0, 0, 0 },
    { 1, 0, 0, 0, 0 },
    { 2, 0, 0, 0, 0 },
    { 1, 2, 0, 0, 0 },
    { 3, 0, 0, 0, 0 },
    { 1, 3, 0, 0, 0 },
    { 2, 3, 0, 0, 0 },
    { 1, 2, 3, 0, 0 },
    { 4, 0, 0, 0, 0 },
    { 1, 4, 0, 0, 0 },
    { 2, 4, 0, 0, 0 },
    { 1, 2, 4, 0, 0 },
    { 3, 4, 0, 0, 0 },
    { 1, 3, 4, 0, 0 },
    { 2, 3, 4, 0, 0 },
    { 1, 2, 3, 4, 0 },
};

void sort7 ( uint32_t* cards) {
    uint64_t bitset = ( ( 1LL << cards[ 0 ] ) | ( 1LL << cards[ 1LL ] ) | ( 1LL << cards[ 2 ] ) | ( 1LL << cards[ 3 ] ) | ( 1LL << cards[ 4 ] ) | ( 1LL << cards[ 5 ] ) | ( 1LL << cards[ 6 ] ) ) >> 1;

    uint32_t*   p    = cards;
    uint32_t    base = 0;

    do {
        uint32_t* card_mask = card_masks[ bitset & 0xf ];

        // you might remove this test somehow, as well as unrolling the outer loop
        // having separate arrays for each nibble would save 7 additions and the increment of base
        while ( *card_mask )
            *(p++) = base + *(card_mask++);

        bitset >>= 4;
        base += 4;
    } while ( bitset );
}

void print_cards ( uint32_t* cards ) {
    printf ( "[ %d %d %d %d %d %d %d ]\n", cards[0], cards[1], cards[2], cards[3], cards[4], cards[5], cards[6] );
}

int main ( void ) {
    uint32_t cards[7] = { 3, 9, 23, 17, 2, 42, 52 };

    print_cards ( cards );
    sort7 ( cards );
    print_cards ( cards );

    return 0;
}

score 1 · Accepted Answer

Use a sorting network, as in this C++ code:

template<class T> 
inline void sort7(T data) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
//DD = Define Data, create a local copy of the data to aid the optimizer.
//(no 'register' keyword: it was removed as a storage class specifier in C++17)
#define DD1(a)   auto data##a=*(data+a);
#define DD2(a,b) auto data##a=*(data+a);auto data##b=*(data+b);
//CB = Copy Back
#define CB1(a)   *(data+a)=data##a;
#define CB2(a,b) *(data+a)=data##a;*(data+b)=data##b;
  DD2(1,2) SORT2(1,2)
  DD2(3,4) SORT2(3,4)
  DD2(5,6) SORT2(5,6)
  DD1(0) SORT2(0,2)
  SORT2(3,5) 
  SORT2(4,6) 
  SORT2(0,1)
  SORT2(4,5) 
  SORT2(2,6) CB1(6)
  SORT2(0,4) 
  SORT2(1,5)
  SORT2(0,3) CB1(0) 
  SORT2(2,5) CB1(5)
  SORT2(1,3) CB1(1) 
  SORT2(2,4) CB1(4)
  SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}

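Use the function above if you want to pass it an iterator or a pointer, and use the function below if you want to pass it the seven arguments one by one. BTW, using templates allows compilers to generate really optimized code, so don't get rid of the template<> unless you want C code (or code in some other language).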

template<class T>
inline void sort7(T& e0, T& e1, T& e2, T& e3, T& e4, T& e5, T& e6) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
#define DD1(a)   auto data##a=e##a;
#define DD2(a,b) auto data##a=e##a;auto data##b=e##b;
#define CB1(a)   e##a=data##a;
#define CB2(a,b) e##a=data##a;e##b=data##b;
  DD2(1,2) SORT2(1,2)
  DD2(3,4) SORT2(3,4)
  DD2(5,6) SORT2(5,6)
  DD1(0) SORT2(0,2)
  SORT2(3,5)
  SORT2(4,6)
  SORT2(0,1)
  SORT2(4,5)
  SORT2(2,6) CB1(6)
  SORT2(0,4)
  SORT2(1,5)
  SORT2(0,3) CB1(0)
  SORT2(2,5) CB1(5)
  SORT2(1,3) CB1(1)
  SORT2(2,4) CB1(4)
  SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}

score 0 · Accepted Answer

Take a look at this:

http://en.wikipedia.org/wiki/Sorting_algorithm

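You would need to pick one with a stable worst-case cost...

Another option could be to keep the array sorted the whole time, so that adding a card keeps the array sorted automatically; that way you could skip the sorting...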

score 0 · Accepted Answer

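JRL is referring to a bucket sort. Since you have a finite, discrete set of possible values, you can declare 52 buckets and drop each element into its bucket in O(1) time. Hence bucket sort is O(n). Without the guarantee of a finite number of different elements, the fastest theoretical sort is O(n log n), which is what algorithms like merge sort and quicksort achieve. It is just a balance of best and worst case scenarios.

But long answer short, use bucket sort.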

score 0 · Accepted Answer

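If you like the above-mentioned suggestion to keep a 52-element array that always stays sorted, then you could keep another list of 7 elements that references the 7 valid elements in the 52-element array. This way we can even avoid parsing the 52-element array.

I guess for this to be really efficient, we would need a linked-list type of structure that supports the operations InsertAtPosition() and DeleteAtPosition() and is efficient at them.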

score 0 · Accepted Answer

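There are a lot of loops in the answers. Given his speed requirement and the tiny size of the data set, I would not do any loops.

I have not tried it, but I suspect the best answer is a fully unrolled bubble sort. It would also probably gain a fair amount of advantage from being written in assembly.

I wonder if this is the right approach, though. How are you going to analyze a 7-card hand? I think you are going to end up converting it to some other representation for analysis anyway. Would a 4x13 array not be a more useful representation? (And it would render the sorting issue moot, anyway.)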

score 0 · Accepted Answer

Considering that the last 5 elements are always sorted:


// Insert Cards[1] first, so that Cards[0] is then inserted into a sorted tail.
for i := 1 downto 0 do begin
  j := i;
  x := Cards[j];
  while (j+1 <= 6) and (Cards[j+1] < x) do begin
    Cards[j] := Cards[j+1];
    inc(j);
  end;
  Cards[j] := x;
end;

score 0 · Accepted Answer

Bubble sort is your friend. Other sorts have too much overhead code and are not suitable for a small number of elements.

Cheers

answered 2009-09-12T20:04:48.153

score 0 · Accepted Answer

Here is your basic O(n) sort. I'm not sure how it compares to the others. It uses unrolled loops.

char card[7]; // the original table of 7 numbers in range 0..51
char table[52]; // workspace

// clear the workspace
memset(table, 0, sizeof(table));

// set the 7 bits corresponding to the 7 cards
table[card[0]] = 1;
table[card[1]] = 1;
...
table[card[6]] = 1;

// read the cards back out
int j = 0;
if (table[0]) card[j++] = 0;
if (table[1]) card[j++] = 1;
...
if (table[51]) card[j++] = 51;

score 0 · Accepted Answer

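If you are looking for a very low overhead, optimal sort, you should create a sorting network. You can generate the code for a 7-integer network using the Bose-Nelson algorithm.

This would guarantee a fixed number of compares and an equal number of swaps in the worst case.

The generated code is ugly, but it is optimal.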
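As a concrete illustration, here is a C sketch of a 7-input sorting network. It is not the output of a Bose-Nelson generator; it reuses the 16 compare-exchange pairs from the C++ answer elsewhere in this thread, and the names `cswap` and `sort7_network` are made up for the example.

```c
/* Compare-exchange: after the call, *lo <= *hi. */
static void cswap(int *lo, int *hi) {
    if (*lo > *hi) { int t = *lo; *lo = *hi; *hi = t; }
}

/* Sort 7 ints with a fixed sequence of 16 compare-exchanges. The sequence
   does not depend on the data, so the number of comparisons is constant. */
static void sort7_network(int d[7]) {
    static const unsigned char pairs[16][2] = {
        {1,2},{3,4},{5,6},{0,2},{3,5},{4,6},{0,1},{4,5},
        {2,6},{0,4},{1,5},{0,3},{2,5},{1,3},{2,4},{2,3}
    };
    for (int k = 0; k < 16; ++k)
        cswap(&d[pairs[k][0]], &d[pairs[k][1]]);
}
```

The table-driven loop keeps the sketch short; writing the 16 calls out by hand, as the macro version does, gives the compiler the same fixed sequence without relying on it to unroll the loop.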

score 0 · Accepted Answer

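Your data is in a sorted array, and I will assume you swap the two new cards if required so that they are sorted too. So: (a) if you want to keep the data in place, use a form of insertion sort; (b) if you want the result in another array, do a merge by copying.

With such small numbers, binary chop is overkill, and ternary chop is appropriate anyway: one new card will most likely split the array into two and three, viz. 2+3 or 3+2; two cards split it into singles and pairs, e.g. 2+1+2.

So the most time- and space-efficient approach to placing the smaller new card is to compare it with a[1] (viz. skip a[0]) and then search left or right to find the card it should displace, then swap and move right (shifting rather than bubbling), comparing with the larger new card until you find where it goes. After this you will be shifting forward by twos (two cards have been inserted). The variables holding the new cards (and swaps) should be registers.

The look-up approach would be faster but would use more memory.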
