algorithm - How to "sort" elements of two possible values in linear time?

Question

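Suppose I have a function f and an array of elements.

The function returns either A or B for any element; you can visualize the elements this way: ABBAABABAA.

I need to sort the elements according to the function, so that the result is: AAAAAABBBB

The number of A values does not have to equal the number of B values. The total number of elements can be arbitrary (not fixed). Note that you are not sorting characters; you are sorting objects that have a single-character representation.

A few more things:

The sort should take linear time, O(n),
it should be performed in place,
it should be a stable sort.

Any ideas?

Note: if the above is not feasible, do you have ideas for algorithms that sacrifice one of the requirements above?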

score 4 · Accepted Answer

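If it has to be linear and in place, you can do a semi-stable version. Semi-stable means that either the As or the Bs can be kept stable, but not both. This is similar to Dukeling's answer, except that you move both iterators from the same side: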

a = first A
b = first B
loop while next A exists
    if b < a
        swap a,b elements
        b = next B
        a = next A
    else
        a = next A

Using the example string ABBAABABAA, you get:

ABBAABABAA
AABBABABAA
AAABBBABAA
AAAABBBBAA
AAAAABBBBA
AAAAAABBBB

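On each turn, if you swap, you advance both iterators; otherwise you advance only a. This keeps the As stable, but the Bs lose their order. To keep the Bs stable instead, start from the end and work your way left.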

It may be possible to do this in a fully stable way, but I don't know how.
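The same-side iteration above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `semi_stable_partition` and the predicate `is_a` (standing in for "f returns A") are not part of the answer, and `b` is tracked as an index rather than an iterator.

```python
def semi_stable_partition(items, is_a):
    """Move every A in front of every B, in place, in one pass.

    The As keep their relative order; the Bs generally do not.
    Returns the number of As (the index where the Bs start).
    """
    b = 0  # index of the leftmost B seen so far (or of the scan position if none)
    for a in range(len(items)):
        if is_a(items[a]):
            if b < a:
                # An A sits to the right of a B: swap them.
                items[a], items[b] = items[b], items[a]
            # Everything between b and a is a B, so the next B is at b + 1.
            b += 1
    return b


# Objects are (label, original_position) pairs, so stability can be inspected.
data = [(ch, i) for i, ch in enumerate("ABBAABABAA")]
boundary = semi_stable_partition(data, lambda item: item[0] == "A")
print("".join(label for label, _ in data))
print([pos for label, pos in data if label == "A"])
```

Printing the original positions of the As should show them still in ascending order, while the positions of the Bs come out shuffled.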

score 2 · Accepted Answer

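A stable sort may not be possible under the other given constraints, so here is an unstable sort that resembles the partition step of quick-sort.

1. Have 2 iterators, one starting on the left and one starting on the right.
2. While there is a B at the right iterator, decrement that iterator.
3. While there is an A at the left iterator, increment that iterator.
4. If the iterators have not crossed each other, swap their elements and repeat from step 2.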
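The four steps above could look roughly like this in Python. It is a sketch, not a reference implementation: the name `unstable_partition` and the predicate `is_a` are assumptions made for the example.

```python
def unstable_partition(items, is_a):
    """Two-iterator partition: all As end up before all Bs, in place.

    Linear time and constant extra space, but neither group keeps its order.
    """
    left, right = 0, len(items) - 1
    while True:
        # Step 2: skip Bs that are already on the right.
        while left < right and not is_a(items[right]):
            right -= 1
        # Step 3: skip As that are already on the left.
        while left < right and is_a(items[left]):
            left += 1
        # Step 4: stop once the iterators meet, otherwise swap and repeat.
        if left >= right:
            break
        items[left], items[right] = items[right], items[left]


letters = list("ABBAABABAA")
unstable_partition(letters, lambda ch: ch == "A")
print("".join(letters))
```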

score 0 · Accepted Answer

Firstly, assuming the array of A's and B's is either generated or read-in, I wonder why not avoid this question entirely by simply applying f as the list is being accumulated into memory into two lists that would subsequently be merged.
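As a rough illustration of that first suggestion, the elements could be routed into two lists while they are being read and then concatenated. The names `partition_while_reading`, `stream`, and `is_a` are hypothetical; note that this gives a stable, linear-time result but uses O(n) extra space, so it is not in place.

```python
def partition_while_reading(stream, is_a):
    """Collect elements into two lists as they arrive, then join them.

    Stable and O(n) time, at the cost of O(n) additional memory.
    """
    a_items, b_items = [], []
    for item in stream:
        # Apply f (here: is_a) once per element while accumulating.
        (a_items if is_a(item) else b_items).append(item)
    return a_items + b_items


source = ((ch, i) for i, ch in enumerate("ABBAABABAA"))
print(partition_while_reading(source, lambda item: item[0] == "A"))
```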

Otherwise, we can posit an alternative solution in O(n) time and O(1) space that may be sufficient depending on Sir Bohumil's ultimate needs:

Traverse the list and sort each segment of 1,000,000 elements in-place using the permutation cycles of the segment (once this step is done, the list could technically be sorted in-place by recursively swapping the inner-blocks, e.g., ABB AAB -> AAABBB, but that may be too time-consuming without extra space). Traverse the list again and use the same constant space to store, in two interval trees, the pointers to each block of A's and B's. For example, segments of 4,

ABBAABABAA => AABB AABB AA + pointers to blocks of A's and B's

Sequential access to A's or B's would be immediately available, and random access would come from using the interval tree to locate a specific A or B. One option could be to have the intervals number the A's and B's; e.g., to find the 4th A, look for the interval containing 4.

For sorting, an array of 1,000,000 four-byte elements (3.8MB) would suffice to store the indexes, using one bit in each element for recording visited indexes during the swaps; and two temporary variables the size of the largest A or B. For a list of one billion elements, the maximum combined interval trees would number 4000 intervals. Using 128 bits per interval, we can easily store numbered intervals for the A's and B's, and we can use the unused bits as pointers to the block index (10 bits) and offset in the case of B (20 bits). 4000*16 bytes = 62.5KB. We can store an additional array with only the B blocks' offsets in 4KB. Total space under 5MB for a list of one billion elements. (Space is in fact dependent on n but because it is extremely small in relation to n, for all practical purposes, we may consider it O(1).)

Sorting the million-element segments takes one pass to count and index (during which we can also accumulate the intervals and B offsets) and one pass to sort. Constructing the interval tree is O(n log n), but n here is only 4000 (0.00005 of the one-billion list count). Total time is O(2n) = O(n).

score 0 · Accepted Answer

Let's say we have Object_Array[1...N]

Type_A objs are A1,A2,...Ai

Type_B objs are B1,B2,...Bj

i+j = N

FOR i=1 :N
    if Object_Array[i] is of Type_A
       obj_A_count=obj_A_count+1
    else
       obj_B_count=obj_B_count+1
LOOP

Fill the result array with obj_A and obj_B according to their respective counts, in the order determined by whether obj_A > obj_B.

score 0 · Accepted Answer

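If your data structure is a linked list rather than an array, you should be able to satisfy all three constraints. You just walk through the list; accumulating and moving the "B"s comes down to trivial pointer changes. Pseudocode below: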

sort(list) {
    node = list.head, blast = null, bhead = null
    while(node != null) {
        nextnode = node.next
        if(node.val == "a") {
            if(blast != null){
                //unlink the 'a'; blast is always the node just before it
                blast.next = node.next
                if(node.next != null) node.next.prev = blast
                //move the 'a' to the front of the 'b' run
                node.prev = bhead.prev
                if(bhead.prev != null) bhead.prev.next = node
                else list.head = node
                node.next = bhead, bhead.prev = node
            }
        }
        else if(node.val == "b") {
            if(blast == null)
                bhead = blast = node
            else //accumulate the "b"s..
                blast = node
        }

        node = nextnode
    }
}

So you could do the same thing in an array, but the memcopies needed to emulate the list moves would make it slow for large arrays.
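For comparison, here is a runnable Python sketch of a stable linked-list partition. It is a variation rather than a transcription of the pseudocode: it uses a singly linked list and threads the existing nodes onto two chains that are spliced together at the end. The `Node` class, the function names, and the `is_a` predicate are all assumptions made for the example.

```python
class Node:
    """Minimal singly linked list node (hypothetical, for illustration)."""

    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def stable_partition_list(head, is_a):
    """Relink nodes so all As precede all Bs, keeping order within each group.

    One pass, O(1) extra space: only pointers change, no values are copied.
    Returns the new head.
    """
    a_head = a_tail = b_head = b_tail = None
    node = head
    while node is not None:
        nxt = node.next
        node.next = None
        if is_a(node.val):
            if a_tail is None:
                a_head = node
            else:
                a_tail.next = node
            a_tail = node
        else:
            if b_tail is None:
                b_head = node
            else:
                b_tail.next = node
            b_tail = node
        node = nxt
    if a_tail is None:
        return b_head
    a_tail.next = b_head  # splice the B chain after the last A
    return a_head


def build(values):
    """Build a list from an iterable and return its head."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def to_list(head):
    """Collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


head = build((ch, i) for i, ch in enumerate("ABBAABABAA"))
print(to_list(stable_partition_list(head, lambda v: v[0] == "A")))
```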

score 0 · Accepted Answer

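For a doubly linked list, the following should work in linear time. Because it involves up to N insertions/deletions, it may take quadratic time on an array.

1. Find the position where the first B should be after the "sort". This can be done in linear time by counting the As.
2. Start with 3 iterators: iterA starts at the beginning of the container, iterB starts at the position found above, where the As and Bs should meet, and iterMiddle starts one element before iterB.
3. Skip over As with iterA to find the first B, then move that object from iterA to the position iterB->previous. iterA now points to the element following the moved element's old position, and the moved element now sits just before iterB.
4. Repeat step 3 until iterMiddle is reached. After that, all elements between first() and iterB-1 are As.
5. Now set iterA to iterB-1.
6. Skip over Bs with iterB. When an A is found, move it to just after iterA and increment iterA.
7. Repeat step 6 until iterB reaches end().

This works as a stable sort for any container. The algorithm involves O(N) insertions/deletions; that is linear time for containers with O(1) insertion/deletion, but O(N^2) for arrays. Whether it applies in your case depends on whether the container is an array or a list.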

score -1 · Accepted Answer

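This should be possible with a bit of dynamic programming.

It works a bit like counting sort, but with one key difference. Create arrays of size n for both a and b, count_a[n] and count_b[n]. Fill these arrays with how many As or Bs there are before index i.

After just one loop, we can use these arrays to look up the correct index of any element in O(1). Like this: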

int final_index(char id, int pos){
    if(id == 'A')
      return count_a[pos];
    else
      return count_a[n-1] + count_b[pos];
}

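Finally, to meet the overall O(n) requirement, the swaps need to be done in a smart order. One simple option is a recursive swapping procedure that does not actually perform any swap until both elements would land in their correct final positions. Edit: this is actually not correct. Even naive swapping needs only O(n) swaps. But following this recursive strategy gives you the absolute minimum number of swaps required.

Note that in the general case this would be a very bad sorting algorithm, since it has memory requirements of O(n * range of element values).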
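A small Python sketch of the index computation described above, under stated assumptions: the counts are taken as "number of As (or Bs) strictly before position i", the total number of As is kept in its own variable, and the elements are copied into a second list instead of being swapped in place. The function names and the `is_a` predicate are hypothetical.

```python
def final_indices(items, is_a):
    """Return, for each position, the index the element has after a stable sort."""
    total_a = sum(1 for item in items if is_a(item))
    seen_a = seen_b = 0  # running prefix counts
    targets = []
    for item in items:
        if is_a(item):
            targets.append(seen_a)            # As fill slots 0 .. total_a - 1
            seen_a += 1
        else:
            targets.append(total_a + seen_b)  # Bs follow all the As
            seen_b += 1
    return targets


def stable_sort_with_copy(items, is_a):
    """Place every element at its final index in a new list (O(n) extra space)."""
    result = [None] * len(items)
    for item, target in zip(items, final_indices(items, is_a)):
        result[target] = item
    return result


print(final_indices("ABBAABABAA", lambda ch: ch == "A"))
# → [0, 6, 7, 1, 2, 8, 3, 9, 4, 5]
```

Because of the copy, this version is stable and linear but not in place.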
