algorithm - Reduce a binary string to an empty string by removing subsequences with alternating characters

Question

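This question was asked in the coding round for a NASDAQ internship.

Program description:

The program takes a binary string as input. We have to repeatedly remove a subsequence whose characters alternate, until the string is empty. The task is to find the minimum number of steps required to do so.

Example 1:
Let the string be: 0111001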
Removed-0101, Remaining-110
Removed-10 , Remaining-1
Removed-1
No of steps = 3

Example 2:
Let the string be: 111000111
Removed-101, Remaining-110011
Removed-101, Remaining-101
Removed-101
No of steps = 3

Example 3:
Let the string be: 11011
Removed-101, Remaining-11
Removed-1 , Remaining-1
Removed-1
No of steps = 3

Example 4:
Let the string be: 10101
Removed-10101
No of steps = 1

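The solution I tried takes the first character of the binary string as the first character of my subsequence. It then builds a new string, appending each following character that does not continue the alternating sequence. The new string becomes our binary string, and the loop continues until the new string is empty. (This is roughly an O(n^2) algorithm.) As expected, it gave me a timeout error. Below is C++ code that is somewhat similar to what I tried; my original was written in Java.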

    #include<bits/stdc++.h>
    using namespace std;
    
    int main() {
        string str, newStr;
        int len;
        char c;
        int count = 0;
        getline(cin, str);
        len = str.length();
    
        //continue till string is empty
        while(len > 0) {
            len = 0;
            c = str[0];
            for(int i=1; str[i] != '\0';i++) {
                //if alternative characters are found, set as c and avoid that character
                if(c != str[i]) 
                    c = str[i];
                //if next character is not alternate, add the character to newStr
                else {
                    newStr.push_back(str[i]);
                    len++;
                }
            }
            str = newStr;
            newStr = "";
            count++;
        }
        cout<<count<<endl;
        return 0;
    }

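I also tried approaches such as finding the length of the longest run of identical consecutive characters, which obviously does not cover every case, for example Example 3.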
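To make the mismatch concrete, here is a minimal sketch of the longest-run idea. The helper name `longest_run` is only an illustration and is not taken from the code I submitted. It agrees with the step counts of Examples 1 and 2 but returns 2 for Example 3, which needs 3 steps.

```python
from itertools import groupby


def longest_run(s: str) -> int:
    """Length of the longest block of identical consecutive characters."""
    return max((len(list(group)) for _, group in groupby(s)), default=0)


# Each pair is (input string, number of steps stated in the examples above).
for s, steps in [("0111001", 3), ("111000111", 3), ("11011", 3)]:
    print(s, "longest run:", longest_run(s), "steps needed:", steps)
```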

I hope someone can help me with the most optimized solution for this problem, preferably with code in C, C++ or Python. Even just an algorithm would do.

score 3 · Accepted Answer

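I found a more optimal O(N log N) solution by maintaining a min-heap and a look-up hash map.

We start with the initial array of `counts` of the alternating runs of 0s and 1s.

That is, for string = 0111001, let's assume our input array is S = [1, 3, 2, 1].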
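As a small sketch of how such a counts array could be produced from the raw string (the helper name `run_lengths` is an assumption made for illustration, not part of the solution below):

```python
from itertools import groupby


def run_lengths(s: str) -> list:
    """Lengths of the maximal runs of identical characters, left to right."""
    return [len(list(group)) for _, group in groupby(s)]


print(run_lengths("0111001"))
```

Because neighbouring runs always differ in value, the values themselves do not need to be stored; the lengths alone describe the string up to swapping 0 and 1.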

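Basic idea:

1. Heapify the counts array.
2. Extract the node with the minimum count and add its count (less what earlier steps have already removed) to num_steps.
3. Use the lookup map to pull both of its neighbours (maintained in the Node class) out of the heap.
4. Merge these two neighbours and insert the merged node into the heap.
5. Repeat steps 2-4 until no entries remain in the heap.

Code implementation in Python: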

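    import heapq
    import itertools


    class Node:
        """One run of identical characters, linked to its neighbouring runs."""

        def __init__(self, node_type: int, count: int):
            self.prev = None
            self.next = None
            self.node_type = node_type
            self.node_count = count


    _tie_breaker = itertools.count()  # keeps entries comparable when counts are equal


    def addElementToHeap(heap, node_heap_entry_map, node):
        # Entry layout: [count, tie-breaker, node]. The entry is also stored in
        # the lookup map so that it can be invalidated later.
        entry = [node.node_count, next(_tie_breaker), node]
        node_heap_entry_map[node] = entry
        heapq.heappush(heap, entry)


    def removeNodeFromHeap(node, node_heap_entry_map):
        # Lazy deletion: mark the entry as dead instead of searching the heap.
        entry = node_heap_entry_map.pop(node)
        entry[2] = None


    def extractMinFromHeap(heap, node_heap_entry_map):
        # Pop until a live entry is found; return None once the heap is exhausted.
        while heap:
            node = heapq.heappop(heap)[2]
            if node is not None:
                del node_heap_entry_map[node]
                return node
        return None


    def get_num_steps(S: list) -> int:  # Example: S = [1, 3, 2, 1] for 0111001
        heap = []
        node_heap_entry_map = {}  # Map[Node] -> heap entry
        prev = None
        node_type = 0
        for s in S:
            node = Node(node_type, s)
            node.prev = prev
            if prev is not None:
                prev.next = node
            prev = node
            node_type = 1 - node_type
            addElementToHeap(heap, node_heap_entry_map, node)

        num_steps = 0
        last_val = 0  # characters already removed from every run still alive
        while True:
            top = extractMinFromHeap(heap, node_heap_entry_map)
            if top is None:
                break
            num_steps += top.node_count - last_val
            last_val = top.node_count

            prev_node, next_node = top.prev, top.next
            # An end run has nothing to merge; just unlink it from its neighbour.
            if prev_node is None or next_node is None:
                if prev_node is not None:
                    prev_node.next = None
                if next_node is not None:
                    next_node.prev = None
                continue

            # Merge the two same-typed neighbours of the exhausted run.
            removeNodeFromHeap(prev_node, node_heap_entry_map)
            removeNodeFromHeap(next_node, node_heap_entry_map)
            # Both counts still include last_val, so subtract it once.
            merged_node = Node(prev_node.node_type,
                               prev_node.node_count + next_node.node_count - last_val)
            merged_node.prev = prev_node.prev
            merged_node.next = next_node.next
            if merged_node.prev is not None:
                merged_node.prev.next = merged_node
            if merged_node.next is not None:
                merged_node.next.prev = merged_node
            addElementToHeap(heap, node_heap_entry_map, merged_node)

        return num_steps

PS: The heap helpers are built on heapq with lazy deletion: the lookup map locates a neighbour's heap entry so that it can be marked dead rather than searched for in the heap.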

score 2 · Accepted Answer

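We can solve this in O(n) time and O(1) space.

This isn't about order at all. When you think about it, the actual task is how to divide the string into the smallest number of subsequences that consist of alternating characters (a single character is allowed). Just maintain two queues or stacks, one for 1s and the other for 0s, where each character pops its immediate alternate predecessor. Keep a record of how long the queues get at any point during the iteration (not counting the replacement moves).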

Examples:

(1)

0111001 (scanned from right to left in this trace)

   queues
1  1   -
0  -   0
0  -   00
1  1   0
1  11  -
1  111 -  <- max 3
0  11  0

For O(1) space, the queues can just be two numbers representing the current counts.

(2)

111000111
   queues (count of 1s and count of 0s)
1  1  0
1  2  0
1  3  0  <- max 3
0  2  1
0  1  2
0  0  3  <- max 3
1  1  2
1  2  1
1  3  0  <- max 3

(3)

11011
   queues
1  1  0
1  2  0
0  1  1
1  2  0
1  3  0  <- max 3

(4)

10101

   queues
1  1  0  <- max 1
0  0  1  <- max 1
1  1  0  <- max 1
0  0  1  <- max 1
1  1  0  <- max 1
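The traces above can be reproduced with two counters. The following is a minimal sketch under the assumption that the input contains only the characters 0 and 1; the function name `min_steps` is made up for illustration.

```python
def min_steps(s: str) -> int:
    """Largest queue length reached while scanning s with the two-counter scheme."""
    ones = 0   # open subsequences that currently end in '1'
    zeros = 0  # open subsequences that currently end in '0'
    longest = 0
    for ch in s:
        if ch == '1':
            if zeros:
                zeros -= 1  # this 1 extends a subsequence that ended in 0
            ones += 1
        else:
            if ones:
                ones -= 1   # this 0 extends a subsequence that ended in 1
            zeros += 1
        longest = max(longest, ones, zeros)
    return longest


for s in ["0111001", "111000111", "11011", "10101"]:
    print(s, min_steps(s))
```

A queue only grows past the previous maximum when the opposite queue is empty, which is exactly when a new subsequence has to be started, so the largest queue length equals the number of subsequences.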

score 2 · Accepted Answer

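I won't write the full code, but I have an idea for an approach that will probably be fast enough (certainly faster than building all the intermediate strings).

Read the input and change it to a representation that consists of the lengths of the runs of the same character. So 11011 is represented with a structure that specifies it as something like [{length: 2, value: 1}, {length: 1, value: 0}, {length: 2, value: 1}]. With some cleverness you can drop the values entirely and represent it as [2, 1, 2] - I'll leave that as an exercise for the reader.

With that representation, you know that in each "step" you can remove one value from every identified run of the same character. You can do this a number of times equal to the smallest length of any of those runs.

So you identify the minimum run length, add it to the total number of operations you are tracking, then subtract it from every run's length.

After doing that, you need to deal with runs of length 0: remove them, and then, if there are any adjacent runs of the same value, merge them (add the lengths together, remove one). This merging step is the one that requires some care if you are going for the representation that forgets the values.

Keep repeating this until there is nothing left. It should run somewhat faster than dealing with string manipulations.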
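The loop described above could be sketched roughly as follows. This version keeps the values next to the lengths, so the merge stays simple; the function name `steps_by_runs` and the list-of-pairs layout are assumptions made for illustration.

```python
from itertools import groupby


def steps_by_runs(s: str) -> int:
    """Count steps by repeatedly subtracting the shortest run from every run."""
    # Each run is stored as [value, length].
    runs = [[value, len(list(group))] for value, group in groupby(s)]
    steps = 0
    while runs:
        shortest = min(length for _, length in runs)
        steps += shortest
        merged = []
        for value, length in runs:
            length -= shortest
            if length == 0:
                continue  # this run is used up
            if merged and merged[-1][0] == value:
                merged[-1][1] += length  # neighbours of equal value fuse
            else:
                merged.append([value, length])
        runs = merged
    return steps


for s in ["0111001", "111000111", "11011", "10101"]:
    print(s, steps_by_runs(s))
```

Every pass rebuilds the whole list, so the worst case is still quadratic in the number of runs; the gain over the string version comes from working on runs instead of single characters.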

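There is probably an even better approach that, after building this representation, does not iterate through the steps at all and just examines the run lengths in a single pass from start to end. I haven't worked out exactly what that approach is, but I'm reasonably confident it exists. Working it out is a good idea once you have tried what I outlined above. My feeling is that it goes something like this: start a total at 0 and keep track of the minimum and maximum values the total reaches. Scan the string from the start, adding 1 to the total for every 1 encountered and subtracting 1 for every 0. The answer is the larger of the absolute values of the minimum and the maximum reached by the total. I haven't verified this; it is just a hunch.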
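A short sketch for checking the hunch against the examples from the question (the name `running_total_range` is hypothetical). On 0111001 the total ranges from -1 to 2, so the larger absolute value is 2 while the example needs 3 steps; the spread between the maximum and the minimum, with the starting 0 included, gives 3 and also matches the other three examples. The string 001111 shows the same gap: no 0 follows any of its 1s, so each of the four 1s needs its own subsequence, yet the larger absolute value is only 2.

```python
def running_total_range(s: str):
    """Return (lowest, highest) running total, counting +1 per '1' and -1 per '0'."""
    total = low = high = 0  # the starting total 0 counts as a reached value
    for ch in s:
        total += 1 if ch == '1' else -1
        low = min(low, total)
        high = max(high, total)
    return low, high


for s in ["0111001", "111000111", "11011", "10101", "001111"]:
    low, high = running_total_range(s)
    print(s, "larger absolute value:", max(abs(low), abs(high)), "spread:", high - low)
```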

score 0 · Accepted Answer

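Time complexity - O(n)

    #include <iostream>
    #include <string>
    using namespace std;

    // Prints the minimum number of alternating subsequences needed to cover s.
    void solve(string s) {
        int n = s.size();
        // zero / One: open subsequences currently ending in '0' / '1'.
        // res: number of subsequences started so far.
        int zero = 0, One = 0, res = 0;

        for (int i = 0; i < n; i++) {
            if (s[i] == '1') {
                if (zero > 0)
                    zero--;  // extend a subsequence that ends in '0'
                else
                    res++;   // nothing to extend, start a new subsequence
                One++;
            } else {
                if (One > 0)
                    One--;   // extend a subsequence that ends in '1'
                else
                    res++;   // nothing to extend, start a new subsequence
                zero++;
            }
        }
        cout << res << endl;
    }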
