string - Remove all occurrences of substrings from a string

Question

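Given a string S and a set of n substrings, remove instances of these substrings from S so that S ends up as short as possible, and output this minimum length.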

Example 1

S = ccdaabcdbb
n = 2
substrings = ab, cd

Output: 2

Explanation:

ccdaabcdbb -> ccdacdbb -> cabb -> cb (length=2)
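The trace above can be replayed with Python's built-in `str.replace`, which removes every non-overlapping occurrence of a substring in one pass. This is only a sketch that reproduces the listed steps in the listed order; it does not search for the best order.

```python
# Replay the removal sequence from Example 1, one substring per step.
steps = ["ccdaabcdbb"]
for sub in ["ab", "cd", "ab"]:
    steps.append(steps[-1].replace(sub, ""))

print(" -> ".join(steps), f"(length={len(steps[-1])})")
# ccdaabcdbb -> ccdacdbb -> cabb -> cb (length=2)
```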

Example 2

S = abcd
n = 2
substrings = ab, bcd

Output: 1
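Example 2 shows why the order of removals matters: removing ab first destroys the only occurrence of bcd, while removing bcd first leaves just a. A minimal check of both choices:

```python
s = "abcd"
after_ab = s.replace("ab", "")    # "cd": neither "ab" nor "bcd" is left to remove
after_bcd = s.replace("bcd", "")  # "a"

print(after_ab, len(after_ab))    # cd 2
print(after_bcd, len(after_bcd))  # a 1
```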

How can I solve this problem?

score 3 · Accepted Answer

A simple brute-force search algorithm:

For each substring, try every possible way of removing it from the string, then recurse.

In (Python) pseudocode:

def min_final_length(s, substrings):
    best = len(s)
    for sub in substrings:
        # find every occurrence of sub in s and recurse on s with it removed
        found = s.find(sub)
        while found != -1:
            s_without_sub = s[:found] + s[found + len(sub):]
            best = min(best, min_final_length(s_without_sub, substrings))
            found = s.find(sub, found + 1)
    return best

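Let the complexity be F(S,n,l), where S is the length of the input string, n is the cardinality of the set of substrings, and l is the "characteristic length" of the substrings. Then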

F(S,n,l) ~ n * ( S * l + F(S-l,n,l) )

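Unrolling this over the roughly S/l levels of recursion, each of which branches n ways, gives on the order of O(n^(S/l) * S * l), so the search is exponential in the worst case.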
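Much of that blow-up comes from reaching the same intermediate string along different removal orders. A sketch of the same brute force with memoization on the remaining string (the idea the trie+DP answer below uses), assuming all substrings are non-empty; `min_final_length_memo` is a hypothetical name:

```python
from functools import lru_cache


def min_final_length_memo(s, substrings):
    # Assumes every substring is non-empty; an empty one would recurse forever.
    @lru_cache(maxsize=None)
    def best(t):
        # Shortest length reachable from t; each distinct t is solved once.
        result = len(t)
        for sub in substrings:
            found = t.find(sub)
            while found != -1:
                result = min(result, best(t[:found] + t[found + len(sub):]))
                found = t.find(sub, found + 1)
        return result

    return best(s)


print(min_final_length_memo("ccdaabcdbb", ["ab", "cd"]))  # 2
print(min_final_length_memo("abcd", ["ab", "bcd"]))       # 1
```

The cache is keyed on whole strings, so memory can still grow large for long inputs; it only removes repeated work.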

score 1 · Accepted Answer

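The following greedy solution scans S from left to right and, at each position, removes the first substring that starts there. With short substrings it makes on the order of m * n prefix checks, where m = len(S) and n is the number of substrings (each removal also copies the string). Because it is greedy, it does not always reach the minimum: on Example 2 it removes ab first and is left with cd (length 2) instead of a (length 1).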

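def foo(S, sub):
    # Greedy scan: at each position i, remove the first substring in sub
    # that starts there.
    longest = max(map(len, sub))
    i = 0
    while i < len(S):
        for e in sub:
            if S.startswith(e, i):
                S = S[:i] + S[i+len(e):]
                # A removal can create a new match starting up to longest-1
                # characters earlier, so step back, but never below 0.
                i = max(0, i - longest + 1)
                break
        else:
            i += 1
    return S, len(S)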
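To see the gap between the greedy scan and the optimum, here is a small comparison on both examples. `greedy_length` mirrors the fixed greedy loop and `exhaustive_length` is a plain brute-force search; both names are hypothetical and the sketch assumes non-empty substrings.

```python
def greedy_length(s, subs):
    # Left-to-right greedy removal, stepping back after each removal.
    longest = max(map(len, subs))
    i = 0
    while i < len(s):
        for e in subs:
            if s.startswith(e, i):
                s = s[:i] + s[i + len(e):]
                i = max(0, i - longest + 1)
                break
        else:
            i += 1
    return len(s)


def exhaustive_length(s, subs):
    # Try every occurrence of every substring and recurse (exponential).
    best = len(s)
    for e in subs:
        k = s.find(e)
        while k != -1:
            best = min(best, exhaustive_length(s[:k] + s[k + len(e):], subs))
            k = s.find(e, k + 1)
    return best


for s, subs in [("ccdaabcdbb", ["ab", "cd"]), ("abcd", ["ab", "bcd"])]:
    print(s, greedy_length(s, subs), exhaustive_length(s, subs))
# ccdaabcdbb 2 2
# abcd 2 1
```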

score 0 · Accepted Answer

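If you are after raw performance and your string is very large, you can do better than brute force. Store your string in a suffix tree (e.g., one built with Ukkonen's algorithm). Then look up each substring (O(m) time, m being the substring length, plus the number of occurrences reported) and store the offset and length of each occurrence in an array. Next, use the offsets and lengths to actually remove the substrings by filling those regions with \0 (in C) or some other placeholder character. Counting the remaining non-null characters gives the resulting length. This only marks occurrences present in the original string, so it does not account for new occurrences created by a removal (such as the ab that forms in cabb in Example 1) and can overshoot the true minimum.

This also handles overlapping substrings: for example, if your string is "abcd" and your substrings are "ab" and "abcd".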
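A small sketch of the mark-and-count step. For brevity it swaps the suffix tree for plain `str.find` lookups (slower, same occurrences) and a boolean list instead of \0 placeholders; `marked_min_length` is a hypothetical name.

```python
def marked_min_length(s, substrings):
    # removed[k] plays the role of the \0 placeholder at position k.
    removed = [False] * len(s)
    for sub in substrings:
        start = s.find(sub)
        while start != -1:  # every occurrence in the ORIGINAL string
            for k in range(start, start + len(sub)):
                removed[k] = True
            start = s.find(sub, start + 1)
    return removed.count(False)  # count the unmarked characters


print(marked_min_length("abcd", ["ab", "abcd"]))        # 0 (overlaps handled)
print(marked_min_length("ccdaabcdbb", ["ab", "cd"]))    # 4, not the optimal 2
```

The second result shows the limitation: the ab that only appears after the cd removals is never marked.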

score 0 · Accepted Answer

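I solved it using a trie + DP.
First insert your substrings into a trie. Then make the DP state the current string: iterate over it and treat each i (for i = 0 .. s.length()-1) as the possible start of a substring. Let j = i and keep advancing j as long as s[i..j] is still a path in the trie (one walk can reach several substrings when one is a prefix of another, and substrings with a common prefix such as "abce" and "abdd" share the same path). Whenever you reach the end of a substring, solve the new subproblem with s[i..j] removed, and take the minimum over all such reductions.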
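The same trie + DP idea can be sketched compactly in Python, assuming a nested-dict trie and `functools.lru_cache` for the memo table instead of the hand-written node struct and `map`. `build_trie`, `min_length_trie_dp` and the `END` marker are hypothetical names; this returns only the minimum length, without the path reconstruction.

```python
from functools import lru_cache

END = None  # hypothetical end-of-substring key; never equal to a character


def build_trie(substrings):
    # Nested-dict trie: each node maps a character to its child node.
    root = {}
    for sub in substrings:
        node = root
        for ch in sub:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def min_length_trie_dp(s, substrings):
    root = build_trie(substrings)

    @lru_cache(maxsize=None)
    def solve(t):
        # DP state: the current string t; value: shortest length reachable.
        best = len(t)
        for i in range(len(t)):
            node, j = root, i
            # Follow t[i], t[i+1], ... down the trie while a path exists.
            while j < len(t) and t[j] in node:
                node = node[t[j]]
                j += 1
                if END in node:  # t[i:j] is one of the substrings
                    best = min(best, solve(t[:i] + t[j:]))
        return best

    return solve(s)


print(min_length_trie_dp("ccdaabcdbb", ["ab", "cd"]))  # 2
print(min_length_trie_dp("abcd", ["ab", "bcd"]))       # 1
```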

Here is my code. Don't worry about its length: just read the solve function and ignore path, which I included only to print the resulting string.

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Trie node over lowercase letters 'a'..'z'.
struct node{
    node* c[26];
    bool str_end;   // true if some substring ends at this node
    node(){
        for(int i= 0;i<26;i++){
            c[i]=NULL;
        }
        str_end= false;
    }
    ~node(){
        // free the whole subtree
        for(int i= 0;i<26;i++){
            delete c[i];
        }
    }
};
class Trie{
public:
    node* root;
    Trie(){
        root = new node();
    }
    ~Trie(){
        delete root;
    }
};
class Solution{
public:
    typedef pair<int,int>ii;
    // Rebuild the final string by replaying the recorded removals.
    string get_str(string& s,map<string,ii>&path){
        if(!path.count(s)){
            return s;
        }
        int i= path[s].first;
        int j= path[s].second;
        string new_str =(s.substr(0,i)+s.substr(j+1));
        return get_str(new_str,path);
    }
    // Minimum length reachable from s, memoized on the whole string.
    // path[s] records the removal s[i..j] that achieves that minimum.
    int solve(string& s,Trie* &t, map<string,int>&dp,map<string,ii>&path){
        if(dp.count(s)){
            return dp[s];
        }
        int mn= (int)s.length();
        for(int i =0;i<(int)s.length();i++){
            string left = s.substr(0,i);
            node* cur = t->root->c[s[i]-'a'];
            int j=i;
            // walk the trie along s[i..j] while a path exists
            while(cur!=NULL){
                if(cur->str_end){
                    // s[i..j] is one of the substrings: remove it and recurse
                    string new_str =left+s.substr(j+1);
                    int ret= solve(new_str,t,dp,path);
                    if(ret<mn){
                        mn= ret;
                        path[s]={i,j};
                    }
                }
                if(++j>=(int)s.length()) break;  // do not read past the end
                cur = cur->c[s[j]-'a'];
            }
        }
        return dp[s]=mn;
    }
    string removeSubstrings(vector<string>& substrs, string s){
        map<string,ii>path;
        map<string,int>dp;
        Trie*t = new Trie();
        // insert every substring into the trie
        for(int i =0;i<(int)substrs.size();i++){
            node* cur = t->root;
            for(int j=0;j<(int)substrs[i].length();j++){
                if(cur->c[substrs[i][j]-'a']==NULL){
                    cur->c[substrs[i][j]-'a']= new node();
                }
                cur = cur->c[substrs[i][j]-'a'];
                if(j==(int)substrs[i].length()-1){
                    cur->str_end= true;
                }
            }
        }
        solve(s,t,dp,path);
        string result = get_str(s, path);
        delete t;
        return result;
    }
};

int main(){
    vector<string>substrs;
    substrs.push_back("ab");
    substrs.push_back("cd");
    Solution s;
    cout << s.removeSubstrings(substrs,"ccdaabcdbb")<<endl;
    return 0;
}
