string - 查找包含另一个字符串的一些变位词作为子序列的字符串的子字符串数

Question

我们必须找到包含另一个字符串的一些字谜作为子序列的字符串的子字符串的数量。

仅当开始或结束位置不同时，子字符串才被认为是不同的。

String="aba"
anotherString="a"

Occurence of "a" in "aba" is as follows :

a     at index 0..0
ab    at index 0..1
aba   at index 0..2
ba    at index 1..2
a     at index 2..2

i.e total of 5 times...so o/p=5
(the start and end points here, are inclusive)

我认为这个问题是“字符串中子序列的出现次数”和“在包含另一个字符串的所有字符的字符串中查找最小窗口”的应用之一。

但即使在组合代码中进行了多次更改后，我也无法提出解决方案。粘贴我的代码是没有用的，因为我知道我错了。我想知道的是我们如何在没有暴力解决方案的情况下有效地解决这个问题。

代码：

public static void findLengthAndSequence(String str1,String str2){

    int begin=0,biginWith=0,endWith=0,count=0,minLen=Integer.MAX_VALUE,len=0;
    int l=0;

    int [] hasFound=new int[256];
    int [] toFound=new int[256];

    for(int i=0;i<str2.length();++i){           
        toFound[(int)str2.charAt(i)]++;
    }

    for(int end=0;end<str1.length();++end){
        if(toFound[(int)str1.charAt(end)]==0)
            continue;
        hasFound[(int)str1.charAt(end)]++;
        if(hasFound[(int)str1.charAt(end)]<=toFound[(int)str1.charAt(end)]){
            count++;
        }

        if(count==str2.length()){
            l++;        //add this to find number of such anagram in string
            System.out.println("l= "+l+" "+begin+" "+end); 
            while(toFound[(int)str1.charAt(begin)]==0 || hasFound[(int)str1.charAt(begin)]>toFound[(int)str1.charAt(begin)]  )
            {
                if(hasFound[(int)str1.charAt(begin)]>toFound[(int)str1.charAt(begin)]){
                    hasFound[(int)str1.charAt(begin)]-=1;                       
                }
                begin++;
            }//while
        len=end-begin+1;
        if(minLen>len){
            minLen=len;
            endWith=end;
            biginWith=begin;
        }
    }//if   
    }//end

    for(int i=biginWith;i<=endWith;++i){
        System.out.print(""+str1.charAt(i));
    }
}

此代码为上述问题提供输出=3。我知道一旦到达第一个字符串的末尾，我就无法检查其中的每个子字符串。

e.g in "aba" my code checks for a,ab,aba.but once I reach the end it will not check   
ba,a .since we need to count this also as they are having different index values.

除了指数时间复杂度的蛮力检查每个可能的子字符串之外，还有其他方法吗？

score 5 · Accepted Answer

这是一个具有时间复杂度的简单解决方案O(n + m)（我假设字母大小是一个常数（n第一个字符串的长度（我们想要计算子字符串的字符串）m是第二个字符串的长度（字谜字符串））。我将调用包含第二个字符串“好”的字谜的子字符串。

让我们定义为字符串count(x, y)中字符的出现次数。然后一个任意字符串包含一个字符串的变位词作为子序列当且仅当对于所有（证明很简单，所以我将省略它）。yxstcount(s, c) >= count(t, c)c
让我们将子字符串定义firstRight(L)为最小R的[L, R]子字符串（可能没有这样的子字符串R）。然后firstRight(L) <= firstRight(L + 1)对于所有有效L（因为 1. 和count(x, y)函数的属性）。
陈述 1. 意味着任何字符串都可以表示为带有alphabetSize元素的向量，其中i该向量的第 - 个元素是字符的出现次数i。陈述 2. 暗示我们可以使用两个指针。

所以这个算法的伪代码是这样的：

def getCharacterVector(string s):
    result = a vector filled with zeros
    for c in s
        result[c]++
    return result

// Checks that all elements of the first vector
// are greater than or equal to the corresponding
// elements of the second vector
def isGreaterOrEqual(first, second)
    for i = 0 ... length(first)
        if first[i] < second[i]
            return false
    return true

def countSubstrings(string s, string t)
    vT = getCharacterVector(t)
    vS = a vector filled with zeros
    right = 0
    // computes firstRight(0)
    while (right < length(s) and not isGreaterOrEqual(vS, vT))
        vS[s[right]]++
        right++
    if not isGreaterOrEqual(vS, vT) // firstRight(0) is undefined
        return 0 // there are no such substrings
    res = length(s) - right + 1
    for left = 1 ... length(s) - 1
        vS[s[left - 1]]--
        // computes firstRight(left)
        while right < length(s) and vS[s[left - 1]] < vT[s[left - 1]]
            vS[s[right]]++
            right++
        if vS[s[left - 1]] < vT[s[left - 1]] // firstRight(left) is undefined
            break // we are done
         res += length(s) - right + 1
    return res

这里的想法是两个计算从固定位置开始并在任何地方结束的好子字符串的数量，并使用两个指针来有效地调整右边框。这个实现的时间复杂度是O(N * ALPHABET_SIZE + M)（O(N + M)如果我们将字母大小视为一个常数），但实际上可以firstRight(0)通过跟踪和向量中的“坏”位置vS并将vT这个向量表示为哈希表来更有效地进行计算O(N + M)无论字母大小如何，都可以实现复杂性。

string - 查找包含另一个字符串的一些变位词作为子序列的字符串的子字符串数

1 回答 1

Related

Reference