c - Find the length of the smallest window that contains all the characters of another string

Question

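I was interviewed recently. I did not do well because I got stuck on the following question.

Suppose a sequence is given: ADCBDABCDACD and the search sequence is: ACD

The task is to find the start and end index of the shortest window in the given string that contains all the characters of the search string, preserving their order.

Output, assuming indices start from 1:

start index 10, end index 12

Explanation:

1. The start/end indices are not 1/3 respectively, because although that window contains the characters, their order is not maintained.

2. The start/end indices are not 1/5 respectively, because although that window contains the characters in order, its length is not optimal.

3. The start/end indices are not 6/9 respectively, because although that window contains the characters in order, its length is not optimal.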
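To make the three cases above concrete, here is a small C helper that checks whether a given window contains the search string in order. It is only a sketch: the name `window_contains_in_order` is made up for illustration, indices are 0-based and inclusive (so the 1-based window 10/12 becomes 9/11), and the caller is assumed to pass an `end` that lies inside `hay`.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if hay[start..end] (0-based, inclusive) contains every character
 * of needle in order, i.e. needle is a subsequence of that window; else 0.
 * Assumption: end is a valid index into hay. */
int window_contains_in_order(const char *hay, size_t start, size_t end,
                             const char *needle)
{
    size_t matched = 0;                 /* needle characters found so far */
    size_t needle_len = strlen(needle);

    for (size_t i = start; i <= end && matched < needle_len; i++) {
        if (hay[i] == needle[matched])
            matched++;                  /* next needle character, in order */
    }
    return matched == needle_len;
}
```

With the example string, the window 0..2 (ADC) is rejected because D comes before C, while 0..4, 5..8 and 9..11 are all accepted; the task is then to pick the shortest accepted window.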

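Please go through "How to find smallest substring which contains all characters from a given string?".

But that question is different, because the order is not maintained there. I am still struggling to maintain the indices. Any help would be appreciated. Thanks.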

score 4 · Accepted Answer

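I tried to write some simple C code to solve the problem:

UPDATE:

I wrote a search function that looks for the required characters in the correct order, returns the length of the window and stores the window's starting point in int *startAt. The function processes the sub-sequence of the given hay from the specified starting point int start to its end.

The rest of the algorithm is in main, where all possible sub-sequences are tested with a small optimization: we start looking for the next window right after the starting point of the previous one, so we skip some unnecessary turns. During the process we keep track of the "best solution so far".

The complexity is O(n*n/2).

UPDATE 2:

Unnecessary dependencies have been removed, and unnecessary subsequent calls to strlen(...) have been replaced by size parameters passed to search(...).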

#include <stdio.h>

// search for single occurrence
int search(const char hay[], int haySize, const char needle[], int needleSize, int start, int * startAt)
{
    int i, charFound = 0;

    // search from start to end
    for (i = start; i < haySize; i++)
    {
        // found a character ?
        if (hay[i] == needle[charFound])
        {               
            // is it the first one?
            if (charFound == 0) 
                *startAt = i;   // store starting position
            charFound++;    // and go to next one
        }
        // are we done?
        if (charFound == needleSize)
            return i - *startAt + 1;    // success
    }
    return -1;  // failure
}

int main(int argc, char **argv)
{

    char hay[] = "ADCBDABCDACD";
    char needle[] = "ACD";

    int resultStartAt = -1, resultLength = -1, i, haySize = sizeof(hay) - 1, needleSize = sizeof(needle) - 1;

    // search all possible occurrences (<= so that the last possible start, haySize - needleSize, is tried too)
    for (i = 0; i <= haySize - needleSize; i++)
    {
        int startAt, length;

        length = search(hay, haySize, needle, needleSize, i, &startAt);

        // found something?
        if (length != -1)
        {
            // check if it's the first result, or a one better than before
            if ((resultLength == -1) || (resultLength > length))
            {
                resultLength = length;
                resultStartAt = startAt;
            }
            // skip unnecessary steps in the next turn
            i = startAt;
        }
    }

    printf("start at: %d, length: %d\n", resultStartAt, resultLength);

    return 0;
}

score 2 · Accepted Answer

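Start from the beginning of the string.

If you encounter an A, mark the position and push it on the stack. After that, keep checking the characters sequentially until
1. If you encounter an A, update A's position to the current value.
2. If you encounter a C, push it onto the stack.

After you encounter a C, again keep checking the characters sequentially until,
1. If you encounter a D, erase the stack containing A and C and mark the score from A to D for this sub-sequence.
2. If you encounter an A, then start another stack and mark this position as well.
2a. If you now encounter a C, then erase the earlier stack and keep the most recent stack.
2b. If you encounter a D, then erase the older stack, mark the score, and check whether it is less than the current best score.

Keep doing this until you reach the end of the string.

The pseudocode could look like this: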

Initialize stack = empty;
Initialize bestLength = mainString.size() + 1; // a large value for the subsequence.
Initialize currentLength = 0;
for ( int i = 0; i < mainString.size(); i++ ) {

  if ( stack is empty ) {
    if ( mainString[i] == 'A' ) {
      start a new stack and push A on it.
      mark the startPosition for this stack as i.
    }
    continue;
  }

  For each of the stacks ( there can be at most two stacks prevailing, 
                           one of size 2 and the other of size 1 ) {
    if ( stack size == 1 ) { // only A in it
      if ( mainString[i] == 'A' ) {
        update the startPosition for this stack as i.
      }
      if ( mainString[i] == 'C' ) {
        push C on to this stack.
      }
    } else if ( stack size == 2 ) { // A & C in it
      if ( mainString[i] == 'A' ) {
        if there is no stack with size 1, then start another stack, push A on it and mark its startPosition as i.
      }
      if ( mainString[i] == 'C' ) {
        if there is a stack with size 1, then delete this stack; // the other one dominates this stack.
      }
      if ( mainString[i] == 'D' ) {
        mark the score from startPosition till i and update bestLength accordingly.
        delete this stack.
      }
    }

  }

}
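For the fixed needle ACD, the two stacks in the pseudocode above can be reduced to two integers: the position of the latest A, and the start of the latest A...C pair. The following is a minimal C sketch of that reading, not the answerer's own code. The function name `shortest_acd_window` and the variables `a_pos` and `ac_start` are invented for illustration, and the sketch assumes the needle is exactly "ACD" with three distinct characters.

```c
/* Single pass for the fixed needle "ACD" (assumption: three distinct chars).
 * Returns the window length, or -1 if no window exists.
 * On success *start and *end are 0-based, inclusive. */
int shortest_acd_window(const char *s, int *start, int *end)
{
    int a_pos = -1;     /* "stack of size 1": position of the latest A   */
    int ac_start = -1;  /* "stack of size 2": start of the latest A...C  */
    int best = -1;

    for (int i = 0; s[i] != '\0'; i++) {
        if (s[i] == 'A') {
            a_pos = i;                   /* a later A dominates an earlier one */
        } else if (s[i] == 'C') {
            if (a_pos >= 0)
                ac_start = a_pos;        /* newest A...C replaces the older one */
        } else if (s[i] == 'D' && ac_start >= 0) {
            int len = i - ac_start + 1;  /* score from A to D */
            if (best == -1 || len < best) {
                best = len;
                *start = ac_start;
                *end = i;
            }
            ac_start = -1;               /* this stack is finished */
        }
    }
    return best;
}
```

On ADCBDABCDACD this visits the candidate windows 0..4, 5..8 and 9..11 in turn and keeps the last one, in a single O(n) pass.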

score 0 · Accepted Answer

I have modified my previous suggestion to use a single queue; I now believe this algorithm runs in O(N*m) time:

FindSequence(char[] sequenceList)
{
    queue startSeqQueue;
    int i = 0, k;
    int minSequenceLength = sequenceList.length + 1;
    int startIdx = -1, endIdx = -1;

    // remember every position where the sequence could start
    for (i = 0; i < sequenceList.length - 2; i++)
    {
        if (sequenceList[i] == 'A')
        {
            startSeqQueue.enqueue(i);
        }
    }

    while (!startSeqQueue.isEmpty())
    {
        i = startSeqQueue.dequeue();
        k = i + 1;

        // look for 'C'; an 'A' seen on the way is a better (later) start
        while (k < sequenceList.length && sequenceList[k] != 'C')
        {
            if (sequenceList[k] == 'A' && !startSeqQueue.isEmpty()) i = startSeqQueue.dequeue();
            k++;
        }

        // look for 'D'
        while (k < sequenceList.length && sequenceList[k] != 'D')
            k++;

        if (k < sequenceList.length && minSequenceLength > k - i + 1)
        {
            startIdx = i;
            endIdx = k;
            minSequenceLength = k - i + 1;
        }
    }

    return (startIdx, endIdx);
}

My previous (O(1) memory) suggestion:

FindSequence(char[] sequenceList)
{
    int i = 0, k;
    int minSequenceLength = sequenceList.length + 1;
    int startIdx = -1, endIdx = -1;

    for (i = 0; i < sequenceList.length - 2; i++)
    {
        if (sequenceList[i] == 'A')
        {
            k = i + 1;
            // look for the first 'C' after the 'A', then the first 'D' after that
            while (k < sequenceList.length && sequenceList[k] != 'C')
                k++;
            while (k < sequenceList.length && sequenceList[k] != 'D')
                k++;

            if (k < sequenceList.length && minSequenceLength > k - i + 1)
            {
                startIdx = i;
                endIdx = k;
                minSequenceLength = k - i + 1;
            }
        }
    }

    return (startIdx, endIdx);
}

score 0 · Accepted Answer

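Here is my version. It keeps track of possible candidates for the best solution. For every character in the hay, it checks whether that character is the next one needed by each candidate. Then it picks the shortest candidate. Quite simple.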

using System.Collections.Generic;

class ShortestSequenceFinder
{
    public class Solution
    {
        public int StartIndex;
        public int Length;
    }

    private class Candidate
    {
        public int StartIndex;
        public int SearchIndex;
    }

    public Solution Execute(string hay, string needle)
    {
        var candidates = new List<Candidate>();
        var result = new Solution() { Length = hay.Length + 1 };
        for (int i = 0; i < hay.Length; i++)
        {
            char c = hay[i];
            for (int j = candidates.Count - 1; j >= 0; j--)
            {
                if (c == needle[candidates[j].SearchIndex])
                {
                    if (candidates[j].SearchIndex == needle.Length - 1)
                    {
                        int candidateLength = i - candidates[j].StartIndex + 1; // inclusive window length
                        if (candidateLength < result.Length)
                        {
                            result.Length = candidateLength;
                            result.StartIndex = candidates[j].StartIndex;
                        }
                        candidates.RemoveAt(j);
                    }
                    else
                    {
                        candidates[j].SearchIndex += 1;
                    }
                }
            }
            if (c == needle[0])
                candidates.Add(new Candidate { SearchIndex = 1, StartIndex = i });
        }
        return result;
    }
}

It runs in O(n*m).

score 0 · Accepted Answer

Here is my O(m*n) algorithm in Java:

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;

class ShortestWindowAlgorithm {

    Multimap<Character, Integer> charToNeedleIdx; // Character -> indexes in needle, from rightmost to leftmost | Multimap is a class from Guava
    int[] prefixesIdx; // prefixesIdx[i] -- rightmost index in the hay window that contains the shortest found prefix of needle[0..i]
    int[] prefixesLengths; // prefixesLengths[i] -- shortest window containing needle[0..i]

    public int shortestWindow(String hay, String needle) {
        init(needle);
        for (int i = 0; i < hay.length(); i++) {
            for (int needleIdx : charToNeedleIdx.get(hay.charAt(i))) {
                if (firstTimeAchievedPrefix(needleIdx) || foundShorterPrefix(needleIdx, i)) {
                    prefixesIdx[needleIdx] = i;
                    prefixesLengths[needleIdx] = getPrefixNewLength(needleIdx, i);
                    forgetOldPrefixes(needleIdx);
                }
            }
        }
        return prefixesLengths[prefixesLengths.length - 1];
    }

    private void init(String needle) {
        charToNeedleIdx = ArrayListMultimap.create();
        prefixesIdx = new int[needle.length()];
        prefixesLengths = new int[needle.length()];
        for (int i = needle.length() - 1; i >= 0; i--) {
            charToNeedleIdx.put(needle.charAt(i), i);
            prefixesIdx[i] = -1;
            prefixesLengths[i] = -1;
        }
    }

    private boolean firstTimeAchievedPrefix(int needleIdx) {
        int shortestPrefixSoFar = prefixesLengths[needleIdx];
        return shortestPrefixSoFar == -1 && (needleIdx == 0 || prefixesLengths[needleIdx - 1] != -1);
    }

    private boolean foundShorterPrefix(int needleIdx, int hayIdx) {
        int shortestPrefixSoFar = prefixesLengths[needleIdx];
        int newLength = getPrefixNewLength(needleIdx, hayIdx);
        return newLength <= shortestPrefixSoFar;
    }

    private int getPrefixNewLength(int needleIdx, int hayIdx) {
        return needleIdx == 0 ? 1 : (prefixesLengths[needleIdx - 1] + (hayIdx - prefixesIdx[needleIdx - 1]));
    }

    private void forgetOldPrefixes(int needleIdx) {
        if (needleIdx > 0) {
            prefixesLengths[needleIdx - 1] = -1;
            prefixesIdx[needleIdx - 1] = -1;
        }
    }
}

It works for every input, and it also handles repeated characters, etc.

Here are some examples:

public class StackOverflow {

    public static void main(String[] args) {
        ShortestWindowAlgorithm algorithm = new ShortestWindowAlgorithm();
        System.out.println(algorithm.shortestWindow("AXCXXCAXCXAXCXCXAXAXCXCXDXDXDXAXCXDXAXAXCD", "AACD")); // 6
        System.out.println(algorithm.shortestWindow("ADCBDABCDACD", "ACD")); // 3
        System.out.println(algorithm.shortestWindow("ADCBDABCD", "ACD")); // 4
    }
}

score 0 · Accepted Answer

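I didn't read every answer here, but I don't think anyone has noticed that this is just a restricted version of local pairwise sequence alignment, in which we are only allowed to insert characters (and not delete or substitute them). As such it can be solved by a simplification of the Smith-Waterman algorithm that considers only 2 cases per vertex (arriving at the vertex either by matching a character exactly or by inserting a character) rather than 3 cases. This algorithm is O(n^2).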
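One way to read this answer is as a dynamic program over (hay position, needle position) with exactly those two transitions. The C sketch below is an interpretation, not the answerer's code and not a full Smith-Waterman implementation: it keeps a single rolling row, and the names `shortest_window_dp` and `last` are made up for illustration. Under these assumptions it takes O(n*m) time and O(m) extra memory.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the length of the shortest window of hay that contains needle as a
 * subsequence, or -1 if there is none (also -1 for an empty needle or if
 * allocation fails). On success *start is the 0-based start of the window. */
int shortest_window_dp(const char *hay, const char *needle, int *start)
{
    int n = (int)strlen(hay), m = (int)strlen(needle);
    if (m == 0 || m > n)
        return -1;

    /* last[j]: latest start index of a window seen so far that matches
     * needle[0..j] in order; -1 if there is none yet. */
    int *last = malloc((size_t)m * sizeof *last);
    if (last == NULL)
        return -1;
    for (int j = 0; j < m; j++)
        last[j] = -1;

    int best = -1;
    for (int i = 0; i < n; i++) {
        /* Right to left, so hay[i] is never used for two needle positions.
         * "Match" case: extend the prefix needle[0..j-1] with hay[i].
         * "Insert" case: hay[i] does not match, last[j] stays unchanged. */
        for (int j = m - 1; j >= 0; j--) {
            if (hay[i] == needle[j])
                last[j] = (j == 0) ? i : last[j - 1];
        }
        /* A full match ends at i only if hay[i] is the last needle char. */
        if (hay[i] == needle[m - 1] && last[m - 1] >= 0) {
            int len = i - last[m - 1] + 1;
            if (best == -1 || len < best) {
                best = len;
                *start = last[m - 1];
            }
        }
    }
    free(last);
    return best;
}
```

For the question's example, the row `last` ends up as {9, 9, 9} at the final D, which gives the window starting at index 9 with length 3. Because the inner loop runs right to left, needles with repeated characters such as "AA" are handled without using one hay character twice.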

score 0 · Accepted Answer

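Here's my solution in Python. It returns the indexes assuming 0-indexed sequences. Therefore, for the given example, it returns (9, 11) instead of (10, 12). Obviously, it would be easy to change this to return (10, 12) if you wish.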

def solution(s, ss):
    S, E = [], []
    for i in xrange(len(s)):
        if s[i] == ss[0]:
            S.append(i)
        if s[i] == ss[-1]:
            E.append(i)
    candidates = sorted([(start, end) for start in S for end in E
                        if start <= end and end - start >= len(ss) - 1],
                        lambda x,y: (x[1] - x[0]) - (y[1] - y[0]))
    for cand in candidates:
        i, j = cand[0], 0
        while i <= cand[-1]:
            if s[i] == ss[j]:
                j += 1
            i += 1
        if j == len(ss):
            return cand

Usage:

>>> from so import solution
>>> s = 'ADCBDABCDACD'
>>> solution(s, 'ACD')
(9, 11)
>>> solution(s, 'ADC')
(0, 2)
>>> solution(s, 'DCCD')
(1, 8)
>>> solution(s, s)
(0, 11)
>>> s = 'ABC'
>>> solution(s, 'B')
(1, 1)
>>> print solution(s, 'gibberish')
None

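I think the time complexity is O(p log(p)), where p is the number of pairs of indexes in the sequence that refer to search_sequence[0] and search_sequence[-1] such that the index for search_sequence[0] is less than the index for search_sequence[-1], because it sorts these p pairs using an O(n log n) algorithm. But then again, my substring iteration at the end might totally overshadow that sorting step. I'm not really sure.

It probably has a worst-case time complexity bounded by O(n*m), where n is the length of the sequence and m is the length of the search sequence, but at the moment I cannot think of an example worst case.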

score 0 · Accepted Answer

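Here is my solution. It follows one of the pattern-matching solutions. Please comment/correct me if I am wrong.

Given the input string as in the question, A D C B D A B C D A C D, let's first compute the indices where A occurs. Assuming a zero-based index, this should be [0,5,9].

Now the pseudocode is as follows.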

    Store the indices of A in a list, say *orders*. // orders=[0,5,9]
    globalMinStart=-1, globalMinEnd=-1;
    for (index : orders)
    {
       Stack chars=new Stack(); // to store the characters
       localMinStart=index, localMinEnd=-1;
       i=index+1;
       while (i < length of input string)
       {
           if (str.charAt(i)=='C') // we've already seen A, so we look for C
           {
               chars.push(str.charAt(i));
           }
           else if (str.charAt(i)=='D' and chars is not empty and chars.peek()=='C')
           {
               localMinEnd=i; // we have a match! so record the end index
               break;
           }
           else if (str.charAt(i)=='A') // seen the next A
           {
               break;
           }
           i++;
       }
       // keep this window if it is the first match or shorter than the best so far
       if (localMinEnd != -1 and (globalMinEnd == -1 or globalMinEnd-globalMinStart > localMinEnd-localMinStart))
       {
           globalMinEnd=localMinEnd;
           globalMinStart=localMinStart;
       }
    }

    return [globalMinStart,globalMinEnd]

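PS: This is pseudocode and a rough idea. I'd be happy to correct it and to understand if there is something wrong.

AFAIC, the time complexity is O(n) and the space complexity is O(n).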
