java - 找到最小的子段

Question

我给出了一个长句子和一些单词（要在句子中搜索），我必须找到句子的最小部分，其中包含要在该句子中搜索的所有单词并打印该部分。

我试过了，1.首先从给定的句子中获取所有单词的所有位置（索引）。2.然后尝试从这些词的索引中找到最小的部分。

但我在实施第二部分时遇到问题。所以我想要一些建议，或者如果你建议任何其他可以让它快速的算法。

import java.util.*;
import java.io.*;
public class ShotestSubSegment2 
{
static SearchStr[] search;
static String copystr;
public static void main(String s[])
{
try
{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String str = in.readLine();
        copystr = str.substring(0).toLowerCase();
        int k = Integer.parseInt(in.readLine());
        search = new SearchStr[k];
        for(int i=0;i<k;i++)
        {
            search[i] = new SearchStr(in.readLine().toLowerCase());
            getIndicesOf(search[i]);
            if(search[i].noOfElements()==0)
            {
                System.out.println("No Segments Found");
                return;
            }
        }
        searchSmallestPart();//Dont getting Idea Of this

    }
    catch(Exception x){}
}

public static void getIndicesOf(SearchStr searchS) 
{
    String searchStr = searchS.getName();
    int startIndex = 0, searchStrLen=0;
    int index;
    searchStr = searchStr.toLowerCase();
    searchStrLen = searchStr.length();
    while ((index = copystr.indexOf(searchStr, startIndex)) > -1) 
    {
        searchS.add(index);
        startIndex = index + searchStrLen;
    }
}

}

score 0 · Accepted Answer

临时变量：

bestseq包含当前最佳序列的开始/结束的对象/集合（最初是 eg null）
currently_closestHashMap< word, index >（用适当的类型替换 word 和 index，最初都是特殊值，例如 -1）
current_start, current_end（索引，最初例如 -1）

“算法”：

遍历字符串中的单词
如果当前单词是单词，则将当前索引存储在 current_closest[word] 中，调整 current_start 和 current_end 以反映 current_closest 中的新最大和最小键
if ( current_end-current_start<bestseq.end-bestseq.startor bestseqis eg null) 并且所有单词都有非特殊索引（即不是-1） set => set bestseq to current_start, current_end-sequence

我想这应该在 O(length_of_sentence * number_of_words) 时间内运行。

score 0 · Accepted Answer

使用此类：

class FoundToken  {
   int start;
   end start;
   String word;
   int endOfCompleteSequence;
}

1）将所有找到的令牌与开始索引和结束索引一起存储在列表中

2) 对于这些列表项中的每一个，获取从以下标记（在列表中）构建并包含所有所需标记的第一个完整序列

3) 取最短的那些序列（基于endOfCompleteSequence-start).

score 0 · Accepted Answer

这是我的算法。可能有点外行，但这是我想出的最基本的方法。

输入后，循环抛出单词并检查哪些与列出的单词匹配。继续使用将存储列出的单词的数组。
一旦找到匹配，标记该位置并从该位置开始另一次扫描并检查匹配。并行从列表中删除匹配的单词并检查直到找到列表中的所有单词。直到找到下一个单词，将所有单词（也包括中间单词）添加到一个字符串中。这个特定的循环一直持续到列出的单词数组的所有元素都为空。
最里面的扫描完成，存储字符串，因此在另一个数组中找到（比如字符串 sol_array）。并继续前一个循环。（前一个循环运行（original_string.length（）-listed_word_array.length）次）
最外层循环完成后，运行 sol_array 扫描并检查最小字符串的长度，该字符串就是答案。

score 0 · Accepted Answer

将每个单词的出现位置存储到列表中。

word1 - 找到 word1 的位置的 list1 word2 - 找到 word2 的位置的 list2 ...

您必须最小化（Pend-Pstart），其中 Pstart 是所有单词的有效位置组合的位置列表中的最小位置，而 Pend 是最大的位置。要为在文本中找到的所有单词生成组合，请使用回溯。

我希望我说清楚了。

java - 找到最小的子段

4 回答 4

Related

Reference