
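For example, given the sentence:
My name is not eugene. my pet name is not eugene.
if we have to find the smallest part of the sentence that contains the given words my and eugene, the answer would be 'eugene. my'.
There is no need to handle upper/lower case, special characters, or digits.
I have pasted my code below, but it gives wrong answers for some test cases.

Can anyone tell what is wrong with the code? I don't have the failing test cases.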

import java.io.*;
import java.util.*;

public class ShortestSegment
{
    static String[] pas;
    static String[] words;
    static int k,st,en,fst,fen,match,d;
    static boolean found=false;
    static int[] loc;
    static boolean[] matches;

    public static void main(String s[]) throws IOException
    {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        pas = in.readLine().replaceAll("[^A-Za-z ]", "").split(" ");
        k = Integer.parseInt(in.readLine());
        words = new String[k];
        matches = new boolean[k];
        loc = new int[k];
        for(int i=0;i<k;i++)
        {
            words[i] = in.readLine();
        }
        en = fen = pas.length;
        find(0);
        if(found==false)
            System.out.println("NO SUBSEGMENT FOUND");
        else
        {
            for(int j=fst;j<=fen;j++)
                System.out.print(pas[j]+" ");
        }
    }

    private static void find(int min)
    {
        if(min==pas.length)
            return;
        for(int i=0;i<k;i++)
        {
            if(pas[min].equalsIgnoreCase(words[i]))
            {
                if(matches[i]==false)
                {
                    loc[i]=min;
                    matches[i] =true;
                    match++;
                }
                else
                {
                    loc[i]=min;
                }
                if(match==k)
                {
                    en=min;
                    st = min();
                    found=true;
                    if((fen-fst)>(en-st))
                    {
                        fen=en;
                        fst=st;
                    }
                    match--;
                    matches[getIdx()]=false;
                }
            }
        }
        find(min+1);
    }

    private static int getIdx()
    {
        for(int i=0;i<k;i++)
        {
            if(words[i].equalsIgnoreCase(pas[st]))
                return i;
        }
        return -1;
    }

    private static int min()
    {
        int min=loc[0];
        for(int i=1;i<loc.length;i++)
            if(min>loc[i])
                min=loc[i];
        return min;
    }
}
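For comparison, this task is an instance of the "minimum window" problem, which is commonly solved with a two-pointer sliding window over the tokens rather than with recursion. The sketch below is not the asker's algorithm; it assumes the text has already been split into tokens (here with `split("[^A-Za-z]+")`), compares case-insensitively as the question allows, and measures a segment by its number of tokens, as the code above does. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SlidingWindowSegment {

    // Returns {first, last} token indexes (inclusive) of the shortest run of
    // tokens containing every search word, or null if there is none.
    static int[] shortestWindow(String[] tokens, String[] words) {
        // How many copies of each (lower-cased) word the window must contain.
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) {
            need.merge(w.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        if (need.isEmpty()) {
            return null;
        }
        Map<String, Integer> have = new HashMap<>();
        int satisfied = 0; // distinct words whose required count is currently met
        int[] best = null;
        int left = 0;
        for (int right = 0; right < tokens.length; right++) {
            String r = tokens[right].toLowerCase(Locale.ROOT);
            if (need.containsKey(r) && have.merge(r, 1, Integer::sum).equals(need.get(r))) {
                satisfied++;
            }
            // While the window [left, right] still holds every word, record it
            // and try to shrink it from the left.
            while (satisfied == need.size()) {
                if (best == null || right - left < best[1] - best[0]) {
                    best = new int[] { left, right };
                }
                String l = tokens[left].toLowerCase(Locale.ROOT);
                if (need.containsKey(l) && have.merge(l, -1, Integer::sum) < need.get(l)) {
                    satisfied--;
                }
                left++;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] tokens = "My name is not eugene. my pet name is not eugene.".split("[^A-Za-z]+");
        int[] r = shortestWindow(tokens, new String[] { "my", "eugene" });
        if (r != null) {
            // prints "eugene my"
            System.out.println(String.join(" ", Arrays.copyOfRange(tokens, r[0], r[1] + 1)));
        }
    }
}
```

Each token enters and leaves the window at most once, so the scan is linear in the number of tokens (plus hashing). Replacing `right - left` with a character count changes what "shortest" means without changing the scan itself.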
4 Answers


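The code you gave produces incorrect output for the following input. I am assuming that when you want to "find the shortest part of the sentence that contains the given words", the length of the words matters as well.

String: 'My firstname is eugene. My fn is eugene.'
Number of search strings: 2
string1: 'my'
string2: 'is'
Your solution gives: 'My firstname is'
The correct answer is: 'My fn is'

The problem in your code is that it treats 'firstname' and 'fn' as if they had the same length. In the comparison (fen-fst)>(en-st) you only check whether the number of words is minimized, not whether the segment gets shorter in characters.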
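To make the point concrete, here is a small hypothetical helper (`segmentChars`, not part of the question's code) that measures a candidate segment `pas[st..en]` in characters, counting one space between adjacent tokens. It assumes the example input is 'My firstname is eugene. My fn is eugene.', reconstructed from the two candidate answers quoted above, and uses the question's own preprocessing to build `pas`.

```java
public class SegmentLength {

    // Characters in tokens[st..en] (inclusive) when joined by single spaces.
    static int segmentChars(String[] tokens, int st, int en) {
        int total = en - st; // one space between each pair of adjacent tokens
        for (int i = st; i <= en; i++) {
            total += tokens[i].length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Same preprocessing as the question: strip non-letters, split on spaces.
        String[] pas = "My firstname is eugene. My fn is eugene."
                .replaceAll("[^A-Za-z ]", "").split(" ");
        // Both candidates span three words, but their character lengths differ.
        System.out.println(segmentChars(pas, 0, 2)); // "My firstname is" -> 15
        System.out.println(segmentChars(pas, 4, 6)); // "My fn is" -> 8
    }
}
```

Plugging such a measure into `find()` would also require guarding the first comparison, because the question initializes `fen` to `pas.length`, one past the last valid index.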

answered 2012-07-02T21:17:52.707

The following code (JUnit):

@Test
public void testIt() {
    final String s = "My name is not eugene. my pet name is not eugene.";
    final String tmp = s.toLowerCase().replaceAll("[^a-zA-Z]", " ");//here we need the placeholder (blank)
    final String w1 = "my "; // leave a blank at the end to avoid those words e.g. "myself", "myth"..
    final String w2 = "eugene ";//same as above
    final List<Integer> l1 = getList(tmp, w1); //indexes list
    final List<Integer> l2 = getList(tmp, w2);
    int min = Integer.MAX_VALUE;
    final int[] idx = new int[] { 0, 0 };

    //loop to find out the result
    for (final int i : l1) {
        for (final int j : l2) {
            final int x = j - i; // signed distance between the two word starts
            if (Math.abs(x) < min) {
                min = Math.abs(x);
                // segment runs from the earlier word to the end of the later word;
                // "- 1" drops the trailing blank that w1/w2 carry
                idx[0] = x > 0 ? i : j;
                idx[1] = x > 0 ? j + w2.length() - 1 : i + w1.length() - 1;
            }
        }

    }

    System.out.println("indexes: " + Arrays.toString(idx));
    System.out.println("result: " + s.substring(idx[0], idx[1]));
}

private List<Integer> getList(final String input, final String search) {
    String t = input;
    final List<Integer> list = new ArrayList<Integer>();
    int tmp = 0; // offset of t within input
    while (t.length() > 0) {
        final int x = t.indexOf(search);

        if (x < 0) {
            break;
        }
        tmp += x;
        list.add(tmp);
        // skip past this match and keep the offset in sync with the shortened t
        tmp += search.length();
        t = t.substring(x + search.length());

    }
    return list;

}

gives the output:

indexes: [15, 25]
result: eugene. my

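I think the code with its inline comments is easy to follow. Basically, it works with the match indexes plus the word lengths.

Notes

  • The "not found" case is not implemented.
  • The code only demonstrates the idea and can be optimized; for example, at least one abs() call can be saved.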
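One possible optimization, sketched here as an assumption rather than something the answer spells out: because `getList` collects each word's indexes in increasing order, the nested loop can be replaced by a single merge pass over the two sorted lists, linear in their combined size. The sketch also compares the actual span lengths (both words included) instead of only the distance between their starts. `lenA` and `lenB` are the plain word lengths without the trailing blank; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MergeClosestPair {

    // a, b: sorted start indexes of two words whose lengths are lenA and lenB.
    // Returns {start, end} (end exclusive) of the shortest span containing one
    // occurrence of each word, or null if either list is empty.
    static int[] closest(List<Integer> a, int lenA, List<Integer> b, int lenB) {
        int[] best = null;
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            int start = Math.min(x, y);
            int end = Math.max(x + lenA, y + lenB);
            if (best == null || end - start < best[1] - best[0]) {
                best = new int[] { start, end };
            }
            // Pairing the earlier occurrence with any later occurrence of the
            // other word cannot give a shorter span, so move past it.
            if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String s = "My name is not eugene. my pet name is not eugene.";
        // Start indexes of "my" and "eugene" in s.
        int[] r = closest(Arrays.asList(0, 23), 2, Arrays.asList(15, 42), 6);
        System.out.println(s.substring(r[0], r[1])); // eugene. my
    }
}
```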

Hope this helps.

answered 2012-07-02T22:44:01.000

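I think it can be handled another way: first find a match, then try to shrink it by searching for another match inside the current one. It can be coded as follows: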

/**This method finds the shortest interval between two words
 * @param s : the string to search
 * @param first : one of the words
 * @param second : one of the words
 */
public static void getShortestInterval(String s , String first , String second)
{
    String situationOne = first + "(.*?)" + second;
    String situationTwo = second + "(.*?)" + first;

    Pattern patternOne = Pattern.compile(situationOne,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);
    Pattern patternTwo = Pattern.compile(situationTwo,Pattern.DOTALL|Pattern.CASE_INSENSITIVE);

    List<Integer> result = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
    /**first , test the first choice*/
    Matcher matcherOne = patternOne.matcher(s);
    findTheMax(first.length(),matcherOne, result);
    /**then , test the second choice*/
    Matcher matcherTwo = patternTwo.matcher(s);
    findTheMax(second.length(),matcherTwo,result);

    if(result.get(0)!=Integer.MAX_VALUE)
    {
        System.out.println("The shortest length is " + result.get(0));
        System.out.println("Which start @ " + result.get(1));
        System.out.println("And end @ " + result.get(2));
    }else
        System.out.println("No matching result is found!");
}

private static void findTheMax(int headLength , Matcher matcher , List<Integer> result) 
{
    int length = result.get(0);
    int startIndex = result.get(1);
    int endIndex = result.get(2);

    while(matcher.find())
    {
        int temp = matcher.group(1).length();
        int start = matcher.start();
        List<Integer> minimize = new ArrayList<Integer>(Arrays.asList(Integer.MAX_VALUE,-1,-1));
        System.out.println(matcher.group().substring(headLength));
        findTheMax(headLength, matcher.pattern().matcher(matcher.group().substring(headLength)), minimize);
        if(minimize.get(0) != Integer.MAX_VALUE)
        {
            start = start + minimize.get(1) + headLength;
            temp = minimize.get(0);
        }

        if(temp<length)
        {
            length = temp;
            startIndex = start;
            endIndex = matcher.end();
        }
    }

    result.set(0, length);
    result.set(1, startIndex);
    result.set(2, endIndex);
}

Note that this handles both cases, regardless of the order of the two words!
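The same two-order regex idea can be sketched more compactly. This variant is an assumption, not the answerer's code: it restricts matches to whole words with `\b`, escapes the words with `Pattern.quote`, and, instead of recursing into each match, tries a lazy match starting at every occurrence of the first word (via `Matcher.region` and `lookingAt`), so the nearest following second word is always found. Worst-case cost is quadratic in the text length; names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortestIntervalSketch {

    // Returns {start, end} (end exclusive) of the shortest substring of s that
    // starts with one whole word and ends with the other, or null if none.
    static int[] shortest(String s, String first, String second) {
        int[] a = shortestInOrder(s, first, second);
        int[] b = shortestInOrder(s, second, first);
        if (a == null) return b;
        if (b == null) return a;
        return (b[1] - b[0] < a[1] - a[0]) ? b : a;
    }

    // Shortest window that starts with `from` and ends with `to`, in that order.
    static int[] shortestInOrder(String s, String from, String to) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        String fromWord = "\\b" + Pattern.quote(from) + "\\b";
        String toWord = "\\b" + Pattern.quote(to) + "\\b";
        Matcher head = Pattern.compile(fromWord, flags).matcher(s);
        // The lazy .*? stops at the nearest following occurrence of `to`.
        Matcher window = Pattern.compile(fromWord + ".*?" + toWord, flags).matcher(s);
        int[] best = null;
        while (head.find()) {
            // Try a match that begins exactly at this occurrence of `from`.
            window.region(head.start(), s.length());
            if (window.lookingAt()
                    && (best == null || window.end() - window.start() < best[1] - best[0])) {
                best = new int[] { window.start(), window.end() };
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String s = "My name is not eugene. my pet name is not eugene.";
        int[] r = shortest(s, "my", "eugene");
        System.out.println(s.substring(r[0], r[1])); // eugene. my
    }
}
```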

answered 2012-07-03T13:57:02.447

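You can use the Knuth-Morris-Pratt algorithm to find the indexes of all occurrences of each given word in the text. Suppose you have a text of length N and M words (w1 ... wM). Using KMP you can build the array: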

occur = int[N];
occur[i] = 1, if w1 starts at position i
...
occur[i] = M, if wM starts at position i
occur[i] = 0, if no word from w1...wM starts at position i

Walk through this array, and from each non-zero position search forward for the other M-1 words.

This is rough pseudocode, just to convey the idea; it certainly won't work if you simply recode it in Java as is:

minTextPieceLen = infinity
for i = 0 to N-1 {
 if occur[i] != 0 {
  foundWords = { occur[i] }
  lastWordInd = i
  for j = i + w[occur[i] - 1].length to N-1 { // searching forward
   if occur[j] != 0 and !foundWords.contains(occur[j]) {
    foundWords.add(occur[j])
    lastWordInd = j
    if foundWords.containsAllWords() break
   }
  }
  if foundWords.containsAllWords() {
   foundTextPieceLen = lastWordInd + w[occur[lastWordInd] - 1].length - i
   if foundTextPieceLen < minTextPieceLen {
    minTextPieceLen = foundTextPieceLen
    // also remember start and end indexes of text piece
   }
  }
 }
}
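A runnable Java rendering of this idea might look like the sketch below. It is an illustration under stated assumptions: text and words are lower-cased (the question ignores case), the search words are distinct and no two of them start at the same position (`occur` holds one word per position), matching is by substring rather than whole word, and the piece ends at the furthest end of any word found, so overlapping words are handled. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KmpShortestPiece {

    // Knuth-Morris-Pratt: every start index of (non-empty) pattern p in text t.
    static List<Integer> kmpFindAll(String t, String p) {
        int[] fail = new int[p.length()]; // fail[i] = longest proper border of p[0..i]
        for (int i = 1, k = 0; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) k = fail[k - 1];
            if (p.charAt(i) == p.charAt(k)) k++;
            fail[i] = k;
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0, k = 0; i < t.length(); i++) {
            while (k > 0 && t.charAt(i) != p.charAt(k)) k = fail[k - 1];
            if (t.charAt(i) == p.charAt(k)) k++;
            if (k == p.length()) {
                hits.add(i - k + 1);
                k = fail[k - 1];
            }
        }
        return hits;
    }

    // Returns {start, end} (end exclusive) of the shortest piece of text that
    // contains every word, or null if some word never occurs.
    static int[] shortestPiece(String text, String[] w) {
        String t = text.toLowerCase(Locale.ROOT);
        int n = t.length(), m = w.length;
        int[] occur = new int[n]; // occur[i] = q + 1 if w[q] starts at i, else 0
        for (int q = 0; q < m; q++) {
            for (int pos : kmpFindAll(t, w[q].toLowerCase(Locale.ROOT))) {
                occur[pos] = q + 1;
            }
        }
        int[] best = null;
        for (int i = 0; i < n; i++) {
            if (occur[i] == 0) continue;
            boolean[] found = new boolean[m + 1];
            found[occur[i]] = true;
            int count = 1;
            int end = i + w[occur[i] - 1].length();
            // Search forward, taking the first occurrence of each missing word.
            for (int j = i + 1; j < n && count < m; j++) {
                if (occur[j] != 0 && !found[occur[j]]) {
                    found[occur[j]] = true;
                    count++;
                    end = Math.max(end, j + w[occur[j] - 1].length());
                }
            }
            if (count == m && (best == null || end - i < best[1] - best[0])) {
                best = new int[] { i, end };
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String s = "My name is not eugene. my pet name is not eugene.";
        int[] r = shortestPiece(s, new String[] { "my", "eugene" });
        System.out.println(s.substring(r[0], r[1])); // eugene. my
    }
}
```

For each start position the forward scan takes the first occurrence of every missing word, which gives the shortest piece beginning there; the outer loop then keeps the shortest over all starts, so the scan is O(N·M) after the KMP passes.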
answered 2012-10-08T10:45:41.650