9

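I want to get the words surrounding a given position in a string, for example the two words after it and the two words before it.

For example, consider the string: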

String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
String find = "I";

for (int index = str.indexOf("I"); index >= 0; index = str.indexOf("I", index + 1))
{
    System.out.println(index);
}

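This prints the index of each occurrence of "I". But I would like to get the substring of words surrounding those positions.

I want to be able to print "John and I like to" and "and hiking I have two".

It should not be limited to single-word search strings. Searching for "John and" should return "name is John and I like".

Is there a neat, clever way to do this?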

4

5 Answers

11

Single word:

You can do this with String's split() method. This solution is O(n).

public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and "+
                         "hiking I have two sisters and one brother.";
    String find = "I";

    String[] sp = str.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) { // start at 0 so a match in the first two words is found
        if (sp[i].equals(find)) {
            // bounds checks avoid an ArrayIndexOutOfBoundsException
            String surr = (i-2 >= 0 ? sp[i-2]+" " : "") +
                          (i-1 >= 0 ? sp[i-1]+" " : "") +
                          sp[i] +
                          (i+1 < sp.length ? " "+sp[i+1] : "") +
                          (i+2 < sp.length ? " "+sp[i+2] : "");
            System.out.println(surr);
        }
    }
}

Output:

John and I like to
and hiking I have two

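Multiple words:

A regular expression is a nice, clean solution for the case where find consists of multiple words. However, by its nature, it misses the cases where the surrounding words also match find (see the example below).

The algorithm below handles all cases (the entire solution space). Keep in mind that, due to the nature of the problem, this solution is O(n*m) in the worst case (with n being str's length and m being find's length).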

public static void main(String[] args) {
    String str = "Hello my name is John and John and I like to go...";
    String find = "John and";

    String[] sp = str.split(" +"); // "+" for multiple spaces

    String[] spMulti = find.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) { // start at 0 so a match at the very beginning is found
        int j = 0;
        while (j < spMulti.length && i+j < sp.length
                                  && sp[i+j].equals(spMulti[j])) {
            j++;
        }
        if (j == spMulti.length) { // found spMulti entirely
            StringBuilder surr = new StringBuilder();
            if (i-2 >= 0){ surr.append(sp[i-2]); surr.append(" "); }
            if (i-1 >= 0){ surr.append(sp[i-1]); surr.append(" "); }
            for (int k = 0; k < spMulti.length; k++) {
                if (k > 0){ surr.append(" "); }
                surr.append(sp[i+k]);
            }
            if (i+spMulti.length < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length]);
            }
            if (i+spMulti.length+1 < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length+1]);
            }
            System.out.println(surr.toString());
        }
    }
}

Output:

name is John and John and
John and John and I like
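
The limitation of the regex approach mentioned above can be seen with a small experiment. This is only a sketch: the class name `RegexOverlapDemo`, the helper `regexMatches`, and the exact pattern are assumptions made for illustration, not code from the question. Because `Matcher.find()` resumes after the text it has already consumed, the second "John and" is swallowed as trailing context of the first match and is never reported on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexOverlapDemo {
    // Hypothetical helper: returns every regex match consisting of
    // two words, the search text, and two more words.
    static List<String> regexMatches(String str, String find) {
        String word = "\\S+";
        Pattern p = Pattern.compile(
            word + "\\s+" + word + "\\s+" + Pattern.quote(find)
                 + "\\s+" + word + "\\s+" + word);
        Matcher m = p.matcher(str);
        List<String> out = new ArrayList<>();
        while (m.find()) { // each find() starts where the previous match ended
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        for (String s : regexMatches(str, "John and")) {
            System.out.println(s);
        }
    }
}
```

On this input the regex version reports only `name is John and John and`, whereas the split-based loop above also reports the second occurrence.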
answered 2013-05-05T19:10:20.813
2

Here is another approach I found, using a regular expression:

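        // needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
        String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";

        String find = "I";

        // group 1: the two words before find, group 2: the two words after it.
        // Pattern.quote keeps regex metacharacters in find from being interpreted.
        Pattern pattern = Pattern.compile("(\\S+\\s+\\S+)\\s+" + Pattern.quote(find) + "\\s+(\\S+\\s+\\S+)");
        Matcher matcher = pattern.matcher(str);

        while (matcher.find())
        {
            System.out.println(matcher.group(1));
            System.out.println(matcher.group(2));
        }

Output:

John and
like to
and hiking
have two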
answered 2013-05-05T19:24:00.893
1

Use String.split() to split the text into words. Then search for "I" and join the surrounding words back together:

String[] parts = str.split(" ");

for (int i = 0; i < parts.length; i++) {
    if (parts[i].equals("I")) {
        // assumes i-2 >= 0 and i+2 < parts.length; no bounds checks here
        String out = parts[i-2] + " " + parts[i-1] + " " + parts[i] + " " + parts[i+1] + " " + parts[i+2];
    }
}

Of course you need to check that i-2 is a valid index (and likewise i+2), and if you have a lot of data, a StringBuffer comes in handy.
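
One way to add the index check and the buffer-based joining mentioned above is sketched below. The class name `SurroundingWords`, the helper `window`, and its `radius` parameter are hypothetical names introduced for illustration. It uses StringBuilder instead of StringBuffer, on the assumption that no synchronization is needed.

```java
public class SurroundingWords {
    // Hypothetical helper: joins parts[i-radius .. i+radius] with single spaces,
    // clamping both ends so that no index falls outside the array.
    static String window(String[] parts, int i, int radius) {
        int from = Math.max(0, i - radius);
        int to = Math.min(parts.length - 1, i + radius);
        StringBuilder sb = new StringBuilder();
        for (int k = from; k <= to; k++) {
            if (k > from) {
                sb.append(' ');
            }
            sb.append(parts[k]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and I like to go fishing";
        String[] parts = str.split(" ");
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].equals("I")) {
                System.out.println(window(parts, i, 2));
            }
        }
    }
}
```

Near the start or end of the text the window simply shrinks rather than throwing an exception.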

answered 2013-05-05T19:12:57.250
1
// Split the sentence into a List of words (needs java.util.Arrays and java.util.List)
String[] stringArray = sentence.split(" ");
List<String> stringList = Arrays.asList(stringArray);

// Which word should be matched?
String toMatch = "I";

// How many words before and after do you want?
int before = 2;
int after = 2;

for (int i = 0; i < stringList.size(); ++i) {
    if (toMatch.equals(stringList.get(i))) {
        int index = i;
        if (0 <= index - before && index + after < stringList.size()) {
            StringBuilder sb = new StringBuilder();

            // use j here: redeclaring i inside the outer loop does not compile
            for (int j = index - before; j <= index + after; ++j) {
                sb.append(stringList.get(j));
                sb.append(" ");
            }
            String result = sb.toString().trim();
            //Do something with result
        }
    }
}

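This extracts the two words before and after the match. It could be extended to print up to two words before and after, rather than exactly two.

Edit: Damn.. I was way too slow, and without the fancy ternary operators :/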
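
The "up to two words" extension could look roughly like the following sketch, which clamps the window with List.subList instead of skipping matches near the edges. The class name `ContextWindow` and the method `contexts` are assumptions made for this example, and String.join requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContextWindow {
    // Hypothetical helper: for every occurrence of toMatch, returns up to
    // `before` words before it and up to `after` words after it.
    static List<String> contexts(String sentence, String toMatch, int before, int after) {
        List<String> words = Arrays.asList(sentence.split(" +"));
        List<String> results = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (toMatch.equals(words.get(i))) {
                int from = Math.max(0, i - before);
                int to = Math.min(words.size(), i + after + 1); // exclusive end
                results.add(String.join(" ", words.subList(from, to)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (String s : contexts("I have two sisters and I", "I", 2, 2)) {
            System.out.println(s);
        }
    }
}
```

With this variant a match in the first or last word still produces a result; it just has fewer context words on that side.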

answered 2013-05-05T19:21:16.830
0
// needs: import java.util.ArrayList; import java.util.List;
public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";
    String find = "I";
    int countWords = 3;
    List<String> strings = countWordsBeforeAndAfter(str, find, countWords);
    strings.stream().forEach(System.out::println);
}

public static List<String> countWordsBeforeAndAfter(String paragraph, String search, int countWordsBeforeAndAfter){
    List<String> searchList = new ArrayList<>();
    String str = paragraph;
    String find = search;
    int countWords = countWordsBeforeAndAfter;
    String[] sp = str.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) {
        if (sp[i].equals(find)) {

            String before = "";
            for (int j = countWords; j > 0; j--) {
                if(i-j >= 0) before += sp[i-j]+" ";
            }

            String after = "";
            for (int j = 1; j <= countWords; j++) {
                if(i+j < sp.length) after += " " + sp[i+j];
            }
            String searchResult = before + find + after;
            searchList.add(searchResult);
        }
    }
    return searchList;
}
answered 2016-09-05T09:23:17.940